The primary goal of this project was to develop interactive visualization tools and workflows that help Duke researchers analyze and visualize their large datasets. The visualizations developed and documented for this project were created with HoloViews and Datashader in JupyterLab notebooks hosted in OnDemand sessions on the Duke Compute Cluster.
Dr. William Meyerson is a Yale School of Medicine graduate who is currently studying to become a physician-scientist at Duke Medical Center. Dr. Meyerson is interested in analyzing the effects of social media on mental health, specifically whether late Reddit posting times influence the development of sleep disorders.
About the Data
Dr. Meyerson provided four datasets to be visualized and analyzed for this project. The first and largest dataset contains information on more than 200 million Reddit posts, split across 63 .tsv files. This dataset was pulled from the Pushshift Reddit Dataset (insert citation). The second dataset, provided as a .csv file, contains demographic information (affluence, age, partisanship, sociality, and gender) for around 10,000 subreddits grouped into 30 subreddit clusters. This dataset was pulled from a paper ___ (insert citation). The third dataset, provided as a .tsv file, contains the self-reported locations of approximately 75,000 Reddit users, expressed as longitude and latitude. The fourth dataset, also provided as a .tsv file, contains the self-reported locations of approximately 75,000 Reddit users as well, but expressed as the names of states, cities, and countries. These two datasets were both pulled from a paper ___ (insert citation).
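Because the first dataset is spread over many .tsv files, a first pass usually has to stream the files one at a time rather than load everything into memory. The following is a minimal sketch of that idea using only the Python standard library. It is not the project's actual code: the function name `count_posts_by_hour`, the column name `created_utc`, and the assumption that the column holds Unix timestamps in seconds are all hypothetical, since the column layout of the files is not described here.

```python
import csv
import glob
import os
from collections import Counter
from datetime import datetime, timezone


def count_posts_by_hour(data_dir, pattern="*.tsv", time_column="created_utc"):
    """Count posts per UTC hour of day across every matching .tsv file.

    Assumptions (hypothetical, not confirmed by the project):
    each file has a header row, and `time_column` holds a Unix
    timestamp in seconds.
    """
    counts = Counter()
    # Process files one at a time so memory use stays flat
    # no matter how many files or rows there are.
    for path in sorted(glob.glob(os.path.join(data_dir, pattern))):
        with open(path, newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle, delimiter="\t")
            for row in reader:
                timestamp = int(row[time_column])
                hour = datetime.fromtimestamp(timestamp, tz=timezone.utc).hour
                counts[hour] += 1
    # Return a fixed-length list: index 0 is 00:00-00:59 UTC, and so on.
    return [counts.get(hour, 0) for hour in range(24)]
```

The resulting 24-element list is small enough to plot directly, for example as a bar chart of posting activity by hour. Note that the hours are in UTC, so any analysis of late-night posting would still need to account for each user's local time zone.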
To view the full set of visualizations produced in this project, along with reference code, see: