waller-anderson-2020-data's Introduction

Reddit social dimensions

Data and code for the community embedding, social dimensions, and analyses from the 2021 paper "Quantifying social organization and political polarization in online platforms" by Isaac Waller and Ashton Anderson.

Data

The following data has been made available in the data directory:

Community embedding

The community embedding of Reddit used in the paper. embedding-vectors.tsv contains the 150-dimensional vectors for each community, while embedding-metadata.tsv contains the name, description and associated data for each community. Communities with similar user bases are similar in the embedding; see Methods, Creating the community embedding.

embedding-vectors.tsv:

0.019756        -0.07609199999999999    -0.017321       -0.024236       0.112748        -0.10828099999999999    -0.35062        0.401909        -0.254341  0.260575 0.204183    ...
-0.004445       -0.036706       -0.019637       0.129492        -0.045198       -0.067518       0.07739700000000001     0.16213 -0.022069       0.060171   0.34275100000000003      0.032792        -0.124957       0.114371        ...
...

embedding-metadata.tsv:

community       description     over18
keto    The Ketogenic Diet is a low carb, high fat method of eating. And /r/keto is place to share thoughts, ideas, benefits, and experiences around eating within a Ketogenic Diet...     False
AskReddit       /r/AskReddit is the place to ask and answer thought-provoking questions.        False
...
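As a minimal sketch of how the two files might be used together, the snippet below pairs each vector with its community name and compares communities by cosine similarity. It assumes that row i of embedding-vectors.tsv corresponds to row i of embedding-metadata.tsv (after the metadata header), which the files suggest but this README does not state. The 3-dimensional vectors and the `keto_like` row are made-up stand-ins, and `load_embedding` and `cosine_similarity` are illustrative helper names, not part of the repository.

```python
import csv
import io
import math

# Toy stand-ins for data/embedding-vectors.tsv and data/embedding-metadata.tsv.
# Real vectors have 150 dimensions; these 3-dimensional values are made up,
# and "keto_like" is a hypothetical community used only for illustration.
VECTORS_TSV = "0.1\t0.2\t0.3\n0.1\t0.2\t0.25\n-0.3\t0.1\t0.0\n"
METADATA_TSV = (
    "community\tdescription\tover18\n"
    "keto\tPlaceholder description.\tFalse\n"
    "keto_like\tPlaceholder description.\tFalse\n"
    "AskReddit\tPlaceholder description.\tFalse\n"
)


def load_embedding(vectors_file, metadata_file):
    """Return {community: vector}, ASSUMING row i of both files matches."""
    vectors = [[float(x) for x in row]
               for row in csv.reader(vectors_file, delimiter="\t") if row]
    names = [row["community"]
             for row in csv.DictReader(metadata_file, delimiter="\t")]
    if len(vectors) != len(names):
        raise ValueError("vector and metadata row counts differ")
    return dict(zip(names, vectors))


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


embedding = load_embedding(io.StringIO(VECTORS_TSV), io.StringIO(METADATA_TSV))
sim_close = cosine_similarity(embedding["keto"], embedding["keto_like"])
sim_far = cosine_similarity(embedding["keto"], embedding["AskReddit"])
print(f"keto vs keto_like: {sim_close:.3f}")
print(f"keto vs AskReddit: {sim_far:.3f}")
```

To run this against the real data, the two `io.StringIO` objects could be replaced with the opened files from the data directory.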

Social dimensions

The communities used to construct each social dimension are listed in social-dimensions.yaml. The first pair was manually provided while the rest were automatically found as per Methods, Finding social dimensions.

social-dimensions.yaml:

dimensions:
  - name: age
    seeds:
      - [teenagers, RedditForGrownups]
      - [youngatheists, TrueAtheism]
...
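The sketch below illustrates one plausible way seed pairs could define a dimension. It is an assumption for illustration, not a statement of the exact procedure in the paper's Methods: each pair contributes the normalized difference between its two community vectors (second minus first), the differences are averaged, and a community is scored by projecting its normalized vector onto the result. The 2-dimensional vectors are made up, and `normalize`, `dimension_vector`, and `score` are hypothetical helper names.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dimension_vector(seeds, embedding):
    """Average the normalized (second - first) differences over all seed pairs.

    ASSUMPTION: this direction convention and averaging scheme are illustrative.
    """
    diffs = []
    for first, second in seeds:
        diff = [b - a for a, b in zip(embedding[first], embedding[second])]
        diffs.append(normalize(diff))
    mean = [sum(column) / len(diffs) for column in zip(*diffs)]
    return normalize(mean)


def score(community, dimension, embedding):
    """Project a community's normalized vector onto the dimension."""
    unit = normalize(embedding[community])
    return sum(a * b for a, b in zip(unit, dimension))


# Made-up 2-dimensional vectors; the real embedding has 150 dimensions.
toy_embedding = {
    "teenagers": [1.0, 0.1],
    "RedditForGrownups": [1.0, 0.9],
    "youngatheists": [0.2, 0.0],
    "TrueAtheism": [0.3, 0.7],
}
age_seeds = [["teenagers", "RedditForGrownups"], ["youngatheists", "TrueAtheism"]]

age = dimension_vector(age_seeds, toy_embedding)
young_score = score("teenagers", age, toy_embedding)
old_score = score("RedditForGrownups", age, toy_embedding)
print(f"teenagers: {young_score:.3f}")
print(f"RedditForGrownups: {old_score:.3f}")
```

Under these assumptions, communities on the second side of each pair receive higher scores than communities on the first side.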

Social dimension scores

The scores for all 10,000 Reddit communities on each of our social dimensions (e.g., age, partisan) and the associated neutral dimensions are available in scores.csv.

scores.csv:

community,age,gender,partisan,...
keto,0.17760505920402261,0.10308876095697105,-0.015496712806190574,...
AskReddit,-0.07415413657149496,0.13052107711645367,0.05281928294403579,...
...
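A short sketch of how the scores might be explored: the snippet parses the CSV and ranks communities on a chosen dimension. It uses only the two sample rows shown above, truncated to the three named columns, and the standard library so that it runs anywhere. `load_scores` and `rank_by` are illustrative helper names, not part of the repository.

```python
import csv
import io

# The two sample rows shown above, truncated to the three named columns.
SCORES_CSV = """community,age,gender,partisan
keto,0.17760505920402261,0.10308876095697105,-0.015496712806190574
AskReddit,-0.07415413657149496,0.13052107711645367,0.05281928294403579
"""


def load_scores(csv_file):
    """Return {community: {dimension: score}} with scores parsed as floats."""
    scores = {}
    for row in csv.DictReader(csv_file):
        name = row.pop("community")
        scores[name] = {dim: float(value) for dim, value in row.items()}
    return scores


def rank_by(scores, dimension):
    """Community names sorted from highest to lowest score on one dimension."""
    return sorted(scores, key=lambda name: scores[name][dimension], reverse=True)


scores = load_scores(io.StringIO(SCORES_CSV))
print(rank_by(scores, "age"))       # ['keto', 'AskReddit']
print(rank_by(scores, "partisan"))  # ['AskReddit', 'keto']
```

For the full file, the `io.StringIO` object could be replaced with the opened scores.csv from the data directory.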

Figure data

The underlying data for all main text figures from the paper are available in data/figure_data. Code to reproduce all figures is available in full_code/commembed/plots.

Citation

If you use any data or code from this repository, please cite our paper:

Waller, I., Anderson, A. Quantifying social organization and political polarization in online platforms. Nature 600, 264–268 (2021). https://doi.org/10.1038/s41586-021-04167-x

Reproduction code

Code to reproduce the analyses from the paper is available in full_code/.

Requirements

  • Python 3.x
  • Spark and pyspark
  • pandas
  • Software that can run Jupyter notebooks

Instructions to reproduce social dimensions

  1. Load the full_code/social-dimensions.ipynb notebook.
  2. Run all cells in the notebook.
  3. The resulting scores for all communities are saved to the scores.csv file and are also available in the notebook as the scores pandas DataFrame for you to explore.

See full_code/scores.csv from the repository for full example output, which this code should reproduce exactly.
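One way to check a regenerated file against the reference output is sketched below. It compares two score files community by community and column by column. The helper names `read_scores` and `scores_match` are hypothetical, the `tol` parameter is an added convenience (the default of 0.0 demands exact equality), and the demo writes small temporary files rather than touching the repository.

```python
import csv
import math
import os
import tempfile


def read_scores(path):
    """Return {community: row dict} for one scores CSV file."""
    with open(path, newline="") as f:
        return {row["community"]: row for row in csv.DictReader(f)}


def scores_match(path_a, path_b, tol=0.0):
    """True if both files have the same communities, columns, and scores."""
    a, b = read_scores(path_a), read_scores(path_b)
    if a.keys() != b.keys():
        return False
    for name, row_a in a.items():
        row_b = b[name]
        if row_a.keys() != row_b.keys():
            return False
        for column, value in row_a.items():
            if column == "community":
                continue
            # With rel_tol=0 and abs_tol=0 this is an exact float comparison.
            if not math.isclose(float(value), float(row_b[column]),
                                rel_tol=0.0, abs_tol=tol):
                return False
    return True


# Demo on two identical temporary files standing in for the real outputs.
with tempfile.TemporaryDirectory() as tmp:
    reference = os.path.join(tmp, "reference.csv")
    regenerated = os.path.join(tmp, "regenerated.csv")
    for path in (reference, regenerated):
        with open(path, "w", newline="") as f:
            f.write("community,age\nketo,0.17760505920402261\n")
    same = scores_match(reference, regenerated)
print(same)
```

In practice, the two paths would point at the freshly written scores.csv and at full_code/scores.csv.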

Instructions to reproduce analyses / plots from paper

  1. First download the Pushshift data (see the script full_code/commembed/data/download.sh), then convert it to Parquet format (see the script full_code/commembed/data/import_data.py).
  2. Notebooks to generate all the plots are in the full_code/notebooks folder. Run them in order, because some notebooks generate data that later notebooks depend on.

Contact

If you have any questions, please contact us.
