covaxxy-misinfo's Introduction

Reproducibility code for "Online misinformation is linked to early COVID-19 vaccination hesitancy and refusal" by Francesco Pierri, Brea Perry, Matthew R. DeVerna, Kai-Cheng Yang, Alessandro Flammini, Filippo Menczer, and John Bryden. Nature Scientific Reports (2022). https://www.nature.com/articles/s41598-022-10070-w

Structure

  .
  ├── README.md
  ├── config.ini
  ├── data
  │   ├── county_level
  │   ├── covid19
  │   ├── misc
  │   ├── state_level
  │   └── twitter
  ├── intermediate_files
  ├── logs
  ├── output_files
  ├── src
  └── v1-streaming
  • config.ini - configuration file that specifies the paths and filenames used by the scripts
  • data - folder containing subfolders with raw data at the state and county level, as well as Twitter data. Check the README file in each subfolder for further details
  • intermediate_files - folder containing intermediate data to be merged
  • logs - folder containing the logs written by the scripts
  • output_files - folder containing the merged output (the master_data CSV used for the correlation analysis)
  • src - folder containing the scripts to be executed
  • v1-streaming - folder containing the code used to stream the tweets
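
Each script takes the path to config.ini as its only argument. The snippet below is a minimal sketch of how such a file could be read with Python's standard configparser module. The section name PATHS and the keys data_dir, intermediate_dir, and output_dir are hypothetical and chosen only for illustration; check the actual config.ini for the names the scripts expect.

```python
import configparser

# Hypothetical config contents: the section and key names are assumptions
# made for this example, not the repository's actual config.ini.
EXAMPLE_CONFIG = """
[PATHS]
data_dir = ../data
intermediate_dir = ../intermediate_files
output_dir = ../output_files
"""


def load_paths(config_text: str) -> dict:
    """Parse INI-formatted text and return the PATHS section as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return dict(parser["PATHS"])


paths = load_paths(EXAMPLE_CONFIG)
print(paths["output_dir"])
```

To read a file on disk instead of a string, parser.read(path) can be used in place of parser.read_string(text).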

Keywords and Low-credibility sources

The keywords used to filter the Twitter stream are listed in src/keywords.txt, and the list of low-credibility sources is in intermediate_files/low_credibility.csv. Check the GitHub repository associated with our CoVaxxy project for further details.
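
To illustrate how these two lists might be used, here is a rough sketch that checks a tweet's text against a keyword list and a URL against a set of low-credibility domains. It assumes one keyword per line, a CSV column named domain, and plain case-insensitive substring matching; all three are assumptions for this example and may differ from the actual files and from how the Twitter streaming API matches keywords. The sample keywords and domains are placeholders.

```python
import csv
import io
from urllib.parse import urlparse


def load_keywords(lines):
    """Return lowercased, non-empty keywords (assumes one keyword per line)."""
    return [line.strip().lower() for line in lines if line.strip()]


def matches_keywords(text, keywords):
    """True if any keyword occurs in the text (simple substring match)."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


def load_domains(csv_text, column="domain"):
    """Read a set of domains from CSV text; the column name is hypothetical."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[column].strip().lower() for row in reader}


def is_low_credibility(url, domains):
    """True if the URL's host, without a leading 'www.', is in the set."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host in domains


# Placeholder data, not taken from the repository.
keywords = load_keywords(["vaccine", "", "covid19 vaccine"])
domains = load_domains("domain\nexample.com\n")

print(matches_keywords("Got my Vaccine today", keywords))
print(is_low_credibility("https://www.example.com/story", domains))
```

Shortened links would need to be expanded before the domain check; this sketch does not attempt that.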

Instructions to replicate results

  1. Clone this repository to a local directory.
  2. Put the Twitter data in the data/twitter folder. The data must be .json files with one tweet JSON object per line. Check the GitHub repository associated with our CoVaxxy project to see how to download our dataset and reconstruct it using the Twitter API.
  3. Go to the src folder and run the Python scripts (we used Python 3.8.5; see src/README.md for further details) in the following order:
    • python3 twitter_data_processing.py ../config.ini - process the Twitter data
    • python3 get_cases_and_deaths.py ../config.ini - download the COVID-19 numbers of cases and deaths; modify config.ini to set the date range
    • python3 aggregate_cases_and_deaths.py ../config.ini - aggregate the COVID-19 numbers of cases and deaths for further use
    • python3 merge_datasets.py ../config.ini - merge the intermediate data into a single dataframe to be used for the correlation analysis
  4. Run the Stata script (src/stata_script.do) to get the correlation results using output_files/master_data--{%Y-%m-%d__%H-%M-%S}.csv.
  5. To run the Granger causality analysis, go to the src folder and run the Python scripts (we used Python 3.8.5; see src/README.md for further details) in the following order:
    • python3 get_temporal_data.py ../config.ini - generate daily aggregates at the user level
    • python3 generate_aggregate_files.py ../config.ini - aggregate the daily data by county or state
    • python3 causality.py ../config.ini - run the causality analysis
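
The steps above can be sketched as a few small helpers: one that reads a .json file with one tweet per line (step 2), one that builds the script commands in the listed order (steps 3 and 5), and one that picks the most recent master_data file (step 4). This is an illustrative sketch, not part of the repository. The helper names are hypothetical, and the file lookup assumes the timestamp placeholder expands to a name such as master_data--2021-01-02__03-04-05.csv.

```python
import json
import subprocess
import sys
from pathlib import Path

# Script order as listed in steps 3 and 5.
CORRELATION_SCRIPTS = [
    "twitter_data_processing.py",
    "get_cases_and_deaths.py",
    "aggregate_cases_and_deaths.py",
    "merge_datasets.py",
]
GRANGER_SCRIPTS = [
    "get_temporal_data.py",
    "generate_aggregate_files.py",
    "causality.py",
]


def iter_tweets(path):
    """Yield one tweet dict per non-empty line of a .json file."""
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError as err:
                raise ValueError(f"{path}:{line_number} is not valid JSON") from err


def build_commands(scripts, config_path="../config.ini"):
    """Return one argv list per script, using the current interpreter."""
    return [[sys.executable, script, config_path] for script in scripts]


def run_pipeline(scripts, src_dir="src"):
    """Run the scripts in order from src_dir, stopping at the first failure."""
    for command in build_commands(scripts):
        subprocess.run(command, cwd=src_dir, check=True)


def latest_master_data(output_dir):
    """Return the newest master_data CSV, or None if there is none.

    The assumed timestamp format sorts chronologically when sorted by name.
    """
    candidates = sorted(Path(output_dir).glob("master_data--*.csv"))
    return candidates[-1] if candidates else None
```

For example, run_pipeline(CORRELATION_SCRIPTS) followed by latest_master_data("output_files") would run step 3 and then locate the file to pass to the Stata script, assuming the working directory is the repository root.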

Dependencies

  • covidcast - install by running pip install covidcast. Details can be found here
  • carmen - install by running pip install carmen. Details can be found here
  • urlexpander - install by running pip install urlexpander. Details can be found here
