
covid19antiasian

Replication data and code for Study 1 (Social Media Data Analysis)

Author: Jae Yeon Kim ([email protected])

Paper: https://osf.io/preprints/socarxiv/dvm7r/ (accepted at Perspectives on Politics)

Session information

  1. Programming languages
  • R version 4.0.4 (2021-02-15)
  • Python 3.8.8
  • Bash 5.1.4(1)-release
  2. Operating system
  • Platform: x86_64-pc-linux-gnu (64-bit)
  • Running under: Ubuntu 21.04

Data collection

Raw data: tweet_ids

The data source is version 15 of the large-scale COVID-19 Twitter chatter dataset created by Panacealab. In keeping with Twitter's developer terms, the original dataset provides only tweet IDs, not the tweets themselves. I turned these tweet IDs back into tweets, saved as a JSON file, using Twarc. This process, called hydrating, is very time-consuming. To make the resulting large JSON file easier to work with, I created an R package, tidytweetjson, that efficiently parses it into a tidyverse-ready data frame. To aid replication, I also saved the tweet IDs recorded in the Twarc log by running the following command in the terminal:

    grep "INFO archived" twarc.log | awk '{print $5}' > tweet_ids
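For readers without grep and awk, the ID extraction above can be approximated in Python. This is a minimal sketch that mirrors the shell command literally: it keeps lines containing the substring "INFO archived" and takes the fifth whitespace-separated field, as `awk '{print $5}'` does. It makes no assumption about what else appears on a Twarc log line; the function name `extract_archived_ids` and the script name in the usage comment are hypothetical.

```python
import sys
from typing import Iterable, List


def extract_archived_ids(log_lines: Iterable[str]) -> List[str]:
    """Approximate `grep "INFO archived" twarc.log | awk '{print $5}'`.

    Keeps lines containing the literal substring "INFO archived" and
    returns the fifth whitespace-separated field of each one.
    """
    ids = []
    for line in log_lines:
        if "INFO archived" not in line:
            continue  # grep: drop non-matching lines
        fields = line.split()  # like awk's default splitting on runs of whitespace
        # awk prints an empty line when a record has fewer than five fields
        ids.append(fields[4] if len(fields) >= 5 else "")
    return ids


# Hypothetical usage: python extract_ids.py twarc.log > tweet_ids
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as log:
        for tweet_id in extract_archived_ids(log):
            print(tweet_id)
```

Reading the log as an iterator keeps memory use low even when the log is long.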

Replication code

  • 00_setup.sh: Shell script for collecting tweets and their metadata from the tweet IDs (hydration)

  • 01_google_trends.r: R script for collecting Google Trends search data

  • 01_sample.Rmd: R Markdown file for sampling Twitter data

  • 02_parse.r: R script for parsing Twitter data. This script produces a cleaned and wrangled dataset named parsed.rds. This file is not included in this repository, both to comply with Twitter's Developer Terms and because of its size (1.4 GB).
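The parsing step can be sketched in Python as a streaming reader that flattens one tweet per line. This is not the code in 02_parse.r or tidytweetjson; it assumes the hydrated file holds one Twitter API v1.1 tweet object per line with the fields `id_str`, `created_at`, `full_text` (or `text`), `user.screen_name`, and `entities.hashtags`. The function names are hypothetical.

```python
import json
from typing import Dict, Iterable, Iterator, Optional


def parse_tweet_line(line: str) -> Optional[Dict[str, object]]:
    """Flatten one line of hydrated JSON into a small record.

    Field names assume Twitter API v1.1 tweet objects (an assumption,
    not taken from 02_parse.r or tidytweetjson).
    """
    line = line.strip()
    if not line:
        return None  # skip blank lines
    tweet = json.loads(line)
    return {
        "id": tweet.get("id_str"),
        "created_at": tweet.get("created_at"),
        # extended-mode tweets store their text in full_text
        "text": tweet.get("full_text", tweet.get("text")),
        "screen_name": tweet.get("user", {}).get("screen_name"),
        "hashtags": [h.get("text") for h in tweet.get("entities", {}).get("hashtags", [])],
    }


def parse_tweets(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield records one line at a time so the whole file never sits in memory."""
    for line in lines:
        record = parse_tweet_line(line)
        if record is not None:
            yield record
```

Streaming line by line keeps memory use flat for a file of this size; the records can then be written out in chunks or loaded into a data frame.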

Descriptive analysis

Replication code

  • 03_explore.Rmd: R Markdown file for further wrangling and exploring the data. This file creates Figure 2 (overall_trend.png).

  • 04_01_hashtags.R: R script for creating a word cloud of hashtags. This file creates Figure 1 (hash_cloud.png).

  • 04_clean.ipynb: Python notebook for cleaning the tweet text
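The cleaning and hashtag-counting steps can be illustrated with a short Python sketch. The specific cleaning rules here (dropping URLs and @mentions, lowercasing, collapsing whitespace) are assumptions for illustration, not the steps in 04_clean.ipynb, and the hashtag counter is a Python stand-in for the frequency table a word cloud like the one from 04_01_hashtags.R is built from. The function names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")  # \w is Unicode-aware for str patterns


def clean_text(text: str) -> str:
    """Drop URLs and @mentions, lowercase, and collapse whitespace.

    Illustrative rules only; assumed, not taken from 04_clean.ipynb.
    """
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.lower()
    return " ".join(text.split())


def top_hashtags(texts: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Count case-insensitive hashtag frequencies, ignoring '#' inside URLs."""
    counts = Counter()
    for text in texts:
        without_urls = URL_RE.sub(" ", text)
        counts.update(tag.lower() for tag in HASHTAG_RE.findall(without_urls))
    return counts.most_common(n)
```

Lowercasing hashtags before counting merges variants such as #COVID19 and #covid19 into one entry, which usually matters more for a word cloud than for other uses of the text.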

Topic modeling

Replication code

  • 05_topic_modeling.Rmd: R Markdown file for the topic modeling analysis. This file creates Figure 3 (dynamic_topic_day.png).
