Author: Jae Yeon Kim (jkim638@jhu.edu)
Paper: https://osf.io/preprints/socarxiv/dvm7r/ (accepted at Perspectives on Politics)
Session information
- Programming languages
  - R version 4.0.4 (2021-02-15)
  - Python 3.8.8
  - Bash 5.1.4(1)-release
- Operating system
  - Platform: x86_64-pc-linux-gnu (64-bit)
  - Running under: Ubuntu 21.04
Raw data: tweet_ids
The data source is the large-scale COVID-19 Twitter chatter dataset (v.15) created by Panacealab. The original dataset provides only tweet IDs, not tweets, in keeping with Twitter's developer terms. I turned these tweet IDs back into a JSON file of tweets using Twarc. This process, called hydrating, is very time-consuming. To ease it, I created an R package, called tidytweetjson, that efficiently parses this large JSON file into a tidyverse-ready data frame. To aid replication, I also saved the IDs of the hydrated tweets by running the following command in the terminal: grep "INFO archived" twarc.log | awk '{print $5}' > tweet_ids
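The hydrating and ID-extraction steps above can be sketched as follows. This is a minimal illustration, not the repository's 00_setup.sh: the sample log line is fabricated, but it mimics the "INFO archived" entries that twarc writes to twarc.log, and the commented twarc invocation assumes twarc has already been configured with Twitter API credentials.

```shell
# Hydrate tweet IDs back into full tweets (requires configured API keys):
#   twarc hydrate tweet_ids > tweets.jsonl

# Recover the archived tweet IDs from the twarc log.
# Hypothetical sample log line standing in for a real twarc.log:
printf '2021-03-01 12:00:00,000 INFO archived 1366823843276212000\n' > twarc.log

# Field 5 of each matching log line is the tweet ID.
grep "INFO archived" twarc.log | awk '{print $5}' > tweet_ids
cat tweet_ids  # -> 1366823843276212000
```

The grep/awk pipeline is the same one quoted above; it simply filters the log to successfully archived tweets and keeps the ID column.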
Replication code

- 00_setup.sh: Shell script for collecting tweets and their related metadata based on tweet IDs
- 01_google_trends.r: R script for collecting Google Trends search data
- 01_sample.Rmd: R Markdown file for sampling the Twitter data
- 02_parse.r: R script for parsing the Twitter data. It produces a cleaned and wrangled data frame named 'parsed.rds'. That file is not included in this repository, both to comply with Twitter's Developer Terms and because of its large size (1.4 GB).
- 03_explore.Rmd: R Markdown file for further wrangling and exploring the data. This file creates Figure 2 (overall_trend.png).
- 04_01_hashtags.R: R script for creating a word cloud of hashtags. This file creates Figure 1 (hash_cloud.png).
- 04_clean.ipynb: Python notebook for cleaning the tweet texts
- 05_topic_modeling.Rmd: R Markdown file for the topic modeling analysis. This file creates Figure 3 (dynamic_topic_day.png).