Reproducibility code for "Online misinformation is linked to early COVID-19 vaccination hesitancy and refusal"
Francesco Pierri, Brea Perry, Matthew R. DeVerna, Kai-Cheng Yang, Alessandro Flammini, Filippo Menczer and John Bryden. Nature Scientific Reports (2022). https://www.nature.com/articles/s41598-022-10070-w
.
├── README.md
├── config.ini
├── data
│   ├── county_level
│   ├── covid19
│   ├── misc
│   ├── state_level
│   └── twitter
├── intermediate_files
├── logs
├── output_files
└── src
    └── v1-streaming
- config.ini - configuration file that specifies paths and filenames for the scripts
- data - folder containing subfolders with raw data at the state and county level, as well as Twitter data; check the related README files for further details
- intermediate_files - folder containing intermediate data to be merged
- logs - folder containing logs of the scripts' output
- src - folder containing the scripts to be executed
- v1-streaming - folder containing the code used to stream the tweets
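Since every script takes the path to config.ini as its argument, it can help to inspect that file before running anything. The following is a minimal sketch using Python's standard `configparser`; the helper name `load_config` is hypothetical, and no section or key names are assumed because the README does not list them. Interpolation is disabled on the assumption that values may contain `%` characters (for example date formats).

```python
import configparser


def load_config(path):
    """Read an INI file and return its contents as {section: {key: value}}.

    Hypothetical helper, not part of the repository.
    """
    # interpolation=None keeps values such as "%Y-%m-%d" as literal text.
    parser = configparser.ConfigParser(interpolation=None)
    files_read = parser.read(path)
    if not files_read:
        # ConfigParser.read silently skips missing files, so check explicitly.
        raise FileNotFoundError(f"could not read config file: {path}")
    return {section: dict(parser[section]) for section in parser.sections()}
```

Calling `load_config("config.ini")` from the repository root and printing the result shows which paths and filenames the scripts will use.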
You can find the keywords used to filter the Twitter stream in src/keywords.txt and the list of low-credibility sources in intermediate_files/low_credibility.csv. Check the GitHub repository associated with our CoVaxxy project for further details.
- Clone this repository in your local directory.
- Put Twitter data in the data/twitter folder. The files must be .json files with one tweet JSON object per line. Check the GitHub repository associated with our CoVaxxy project to see how to download our dataset and reconstruct it using the Twitter API.
- Go to the src folder and execute the Python scripts (we used version 3.8.5; see the associated src/README.md file for further details) in the following order:
  - python3 twitter_data_processing.py ../config.ini - process the Twitter data
  - python3 get_cases_and_deaths.py ../config.ini - download the COVID-19 numbers of cases and deaths; modify config.ini to set the date range
  - python3 aggregate_cases_and_deaths.py ../config.ini - aggregate the COVID-19 numbers of cases and deaths for further use
  - python3 merge_datasets.py ../config.ini - merge the intermediate data into a single dataframe to be used for the correlation analysis
- Run the STATA script (src/stata_script.do) to get the correlation results using output_files/master_data--{%Y-%m-%d__%H-%M-%S}.csv.
- To run the Granger causality analysis, go to the src folder and execute the Python scripts (we used version 3.8.5; see the associated src/README.md file for further details) in the following order:
  - python3 get_temporal_data.py ../config.ini - generate daily aggregates at the user level
  - python3 generate_aggregate_files.py ../config.ini - aggregate by county or state
  - python3 causality.py ../config.ini - run the causality analysis
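The requirement that data/twitter holds .json files with one tweet JSON object per line can be checked before starting the pipeline. The sketch below is an illustration, not part of the repository: the function names `check_jsonl_file` and `check_folder` are hypothetical, and it only verifies that each non-empty line parses as a JSON object, without looking at any tweet fields.

```python
import json
from pathlib import Path


def check_jsonl_file(path):
    """Return (n_valid, bad_line_numbers) for a one-JSON-object-per-line file."""
    n_valid = 0
    bad_lines = []
    with open(path, encoding="utf-8") as handle:
        for lineno, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # blank lines are ignored rather than counted as errors
            try:
                obj = json.loads(line)
            except json.JSONDecodeError:
                bad_lines.append(lineno)
                continue
            if isinstance(obj, dict):
                n_valid += 1
            else:
                # Valid JSON, but not an object (e.g. a list or a number).
                bad_lines.append(lineno)
    return n_valid, bad_lines


def check_folder(folder):
    """Check every .json file in a folder; returns {file name: (n_valid, bad_lines)}."""
    return {p.name: check_jsonl_file(p) for p in sorted(Path(folder).glob("*.json"))}
```

For example, `check_folder("data/twitter")` returns one entry per file; a non-empty list of bad line numbers points to lines that are truncated or not JSON objects.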
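Because the merged file name carries a timestamp, output_files may hold several master_data files after repeated runs. The sketch below shows one way to pick the most recent one before passing it to the STATA script. It assumes the braces in master_data--{%Y-%m-%d__%H-%M-%S}.csv are a placeholder for a formatted timestamp rather than literal characters; the helper names `parse_stamp` and `latest_master_data` are hypothetical.

```python
from datetime import datetime
from pathlib import Path

PREFIX = "master_data--"
SUFFIX = ".csv"
STAMP_FORMAT = "%Y-%m-%d__%H-%M-%S"  # taken from the file name pattern in the README


def parse_stamp(filename):
    """Return the datetime encoded in a master_data file name, or None."""
    name = Path(filename).name
    if not (name.startswith(PREFIX) and name.endswith(SUFFIX)):
        return None
    stamp = name[len(PREFIX):-len(SUFFIX)]
    try:
        return datetime.strptime(stamp, STAMP_FORMAT)
    except ValueError:
        return None  # name matches the prefix but not the timestamp format


def latest_master_data(folder):
    """Return the Path of the most recent master_data CSV in folder, or None."""
    candidates = []
    for path in Path(folder).glob(PREFIX + "*" + SUFFIX):
        stamp = parse_stamp(path.name)
        if stamp is not None:
            candidates.append((stamp, path))
    if not candidates:
        return None
    # Compare on the parsed timestamp only.
    return max(candidates, key=lambda item: item[0])[1]
```

Run from the src folder, `latest_master_data("../output_files")` returns the file written by the latest `merge_datasets.py` run, or `None` if no matching file exists.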