Here we release the data.
See /facebook_candidate. We share the IDs of the posts in CSV files with the following schema:
Column | Note |
---|---|
post_id | ID of the post |
post_url | URL of the post |
See /facebook_instagram_keyword. We share the IDs of the posts in CSV files with the following schema:
Column | Note |
---|---|
post_id | ID of the post |
platform | facebook or instagram |
post_url | URL of the post |
relevant* | 1=related to midterm; 0=irrelevant |
\*Post relevance label: see the paper for details. The labeling procedure is not perfect, so use the labels with discretion.
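
As a minimal sketch of how the schema above might be consumed, the snippet below keeps only the posts labeled as related to the midterms. It assumes the CSV header spells the column `relevant` (without the footnote asterisk); the function names and the file path in the usage note are hypothetical.

```python
import csv
from typing import Iterable, Iterator


def relevant_post_ids(rows: Iterable[dict]) -> Iterator[str]:
    """Yield the post_id of every row whose `relevant` column is 1."""
    for row in rows:
        # csv.DictReader returns every value as a string, so compare to "1".
        # Assumption: the header is `relevant`, not `relevant*`.
        if (row.get("relevant") or "").strip() == "1":
            yield row["post_id"]


def load_relevant_post_ids(path: str) -> list:
    """Read one CSV file and return the IDs of the relevant posts."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(relevant_post_ids(csv.DictReader(f)))
```

For example, `load_relevant_post_ids("some_file.csv")` would return the relevant IDs from one file, where `some_file.csv` stands in for any CSV in the folder. IDs are kept as strings so that long numeric IDs are not altered by numeric conversion.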
See /reddit_keyword.
We share the data from each day in JSON files.
Each line of a file is a JSON object.
We add a `relevant` label to each object.
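
The line-per-object layout described above can be read one line at a time. The following is a sketch only: it assumes the label is stored under the key `relevant` and uses 1 for relevant objects, mirroring the CSV schemas; the actual encoding should be checked against the files. The function name is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_relevant_objects(lines: Iterable[str]) -> Iterator[dict]:
    """Parse one JSON object per line and yield those labeled relevant."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        obj = json.loads(line)
        # Assumption: relevant objects carry the value 1 (or "1" / true).
        if obj.get("relevant") in (1, "1"):
            yield obj
```

An open file handle can be passed directly, for example `iter_relevant_objects(open(path, encoding="utf-8"))`, since iterating over a text file yields its lines.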
See /twitter_candidate. We share the IDs of the tweets in CSV files with the following schema:
Column | Note |
---|---|
tid | ID of the tweet |
See /twitter_keyword. We share the IDs of the tweets in CSV files with the following schema:
Column | Note |
---|---|
tid | ID of the tweet |
relevant* | 1=related to midterm; 0=irrelevant |
\*Post relevance label: see the paper for details. The labeling procedure is not perfect, so use the labels with discretion.
See `facebook_ads_ids.csv.gz`.
We share the IDs of the ads in CSV files with the following schema:
Column | Note |
---|---|
id | ID of the ad |
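
Because the ad IDs ship as a gzip-compressed CSV, they can be read without decompressing the file to disk first. This is a minimal sketch using only the Python standard library; it assumes the file has a header row with the `id` column shown above, and the function name is hypothetical.

```python
import csv
import gzip


def read_ad_ids(path: str) -> list:
    """Return the ad IDs from a gzip-compressed CSV with an `id` column."""
    # Mode "rt" makes gzip.open decode the stream as text for the csv module.
    with gzip.open(path, "rt", newline="", encoding="utf-8") as f:
        return [row["id"] for row in csv.DictReader(f)]
```

Calling `read_ad_ids("facebook_ads_ids.csv.gz")` would return the IDs as strings.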
Since the 4chan data is too large for GitHub, we host it on Zenodo.
For the archive files, each line is a thread in JSON format.
For the snapshot files, each line is a JSON object containing the snapshots of 4chan's `catalog` endpoint.
Note: we did not perform any keyword matching on the 4chan data, so not all content is related to the 2022 US midterm elections.
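
Given the size of the 4chan files, it may be preferable to stream them rather than load a whole file into memory. The sketch below lazily parses one JSON object per line and reports the line number if a line fails to parse. It assumes the downloaded files are plain text (decompress them first if they are not); the function name is hypothetical.

```python
import json
from typing import IO, Iterator


def iter_json_lines(stream: IO[str]) -> Iterator[dict]:
    """Lazily yield one parsed JSON object per non-empty line."""
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # Surface the offending line so a damaged file is easy to locate.
            raise ValueError(f"line {line_number} is not valid JSON") from exc
```

Because the function is a generator, only one thread or snapshot is held in memory at a time while iterating.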