
WikiFactCheck-English

This repository contains WikiFactCheck-English, a large annotated corpus of claims and cited evidence extracted from Wikipedia for automatic fact checking. The data accompanies the paper 'Automated Fact-Checking of Claims from Wikipedia'.

The contents are as follows:

.
│
├── loadwfc-en.py
│
├── wikifactcheck-en_full0.jsonl
├── wikifactcheck-en_full1.jsonl
├── wikifactcheck-en_full2.jsonl
├── wikifactcheck-en_full3.jsonl
├── wikifactcheck-en_full4.jsonl
│
├── wikifactcheck-en_test.jsonl
└── wikifactcheck-en_train.jsonl

As explained in the paper, the annotated portion of the corpus is split into train and test sets. The full sets contain the entire corpus (both annotated and non-annotated portions), split into five files due to space constraints.
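Each .jsonl file contains one JSON object per line, so the data can also be read without the loading script. A minimal Python sketch (no field names are assumed here; inspect the keys of a record to see what each entry contains):

import json

def read_jsonl(path, limit=None):
    """Yield records from a JSON-lines file, optionally stopping after `limit` records."""
    with open(path, encoding="utf-8") as f:
        count = 0
        for line in f:
            if not line.strip():
                continue  # skip any blank lines
            if limit is not None and count >= limit:
                break
            yield json.loads(line)
            count += 1

# Example: peek at the first training record to see its fields.
for record in read_jsonl("wikifactcheck-en_train.jsonl", limit=1):
    print(sorted(record.keys()))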

You may want to use the provided loading script, loadwfc-en.py, to load the dataset in your code.

usage: loadwfc-en.py [-h] [-d] [-f]
                     [-r [{train,test,full} [{train,test,full} ...]]]
                     [-n NUMLINES] [-t {json,python}]

optional arguments:
  -h, --help            show this help message and exit
  -d, --download        download dataset
  -f, --force           force re-download?
  -r [{train,test,full} [{train,test,full} ...]], --read [{train,test,full} [{train,test,full} ...]]
                        read from particular datasets (default: all)
  -n NUMLINES, --numlines NUMLINES
                        numlines to read from each one
  -t {json,python}, --fmt {json,python}
                        output format for --read option
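For example, the following invocation (illustrative, composed only from the options listed above) downloads the dataset and prints the first 100 lines of the train split as JSON:

python loadwfc-en.py --download --read train --numlines 100 --fmt json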
