Skip to content

Latest commit

 

History

History
20 lines (14 loc) · 839 Bytes

README.md

File metadata and controls

20 lines (14 loc) · 839 Bytes

kgdata PyPI Documentation

KGData is a library to process dumps of Wikipedia, Wikidata. What it can do:

  • Clean up the dumps to ensure the data is consistent (resolve redirect, remove dangling references)
  • Create embedded key-value databases to access entities from the dumps.
  • Extract Wikidata ontology.
  • Extract Wikipedia tables and convert the hyperlinks to Wikidata entities.
  • Create Pyserini indices to search Wikidata’s entities.
  • and more

For a full documentation, please see the website.

Installation

From PyPI (using pre-built binaries):

pip install kgdata[spark]   # omit spark to manually specify its version if your cluster has different version