j-min/WikiExtractor_To_the_one_text

WikiExtractor_To_the_one_text

A simple extension of the Python script that extracts and cleans text from a Wikipedia database dump. Most of the code comes from WikiExtractor.

Installation

(sudo) python setup.py install

Usage

python WikiExtractor.py Wiki_dump.xml [options]

Example: python WikiExtractor.py enwiki-latest-pages-articles.xml -b 500K -o extracted

For detailed options, see WikiExtractor.

After extraction, merge the files in the output directory into a single text file:

python To_the_one_text.py Input_directory Name_of_the_single_output_file
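
For readers curious what the merge step amounts to, here is a minimal sketch. It is not the actual source of To_the_one_text.py. It assumes the script simply concatenates every file found under the input directory, in sorted order, into one UTF-8 output file. The function name `merge_text_files` is hypothetical and used only for illustration.

```python
import os


def merge_text_files(input_dir, output_path):
    """Concatenate every file under input_dir into output_path.

    Hypothetical helper: a sketch of what a "merge to one text file" step
    could look like, not the repository's actual implementation.
    Returns the number of files merged.
    """
    output_abs = os.path.abspath(output_path)
    count = 0
    with open(output_path, "w", encoding="utf-8") as out:
        for root, dirs, files in os.walk(input_dir):
            dirs.sort()  # visit subdirectories in a stable, sorted order
            for name in sorted(files):
                path = os.path.join(root, name)
                # Skip the output file itself if it lives inside input_dir.
                if os.path.abspath(path) == output_abs:
                    continue
                with open(path, encoding="utf-8") as src:
                    for line in src:  # stream line by line to keep memory low
                        out.write(line)
                count += 1
    return count
```

Under these assumptions, `merge_text_files("extracted", "wiki_all.txt")` would write the contents of every file below `extracted` into `wiki_all.txt`. Streaming line by line avoids loading a multi-gigabyte dump into memory at once.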
