Wiktionary dump file parser and multilingual data extractor
Updated Dec 27, 2024 - Python
Python package for Wikimedia dump processing (Wiktionary, Wikipedia, etc.). Supports wikitext parsing, template expansion, and Lua module execution, for use in data extraction, bulk syntax checking, error detection, and offline formatting.
Convert LibreOffice Writer Documents to Wikitext markup
Wikicompiler is a fully extensible Python library that compiles and evaluates text from Wikipedia dumps. You can extract text, perform text analysis, or even evaluate the AST (Abstract Syntax Tree) yourself.
French Lexicon Ontology.
Selected data processing scripts, including a language-agnostic multilingual Wiktionary parser
Seq2Seq model that restores punctuation on English input text.
LSTM and QRNN Language Model Toolkit for PyTorch (adapted to fast.ai version)
A Python package to parse and extract data from the German Wiktionary. It allows users to access wikitext content, either by fetching it directly online or by loading a dump file locally.
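Several of the projects above load wikitext from a local dump file. As a rough illustration of that step only, the sketch below streams a MediaWiki XML export with Python's standard library and yields each page's title and raw wikitext. It does not use the API of any package listed here; the names `iter_pages`, `local_name`, and `SAMPLE` are hypothetical, and the sample page content is made up for the example.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, e.g. '{ns}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) for each <page> element in a dump stream."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        yield title, text
        elem.clear()  # release the subtree of the page just processed


# Tiny made-up export used only to exercise the function.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Haus</title>
    <revision><text>== Haus ({{Sprache|Deutsch}}) ==</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
```

For a real compressed dump, passing the file object returned by `bz2.open(path, "rb")` as `source` should work the same way, since `iterparse` accepts any binary file-like object.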