Combining Document Layout Analysis and OCR to extract text, figures and tables from journal articles.

alanmeeson/article-parser

Article Parser

Extracts text, figures and tables from academic journal articles.

Getting Started

Install & Use

pip install -r requirements_prod_pt1.txt
pip install -r requirements_prod_pt2.txt

python -m article_parser paper.pdf out_dir
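To parse a whole folder of papers, one option is a small wrapper that runs the command above once per PDF. This is a minimal sketch, not part of article_parser. It relies only on the `python -m article_parser paper.pdf out_dir` invocation shown above. The helper names `build_commands` and `run_all`, and the one-output-directory-per-PDF layout, are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(pdf_paths, out_root):
    """Hypothetical helper: return one argv list per PDF for
    `python -m article_parser <pdf> <out_dir>`.

    Assumed layout: each PDF gets its own output directory under
    out_root, named after the PDF's file stem.
    """
    commands = []
    for pdf in pdf_paths:
        pdf = Path(pdf)
        out_dir = Path(out_root) / pdf.stem
        # sys.executable runs the parser with the same interpreter/venv as this script
        commands.append([sys.executable, "-m", "article_parser", str(pdf), str(out_dir)])
    return commands


def run_all(pdf_dir, out_root):
    """Hypothetical helper: parse every *.pdf in pdf_dir, one after another,
    stopping at the first command that exits with a non-zero status."""
    pdfs = sorted(Path(pdf_dir).glob("*.pdf"))
    for cmd in build_commands(pdfs, out_root):
        # Create the output directory up front in case the CLI does not.
        Path(cmd[-1]).mkdir(parents=True, exist_ok=True)
        subprocess.run(cmd, check=True)
```

For example, `run_all("papers", "out")` would invoke the parser on `papers/smith2020.pdf` with `out/smith2020` as its output directory. The papers are processed sequentially for simplicity, and `check=True` makes the loop stop at the first failure instead of silently skipping it.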

Note: the install is split across two requirements files because Detectron2 will fail to install if torch is not already present, so requirements_prod_pt1.txt must be installed before requirements_prod_pt2.txt.
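If you script the install, you can make the ordering constraint explicit by checking that torch is importable before the second file is installed. This is a minimal sketch. It assumes, as the note implies, that requirements_prod_pt1.txt is what provides torch. The names `is_installed` and `install_in_order` are hypothetical and are not part of the project.

```python
import importlib.util
import subprocess
import sys


def is_installed(module_name):
    """Hypothetical helper: True if the current interpreter can find
    module_name, checked without actually importing it."""
    # Clear cached finder state so packages pip just installed are visible.
    importlib.invalidate_caches()
    return importlib.util.find_spec(module_name) is not None


def install_in_order():
    """Hypothetical helper: install part 1 (assumed to bring in torch),
    verify torch is present, then install part 2 (Detectron2)."""
    pip = [sys.executable, "-m", "pip", "install", "-r"]
    subprocess.run(pip + ["requirements_prod_pt1.txt"], check=True)
    if not is_installed("torch"):
        raise RuntimeError(
            "torch is not importable; installing requirements_prod_pt2.txt "
            "(Detectron2) would fail"
        )
    subprocess.run(pip + ["requirements_prod_pt2.txt"], check=True)
```

Running pip through `sys.executable -m pip` keeps both installs and the torch check in the same environment. This avoids the case where a different `pip` on the PATH installs torch somewhere Detectron2's build cannot see.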

Setting up a development environment

pip install -r requirements_dev.txt
pip install -r requirements_prod_pt1.txt
pip install -r requirements_prod_pt2.txt
