This repository has been archived by the owner on Oct 28, 2020. It is now read-only.

Scrape Gutenberg DE

Scrape all books from Projekt Gutenberg-DE. Useful, e.g., if you need a large corpus of German text for language modeling.

Usage

git clone https://github.com/jfilter/scrape-gutenberg-de --depth 1
cd scrape-gutenberg-de
pipenv install
pipenv run scrapy runspider scrape.py -o data.json
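
Once the spider has finished, a quick sanity check of `data.json` might look like the sketch below. It relies only on the fact that Scrapy's JSON feed export writes a single top-level array of items. The item field names are not documented here, so the helper `summarize_items` (a hypothetical name, not part of this repository) discovers them instead of assuming any.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_items(path):
    """Return (item count, {field name: number of items containing it})."""
    # Scrapy's JSON feed exporter writes one top-level array of item objects.
    items = json.loads(Path(path).read_text(encoding="utf-8"))
    # The item schema is an unknown here, so count whichever keys appear.
    fields = Counter(key for item in items for key in item)
    return len(items), dict(fields)


if __name__ == "__main__":
    feed = Path("data.json")  # output file name from the command above
    if feed.exists():
        count, fields = summarize_items(feed)
        print(f"{count} items")
        for name, n in sorted(fields.items()):
            print(f"{name}: present in {n} items")
```

Run it from the directory that contains `data.json`; it prints the number of scraped items and how often each field occurs.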

License

MIT.