
Web scraping of the library of books freely available on Domínio Público.


PublicaLivros/scraping-dominio-publico


Scraping Dominio Publico (Gov Brasil)

Tools for scraping data and files from Domínio Público, the Brazilian government's public-domain digital library.


Dependencies:

  • Python 3.11+
  • Node.js 19+
  • A Linux Bash environment (only needed if you want to use the "run.sh" script).

How to use

Run "run.sh" in a Bash terminal and choose an option from the menu. To skip the menu, pass the option number as an argument.

Run the script with the menu to choose an option:

./run.sh

Run the script with the option already included:

# In this case, the option "1" from the menu
./run.sh 1

The scraped data is stored in the "json" directory as "raw_data.json", and the downloaded book files are saved in the "booklibrary" directory. The data must be scraped before running the download option.
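The ordering requirement above can be checked before starting a download. The following is a minimal sketch, not part of the repository: the function names `load_scraped_data` and `count_downloaded_files` are hypothetical, the project root is assumed to be the directory containing "run.sh", and nothing is assumed about the structure of "raw_data.json" beyond it being valid JSON.

```python
import json
from pathlib import Path

# Locations named in the README, relative to the project root (assumed).
RAW_DATA_RELATIVE = Path("json") / "raw_data.json"
LIBRARY_RELATIVE = Path("booklibrary")


def load_scraped_data(project_root="."):
    """Return the parsed contents of json/raw_data.json.

    Raises FileNotFoundError when the scraping step has not produced
    the file yet, which is the case the README warns about.
    """
    raw_path = Path(project_root) / RAW_DATA_RELATIVE
    if not raw_path.is_file():
        raise FileNotFoundError(
            f"{raw_path} not found: run the scraping option before downloading."
        )
    with raw_path.open(encoding="utf-8") as handle:
        return json.load(handle)


def count_downloaded_files(project_root="."):
    """Count the regular files currently present in booklibrary/."""
    library = Path(project_root) / LIBRARY_RELATIVE
    if not library.is_dir():
        return 0  # nothing has been downloaded yet
    return sum(1 for entry in library.iterdir() if entry.is_file())
```

Calling `load_scraped_data()` from the project root either returns the scraped data or fails with a message pointing back to the scraping step, and `count_downloaded_files()` gives a quick view of how many files the download step has saved so far.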
