Scraper for Congress data. Uses Scrapy.

Cronjobs:

scrapy crawl pdfurl >> scraping_pdf_url_log.txt 2>&1
scrapy crawl proyecto >> scraping_proyecto.log.txt 2>&1
scrapy crawl seguimientos >> scraping_seguimientos.log.txt 2>&1
scrapy crawl iniciativa >> scraping_iniciativas.log.txt 2>&1
scrapy crawl updater >> scraping_updater.log.txt 2>&1
scrapy crawl expediente >> scraping_expediente.log.txt 2>&1
python proyectos_de_ley/manage.py update_index --age=24 --settings=proyectos_de_ley.settings.production >> updating_index.log.txt 2>&1
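
If it is more convenient to schedule a single cron entry, the crawl commands above could be wrapped in a small script. This is only a sketch: the spider names and log file names are copied from the list above, while `SPIDER_LOGS`, `crawl_command` and `run_all` are hypothetical names, and running the spiders strictly one after another is an assumption, not something the project prescribes.

```python
import subprocess

# Spider name -> log file, copied from the cron entries above.
SPIDER_LOGS = {
    "pdfurl": "scraping_pdf_url_log.txt",
    "proyecto": "scraping_proyecto.log.txt",
    "seguimientos": "scraping_seguimientos.log.txt",
    "iniciativa": "scraping_iniciativas.log.txt",
    "updater": "scraping_updater.log.txt",
    "expediente": "scraping_expediente.log.txt",
}


def crawl_command(spider):
    """Return the argument list equivalent to `scrapy crawl <spider>`."""
    return ["scrapy", "crawl", spider]


def run_all(spider_logs=None):
    """Run each spider in order, appending stdout and stderr to its log.

    This mirrors the shell redirection `>> logfile 2>&1`.
    Returns a dict mapping spider name to process exit code.
    """
    if spider_logs is None:
        spider_logs = SPIDER_LOGS
    exit_codes = {}
    for spider, log_path in spider_logs.items():
        with open(log_path, "a", encoding="utf-8") as log:
            result = subprocess.run(
                crawl_command(spider),
                stdout=log,
                stderr=subprocess.STDOUT,
                check=False,  # keep going even if one spider fails
            )
        exit_codes[spider] = result.returncode
    return exit_codes
```

Calling `run_all()` from the directory that contains `scrapy.cfg` would then run the six spiders in sequence. The `update_index` step is left out here and would still need its own entry.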

Configure

You need a config.json file with the credentials for the PostgreSQL database so that PDL can save the scraped data.

{
    "drivername": "postgresql",
    "username": "username for postgresql database",
    "password": "my password",
    "host": "localhost",
    "port": "5432",
    "database": "pdl",
    "crawlera_user": "optional",
    "crawlera_pass": "optional",
    "crawlera_enabled": "false",
    "legislature": "2016"
}
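
As an illustration of how a file like this might be consumed, the sketch below reads config.json, checks that the database keys from the sample are present, and assembles a connection URL. It is not taken from the project's code: `load_config`, `database_url` and `crawlera_enabled` are hypothetical helpers, and the URL layout is the generic `driver://user:password@host:port/database` form, assumed here because the key names in the sample match it.

```python
import json
from urllib.parse import quote

# Database keys that appear in the sample config.json above.
DB_KEYS = ("drivername", "username", "password", "host", "port", "database")


def load_config(path="config.json"):
    """Read the JSON config and fail early if a database key is missing."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    missing = [key for key in DB_KEYS if key not in config]
    if missing:
        raise KeyError("config.json is missing: " + ", ".join(missing))
    return config


def database_url(config):
    """Build a driver://user:password@host:port/database URL (hypothetical helper)."""
    return "{drivername}://{user}:{password}@{host}:{port}/{database}".format(
        drivername=config["drivername"],
        # Percent-encode credentials so characters like "@" or " " stay valid.
        user=quote(config["username"], safe=""),
        password=quote(config["password"], safe=""),
        host=config["host"],
        port=config["port"],
        database=config["database"],
    )


def crawlera_enabled(config):
    """The sample stores the flag as the string "false", so compare text."""
    return str(config.get("crawlera_enabled", "false")).strip().lower() == "true"
```

Note that `crawlera_enabled` is a string in the sample, so a plain truthiness check would treat `"false"` as enabled; comparing the text avoids that.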