Scripts to process quarterly Lobbying Disclosure Act Reports from the United States Senate.
- Python 3.x: brew install python
- Pipenv: brew install pipenv
- .env.example: Sample configuration variables.
- scrape_lda_filings.py: The principal code of this repo, which pulls these disclosure reports and related files. NOTE: This needs to be refactored into numerous smaller files.
- utils/: Utilities called in scrape_lda_filings.py.
- reports: A folder to contain all downloaded quarters' reports.
- Clone this repo and cd into it:

      $ git clone git@github.com:The-Politico/scraper_senate-lobbying-disclosures.git
      $ cd scraper_senate-lobbying-disclosures
- Create a .env file with the following setting (see .env.example):

      SENATE_LDA_API_KEY='token-goes-here'
- Set up a Python 3 virtual environment, step into it and install dependencies:

      $ pipenv install --dev
After pulling someone else's changes from GitHub, you may need to run a couple of commands to sync your local database and virtual environment:
- Use pipenv sync to make sure your local dependencies line up with the latest version of the requirements file (be sure you're in your virtual environment for this step):

      $ pipenv install --dev
      $ pipenv sync
The following configuration is automatically read from a .env file in the project's root.
| Variable | What it does |
|---|---|
| SENATE_LDA_API_KEY | Required: An API key from the Senate LDA site, used to request data from their systems. Sign up at this link, or use the INT's existing key as listed in the password manager. |
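
Pipenv loads a .env file from the project root into the process environment when a command is started with pipenv run or pipenv shell, so the key can be read from the environment at runtime. How scrape_lda_filings.py actually reads the key is not shown in this README; the following is only a minimal sketch of a fail-fast check, and get_api_key is a hypothetical helper, not part of this repo.

```python
import os

# Name of the variable documented in the table above.
ENV_VAR = "SENATE_LDA_API_KEY"

# Placeholder value taken from the .env example in this README.
PLACEHOLDER = "token-goes-here"


def get_api_key(environ=None):
    """Return the Senate LDA API key, or raise a clear error if it is unset.

    `environ` defaults to os.environ; passing a plain dict makes the
    function easy to exercise without touching the real environment.
    This helper is hypothetical and only illustrates the check.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(ENV_VAR, "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(
            f"{ENV_VAR} is not set. Add it to the .env file in the project root."
        )
    return key
```

Checking for the key once, before any request is made, would turn a missing or placeholder value into an immediate, readable error rather than a failed request later on.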
For now, run the following code (replacing 2020 and Q4 with your desired year and quarter):

    pipenv run python -c \
      'from scrape_lda_filings import scrape_lda_filings; filings = scrape_lda_filings("2020", "Q4")'
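
To pull more than one quarter, the same call could be wrapped in a small loop. This is a sketch only: scrape_year is a hypothetical helper that is not part of this repo, and it assumes scrape_lda_filings takes the year and quarter as strings and returns the filings, as the command above suggests.

```python
# Quarter labels in the format used by the command above ("Q4").
QUARTERS = ("Q1", "Q2", "Q3", "Q4")


def scrape_year(year, quarters=QUARTERS, scrape=None):
    """Call the scraper once per quarter and return results keyed by quarter.

    `scrape` defaults to this repo's scrape_lda_filings, imported lazily so
    the helper can also be exercised with a stand-in function.
    """
    # Validate every label first so no downloads start on bad input.
    for quarter in quarters:
        if quarter not in QUARTERS:
            raise ValueError(f"Unknown quarter: {quarter!r}")

    if scrape is None:
        from scrape_lda_filings import scrape_lda_filings as scrape

    results = {}
    for quarter in quarters:
        # The documented call passes both arguments as strings.
        results[quarter] = scrape(str(year), quarter)
    return results
```

Run from the project root inside the virtual environment, scrape_year(2020) would request all four quarters of 2020 in order, and scrape_year(2020, quarters=("Q3", "Q4")) only the second half.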
© 2020 – present POLITICO LLC.