CampaignFinanceDataPipeline

Data Pipeline for NC Campaign Finance Dashboard

The docker folder contains a simplified environment for running the backend and the experiments.

The Python scripts are based on Mikhail Yuryevich Bilenko's Ph.D. dissertation, Learnable Similarity Functions and their Application to Record Linkage and Clustering.

The Python scripts are Jupyter Notebooks, but they should be easy to convert to inline Python scripts.
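As a rough illustration of that conversion, the sketch below pulls the code cells out of a notebook using only the standard library. It relies on the fact that an `.ipynb` file is JSON with a `cells` list. The function name `notebook_to_script` and the file names in the usage comment are hypothetical and not part of this repository.

```python
import json
from pathlib import Path


def notebook_to_script(notebook_path, script_path):
    """Write the code cells of a .ipynb file to a plain .py file.

    Returns the number of code cells that were written.
    """
    notebook = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # The notebook format stores source as one string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        # IPython magics (%) and shell escapes (!) are not valid Python,
        # so they are commented out rather than dropped.
        lines = [
            "# " + line if line.lstrip().startswith(("%", "!")) else line
            for line in source.splitlines()
        ]
        chunks.append("\n".join(lines))
    Path(script_path).write_text("\n\n".join(chunks) + "\n", encoding="utf-8")
    return len(chunks)


# Usage (hypothetical file names):
# notebook_to_script("01_preprocess.ipynb", "01_preprocess.py")
```

The `jupyter nbconvert --to script` command does the same job if Jupyter is installed. The version above only avoids the extra dependency.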

Create a directory for the Python scripts

Create a data subdirectory to import the files obtained from NCSBE
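The two steps above can be sketched in Python as follows. The directory name `scripts`, and the choice to place `data` inside it, are assumptions made for illustration. Use whatever layout the notebooks expect.

```python
from pathlib import Path


def create_layout(root="."):
    """Create a scripts directory with a data subdirectory inside it.

    The names "scripts" and "data" are illustrative assumptions.
    """
    scripts_dir = Path(root) / "scripts"
    data_dir = scripts_dir / "data"
    # parents=True creates scripts/ as well; exist_ok=True makes reruns safe.
    data_dir.mkdir(parents=True, exist_ok=True)
    return scripts_dir, data_dir


# Usage:
# scripts_dir, data_dir = create_layout()
# Then copy the notebooks into scripts_dir and the NCSBE files into data_dir.
```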

Data - http://nc-campaign-finance-storage.s3-website-us-east-1.amazonaws.com/
- active_committee_list
- committee_doc_list
- committee_list
- raw_files

The scripts are meant to be run in order

* 01 - Preprocess - imports the raw files, sets up the Postgres tables and preps the data for dedupe
* 02 - Dedupe - this is the core step: it goes over the entire universe of donors and payees and determines whether records refer to the same entity despite spelling differences and missing information
* 03 - Post Dedupe - creates the views, copies the canonical IDs to the transactions, and parses the various sources of committee information to determine party, candidate, and active years
* 04 - Import Election History - imports the election return files produced by the NCSBE since 2000, with vote counts broken down by candidate, race, precinct, and vote type
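To give a feel for what the dedupe step has to decide, here is a minimal sketch of comparing two donor records that may differ in spelling or have missing fields. It is not the method used in the notebooks: it swaps in a fixed string similarity from the standard library (`difflib.SequenceMatcher`) where the pipeline relies on learned similarity functions. The field names (`name`, `city`, `zip`) and the `0.85` threshold are hypothetical.

```python
from difflib import SequenceMatcher


def field_similarity(a, b):
    """Return a similarity in [0, 1], or None if either value is missing."""
    if not a or not b:
        return None
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def record_similarity(rec_a, rec_b, fields=("name", "city", "zip")):
    """Average the similarity over the fields that both records have.

    Missing fields are skipped instead of being counted as mismatches.
    """
    scores = []
    for field in fields:
        score = field_similarity(rec_a.get(field), rec_b.get(field))
        if score is not None:
            scores.append(score)
    return sum(scores) / len(scores) if scores else 0.0


def same_entity(rec_a, rec_b, threshold=0.85):
    """Treat two records as the same entity above a hand-picked threshold."""
    return record_similarity(rec_a, rec_b) >= threshold


# Usage with made-up records:
a = {"name": "Jonathan Smith", "city": "Raleigh", "zip": "27601"}
b = {"name": "Jonathon Smith", "city": "raleigh", "zip": None}
print(same_entity(a, b))
```

A learned approach replaces the plain average and the fixed threshold with weights fitted on labeled example pairs, which is the part this sketch leaves out.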
