This repository has been archived by the owner on Jun 9, 2023. It is now read-only.

in-rolls/bihar-2020-electoral-rolls


Bihar Electoral Rolls (2020)

🚫 This repository has been archived. The code was written to scrape data at a point in time.

We scraped the 2020 Bihar Electoral Rolls from http://ele.bihar.gov.in/pdfsearch/ (Publication Date: 07-02-2020). In all, there were 72,723 primary rolls from 243 constituencies.

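File names follow this format, where AC NO is the assembly constituency number (1 to 243) and PART NO is the part number of the roll within that constituency: FinalRoll_ACNo_<AC NO 1~243>PartNo_<PART NO>.pdf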
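As an illustration of the naming scheme, here is a minimal Python sketch that builds and parses such file names. It is not taken from the repository's scripts. The pattern follows the format string literally; whether the numbers are zero-padded is not stated, so the parser accepts any run of digits, and the helper names `parse_roll_name` and `roll_name` are hypothetical.

```python
import re
from typing import NamedTuple

# Pattern read literally from the documented format. Zero-padding is not
# documented, so any run of digits is accepted (an assumption).
ROLL_NAME = re.compile(r"^FinalRoll_ACNo_(\d+)PartNo_(\d+)\.pdf$")


class RollId(NamedTuple):
    ac_no: int    # assembly constituency number, 1 to 243
    part_no: int  # part number within the constituency


def parse_roll_name(file_name: str) -> RollId:
    """Extract the constituency and part numbers from a roll file name."""
    match = ROLL_NAME.match(file_name)
    if match is None:
        raise ValueError(f"not a roll file name: {file_name!r}")
    ac_no, part_no = int(match.group(1)), int(match.group(2))
    if not 1 <= ac_no <= 243:
        raise ValueError(f"constituency number out of range: {ac_no}")
    return RollId(ac_no, part_no)


def roll_name(ac_no: int, part_no: int) -> str:
    """Build a file name without zero-padding (padding is an assumption)."""
    return f"FinalRoll_ACNo_{ac_no}PartNo_{part_no}.pdf"
```

For example, `parse_roll_name(roll_name(12, 7))` gives back `RollId(ac_no=12, part_no=7)`.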

Scripts

  1. We used the script to download the files and upload them to Google Cloud Storage (gs://in-electoral-rolls-2020/bihar).
     • A few files could not be downloaded on the first try. The script for downloading those is here.
  2. Notebook to check whether we downloaded all the files
  3. Notebook to check file sizes and produce a metadata CSV for the files
  4. Notebook to get the metadata from the webpage (including names etc.) and append it to the CSV obtained in step 3
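The completeness check described above could look roughly like the following Python sketch. It is not the repository's notebook. It assumes that the number of parts per constituency is known from some other source (the hypothetical `parts_per_ac` mapping) and that parts are numbered contiguously from 1, neither of which the README states.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Same literal reading of the documented file-name format.
ROLL_NAME = re.compile(r"^FinalRoll_ACNo_(\d+)PartNo_(\d+)\.pdf$")


def missing_rolls(
    parts_per_ac: Dict[int, int],
    downloaded: Iterable[str],
) -> List[Tuple[int, int]]:
    """Return the (constituency, part) pairs that have no downloaded file.

    parts_per_ac maps a constituency number to its part count; parts are
    assumed to be numbered 1..count (an assumption, not a documented fact).
    """
    have = set()
    for name in downloaded:
        match = ROLL_NAME.match(name)
        if match:  # ignore files that do not follow the naming scheme
            have.add((int(match.group(1)), int(match.group(2))))

    expected = {
        (ac_no, part_no)
        for ac_no, count in parts_per_ac.items()
        for part_no in range(1, count + 1)
    }
    return sorted(expected - have)
```

Any pairs returned would be candidates for a second download attempt, in the spirit of the retry script mentioned in step 1.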

Log Files and Metadata CSV

How Do I Get the Electoral Rolls?

We have instituted the same process as here.

Given privacy concerns, we are releasing the data only for research purposes. To access the PDFs, you must agree to take all precautions to maintain the privacy of Indian electors. (There is a difference between data being available in PDFs, split across different sites, sometimes behind a CAPTCHA, and a common data dump.) You will get read access to a Google Cloud Storage Coldline bucket for a month. The buckets are set up as requester pays, so you need to create a project that will be used for billing. You can access them as follows:

gsutil -u projectname_for_billing ls gs://in-electoral-rolls-2020/bihar
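For scripted access, the same command can be assembled and run from Python. The sketch below only wraps the `gsutil` CLI shown above. The helper names are hypothetical, and it assumes `gsutil` is installed, authenticated, and that access to the bucket has been granted.

```python
import subprocess
from typing import List

BUCKET_PREFIX = "gs://in-electoral-rolls-2020/bihar"


def gsutil_command(billing_project: str, *args: str) -> List[str]:
    """Build a gsutil invocation that bills requests to billing_project.

    The -u option names the project charged for requester-pays access.
    """
    return ["gsutil", "-u", billing_project, *args]


def list_rolls(billing_project: str) -> List[str]:
    """List object URLs under the Bihar prefix (needs gsutil and access)."""
    result = subprocess.run(
        gsutil_command(billing_project, "ls", BUCKET_PREFIX),
        check=True,           # raise CalledProcessError on a non-zero exit
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()


def download_roll(billing_project: str, file_name: str, dest_dir: str) -> None:
    """Copy one PDF from the bucket into dest_dir (needs gsutil and access)."""
    source = f"{BUCKET_PREFIX}/{file_name}"
    subprocess.run(
        gsutil_command(billing_project, "cp", source, dest_dir),
        check=True,
    )
```

Calling `gsutil_command("projectname_for_billing", "ls", BUCKET_PREFIX)` reproduces the argument list of the command above. Since Coldline storage and requester-pays buckets bill the requesting project for reads, it may be worth listing first and copying only the files you need.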