This is a scraper for the corporate registry of the country of Georgia. It is implemented in Python, using the excellent Scrapy framework.
Although there are still bugs, this scraper has significantly exceeded the capabilities of our old scraper, so please use this one from now on.
Setup should be pretty simple. Create and activate a virtualenv:
virtualenv geo_corp_scrape
cd geo_corp_scrape
source bin/activate
Then clone the repo, cd into the repo folder, and install the dependencies:
pip install -r requirements.txt
cp settings.py.example settings.py
Edit settings.py to suit, install poppler, and then run the crawl:
scrapy crawl corps
That's it.
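If the crawl does not start, a quick check of the prerequisites above may help. The following is only a sketch and not part of the scraper: the function name missing_prerequisites is made up for illustration, and pdftotext is assumed to be the poppler utility of interest (it ships with poppler, but the scraper may call a different one).

```python
import shutil
from pathlib import Path


def missing_prerequisites(repo_dir=".", tools=("pdftotext",)):
    """Return a list of problems found; an empty list means nothing is missing.

    Hypothetical helper: `tools` defaults to pdftotext, which is an assumption
    about which poppler utility the scraper relies on.
    """
    problems = []
    # settings.py is created by copying settings.py.example
    if not (Path(repo_dir) / "settings.py").is_file():
        problems.append("settings.py not found (copy settings.py.example)")
    # shutil.which returns None when the executable is not on PATH
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append(f"{tool} not on PATH (install poppler)")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("missing:", problem)
```

Run it from the repo folder; no output means both checks passed.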
You should get a series of JSON files representing the scraped data.
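To inspect the results, something like the sketch below could be used. It makes two assumptions that are not confirmed here: that the files end in .json and sit directly in one directory, and that each file holds a single JSON document. The function name load_scraped and the directory argument are made up for illustration; check your settings.py for the real output location.

```python
import json
from pathlib import Path


def load_scraped(output_dir):
    """Yield (filename, parsed JSON) for each .json file in output_dir.

    Hypothetical helper: assumes every file is one complete JSON document.
    """
    for path in sorted(Path(output_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            yield path.name, json.load(f)
```

For example, `for name, data in load_scraped("path/to/output"): print(name, type(data))` lists each file along with the type of its top-level JSON value.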