GitHub - younginnovations/moldova-courtcases: Contains scripts to scrape the courtcases from the court websites and a simple API to serve those data.

Background

This repository contains scripts that scrape court cases and make the data available through an API. Scraping is a very heavy process that takes days to complete, so don't run it regularly; use the API to access the court case data instead.

As the system diagram shows, the Scrapy script pulls information from the court sites, saves it to a MySQL database, and stores the files separately. An API then reads the information from the database, and the OCDS portal uses that API to show court information on the company pages.
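The data flow described above can be sketched end to end with the standard library. This is only an illustration under stated assumptions: the in-memory SQLite database stands in for MySQL, the `cases` table and its columns are hypothetical (they simply mirror the JSON fields the API returns), and matching the company name against the case title with `LIKE` is a guess at how the lookup might work, not the repository's actual query.

```python
import json
import sqlite3

# Hypothetical schema: the column names mirror the JSON fields returned by
# the API; the real MySQL table layout in the repository may differ.
SCHEMA = """
CREATE TABLE cases (
    caseNumber TEXT, caseType TEXT, court TEXT,
    deliveryDate TEXT, theme TEXT, title TEXT
)
"""


def open_store():
    """Create an in-memory SQLite database standing in for MySQL."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute(SCHEMA)
    return conn


def save_case(conn, case):
    """Scraper side: store one scraped case record."""
    conn.execute(
        "INSERT INTO cases VALUES "
        "(:caseNumber, :caseType, :court, :deliveryDate, :theme, :title)",
        case,
    )


def find_cases(conn, name):
    """API side: list cases whose title mentions the company name (assumed rule)."""
    rows = conn.execute(
        "SELECT * FROM cases WHERE title LIKE ?", ("%" + name + "%",)
    )
    return [dict(row) for row in rows]


def count_cases(conn, name):
    """API side: number of matching cases."""
    return len(find_cases(conn, name))


conn = open_store()
save_case(conn, {
    "caseNumber": "20-2c-5683-27022017",
    "caseType": "Civil",
    "court": "Judecătoria Chișinău",
    "deliveryDate": "",
    "theme": "Litigii privind executarea obligatiilor",
    "title": "Casa Nationala de Asigurari So vs SRL Ladita Fermecata",
})

print(count_cases(conn, "Ladita Fermecata"))
print(json.dumps(find_cases(conn, "Ladita Fermecata"), ensure_ascii=False, indent=2))
```

The split between `save_case` and the two lookup functions reflects the separation in the text: the scraper only writes, and the API only reads.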

Scraping Court Cases

Requirements for scraping

pip install scrapy
pip install peewee
pip install MySQL-python

Running Scrapy

Clean up the data for a fresh scrape: mysql -uroot -p -e "DROP DATABASE IF EXISTS moldova_courtcases;CREATE DATABASE moldova_courtcases CHARACTER SET utf8 COLLATE utf8_general_ci;"
Copy dbconfig.py.bak to dbconfig.py and update the database information.
scrapy list should show the spider name.
scrapy crawl Cases starts the crawl, creates the HTML files, and saves the data to the database.

Scrapy with pause and resume

See https://doc.scrapy.org/en/latest/topics/jobs.html for details.
Run scrapy crawl Cases -s JOBDIR=crawls/cases-1 to start a job that can be paused and resumed.
Press Ctrl+C once to stop the crawl and wait for it to shut down, then run the same command again to resume.
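For scripted runs, the resumable command can be built in one place so that the same JOBDIR is reused on every restart. The following is a minimal sketch; the helper name `crawl_command` is hypothetical, and only the spider name and job directory come from the command above.

```python
import shlex


def crawl_command(spider="Cases", jobdir="crawls/cases-1"):
    """Build the argument list for a resumable crawl.

    Reusing the same jobdir on a later run resumes the paused job;
    a different jobdir starts a separate job.
    """
    return ["scrapy", "crawl", spider, "-s", f"JOBDIR={jobdir}"]


cmd = crawl_command()
print(shlex.join(cmd))

# To launch the crawl from the project directory, one option is:
#   import subprocess
#   subprocess.run(cmd, check=True)
```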

Testing Scrapy in Shell

Run scrapy shell https://cac.instante.justice.md/ro/hot

You can run Scrapy code in the shell and see the result of each statement.

Run the following one line at a time:

from scrapy.selector import Selector
sel = Selector(response=response)  # response is provided by the Scrapy shell
decisions = sel.xpath('//table/tbody/tr')
courtName = sel.xpath('//h2[contains(@class,"site-name")]/a/text()').extract()
courtName

You will see the court name in the shell.

Once it works, copy the working code into the source file.
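To see what the two selections in the shell session pick out without hitting the live site, the same idea can be tried on a small local sample. This sketch swaps in the standard library's `xml.etree.ElementTree` instead of Scrapy's `Selector`; ElementTree supports only a subset of XPath and has no `contains()`, so the class check is done in Python. The sample markup and the court name "Example Court" are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed sample standing in for a court page.
SAMPLE_PAGE = """
<html><body>
  <h2 class="site-name element"><a href="/">Example Court</a></h2>
  <table><tbody>
    <tr><td>decision 1</td></tr>
    <tr><td>decision 2</td></tr>
  </tbody></table>
</body></html>
"""


def court_names(root):
    """Collect link text from every h2 whose class attribute contains 'site-name'."""
    names = []
    for h2 in root.iter("h2"):
        if "site-name" in h2.get("class", ""):
            names.extend(a.text for a in h2.findall("a") if a.text)
    return names


root = ET.fromstring(SAMPLE_PAGE)

# Counterpart of '//table/tbody/tr': one element per table row.
decisions = root.findall(".//table/tbody/tr")

print(len(decisions))
print(court_names(root))
```

Real court pages are HTML rather than strict XML, so this stand-in is only useful for reasoning about the expressions; the Scrapy shell remains the place to validate them against the actual site.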

API

Requirements

pip install flask
pip install gunicorn

Testing API

python api.py serves the API on port 8090.

Running API in CentOS-based server

Copy moldovacourts_api.service.bak to moldovacourts_api.service and update the project directory information.
Create a soft link: ln -s /home/moldova-ocds/pydev/src/moldovacourts/moldovacourts_api.service /etc/systemd/system/moldovacourts_api.service
Run systemctl start moldovacourts_api.service to start the moldova_api gunicorn server.

Using API

domain:8090/courtcasescount?q=name returns the number of cases for the given company name.
domain:8090/courtcases?q=name returns the list of cases as JSON for the given company name, for example:

[
  {
    "caseNumber": "26-2-587-02022017", 
    "caseType": "Civil", 
    "court": "Judec\u0103toria Drochia", 
    "deliveryDate": "", 
    "theme": "Ac\u021biuni privind \u00eencasarea datoriei", 
    "title": "AE\u00ce Sofmicrocredit vs R\u0103di\u021b\u0103 Igor Profire, \u021aurlea Violeta, Banu Sergiu - \u00eencasarea datoriei"
  }, 
  {
    "caseNumber": "20-2c-5683-27022017", 
    "caseType": "Civil", 
    "court": "Judec\u0103toria Chi\u0219in\u0103u", 
    "deliveryDate": "", 
    "theme": "Litigii privind executarea obligatiilor", 
    "title": "Casa Nationala de Asigurari So vs SRL Ladita Fermecata"
  }, 
  ...
]
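A client for these two endpoints could look roughly like the sketch below, which uses only the standard library. The base URL `http://localhost:8090` is an assumption for a local test run (the text only says `domain`), and the helper names `build_url` and `fetch_cases` are hypothetical. The sample string is the first record from the response above, and parsing it shows that the `\u` escapes decode to the original Romanian characters.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8090"  # assumption: API running locally


def build_url(base, endpoint, company):
    """Build an endpoint URL with the company name safely encoded as q."""
    return f"{base}/{endpoint}?{urlencode({'q': company})}"


def fetch_cases(company, base=BASE_URL):
    """Fetch and parse the case list for a company (needs a running API)."""
    with urlopen(build_url(base, "courtcases", company)) as resp:
        return json.loads(resp.read().decode("utf-8"))


# First record of the sample response, kept with its \u escapes.
SAMPLE_RESPONSE = r"""
[
  {
    "caseNumber": "26-2-587-02022017",
    "caseType": "Civil",
    "court": "Judec\u0103toria Drochia",
    "deliveryDate": "",
    "theme": "Ac\u021biuni privind \u00eencasarea datoriei",
    "title": "AE\u00ce Sofmicrocredit vs R\u0103di\u021b\u0103 Igor Profire, \u021aurlea Violeta, Banu Sergiu - \u00eencasarea datoriei"
  }
]
"""

cases = json.loads(SAMPLE_RESPONSE)
print(build_url(BASE_URL, "courtcases", "SRL Ladita Fermecata"))
print(cases[0]["court"])
```

Encoding the name with `urlencode` matters because company names contain spaces and diacritics that are not valid in a raw query string.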

