adding readme and fixing data
senisioi committed Jan 17, 2022
1 parent 4f93273 commit 3cb3653
Showing 3 changed files with 25 additions and 98 deletions.
25 changes: 25 additions & 0 deletions README.md
@@ -1,2 +1,27 @@
# rolegal
Romanian Legal Data Processing

This repository contains two datasets:

### 1. Historical Public Procurement Legislation (PPL)

This dataset consists of an archive containing raw scraped documents covering PPL, and a .csv file containing the metadata for each file in the archive: publication year, month, header, source URL, and type (primary or secondary).

Files:

- historical_procurement_legislation.zip
- historical_procurement_legislation.csv
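
As a minimal sketch of how the two files might be inspected together, the snippet below reads the metadata rows and lists the archive members using only the Python standard library. The helper names `load_metadata` and `list_archive_members` are hypothetical, and the code assumes (without confirmation from the repository) that the .csv is UTF-8 encoded and has a header row; the actual column names are read from the file rather than hard-coded.

```python
import csv
import zipfile
from pathlib import Path


def load_metadata(csv_path):
    """Return the metadata rows as a list of dicts keyed by the CSV header.

    Assumption: the file is UTF-8 encoded and its first row is a header.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def list_archive_members(zip_path):
    """Return the names of the files (not directories) stored in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return [info.filename for info in archive.infolist() if not info.is_dir()]


if __name__ == "__main__":
    csv_path = Path("historical_procurement_legislation.csv")
    zip_path = Path("historical_procurement_legislation.zip")
    if csv_path.exists() and zip_path.exists():
        rows = load_metadata(csv_path)
        print(len(rows), "metadata rows")
        # Print the real column names instead of guessing them.
        print("columns:", list(rows[0].keys()) if rows else [])
        print(len(list_archive_members(zip_path)), "files in archive")
```

Comparing the number of metadata rows with the number of archive members is a quick sanity check that every scraped document has a metadata entry.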


### 2. A subset of annotated legislative bills

This dataset is extracted from the public pages of the Parliament (Senate and Chamber of Deputies). The files were downloaded in PDF format, and tesseract-ocr was applied to convert them into Romanian text. The archive contains a list of directories named after the PLX id of each legislative proposal from the Chamber of Deputies. Each directory contains a list of txt files encompassing the entire folder of a bill (written opinions from different commissions, the various forms that were passed, etc.).
Within each proposal's directory, there are two more directories called "impact" and "nonrelevant". The "impact" directory contains the articles, paragraphs and fragments that have been annotated as impacting public procurement legislation. The "nonrelevant" directory contains the remaining content of the bill.


Files:

- cdep_senat_txt_annotated.zip
- impacting_laws.csv
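
The directory layout described above can be walked with a short sketch like the following. It assumes the archive has already been extracted so that the PLX-id directories sit directly under a root folder, and that the annotated fragments inside "impact" and "nonrelevant" are .txt files; both points are assumptions, and `collect_annotated_fragments` is a hypothetical helper name.

```python
from pathlib import Path


def collect_annotated_fragments(root):
    """Map each bill directory name to its 'impact' and 'nonrelevant' .txt files.

    Assumption: `root` is the extracted archive, with one sub-directory per bill.
    """
    result = {}
    for bill_dir in sorted(Path(root).iterdir()):
        if not bill_dir.is_dir():
            continue  # skip stray files next to the bill directories
        labels = {}
        for label in ("impact", "nonrelevant"):
            label_dir = bill_dir / label
            # A missing label directory is reported as an empty list.
            labels[label] = (
                sorted(p.name for p in label_dir.glob("*.txt"))
                if label_dir.is_dir()
                else []
            )
        result[bill_dir.name] = labels
    return result
```

Returning file names grouped by label keeps the two annotation classes separate, which is convenient if the fragments are later used as positive and negative examples.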


Binary file modified historical_procurement_legislation.zip
Binary file not shown.
98 changes: 0 additions & 98 deletions step0.py

This file was deleted.
