June, 7, 2024

We made a major update and recompiled the corpus. First, the data now includes documents issued through the end of 2023. Second, some new variables were added. Third, we provided morphosyntactic tagging in CoNLL-U format.
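For readers new to CoNLL-U, the sketch below parses a single sentence block using only the standard library. The ten column names follow the Universal Dependencies CoNLL-U specification. The sample sentence and the function name `parse_conllu_sentence` are illustrative assumptions and are not taken from the corpus.

```python
# Column names defined by the CoNLL-U format (Universal Dependencies).
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]


def parse_conllu_sentence(block):
    """Parse one CoNLL-U sentence block into a list of token dicts.

    Comment lines (starting with '#') and blank lines are skipped.
    """
    tokens = []
    for line in block.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) != len(CONLLU_FIELDS):
            raise ValueError(
                f"expected 10 tab-separated columns, got {len(columns)}"
            )
        tokens.append(dict(zip(CONLLU_FIELDS, columns)))
    return tokens


# Hypothetical sample sentence, not taken from the corpus.
SAMPLE = (
    "# text = Закон принят.\n"
    "1\tЗакон\tзакон\tNOUN\t_\tCase=Nom|Number=Sing\t2\tnsubj:pass\t_\t_\n"
    "2\tпринят\tпринять\tVERB\t_\tAspect=Perf\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
)

if __name__ == "__main__":
    for token in parse_conllu_sentence(SAMPLE):
        print(token["id"], token["form"], token["upos"],
              token["head"], token["deprel"])
```

Each token becomes a plain dictionary, so fields such as `upos` or `deprel` can be filtered or counted without extra dependencies.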

March, 17, 2019

We updated the corpus of the most complicated sentences after fixing a bug that corrupted some of the sentences. Note that, for technical reasons, every number more than three digits long is replaced with "999".
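The number masking mentioned above can be approximated as follows. This is a minimal sketch: the regular expression and the function name `mask_long_numbers` are assumptions, and the corpus's actual preprocessing may handle edge cases (for example digits embedded in words) differently.

```python
import re

# Runs of four or more consecutive digits, i.e. numbers longer than three digits.
LONG_NUMBER = re.compile(r"\d{4,}")


def mask_long_numbers(text):
    """Replace every run of four or more digits with the placeholder "999"."""
    return LONG_NUMBER.sub("999", text)


if __name__ == "__main__":
    print(mask_long_numbers("Order 12345 of 2018, item 12"))
```

Numbers of up to three digits are left unchanged, so when comparing your own text against the corpus it may help to apply a similar substitution first.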

December, 15, 2018

We have published a corpus of the most complicated sentences in Russian legal texts. It was built by segmenting the texts into sentences and then selecting some of them according to metrics. The corpus is a Unicode CSV file. See the file most_complicated_sentences.zip.
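One possible way to read the archive is sketched below using only the standard library. It assumes the zip contains a UTF-8 CSV file with a header row. The member name and the column names are not documented here, so the code picks the first `.csv` member and keys each row by whatever header it finds. The function name `read_csv_from_zip` is hypothetical.

```python
import csv
import io
import zipfile


def read_csv_from_zip(zip_path):
    """Return the rows of the first .csv member of a zip archive as dicts."""
    with zipfile.ZipFile(zip_path) as archive:
        csv_names = [name for name in archive.namelist()
                     if name.lower().endswith(".csv")]
        if not csv_names:
            raise ValueError("no .csv file found in the archive")
        with archive.open(csv_names[0]) as raw:
            # Assumption: UTF-8 text; "utf-8-sig" also tolerates a leading BOM.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return list(csv.DictReader(text))
```

Calling `read_csv_from_zip("most_complicated_sentences.zip")` should then give a list of rows. Inspect the keys of the first row to see which columns the file actually provides.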

December, 05, 2018

We added documents adopted through the end of 2017. All zip files were re-uploaded to the download source because of minor changes to some of the documents, so you need to download all zip files again. Links and md5sums have been updated accordingly.
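Because the archives changed, it may be worth checking each re-downloaded file against its published md5sum. The sketch below uses only `hashlib`. The function names are hypothetical, and any file name or checksum you pass in is a placeholder rather than a value from this repository.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5(path, expected_hex):
    """Return True if the file's MD5 digest equals the published checksum."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

Reading in chunks keeps memory use flat even for large archives. A `False` result from `matches_md5` suggests the file is stale or the download was incomplete.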

March, 02, 2018

We added a Python 3 example showing how to load data from this dataset into a Pandas DataFrame. We hope it helps you understand how to use the dataset. Commit: https://github.com/irlcode/RusLawOD/commit/54efe4dbeb3b28cdb309eb27c31b1ca3f749f712

Feb., 05, 2018

Modified the XML and re-uploaded all files accordingly. Commit: https://github.com/irlcode/RusLawOD/commit/c5778eadd7f2dc2ecc4e81f626ca5d3251e3fc40

Dec., 9, 2017

The dataset was first made available.
