Legal Judgement Predictor for ECHR

Usage of LLMs embeddings in classification of legal cases from ECHR dataset

This repository contains the code and project report for the Legal Judgment Predictor project, which is a part of the MSc in Computer Science curriculum focused on Artificial Intelligence for the academic year 2022/2023. The project was conducted by Giacomo Lagomarsini and Leonardo Stoppani under the guidance of Prof. Giuseppe Attardi.

Project Overview

The project aimed to address the challenge of efficiently processing long legal documents for classification tasks using advanced Natural Language Processing (NLP) techniques. Specifically, the project focused on classifying European Court of Human Rights (ECHR) cases as either positive (if any human rights article or protocol has been violated) or negative (otherwise), which was made publicily available by the authors of the paper Neural Legal Judgement Prediction in English.

Dataset

The dataset used in this project consists of European Court of Human Rights (ECHR) cases, sourced from publicly available legal documents. The dataset includes textual information regarding legal judgments along with corresponding labels indicating whether any human rights article or protocol has been violated (positive class) or not (negative class).

Due to privacy and legal considerations, the dataset used in this project cannot be shared directly through this repository. However, instructions for accessing and preprocessing the dataset can be found in the project report available in the report/ directory.

The dataset was preprocessed to extract relevant textual features and encode them into a format suitable for training machine learning models. Various text processing techniques, including tokenization, stemming, and vectorization, were applied to prepare the dataset for model training and evaluation.

For further details on the dataset and preprocessing steps, please refer to the project report.

Models

The project implemented and evaluated several deep learning models for classifying ECHR cases based on textual information. The models included:

LSTM (Long Short-Term Memory) networks
Attention networks
Sparse transformers
Hierarchical transformers
BERT (Bidirectional Encoder Representations from Transformers)

The models were trained and evaluated using the preprocessed ECHR dataset to predict whether a given legal judgment violated any human rights article or protocol. The models were evaluated based on various performance metrics, including accuracy, precision, recall, and F1 score.

For further details on the models and evaluation results, please refer to the project report.

Repository Structure

src/: This directory contains the code implementation of various models, LSTM, attention networks, sparse transformers, and hierarchical transformers.
ECHR_Dataset/: This directory contains the dataset used for training and evaluation, sourced from the European Court of Human Rights (ECHR).
notebook/: This directory contains Jupyter notebooks for data preprocessing, exploratory data analysis, and model training.

Usage

To run the code and reproduce the experiments conducted in the project, follow these steps:

Clone the repository:

git clone https://github.com/lilf4p/hlt-project.git

Navigate to the src/ directory:
```
cd hlt-project/src
```
Run the Python scripts corresponding to the models and experiments of interest.

Dependencies

The project code requires the following dependencies:

Python 3.x
PyTorch
scikit-learn
pandas
NumPy
Hugging Face Transformers

Install the dependencies using pip and the requirements file:

pip install -r requirements.txt

Contributors

Acknowledgements

We express our gratitude to Prof. Giuseppe Attardi for providing the necessary hardware resources for conducting the analysis and experiments for this project.

For more details, please refer to the full project report available in the report pdf.

Note: Please ensure that you have the necessary hardware resources and dependencies installed before running the code. For any questions or issues, feel free to contact the contributors.

Name		Name	Last commit message	Last commit date
Latest commit History 82 Commits
ECHR_Dataset		ECHR_Dataset
images		images
notebook		notebook
paper		paper
src		src
.gitattributes		.gitattributes
.gitignore		.gitignore
README.md		README.md
report.pdf		report.pdf
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Legal Judgement Predictor for ECHR

Usage of LLMs embeddings in classification of legal cases from ECHR dataset

Project Overview

Dataset

Models

Repository Structure

Usage

Dependencies

Contributors

Acknowledgements

References

About

Releases

Packages

Contributors 2

Languages

lilf4p/hlt-project

Folders and files

Latest commit

History

Repository files navigation

Legal Judgement Predictor for ECHR

Usage of LLMs embeddings in classification of legal cases from ECHR dataset

Project Overview

Dataset

Models

Repository Structure

Usage

Dependencies

Contributors

Acknowledgements

References

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages