Title: Fairness in group recommender systems
Author: Bc. Ladislav Malecek
Supervisor: Mgr. Ladislav Peska, Ph.D.
Requirements: The required Python version and libraries are listed in pyproject.toml
- Install the required dependencies. We use Python 3.9 and Poetry as the dependency manager. Install Poetry and then run
poetry install
to install all the dependencies. If you are not using Poetry, you can install the dependencies manually.
- Run our reproducibility script
poetry run python run_experiments.py
Repository structure of the most important files and folders:
.
├── create_groups
│ ├── create_prs_groups.py
│ ├── create_random_groups.py
│ └── create_topk_groups.py
│
├── evaluation
│ ├── evaluation_longterm.ipynb
│ ├── evaluation_uniform.ipynb
│ └── evaluation_weighted.ipynb
│
├── experiments
│ ├── run_longterm_algorithms.py
│ ├── run_uniform_algorithms.py
│ └── run_weighted_algorithms.py
│
├── gather_datasets
│ └── download_and_transform.py
│
├── matrix_factorization
│ └── matrix_factorization.py
│
└── run_experiments.sh
The evaluation has 5 parts:
- Gather datasets
- Create artificial groups
- Calculate recommendation ratings that serve as the ground truth.
- Run recommendation algorithms
- Evaluate results
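The first four parts map onto scripts from the repository structure, so their order can be sketched as a small driver. This is only an illustration, not the repository's reproducibility script: the names STAGES, build_commands and run_pipeline are hypothetical, and it assumes every script can run with its default arguments. The last part (evaluation) is done in Jupyter notebooks and is left out.

```python
import subprocess

# Stage -> scripts, in the order of the parts listed above.
# Paths come from the repository structure; the stage labels are illustrative.
STAGES = [
    ("gather datasets", ["gather_datasets/download_and_transform.py"]),
    ("create groups", [
        "create_groups/create_prs_groups.py",
        "create_groups/create_random_groups.py",
        "create_groups/create_topk_groups.py",
    ]),
    ("ground truth", ["matrix_factorization/matrix_factorization.py"]),
    ("run algorithms", [
        "experiments/run_longterm_algorithms.py",
        "experiments/run_uniform_algorithms.py",
        "experiments/run_weighted_algorithms.py",
    ]),
]


def build_commands(prefix=("poetry", "run", "python")):
    """Return one argv list per script, in pipeline order."""
    return [[*prefix, script] for _, scripts in STAGES for script in scripts]


def run_pipeline(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # Assumption: default arguments are sufficient for every script.
            subprocess.run(cmd, check=True)  # stop at the first failing script


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

Keeping dry_run=True only prints the commands, which is a cheap way to check the order before starting the long-running stages.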
For more information about any of the mentioned Python scripts, including the available arguments and default settings, run
python <script> --help
We have created an automatic tool for downloading, cleaning, and processing the required datasets. Ideally, the datasets would be stored in this repository or hosted elsewhere, already cleaned and ready to use, but their licensing does not permit that.
Run
poetry run python gather_datasets/download_and_transform.py
This Python script has multiple options that make it convenient and reusable in other projects as well. You can specify which of the supported datasets are downloaded, whether they are compressed, whether they are only stored or also cleaned and processed, and where the results are stored.
We have created an automatic tool for creating artificial groups.
For creation of PRS groups run
poetry run python create_groups/create_prs_groups.py
For creation of random groups run
poetry run python create_groups/create_random_groups.py
and for creation of top-k groups run
poetry run python create_groups/create_topk_groups.py
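To give an idea of what an artificial group is, the simplest of the three variants (random groups) can be sketched as follows. This is a minimal sketch and not the code in create_random_groups.py: the function name, its parameters and the fixed seed are assumptions made for illustration. PRS and top-k groups are not shown.

```python
import random


def create_random_groups(user_ids, group_size, n_groups, seed=42):
    """Sample n_groups groups, each made of group_size distinct users.

    Hypothetical helper: users are drawn uniformly at random, and the same
    user may appear in more than one group.
    """
    users = list(user_ids)
    if group_size > len(users):
        raise ValueError("group_size cannot exceed the number of users")
    rng = random.Random(seed)  # fixed seed so the groups are reproducible
    return [sorted(rng.sample(users, group_size)) for _ in range(n_groups)]


if __name__ == "__main__":
    for group in create_random_groups(range(100), group_size=4, n_groups=3):
        print(group)
```

Seeding the generator keeps the generated groups identical between runs, which matters when several algorithms are compared on the same groups.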
We use a fast and conveniently parallelizable algorithm, Alternating Least Squares (ALS) matrix factorization, to calculate the ground truths. The implementation differs for explicit and implicit datasets: for explicit datasets we use our own implementation, and for implicit datasets we use the implementation from the implicit library.
Run
poetry run python ./matrix_factorization/matrix_factorization.py
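The idea behind ALS for explicit ratings can be sketched in a few lines of NumPy. This is a generic textbook-style version and not the implementation in matrix_factorization.py: the function name als_explicit and the hyperparameters (n_factors, reg, n_iters) are assumptions chosen for illustration. With one factor matrix fixed, each row of the other is the solution of an independent ridge regression, which is why the algorithm parallelizes easily.

```python
import numpy as np


def als_explicit(ratings, mask, n_factors=3, reg=0.1, n_iters=30, seed=0):
    """Factorize ratings ~ U @ V.T using only the entries where mask is True."""
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    U = rng.normal(scale=0.1, size=(n_users, n_factors))
    V = rng.normal(scale=0.1, size=(n_items, n_factors))
    ridge = reg * np.eye(n_factors)
    for _ in range(n_iters):
        # Fix V: every user row is an independent regularized least-squares problem.
        for u in range(n_users):
            obs = mask[u]
            if obs.any():
                Vo = V[obs]
                U[u] = np.linalg.solve(Vo.T @ Vo + ridge, Vo.T @ ratings[u, obs])
        # Fix U: the same update for every item row.
        for i in range(n_items):
            obs = mask[:, i]
            if obs.any():
                Uo = U[obs]
                V[i] = np.linalg.solve(Uo.T @ Uo + ridge, Uo.T @ ratings[obs, i])
    return U, V


if __name__ == "__main__":
    R = np.array([[5, 4, 1, 1], [4, 5, 1, 2], [1, 1, 5, 4], [2, 1, 4, 5]], dtype=float)
    M = np.ones(R.shape, dtype=bool)
    M[0, 3] = False  # treat two ratings as unobserved
    M[2, 0] = False
    U, V = als_explicit(R, M)
    print(np.round(U @ V.T, 2))  # dense matrix of predicted ratings
```

The product U @ V.T fills in a predicted rating for every user-item pair, including the unobserved ones, which is the property that lets a factorization serve as ground truth.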
For each scenario we have created a Python script that runs the algorithms and saves the results to the results folder.
Run the scripts in the experiments folder to run the algorithms.
Run the Jupyter notebooks in the evaluation folder to evaluate the results.