NewsRecLib is a library based on PyTorch Lightning and Hydra for the development and evaluation of neural news recommenders (NNR). The framework is highly configurable and modularized, decoupling core model components from one another. It enables running experiments from a single configuration file that navigates the pipeline from dataset selection and loading to model evaluation. NewsRecLib provides implementations of several neural news recommenders, training methods, standard evaluation benchmarks, hyperparameter optimization algorithms, extensive logging functionalities, and evaluation metrics (ranging from accuracy-based to beyond-accuracy performance evaluation).
The foremost goals of NewsRecLib are to promote reproducible research and rigorous experimental evaluation.
NewsRecLib requires Python version 3.9 or later.
NewsRecLib requires PyTorch, PyTorch Lightning, and TorchMetrics version 2.0 or later. If you want to use NewsRecLib with a GPU, please ensure that CUDA or cudatoolkit version 11.8 is installed.
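Before installing, it can help to confirm that the interpreter and any already-installed dependencies are visible. The following is a minimal sketch using only the standard library; it is not part of NewsRecLib, and the distribution names in `PACKAGES` are assumptions about how the dependencies are published.

```python
import sys
from importlib import metadata

MIN_PYTHON = (3, 9)
# Assumed distribution names; adjust them to match your environment.
PACKAGES = ("torch", "lightning", "torchmetrics")


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the stated minimum."""
    return tuple(version_info[:2]) >= minimum


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print("Python OK:", python_ok())
for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

The script only reports what it finds; comparing the reported versions against the requirements above is left to the reader.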
git clone https://github.com/andreeaiana/newsreclib.git
cd newsreclib
conda create --name newsreclib_env python=3.9
conda activate newsreclib_env
pip install -e .
NewsRecLib's entry point is the function train, which accepts a configuration file that drives the entire experiment.
The following example shows how to train an NRMS model on the MINDsmall dataset with the original configurations (i.e., news encoder contextualizing pretrained embeddings, model trained by optimizing cross-entropy loss), using an existing configuration file.
python newsreclib/train.py experiment=nrms_mindsmall_pretrainedemb_celoss_bertsent
In the basic experiment, the experiment configuration only specifies required hyperparameter values which are not set in the configurations of the corresponding modules.
defaults:
  - override /data: mind_rec_bert_sent.yaml
  - override /model: nrms.yaml
  - override /callbacks: default.yaml
  - override /logger: many_loggers.yaml
  - override /trainer: gpu.yaml
data:
  dataset_size: "small"
model:
  use_plm: False
  pretrained_embeddings_path: ${paths.data_dir}MINDsmall_train/transformed_word_embeddings.npy
  embed_dim: 300
  num_heads: 15
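Conceptually, the experiment file is layered on top of the module configurations it selects. The sketch below illustrates that layering with a plain recursive dictionary merge; it is a simplification and not how Hydra composes configurations internally. The keys `batch_size` and `dropout_probability` in `module_defaults` are hypothetical placeholders, not values taken from the library's YAML files.

```python
from copy import deepcopy


def deep_merge(base, override):
    """Return a new dict where values from `override` are layered onto `base`."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, and new keys are taken from the override.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical module defaults (placeholders for illustration only).
module_defaults = {
    "data": {"batch_size": 8},
    "model": {"dropout_probability": 0.2},
}

# Values from the basic experiment configuration shown above.
experiment = {
    "data": {"dataset_size": "small"},
    "model": {"use_plm": False, "embed_dim": 300, "num_heads": 15},
}

config = deep_merge(module_defaults, experiment)
print(config["data"])
print(config["model"])
```

The merged result keeps every default that the experiment file does not mention and adds the values the experiment file supplies.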
To train the NRMS model on the MINDlarge dataset, execute the following command:
python newsreclib/train.py experiment=nrms_mindlarge_pretrainedemb_celoss_bertsent
To see how configuration files change when moving from a smaller to a larger dataset, compare the examples nrms_mindsmall_pretrainedemb_celoss_bertsent and nrms_mindlarge_pretrainedemb_celoss_bertsent. The differences between these two files show which settings to adjust when scaling your own configurations.
Note: The same procedure applies for the advanced configuration shown below.
The advanced scenario depicts a more complex experimental setting. Users can overwrite any of the predefined module configurations from the main experiment configuration file. The following code snippet shows how to train an NRMS model with a PLM-based news encoder and a supervised contrastive loss objective instead of the default settings.
python newsreclib/train.py experiment=nrms_mindsmall_plm_supconloss_bertsent
This is achieved by creating an experiment configuration file with the following specifications:
defaults:
  - override /data: mind_rec_bert_sent.yaml
  - override /model: nrms.yaml
  - override /callbacks: default.yaml
  - override /logger: many_loggers.yaml
  - override /trainer: gpu.yaml
data:
  dataset_size: "small"
  use_plm: True
  tokenizer_name: "roberta-base"
  tokenizer_use_fast: True
  tokenizer_max_len: 96
model:
  loss: "sup_con_loss"
  temperature: 0.1
  use_plm: True
  plm_model: "roberta-base"
  frozen_layers: [0, 1, 2, 3, 4, 5, 6, 7]
  pretrained_embeddings_path: None
  embed_dim: 768
  num_heads: 16
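When several related values change at once, as in the configuration above, a quick consistency check can catch mismatches before a long training run. The sketch below is a hypothetical helper, not part of NewsRecLib. It rests on two assumptions: that the attention layer splits `embed_dim` evenly across `num_heads` (as `torch.nn.MultiheadAttention` requires), and that `use_plm` should agree between the `data` and `model` sections, as it does in the example.

```python
def check_config(config):
    """Return a list of human-readable problems found in a config dict."""
    problems = []
    data = config.get("data", {})
    model = config.get("model", {})

    embed_dim = model.get("embed_dim")
    num_heads = model.get("num_heads")
    if embed_dim is not None and num_heads:
        # Assumption: each attention head receives embed_dim / num_heads dims.
        if embed_dim % num_heads != 0:
            problems.append(
                f"embed_dim={embed_dim} is not divisible by num_heads={num_heads}"
            )

    # Assumption: tokenization and the news encoder must agree on PLM usage.
    if "use_plm" in data and "use_plm" in model:
        if data["use_plm"] != model["use_plm"]:
            problems.append("data.use_plm and model.use_plm disagree")

    return problems


# Values copied from the two experiment configurations in this document.
basic = {"data": {"dataset_size": "small"},
         "model": {"use_plm": False, "embed_dim": 300, "num_heads": 15}}
advanced = {"data": {"dataset_size": "small", "use_plm": True},
            "model": {"use_plm": True, "embed_dim": 768, "num_heads": 16}}

for name, cfg in (("basic", basic), ("advanced", advanced)):
    print(name, check_config(cfg) or "no problems found")
```

Under these assumptions both documented configurations pass: 300 / 15 gives 20 dimensions per head and 768 / 16 gives 48.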
Alternatively, configurations can be overridden from the command line, as follows:
python newsreclib/train.py experiment=nrms_mindsmall_plm_supconloss_bertsent data.batch_size=128
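To make the dotted `key=value` form concrete, the following sketch shows how an override such as `data.batch_size=128` maps onto a nested configuration. It is a deliberately simplified illustration, not Hydra's parser: Hydra's real override grammar is richer (for example, it uses a `+` prefix to add keys that do not yet exist), and the helper names here are hypothetical.

```python
def parse_value(raw):
    """Convert a raw string to bool, int, or float where possible."""
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    return raw


def apply_override(config, override):
    """Apply one 'a.b.c=value' override to a nested dict, in place."""
    dotted_key, separator, raw_value = override.partition("=")
    if not separator or not dotted_key:
        raise ValueError(f"expected 'key=value', got {override!r}")
    *parents, leaf = dotted_key.split(".")
    node = config
    for part in parents:
        # Walk down the hierarchy, creating intermediate levels as needed.
        node = node.setdefault(part, {})
    node[leaf] = parse_value(raw_value)
    return config


config = {"data": {"dataset_size": "small"}}
apply_override(config, "data.batch_size=128")
print(config)
```

Each dot in the key descends one level in the configuration hierarchy, so `data.batch_size` addresses the `batch_size` entry inside the `data` section.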
- Training
  - Click behavior fusion strategies: early fusion, late fusion
  - Training objectives: cross-entropy loss, supervised contrastive loss, dual
  - All optimizers and learning rate schedulers of PyTorch
  - Early stopping
  - Model checkpointing
- Hyperparameter optimization
  - Integrated using Optuna and Hydra's Optuna Sweeper plugin
- Datasets
- Recommendation Models
  - General recommenders (GeneralRec)
  - Fairness-aware recommenders (FairRec)
- Evaluation
  - Integration with TorchMetrics
  - Accuracy-based metrics: AUROC, MRR, nDCG@k
  - Diversity: entropy
  - Personalization: generalized Jaccard
- Extensive logging
  - Logging and visualization with WandB
  - Quick export to CSV files
  - Detailed information about training, hyperparameters, evaluation, metadata
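For readers unfamiliar with the accuracy-based ranking metrics listed above, the sketch below computes a reciprocal rank and nDCG@k for a single impression with binary click labels. It is a plain-Python illustration of the textbook definitions, not the TorchMetrics implementation that NewsRecLib integrates, and the function names are hypothetical. Averaging the reciprocal rank over impressions yields MRR.

```python
import math


def rank_labels(labels, scores):
    """Reorder labels by descending predicted score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def reciprocal_rank(labels, scores):
    """1 / position of the first clicked item, or 0.0 if nothing was clicked."""
    for position, label in enumerate(rank_labels(labels, scores), start=1):
        if label > 0:
            return 1.0 / position
    return 0.0


def dcg_at_k(ranked_labels, k):
    """Discounted cumulative gain over the top-k positions."""
    return sum(
        label / math.log2(position + 1)
        for position, label in enumerate(ranked_labels[:k], start=1)
    )


def ndcg_at_k(labels, scores, k):
    """DCG@k normalized by the DCG@k of the ideal ordering."""
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    if ideal == 0:
        return 0.0
    return dcg_at_k(rank_labels(labels, scores), k) / ideal


# One impression with four candidates; 1 marks a clicked news item.
labels = [0, 1, 0, 1]
scores = [0.9, 0.8, 0.1, 0.7]
print("reciprocal rank:", reciprocal_rank(labels, scores))
print("nDCG@3:", round(ndcg_at_k(labels, scores, 3), 4))
```

In this example the first clicked item is ranked second, so the reciprocal rank is 0.5, and nDCG@3 is below 1 because a non-clicked item occupies the top position.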
We welcome all contributions to NewsRecLib! You can get involved by contributing code, improving the documentation, or reporting and investigating bugs and issues.
This repository was inspired by:
Other useful repositories:
NewsRecLib uses the MIT License.
We did our best to provide all the bibliographic information of the methods, models, datasets, and techniques available in NewsRecLib to credit their authors. Please remember to cite them if you use NewsRecLib in your research.
If you use NewsRecLib, please cite the following publication:
@inproceedings{iana2023newsreclib,
title={NewsRecLib: A PyTorch-Lightning Library for Neural News Recommendation},
author={Iana, Andreea and Glava{\v{s}}, Goran and Paulheim, Heiko},
booktitle={Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations},
pages={296--310},
year={2023}
}