GitHub - andreeaiana/newsreclib: PyTorch-Lightning Library for Neural News Recommendation

NewsRecLib is a library based on PyTorch Lightning and Hydra for the development and evaluation of neural news recommenders (NNR). The framework is highly configurable and modularized, decoupling core model components from one another. It enables running experiments from a single configuration file that navigates the pipeline from dataset selection and loading to model evaluation. NewsRecLib provides implementations of several neural news recommenders, training methods, standard evaluation benchmarks, hyperparameter optimization algorithms, extensive logging functionalities, and evaluation metrics (ranging from accuracy-based to beyond-accuracy performance evaluation).

The foremost goals of NewsRecLib are to promote reproducible research and rigorous experimental evaluation.

Installation

NewsRecLib requires Python version 3.9 or later.

NewsRecLib requires PyTorch, PyTorch Lightning, and TorchMetrics version 2.0 or later. To use NewsRecLib with a GPU, make sure that CUDA or cudatoolkit version 11.8 is installed.

Install from source

CONDA

   git clone https://github.com/andreeaiana/newsreclib.git
   cd newsreclib
   conda create --name newsreclib_env python=3.9
   conda activate newsreclib_env
   pip install -e .

Quick Start

NewsRecLib's entry point is the function train, which accepts a configuration file that drives the entire experiment.

Basic Configuration

The following example shows how to train an NRMS model on the MINDsmall dataset with the original configurations (i.e., a news encoder contextualizing pretrained embeddings and a model trained by optimizing cross-entropy loss), using an existing configuration file.

    python newsreclib/train.py experiment=nrms_mindsmall_pretrainedemb_celoss_bertsent

In the basic experiment, the experiment configuration only specifies required hyperparameter values which are not set in the configurations of the corresponding modules.

    defaults:
        - override /data: mind_rec_bert_sent.yaml
        - override /model: nrms.yaml
        - override /callbacks: default.yaml
        - override /logger: many_loggers.yaml
        - override /trainer: gpu.yaml
    data:
        dataset_size: "small"
    model:
        use_plm: False
        pretrained_embeddings_path: ${paths.data_dir}MINDsmall_train/transformed_word_embeddings.npy
        embed_dim: 300
        num_heads: 15
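
The configuration above pairs embed_dim: 300 with num_heads: 15. In standard multi-head attention the embedding dimension is split evenly across the heads, so embed_dim has to be divisible by num_heads. The sketch below is a minimal check of that constraint, assuming the NRMS attention layers follow this convention; head_dim is a hypothetical helper written for illustration and is not part of NewsRecLib.

```python
def head_dim(embed_dim: int, num_heads: int) -> int:
    """Return the per-head dimension, or raise if the split is uneven.

    Hypothetical helper: it only encodes the usual multi-head attention
    requirement that embed_dim is divisible by num_heads.
    """
    if num_heads <= 0:
        raise ValueError("num_heads must be a positive integer")
    if embed_dim % num_heads != 0:
        raise ValueError(
            f"embed_dim={embed_dim} is not divisible by num_heads={num_heads}"
        )
    return embed_dim // num_heads


# Values from the basic configuration: 300 / 15 gives 20 dimensions per head.
print(head_dim(300, 15))

# A mismatched pair is rejected before any training starts.
try:
    head_dim(300, 16)
except ValueError as error:
    print(error)
```

Running such a check before launching an experiment can catch a mismatched pair early when the embedding size changes.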

For training the NRMS model on the MINDlarge dataset, execute the following command:

    python newsreclib/train.py experiment=nrms_mindlarge_pretrainedemb_celoss_bertsent

To understand how to adjust configuration files when transitioning from smaller to larger datasets, refer to the examples provided in nrms_mindsmall_pretrainedemb_celoss_bertsent and nrms_mindlarge_pretrainedemb_celoss_bertsent. These files will guide you in scaling your configurations appropriately.

Note: The same procedure applies for the advanced configuration shown below.

Advanced Configuration

The advanced scenario depicts a more complex experimental setting. Users can override any of the predefined module configurations from the main experiment configuration file. The following code snippet shows how to train an NRMS model with a PLM-based news encoder and a supervised contrastive loss objective instead of the default settings.

    python newsreclib/train.py experiment=nrms_mindsmall_plm_supconloss_bertsent

This is achieved by creating an experiment configuration file with the following specifications:

    defaults:
        - override /data: mind_rec_bert_sent.yaml
        - override /model: nrms.yaml
        - override /callbacks: default.yaml
        - override /logger: many_loggers.yaml
        - override /trainer: gpu.yaml
    data:
        dataset_size: "small"
        use_plm: True
        tokenizer_name: "roberta-base"
        tokenizer_use_fast: True
        tokenizer_max_len: 96
    model:
        loss: "sup_con_loss"
        temperature: 0.1
        use_plm: True
        plm_model: "roberta-base"
        frozen_layers: [0, 1, 2, 3, 4, 5, 6, 7]
        pretrained_embeddings_path: null
        embed_dim: 768
        num_heads: 16

Alternatively, configurations can be overridden from the command line, as follows:

    python newsreclib/train.py experiment=nrms_mindsmall_plm_supconloss_bertsent data.batch_size=128
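
The layering described above (module defaults, then the experiment file, then command-line overrides) can be illustrated with a small plain-Python sketch. This is a simplified model of the behaviour, not Hydra's implementation: deep_merge, parse_scalar, and apply_cli_override are hypothetical helpers, and the module default values are invented for illustration.

```python
from copy import deepcopy
from typing import Any


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of `base` with `override` merged in recursively."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def parse_scalar(text: str) -> Any:
    """Convert a command-line string to bool, int, or float when possible."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def apply_cli_override(config: dict, override: str) -> dict:
    """Apply one dotted `a.b=value` override and return the updated copy."""
    dotted_key, _, raw_value = override.partition("=")
    *parents, leaf = dotted_key.split(".")
    updated = deepcopy(config)
    node = updated
    for name in parents:
        node = node.setdefault(name, {})
    node[leaf] = parse_scalar(raw_value)
    return updated


# Hypothetical module defaults (values invented for illustration).
module_defaults = {
    "data": {"dataset_size": "large", "batch_size": 64, "use_plm": False},
    "model": {"loss": "cross_entropy_loss", "use_plm": False},
}
# A subset of the values from the experiment file shown above.
experiment = {
    "data": {"dataset_size": "small", "use_plm": True},
    "model": {"loss": "sup_con_loss", "temperature": 0.1, "use_plm": True},
}

config = deep_merge(module_defaults, experiment)
config = apply_cli_override(config, "data.batch_size=128")
print(config["data"])
# {'dataset_size': 'small', 'batch_size': 128, 'use_plm': True}
```

One simplification to keep in mind: this sketch silently creates keys that do not exist yet, whereas Hydra normally expects a plain key=value override to target an existing key.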

Features

Training
- Click behavior fusion strategies: early fusion, late fusion
- Training objectives: cross-entropy loss, supervised contrastive loss, dual
- All optimizers and learning rate schedulers of PyTorch
- Early stopping
- Model checkpointing
Hyperparameter optimization
- Integrated using Optuna and Hydra's Optuna Sweeper plugin
Datasets
- Adressa: 1-week and 3-month
- MIND: MINDsmall and MINDlarge
- xMIND: all languages, dataset sizes and splits
Recommendation Models
- General recommenders (GeneralRec)
  - CAUM (code, config)
  - CenNewsRec (code, config)
  - DKN (code, config)
  - LSTUR (code, config)
  - MINER (code, config)
  - MINS (code, config)
  - NAML (code, config)
  - NPA (code, config)
  - NRMS (code, config)
  - TANR (code, config)
- Fairness-aware recommenders (FairRec)
  - MANNeR (code, config)
  - SentiDebias (code, config)
  - SentiRec (code, config)
Evaluation
- Integration with TorchMetrics
- Accuracy-based metrics: AUROC, MRR, nDCG@k
- Diversity: entropy
- Personalization: generalized Jaccard
Extensive logging
- Logging and visualization with WandB
- Quick export to CSV files
- Detailed information about training, hyperparameters, evaluation, metadata
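
To make the accuracy-based metrics listed above more concrete, here is a plain-Python sketch of the reciprocal rank (averaged over impressions to obtain MRR) and nDCG@k for a single impression with binary click labels. NewsRecLib itself computes its metrics through TorchMetrics; the functions below are hypothetical helpers that follow the textbook definitions and may differ in details such as tie handling.

```python
import math
from typing import List, Sequence


def rank_labels(labels: Sequence[int], scores: Sequence[float]) -> List[int]:
    """Return the labels reordered by descending predicted score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def reciprocal_rank(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Return 1 / rank of the first clicked item, or 0.0 if none is clicked."""
    for position, label in enumerate(rank_labels(labels, scores), start=1):
        if label > 0:
            return 1.0 / position
    return 0.0


def dcg_at_k(ranked_labels: Sequence[int], k: int) -> float:
    """Discounted cumulative gain over the top-k positions."""
    return sum(
        label / math.log2(position + 1)
        for position, label in enumerate(ranked_labels[:k], start=1)
    )


def ndcg_at_k(labels: Sequence[int], scores: Sequence[float], k: int) -> float:
    """DCG@k of the predicted ranking divided by DCG@k of the ideal ranking."""
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    if ideal == 0:
        return 0.0
    return dcg_at_k(rank_labels(labels, scores), k) / ideal


# One impression with four candidates; 1 marks a clicked article.
labels = [0, 1, 0, 1]
scores = [0.9, 0.8, 0.1, 0.7]

print(reciprocal_rank(labels, scores))  # 0.5: first click sits at rank 2
print(round(ndcg_at_k(labels, scores, k=3), 4))
```

Averaging these per-impression values over all impressions in the evaluation set gives the dataset-level scores.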

Contributing

We welcome all contributions to NewsRecLib! You can get involved by contributing code, improving the documentation, or reporting and investigating bugs and issues.

Resources

This repository was inspired by:

Other useful repositories:

https://github.com/recommenders-team/recommenders

License

NewsRecLib is released under the MIT License.

Citation

We did our best to provide all the bibliographic information of the methods, models, datasets, and techniques available in NewsRecLib to credit their authors. Please remember to cite them if you use NewsRecLib in your research.

If you use NewsRecLib, please cite the following publication:

@inproceedings{iana2023newsreclib,
  title={NewsRecLib: A PyTorch-Lightning Library for Neural News Recommendation},
  author={Iana, Andreea and Glava{\v{s}}, Goran and Paulheim, Heiko},
  booktitle={Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations},
  pages={296--310},
  year={2023}
}
