This repository contains the PyTorch implementation for the experiments in *Descartes: Generating Short Descriptions of Wikipedia Articles*. The codebase is built on top of the Hugging Face Transformers library.
Please consider citing our work if you find the provided resources useful:

```bibtex
@inproceedings{sakota2022descartes,
  title={Descartes: Generating short descriptions of Wikipedia articles},
  author={Šakota, Marija and Peyrard, Maxime and West, Robert},
  booktitle={Proceedings of The Web Conference (WWW)},
  year={2023}
}
```
Start by cloning the repository:

```bash
git clone https://github.com/epfl-dlab/descartes.git
```
We recommend creating a new conda virtual environment:

```bash
conda env create -n descartes --file=requirements.yaml
conda activate descartes
```
Install transformers from source:

```bash
cd transformers
pip install -e .
```
Then, install the remaining packages using:

```bash
pip install -r versions.txt
```
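To confirm that the editable install is the one being picked up, a quick sanity check (the exact version string depends on the bundled fork) is:

```bash
python -c "import transformers; print(transformers.__version__, transformers.__file__)"
```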
To train the model, run the `train_descartes.sh` script. Use `--data_dir` to specify the directory where the data is located and `--output_dir` for the directory where model checkpoints will be saved. To use knowledge graph embeddings, add the `--use_graph_embds` flag. To use existing descriptions, specify the path of the model used to embed them with `--bert_path` (for example, `bert-base-multilingual-uncased`). For monolingual baselines, use the `--baseline` flag. An example invocation is sketched below.
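A hypothetical invocation combining these options, assuming the script forwards the flags to the training code (the directory paths are placeholders; only the flags come from the description above):

```bash
bash train_descartes.sh \
  --data_dir /path/to/data \
  --output_dir /path/to/checkpoints \
  --use_graph_embds \
  --bert_path bert-base-multilingual-uncased
```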
To test the model, run the `test_model.sh` script. Use `--data_dir` to specify the directory where the data is located, `--output_dir` for the directory where the model was stored, and `--output_folder` for the directory where textual outputs will be saved. A sketch of such a run follows.
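A hypothetical evaluation run under the same assumption, again with placeholder paths:

```bash
bash test_model.sh \
  --data_dir /path/to/data \
  --output_dir /path/to/checkpoints \
  --output_folder /path/to/generated_descriptions
```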
Distributed under the MIT License. See LICENSE for more information.