Code and pretrained models to reproduce experiments in "MUSS: Multilingual Unsupervised Sentence Simplification by Mining Paraphrases".
Linux with Python 3.6 or above (not yet compatible with Python 3.9). If your operating system is Windows, you can use WSL with Ubuntu 20.04 LTS.
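To confirm that an interpreter falls inside the range stated above before installing, a small check along these lines can help. This is a minimal sketch and not part of the repository: the names `MIN_VERSION`, `FIRST_UNSUPPORTED` and `is_supported` are made up for illustration, and the bounds simply restate the requirement (3.6 or newer, below 3.9).

```python
import sys

# Bounds restating the requirement above: 3.6 or newer, but not yet 3.9.
# These constant names are illustrative, not part of MUSS.
MIN_VERSION = (3, 6)
FIRST_UNSUPPORTED = (3, 9)


def is_supported(version=None):
    """Return True if the (major, minor) version is in the supported range."""
    if version is None:
        version = sys.version_info[:2]
    return MIN_VERSION <= tuple(version[:2]) < FIRST_UNSUPPORTED


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "supported" if is_supported() else "outside the supported range"
    print(f"Python {current} is {status}.")
```

Run it with the interpreter you intend to use for the installation, since the result depends on which `python` executes the script.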
git clone git@github.com:facebookresearch/muss.git
cd muss/
pip install -e . # Install package
python -m spacy download pt_core_news_md en_core_web_md fr_core_news_md es_core_news_md # Install required spacy models
ulimit -n 100000 # If you train a new model
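The `ulimit -n 100000` line raises the open-file limit of the current shell. To see whether the limit inherited by a Python process is already high enough, something like the following can be used. It is a sketch under assumptions: the helper name `limit_is_sufficient` is hypothetical, the threshold is copied from the command above, and the script only reads the limit without changing it.

```python
import resource

# Value used by the `ulimit -n 100000` line above.
REQUIRED_OPEN_FILES = 100000


def limit_is_sufficient(soft_limit, required=REQUIRED_OPEN_FILES):
    """Return True if a soft limit allows at least `required` open files."""
    # RLIM_INFINITY means "no limit"; compare by equality because its
    # numeric encoding differs between platforms.
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= required


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if limit_is_sufficient(soft):
        print(f"Open-file limit is {soft}: enough for training.")
    else:
        print(f"Open-file limit is {soft} (hard limit {hard}); "
              f"consider running `ulimit -n {REQUIRED_OPEN_FILES}` first.")
```

The `resource` module is only available on Unix-like systems, which matches the Linux or WSL requirement.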
Some scripts might still contain a few bugs. If you notice anything wrong, feel free to open an issue or submit a pull request.
First, download the model for the desired language into the folder resources/models. Pretrained models should be downloaded automatically, but you can also find them here:
muss_en_wikilarge_mined
muss_en_mined
muss_fr_mined
muss_es_mined
muss_pt_mined
Then run the command:
python scripts/simplify.py FILE_PATH_TO_SIMPLIFY --model-name MODEL_NAME
# English
python scripts/simplify.py scripts/examples.en --model-name muss_en_wikilarge_mined
# French
python scripts/simplify.py scripts/examples.fr --model-name muss_fr_mined
# Spanish
python scripts/simplify.py scripts/examples.es --model-name muss_es_mined
# Portuguese
python scripts/simplify.py scripts/examples.pt --model-name muss_pt_mined
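When simplifying files in several languages, the per-language commands above can be generated instead of typed by hand. The sketch below only builds the argument lists and does not call MUSS itself. `DEFAULT_MODELS` and `build_simplify_command` are hypothetical names, and the mapping mirrors the example commands above (for English it picks `muss_en_wikilarge_mined`, although `muss_en_mined` is also listed).

```python
import sys

# Model names exactly as used in the example commands above.
DEFAULT_MODELS = {
    "en": "muss_en_wikilarge_mined",
    "fr": "muss_fr_mined",
    "es": "muss_es_mined",
    "pt": "muss_pt_mined",
}


def build_simplify_command(file_path, language):
    """Return the argv list that runs scripts/simplify.py on one input file."""
    if language not in DEFAULT_MODELS:
        raise ValueError(f"No pretrained model listed for language: {language!r}")
    return [
        sys.executable,
        "scripts/simplify.py",
        file_path,
        "--model-name",
        DEFAULT_MODELS[language],
    ]


if __name__ == "__main__":
    # Print the commands only; nothing is executed here.
    for language in DEFAULT_MODELS:
        command = build_simplify_command(f"scripts/examples.{language}", language)
        print(" ".join(command))
```

To actually run one of the commands, pass the returned list to `subprocess.run(...)` from the repository root, so that the relative path `scripts/simplify.py` resolves.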
To add a new language to this project, download the files for the target language from https://huggingface.co/edugp/kenlm/tree/main/wikipedia into the folder resources/models/language_models/wikipedia. These language models are used to filter high-quality sentences in the paraphrase mining phase.
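As a rough illustration of how a language model can filter sentences by quality, the sketch below keeps only sentences whose perplexity stays under a threshold. It is not the filtering code used by MUSS: the function names, the threshold, and the stand-in scorer `toy_perplexity` are all assumptions made so the example runs without any language model installed. In practice the scorer would be a function backed by the downloaded KenLM model.

```python
from typing import Callable, Iterable, List


def filter_by_perplexity(
    sentences: Iterable[str],
    perplexity_fn: Callable[[str], float],
    max_perplexity: float,
) -> List[str]:
    """Keep sentences whose perplexity is at or below `max_perplexity`."""
    return [s for s in sentences if perplexity_fn(s) <= max_perplexity]


def toy_perplexity(sentence: str) -> float:
    """Stand-in scorer: share of non-letter, non-space characters, scaled to 0..100.

    This is NOT a language model; it only makes the sketch runnable.
    """
    if not sentence:
        return 100.0
    noisy = sum(1 for ch in sentence if not (ch.isalpha() or ch.isspace()))
    return 100.0 * noisy / len(sentence)


if __name__ == "__main__":
    candidates = ["The cat sat on the mat.", "#### 1234 |||| ----"]
    print(filter_by_perplexity(candidates, toy_perplexity, max_perplexity=20.0))
```

Passing the scorer in as a callable keeps the filtering logic independent of the language model library, so the threshold can be tuned per language without touching the rest of the code.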
To run paraphrase mining, use the first command below. The second command then trains a model on a dataset for the chosen language:
python scripts/mine_sequences.py
python scripts/train_model.py NAME_OF_DATASET --language LANGUAGE
Please head over to EASSE for Sentence Simplification evaluation.
The MUSS license is CC-BY-NC. See the LICENSE file for more details.
- Louis Martin (louismartincs@gmail.com)
- Raphael Assis (contato.raphael.assis@gmail.com)
If you use MUSS in your research, please cite "MUSS: Multilingual Unsupervised Sentence Simplification by Mining Paraphrases":
@article{martin2021muss,
title={MUSS: Multilingual Unsupervised Sentence Simplification by Mining Paraphrases},
author={Martin, Louis and Fan, Angela and de la Clergerie, {\'E}ric and Bordes, Antoine and Sagot, Beno{\^\i}t},
journal={arXiv preprint arXiv:2005.00352},
year={2021}
}