Multilingual Unsupervised Sentence Simplification

Code and pretrained models to reproduce experiments in "MUSS: Multilingual Unsupervised Sentence Simplification by Mining Paraphrases".

Prerequisites

Linux with Python 3.6 or above (Python 3.9 is not supported yet). On Windows, you can use WSL with Ubuntu 20.04 LTS.
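
The version constraint above can be checked before installing. The following is a minimal sketch, not part of MUSS: the function name `is_supported_python` is hypothetical, and the bounds encode "3.6 or above, but not 3.9" as stated in this section.

```python
import sys


def is_supported_python(version_info=sys.version_info):
    """Return True if the interpreter is Python 3.6, 3.7 or 3.8.

    ASSUMPTION: "not compatible with python 3.9 yet" is read as
    "any version from 3.9 onwards is unsupported".
    """
    major_minor = tuple(version_info[:2])
    return (3, 6) <= major_minor < (3, 9)


if not is_supported_python():
    print("Warning: this Python version is outside the range stated in the README.")
```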

Installing

git clone git@github.com:facebookresearch/muss.git
cd muss/
pip install -e .  # Install package
python -m spacy download pt_core_news_md  # Install required spacy models (one model per call)
python -m spacy download en_core_web_md
python -m spacy download fr_core_news_md
python -m spacy download es_core_news_md
ulimit -n 100000 # If you train a new model
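
If training is launched from a Python process rather than from a shell, the open-file limit set by `ulimit -n` can also be inspected and raised with the standard `resource` module. This is a rough sketch under that assumption. The helper name `raise_open_file_limit` is hypothetical and not part of MUSS, and the change only applies to the current process and its children.

```python
import resource


def raise_open_file_limit(target=100_000):
    """Try to raise the soft limit on open files towards `target`.

    Returns the (soft, hard) limits in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # An unprivileged process may raise its soft limit only up to the hard limit.
    wanted = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if soft != resource.RLIM_INFINITY and soft < wanted:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
        except (ValueError, OSError):
            # The system refused the new value; keep the current limits.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft_limit, hard_limit = raise_open_file_limit()
print("open-file limits (soft, hard):", soft_limit, hard_limit)
```

If the hard limit is below 100000, the soft limit is raised only as far as the hard limit allows.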

How to use

Some scripts might still contain a few bugs. If you notice anything wrong, feel free to open an issue or submit a pull request.

Simplify sentences from a file using pretrained models

First, download the model for the desired language into the folder resources/models. Pretrained models should be downloaded automatically, but you can also find them here:

muss_en_wikilarge_mined
muss_en_mined
muss_fr_mined
muss_es_mined
muss_pt_mined

Then run the command:

python scripts/simplify.py FILE_PATH_TO_SIMPLIFY --model-name MODEL_NAME

# English
python scripts/simplify.py scripts/examples.en --model-name muss_en_wikilarge_mined
# French
python scripts/simplify.py scripts/examples.fr --model-name muss_fr_mined
# Spanish
python scripts/simplify.py scripts/examples.es --model-name muss_es_mined
# Portuguese
python scripts/simplify.py scripts/examples.pt --model-name muss_pt_mined
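
The per-language commands above can also be assembled from Python, for example to simplify several files in one run. This is only a sketch: the helper names `build_simplify_command` and `run_simplify` are hypothetical, and the language-to-model mapping simply mirrors the example commands in this section.

```python
import subprocess
import sys

# Model used for each language in the example commands above.
MODEL_BY_LANGUAGE = {
    "en": "muss_en_wikilarge_mined",
    "fr": "muss_fr_mined",
    "es": "muss_es_mined",
    "pt": "muss_pt_mined",
}


def build_simplify_command(language, file_path=None):
    """Return the argument list for scripts/simplify.py for one language."""
    if language not in MODEL_BY_LANGUAGE:
        raise ValueError(f"No pretrained model listed for language: {language}")
    if file_path is None:
        # Default to the example file shipped with the scripts.
        file_path = f"scripts/examples.{language}"
    return [
        sys.executable,
        "scripts/simplify.py",
        file_path,
        "--model-name",
        MODEL_BY_LANGUAGE[language],
    ]


def run_simplify(language, file_path=None):
    """Run the command from the repository root and fail loudly on errors."""
    subprocess.run(build_simplify_command(language, file_path), check=True)
```

Calling `run_simplify("pt")` from the repository root is then equivalent to the Portuguese command above.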

Mine the data

To add a new language to this project, download the files for the target language from https://huggingface.co/edugp/kenlm/tree/main/wikipedia into the folder resources/models/language_models/wikipedia. These language models are used to select high-quality sentences during the paraphrase mining phase.
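
Before mining, it can help to confirm that the language model files are in place. The sketch below is not part of MUSS. The helper name `missing_kenlm_files` is hypothetical, and the file naming (`<lang>.arpa.bin`, `<lang>.sp.model`, `<lang>.sp.vocab`) is an assumption about the layout of the Hugging Face folder, so verify it against the files actually listed there.

```python
from pathlib import Path

KENLM_DIR = Path("resources/models/language_models/wikipedia")

# ASSUMPTION: one KenLM binary plus SentencePiece model and vocabulary
# per language; adjust if the downloaded files are named differently.
EXPECTED_SUFFIXES = (".arpa.bin", ".sp.model", ".sp.vocab")


def missing_kenlm_files(language, directory=KENLM_DIR):
    """Return the expected language model files that are not on disk."""
    directory = Path(directory)
    expected = [directory / f"{language}{suffix}" for suffix in EXPECTED_SUFFIXES]
    return [path for path in expected if not path.is_file()]


for path in missing_kenlm_files("pt"):
    print("missing:", path)
```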

To run paraphrase mining, use the command below:

python scripts/mine_sequences.py

Train the model

python scripts/train_model.py NAME_OF_DATASET --language LANGUAGE

Evaluate simplifications

Please head over to EASSE for Sentence Simplification evaluation.

License

The MUSS license is CC-BY-NC. See the LICENSE file for more details.

Authors

Louis Martin (louismartincs@gmail.com)
Raphael Assis (contato.raphael.assis@gmail.com)

Citation

If you use MUSS in your research, please cite "MUSS: Multilingual Unsupervised Sentence Simplification by Mining Paraphrases":

@article{martin2021muss,
  title={MUSS: Multilingual Unsupervised Sentence Simplification by Mining Paraphrases},
  author={Martin, Louis and Fan, Angela and de la Clergerie, {\'E}ric and Bordes, Antoine and Sagot, Beno{\^\i}t},
  journal={arXiv preprint arXiv:2005.00352},
  year={2021}
}
