This repo contains the source code for the paper "Target-Level Sentence Simplification as Controlled Paraphrasing", presented at the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022).
The idea in the paper is to apply Future Discriminators for Generation (FUDGE) to steer paraphrastic generations towards simpler alternatives.
For this study, we reimplemented FUDGE as a standalone, custom LogitsProcessor that can be used with most of HuggingFace's generation utilities. This facilitates experimentation with a wider range of decoding strategies, e.g. beam search and top-k/top-p sampling.
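As a rough illustration, the core mechanism can be sketched as follows. Note this is a simplified sketch, not the repo's actual implementation: the class name, the discriminator interface, and the top-k rescoring details are all assumptions. In practice the processor would subclass `transformers.LogitsProcessor` and be passed to `model.generate(logits_processor=...)`; here it is a plain callable with the same `(input_ids, scores) -> scores` signature.

```python
import torch


class FudgeLogitsProcessor:
    """FUDGE-style logits processor sketch: re-weight next-token logits
    with a discriminator that scores partial sequences."""

    def __init__(self, condition_model, condition_lambda=1.0, topk=200):
        # condition_model: callable mapping candidate sequences of shape
        # (k, seq_len + 1) to per-candidate attribute log-probabilities
        self.condition_model = condition_model
        self.condition_lambda = condition_lambda
        self.topk = topk

    def __call__(self, input_ids, scores):
        # only rescore the top-k candidates to keep decoding tractable
        k = min(self.topk, scores.size(-1))
        topk_scores, topk_ids = scores.topk(k, dim=-1)
        for i in range(input_ids.size(0)):
            # append each top-k candidate token to the current prefix
            prefix = input_ids[i].unsqueeze(0).expand(k, -1)
            candidates = torch.cat([prefix, topk_ids[i].unsqueeze(-1)], dim=-1)
            # add lambda * log P(simple | prefix + candidate) to the logits
            topk_scores[i] = topk_scores[i] + \
                self.condition_lambda * self.condition_model(candidates)
        # mask out all tokens outside the rescored top-k set
        new_scores = torch.full_like(scores, float("-inf"))
        new_scores.scatter_(-1, topk_ids, topk_scores)
        return new_scores
```

With a sufficiently strong `condition_lambda`, the discriminator's preference can override the generator's ranking of the top-k tokens, which is exactly how FUDGE steers paraphrases towards simpler continuations.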
This repo is set up as follows:

- `analysis/`: Jupyter notebooks for analysing and plotting results.
- `data_prep/`: scripts for preparing various simplification datasets, including Newsela as used in this study.
- `easse_patch/`: a single script that can be used to replace one of the original scripts in EASSE.
- `legacy/`: original scripts and files from the original FUDGE repo.
- `model_outputs/`: JSONL files containing scored outputs generated by our models and baselines.
- implementation and experimentation scripts (in the repo root).
```bash
conda create -n fudge python=3.8.5 -y
conda activate fudge
pip install -r requirements.txt

mkdir installs
cd installs

# for evaluation
git clone git@github.com:feralvam/easse.git
cd easse
git checkout 462c4c9ecc8a92d3a0aa948c0b76ddb1e82e9ed3
pip install -e .
cd ..

# for inference with MUSS
git clone git@github.com:tannonk/muss.git
cd muss
pip install -e .
cd ../..

# for customised evaluation, replace the quality_estimation.py script in EASSE with our patch
cp easse_patch/quality_estimation.py installs/easse/easse/
```
Note: to help inspect quality estimation outputs, we provide a modified version of EASSE's `quality_estimation.py`, which adds an option to skip aggregating the computed QE metrics at the corpus level. To make use of this, replace the `quality_estimation.py` in the EASSE package with our version located in `easse_patch/`.
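The idea behind the patch can be illustrated with a toy metric. The function below is illustrative only: the name, signature, and the compression-ratio metric are assumptions, not EASSE's actual implementation.

```python
def quality_estimation(sources, hypotheses, aggregate=True):
    """Toy QE metric: token-level compression ratio per sentence pair,
    optionally averaged over the corpus."""
    ratios = [len(hyp.split()) / max(len(src.split()), 1)
              for src, hyp in zip(sources, hypotheses)]
    if aggregate:
        # default behaviour: a single corpus-level average
        return sum(ratios) / len(ratios)
    # patched behaviour: keep per-sentence scores for inspection
    return ratios
```

Keeping the per-sentence scores makes it possible to spot individual outputs that, e.g., barely compress the input, which a corpus-level average would hide.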
Note: many paths are hardcoded in the `data_prep` and `analysis` scripts. Before running, you should adjust these to suit your setup. We recommend making a new directory `resources/` (or a symlink to a storage directory) for data and models.
```bash
mkdir -p resources/data
```
The script `run_experiments.sh` contains all experiment commands, defined as functions, with example calls showing how to execute them.

The basic workflow consists of the following steps:
For our experiments, we train our FUDGE classifiers on the Newsela corpus (Xu et al., 2015). Access must be requested here. However, any comparable labelled data can be used, e.g. Simple English Wikipedia. For evaluation with aligned sentences, we use the manually aligned splits from Jiang et al. (2020).
```bash
# prepare training data
bash data_prep/prepare_newsela_data_for_fudge.sh

# prepare evaluation (sentence-aligned) data
bash data_prep/collect_newsela_wiki_manual_alignments.sh sents
```
FUDGE model training is defined in `main.py`. To train a FUDGE discriminator on paragraph subsequences, i.e. including multi-sentence subsequences, run:
```bash
# newsela discriminator on paragraph-level with level 4 simplifications
nohup bash run_experiments.sh train_simple_newsela_discriminator 2 4 article_paragraphs >| logs/train_l4.log &
```
To train a paraphrastic generator, we fine-tune BART on web-mined paraphrases from Martin et al. (2020). Big thanks to Louis Martin for helping us get this data!
```bash
# bart-large paraphraser trained on muss mined en
nohup bash run_experiments.sh finetune_bart_large_on_muss_mined >| logs/finetune.log &
```
Once we have the generator model and a FUDGE discriminator, we can generate simplifications using `predict_simplify.py`, e.g.:
```bash
python predict_simplify.py \
    --condition_model <PATH_TO_FUDGE_DISCRIMINATOR> \
    --generation_model <PATH_TO_GENERATION_MODEL> \
    --condition_lambda 5 \
    --num_beams 4 \
    --num_return_sequences 1 \
    --input_text "Memorial West's class is one of several programs offered through hospitals to help children stay healthy through exercise and proper eating"

# should output something like the following:
# ***
# Complex: Memorial West's class is one of several programs offered through hospitals to help children stay healthy through exercise and proper eating
# Simple: Memorial West's class is one of many programs available through hospitals. It is one of many programs accessible through hospitals to help children stay healthy.
# ***
```
To decode a test set, run:
```bash
python inference.py \
    --condition_model <PATH_TO_FUDGE_DISCRIMINATOR> \
    --generation_model <PATH_TO_GENERATION_MODEL> \
    --infile <PATH_TO_INFILE> \
    --batch_size 10 --condition_lambda 5
```
NOTE: we assume the test set is a `.txt` or `.tsv` file. If a `.tsv` file is passed, the first column is assumed to contain the complex sentences to be simplified, e.g.:

```
VIRGINIA CITY, Nev. — One wonders what Mark Twain....	VIRGINIA CITY, Nev. — Mark Twain is a famous American writer...
```
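The expected input handling can be sketched as follows. This is a hypothetical helper for illustration, not the repo's actual code:

```python
import csv


def read_source_sentences(path):
    """Read complex sentences from a .txt file (one sentence per line)
    or a .tsv file whose first column holds the complex sentence
    (reference simplifications may follow in later columns)."""
    with open(path, encoding="utf-8") as f:
        if path.endswith(".tsv"):
            return [row[0] for row in csv.reader(f, delimiter="\t") if row]
        return [line.strip() for line in f if line.strip()]
```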
For evaluating generated simplifications, we use EASSE. The relevant metrics are computed in `simplification_evaluation.py`. To run evaluations, run:
```bash
python simplification_evaluation.py \
    --src_file <PATH_TO_TEST_SET> \
    --hyp_file <PATH_TO_MODEL_OUTPUTS>
```
FUDGE has one main hyperparameter, lambda (default = 1). Selecting a suitable value for lambda may depend on the quality of the paraphraser, the discriminator, and the corpus.
The script `hp_search.py` can be used to perform a hyperparameter sweep. We also provide the notebook `analyse_hp_sweeps.ipynb` to inspect and plot the results.
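Schematically, a sweep boils down to decoding the test set once per lambda value and scoring each run. The helper below is a hypothetical illustration of that loop, not the actual `hp_search.py` interface:

```python
def sweep_condition_lambda(decode, score, lambdas=(0, 1, 2, 5, 10)):
    """Decode the test set once per lambda, score each run, and return
    the best-scoring setting together with all results.

    decode: callable taking a lambda value and returning model outputs
    score:  callable mapping those outputs to a scalar (e.g. SARI)
    """
    results = {lam: score(decode(lam)) for lam in lambdas}
    best = max(results, key=results.get)
    return best, results
```

Since each lambda requires a full decoding pass over the test set, it pays to sweep on a coarse grid first and refine around the best value.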
If you have MUSS installed and running (e.g. in a separate conda environment), you can adapt `simplify_with_muss.sh` to generate simplifications for a given input file, e.g.:

```bash
bash simplify_with_muss.sh /srv/scratch6/kew/ats/data/en/aligned/turk_test.tsv /srv/scratch6/kew/ats/muss/outputs/turk_test_HEAD.txt 5
```
To train the label-supervised method, run:

```bash
nohup bash run_experiments.sh finetune_bart_large_on_supervised_labeled_newsela_manual >| newsela_supervised_finetune.log &
```
```bibtex
@inproceedings{kew-ebling-2022-target,
    title = "Target-Level Sentence Simplification as Controlled Paraphrasing",
    author = "Kew, Tannon and
      Ebling, Sarah",
    booktitle = "Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates (Virtual)",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.tsar-1.4",
    pages = "28--42",
}
```
2023-04-13:

- Installed `evaluate==0.4.0`.
- Replaced `datasets==1.17.0` with `datasets==2.11.0`.
- Replaced `torch==1.7.0` with `torch==2.0.0`.
- Replaced `protobuf==4.22.1` with `protobuf==3.20.0`.