Source code for the paper Multitasking Framework for Unsupervised Simple Definition Generation published on ACL 2022.
- Pytorch
- fairseq
- blingfire
In order to install them, you can run this command:
pip install -r requirements-train.txt
- Pytorch
- Sentence-Transformers
- Jieba
- NLTK
- Pandas
- scipy
- xlrd
- EASSE
In order to install them, you can run this command:
pip install -r requirements-eval.txt
git clone https://github.com/feralvam/easse.git
cd easse
pip install .
-
All data including the Chinese and English DG dataset, and the simple text corpora mentioned in the paper have been placed in the folder "data".
-
Please download the pretrained model parameters of MASS from [en|zh], unzip it, and put the unzipped files into the folder "pretrained_model/MASS" and "pretrained_model/MASS-zh" respectively.
-
To preprocess the dataset, please run the following command:
bash run/data_process.sh #for English
# or
bash run/data_process_zh.sh # for Chinese
- To train a SimpDefiner that can simultaneously generated complex and simple definitions, you can run the following command:
bash run/train_oxford_oald_multi_task.sh # for English
# or
bash run/train_cwn_textbook_multi_task.sh # for Chinese
Model checkpoints will be saved in a checkpoint
dir.
- If you want to evaluate the trained model and generate definitions (both complex and simple) using this model, please run the following command:
bash run/evaluate_oxford_oald.sh --model_dir [model-dir] # for English
# or
bash run/evaluate_cwn_textbook.sh --model_dir [model-dir] # for Chinese
The generated definitions will be saved in the same checkpoint
dir.
- If you want to run automatic metrics for the generated definitions, please run the following command:
bash metrics/calc_metrics.sh [model-dir] [oxford|oald|cwn|textbook] [GPU_ID]
The [oxford|oald|cwn|textbook]
arguments are used to assign the specific definitions, where [oxford|oald]
are for English, and [cwn|textbook]
are for Chinese.
@inproceedings{kong-etal-2022-simpdefiner,
title = "Multitasking Framework for Unsupervised Simple Definition Generation",
author = "Kong, Cunliang and
Chen, Yun and
Zhang, Hengyuan and
Yang, Liner and
Yang, Erhong",
booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics",
year = "2022"
}
If you have questions, suggestions or bug reports, please email cunliang.kong@outlook.com