This is the refactored implementation of the ACL 2022 paper "Graph Pre-training for AMR Parsing and Generation". You may find our paper here (arXiv). The original implementation is available here.
News🎈
- (2022/12/10) fix max_length bugs in AMR parsing and update results.
- (2022/10/16) release the AMRBART-v2 model which is simpler, faster, and stronger.
- python 3.8
- pytorch 1.8
- transformers 4.21.3
- datasets 2.4.0
- Tesla V100 or A100
We recommend using conda to manage virtual environments:
conda env update --name <env> --file requirements.yml
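After the environment is created, it can help to confirm that the installed packages match the versions listed above. The following is a minimal sketch, not part of this repository: the helper names `find_mismatches` and `installed_versions` are hypothetical, and the PyPI package names (`torch`, `transformers`, `datasets`) are assumed to correspond to the requirements listed above.

```python
from importlib import metadata

# Versions taken from the requirements listed above; package names are assumed.
PINNED = {"torch": "1.8", "transformers": "4.21.3", "datasets": "2.4.0"}


def find_mismatches(installed, pinned=PINNED):
    """Return {package: (wanted, found)} for packages that are missing or
    whose installed version is not the pinned version or a patch of it."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = installed.get(name)
        matches = found is not None and (
            found == wanted or found.startswith(wanted + ".")
        )
        if not matches:
            mismatches[name] = (wanted, found)
    return mismatches


def installed_versions(names):
    """Look up installed versions; packages that are not installed map to None."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    problems = find_mismatches(installed_versions(PINNED))
    for name, (wanted, found) in problems.items():
        print(f"{name}: expected {wanted}, found {found}")
```

The prefix comparison treats a build such as `1.8.1+cu111` as compatible with the pinned `1.8`, which is a deliberate simplification for this sketch.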
You may download the AMR corpora at LDC.
Please follow this repository to preprocess AMR graphs:
bash run-process-acl2022.sh
Our model is available at huggingface. Here is how to initialize an AMR parsing model in PyTorch:
from transformers import BartForConditionalGeneration
from model_interface.tokenization_bart import AMRBartTokenizer # We use our own tokenizer to process AMRs
model = BartForConditionalGeneration.from_pretrained("xfbai/AMRBART-large-finetuned-AMR3.0-AMRParsing-v2")
tokenizer = AMRBartTokenizer.from_pretrained("xfbai/AMRBART-large-finetuned-AMR3.0-AMRParsing-v2")
bash run-posttrain-bart-textinf-joint-denoising-6task-large-unified-V100.sh "facebook/bart-large"
For AMR Parsing, run
bash train-AMRBART-large-AMRParsing.sh "xfbai/AMRBART-large-v2"
For AMR-to-text Generation, run
bash train-AMRBART-large-AMR2Text.sh "xfbai/AMRBART-large-v2"
cd evaluation
For AMR Parsing, run
bash eval_smatch.sh /path/to/gold-amr /path/to/predicted-amr
For better results, you can postprocess the predicted AMRs using the BLINK tool following SPRING.
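For intuition about what a Smatch score measures, the sketch below computes precision, recall, and F1 over the triples of two AMR graphs. It is a simplification and not the scorer used by `eval_smatch.sh`: real Smatch also searches for the best mapping between the variables of the two graphs, whereas this example assumes the variables are already aligned. The function name `triple_f1` and the toy triples are illustrative only.

```python
def triple_f1(gold, predicted):
    """Precision, recall, and F1 over (source, relation, target) triples.

    Assumes both graphs already use the same variable names.
    """
    gold, predicted = set(gold), set(predicted)
    matched = len(gold & predicted)
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1


# Hand-written toy triples for "The boy wants to go".
gold = [
    ("w", "instance", "want-01"),
    ("b", "instance", "boy"),
    ("g", "instance", "go-02"),
    ("w", "ARG0", "b"),
    ("w", "ARG1", "g"),
    ("g", "ARG0", "b"),
]
# The prediction misses the re-entrant ARG0 edge of the "go" node.
predicted = gold[:5]

precision, recall, f1 = triple_f1(gold, predicted)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
# prints: P=1.000 R=0.833 F1=0.909
```

Missing a single re-entrant edge leaves precision untouched but lowers recall, which is why dropped re-entrancies show up directly in the F1 score.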
For AMR-to-text Generation, run
bash eval_gen.sh /path/to/gold-text /path/to/predicted-text
To run our code on your own data, first convert your data into the format here, then run the inference script for your task:
For AMR Parsing, run
bash inference_amr.sh "xfbai/AMRBART-large-finetuned-AMR3.0-AMRParsing-v2"
For AMR-to-text Generation, run
bash inference_text.sh "xfbai/AMRBART-large-finetuned-AMR3.0-AMR2Text-v2"
Setting | Params | checkpoint |
---|---|---|
AMRBART-large | 409M | model |
Setting | BLEU(JAMR_tok) | Sacre-BLEU | checkpoint | output |
---|---|---|---|---|
AMRBART-large (AMR2.0) | 50.76 | 50.44 | model | output |
AMRBART-large (AMR3.0) | 50.29 | 50.38 | model | output |
To get the tokenized BLEU score, you need to use the scorer we provide here. We use this script to ensure comparability with previous approaches.
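The two BLEU columns differ because BLEU depends on how text is split into tokens before n-grams are counted. The sketch below illustrates this effect with clipped unigram precision under two toy tokenizers. It is an illustration under stated assumptions, not the provided scorer: `punct_split_tokens` is a hypothetical regex-based stand-in and reproduces neither the JAMR tokenizer nor the Sacre-BLEU one.

```python
import re
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(reference, hypothesis, n):
    """Clipped n-gram precision of one hypothesis against one reference."""
    hyp_counts = ngrams(hypothesis, n)
    ref_counts = ngrams(reference, n)
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return clipped / total


def whitespace_tokens(text):
    """Split on whitespace only, so "go." stays a single token."""
    return text.split()


def punct_split_tokens(text):
    """Hypothetical stand-in tokenizer that separates punctuation from words."""
    return re.findall(r"\w+|[^\w\s]", text)


reference = "The boy wants to go."
hypothesis = "The boy wants to go ."

for tokenize in (whitespace_tokens, punct_split_tokens):
    ref, hyp = tokenize(reference), tokenize(hypothesis)
    print(tokenize.__name__, round(modified_precision(ref, hyp, 1), 3))
# prints:
# whitespace_tokens 0.667
# punct_split_tokens 1.0
```

The same pair of sentences scores differently depending only on tokenization, so scores are comparable across papers only when the same scorer and tokenization are used.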
Setting | Smatch(amrlib) | Smatch(amr-evaluation) | Smatch++(smatchpp) | checkpoint | output |
---|---|---|---|---|---|
AMRBART-large (AMR2.0) | 85.5 | 85.3 | 85.4 | model | output |
AMRBART-large (AMR3.0) | 84.4 | 84.2 | 84.3 | model | output |
We thank the authors of SPRING, amrlib, and BLINK for sharing the open-source scripts used in this project.
@inproceedings{bai-etal-2022-graph,
title = "Graph Pre-training for {AMR} Parsing and Generation",
author = "Bai, Xuefeng and
Chen, Yulong and
Zhang, Yue",
booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = may,
year = "2022",
address = "Dublin, Ireland",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.acl-long.415",
pages = "6001--6015"
}