Please use tinyurl.com/f3nksb7p for the EACL demo link!!
- Getting Started
- Preprocessing
- Annotation Interface
- Annotations
- GPT Generation
- Coreference Results
- Citation
Accompanying code for the papers "Linear Cross-document Event Coreference Resolution with X-AMR" and "X-AMR Annotation Tool".
Paper1: LREC-COLING 2024 https://arxiv.org/abs/2404.08656
Paper2: EACL 2024 https://arxiv.org/abs/2403.15407
- Install the required packages:
  pip install -r requirements.txt
- Additionally, to install prodigy, acquire a license and follow the instructions at https://prodi.gy/docs/install
- Change directory to the project:
  cd project
- Download the spaCy English model:
  python -m spacy download en_core_web_lg
- Download the ECB+ Corpus, PropBank frames and PropBank Website:
  python -m spacy project assets
- Create mention_map from the ECB+ corpus:
  python -m spacy project run ecb-setup
  This will create the mention_map.pkl pickle file at corpus/ecb/mention_map.pkl
- Save the PropBank map (pb.dict) to access roleset definitions:
  python -m spacy project run save-propbank-dict
  This will create the pb.dict pickle file at outputs/common/pb.dict
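Both outputs are ordinary pickle files, so they can be inspected from Python before moving on. The following is a minimal sketch that assumes only the two paths given above. The internal layout of mention_map.pkl and pb.dict is not described here, so the helper reports the top-level type and size instead of assuming any keys. describe_pickle is a hypothetical helper, not part of this repository.

```python
import pickle
from pathlib import Path


def describe_pickle(path):
    """Load a pickle file and return (type name, size) of its top-level object.

    Hypothetical helper for illustration only. It makes no assumption about
    the keys or values stored in the file. Only unpickle files you trust.
    """
    with open(path, "rb") as f:
        obj = pickle.load(f)
    size = len(obj) if hasattr(obj, "__len__") else None
    return type(obj).__name__, size


if __name__ == "__main__":
    # Paths as stated in the steps above, relative to the project directory.
    for p in ("corpus/ecb/mention_map.pkl", "outputs/common/pb.dict"):
        if Path(p).exists():
            print(p, describe_pickle(p))
        else:
            print(p, "not found; run the corresponding step first")
```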
- Serve the PropBank website locally on port 8700 (this step can be skipped if not using the annotation interface):
  python -m spacy project run propbank-website
  You may have to start a new terminal session from the same directory to continue after running this. To check that it is working, open http://localhost:8700 in your browser.
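The browser check can also be scripted, which is handy when the server runs in another terminal. This is a small sketch using only the Python standard library; is_reachable is a hypothetical helper name, and the only detail taken from the steps above is the URL http://localhost:8700.

```python
import urllib.request


def is_reachable(url, timeout=3.0):
    """Return True if an HTTP GET to `url` succeeds, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except OSError:
        # URLError, HTTPError, refused connections and timeouts
        # are all subclasses of OSError.
        return False


if __name__ == "__main__":
    print("PropBank website reachable:", is_reachable("http://localhost:8700"))
```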
We will use the Prodigy Annotation tool and load the recipe for our interface.
- Create ECB+ Annotation Tasks:
  python -m spacy project run create-ecb-tasks
  This will create the train, dev, and test json files at corpus/ecb/tasks
- Example Task in dev.json:
{
  "mention_id": "12_10ecb.xml_5",
  "topic": "12",
  "doc_id": "12_10ecb.xml",
  "sentence_id": "0",
  "marked_sentence": "The Indian navy has <m> captured </m> 23 Somalian pirates .",
  "marked_doc": "The Indian navy has <m> captured </m> 23 Somalian ...",
  "lemma": "capture",
  "gold_cluster": "ACT17403639225065902",
  "text": "The Indian navy has captured 23 Somalian pirates .",
  "spans": [
    {
      "token_start": 4,
      "token_end": 4,
      "start": 20,
      "end": 28,
      "label": "EVT"
    }
  ],
  "meta": {
    "Doc": "12_10ecb.xml",
    "Sentence": "0"
  }
}
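The span fields in this example can be cross-checked against the text. Here start and end behave as character offsets into text (end exclusive), and token_start and token_end index the whitespace-separated tokens (end inclusive). The sketch below verifies this for the example task and rebuilds marked_sentence from text and the span. These conventions are read off this single example and are assumptions, not a documented contract; trigger_text, trigger_tokens and mark_trigger are hypothetical helpers.

```python
task = {
    "text": "The Indian navy has captured 23 Somalian pirates .",
    "marked_sentence": "The Indian navy has <m> captured </m> 23 Somalian pirates .",
    "spans": [
        {"token_start": 4, "token_end": 4, "start": 20, "end": 28, "label": "EVT"}
    ],
}


def trigger_text(task):
    """Event trigger according to the character offsets (end exclusive)."""
    span = task["spans"][0]
    return task["text"][span["start"]:span["end"]]


def trigger_tokens(task):
    """Event trigger according to the token indices (end inclusive)."""
    span = task["spans"][0]
    tokens = task["text"].split()
    return tokens[span["token_start"]:span["token_end"] + 1]


def mark_trigger(task):
    """Wrap the trigger in <m> ... </m>, mirroring marked_sentence."""
    span = task["spans"][0]
    text, s, e = task["text"], span["start"], span["end"]
    return text[:s] + "<m> " + text[s:e] + " </m>" + text[e:]


if __name__ == "__main__":
    print(trigger_text(task))
    print(trigger_tokens(task))
    print(mark_trigger(task) == task["marked_sentence"])
```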
- Example Task with the annotations:
{
  "mention_id": "12_10ecb.xml_5",
  "topic": "12",
  "doc_id": "12_10ecb.xml",
  "sentence_id": "0",
  "marked_sentence": "The Indian navy has <m> captured </m> 23 Somalian pirates .",
  "marked_doc": "The Indian navy has <m> captured </m> 23 Somalian ...",
  "lemma": "capture",
  "gold_cluster": "ACT17403639225065902",
  "text": "The Indian navy has captured 23 Somalian pirates .",
  "spans": [
    {
      "token_start": 4,
      "token_end": 4,
      "start": 20,
      "end": 28,
      "label": "EVT"
    }
  ],
  "meta": {
    "Doc": "12_10ecb.xml",
    "Sentence": "0"
  },
  "roleset_id": "capture.01",
  "arg0": "Indian_Navy",
  "arg1": "23_Somalian_Pirates",
  "argL": "Gulf_of_Aden",
  "argT": "2008"
}
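The annotated task adds a roleset_id and the argument fields arg0, arg1, argL and argT to the original task. As a rough illustration, the sketch below collapses those fields into a compact one-line summary. The field names come from the example; skipping empty or missing arguments is an assumption, and xamr_summary is a hypothetical helper rather than a function from this repository.

```python
ARG_KEYS = ("arg0", "arg1", "argL", "argT")


def xamr_summary(task):
    """Render the roleset and its filled arguments as a single string."""
    # Arguments that are missing or empty are skipped (assumption).
    args = [k + "=" + str(task[k]) for k in ARG_KEYS if task.get(k)]
    return task["roleset_id"] + "(" + ", ".join(args) + ")"


annotated = {
    "mention_id": "12_10ecb.xml_5",
    "roleset_id": "capture.01",
    "arg0": "Indian_Navy",
    "arg1": "23_Somalian_Pirates",
    "argL": "Gulf_of_Aden",
    "argT": "2008",
}

if __name__ == "__main__":
    print(xamr_summary(annotated))
```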
- Run the prodigy UI for annotating the roleset ids for event triggers in the train set:
  prodigy wsd-update ann1_train_rsid en_core_web_lg ./corpus/ecb/tasks/train.json ./outputs/common/pb.dict -UP -F ./recipes/wsd.py
  Once the annotation is done, you can save the annotated tasks in the annotations folder with:
  prodigy db-out ann1_train_rsid > annotations/ann1_train_rsid.jsonl
- Finally, run the prodigy UI for annotating PB-MR on the annotated rsids:
  prodigy srl-update ann1_train_xamr en_core_web_lg ./annotations/ann1_train_rsid.jsonl ./outputs/common/pb.dict -UP -F
  And then to save these annotations to a file:
  prodigy db-out ann1_train_xamr > annotations/ann1_train_xamr.jsonl
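The files written by prodigy db-out are JSONL: one JSON task per line, each carrying the annotator's decision in an answer field (accept, reject or ignore). The sketch below reads such a file and tallies the decisions. The helper names are hypothetical, and the path in the usage section is the one from the command above.

```python
import json
from collections import Counter
from pathlib import Path


def read_jsonl(path):
    """Read a JSONL file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def answer_counts(records):
    """Count Prodigy answers; records without one are counted as 'missing'."""
    return Counter(r.get("answer", "missing") for r in records)


def accepted(records):
    """Keep only the tasks the annotator accepted."""
    return [r for r in records if r.get("answer") == "accept"]


if __name__ == "__main__":
    path = Path("annotations/ann1_train_xamr.jsonl")
    if path.exists():
        records = read_jsonl(path)
        print(answer_counts(records))
        print(len(accepted(records)), "accepted tasks")
    else:
        print(path, "not found; export it with prodigy db-out first")
```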
Annotated files can be found at: /project/annotations/ecb/
The files are structured the following way:
ecb
|-- ann1
|   |-- dev_rs.json
|   |-- train_rs.json
|   |-- dev_small_xamr.json
|   |-- dev_xamr.json
|   |-- train_xamr.json
|   |-- test_common_xamr.json
|-- ann2
|   |-- dev_rs.json
|   |-- train_rs.json
|   |-- dev_small_xamr.json
|   |-- dev_xamr.json
|   |-- train_xamr.json
|   |-- test_common_xamr.json
|-- gpt-4
|   |-- dev_small_g1.json
|   |-- dev_small_g2.json
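Because ann1 and ann2 contain the same file names, the two annotators can be compared directly. The sketch below computes simple roleset agreement over the mentions both annotators labelled. It assumes that each *_rs.json file is a JSON list of task objects with mention_id and roleset_id fields, as in the example task; that layout is an assumption and should be checked against the actual files. The helper names are hypothetical.

```python
import json
from pathlib import Path


def load_rolesets(path):
    """Map mention_id -> roleset_id, assuming a JSON list of task dicts."""
    with open(path, encoding="utf-8") as f:
        tasks = json.load(f)
    return {t["mention_id"]: t.get("roleset_id") for t in tasks}


def roleset_agreement(a, b):
    """Return (fraction of identical rolesets, number of shared mentions)."""
    shared = sorted(set(a) & set(b))
    if not shared:
        return 0.0, 0
    same = sum(a[m] == b[m] for m in shared)
    return same / len(shared), len(shared)


if __name__ == "__main__":
    p1 = Path("annotations/ecb/ann1/dev_rs.json")
    p2 = Path("annotations/ecb/ann2/dev_rs.json")
    if p1.exists() and p2.exists():
        print(roleset_agreement(load_rolesets(p1), load_rolesets(p2)))
    else:
        print("annotation files not found; run from the project directory")
```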
Running the G1 method:
python -m spacy project run-g1
Running the G2 method:
python -m spacy project run-g2
To run the coreference algorithm on Ann1's dev set annotations:
python -m scripts.coreference single-ann-results ./annotations/ecb/ann1/dev_xamr.json
and, with the --use-vn option:
python -m scripts.coreference single-ann-results ./annotations/ecb/ann1/dev_xamr.json --use-vn
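To give a feel for how coreference can be derived from X-AMR annotations in a single pass, the sketch below groups mentions whose roleset and arguments are identical. This is a deliberately simplified illustration and not the algorithm implemented in scripts.coreference: the exact-match rule, the choice of key fields, and the second and third mention ids are all assumptions made for the example.

```python
from collections import defaultdict

# Fields used as the event identity key (assumption based on the example task).
KEY_FIELDS = ("roleset_id", "arg0", "arg1", "argL", "argT")


def cluster_by_exact_match(tasks):
    """Group mention ids whose roleset and arguments match exactly.

    Each mention is hashed once, so the pass is linear in the number of
    mentions. Real annotations would need more than exact string matching.
    """
    clusters = defaultdict(list)
    for t in tasks:
        key = tuple(t.get(k) for k in KEY_FIELDS)
        clusters[key].append(t["mention_id"])
    return list(clusters.values())


if __name__ == "__main__":
    tasks = [
        {"mention_id": "12_10ecb.xml_5", "roleset_id": "capture.01",
         "arg0": "Indian_Navy", "arg1": "23_Somalian_Pirates"},
        # The two mentions below are made up for illustration.
        {"mention_id": "example_doc_b_1", "roleset_id": "capture.01",
         "arg0": "Indian_Navy", "arg1": "23_Somalian_Pirates"},
        {"mention_id": "example_doc_c_2", "roleset_id": "say.01",
         "arg0": "Indian_Navy"},
    ]
    print(cluster_by_exact_match(tasks))
```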
If you find this repository useful, please cite the following two papers in your work:
@inproceedings{ahmed-etal-2024-x,
title = "{X}-{AMR} Annotation Tool",
author = "Ahmed, Shafiuddin Rehan and
Cai, Jon and
Palmer, Martha and
Martin, James H.",
editor = "Aletras, Nikolaos and
De Clercq, Orphee",
booktitle = "Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations",
month = mar,
year = "2024",
address = "St. Julians, Malta",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.eacl-demo.19",
pages = "177--186",
abstract = "This paper presents a novel Cross-document Abstract Meaning Representation (X-AMR) annotation tool designed for annotating key corpus-level event semantics. Leveraging machine assistance through the Prodigy Annotation Tool, we enhance the user experience, ensuring ease and efficiency in the annotation process. Through empirical analyses, we demonstrate the effectiveness of our tool in augmenting an existing event corpus, highlighting its advantages when integrated with GPT-4. Code and annotations: https://anonymous.4open.science/r/xamr-9ED0. Demo: https://youtu.be/TuirftxciNE. Live Link: https://tinyurl.com/mrxmafwh",
}
@misc{ahmed2024linear,
title={Linear Cross-document Event Coreference Resolution with X-AMR},
author={Shafiuddin Rehan Ahmed and George Arthur Baker and Evi Judge and Michael Regan and Kristin Wright-Bettner and Martha Palmer and James H. Martin},
month=mar,
year={2024},
eprint={2404.08656},
archivePrefix={arXiv},
primaryClass={cs.CL},
url = "https://arxiv.org/pdf/2404.08656.pdf"
}