This repository contains the source code, data, and documentation for the research paper:
```bibtex
@inproceedings{kachwala-etal-2024-rematch,
    title = "{REMATCH}: Robust and Efficient Matching of Local Knowledge Graphs to Improve Structural and Semantic Similarity",
    author = "Kachwala, Zoher and
      An, Jisun and
      Kwak, Haewoon and
      Menczer, Filippo",
    editor = "Duh, Kevin and
      Gomez, Helena and
      Bethard, Steven",
    booktitle = "Findings of the Association for Computational Linguistics: NAACL 2024",
    month = jun,
    year = "2024",
    address = "Mexico City, Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-naacl.64",
    doi = "10.18653/v1/2024.findings-naacl.64",
    pages = "1018--1028",
abstract = "Knowledge graphs play a pivotal role in various applications, such as question-answering and fact-checking. Abstract Meaning Representation (AMR) represents text as knowledge graphs. Evaluating the quality of these graphs involves matching them structurally to each other and semantically to the source text. Existing AMR metrics are inefficient and struggle to capture semantic similarity. We also lack a systematic evaluation benchmark for assessing structural similarity between AMR graphs. To overcome these limitations, we introduce a novel AMR similarity metric, rematch, alongside a new evaluation for structural similarity called RARE. Among state-of-the-art metrics, rematch ranks second in structural similarity; and first in semantic similarity by 1{--}5 percentage points on the STS-B and SICK-R benchmarks. Rematch is also five times faster than the next most efficient metric.",
}
```
*Figure: An example of rematch similarity calculation for a pair of AMRs. After AMRs are parsed from sentences, rematch uses a two-step process to calculate similarity. First, a set of motifs is generated for each AMR. Second, the two sets are used to calculate the Jaccard similarity (intersecting motifs shown in color).*
Knowledge graphs play a pivotal role in various applications, such as question-answering and fact-checking. Abstract Meaning Representation (AMR) represents text as knowledge graphs. Evaluating the quality of these graphs involves matching them structurally to each other and semantically to the source text. Existing AMR metrics are inefficient and struggle to capture semantic similarity. We also lack a systematic evaluation benchmark for assessing structural similarity between AMR graphs. To overcome these limitations, we introduce a novel AMR similarity metric, rematch, alongside a new evaluation for structural similarity called RARE. Among state-of-the-art metrics, rematch ranks second in structural similarity; and first in semantic similarity by 1--5 percentage points on the STS-B and SICK-R benchmarks. Rematch is also five times faster than the next most efficient metric.
Knowledge Graphs, Graph Matching, Abstract Meaning Representation (AMR), Semantic Graphs, Graph Isomorphism, Semantic Similarity, Structural Similarity.
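The figure caption above summarizes the algorithm: decompose each AMR into a set of motifs, then score the pair by the Jaccard index of those sets. Below is a minimal Python sketch of that second step; the string motifs are toy stand-ins (rematch's actual motifs are derived from AMR structure, as described in the paper):

```python
def jaccard(motifs_a: set, motifs_b: set) -> float:
    """Jaccard index of two motif sets: |A ∩ B| / |A ∪ B|."""
    if not motifs_a and not motifs_b:
        return 1.0  # convention: two empty motif sets count as identical
    return len(motifs_a & motifs_b) / len(motifs_a | motifs_b)

# Toy motif sets for two AMRs; real rematch motifs encode AMR structure.
motifs_1 = {"(want-01)", "(want-01 :ARG0 boy)", "(boy)"}
motifs_2 = {"(want-01)", "(want-01 :ARG0 girl)", "(girl)"}
print(jaccard(motifs_1, motifs_2))  # 0.2 -> one shared motif out of five
```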
- Clone the repository:
  ```
  git clone https://github.com/Zoher15/Rematch-RARE.git
  ```
- Create and activate the conda environment:
  ```
  conda env create -f rematch_rare.yml
  conda activate rematch_rare
  ```
- License and download AMR Annotation 3.0 (LDC2020T02)
- Preprocess the data:
  ```
  bash methods/preprocess_data/preprocess_amr3.sh <dir>
  ```
  where `<dir>` is the directory containing your `amr_annotation_3.0_LDC2020T02.tgz` file.
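For example, if the LDC tarball was downloaded to `~/Downloads` (a hypothetical location; substitute your own), the call would be:
```
bash methods/preprocess_data/preprocess_amr3.sh ~/Downloads
```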
Steps to reproduce these results:
- Generate Randomized AMRs with Rewired Edges (RARE):
  ```
  python experiments/structural_consistency/randomize_amr_rewire.py
  ```
- Evaluate any metric on the RARE test set:
  ```
  bash experiments/structural_consistency/structural_consistency.sh <metric>
  ```
  where `<metric>` should be one of `rematch`, `smatch`, `s2match`, `sembleu`, `wlk`, or `wwlk`. Depending on the metric, this can take a while to run.
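For example, to evaluate rematch on the RARE test set:
```
bash experiments/structural_consistency/structural_consistency.sh rematch
```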
Steps to reproduce these results:
- Parse AMRs from STS-B and SICK-R:

  a. Follow the instructions to install the transition_amr_parser. We highly recommend creating an independent conda environment called `transition_amr_parser`. Parse with `AMR3-structbart-L-smpl` and `AMR3-joint-ontowiki-seed42` by activating the environment and executing the script (requires CUDA):
  ```
  conda env create -f transition_amr_parser.yml
  conda activate transition_amr_parser
  bash experiments/semantic_consistency/parse_amrs.sh
  ```
  b. (optional) Parse with `Spring` by cloning the repo and following the instructions to install it. We highly recommend creating an independent conda environment called `spring`. Also download and unzip the AMR3 pretrained checkpoint, and ensure that the resulting unzipped file (`AMR3.parsing.pt`) is in the cloned repo directory `spring/`. Then run the following, where `<spring_dir>` is the location of your Spring repo (requires CUDA):
  ```
  conda env create -f spring.yml
  conda activate spring
  bash experiments/semantic_consistency/parse_spring.sh <spring_dir>
  ```
  c. (optional) Parse with `Amrbart` by cloning the repo and following the instructions to install it. We highly recommend creating an independent conda environment called `amrbart`. Then run the following, where `<amrbart_dir>` is the location of your Amrbart repo (requires CUDA):
  ```
  conda env create -f amrbart.yml
  conda activate amrbart
  bash experiments/semantic_consistency/parse_amrbart.sh <amrbart_dir>
  ```
- Evaluate a metric on the test set:
  ```
  conda activate rematch_rare
  bash experiments/semantic_consistency/semantic_consistency.sh <metric> <parser>
  ```
  where `<metric>` should be one of `rematch`, `smatch`, `s2match`, `sembleu`, `wlk`, or `wwlk`, and `<parser>` should be one of `AMR3-structbart-L-smpl`, `AMR3-joint-ontowiki-seed42`, `spring_unwiki`, or `amrbart_unwiki`. Ensure the chosen `<parser>` has been run in the previous step.
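For example, to evaluate rematch on the AMRs produced by `AMR3-structbart-L-smpl` in step (a) above:
```
conda activate rematch_rare
bash experiments/semantic_consistency/semantic_consistency.sh rematch AMR3-structbart-L-smpl
```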
Please follow the instructions in the Bamboo repo. Note that by default Bamboo uses Pearson's r, but for our analysis we chose Spearman's r. That change can be made easily in the evaluation script with find and replace: the word `pearsonr` needs to be replaced with `spearmanr`.
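A one-line way to make that substitution (the script path is a placeholder; point it at Bamboo's actual evaluation script):
```
# the path below is a placeholder, not Bamboo's real filename
sed -i 's/pearsonr/spearmanr/g' path/to/bamboo_eval_script.py
```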
| AMR Metric | Time (s) | RAM (GB) |
|---|---|---|
| smatch | 927 | 0.2 |
| s2match | 7718 | 2 |
| sembleu | 275 | 0.2 |
| WLK | 315 | 30 |
| rematch | 51 | 0.2 |
Steps to reproduce this experiment:
- Generate the timing testbed:
  ```
  conda activate rematch_rare
  python experiments/efficiency/generate_matchups.py
  ```
- Evaluate a specific `<metric>`, one of `rematch`, `smatch`, `s2match`, `sembleu`, or `wlk`:
  ```
  bash experiments/efficiency/efficiency.sh <metric>
  ```
- Once all metrics have been executed, the plots from the paper (saved in `data/processed/AMR3.0`) can be reproduced with:
  ```
  python experiments/efficiency/plot_complexity.py
  ```
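For instance, to time a single metric end to end (here `rematch`, chosen from the list above):
```
conda activate rematch_rare
python experiments/efficiency/generate_matchups.py
bash experiments/efficiency/efficiency.sh rematch
```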