Implementation of a post-hoc cleaning module for CLOCQ that helps apply CLOCQ to entity or relation linking tasks.

Pruning Module for adapting CLOCQ to Entity Linking

Description

This repository contains the code for our submission to the SMART 2022 Task and builds upon the CLOCQ repository. It provides a post-hoc pruning module that operates on the CLOCQ outputs and prunes noisy linkings to improve precision on entity linking tasks.
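To illustrate the idea of post-hoc pruning, here is a minimal sketch. It is not the module's actual implementation: the data layout (`mention`, `id`, `score`), the threshold, the placeholder ID `Q_noise`, and the helper names `prune_linkings` and `precision` are all assumptions made for illustration. The real module relies on a trained model to decide which linkings to keep.

```python
from typing import Dict, List, Set

# Hypothetical layout: each candidate linking has a mention, an item ID,
# and a relevance score in [0, 1]. The real CLOCQ output format and the
# trained model's scoring may differ.
Linking = Dict[str, object]


def prune_linkings(linkings: List[Linking], threshold: float = 0.5) -> List[Linking]:
    """Keep only candidates whose score reaches the threshold."""
    return [link for link in linkings if float(link["score"]) >= threshold]


def precision(predicted: List[Linking], gold_ids: Set[str]) -> float:
    """Fraction of predicted IDs found in the gold set (0.0 if nothing is predicted)."""
    if not predicted:
        return 0.0
    hits = sum(1 for link in predicted if link["id"] in gold_ids)
    return hits / len(predicted)


# Made-up candidates; "Q_noise" is a placeholder for a wrong linking.
candidates = [
    {"mention": "Paris", "id": "Q90", "score": 0.93},
    {"mention": "Paris", "id": "Q_noise", "score": 0.12},
    {"mention": "France", "id": "Q142", "score": 0.88},
]
gold = {"Q90", "Q142"}

pruned = prune_linkings(candidates)
print("precision before pruning:", precision(candidates, gold))
print("precision after pruning:", precision(pruned, gold))
```

In this toy example, dropping the low-scoring candidate removes the only wrong linking, so precision rises while the correct linkings are retained.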

In case of any questions, please let us know.

Code Usage

Please first clone and install the code from the CLOCQ repository, which is needed to access the public CLOCQ API. Downloading any data is not required.

Clone the repo, create a conda environment, and install the dependencies via:

    git clone https://github.com/PhilippChr/CLOCQ-pruning-module.git
    cd CLOCQ-pruning-module/
    conda create --name clocq python=3.8
    conda activate clocq

    # install dependencies
    git clone https://github.com/PhilippChr/CLOCQ.git
    cd CLOCQ/
    pip install -e .
    cd ..
    pip install transformers

PyTorch is required as well. Run the one command below that matches your setup:

    # install PyTorch without CUDA
    conda install pytorch torchvision torchaudio -c pytorch

    # install PyTorch for CUDA 10.2 (using GPU)
    conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch

    # install PyTorch for CUDA 11.3 (using GPU)
    conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
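After installation, a quick check like the following can confirm that the environment is complete. This is a sketch, not part of the repository: the helper `missing_packages` is hypothetical, and the import name `clocq` for the CLOCQ package is an assumption.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "clocq" is the assumed import name of the CLOCQ package installed above.
required = ["torch", "transformers", "clocq"]
missing = missing_packages(required)

if missing:
    print("Missing packages:", ", ".join(missing))
else:
    import torch

    # Report the installed version and whether a CUDA device is usable.
    print("PyTorch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
```

If CUDA is reported as unavailable on a GPU machine, the installed PyTorch build probably does not match the local CUDA version.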

SMART results

First, download the SMART data and put it in ./data. You can use the following script, which also downloads the trained model and creates the required folders.

    bash initialize.sh

Next, split the original train set into a train split and a dev split:

    python prepare_smart_data.py
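Conceptually, such a split shuffles the instances and holds out a portion as the dev set. The sketch below shows this under stated assumptions: the function name `split_train_dev`, the dev fraction, the seed, and the instance layout are hypothetical, and the actual logic in prepare_smart_data.py may differ.

```python
import random


def split_train_dev(instances, dev_fraction=0.1, seed=42):
    """Shuffle a copy of the instances and hold out a fraction as the dev set."""
    shuffled = list(instances)
    # A fixed seed keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_dev = int(len(shuffled) * dev_fraction)
    return shuffled[n_dev:], shuffled[:n_dev]


# Toy instances standing in for the SMART training data.
data = [{"id": i, "question": f"question {i}"} for i in range(10)]
train, dev = split_train_dev(data, dev_fraction=0.2)
print(len(train), "train instances,", len(dev), "dev instances")
```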

Then, run the original CLOCQ code on the data to identify potential linkings. The parameters can be adjusted in the config.

    python run_clocq.py <PATH_TO_INPUT> <PATH_TO_OUTPUT> [<PATH_TO_CONFIG>]

For example:

    python run_clocq.py data/SMART2022-EL-wikidata-train-split.json results/SMART2022-EL-wikidata-train-split-clocq.json
    python run_clocq.py data/SMART2022-EL-wikidata-dev-split.json results/SMART2022-EL-wikidata-dev-split-clocq.json
    python run_clocq.py data/SMART2022-EL-wikidata-test.json results/SMART2022-EL-wikidata-test-split-clocq.json

Then, train the pruning module:

    python pruning_module.py --train <PATH_TO_TRAIN> <PATH_TO_DEV> [<PATH_TO_CONFIG>]

For example:

    python pruning_module.py --train results/SMART2022-EL-wikidata-train-split-clocq.json results/SMART2022-EL-wikidata-dev-split-clocq.json

Finally, run the pruning module via:

    python pruning_module.py --inference <PATH_TO_INPUT> <PATH_TO_OUTPUT> [<PATH_TO_CONFIG>]

For example:

    python pruning_module.py --inference results/SMART2022-EL-wikidata-test-split-clocq.json results/SMART2022-EL-wikidata-test-final-results.json

You can find the results in the specified output file (results/SMART2022-EL-wikidata-test-final-results.json in the example).
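The CLOCQ, training, and inference steps above can also be chained from a small driver script. The following sketch only assembles and prints the example commands from this section. The helper `pipeline_commands` and its parameters are hypothetical and not part of the repository.

```python
import shlex


def pipeline_commands(data_dir="data", results_dir="results", prefix="SMART2022-EL-wikidata"):
    """Build the CLOCQ, training, and inference commands as argument lists."""
    inputs = {
        "train": f"{data_dir}/{prefix}-train-split.json",
        "dev": f"{data_dir}/{prefix}-dev-split.json",
        "test": f"{data_dir}/{prefix}-test.json",
    }
    # CLOCQ outputs follow the naming used in the examples above.
    clocq_out = {
        split: f"{results_dir}/{prefix}-{split}-split-clocq.json" for split in inputs
    }
    final_out = f"{results_dir}/{prefix}-test-final-results.json"

    commands = [
        ["python", "run_clocq.py", inputs[split], clocq_out[split]]
        for split in ("train", "dev", "test")
    ]
    commands.append(
        ["python", "pruning_module.py", "--train", clocq_out["train"], clocq_out["dev"]]
    )
    commands.append(
        ["python", "pruning_module.py", "--inference", clocq_out["test"], final_out]
    )
    return commands


for cmd in pipeline_commands():
    # Dry run: print each command. To execute it instead, one could call
    # subprocess.run(cmd, check=True) from the repository root.
    print(shlex.join(cmd))
```

Printing first makes it easy to verify the paths before starting the longer-running steps.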
