Pruning Module for adapting CLOCQ to Entity Linking

Description

This repository contains the code for our submission to the SMART 2022 Task and builds upon the CLOCQ repository. It provides a post-hoc pruning module that operates on the CLOCQ outputs and prunes noisy linkings to improve precision on entity linking tasks.
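
To illustrate the general idea of post-hoc pruning, the sketch below keeps only the candidate linkings whose score reaches a threshold. It is not the trained model shipped in this repository: the `prune_linkings` helper, the `score` field, the score values, and the threshold are all hypothetical and serve only as an example.

```python
from typing import Dict, List


def prune_linkings(linkings: List[Dict], threshold: float = 0.5) -> List[Dict]:
    """Keep only candidate linkings whose (hypothetical) score reaches the threshold."""
    return [linking for linking in linkings if linking["score"] >= threshold]


# Hypothetical candidates for the mention "Paris"; the scores are made up.
candidates = [
    {"mention": "Paris", "entity": "Q90", "score": 0.93},  # Q90 = Paris (city)
    {"mention": "Paris", "entity": "Q2", "score": 0.12},   # Q2 = Earth, a noisy linking
]

pruned = prune_linkings(candidates)
print([linking["entity"] for linking in pruned])
```

Dropping low-scoring candidates trades recall for precision, which is the effect the pruning module aims for.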

In case of any questions, please let us know.

Code Usage

Please first clone and install the CLOCQ code from the CLOCQ repository to access the public CLOCQ API. Downloading any data is not required.

Clone the repo via:

    git clone https://github.com/PhilippChr/CLOCQ-pruning-module.git
    cd CLOCQ-pruning-module/
    conda create --name clocq python=3.8
    conda activate clocq

    # install dependencies
    git clone https://github.com/PhilippChr/CLOCQ.git
    cd CLOCQ/
    pip install -e .
    cd ..
    pip install transformers

PyTorch is required as well:

    # install PyTorch without CUDA
    conda install pytorch torchvision torchaudio -c pytorch

    # install PyTorch for CUDA 10.2 (using GPU)
    conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch

    # install PyTorch for CUDA 11.3 (using GPU)
    conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
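
After installing, it can help to check which device PyTorch will use. The following is a minimal sketch; the helper name `describe_torch_setup` is an assumption and not part of this repository.

```python
def describe_torch_setup() -> str:
    """Report whether PyTorch is importable and whether it can see a CUDA device."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"
    device = "cuda" if torch.cuda.is_available() else "cpu"
    return f"PyTorch {torch.__version__} will use: {device}"


print(describe_torch_setup())
```

If the output reports `cpu` although a GPU is present, the installed `cudatoolkit` version probably does not match the local driver.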

SMART results

First, download the SMART data and put it in ./data. You can use the following script, which also downloads the trained model and initializes the required folders.

    bash initialize.sh

Next, split the original train set into a train split and a dev split:

    python prepare_smart_data.py
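
For readers who want to see what such a split looks like, here is a minimal sketch of a seeded random train/dev split. It does not reproduce the logic of prepare_smart_data.py: the helper name `split_train_dev`, the 10% dev fraction, and the seed are assumptions, and the toy instances stand in for the real data.

```python
import random
from typing import List, Tuple


def split_train_dev(
    instances: List[dict], dev_fraction: float = 0.1, seed: int = 42
) -> Tuple[List[dict], List[dict]]:
    """Shuffle a copy of the instances with a fixed seed and split off a dev set."""
    shuffled = list(instances)  # copy, so the caller's list stays untouched
    random.Random(seed).shuffle(shuffled)
    dev_size = round(len(shuffled) * dev_fraction)
    return shuffled[dev_size:], shuffled[:dev_size]


# Toy instances; the real SMART instances have a different structure.
instances = [{"id": i} for i in range(100)]
train, dev = split_train_dev(instances)
print(len(train), len(dev))  # 90 10
```

Fixing the seed keeps the split reproducible across runs, so train and dev results stay comparable.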

Then, run the original CLOCQ code on the data to identify potential linkings. The parameters can be adjusted in the config.

    python run_clocq.py <PATH_TO_INPUT> <PATH_TO_OUTPUT> [<PATH_TO_CONFIG>]

For example:

    python run_clocq.py data/SMART2022-EL-wikidata-train-split.json results/SMART2022-EL-wikidata-train-split-clocq.json
    python run_clocq.py data/SMART2022-EL-wikidata-dev-split.json results/SMART2022-EL-wikidata-dev-split-clocq.json
    python run_clocq.py data/SMART2022-EL-wikidata-test.json results/SMART2022-EL-wikidata-test-split-clocq.json

Then, train the pruning module:

    python pruning_module.py --train <PATH_TO_TRAIN> <PATH_TO_DEV> [<PATH_TO_CONFIG>]

For example:

    python pruning_module.py --train results/SMART2022-EL-wikidata-train-split-clocq.json results/SMART2022-EL-wikidata-dev-split-clocq.json

Finally, run the pruning module via:

    python pruning_module.py --inference <PATH_TO_INPUT> <PATH_TO_OUTPUT> [<PATH_TO_CONFIG>]

For example:

    python pruning_module.py --inference results/SMART2022-EL-wikidata-test-split-clocq.json results/SMART2022-EL-wikidata-test-final-results.json
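
The example commands above can also be chained from a small driver script. The sketch below only assembles the command lines shown in this README and prints them by default; the helper names `pipeline_commands` and `run_pipeline` and the `dry_run` flag are assumptions, not part of the repository.

```python
import subprocess
from typing import List

PREFIX = "SMART2022-EL-wikidata"


def pipeline_commands() -> List[List[str]]:
    """Return the example commands from this README, in execution order."""
    # (input name, output name) pairs, exactly as used in the examples above.
    splits = [("train-split", "train-split"), ("dev-split", "dev-split"), ("test", "test-split")]
    commands = [
        ["python", "run_clocq.py", f"data/{PREFIX}-{src}.json", f"results/{PREFIX}-{dst}-clocq.json"]
        for src, dst in splits
    ]
    commands.append([
        "python", "pruning_module.py", "--train",
        f"results/{PREFIX}-train-split-clocq.json",
        f"results/{PREFIX}-dev-split-clocq.json",
    ])
    commands.append([
        "python", "pruning_module.py", "--inference",
        f"results/{PREFIX}-test-split-clocq.json",
        f"results/{PREFIX}-test-final-results.json",
    ])
    return commands


def run_pipeline(dry_run: bool = True) -> None:
    """Print each command; execute it as well when dry_run is False."""
    for command in pipeline_commands():
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)  # stop at the first failing step


run_pipeline(dry_run=True)
```

Calling `run_pipeline(dry_run=False)` from the repository root would execute the steps one after another and abort on the first error.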

You can find the results in the specified output file (results/SMART2022-EL-wikidata-test-final-results.json in the example).
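
To take a quick look at the output, a short script can load the file and report its size. This sketch assumes only that the file is valid JSON with a list or dictionary at the top level; the helper name `summarize_results` is hypothetical, and the structure of the individual entries is not shown here.

```python
import json
from pathlib import Path


def summarize_results(path: str) -> str:
    """Load a JSON file and describe its top-level container (assumed list or dict)."""
    with Path(path).open(encoding="utf-8") as fp:
        data = json.load(fp)
    return f"{path}: top-level {type(data).__name__} with {len(data)} entries"


results_path = "results/SMART2022-EL-wikidata-test-final-results.json"
if Path(results_path).exists():  # only report when the pipeline has been run
    print(summarize_results(results_path))
```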
