A Unified MRC Framework for Named Entity Recognition

This repository contains the code for the following ACL 2020 paper from Shannon.AI.

A Unified MRC Framework for Named Entity Recognition
Xiaoya Li, Jingrong Feng, Yuxian Meng, Qinghong Han, Fei Wu and Jiwei Li
In ACL 2020 (paper: arXiv:1910.11476).
If you find this repo helpful, please cite the following:

@article{li2019unified,
  title={A Unified MRC Framework for Named Entity Recognition},
  author={Li, Xiaoya and Feng, Jingrong and Meng, Yuxian and Han, Qinghong and Wu, Fei and Li, Jiwei},
  journal={arXiv preprint arXiv:1910.11476},
  year={2019}
}

For any questions, please feel free to open a GitHub issue.

Install Requirements

  • The code requires Python 3.6+.

  • If you are working on a GPU machine with CUDA 10.1, please run pip install torch==1.7.1+cu101 torchvision==0.8.2+cu101 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html to install PyTorch. If not, please see the PyTorch Official Website for instructions.

  • Then run the following command to install the remaining dependencies: pip install -r requirements.txt. A quick way to verify the installation is sketched below.
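
If you want to confirm that the install picked up the expected build, here is an optional sanity check. It is plain Python and nothing repo-specific; the version string in the comment is simply what you would expect after running the CUDA 10.1 command above.

# Optional sanity check after installation: confirm the torch build and GPU visibility.
import torch

print(torch.__version__)           # e.g. "1.7.1+cu101" if you used the CUDA 10.1 command above
print(torch.cuda.is_available())   # True if the GPU build can see your device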

We build our project on pytorch-lightning. If you want to know more about the arguments used in our training scripts, please refer to the pytorch-lightning documentation.
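
Most of the command-line flags you will see in our shell scripts are standard pytorch-lightning Trainer arguments. As a rough illustration only (this is not train/mrc_ner_trainer.py itself), pytorch-lightning 1.x lets a training script expose those flags through argparse:

# Illustrative sketch, not the repo's trainer code: pytorch-lightning (1.x)
# attaches all Trainer flags (--max_epochs, --gpus, --val_check_interval, ...)
# to an argparse parser.
import argparse
import pytorch_lightning as pl

parser = argparse.ArgumentParser()
# model- and data-specific arguments would be added here by the trainer script
parser = pl.Trainer.add_argparse_args(parser)
args = parser.parse_args()

trainer = pl.Trainer.from_argparse_args(args)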

Baseline: BERT-Tagger

We release the code, scripts and data files for fine-tuning BERT and treating NER as a sequence labeling task.

MRC-NER: Prepare Datasets

You can download the preprocessed MRC-NER datasets used in our paper.
For flat NER datasets, please use ner2mrc/msra2mrc.py to transform your BMES NER annotations into MRC format.
For nested NER datasets, please use ner2mrc/genia2mrc.py to transform your start-end NER annotations into MRC format.
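
To give a rough picture of what "MRC format" means here: each entity type becomes a natural-language query, the sentence becomes the context, and gold entities of that type become answer spans. The sketch below is illustrative only; the exact field names and query wording are defined by the ner2mrc conversion scripts.

# Illustrative sketch of a single MRC-format example; field names and query
# wording are assumptions for illustration, not the canonical schema.
example = {
    "context": "Barack Obama was born in Hawaii .",
    "query": "person entities are the names of people.",  # query describing the PER type
    "start_position": [0],    # token indices where PER spans start
    "end_position": [1],      # token indices where those spans end
    "impossible": False,      # True when the context has no entity of this type
}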

MRC-NER: Training

The main training procedure is in train/mrc_ner_trainer.py

Scripts for reproducing our experimental results can be found in the ./scripts/mrc_ner/reproduce/ folder. Note that you need to change DATA_DIR, BERT_DIR and OUTPUT_DIR to your own dataset path, BERT model path and log path, respectively.
For example, running ./scripts/mrc_ner/reproduce/ace04.sh will start training MRC-NER models and save the intermediate log to $OUTPUT_DIR/train_log.txt.
During training, the model trainer will automatically evaluate on the dev set at the frequency set by val_check_interval and save the top-k checkpoints to $OUTPUT_DIR; a minimal sketch of this checkpointing setup follows below.
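
For reference, the dev-set evaluation and top-k checkpointing described above correspond to standard pytorch-lightning machinery. The sketch below is not the repo's actual trainer; the monitored metric name, the paths and the numeric values are illustrative assumptions.

# Minimal sketch of the checkpointing behaviour described above, using standard
# pytorch-lightning components; metric name, paths and values are illustrative.
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

checkpoint_callback = ModelCheckpoint(
    dirpath="/path/to/output_dir",   # plays the role of $OUTPUT_DIR
    save_top_k=3,                    # keep only the k best checkpoints
    monitor="val_f1",                # illustrative metric name
    mode="max",
)
trainer = pl.Trainer(
    val_check_interval=0.25,         # run dev evaluation four times per training epoch
    callbacks=[checkpoint_callback],
)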

MRC-NER: Evaluation

After training, you can find the best checkpoint on the dev set according to the evaluation results in $OUTPUT_DIR/train_log.txt.
Then run python3 evaluate/mrc_ner_evaluate.py $OUTPUT_DIR/<best_ckpt_on_dev>.ckpt $OUTPUT_DIR/lightning_logs/<version_0>/hparams.yaml to evaluate on the test set with the best checkpoint chosen on dev.

MRC-NER: Inference

Code for running inference with a trained MRC-NER model can be found in inference/mrc_ner_inference.py.
For flat NER, we provide the inference script flat_inference.sh.
For nested NER, we provide the inference script nested_inference.sh.
