Named entity recognition (NER) is the task of tagging entities in text with their corresponding type. Approaches typically use BIO notation, which differentiates the beginning (B) and the inside (I) of entities. O is used for non-entity tokens.
Example:
Mark | Watney | visited | Mars |
---|---|---|---|
B-PER | I-PER | O | B-LOC |
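To make the notation concrete, here is a minimal sketch of decoding a BIO tag sequence back into typed entity spans. The function name `bio_to_spans` and the choice to treat a stray `I-` tag (one with no preceding `B-` of the same type) as the start of a new entity are assumptions made for illustration, not part of any specific toolkit.

```python
def bio_to_spans(tags):
    """Convert a BIO tag sequence into (entity_type, start, end) spans, end exclusive."""
    spans = []
    start, ent_type = None, None
    # A trailing "O" sentinel flushes an entity that ends at the last token.
    for i, tag in enumerate(list(tags) + ["O"]):
        prefix, _, label = tag.partition("-")
        # An open span continues only on an I- tag of the same type.
        continues = prefix == "I" and label == ent_type
        if ent_type is not None and not continues:
            spans.append((ent_type, start, i))
            start, ent_type = None, None
        # B- always opens a span; a stray I- also opens one (assumption).
        if prefix == "B" or (prefix == "I" and ent_type is None):
            start, ent_type = i, label
    return spans


tokens = ["Mark", "Watney", "visited", "Mars"]
tags = ["B-PER", "I-PER", "O", "B-LOC"]
for ent_type, start, end in bio_to_spans(tags):
    print(ent_type, " ".join(tokens[start:end]))
# PER Mark Watney
# LOC Mars
```

Returning token offsets rather than strings keeps the spans comparable across systems that share a tokenization.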
The CoNLL 2003 NER task consists of newswire text from the Reuters RCV1 corpus tagged with four entity types (PER, LOC, ORG, MISC). Models are evaluated by span-based F1: a predicted entity counts as correct only if both its boundaries and its type exactly match the gold annotation.
Model | F1 | Paper / Source |
---|---|---|
BiLSTM-CRF+ELMo (Peters et al., 2018) | 92.22 | Deep contextualized word representations |
Peters et al. (2017) | 91.93 | Semi-supervised sequence tagging with bidirectional language models |
Yang et al. (2017) | 91.26 | Transfer Learning for Sequence Tagging with Hierarchical Recurrent Networks |
Ma and Hovy (2016) | 91.21 | End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF |
LSTM-CRF (Lample et al., 2016) | 90.94 | Neural Architectures for Named Entity Recognition |
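
The span-based F1 used to rank the models above can be sketched as follows. This is a simplified illustration, not the official evaluation script: the helper names `bio_to_spans` and `span_f1` are hypothetical, scores are micro-averaged over all entities in the corpus, and a stray `I-` tag is assumed to start a new entity. The table reports this quantity multiplied by 100.

```python
def bio_to_spans(tags):
    """Return the set of (entity_type, start, end) spans in a BIO sequence."""
    spans, start, ent_type = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        prefix, _, label = tag.partition("-")
        if ent_type is not None and not (prefix == "I" and label == ent_type):
            spans.add((ent_type, start, i))
            start, ent_type = None, None
        if prefix == "B" or (prefix == "I" and ent_type is None):
            start, ent_type = i, label
    return spans


def span_f1(gold_sentences, pred_sentences):
    """Micro-averaged F1 over exact-match entity spans."""
    gold, pred = set(), set()
    for sent_id, (g, p) in enumerate(zip(gold_sentences, pred_sentences)):
        # Prefix each span with its sentence index so spans stay distinct.
        gold |= {(sent_id, *span) for span in bio_to_spans(g)}
        pred |= {(sent_id, *span) for span in bio_to_spans(p)}
    correct = len(gold & pred)  # type and both boundaries must match
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "O", "O", "B-LOC"]]  # "Mark" alone is a boundary error
print(span_f1(gold, pred))  # 0.5
```

In the example, the truncated PER span earns no partial credit, so only the LOC span counts as correct: precision and recall are both 1/2.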