Named entity recognition (NER) is the task of tagging entities in text with their corresponding type. Approaches typically use BIO notation, which differentiates the beginning (B) and the inside (I) of entities. O is used for non-entity tokens.
Example:
Mark | Watney | visited | Mars |
---|---|---|---|
B-PER | I-PER | O | B-LOC |
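The example above can be decoded mechanically: a `B-` tag opens an entity, following `I-` tags of the same type extend it, and `O` closes it. A minimal sketch (function and variable names are illustrative, not from any particular library):

```python
def bio_to_spans(tags):
    """Convert a BIO tag sequence to (type, start, end_exclusive) spans."""
    spans = []
    start, etype = None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and etype != tag[2:]):
            # B- always opens a span; an I- whose type does not match the
            # open span is treated leniently as opening a new span.
            if etype is not None:
                spans.append((etype, start, i))
            start, etype = i, tag[2:]
        elif tag == "O":
            if etype is not None:
                spans.append((etype, start, i))
            start, etype = None, None
    if etype is not None:
        spans.append((etype, start, len(tags)))
    return spans

tokens = ["Mark", "Watney", "visited", "Mars"]
tags = ["B-PER", "I-PER", "O", "B-LOC"]
print(bio_to_spans(tags))  # [('PER', 0, 2), ('LOC', 3, 4)]
```

Distinguishing `B-` from `I-` is what lets BIO represent two adjacent entities of the same type (e.g. `B-ORG B-ORG`) without merging them.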
- 🔗 VLSP 2018 Shared Task: Named Entity Recognition Website
- 📜 VLSP 2018 Shared Task: Named Entity Recognition Paper
- 📁 VLSP 2018 Shared Task: Annotation Guidelines
Size of the VLSP 2018 dataset:
Type | Train | Dev | Test |
---|---|---|---|
LOC | 8,831 | 3,043 | 2,525 |
ORG | 3,471 | 1,203 | 1,616 |
PER | 6,427 | 2,168 | 3,518 |
MISC | 805 | 179 | 296 |
Model | F1 | Paper/Source | Code |
---|---|---|---|
VNER Attentive Neural Network | 77.52 | Dong et al. '18 | |
vietner CRF (ngrams + word shapes + cluster + w2v) | 76.63 | Pham et al. VLSP'18 | Official |
ZA-NER BiLSTM | 74.70 | Luong et al. VLSP'18 | |
Dong et al. 2018 | 66.07 | Dong et al. VLSP'18 | |
- 🔗 VLSP 2016 Shared Task: Named Entity Recognition Website
- 📜 VLSP 2016 Shared Task: Named Entity Recognition Paper
The VLSP 2016 dataset consists of 19,692 sentences:
- 14,861 sentences are used for training.
- 2,000 sentences are used for development.
- 2,831 sentences are used for testing.
Without gold POS and chunking tags
Model | F1 | Paper/Source | Code |
---|---|---|---|
PhoBERT-large | 94.7 | Nguyen et al. '20 | Official |
PhoBERT-base | 93.6 | Nguyen et al. '20 | Official |
VnCoreNLP with ETNLP embeddings | 91.30 | Nguyen et al. NAACL'18 | Official |
VNER Attentive Neural Network | 90.37 | Dong et al. '18 | |
vietner CRF (ngrams + word shapes + cluster + w2v) | 90.03 | Pham CICLing'18 | Official |
VnCoreNLP dynamic feature induction model | 88.55 | Nguyen et al. NAACL'18 | Official |
With gold POS and chunking tags
Model | F1 | Paper/Source | Code |
---|---|---|---|
VNER Attentive Neural Network | 95.33 | Dong et al. '18 | |
BiLSTM-CRF + POS + Chunk | 94.88 | Nguyen et al. 2018 | Official |
CRF (PoS, Chunk, word + word shapes + cluster + w2v) | 93.93 | Pham CICLing'18 | |
NNVLP (BiLSTM-CNN-CRF) | 92.91 | Pham et al. IJCNLP'17 | Official |
vie-ner-lstm | 92.05 | Pham et al. PACLIC'17 | Official |
Token regular expression + ME (Bidirectional Inference) | 88.78 | Le et al. VLSP'16 | |
BiLSTM-CNN-CRF | 88.59 | Pham et al. PACLIC'17 | |
ME + Beam Search | 84.08 | Nguyen et al. VLSP'16 | |
Stack LSTM | 83.80 | Nguyen et al. VLSP'16 | |
BiLSTM-CRF | 83.25 | Nguyen et al. VLSP'16 | |
CRF | 78.38 | Le et al. VLSP'16 | |
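The F1 scores in the tables above are typically entity-level (CoNLL-style): a predicted entity counts as correct only if both its span and its type exactly match a gold entity. A minimal, illustrative sketch of this metric (not the official VLSP evaluation script):

```python
def entity_f1(gold_spans, pred_spans):
    """Micro-averaged precision/recall/F1 over exact-match entities.

    Each argument is a list of sentences, where each sentence is a list
    of (type, start, end) tuples.
    """
    tp = fp = fn = 0
    for gold, pred in zip(gold_spans, pred_spans):
        gold_set, pred_set = set(gold), set(pred)
        tp += len(gold_set & pred_set)   # exact matches
        fp += len(pred_set - gold_set)   # spurious predictions
        fn += len(gold_set - pred_set)   # missed gold entities
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

gold = [[("PER", 0, 2), ("LOC", 3, 4)]]
pred = [[("PER", 0, 2), ("ORG", 3, 4)]]  # wrong type on the second entity
print(entity_f1(gold, pred))  # (0.5, 0.5, 0.5)
```

Note that a span with the right boundaries but the wrong type, or the right type with partial boundaries, counts as both a false positive and a false negative under this exact-match scheme.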
📜 Papers
- Pham et al. KSE'18, Pham et al. PACLIC'17, Pham et al. IJCNLP'17, Le et al. KSE'17
- Nguyen et al. 2010
- Tran et al. 2007, Pham et al. 2007
📁 Open-source implementations
- vncorenlp/VnCoreNLP (2018): java
- pth1993/NNVLP (2017): python, bash
- pth1993/vie-ner-lstm (2017): python, keras
- ntson2002/lstm-crf-tagging (2017): python, theano
- polyglot (2014-2017): c++, java, python
- ai.vitk.ner (2017): scala, java