Training and development data: 27,870 sentences from the VLSP 2013 POS tagging shared task, split into
- 27,000 sentences for training.
- 870 sentences for development.

Test data: 2,120 sentences from the VLSP 2013 POS tagging shared task.

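
The training/development split above can be sketched as follows. This is a minimal illustration. It assumes the 27,870 sentences sit in one list and that the development set is the final 870 sentences. The shared task's actual choice of development sentences is not stated here, and `split_train_dev` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def split_train_dev(sentences: Sequence[str], n_dev: int = 870) -> Tuple[List[str], List[str]]:
    """Hold out the last `n_dev` sentences for development (assumed ordering)."""
    if not 0 < n_dev < len(sentences):
        raise ValueError("n_dev must be between 1 and len(sentences) - 1")
    train = list(sentences[:-n_dev])
    dev = list(sentences[-n_dev:])
    return train, dev


# Placeholder strings standing in for the 27,870 VLSP 2013 sentences.
corpus = [f"sentence {i}" for i in range(27_870)]
train, dev = split_train_dev(corpus)
print(len(train), len(dev))  # 27000 870
```
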
Model | Accuracy (%) | Method | Reference | Code |
---|---|---|---|---|
PhoBERT-large | 96.8 | Nguyen et al. ArXiv'20 | Official | |
vELECTRA | 96.77 | Bui et al. ArXiv'20 | Official | |
PhoBERT-base | 96.7 | Nguyen et al. ArXiv'20 | Official | |
VnMarMoT | 95.88 | Nguyen et al. NAACL'18 | Official | |
BiLSTM-CRFs + CNN-char | 95.40 | Ma et al. ACL'16 | Nguyen et al. NAACL'18 | Link |
BiLSTM-CRF + LSTM-char | 95.31 | Lample et al. NAACL'16 | Nguyen et al. NAACL'18 | Link |
BiLSTM-CRF | 95.31 | Huang et al. ArXiv'15 | Nguyen et al. NAACL'18 | Link |
RDRPOSTagger | 95.11 | Nguyen et al. EACL'14 | Official | |
JointWPD | 94.03 | Nguyen et al. '18 | | |
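
The Accuracy column is a percentage. The sketch below assumes the usual token-level definition for POS tagging: correctly tagged tokens divided by all tokens in the test set. `token_accuracy` is a hypothetical name, and the tag sequences are made-up examples.

```python
from typing import Sequence


def token_accuracy(gold: Sequence[Sequence[str]], pred: Sequence[Sequence[str]]) -> float:
    """Percentage of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of sentences")
    correct = total = 0
    for gold_tags, pred_tags in zip(gold, pred):
        if len(gold_tags) != len(pred_tags):
            raise ValueError("each predicted sentence must match its gold length")
        correct += sum(g == p for g, p in zip(gold_tags, pred_tags))
        total += len(gold_tags)
    if total == 0:
        raise ValueError("no tokens to score")
    return 100.0 * correct / total


gold = [["N", "V", "N"], ["P", "V", "A", "CH"]]
pred = [["N", "V", "A"], ["P", "V", "A", "CH"]]
print(round(token_accuracy(gold, pred), 2))  # 85.71
```

This pools tokens across sentences instead of averaging per-sentence scores, so longer sentences carry more weight.
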
Dataset
- train: 7268 sentences, dev: 1038 sentences, test: 2077 sentences
- labels: N, V, CH, R, E, A, P, Np, M, Nc, L, T, Ny, Nu, X, B, S, I, Y, Vy
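
A minimal sketch for checking tagged data against this label set. It assumes each sentence is one line of space-separated `word/TAG` tokens, with the syllables of a multi-syllable word joined by underscores. This is a common convention for Vietnamese POS data, but verify it against your copy of the corpus. `parse_tagged` and the example sentence are illustrative and are not part of any released tool.

```python
from typing import List, Tuple

# Label set from the list above, as a set of 20 distinct tags.
TAGSET = {"N", "V", "CH", "R", "E", "A", "P", "Np", "M", "Nc", "L", "T",
          "Ny", "Nu", "X", "B", "S", "I", "Y", "Vy"}


def parse_tagged(line: str) -> List[Tuple[str, str]]:
    """Split a 'word/TAG word/TAG ...' line into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        # Split on the LAST slash so a word that itself contains '/' survives.
        word, sep, tag = token.rpartition("/")
        if not sep or not word:
            raise ValueError(f"malformed token: {token!r}")
        if tag not in TAGSET:
            raise ValueError(f"unknown tag {tag!r} in token {token!r}")
        pairs.append((word, tag))
    return pairs


print(parse_tagged("Học_sinh/N đọc/V sách/N ./CH"))
# [('Học_sinh', 'N'), ('đọc', 'V'), ('sách', 'N'), ('.', 'CH')]
```
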
Model | Accuracy (%) | Method | Reference | Code | Note |
---|---|---|---|---|---|
BiLSTM-CRFs | 93.52 | Nguyen et al. '18 | Official | | 10-fold CV |
VNTagger | 93.40 | Le et al. TALN'10 | Official | | 10-fold CV |
RDRPOSTagger | 91.96 | Pham et al. IJCNLP'17 | Official | | 5-fold CV |
NNVLP | 91.92 | Pham et al. IJCNLP'17 | Official | | 5-fold CV |
vTools | 90.73 | Tran et al. VLSP'13 | Pham et al. IJCNLP'17 | Official | |
Vitk | 88.41 | Pham et al. IJCNLP'17 | Official | | |
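
Several results in this table come from k-fold cross-validation (10-fold or 5-fold CV). The sketch below shows the general procedure under stated assumptions. Sentences are split into k contiguous folds, each fold is held out once as the test set, and the reported figure is the mean of the per-fold scores. The fold assignment and averaging convention used in the cited papers may differ. `evaluate_fold` stands in for a hypothetical train-then-score routine.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple


def k_fold_splits(items: Sequence, k: int) -> List[Tuple[list, list]]:
    """Split `items` into k contiguous folds; each fold serves once as the test set."""
    if not 2 <= k <= len(items):
        raise ValueError("k must be between 2 and len(items)")
    n = len(items)
    bounds = [n * i // k for i in range(k + 1)]  # fold sizes differ by at most 1
    splits = []
    for lo, hi in zip(bounds, bounds[1:]):
        test = list(items[lo:hi])
        train = list(items[:lo]) + list(items[hi:])
        splits.append((train, test))
    return splits


def cross_validate(items: Sequence, k: int,
                   evaluate_fold: Callable[[list, list], float]) -> float:
    """Average the score that `evaluate_fold(train, test)` returns on each fold."""
    return mean(evaluate_fold(train, test) for train, test in k_fold_splits(items, k))


def dummy_score(train: list, test: list) -> float:
    """Toy scorer: percentage of even numbers in the test fold (stands in for a tagger)."""
    return 100.0 * sum(x % 2 == 0 for x in test) / len(test)


# Integers stand in for sentences.
sentences = list(range(10))
print([len(test) for _, test in k_fold_splits(sentences, 3)])  # [3, 3, 4]
print(cross_validate(sentences, 5, dummy_score))  # 50.0
```
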
📜 Papers
- Vietnamese POS Tagging for Social Media Text - Ngo et al. 2016
- A POS Tagging Model for Vietnamese Social Media Text Using BiLSTM-CRF with Rich Features - Ngo et al. 2019
- An Empirical Study on POS Tagging for Vietnamese Social Media Text - Ngo et al. 2017
📜 Papers
- Nguyen et al. NICS'18. Building Vietnamese Linguistic Resources for Social Network Text Analysis
- Nguyen et al. ALTA'17, Nguyen et al. 2015
- Nguyen et al. 2014, Nguyen et al. 2011, Nguyen et al. 2011, Nguyen et al. 2010
- Ngo et al. 2016, Phan et al. 2008, Nguyen et al. 2006
- Nguyen et al. 2003
💫 Services
📁 Open sources
- vncorenlp/VnCoreNLP (2018): Java
- pth1993/NNVLP (2017): Python, Bash
- pyvi (2016): Python
- Vitk (2016): Java
- kanjirz50/viet-morphological-analysis-crf (2016), demonstration: Python
- lupanh/vTools (2015): Python
- truongdo/vita (2015): C++
- RDRPOSTagger (2013-2017): Python
- vnTagger (2010): Java