Named Entity Recognition (NER) for Bahasa Indonesia with PyTorch.
Corpus for NER:
- https://github.com/yohanesgultom/nlp-experiments
- https://github.com/yusufsyaifudin/indonesia-ner
The step-by-step implementation in Google Colab is indexed here.
The fine-tuned Indonesian word embeddings (id_ft.bin) are available here, based on word embeddings trained in indonesian-word-embedding.
Models:
- BiLSTM
- BiLSTM + Word Embeddings
- BiLSTM + Word Embeddings + Char Embeddings (CNN)
- BiLSTM + Word Embeddings + Char Embeddings (CNN) + Attention Layer
- Transformer (simplified BERT) + Word Embeddings + Char Embeddings (CNN)
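To make the "BiLSTM + Word Embeddings + Char Embeddings (CNN)" variant concrete, here is a minimal sketch of how such a tagger could be wired up in PyTorch. It is not the code from this repository: the class name `BiLSTMCharCNNTagger`, the layer sizes, and the toy vocabulary sizes are all assumptions chosen for illustration, and the word embedding layer is randomly initialized rather than loaded from id_ft.bin.

```python
import torch
import torch.nn as nn


class BiLSTMCharCNNTagger(nn.Module):
    """Hypothetical tagger: word embeddings + char-CNN features -> BiLSTM -> tag scores."""

    def __init__(self, vocab_size, char_vocab_size, num_tags,
                 word_dim=50, char_dim=16, char_filters=24, hidden_dim=64):
        super().__init__()
        # Index 0 is reserved for padding in both vocabularies (an assumption).
        self.word_emb = nn.Embedding(vocab_size, word_dim, padding_idx=0)
        self.char_emb = nn.Embedding(char_vocab_size, char_dim, padding_idx=0)
        # 1D convolution over the characters of each word.
        self.char_cnn = nn.Conv1d(char_dim, char_filters, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(word_dim + char_filters, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, words, chars):
        # words: (batch, seq_len); chars: (batch, seq_len, word_len)
        b, s, w = chars.shape
        c = self.char_emb(chars.reshape(b * s, w))       # (b*s, word_len, char_dim)
        c = self.char_cnn(c.transpose(1, 2))             # (b*s, char_filters, word_len)
        c = c.max(dim=2).values.reshape(b, s, -1)        # max-pool over characters
        x = torch.cat([self.word_emb(words), c], dim=2)  # (b, s, word_dim + char_filters)
        h, _ = self.lstm(x)                              # (b, s, 2 * hidden_dim)
        return self.out(h)                               # per-token tag scores


# Toy forward pass with random indices: 2 sentences, 7 tokens, 10 chars per token.
model = BiLSTMCharCNNTagger(vocab_size=100, char_vocab_size=40, num_tags=5)
words = torch.randint(1, 100, (2, 7))
chars = torch.randint(1, 40, (2, 7, 10))
logits = model(words, chars)
```

The returned tensor holds one score per tag for every token, so it can be fed to a token-level cross-entropy loss. Pretrained vectors could replace the random word embeddings via `nn.Embedding.from_pretrained`, given a weight matrix aligned with the vocabulary.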
Learning rates are chosen automatically with a learning rate finder based on pytorch-lr-finder.
Note: because the learning rates are searched over the same range for all models, the values found may not be the best for each model. To see the best learning rates, check the Google Colab version.
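The idea behind a learning rate range test can be sketched without the pytorch-lr-finder package: increase the learning rate exponentially over a number of mini-batches and record the loss at each step. The function `lr_range_test`, the toy linear model, and the random data below are all hypothetical, and picking the learning rate with the lowest recorded loss is a crude simplification made for this sketch. Unlike a full finder, the sketch does not restore the model weights afterwards.

```python
import torch
import torch.nn as nn


def lr_range_test(model, optimizer, criterion, batches, start_lr=1e-5, end_lr=1.0):
    """Sweep the learning rate exponentially from start_lr to end_lr, one step per batch."""
    num_iter = len(batches)
    # Multiplicative factor so that the last batch is trained with end_lr.
    gamma = (end_lr / start_lr) ** (1 / max(num_iter - 1, 1))
    history = []
    lr = start_lr
    for inputs, targets in batches:
        for group in optimizer.param_groups:
            group["lr"] = lr
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
        history.append((lr, loss.item()))
        lr *= gamma
    return history


# Toy setup (random data, purely illustrative).
torch.manual_seed(0)
toy_model = nn.Linear(4, 3)
toy_batches = [(torch.randn(8, 4), torch.randint(0, 3, (8,))) for _ in range(20)]
opt = torch.optim.SGD(toy_model.parameters(), lr=1e-5)
history = lr_range_test(toy_model, opt, nn.CrossEntropyLoss(), toy_batches)

# Crude pick: the learning rate at which the recorded loss was lowest.
best_lr, best_loss = min(history, key=lambda pair: pair[1])
```

Because every model here is swept over the same range, the value picked this way is only a starting point and may need manual adjustment per model.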
Example output:
Gunawan, W., Suhartono, D., Purnomo, F., & Ongko, A. (2018). Named-entity recognition for Indonesian language using bidirectional LSTM-CNNs. Procedia Computer Science, 135, 425-432.