The N-gram model is arguably the most intuitive computational language model: it can operate at the word level or the character level, and the order N is a free parameter. This project presents a fair comparison of unigram and bigram models at both the word and character levels.
| Model | Abstraction Level | Data Partition | F1-macro | F1-micro | Accuracy | Precision | Recall |
|---------|-------|-------|-------|-------|-------|-------|-------|
| unigram | word | train | 0.878 | 0.878 | 0.878 | 0.769 | 0.955 |
| unigram | word | test | 0.642 | 0.644 | 0.644 | 0.556 | 0.817 |
| unigram | char | train | 0.557 | 0.590 | 0.590 | 0.517 | 0.379 |
| unigram | char | test | 0.539 | 0.558 | 0.558 | 0.477 | 0.372 |
| bigram | word | train | 0.980 | 0.981 | 0.981 | 0.960 | 0.995 |
| bigram | word | test | 0.539 | 0.567 | 0.567 | 0.497 | 0.950 |
| bigram | char | train | 0.679 | 0.687 | 0.687 | 0.620 | 0.650 |
| bigram | char | test | 0.640 | 0.648 | 0.648 | 0.587 | 0.590 |
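The two axes compared above, abstraction level (word vs. character) and order N (unigram vs. bigram), can be sketched with a minimal n-gram counter. This is an illustrative example only, not the project's actual implementation; the function name and signature are assumptions for the sketch:

```python
from collections import Counter

def ngram_counts(text, n, level="word"):
    """Count n-grams of order n at the word or character level.

    level="word" tokenizes on whitespace; level="char" treats each
    character (including spaces) as a unit.
    """
    units = text.split() if level == "word" else list(text)
    return Counter(tuple(units[i:i + n]) for i in range(len(units) - n + 1))

sentence = "the cat sat on the mat"

word_unigrams = ngram_counts(sentence, 1, level="word")
word_bigrams = ngram_counts(sentence, 2, level="word")
char_bigrams = ngram_counts(sentence, 2, level="char")

print(word_unigrams[("the",)])        # "the" occurs twice
print(word_bigrams[("the", "cat")])   # the bigram ("the", "cat") occurs once
```

Note how the character-level vocabulary is far smaller than the word-level one, which is one reason the character models above generalize differently (smaller train/test gap) than the word models.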