Artificial Intelligence Course 4th Project: Implementing Bigram and Unigram models for filtering comments.
In this group project, we (Amirhossein-Rajabpour and arminZolfaghari) implemented Bigram and Unigram models to filter comments.
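As a rough illustration of the idea, the sketch below trains one count-based model per class and labels a comment with the class whose model gives it the higher log-probability. It is not the project's actual code: the names `NgramModel`, `log_score` and `classify`, the `<s>` start token, the interpolation form of the smoothing, and the coefficient values `l_bi`, `l_uni`, `l_eps` and `eps` are all assumptions made for this example. Class priors are ignored for simplicity.

```python
import math
from collections import Counter


class NgramModel:
    """Unigram and bigram counts for one class (hypothetical helper)."""

    def __init__(self, sentences):
        self.unigrams = Counter()   # word -> count
        self.contexts = Counter()   # previous token -> count as a context
        self.bigrams = Counter()    # (previous token, word) -> count
        for sentence in sentences:
            words = sentence.split()
            tokens = ["<s>"] + words          # "<s>" marks the sentence start
            self.unigrams.update(words)
            self.contexts.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.total = sum(self.unigrams.values())

    def unigram_prob(self, word):
        return self.unigrams[word] / self.total if self.total else 0.0

    def bigram_prob(self, prev, word):
        seen = self.contexts[prev]
        return self.bigrams[(prev, word)] / seen if seen else 0.0

    def log_score(self, sentence, l_bi=0.7, l_uni=0.2, l_eps=0.1, eps=1e-4):
        # Assumed smoothing: linear interpolation of the bigram estimate,
        # the unigram estimate and a small constant, so that an unseen
        # word never produces a zero probability.
        tokens = ["<s>"] + sentence.split()
        score = 0.0
        for prev, word in zip(tokens, tokens[1:]):
            p = (l_bi * self.bigram_prob(prev, word)
                 + l_uni * self.unigram_prob(word)
                 + l_eps * eps)
            score += math.log(p)
        return score


def classify(comment, pos_model, neg_model, **coeffs):
    """Return the label of the model that scores the comment higher."""
    pos = pos_model.log_score(comment, **coeffs)
    neg = neg_model.log_score(comment, **coeffs)
    return "positive" if pos >= neg else "negative"
```

With these assumptions, passing `l_bi=0` reduces the score to a smoothed unigram model, so the same class covers both variants. Ties are resolved as "positive", which is an arbitrary choice for the sketch.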
We trained the models on these positive and negative datasets. We also used smoothing in both models (the smoothing coefficients can be changed). For preprocessing, we first removed punctuation marks. A cut_down parameter then removes every word whose number of occurrences is less than or equal to its value, and a cut_above parameter specifies how many of the most frequent words should be removed.
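The preprocessing steps could be sketched roughly as follows. The function names `strip_punctuation` and `preprocess`, the defaults of 0 (meaning "remove nothing"), and whitespace tokenization are assumptions for this example. Only the parameter names cut_down and cut_above come from the project. Note that `string.punctuation` covers ASCII punctuation only.

```python
import string
from collections import Counter


def strip_punctuation(text):
    """Delete ASCII punctuation characters from a string."""
    return text.translate(str.maketrans("", "", string.punctuation))


def preprocess(sentences, cut_down=0, cut_above=0):
    """Tokenize sentences and drop rare and very frequent words.

    cut_down:  words occurring this many times or fewer are removed.
    cut_above: this many of the most frequent words are removed.
    """
    tokenized = [strip_punctuation(s).split() for s in sentences]
    counts = Counter(word for tokens in tokenized for word in tokens)

    rare = {word for word, count in counts.items() if count <= cut_down}
    frequent = {word for word, _ in counts.most_common(cut_above)}
    dropped = rare | frequent

    return [[word for word in tokens if word not in dropped]
            for tokens in tokenized]
```

For example, calling `preprocess(["the movie, the plot!", "the cast; a movie"], cut_down=1, cut_above=1)` removes "the" as the single most frequent word and every word that occurs only once, leaving only "movie" in each sentence.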
A sample run:
Check full description here
Project report (in Persian): we tried different smoothing coefficients, ran the models with and without cut_down and cut_above, and checked the results here
Check our other AI Course projects: