However, raw term frequencies are not necessarily the best representation of a text. Common words like "the", "a" and "to" almost always have the highest term frequencies, so a high raw count does not necessarily mean that the corresponding word is more important. To address this problem, one of the most popular ways to "normalize" term frequencies is to weight each term by the inverse of its document frequency (the number of documents in which it appears), a scheme known as tf–idf. Additionally, for the specific purpose of classification, supervised alternatives have been developed that take the class label of a document into account.
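The weighting can be illustrated with a minimal, dependency-free sketch. It assumes whitespace tokenization of lowercased text, raw counts as the term frequency, and the plain variant idf(t) = log(N / df(t)), where N is the number of documents and df(t) the number of documents containing t. This is only one of several tf–idf variants; libraries commonly use smoothed idf and length normalization instead. The function name `tf_idf` and the toy corpus are hypothetical, chosen for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumptions (illustrative only): whitespace tokenization of lowercased
    text, raw counts for tf, and idf(t) = log(N / df(t)) without smoothing.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents in which each term appears at least once.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts within this document
        weights.append(
            {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        )
    return weights


docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird sang",
]
w = tf_idf(docs)
print(w[0])
```

In the first document "the" has the highest raw count (2), but because it occurs in every document its idf is log(3/3) = 0, so its weight drops to 0. The rarer word "mat" (appearing in one document) ends up weighted above "cat" (appearing in two).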