nlp-for-email

Homework 3 for a machine learning class.

P(ham/w) ∝ P(w/ham) * P(ham), ignoring the denominator P(w), which is the same for ham and spam.

P(w1/ham) = (# occurrences of w1 in the training list + 1) / (# words in the training list + # distinct (nonrepeat) words in the training list)

In practice, I compute log(P(w/ham)) rather than the raw probability, so that per-word terms can be added instead of multiplied.
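A small sketch of why working in log space helps. The probability values and email length here are made up for illustration and are not taken from the repository's data.

```python
import math

# Hypothetical per-word probabilities for a 200-word email (illustrative values only).
word_probs = [1e-3] * 200

# Multiplying the raw probabilities underflows to 0.0 in double precision.
product = math.prod(word_probs)

# Summing the logs keeps the same information in a representable range.
log_sum = sum(math.log(p) for p in word_probs)

print(product)
print(log_sum)
```

Once the product collapses to zero for both classes, ham and spam can no longer be compared, whereas the log sums stay comparable.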

Words that do not appear in the training list all get the same probability:

P(w2/ham) = (0 + 1) / (# words in the training list + # distinct (nonrepeat) words in the training list)
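The two formulas above can be sketched as a single helper. This is a minimal illustration, not the code in `dictionary.py` or `dataprocess.py`; the function name `train_log_probs` and the toy word list are hypothetical, and "nonrepeat words" is assumed to mean the number of distinct words.

```python
import math
from collections import Counter

def train_log_probs(words):
    """Return (log_probs, unseen_log_prob) using add-one smoothing.

    `words` is the list of all tokens in one class's training list.
    """
    counts = Counter(words)
    total = len(words)      # words in the training list
    distinct = len(counts)  # distinct words (assumed meaning of "nonrepeat")
    denom = total + distinct
    # Seen words: (count + 1) / denom, stored as a log.
    log_probs = {w: math.log((c + 1) / denom) for w, c in counts.items()}
    # Unseen words: (0 + 1) / denom, the same value for every unseen word.
    unseen_log_prob = math.log((0 + 1) / denom)
    return log_probs, unseen_log_prob

# Toy ham training list (illustrative only).
ham_words = "meeting at noon meeting tomorrow".split()
ham_log_probs, ham_unseen = train_log_probs(ham_words)
```

The same helper would be called a second time on the spam training list to get the spam-side table.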

At this point, we have all the necessary probabilities.

Different words are assumed to be independent, so the overall log posterior for an email in the test set is a sum over its words. Taking one word w1 as an example:

log(P(ham)) + sum( (# times w1 repeats) * log(P(w1/ham)) ) + (# words not in the training list) * log(P(w2/ham))
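A hedged sketch of this scoring step, written in log space throughout. It is not the code in `test.py`; the function name `log_score`, the small probability tables, and the equal class priors are all assumptions made for illustration. The `log_prior` argument corresponds to the P(ham) factor in the first formula.

```python
import math
from collections import Counter

def log_score(test_words, log_probs, unseen_log_prob, log_prior=0.0):
    """Sum repeat-count * log P(w/class) over the words of one test email."""
    score = log_prior
    for word, repeats in Counter(test_words).items():
        # Words missing from the training table fall back to the unseen value.
        score += repeats * log_probs.get(word, unseen_log_prob)
    return score

# Hypothetical smoothed tables for each class (illustrative values only).
ham_lp = {"meeting": math.log(3 / 9), "noon": math.log(2 / 9)}
ham_unseen = math.log(1 / 9)
spam_lp = {"free": math.log(3 / 8), "money": math.log(2 / 8)}
spam_unseen = math.log(1 / 8)

email = ["meeting", "meeting", "free"]
ham_score = log_score(email, ham_lp, ham_unseen, math.log(0.5))
spam_score = log_score(email, spam_lp, spam_unseen, math.log(0.5))

# Pick the class with the larger log score.
label = "ham" if ham_score > spam_score else "spam"
print(label)
```

Because only the comparison between the two scores matters, the ignored denominator never needs to be computed.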

README.md
dataprocess.py
dictionary.py
dictionary.txt
dictionary_spam.txt
hamtrain.txt
probability.txt
probability_single.txt
probability_single_spam.txt
probability_spam.txt
spamtrain.txt
test.py
testdata_process.py
train.txt
