Machine Learning: Maximum Likelihood Estimation (MLE) and Maximum a Posteriori (MAP) Estimation
MLE doesn't work well with sparse data because the estimated P(Xi | Y)
can be zero for any feature value never seen in training, and a single zero factor zeroes out the whole product.
(for example, Xi = birthdate, and the value Jan_25_1992 never appears in the training set)
P(Y=1 | X1, ..., Xn) = P(Y=1) * ∏_i P(Xi | Y=1) / P(X1, ..., Xn)
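A minimal sketch of the zero-probability problem, using hypothetical toy data (the feature names and helper below are illustrative, not from the notes):

```python
# MLE for P(Xi | Y) in naive Bayes is just a relative frequency,
# so an unseen feature value gets probability exactly 0,
# which zeroes the whole product ∏_i P(Xi | Y=1).
train = [(("sunny", "Jan_01"), 1),
         (("rainy", "Jan_02"), 0),
         (("sunny", "Jan_03"), 1)]

def mle_cond_prob(value, feature_idx, label):
    rows = [x for x, y in train if y == label]
    count = sum(1 for x in rows if x[feature_idx] == value)
    return count / len(rows)  # 0.0 if value never seen with this label

# A birthdate never observed with Y=1 kills P(Y=1 | X):
print(mle_cond_prob("Jan_25_1992", 1, 1))  # 0.0
```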
We can solve this by placing a prior on θ and using MAP estimation.
- avoids overfitting (acts as regularization / shrinkage)
- tends toward the MLE asymptotically (as n → ∞)
- gives only a point estimate (no representation of uncertainty in θ): it may pick a narrow spike of the posterior simply because that mode has higher density
- not invariant under reparameterization
- requires assuming a prior on θ
In other words, the MAP estimate is a weighted combination of the sample mean and the prior mean:
when n -> 0 we get (approximately) the prior mean,
but when n -> ∞ we get (approximately) the sample mean, i.e. the MLE.
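A small sketch of this interpolation for a Bernoulli parameter with a Beta(α, β) prior (the prior values α = β = 5 below are an arbitrary choice for illustration):

```python
# MAP estimate (posterior mode) for a coin's bias theta with a
# Beta(alpha, beta) prior, after observing k heads in n flips:
#   theta_MAP = (k + alpha - 1) / (n + alpha + beta - 2)
def map_estimate(k, n, alpha=5.0, beta=5.0):
    return (k + alpha - 1) / (n + alpha + beta - 2)

# n -> 0: the estimate falls back to the prior mode (here 0.5)
print(map_estimate(k=0, n=0))        # 0.5
# n large: the estimate approaches the MLE k/n
print(map_estimate(k=900, n=1000))   # ~0.897, close to 0.9
```

With no data the prior dominates; with lots of data the pseudo-counts α − 1 and β − 1 become negligible and the MAP estimate converges to the MLE.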
- The Cramér-Rao Lower Bound
- Central Limit Theorem: sums of independent random variables tend toward a normal distribution
- Likelihood Ratio Test (compares the null hypothesis with the ML estimate)
- Wald Test
- etc
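A minimal sketch of a likelihood ratio test for a Bernoulli parameter, assuming the classical Wilks result that 2·(loglik(p̂) − loglik(p0)) is asymptotically chi-square with 1 degree of freedom (the data values below are made up):

```python
import math

# H0: p = 0.5 versus the unrestricted MLE p_hat = k/n.
def loglik(p, k, n):
    return k * math.log(p) + (n - k) * math.log(1 - p)

def lrt_pvalue(k, n, p0=0.5):
    p_hat = k / n
    stat = 2 * (loglik(p_hat, k, n) - loglik(p0, k, n))
    # chi-square(1 df) survival function equals erfc(sqrt(stat / 2))
    return math.erfc(math.sqrt(stat / 2))

# 60 heads in 100 flips: moderate evidence against p = 0.5
print(lrt_pvalue(k=60, n=100))  # around 0.045
```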