Commit

Elaborate on Bayesian models that exhibit overfitting (#10)
yousuketakada committed Apr 7, 2018
1 parent 93d25c7 commit 7e15960
31 changes: 27 additions & 4 deletions prml_errata.tex
@@ -1955,10 +1955,33 @@ \subsubsection*{#1}
 is simply an overstatement.
 Bayesian methods, like any other machine learning methods, can overfit
 because the \emph{true} model from which the data set has been generated is unknown in general
-so that one could possibly assume an inappropriate model.
-For instance,
-if too broad the prior distribution~(3.52) is used in the Bayesian regression model of Section~3.3,
-this effectively leads to insufficient regularization and thus overfitting.
+so that one could possibly assume an inappropriate (too expressive) model
+that would give a terribly wrong prediction very confidently.
+This is true even when we take a ``fully'' Bayesian approach as discussed in the following.
+
+Let us take a Bayesian linear regression model of Section~3.3 as an example and
+suppose that the precision~$\beta$ of the target~$t$ in the likelihood~(3.8) is very large
+whereas the precision~$\alpha$ of the parameters~$\mathbf{w}$ in the prior~(3.52) is very small
+(i.e., the conditional distribution of $t$ given $\mathbf{w}$ is narrow whereas
+the prior over $\mathbf{w}$ is broad so that the regularization is insufficient).
+Then, the posterior~$p(\mathbf{w}|\bm{\mathsf{t}})$ given the data set~$\bm{\mathsf{t}}$ is
+sharply peaked around the ML estimate~$\mathbf{w}_{\text{ML}}$ and
+the predictive~$p(t|\bm{\mathsf{t}})$ is also sharply peaked
+(well approximated by the likelihood conditioned on $\mathbf{w}_{\text{ML}}$)
+so that the assumed model reduces to least squares.
+Of course, we can extend the model by incorporating hyperpriors over $\beta$ and $\alpha$,
+thus introducing more Bayesian averaging.
+However, if the extended model is not sensible
+(e.g., the hyperpriors are sharply peaked around wrong values),
+we shall again end up with a wrong posterior and a wrong predictive.
+
+The point here is that, since we do not know the true model,
+we cannot know whether the assumed model is sensible in advance
+(i.e., without any knowledge about the data).
+We can however assess whether a model is better than another
+in terms of, say, \emph{Bayesian model comparison} (Section~3.4),
+though a caveat is that we still need some (implicit) assumptions for this procedure to work;
+see the discussion around (3.73).
+
 Moreover, one should also be aware of a subtlety here, that is,
 (i)~the \emph{generalization error}, which can be measured by cross-validation, and
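The limiting behaviour that the errata text describes can be sketched explicitly from the posterior (3.53)--(3.54) of PRML; equation numbers below refer to that book, and the derivation is an editorial illustration rather than part of the commit:

```latex
% Posterior over w for the Bayesian linear regression model (Section 3.3):
%   p(w | t) = N(w | m_N, S_N),  with (3.53)-(3.54)
\begin{align*}
\mathbf{m}_N &= \beta\,\mathbf{S}_N \boldsymbol{\Phi}^{\mathrm{T}} \bm{\mathsf{t}},
&
\mathbf{S}_N^{-1} &= \alpha\,\mathbf{I} + \beta\,\boldsymbol{\Phi}^{\mathrm{T}} \boldsymbol{\Phi}.
\end{align*}
% Broad prior: as alpha -> 0 with beta fixed,
\begin{align*}
\mathbf{S}_N^{-1} \to \beta\,\boldsymbol{\Phi}^{\mathrm{T}} \boldsymbol{\Phi},
\qquad
\mathbf{m}_N \to
\left(\boldsymbol{\Phi}^{\mathrm{T}} \boldsymbol{\Phi}\right)^{-1}
\boldsymbol{\Phi}^{\mathrm{T}} \bm{\mathsf{t}}
= \mathbf{w}_{\text{ML}}
\quad \text{(cf.~(3.15))}.
\end{align*}
% Sharp likelihood: as beta -> infinity, S_N -> 0, so the posterior
% collapses to a point mass at w_ML, and the predictive variance (3.59)
\begin{align*}
\sigma_N^2(\mathbf{x})
= \frac{1}{\beta}
+ \boldsymbol{\phi}(\mathbf{x})^{\mathrm{T}} \mathbf{S}_N \boldsymbol{\phi}(\mathbf{x})
\to 0,
\end{align*}
% i.e., the model makes least-squares predictions with near-zero
% claimed uncertainty: overfitting with high confidence.
```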

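The scenario in the added paragraph can also be checked numerically. The following sketch (not part of the commit; data, hyperparameter values, and variable names are made up for illustration) fits a Bayesian cubic regression with a very broad prior and a very sharp likelihood, and confirms that the posterior mean collapses onto the least-squares solution while the predictive variance becomes tiny:

```python
# Hypothetical illustration: with prior precision alpha ~ 0 and
# likelihood precision beta very large, the Bayesian posterior
# collapses onto the maximum likelihood (least squares) fit and
# the predictive becomes overconfident.
import numpy as np

rng = np.random.default_rng(0)

N, M = 10, 4                               # data points; basis functions (cubic)
x = np.linspace(0.0, 1.0, N)
t = np.sin(2.0 * np.pi * x) + 0.25 * rng.standard_normal(N)

Phi = np.vander(x, M, increasing=True)     # polynomial design matrix

alpha, beta = 1e-12, 1e6                   # broad prior, sharp likelihood

# Posterior over w, cf. PRML (3.53)-(3.54): S_N^-1 = alpha I + beta Phi^T Phi
S_N_inv = alpha * np.eye(M) + beta * (Phi.T @ Phi)
m_N = beta * np.linalg.solve(S_N_inv, Phi.T @ t)

# Maximum likelihood (ordinary least squares) solution for comparison
w_ml, *_ = np.linalg.lstsq(Phi, t, rcond=None)

# Predictive variance at x = 0.5, cf. (3.59): 1/beta + phi^T S_N phi
phi = np.vander(np.array([0.5]), M, increasing=True)[0]
sigma2 = 1.0 / beta + phi @ np.linalg.solve(S_N_inv, phi)

print("max |m_N - w_ML|:", np.max(np.abs(m_N - w_ml)))   # essentially zero
print("predictive variance at x = 0.5:", sigma2)          # tiny: overconfident
```

The posterior mean agrees with the least-squares weights to numerical precision, exactly as the errata text argues.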
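The model-comparison remark in the added text can be illustrated the same way. This sketch (again editorial, not part of the commit) evaluates the log evidence ln p(t|alpha, beta) of PRML (3.86) for the degenerate hyperparameters above and for moderate ones on the same data; the specific hyperparameter values are arbitrary choices for illustration:

```python
# Hypothetical illustration of Bayesian model comparison via the
# log marginal likelihood (evidence), cf. PRML (3.86): the
# under-regularized hyperparameter setting receives far lower
# evidence than a moderate setting on the same data.
import numpy as np

def log_evidence(Phi, t, alpha, beta):
    """ln p(t | alpha, beta) for Bayesian linear regression, PRML (3.86)."""
    N, M = Phi.shape
    A = alpha * np.eye(M) + beta * (Phi.T @ Phi)       # = S_N^-1, cf. (3.81)
    m_N = beta * np.linalg.solve(A, Phi.T @ t)         # posterior mean, (3.53)
    E_mN = (beta / 2.0) * np.sum((t - Phi @ m_N) ** 2) \
         + (alpha / 2.0) * (m_N @ m_N)                 # cf. (3.82)
    return (M / 2.0) * np.log(alpha) + (N / 2.0) * np.log(beta) \
         - E_mN - 0.5 * np.linalg.slogdet(A)[1] - (N / 2.0) * np.log(2.0 * np.pi)

rng = np.random.default_rng(0)
N, M = 10, 4
x = np.linspace(0.0, 1.0, N)
t = np.sin(2.0 * np.pi * x) + 0.25 * rng.standard_normal(N)
Phi = np.vander(x, M, increasing=True)

moderate = log_evidence(Phi, t, alpha=5e-3, beta=11.1)
degenerate = log_evidence(Phi, t, alpha=1e-12, beta=1e6)
print("moderate:", moderate, "degenerate:", degenerate)
```

The degenerate setting is heavily penalized (its huge beta turns even small residuals into an enormous data-fit term), so the evidence prefers the moderate model, consistent with the caveat that this comparison itself rests on the assumed model family.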