Commit

Elaborate on Bayesian models that exhibit overfitting (#10)
yousuketakada committed Apr 7, 2018
1 parent 93d25c7 commit 7e15960
31 changes: 27 additions & 4 deletions prml_errata.tex
@@ -1955,10 +1955,33 @@ \subsubsection*{#1}
 is simply an overstatement.
 Bayesian methods, like any other machine learning methods, can overfit
 because the \emph{true} model from which the data set has been generated is unknown in general
-so that one could possibly assume an inappropriate model.
-For instance,
-if too broad the prior distribution~(3.52) is used in the Bayesian regression model of Section~3.3,
-this effectively leads to insufficient regularization and thus overfitting.
+so that one could possibly assume an inappropriate (too expressive) model
+that would give a terribly wrong prediction very confidently.
+This is true even when we take a ``fully'' Bayesian approach as discussed in the following.
+
+Let us take a Bayesian linear regression model of Section~3.3 as an example and
+suppose that the precision~$\beta$ of the target~$t$ in the likelihood~(3.8) is very large
+whereas the precision~$\alpha$ of the parameters~$\mathbf{w}$ in the prior~(3.52) is very small
+(i.e., the conditional distribution of $t$ given $\mathbf{w}$ is narrow whereas
+the prior over $\mathbf{w}$ is broad so that the regularization is insufficient).
+Then, the posterior~$p(\mathbf{w}|\bm{\mathsf{t}})$ given the data set~$\bm{\mathsf{t}}$ is
+sharply peaked around the ML estimate~$\mathbf{w}_{\text{ML}}$ and
+the predictive~$p(t|\bm{\mathsf{t}})$ is also sharply peaked
+(well approximated by the likelihood conditioned on $\mathbf{w}_{\text{ML}}$)
+so that the assumed model reduces to least squares.
+Of course, we can extend the model by incorporating hyperpriors over $\beta$ and $\alpha$,
+thus introducing more Bayesian averaging.
+However, if the extended model is not sensible
+(e.g., the hyperpriors are sharply peaked around wrong values),
+we shall again end up with a wrong posterior and a wrong predictive.
+
+The point here is that, since we do not know the true model,
+we cannot know whether the assumed model is sensible in advance
+(i.e., without any knowledge about the data).
+We can however assess whether a model is better than another
+in terms of, say, \emph{Bayesian model comparison} (Section~3.4),
+though a caveat is that we still need some (implicit) assumptions for this procedure to work;
+see the discussion around (3.73).
+
 Moreover, one should also be aware of a subtlety here, that is,
 (i)~the \emph{generalization error}, which can be measured by cross-validation, and
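The limiting behaviour that the errata text describes can be sketched explicitly from the posterior (3.53)--(3.54) of PRML; equation numbers below refer to that book, and the derivation is an editorial illustration rather than part of the commit:

```latex
% Posterior over w for the Bayesian linear regression model (Section 3.3):
%   p(w | t) = N(w | m_N, S_N),  with (3.53)-(3.54)
\begin{align*}
\mathbf{m}_N &= \beta\,\mathbf{S}_N \boldsymbol{\Phi}^{\mathrm{T}} \bm{\mathsf{t}},
&
\mathbf{S}_N^{-1} &= \alpha\,\mathbf{I} + \beta\,\boldsymbol{\Phi}^{\mathrm{T}} \boldsymbol{\Phi}.
\end{align*}
% Broad prior: as alpha -> 0 with beta fixed,
\begin{align*}
\mathbf{S}_N^{-1} \to \beta\,\boldsymbol{\Phi}^{\mathrm{T}} \boldsymbol{\Phi},
\qquad
\mathbf{m}_N \to
\left(\boldsymbol{\Phi}^{\mathrm{T}} \boldsymbol{\Phi}\right)^{-1}
\boldsymbol{\Phi}^{\mathrm{T}} \bm{\mathsf{t}}
= \mathbf{w}_{\text{ML}}
\quad \text{(cf.~(3.15))}.
\end{align*}
% Sharp likelihood: as beta -> infinity, S_N -> 0, so the posterior
% collapses to a point mass at w_ML, and the predictive variance (3.59)
\begin{align*}
\sigma_N^2(\mathbf{x})
= \frac{1}{\beta}
+ \boldsymbol{\phi}(\mathbf{x})^{\mathrm{T}} \mathbf{S}_N \boldsymbol{\phi}(\mathbf{x})
\to 0,
\end{align*}
% i.e., the model makes least-squares predictions with near-zero
% claimed uncertainty: overfitting with high confidence.
```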

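The scenario in the added paragraph can also be checked numerically. The following sketch (not part of the commit; data, hyperparameter values, and variable names are made up for illustration) fits a Bayesian cubic regression with a very broad prior and a very sharp likelihood, and confirms that the posterior mean collapses onto the least-squares solution while the predictive variance becomes tiny:

```python
# Hypothetical illustration: with prior precision alpha ~ 0 and
# likelihood precision beta very large, the Bayesian posterior
# collapses onto the maximum likelihood (least squares) fit and
# the predictive becomes overconfident.
import numpy as np

rng = np.random.default_rng(0)

N, M = 10, 4                               # data points; basis functions (cubic)
x = np.linspace(0.0, 1.0, N)
t = np.sin(2.0 * np.pi * x) + 0.25 * rng.standard_normal(N)

Phi = np.vander(x, M, increasing=True)     # polynomial design matrix

alpha, beta = 1e-12, 1e6                   # broad prior, sharp likelihood

# Posterior over w, cf. PRML (3.53)-(3.54): S_N^-1 = alpha I + beta Phi^T Phi
S_N_inv = alpha * np.eye(M) + beta * (Phi.T @ Phi)
m_N = beta * np.linalg.solve(S_N_inv, Phi.T @ t)

# Maximum likelihood (ordinary least squares) solution for comparison
w_ml, *_ = np.linalg.lstsq(Phi, t, rcond=None)

# Predictive variance at x = 0.5, cf. (3.59): 1/beta + phi^T S_N phi
phi = np.vander(np.array([0.5]), M, increasing=True)[0]
sigma2 = 1.0 / beta + phi @ np.linalg.solve(S_N_inv, phi)

print("max |m_N - w_ML|:", np.max(np.abs(m_N - w_ml)))   # essentially zero
print("predictive variance at x = 0.5:", sigma2)          # tiny: overconfident
```

The posterior mean agrees with the least-squares weights to numerical precision, exactly as the errata text argues.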
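The model-comparison remark in the added text can be illustrated the same way. This sketch (again editorial, not part of the commit) evaluates the log evidence ln p(t|alpha, beta) of PRML (3.86) for the degenerate hyperparameters above and for moderate ones on the same data; the specific hyperparameter values are arbitrary choices for illustration:

```python
# Hypothetical illustration of Bayesian model comparison via the
# log marginal likelihood (evidence), cf. PRML (3.86): the
# under-regularized hyperparameter setting receives far lower
# evidence than a moderate setting on the same data.
import numpy as np

def log_evidence(Phi, t, alpha, beta):
    """ln p(t | alpha, beta) for Bayesian linear regression, PRML (3.86)."""
    N, M = Phi.shape
    A = alpha * np.eye(M) + beta * (Phi.T @ Phi)       # = S_N^-1, cf. (3.81)
    m_N = beta * np.linalg.solve(A, Phi.T @ t)         # posterior mean, (3.53)
    E_mN = (beta / 2.0) * np.sum((t - Phi @ m_N) ** 2) \
         + (alpha / 2.0) * (m_N @ m_N)                 # cf. (3.82)
    return (M / 2.0) * np.log(alpha) + (N / 2.0) * np.log(beta) \
         - E_mN - 0.5 * np.linalg.slogdet(A)[1] - (N / 2.0) * np.log(2.0 * np.pi)

rng = np.random.default_rng(0)
N, M = 10, 4
x = np.linspace(0.0, 1.0, N)
t = np.sin(2.0 * np.pi * x) + 0.25 * rng.standard_normal(N)
Phi = np.vander(x, M, increasing=True)

moderate = log_evidence(Phi, t, alpha=5e-3, beta=11.1)
degenerate = log_evidence(Phi, t, alpha=1e-12, beta=1e6)
print("moderate:", moderate, "degenerate:", degenerate)
```

The degenerate setting is heavily penalized (its huge beta turns even small residuals into an enormous data-fit term), so the evidence prefers the moderate model, consistent with the caveat that this comparison itself rests on the assumed model family.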