forked from avehtari/BDA_course_Aalto
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathBDA_notes_ch3.tex
244 lines (207 loc) · 9.19 KB
/
BDA_notes_ch3.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
\documentclass[a4paper,11pt,english]{article}
\usepackage{babel}
%\usepackage[latin1]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{newtxtext} % times
\usepackage{amsmath}
\usepackage[varqu,varl]{inconsolata} % typewriter
\usepackage{microtype}
\usepackage{url}
\urlstyle{same}
\usepackage[bookmarks=false]{hyperref}
\hypersetup{%
bookmarksopen=true,
bookmarksnumbered=true,
pdftitle={Bayesian data analysis},
pdfsubject={Comments},
pdfauthor={Aki Vehtari},
pdfkeywords={Bayesian probability theory, Bayesian inference, Bayesian data analysis},
pdfstartview={FitH -32768},
colorlinks=true,
linkcolor=blue,
citecolor=blue,
filecolor=blue,
urlcolor=blue
}
% if not draft, smaller printable area makes the paper more readable
\topmargin -4mm
\oddsidemargin 0mm
\textheight 225mm
\textwidth 160mm
%\parskip=\baselineskip
\DeclareMathOperator{\E}{E}
\DeclareMathOperator{\Var}{Var}
\DeclareMathOperator{\var}{var}
\DeclareMathOperator{\Sd}{Sd}
\DeclareMathOperator{\sd}{sd}
\DeclareMathOperator{\Bin}{Bin}
\DeclareMathOperator{\Beta}{Beta}
\DeclareMathOperator{\Invchi2}{Inv-\chi^2}
\DeclareMathOperator{\NInvchi2}{N-Inv-\chi^2}
\DeclareMathOperator{\logit}{logit}
\DeclareMathOperator{\N}{N}
\DeclareMathOperator{\U}{U}
\DeclareMathOperator{\tr}{tr}
%\DeclareMathOperator{\Pr}{Pr}
\DeclareMathOperator{\trace}{trace}
\pagestyle{empty}
\begin{document}
\thispagestyle{empty}
\section*{Bayesian data analysis -- reading instructions 3}
\smallskip
{\bf Aki Vehtari}
\smallskip
\subsection*{Chapter 3}
Outline of the chapter 3
\begin{list}{$\bullet$}{\parsep=0pt\itemsep=2pt}
\item 3.1 Marginalisation
\item 3.2 Normal distribution with a noninformative prior (very important)
\item 3.3 Normal distribution with a conjugate prior (very important)
\item 3.4 Multinomial model (can be skipped)
\item 3.5 Multivariate normal with known variance (needed later)
\item 3.6 Multivariate normal with unknown variance (glance through)
\item 3.7 Bioassay example (very important, related to one of the exercises)
\item 3.8 Summary (summary)
\end{list}
Normal model is used a lot as a building block of the models in the
later chapters, so it is important to learn it now.
Bioassay example is good example used to illustrate many important
concepts and it is used in several exercises over the course.\\
R and Python demos at \url{https://avehtari.github.io/BDA_course_Aalto/demos.html}
\begin{list}{$\bullet$}{\parsep=0pt\itemsep=2pt}
\item demo3\_1: visualise joint density and marginal densities of
posterior of normal distribution with unknown mean and variance
\item demo3\_2: visualise factored sampling and corresponding
marginal and conditional density
\item demo3\_3: visualise marginal distribution of mu as a mixture of normals
\item demo3\_4: visualise sampling from the posterior predictive distribution
\item demo3\_5: visualise Newcomb's data
\item demo3\_6: visualise posterior in bioassay example
\end{list}
Find all the terms and symbols listed below. When reading the chapter,
write down questions related to things unclear for you or things you
think might be unclear for others. See also the additional comments below.
\begin{list}{$\bullet$}{\parsep=0pt\itemsep=2pt}
\item marginal distribution/density
\item conditional distribution/density
\item joint distribution/density
\item nuisance parameter
\item mixture
\item normal distribution with a noninformative prior
\item normal distribution with a conjugate prior
\item sample variance
\item sufficient statistics
\item $\mu$,$\sigma^2$,$\bar{y}$,$s^2$
\item a simple normal integral
\item $\Invchi2$
\item factored density
\item $t_{n-1}$
\item degrees of freedom
\item posterior predictive distribution
\item to draw
\item $\NInvchi2$
\item variance matrix $\Sigma$
\item nonconjugate model
\item generalized linear model
\item exchangeable
\item binomial model
\item logistic transformation
\item density at a grid
\end{list}
\subsection*{Conjugate prior for Gaussian distribution}
BDA p. 67 (BDA3) mentions that the conjugate prior for Gaussian
distribution has to have a product form $p(\sigma^2)p(\mu|\sigma^2)$.
The book refers to (3.2) and the following discussion. As additional
hint is useful to think the relation of terms $(n-1)s^2$ and
$n(\bar{y}-\mu)^2$ in 3.2 to equations 3.3 and 3.4.
\subsection*{Trace of square matrix}
Trace of square matrix, $\trace$, $\tr A$, $\trace(A)$, $\tr(A)$, is
the sum of diagonal elements. To derive equation 3.11 the following
property has been used $\tr(ABC) = \tr(CAB) = \tr(BCA)$.
\subsection*{History and naming of distributions}
See \emph{Earliest Known Uses of Some of the Words of Mathematics}
\url{http://jeff560.tripod.com/mathword.html}.
\subsection*{Using Monte Carlo to obtain draws from the posterior of generated quantities}
Chapter 3 discusses closed form posteriors for binomial and normal
models given conjugate priors. These are also used as part of the
assignment. The assignment also requires forming a posterior for
derived quantities, and these posterior don't have closed form (so no
need to try derive them). As we know how to sample from the posterior
of binomial and normal models, we can use these posterior draws to get
draws from the posterior of derived quantity.
For example, given posteriors $p(\theta_1|y_1)$ and $p(\theta_2|y_2)$ we want to find the posterior for the difference $p(\theta_1-\theta_2|y_1,y_2)$.
\begin{enumerate}
\item Sample $\theta_1^s$ from $p(\theta_1|y_1)$ and $\theta_2^s$ from
$p(\theta_2|y_2)$, we can compute posterior draws for the derived
quantity as $\delta^s=\theta_1^s-\theta_2^s$ ($s=1,\ldots,S$).
\item $\delta^s$ are then draws from $p(\delta^s|y_1,y_2)$, and they
can be used to illustrate the posterior $p(\delta^s|y_1,y_2)$ with
histogram, and compute posterior mean, sd, and quantiles.
\end{enumerate}
This is one reason why Monte Carlo approaches are so commonly used.
\subsection*{The number of required Monte Carlo draws}
This will discussed in chapter 10. Meanwhile, e.g., 1000 draws is sufficient.
\subsection*{Bioassay}
Bioassay example is is an example of very common statistical inference
task typical, for example, medicine, pharmacology, health care,
cognitive science, genomics, industrial processes etc.
The example is from Racine et al (1986) (see ref in the end of the
BDA3). Swiss company makes classification of chemicals to different
toxicity categories defined by authorities (like EU). Toxicity
classification is based on lethal dose 50\% (LD50) which tells what
amount of chemical kills 50\% of the subjects. Smaller the LD50 more
lethal the chemical is. The original paper mentions "1983 Swiss poison
Regulation" which defines following categories for chemicals orally
given to rats (mg/ml)\\
\begin{tabular}{c c}
Class & LD50 \\
1 & <5 \\
2 & 5-50 \\
3 & 50-500 \\
4 & 500-2000 \\
5 & 2000-5000
\end{tabular}\\
To reduce the number of rats needed in the experiments, the company
started to use Bayesian methods. The paper mentions that in those days
use of just 20 rats to define the classification was very little.
%
Book gives LD50 in log(g/ml). When the result from demo3\_6 is
transformed to scale mg/ml, we see that the mean LD50 is about 900 and
$p(500<\text{LD50}<2000)\approx 0.99$. Thus, the tested chemical can be
classified as category 4 toxic.
Note that the chemical testing is moving away from using rats and
other animals to using, for example, human cells grown in chips,
tissue models and human blood cells. The human-cell based approaches
are also more accurate to predict the effect for humans.
$\logit$ transformation can be justified information theoretically
when binomial likelihood is used.
Example codes in demo3\_6 can be helpful in exercises related to
Bioassay example.
\subsection*{Bayesian vs. frequentist statements in two group comparisons}
When asking to compare groups, some students get confused as the
frequentist testing is quite different. The frequentist testing is
often focusing on a) differently named tests for different models and
b) null hypothesis testing. In Bayesian inference a) the same Bayes
rule and investigation of posterior is used for all models, b) null
hypothesis testing is less common. We come later to decision making
given posterior and utility/ cost function (Lecture 10.1) and more
about null hypothesis testing (Lecture 12.1). Now it is assumed you
will report the posterior (e.g. histogram), possible summaries, and
report what you can infer from that. Specifically as in this
assignment the group comparisons are based on continuous model
parameter, the probability of 0 difference is 0 (later lecture 12.1
covers null hypothesis testing). Instead of forcing dichotomous answer
(yes/no) about whether there is difference, report the whole posterior
that tells also how big that difference might be. What big means
depends on the application, which brings us back to the fact of
importance of domain expertise. You are not experts on the application
examples used in the assignment, but you can think how would you
report what you have learned to a domain expert.
Frank Harrell's recommendations how to state results in two group
comparisons are excellent
\url{http://www.fharrell.com/2017/10/bayesian-vs-frequentist-statements.html}.
\end{document}
%%% Local Variables:
%%% TeX-PDF-mode: t
%%% TeX-master: t
%%% End: