---
title: "2. Bayes's Theorem for continuous distributions"
output: html_notebook
---
## Preliminaries
Load necessary libraries.
```{r}
# Load packages
library(tidyverse)
library(triangle) # for the triangular distribution
```
## Bayes with continuous parameters
Bayes's Theorem works a little differently for continuous random variables than for discrete ones. A discrete parameter can place nonzero probability on each of its possible values. A continuous parameter, by contrast, has uncountably many possible values, so the probability of any single value must be zero. We therefore work with probability density functions instead of probability mass functions.
To illustrate how this works, we define a function called `bayes_plot_continuous` that we'll use a little later.
```{r}
# bayes_plot_continuous takes a prior function
# and a likelihood function on a range of values,
# calculates the posterior, and then plots all three functions.
bayes_plot_continuous <- function(prior, likelihood, from, to) {
  # Calculate the area under the likelihood function (by integration).
  likelihood_area <- integrate(likelihood, lower = from, upper = to)
  # Scale the likelihood function by its area.
  likelihood_scaled <- function(theta) {
    likelihood(theta) / likelihood_area$value
  }
  # Calculate the numerator of the posterior function.
  posterior_numer <- function(theta) {
    prior(theta) * likelihood(theta)
  }
  # Calculate the denominator of the posterior function (by integration).
  posterior_denom <- integrate(posterior_numer, from, to)
  # The posterior is just the ratio.
  posterior <- function(theta) {
    posterior_numer(theta) / posterior_denom$value
  }
  # Plot the prior, posterior, and scaled likelihood together.
  posterior_plot <- ggplot(NULL, aes(x = x, color = col, linetype = col)) +
    stat_function(data = tibble(x = c(from, to), col = factor(1)),
                  fun = prior) +
    stat_function(data = tibble(x = c(from, to), col = factor(2)),
                  fun = posterior) +
    stat_function(data = tibble(x = c(from, to), col = factor(3)),
                  fun = likelihood_scaled) +
    theme_bw() +
    theme(panel.grid = element_blank()) +
    labs(title = "Prior, Posterior, and Scaled Likelihood",
         x = expression(theta),
         y = NULL) +
    scale_colour_manual(name = "Function",
                        values = c("blue", "black", "red"),
                        labels = c("Prior", "Posterior", "Scaled Likelihood")) +
    scale_linetype_manual(name = "Function",
                          values = c("dotted", "solid", "dashed"),
                          labels = c("Prior", "Posterior", "Scaled Likelihood"))
  posterior_plot
}
```
## Binomial likelihood
Suppose we have binomial data. In 18 trials, we observe 12 successes. The likelihood function is expressed as follows:
$$
p(x = 12 \mid \theta) \propto \theta^{12} (1 - \theta)^{6}.
$$
The parameter $\theta$ can take on any real number value between zero and one. Therefore, any probability distribution or likelihood function involving $\theta$ must be expressed as a continuous function (as opposed to, say, a table of values like for a discrete random variable).
The code below defines the binomial likelihood function for 12 successes in a sample of size 18.
```{r}
likelihood1 <- function(theta) { theta ^ 12 * (1 - theta) ^ 6 }
```
Look at a plot of this likelihood function:
```{r}
likelihood1_plot <- ggplot(tibble(x = c(0, 1)), aes(x = x)) +
  stat_function(fun = likelihood1) +
  theme_bw() +
  theme(panel.grid = element_blank()) +
  labs(title = "Binomial Likelihood",
       x = expression(theta),
       y = NULL)
likelihood1_plot
```
Keep in mind that this is not a probability density function; the area under this curve is nowhere close to one:
```{r}
integrate(likelihood1, lower = 0, upper = 1)
```
This plot shows, for each possible value of $\theta$, the probability of observing 12 successes in 18 trials given that value of $\theta$. If a sample of 18 is taken from a population whose success rate $\theta$ is somewhere in the 0.6--0.75 range, there is a pretty good chance of seeing 12 successes. Values outside that range make the observed data much less likely.
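As a quick check (an addition, not part of the original notebook), this area has a closed form: $\int_0^1 \theta^{12}(1-\theta)^6 \, d\theta = B(13, 7)$, the Beta function, so the scaled likelihood is exactly a $\text{Beta}(13, 7)$ density.

```{r}
# The numerical area under likelihood1 should match the exact
# Beta-function value beta(13, 7). (This check is my addition.)
all.equal(integrate(likelihood1, lower = 0, upper = 1)$value, beta(13, 7))
```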
## Choosing a uniform prior
Assume a uniform prior:
$$
p(\theta) = 1.
$$
Then the posterior is
$$
p(\theta \mid x) \propto p(\theta) p(x \mid \theta) = p(x \mid \theta).
$$
In this case, the posterior has exactly the same shape as the likelihood. Again, the likelihood function is not a probability density function---its area is not one. But we can scale the likelihood function by dividing it by its area, and now it will have area one. One can see in the graph that with a uniform prior, the posterior is just the scaled likelihood function. With no prior information, the data is our best guess for the posterior.
```{r}
prior1 <- function(theta) { 1 }
bayes_plot_continuous(prior1, likelihood1, 0, 1)
```
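In fact, this posterior has a closed form. The uniform prior is a $\text{Beta}(1, 1)$ distribution, and a Beta prior combined with a binomial likelihood always yields a Beta posterior (the Beta family is conjugate to the binomial):

$$
p(\theta \mid x) \propto \theta^{12} (1 - \theta)^{6}
\quad \Longrightarrow \quad
\theta \mid x \sim \text{Beta}(13, 7).
$$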
## Choosing a prior far from the data
Suppose we now choose a prior that is relatively far from the data, say, a normal distribution centered at 0.3 with standard deviation 0.1:
$$
\theta \sim N(0.3, 0.1).
$$
The posterior is a compromise between the prior and the data, so if the prior and data are far apart, the posterior will end up being in between them somewhere.
```{r}
prior2 <- function(theta) { dnorm(theta, mean = 0.3, sd = 0.1) }
bayes_plot_continuous(prior2, likelihood1, 0, 1)
```
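One way to see this compromise numerically (a check that isn't in the original notebook) is to compute the posterior mean by integration. It lands between the prior mean of 0.3 and the sample proportion $12/18 \approx 0.667$:

```{r}
# Posterior mean under the N(0.3, 0.1) prior, by numerical integration.
# (This check is my addition, not part of the original notebook.)
numer <- function(theta) prior2(theta) * likelihood1(theta)
Z <- integrate(numer, 0, 1)$value
integrate(function(theta) theta * numer(theta) / Z, 0, 1)$value
```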
## Choosing a prior close to the data
What about a prior that is close to the data? Something like
$$
\theta \sim N(0.7, 0.1).
$$
In this case, the data and the prior reinforce each other.
```{r}
prior3 <- function(theta) { dnorm(theta, mean = 0.7, sd = 0.1) }
bayes_plot_continuous(prior3, likelihood1, 0, 1)
```
## Changing the shape of the prior
Of course, any kind of distribution can be a prior. What about a triangular distribution? Note that even with the peak at 0.5, there is still a lot of prior mass spread from 0 to 1. This triangular prior is quite diffuse, so the data will have more weight in determining the posterior.
```{r}
# dtriangle's mode defaults to the midpoint (a + b)/2 = 0.5.
prior4 <- function(theta) { dtriangle(theta, a = 0, b = 1) }
bayes_plot_continuous(prior4, likelihood1, 0, 1)
```
## Changing the data
Now let's go back through the previous cases, but make the data much weaker. Instead of 12 successes in 18 trials, suppose we only observe 2 successes in 3 trials. Notice that the proportion of successes hasn't changed here, just the sample size. Observe that in each case, the posterior is much closer to the prior. There just isn't enough data for us to revise our prior belief radically.
```{r}
likelihood2 <- function(theta) { theta ^ 2 * (1 - theta) ^ 1 }
bayes_plot_continuous(prior1, likelihood2, 0, 1)
```
```{r}
bayes_plot_continuous(prior2, likelihood2, 0, 1)
```
```{r}
bayes_plot_continuous(prior3, likelihood2, 0, 1)
```
```{r}
bayes_plot_continuous(prior4, likelihood2, 0, 1)
```
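The claim that weaker data moves the posterior less can also be checked numerically. The helper `post_mean` below is my addition (not part of the notebook); under the $N(0.3, 0.1)$ prior, the weak data (2 of 3) pulls the posterior mean away from 0.3 far less than the strong data (12 of 18) does.

```{r}
# post_mean computes the posterior mean by numerical integration.
post_mean <- function(prior, likelihood) {
  numer <- function(theta) prior(theta) * likelihood(theta)
  Z <- integrate(numer, 0, 1)$value
  integrate(function(theta) theta * numer(theta) / Z, 0, 1)$value
}
# Compare the pull of the strong data versus the weak data.
c(strong = post_mean(prior2, likelihood1),
  weak   = post_mean(prior2, likelihood2))
```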