---
title: "2. Bayes's Theorem for continuous distributions"
output: html_notebook
---
## Preliminaries
Load necessary libraries.
```{r}
# Load packages
library(tidyverse)
library(triangle) # for the triangular distribution
```
## Bayes with continuous parameters
Bayes's Theorem works a little differently for continuous random variables than for discrete ones. A discrete parameter can place nonzero probability on each of its possible values. A continuous parameter, by contrast, has uncountably many possible values, so the probability of any single value must be zero. We therefore work with probability density functions instead of probability mass functions.
To illustrate how this works, we define a function called `bayes_plot_continuous` that we'll use a little later.
```{r}
# bayes_plot_continuous takes a prior function
# and a likelihood function on a range of values,
# calculates the posterior, and then plots all three functions.
bayes_plot_continuous <- function(prior, likelihood, from, to) {
  # Calculate the area under the likelihood function (by integration).
  likelihood_area <- integrate(likelihood, lower = from, upper = to)
  # Scale the likelihood function by its area.
  likelihood_scaled <- function(theta) {
    likelihood(theta) / likelihood_area$value
  }
  # Calculate the numerator of the posterior function.
  posterior_numer <- function(theta) {
    prior(theta) * likelihood(theta)
  }
  # Calculate the denominator of the posterior function (by integration).
  posterior_denom <- integrate(posterior_numer, from, to)
  # The posterior is just the ratio.
  posterior <- function(theta) {
    posterior_numer(theta) / posterior_denom$value
  }
  # Plot the prior, posterior, and scaled likelihood together.
  posterior_plot <- ggplot(NULL, aes(x = x, color = col, linetype = col)) +
    stat_function(data = tibble(x = c(from, to), col = factor(1)),
                  fun = prior) +
    stat_function(data = tibble(x = c(from, to), col = factor(2)),
                  fun = posterior) +
    stat_function(data = tibble(x = c(from, to), col = factor(3)),
                  fun = likelihood_scaled) +
    theme_bw() +
    theme(panel.grid = element_blank()) +
    labs(title = "Prior, Posterior, and Scaled Likelihood",
         x = expression(theta),
         y = NULL) +
    scale_colour_manual(name = "Function",
                        values = c("blue", "black", "red"),
                        labels = c("Prior", "Posterior", "Scaled Likelihood")) +
    scale_linetype_manual(name = "Function",
                          values = c("dotted", "solid", "dashed"),
                          labels = c("Prior", "Posterior", "Scaled Likelihood"))
  posterior_plot
}
```
## Binomial likelihood
Suppose we have binomial data. In 18 trials, we observe 12 successes. The likelihood function is expressed as follows:
$$
p(x = 12 \mid \theta) \propto \theta^{12} (1 - \theta)^{6}.
$$
The parameter $\theta$ can take on any real number value between zero and one. Therefore, any probability distribution or likelihood function involving $\theta$ must be expressed as a continuous function (as opposed to, say, a table of values like for a discrete random variable).
The code below defines the binomial likelihood function for 12 successes in a sample of size 18.
```{r}
likelihood1 <- function(theta) { theta ^ 12 * (1 - theta) ^ 6 }
```
Look at a plot of this likelihood function:
```{r}
likelihood1_plot <- ggplot(tibble(x = c(0, 1)), aes(x = x)) +
  stat_function(fun = likelihood1) +
  theme_bw() +
  theme(panel.grid = element_blank()) +
  labs(title = "Binomial Likelihood",
       x = expression(theta),
       y = NULL)
likelihood1_plot
```
Keep in mind that this is not a probability density function; the area under this curve is nowhere close to one:
```{r}
integrate(likelihood1, lower = 0, upper = 1)
```
This plot shows, for each possible value of $\theta$, the probability of observing 12 successes in 18 trials given that value of $\theta$. If a sample of 18 is taken from a population whose success rate $\theta$ is somewhere in the 0.6--0.75 range, there is a pretty good chance of seeing 12 successes. Values outside that range make the observed data much less likely.
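As a quick check (an addition, not part of the original notebook), this area has a closed form: $\int_0^1 \theta^{12}(1-\theta)^6 \, d\theta = B(13, 7)$, the Beta function, so the scaled likelihood is exactly a $\text{Beta}(13, 7)$ density.

```{r}
# The numerical area under likelihood1 should match the exact
# Beta-function value beta(13, 7). (This check is my addition.)
all.equal(integrate(likelihood1, lower = 0, upper = 1)$value, beta(13, 7))
```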
## Choosing a uniform prior
Assume a uniform prior:
$$
p(\theta) = 1.
$$
Then the posterior is
$$
p(\theta \mid x) \propto p(\theta) p(x \mid \theta) = p(x \mid \theta).
$$
In this case, the posterior has exactly the same shape as the likelihood. Again, the likelihood function is not a probability density function---its area is not one. But we can scale the likelihood function by dividing it by its area, and now it will have area one. One can see in the graph that with a uniform prior, the posterior is just the scaled likelihood function. With no prior information, the data is our best guess for the posterior.
```{r}
prior1 <- function(theta) { 1 }
bayes_plot_continuous(prior1, likelihood1, 0, 1)
```
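In fact, this posterior has a closed form. The uniform prior is a $\text{Beta}(1, 1)$ distribution, and a Beta prior combined with a binomial likelihood always yields a Beta posterior (the Beta family is conjugate to the binomial):

$$
p(\theta \mid x) \propto \theta^{12} (1 - \theta)^{6}
\quad \Longrightarrow \quad
\theta \mid x \sim \text{Beta}(13, 7).
$$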
## Choosing a prior far from the data
Suppose we now choose a prior that is relatively far from the data, say, a normal distribution centered at 0.3 with standard deviation 0.1:
$$
\theta \sim N(0.3, 0.1).
$$
The posterior is a compromise between the prior and the data, so if the prior and data are far apart, the posterior will end up being in between them somewhere.
```{r}
prior2 <- function(theta) { dnorm(theta, mean = 0.3, sd = 0.1) }
bayes_plot_continuous(prior2, likelihood1, 0, 1)
```
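One way to see this compromise numerically (a check that isn't in the original notebook) is to compute the posterior mean by integration. It lands between the prior mean of 0.3 and the sample proportion $12/18 \approx 0.667$:

```{r}
# Posterior mean under the N(0.3, 0.1) prior, by numerical integration.
# (This check is my addition, not part of the original notebook.)
numer <- function(theta) prior2(theta) * likelihood1(theta)
Z <- integrate(numer, 0, 1)$value
integrate(function(theta) theta * numer(theta) / Z, 0, 1)$value
```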
## Choosing a prior close to the data
What about a prior that is close to the data? Something like
$$
\theta \sim N(0.7, 0.1).
$$
In this case, the data and the prior reinforce each other.
```{r}
prior3 <- function(theta) { dnorm(theta, mean = 0.7, sd = 0.1) }
bayes_plot_continuous(prior3, likelihood1, 0, 1)
```
## Changing the shape of the prior
Of course, any kind of distribution can be a prior. What about a triangular distribution? Note that even with the peak at 0.5, there is still a lot of prior mass spread from 0 to 1. This triangular prior is quite diffuse, so the data will have more weight in determining the posterior.
```{r}
# dtriangle's mode defaults to the midpoint (a + b)/2 = 0.5.
prior4 <- function(theta) { dtriangle(theta, a = 0, b = 1) }
bayes_plot_continuous(prior4, likelihood1, 0, 1)
```
## Changing the data
Now let's go back through the previous cases, but make the data much weaker. Instead of 12 successes in 18 trials, suppose we only observe 2 successes in 3 trials. Notice that the proportion of successes hasn't changed here, just the sample size. Observe that in each case, the posterior is much closer to the prior. There just isn't enough data for us to revise our prior belief radically.
```{r}
likelihood2 <- function(theta) { theta ^ 2 * (1 - theta) ^ 1 }
bayes_plot_continuous(prior1, likelihood2, 0, 1)
```
```{r}
bayes_plot_continuous(prior2, likelihood2, 0, 1)
```
```{r}
bayes_plot_continuous(prior3, likelihood2, 0, 1)
```
```{r}
bayes_plot_continuous(prior4, likelihood2, 0, 1)
```
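The claim that weaker data moves the posterior less can also be checked numerically. The helper `post_mean` below is my addition (not part of the notebook); under the $N(0.3, 0.1)$ prior, the weak data (2 of 3) pulls the posterior mean away from 0.3 far less than the strong data (12 of 18) does.

```{r}
# post_mean computes the posterior mean by numerical integration.
post_mean <- function(prior, likelihood) {
  numer <- function(theta) prior(theta) * likelihood(theta)
  Z <- integrate(numer, 0, 1)$value
  integrate(function(theta) theta * numer(theta) / Z, 0, 1)$value
}
# Compare the pull of the strong data versus the weak data.
c(strong = post_mean(prior2, likelihood1),
  weak   = post_mean(prior2, likelihood2))
```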