Update truncated and MvNormal (#325)
* Update `truncated` and `MvNormal`

* Apply suggestions from code review

* Update tutorials/10-bayesian-differential-equations/10_bayesian-differential-equations.jmd

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update Manifest.toml

* Remove `Truncated`

* Remove use of `describe`

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
devmotion and github-actions[bot] authored Oct 10, 2022
1 parent 3cd94dd commit ca91841
Showing 17 changed files with 393 additions and 379 deletions.
2 changes: 0 additions & 2 deletions tutorials/02-logistic-regression/02_logistic-regression.jmd
@@ -131,8 +131,6 @@ n, _ = size(train)
# Sample using HMC.
m = logistic_regression(train, train_label, n, 1)
chain = sample(m, HMC(0.05, 10), MCMCThreads(), 1_500, 3)

describe(chain)
```

```julia; echo=false
@@ -12,7 +12,13 @@ We will begin with importing the relevant libraries.

```julia
# Import libraries.
using Turing, Flux, Plots, Random, ReverseDiff
using Turing
using Flux
using Plots
using ReverseDiff

using LinearAlgebra
using Random

# Hide sampling progress.
Turing.setprogress!(false);
@@ -87,14 +93,9 @@ length(parameters_initial) # number of parameters in NN
The probabilistic model specification below creates a `parameters` variable, which has IID normal variables. The `parameters` vector represents all parameters of our neural net (weights and biases).

```julia
# Create a regularization term and a Gaussian prior variance term.
alpha = 0.09
sig = sqrt(1.0 / alpha)

# Specify the probabilistic model.
@model function bayes_nn(xs, ts, nparameters, reconstruct)
@model function bayes_nn(xs, ts, nparameters, reconstruct; alpha=0.09)
# Create the weight and bias vector.
parameters ~ MvNormal(zeros(nparameters), sig .* ones(nparameters))
parameters ~ MvNormal(Zeros(nparameters), I / alpha)

# Construct NN from parameters
nn = reconstruct(parameters)
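(Editorial aside, not part of the commit.) The new prior can be sanity-checked against the old parameterization: passing `I / alpha` as a covariance gives every component of `parameters` the variance `1 / alpha`, the same spread as the old `sig = sqrt(1.0 / alpha)` standard deviation. A minimal sketch, assuming `Distributions`, `FillArrays`, and `LinearAlgebra` are available:

```julia
using Distributions, FillArrays, LinearAlgebra

alpha = 0.09
nparameters = 20

# New form: zero mean and covariance I / alpha, i.e. variance 1 / alpha per component.
d = MvNormal(Zeros(nparameters), I / alpha)

# The old form used the standard deviation sqrt(1 / alpha) on every component,
# so the per-component variances should agree.
var(d) ≈ fill(1 / alpha, nparameters)  # expected to be true
```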
3 changes: 3 additions & 0 deletions tutorials/03-bayesian-neural-network/Project.toml
@@ -1,7 +1,9 @@
[deps]
AdvancedVI = "b5ca4192-6429-45e5-a2d9-87aec30a685c"
Bijectors = "76274a88-744f-5084-9051-94815aaf08c4"
FillArrays = "1a297f60-69ca-5386-bcde-b61e274b549b"
Flux = "587475ba-b771-5e3f-ad9e-33799f191a9c"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
ReverseDiff = "37e2e3b7-166d-5795-8a7a-e32c996b4267"
@@ -10,6 +12,7 @@ Turing = "fce5fe82-541a-59a6-adf8-730c64b5f9a0"
[compat]
AdvancedVI = "0.1"
Bijectors = "0.9"
FillArrays = "0.11"
Flux = "0.12"
Plots = "1"
ReverseDiff = "1"
4 changes: 3 additions & 1 deletion tutorials/04-hidden-markov-model/04_hidden-markov-model.jmd
@@ -131,7 +131,9 @@ g = Gibbs(HMC(0.01, 50, :m, :T), PG(120, :s))
chn = sample(BayesHmm(y, 3), g, 1000);
```

Let's see how well our chain performed. Ordinarily, using the `describe` function from [MCMCChain](https://github.com/TuringLang/MCMCChain.jl) would be a good first step, but we have generated a lot of parameters here (`s[1]`, `s[2]`, `m[1]`, and so on). It's a bit easier to show how our model performed graphically.
Let's see how well our chain performed.
Ordinarily, using `display(chn)` would be a good first step, but we have generated a lot of parameters here (`s[1]`, `s[2]`, `m[1]`, and so on).
It's a bit easier to show how our model performed graphically.

The code below generates an animation showing the graph of the data above, and the data our model generates in each sample.

30 changes: 19 additions & 11 deletions tutorials/05-linear-regression/05_linear-regression.jmd
@@ -28,6 +28,12 @@ using MLDataUtils: shuffleobs, splitobs, rescale!
# Functionality for evaluating the model predictions.
using Distances

# Functionality for constructing arrays with identical elements efficiently.
using FillArrays

# Functionality for working with scaled identity matrices.
using LinearAlgebra

# Set a seed for reproducibility.
using Random
Random.seed!(0)
@@ -93,45 +99,47 @@ $$

where $\alpha$ is an intercept term common to all observations, $\boldsymbol{\beta}$ is a coefficient vector, $\boldsymbol{X_i}$ is the observed data for car $i$, and $\sigma^2$ is a common variance term.

For $\sigma^2$, we assign a prior of `truncated(Normal(0, 100), 0, Inf)`. This is consistent with [Andrew Gelman's recommendations](http://www.stat.columbia.edu/%7Egelman/research/published/taumain.pdf) on noninformative priors for variance. The intercept term ($\alpha$) is assumed to be normally distributed with a mean of zero and a variance of three. This represents our assumptions that miles per gallon can be explained mostly by our assorted variables, but a high variance term indicates our uncertainty about that. Each coefficient is assumed to be normally distributed with a mean of zero and a variance of 10. We do not know that our coefficients are different from zero, and we don't know which ones are likely to be the most important, so the variance term is quite high. Lastly, each observation $y_i$ is distributed according to the calculated `mu` term given by $\alpha + \boldsymbol{\beta}^\mathsf{T}\boldsymbol{X_i}$.
For $\sigma^2$, we assign a prior of `truncated(Normal(0, 100); lower=0)`.
This is consistent with [Andrew Gelman's recommendations](http://www.stat.columbia.edu/%7Egelman/research/published/taumain.pdf) on noninformative priors for variance.
The intercept term ($\alpha$) is assumed to be normally distributed with a mean of zero and a variance of three.
This represents our assumptions that miles per gallon can be explained mostly by our assorted variables, but a high variance term indicates our uncertainty about that.
Each coefficient is assumed to be normally distributed with a mean of zero and a variance of 10.
We do not know that our coefficients are different from zero, and we don't know which ones are likely to be the most important, so the variance term is quite high.
Lastly, each observation $y_i$ is distributed according to the calculated `mu` term given by $\alpha + \boldsymbol{\beta}^\mathsf{T}\boldsymbol{X_i}$.
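(Editorial aside, not part of the commit.) The keyword form `truncated(Normal(0, 100); lower=0)` describes the same distribution as the positional `truncated(Normal(0, 100), 0, Inf)` used previously; it simply avoids spelling out `Inf` for the unbounded side. A minimal sketch of that equivalence:

```julia
using Distributions

d_new = truncated(Normal(0, 100); lower=0)  # keyword form used in this commit
d_old = truncated(Normal(0, 100), 0, Inf)   # positional form used previously

mean(d_new) ≈ mean(d_old)  # expected to be true
minimum(d_new)             # 0.0, the lower truncation point
```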

```julia
# Bayesian linear regression.
@model function linear_regression(x, y)
# Set variance prior.
σ ~ truncated(Normal(0, 100), 0, Inf)
σ² ~ truncated(Normal(0, 100); lower=0)

# Set intercept prior.
intercept ~ Normal(0, sqrt(3))

# Set the priors on our coefficients.
nfeatures = size(x, 2)
coefficients ~ MvNormal(nfeatures, sqrt(10))
coefficients ~ MvNormal(Zeros(nfeatures), 10.0 * I)

# Calculate all the mu terms.
mu = intercept .+ x * coefficients
return y ~ MvNormal(mu, sqrt(σ₂))
return y ~ MvNormal(mu, σ² * I)
end
```
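(Editorial aside, not part of the commit.) Note that the scalar now enters the likelihood as a variance: `MvNormal(mu, σ² * I)` is the same as independent `Normal(mu[i], σ)` observations. A minimal sketch:

```julia
using Distributions, LinearAlgebra

mu = [1.0, 2.0, 3.0]
σ = 0.5
y = [1.1, 1.9, 3.2]

joint = MvNormal(mu, σ^2 * I)              # covariance parameterization
indep = [Normal(m, σ) for m in mu]         # equivalent independent normals

logpdf(joint, y) ≈ sum(logpdf.(indep, y))  # expected to be true
```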

With our model specified, we can call the sampler. We will use the No U-Turn Sampler ([NUTS](http://turing.ml/docs/library/#-turingnuts--type)) here.

```julia
model = linear_regression(train, train_target)
chain = sample(model, NUTS(0.65), 3_000);
chain = sample(model, NUTS(0.65), 3_000)
```

As a visual check to confirm that our coefficients have converged, we show the densities and trace plots for our parameters using the `plot` functionality.
We can also check the densities and traces of the parameters visually using the `plot` functionality.

```julia
plot(chain)
```

It looks like each of our parameters has converged. We can check our numerical esimates using `describe(chain)`, as below.

```julia
describe(chain)
```
It looks like all parameters have converged.

## Comparing to OLS

5 changes: 4 additions & 1 deletion tutorials/05-linear-regression/Project.toml
@@ -3,7 +3,9 @@ DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
Distances = "b4f34e82-e78d-54a5-968a-f98e89d6e8f7"
Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
FileIO = "5789e2e9-d7fb-5bc7-8068-2c6fae9b9549"
FillArrays = "1a297f60-69ca-5386-bcde-b61e274b549b"
GLM = "38e38edf-8417-5370-95a0-9cbb8c7f171a"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
MCMCChains = "c7f686f2-ff18-58e9-bc7b-31028e88f75d"
MLDataUtils = "cc2ba9b6-d476-5e6d-8eaf-a92d5412d41d"
NNlib = "872c559c-99b0-510c-b3b7-b6c96a88d5cd"
@@ -17,8 +19,9 @@ Turing = "fce5fe82-541a-59a6-adf8-730c64b5f9a0"
[compat]
DataFrames = "1"
Distances = "0.10"
Distributions = "0.25"
Distributions = "0.25.42"
FileIO = "1"
FillArrays = "0.12"
GLM = "1"
MCMCChains = "5"
MLDataUtils = "0.5"
17 changes: 6 additions & 11 deletions tutorials/07-poisson-regression/07_poisson-regression.jmd
@@ -202,19 +202,14 @@ The exponentiated mean of the coefficient `b1` is roughly half of that of `b2`.

# Removing the Warmup Samples

As can be seen from the plots above, the parameters converge to their final distributions after a few iterations. These initial values during the warmup phase increase the standard deviations of the parameters and are not required after we get the desired distributions. Thus, we remove these warmup values and once again view the diagnostics.

To remove these warmup values, we take all values except the first 200. This is because we set the second parameter of the NUTS sampler (which is the number of adaptations) to be equal to 200. `describe(chain)` is used to view the standard deviations in the estimates of the parameters. It also gives other useful information such as the means and the quantiles.

```julia
# Note the standard deviation before removing the warmup samples
describe(chain)
```
As can be seen from the plots above, the parameters converge to their final distributions after a few iterations.
The initial values during the warmup phase increase the standard deviations of the parameters and are not required after we get the desired distributions.
Thus, we remove these warmup values and once again view the diagnostics.
To remove these warmup values, we take all values except the first 200.
This is because we set the second parameter of the NUTS sampler (which is the number of adaptations) to be equal to 200.

```julia
# Removing the first 200 values of the chains.
chains_new = chain[201:2500, :, :]
describe(chains_new)
chains_new = chain[201:end, :, :]
```
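(Editorial aside, not part of the commit.) If a numerical summary is still wanted after dropping `describe`, MCMCChains exposes the two tables it used to print; a sketch, assuming the `summarystats` and `quantile` methods for `Chains` are available:

```julia
using MCMCChains

# Means, standard deviations, ESS, etc. of the post-warmup draws.
summarystats(chains_new)

# Posterior quantiles of the post-warmup draws.
quantile(chains_new)
```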

```julia
@@ -28,6 +28,12 @@ using MLDataUtils: shuffleobs, splitobs, rescale!
# We need a softmax function which is provided by NNlib.
using NNlib: softmax

# Functionality for constructing arrays with identical elements efficiently.
using FillArrays

# Functionality for working with scaled identity matrices.
using LinearAlgebra

# Set a seed for reproducibility.
using Random
Random.seed!(0)
@@ -100,8 +106,8 @@ We select the `setosa` species as the baseline class (the choice does not matter
# Priors of intercepts and coefficients.
intercept_versicolor ~ Normal(0, σ)
intercept_virginica ~ Normal(0, σ)
coefficients_versicolor ~ MvNormal(4, σ)
coefficients_virginica ~ MvNormal(4, σ)
coefficients_versicolor ~ MvNormal(Zeros(4), σ^2 * I)
coefficients_virginica ~ MvNormal(Zeros(4), σ^2 * I)

# Compute the likelihood of the observations.
values_versicolor = intercept_versicolor .+ x * coefficients_versicolor
3 changes: 3 additions & 0 deletions tutorials/08-multinomial-logistic-regression/Project.toml
@@ -1,4 +1,6 @@
[deps]
FillArrays = "1a297f60-69ca-5386-bcde-b61e274b549b"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
MLDataUtils = "cc2ba9b6-d476-5e6d-8eaf-a92d5412d41d"
NNlib = "872c559c-99b0-510c-b3b7-b6c96a88d5cd"
RDatasets = "ce6b1742-4840-55fa-b093-852dadbb1d8b"
@@ -7,6 +9,7 @@ StatsPlots = "f3b207a7-027a-5e70-b257-86293d7955fd"
Turing = "fce5fe82-541a-59a6-adf8-730c64b5f9a0"

[compat]
FillArrays = "0.13"
MLDataUtils = "0.5"
NNlib = "0.7, 0.8"
RDatasets = "0.7"
14 changes: 6 additions & 8 deletions tutorials/09-variational-inference/09_variational-inference.jmd
@@ -267,9 +267,11 @@ Random.seed!(1);
```

```julia
# Import RDatasets.
using FillArrays
using RDatasets

using LinearAlgebra

# Hide the progress prompt while sampling.
Turing.setprogress!(false);
```
@@ -333,17 +335,17 @@ test = Matrix(test_cut[:, remove_names]);
# Bayesian linear regression.
@model function linear_regression(x, y, n_obs, n_vars, ::Type{T}=Vector{Float64}) where {T}
# Set variance prior.
σ ~ truncated(Normal(0, 100), 0, Inf)
σ² ~ truncated(Normal(0, 100), 0, Inf)

# Set intercept prior.
intercept ~ Normal(0, 3)

# Set the priors on our coefficients.
coefficients ~ MvNormal(zeros(n_vars), 10 * ones(n_vars))
coefficients ~ MvNormal(Zeros(n_vars), 10.0 * I)

# Calculate all the mu terms.
mu = intercept .+ x * coefficients
return y ~ MvNormal(mu, σ)
return y ~ MvNormal(mu, σ² * I)
end;
```

@@ -682,10 +684,6 @@ But using this interface it becomes trivial to go beyond the mean-field assumpti

Here we'll instead consider the variational family to be a full non-diagonal multivariate Gaussian. As in the previous section we'll implement this by transforming a standard multivariate Gaussian using `Scale` and `Shift`, but now `Scale` will instead be using a lower-triangular matrix (representing the Cholesky of the covariance matrix of a multivariate normal) in contrast to the diagonal matrix we used in for the mean-field approximate posterior.
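(Editorial aside, not part of the commit.) The idea behind the lower-triangular scale is that if $z \sim \mathcal{N}(0, I)$, then $x = Lz$ has the full covariance $LL^\top$; a small sketch of that fact, without any Bijectors-specific API:

```julia
using LinearAlgebra

# A lower-triangular "scale" acting as a Cholesky factor of the covariance.
L = LowerTriangular([1.0 0.0 0.0; 0.4 1.2 0.0; -0.3 0.7 0.9])

# If z ~ N(0, I), then x = L * z has covariance L * L' — a full, non-diagonal
# Gaussian, unlike the diagonal scale used for the mean-field approximation.
Σ = L * L'
```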

```julia
using LinearAlgebra
```

```julia
# Using `ComponentArrays.jl` together with `UnPack.jl` makes our lives much easier.
using ComponentArrays, UnPack
3 changes: 3 additions & 0 deletions tutorials/09-variational-inference/Project.toml
@@ -3,8 +3,10 @@ Bijectors = "76274a88-744f-5084-9051-94815aaf08c4"
ComponentArrays = "b0b7db55-cfe3-40fc-9ded-d10e2dbeff66"
ConjugatePriors = "1624bea9-42b1-5fc1-afd3-e96f729c8d6c"
FileIO = "5789e2e9-d7fb-5bc7-8068-2c6fae9b9549"
FillArrays = "1a297f60-69ca-5386-bcde-b61e274b549b"
GLM = "38e38edf-8417-5370-95a0-9cbb8c7f171a"
LaTeXStrings = "b964fa9f-0449-5b57-a5c2-d3ea65f4040f"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
PyPlot = "d330b81b-6aea-500a-939a-2ce795aea3ee"
RDatasets = "ce6b1742-4840-55fa-b093-852dadbb1d8b"
@@ -18,6 +20,7 @@ Bijectors = "0.9"
ComponentArrays = "0.11"
ConjugatePriors = "0.4"
FileIO = "1.5"
FillArrays = "0.9"
GLM = "1.3"
LaTeXStrings = "1.2"
Plots = "1.6"
@@ -89,10 +89,10 @@ Therefore, we write the Lotka-Volterra parameter estimation problem using the Tu
@model function fitlv(data, prob)
# Prior distributions.
σ ~ InverseGamma(2, 3)
α ~ truncated(Normal(1.5, 0.5), 0.5, 2.5)
β ~ truncated(Normal(1.2, 0.5), 0, 2)
γ ~ truncated(Normal(3.0, 0.5), 1, 4)
δ ~ truncated(Normal(1.0, 0.5), 0, 2)
α ~ truncated(Normal(1.5, 0.5); lower=0.5, upper=2.5)
β ~ truncated(Normal(1.2, 0.5); lower=0, upper=2)
γ ~ truncated(Normal(3.0, 0.5); lower=1, upper=4)
δ ~ truncated(Normal(1.0, 0.5); lower=0, upper=2)

# Simulate Lotka-Volterra model.
p = [α, β, γ, δ]
@@ -151,10 +151,10 @@ I.e., we fit the model only to the $y$ variable of the system without providing
@model function fitlv2(data::AbstractVector, prob)
# Prior distributions.
σ ~ InverseGamma(2, 3)
α ~ truncated(Normal(1.5, 0.5), 0.5, 2.5)
β ~ truncated(Normal(1.2, 0.5), 0, 2)
γ ~ truncated(Normal(3.0, 0.5), 1, 4)
δ ~ truncated(Normal(1.0, 0.5), 0, 2)
α ~ truncated(Normal(1.5, 0.5); lower=0.5, upper=2.5)
β ~ truncated(Normal(1.2, 0.5); lower=0, upper=2)
γ ~ truncated(Normal(3.0, 0.5); lower=1, upper=4)
δ ~ truncated(Normal(1.0, 0.5); lower=0, upper=2)

# Simulate Lotka-Volterra model but save only the second state of the system (predators).
p = [α, β, γ, δ]
@@ -251,10 +251,10 @@ Now we define the Turing model for the Lotka-Volterra model with delay and sampl
@model function fitlv_dde(data, prob)
# Prior distributions.
σ ~ InverseGamma(2, 3)
α ~ Truncated(Normal(1.5, 0.5), 0.5, 2.5)
β ~ Truncated(Normal(1.2, 0.5), 0, 2)
γ ~ Truncated(Normal(3.0, 0.5), 1, 4)
δ ~ Truncated(Normal(1.0, 0.5), 0, 2)
α ~ truncated(Normal(1.5, 0.5); lower=0.5, upper=2.5)
β ~ truncated(Normal(1.2, 0.5); lower=0, upper=2)
γ ~ truncated(Normal(3.0, 0.5); lower=1, upper=4)
δ ~ truncated(Normal(1.0, 0.5); lower=0, upper=2)

# Simulate Lotka-Volterra model.
p = [α, β, γ, δ]
@@ -332,10 +332,10 @@ Here we will not choose a `sensealg` and let it use the default choice:
@model function fitlv_sensealg(data, prob)
# Prior distributions.
σ ~ InverseGamma(2, 3)
α ~ truncated(Normal(1.5, 0.5), 0.5, 2.5)
β ~ truncated(Normal(1.2, 0.5), 0, 2)
γ ~ truncated(Normal(3.0, 0.5), 1, 4)
δ ~ truncated(Normal(1.0, 0.5), 0, 2)
α ~ truncated(Normal(1.5, 0.5); lower=0.5, upper=2.5)
β ~ truncated(Normal(1.2, 0.5); lower=0, upper=2)
γ ~ truncated(Normal(3.0, 0.5); lower=1, upper=4)
δ ~ truncated(Normal(1.0, 0.5); lower=0, upper=2)

# Simulate Lotka-Volterra model and use a specific algorithm for computing sensitivities.
p = [α, β, γ, δ]
@@ -394,12 +394,12 @@ plot(EnsembleSummary(data))
@model function fitlv_sde(data, prob)
# Prior distributions.
σ ~ InverseGamma(2, 3)
α ~ truncated(Normal(1.3, 0.5), 0.5, 2.5)
β ~ truncated(Normal(1.2, 0.25), 0.5, 2)
γ ~ truncated(Normal(3.2, 0.25), 2.2, 4.0)
δ ~ truncated(Normal(1.2, 0.25), 0.5, 2.0)
ϕ1 ~ truncated(Normal(0.12, 0.3), 0.05, 0.25)
ϕ2 ~ truncated(Normal(0.12, 0.3), 0.05, 0.25)
α ~ truncated(Normal(1.3, 0.5); lower=0.5, upper=2.5)
β ~ truncated(Normal(1.2, 0.25); lower=0.5, upper=2)
γ ~ truncated(Normal(3.2, 0.25); lower=2.2, upper=4)
δ ~ truncated(Normal(1.2, 0.25); lower=0.5, upper=2)
ϕ1 ~ truncated(Normal(0.12, 0.3); lower=0.05, upper=0.25)
ϕ2 ~ truncated(Normal(0.12, 0.3); lower=0.05, upper=0.25)

# Simulate stochastic Lotka-Volterra model.
p = [α, β, γ, δ, ϕ1, ϕ2]
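(Editorial aside, not part of the commit.) The `Truncated` → `truncated` change in this file replaces direct use of the `Truncated` type with the public `truncated` function and its `lower`/`upper` keywords; the bounds behave as expected. A minimal sketch:

```julia
using Distributions

α_prior = truncated(Normal(1.5, 0.5); lower=0.5, upper=2.5)

minimum(α_prior), maximum(α_prior)  # (0.5, 2.5)
insupport(α_prior, 3.0)             # false: zero density outside the bounds
```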