Update truncated and MvNormal #325

Merged: 6 commits (Oct 10, 2022)
Changes from all commits
2 changes: 0 additions & 2 deletions tutorials/02-logistic-regression/02_logistic-regression.jmd
@@ -131,8 +131,6 @@ n, _ = size(train)
# Sample using HMC.
m = logistic_regression(train, train_label, n, 1)
chain = sample(m, HMC(0.05, 10), MCMCThreads(), 1_500, 3)

describe(chain)
```

```julia; echo=false

tutorials/03-bayesian-neural-network/03_bayesian-neural-network.jmd
@@ -12,7 +12,13 @@ We will begin with importing the relevant libraries.

```julia
# Import libraries.
using Turing, Flux, Plots, Random, ReverseDiff
using Turing
using Flux
using Plots
using ReverseDiff

using LinearAlgebra
using Random
Member: Is there any reason why we are switching styles for imports?

Member Author: There's no consistent style currently (and also not when this PR is merged); I mainly felt that the diff is simpler when adding or removing packages if they are on separate lines.


# Hide sampling progress.
Turing.setprogress!(false);
@@ -87,14 +93,9 @@ length(parameters_initial) # number of parameters in NN
The probabilistic model specification below creates a `parameters` variable, which has IID normal variables. The `parameters` vector represents all parameters of our neural net (weights and biases).

```julia
# Create a regularization term and a Gaussian prior variance term.
alpha = 0.09
sig = sqrt(1.0 / alpha)

# Specify the probabilistic model.
@model function bayes_nn(xs, ts, nparameters, reconstruct)
@model function bayes_nn(xs, ts, nparameters, reconstruct; alpha=0.09)
# Create the weight and bias vector.
parameters ~ MvNormal(zeros(nparameters), sig .* ones(nparameters))
parameters ~ MvNormal(Zeros(nparameters), I / alpha)

# Construct NN from parameters
nn = reconstruct(parameters)
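
For readers comparing the two prior parameterisations, here is a minimal sketch (not part of the PR; it assumes Distributions.jl 0.25.42 or later with FillArrays and LinearAlgebra available) checking that the new `MvNormal(Zeros(nparameters), I / alpha)` prior has the same mean and covariance as the old standard-deviation-based construction:

```julia
using Distributions, FillArrays, LinearAlgebra

alpha = 0.09
sig = sqrt(1.0 / alpha)   # per-component standard deviation used previously
nparameters = 5           # hypothetical size, purely for illustration

# New construction: zero mean and isotropic covariance (1 / alpha) * I.
d_new = MvNormal(Zeros(nparameters), I / alpha)

# The old prior used standard deviation `sig` per component,
# i.e. covariance sig^2 * I = (1 / alpha) * I, so the distributions agree.
mean(d_new) == zeros(nparameters)                          # true
cov(d_new) ≈ sig^2 * Matrix(I, nparameters, nparameters)   # true
```
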
3 changes: 3 additions & 0 deletions tutorials/03-bayesian-neural-network/Project.toml
@@ -1,7 +1,9 @@
[deps]
AdvancedVI = "b5ca4192-6429-45e5-a2d9-87aec30a685c"
Bijectors = "76274a88-744f-5084-9051-94815aaf08c4"
FillArrays = "1a297f60-69ca-5386-bcde-b61e274b549b"
Flux = "587475ba-b771-5e3f-ad9e-33799f191a9c"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
ReverseDiff = "37e2e3b7-166d-5795-8a7a-e32c996b4267"
@@ -10,6 +12,7 @@ Turing = "fce5fe82-541a-59a6-adf8-730c64b5f9a0"
[compat]
AdvancedVI = "0.1"
Bijectors = "0.9"
FillArrays = "0.11"
Flux = "0.12"
Plots = "1"
ReverseDiff = "1"
4 changes: 3 additions & 1 deletion tutorials/04-hidden-markov-model/04_hidden-markov-model.jmd
@@ -131,7 +131,9 @@ g = Gibbs(HMC(0.01, 50, :m, :T), PG(120, :s))
chn = sample(BayesHmm(y, 3), g, 1000);
```

Let's see how well our chain performed. Ordinarily, using the `describe` function from [MCMCChain](https://github.com/TuringLang/MCMCChain.jl) would be a good first step, but we have generated a lot of parameters here (`s[1]`, `s[2]`, `m[1]`, and so on). It's a bit easier to show how our model performed graphically.
Let's see how well our chain performed.
Ordinarily, using `display(chn)` would be a good first step, but we have generated a lot of parameters here (`s[1]`, `s[2]`, `m[1]`, and so on).
It's a bit easier to show how our model performed graphically.
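
If a compact numerical summary is still wanted, one option is to look at only a subset of the parameters; a hedged sketch (it assumes MCMCChains exports `group`, which collects all parameters sharing a name prefix such as `m`):

```julia
# Summarise only the emission means m[1], m[2], m[3] instead of the full chain.
# `group` comes from MCMCChains; adjust the symbol if the parameter names differ.
m_chain = group(chn, :m)
display(m_chain)
```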

The code below generates an animation showing the graph of the data above, and the data our model generates in each sample.

30 changes: 19 additions & 11 deletions tutorials/05-linear-regression/05_linear-regression.jmd
@@ -28,6 +28,12 @@ using MLDataUtils: shuffleobs, splitobs, rescale!
# Functionality for evaluating the model predictions.
using Distances

# Functionality for constructing arrays with identical elements efficiently.
using FillArrays

# Functionality for working with scaled identity matrices.
using LinearAlgebra

# Set a seed for reproducibility.
using Random
Random.seed!(0)
@@ -93,45 +99,47 @@ $$

where $\alpha$ is an intercept term common to all observations, $\boldsymbol{\beta}$ is a coefficient vector, $\boldsymbol{X_i}$ is the observed data for car $i$, and $\sigma^2$ is a common variance term.

For $\sigma^2$, we assign a prior of `truncated(Normal(0, 100), 0, Inf)`. This is consistent with [Andrew Gelman's recommendations](http://www.stat.columbia.edu/%7Egelman/research/published/taumain.pdf) on noninformative priors for variance. The intercept term ($\alpha$) is assumed to be normally distributed with a mean of zero and a variance of three. This represents our assumptions that miles per gallon can be explained mostly by our assorted variables, but a high variance term indicates our uncertainty about that. Each coefficient is assumed to be normally distributed with a mean of zero and a variance of 10. We do not know that our coefficients are different from zero, and we don't know which ones are likely to be the most important, so the variance term is quite high. Lastly, each observation $y_i$ is distributed according to the calculated `mu` term given by $\alpha + \boldsymbol{\beta}^\mathsf{T}\boldsymbol{X_i}$.
For $\sigma^2$, we assign a prior of `truncated(Normal(0, 100); lower=0)`.
This is consistent with [Andrew Gelman's recommendations](http://www.stat.columbia.edu/%7Egelman/research/published/taumain.pdf) on noninformative priors for variance.
The intercept term ($\alpha$) is assumed to be normally distributed with a mean of zero and a variance of three.
This represents our assumptions that miles per gallon can be explained mostly by our assorted variables, but a high variance term indicates our uncertainty about that.
Each coefficient is assumed to be normally distributed with a mean of zero and a variance of 10.
We do not know that our coefficients are different from zero, and we don't know which ones are likely to be the most important, so the variance term is quite high.
Lastly, each observation $y_i$ is distributed according to the calculated `mu` term given by $\alpha + \boldsymbol{\beta}^\mathsf{T}\boldsymbol{X_i}$.
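
Written out (a restatement of the prose above, with $\mathcal{N}(m, v)$ denoting a normal distribution with mean $m$ and variance $v$), the priors are

$$
\begin{aligned}
\sigma^2 &\sim \mathcal{N}_{+}(0, 100^2), \\
\alpha &\sim \mathcal{N}(0, 3), \\
\beta_j &\sim \mathcal{N}(0, 10), \quad j = 1, \dots, n_{\mathrm{features}}, \\
y_i \mid \alpha, \boldsymbol{\beta}, \sigma^2 &\sim \mathcal{N}\left(\alpha + \boldsymbol{\beta}^\mathsf{T}\boldsymbol{X_i}, \sigma^2\right),
\end{aligned}
$$

where $\mathcal{N}_{+}$ denotes truncation to the positive half-line.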

```julia
# Bayesian linear regression.
@model function linear_regression(x, y)
# Set variance prior.
σ ~ truncated(Normal(0, 100), 0, Inf)
σ² ~ truncated(Normal(0, 100); lower=0)

# Set intercept prior.
intercept ~ Normal(0, sqrt(3))

# Set the priors on our coefficients.
nfeatures = size(x, 2)
coefficients ~ MvNormal(nfeatures, sqrt(10))
coefficients ~ MvNormal(Zeros(nfeatures), 10.0 * I)

# Calculate all the mu terms.
mu = intercept .+ x * coefficients
return y ~ MvNormal(mu, sqrt(σ₂))
return y ~ MvNormal(mu, σ² * I)
Member: This is nicer than the old syntax.

end
```
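
As an aside on the `truncated(Normal(0, 100); lower=0)` prior: a quick sketch (it assumes a Distributions.jl version recent enough to support the keyword interface) showing that the keyword form defines the same distribution as the older positional call with an explicit `Inf` upper bound:

```julia
using Distributions

d_old = truncated(Normal(0, 100), 0, Inf)    # positional lower and upper bounds
d_new = truncated(Normal(0, 100); lower=0)   # one-sided keyword form

# Same support and same density; the keyword form simply avoids spelling out Inf.
logpdf(d_old, 50.0) ≈ logpdf(d_new, 50.0)   # true
(minimum(d_new), maximum(d_new))            # (0.0, Inf)
```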

With our model specified, we can call the sampler. We will use the No U-Turn Sampler ([NUTS](http://turing.ml/docs/library/#-turingnuts--type)) here.

```julia
model = linear_regression(train, train_target)
chain = sample(model, NUTS(0.65), 3_000);
chain = sample(model, NUTS(0.65), 3_000)
```

As a visual check to confirm that our coefficients have converged, we show the densities and trace plots for our parameters using the `plot` functionality.
We can also check the densities and traces of the parameters visually using the `plot` functionality.

```julia
plot(chain)
```

It looks like each of our parameters has converged. We can check our numerical esimates using `describe(chain)`, as below.

```julia
describe(chain)
```
It looks like all parameters have converged.

## Comparing to OLS

5 changes: 4 additions & 1 deletion tutorials/05-linear-regression/Project.toml
@@ -3,7 +3,9 @@ DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
Distances = "b4f34e82-e78d-54a5-968a-f98e89d6e8f7"
Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
FileIO = "5789e2e9-d7fb-5bc7-8068-2c6fae9b9549"
FillArrays = "1a297f60-69ca-5386-bcde-b61e274b549b"
GLM = "38e38edf-8417-5370-95a0-9cbb8c7f171a"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
MCMCChains = "c7f686f2-ff18-58e9-bc7b-31028e88f75d"
MLDataUtils = "cc2ba9b6-d476-5e6d-8eaf-a92d5412d41d"
NNlib = "872c559c-99b0-510c-b3b7-b6c96a88d5cd"
@@ -17,8 +19,9 @@ Turing = "fce5fe82-541a-59a6-adf8-730c64b5f9a0"
[compat]
DataFrames = "1"
Distances = "0.10"
Distributions = "0.25"
Distributions = "0.25.42"
FileIO = "1"
FillArrays = "0.12"
GLM = "1"
MCMCChains = "5"
MLDataUtils = "0.5"
17 changes: 6 additions & 11 deletions tutorials/07-poisson-regression/07_poisson-regression.jmd
@@ -202,19 +202,14 @@ The exponentiated mean of the coefficient `b1` is roughly half of that of `b2`.

# Removing the Warmup Samples

As can be seen from the plots above, the parameters converge to their final distributions after a few iterations. These initial values during the warmup phase increase the standard deviations of the parameters and are not required after we get the desired distributions. Thus, we remove these warmup values and once again view the diagnostics.

To remove these warmup values, we take all values except the first 200. This is because we set the second parameter of the NUTS sampler (which is the number of adaptations) to be equal to 200. `describe(chain)` is used to view the standard deviations in the estimates of the parameters. It also gives other useful information such as the means and the quantiles.

```julia
# Note the standard deviation before removing the warmup samples
describe(chain)
```
As can be seen from the plots above, the parameters converge to their final distributions after a few iterations.
The initial values during the warmup phase increase the standard deviations of the parameters and are not required after we get the desired distributions.
Thus, we remove these warmup values and once again view the diagnostics.
To remove these warmup values, we take all values except the first 200.
This is because we set the second parameter of the NUTS sampler (which is the number of adaptations) to be equal to 200.

```julia
# Removing the first 200 values of the chains.
chains_new = chain[201:2500, :, :]
describe(chains_new)
chains_new = chain[201:end, :, :]
```
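
Depending on the installed Turing and AbstractMCMC versions, the warmup draws can also be dropped at sampling time rather than sliced off afterwards. A hedged sketch, in which the `discard_initial` keyword and the model name `m` are assumptions to check against the tutorial's actual code:

```julia
# Assumes `m` is the Poisson-regression model defined earlier in the tutorial
# and that the installed AbstractMCMC supports the `discard_initial` keyword.
chain_no_warmup = sample(m, NUTS(200, 0.65), 2500; discard_initial=200)
```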

```julia

tutorials/08-multinomial-logistic-regression/08_multinomial-logistic-regression.jmd
@@ -28,6 +28,12 @@ using MLDataUtils: shuffleobs, splitobs, rescale!
# We need a softmax function which is provided by NNlib.
using NNlib: softmax

# Functionality for constructing arrays with identical elements efficiently.
using FillArrays

# Functionality for working with scaled identity matrices.
using LinearAlgebra

# Set a seed for reproducibility.
using Random
Random.seed!(0)
@@ -100,8 +106,8 @@ We select the `setosa` species as the baseline class (the choice does not matter).
# Priors of intercepts and coefficients.
intercept_versicolor ~ Normal(0, σ)
intercept_virginica ~ Normal(0, σ)
coefficients_versicolor ~ MvNormal(4, σ)
coefficients_virginica ~ MvNormal(4, σ)
coefficients_versicolor ~ MvNormal(Zeros(4), σ^2 * I)
coefficients_virginica ~ MvNormal(Zeros(4), σ^2 * I)

# Compute the likelihood of the observations.
values_versicolor = intercept_versicolor .+ x * coefficients_versicolor
3 changes: 3 additions & 0 deletions tutorials/08-multinomial-logistic-regression/Project.toml
@@ -1,4 +1,6 @@
[deps]
FillArrays = "1a297f60-69ca-5386-bcde-b61e274b549b"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
MLDataUtils = "cc2ba9b6-d476-5e6d-8eaf-a92d5412d41d"
NNlib = "872c559c-99b0-510c-b3b7-b6c96a88d5cd"
RDatasets = "ce6b1742-4840-55fa-b093-852dadbb1d8b"
@@ -7,6 +9,7 @@ StatsPlots = "f3b207a7-027a-5e70-b257-86293d7955fd"
Turing = "fce5fe82-541a-59a6-adf8-730c64b5f9a0"

[compat]
FillArrays = "0.13"
MLDataUtils = "0.5"
NNlib = "0.7, 0.8"
RDatasets = "0.7"
14 changes: 6 additions & 8 deletions tutorials/09-variational-inference/09_variational-inference.jmd
@@ -267,9 +267,11 @@ Random.seed!(1);
```

```julia
# Import RDatasets.
using FillArrays
using RDatasets

using LinearAlgebra

# Hide the progress prompt while sampling.
Turing.setprogress!(false);
```
@@ -333,17 +335,17 @@ test = Matrix(test_cut[:, remove_names]);
# Bayesian linear regression.
@model function linear_regression(x, y, n_obs, n_vars, ::Type{T}=Vector{Float64}) where {T}
# Set variance prior.
σ ~ truncated(Normal(0, 100), 0, Inf)
σ² ~ truncated(Normal(0, 100), 0, Inf)

# Set intercept prior.
intercept ~ Normal(0, 3)

# Set the priors on our coefficients.
coefficients ~ MvNormal(zeros(n_vars), 10 * ones(n_vars))
coefficients ~ MvNormal(Zeros(n_vars), 10.0 * I)

# Calculate all the mu terms.
mu = intercept .+ x * coefficients
return y ~ MvNormal(mu, σ)
return y ~ MvNormal(mu, σ² * I)
end;
```

@@ -682,10 +684,6 @@ But using this interface it becomes trivial to go beyond the mean-field assumption

Here we'll instead consider the variational family to be a full non-diagonal multivariate Gaussian. As in the previous section we'll implement this by transforming a standard multivariate Gaussian using `Scale` and `Shift`, but now `Scale` will instead be using a lower-triangular matrix (representing the Cholesky of the covariance matrix of a multivariate normal) in contrast to the diagonal matrix we used in for the mean-field approximate posterior.
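
A minimal sketch of that construction (assumptions: `Bijectors.Scale` accepts a matrix argument, and the dimensionality and parameter values below are placeholders rather than the tutorial's):

```julia
using Bijectors, Distributions, LinearAlgebra

d = 3                                  # placeholder dimensionality
base = MvNormal(zeros(d), I)           # standard multivariate Gaussian base

# Lower-triangular scale, playing the role of a Cholesky factor of the covariance.
L = LowerTriangular(0.1 .* randn(d, d) + I)
b = Bijectors.Shift(zeros(d)) ∘ Bijectors.Scale(L)

q = transformed(base, b)               # full-covariance Gaussian approximation
rand(q)                                # draws now have correlated components
```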

```julia
using LinearAlgebra
```

```julia
# Using `ComponentArrays.jl` together with `UnPack.jl` makes our lives much easier.
using ComponentArrays, UnPack
3 changes: 3 additions & 0 deletions tutorials/09-variational-inference/Project.toml
@@ -3,8 +3,10 @@ Bijectors = "76274a88-744f-5084-9051-94815aaf08c4"
ComponentArrays = "b0b7db55-cfe3-40fc-9ded-d10e2dbeff66"
ConjugatePriors = "1624bea9-42b1-5fc1-afd3-e96f729c8d6c"
FileIO = "5789e2e9-d7fb-5bc7-8068-2c6fae9b9549"
FillArrays = "1a297f60-69ca-5386-bcde-b61e274b549b"
GLM = "38e38edf-8417-5370-95a0-9cbb8c7f171a"
LaTeXStrings = "b964fa9f-0449-5b57-a5c2-d3ea65f4040f"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
PyPlot = "d330b81b-6aea-500a-939a-2ce795aea3ee"
RDatasets = "ce6b1742-4840-55fa-b093-852dadbb1d8b"
@@ -18,6 +20,7 @@ Bijectors = "0.9"
ComponentArrays = "0.11"
ConjugatePriors = "0.4"
FileIO = "1.5"
FillArrays = "0.9"
GLM = "1.3"
LaTeXStrings = "1.2"
Plots = "1.6"

tutorials/10-bayesian-differential-equations/10_bayesian-differential-equations.jmd
@@ -89,10 +89,10 @@ Therefore, we write the Lotka-Volterra parameter estimation problem using the Turing
@model function fitlv(data, prob)
# Prior distributions.
σ ~ InverseGamma(2, 3)
α ~ truncated(Normal(1.5, 0.5), 0.5, 2.5)
β ~ truncated(Normal(1.2, 0.5), 0, 2)
γ ~ truncated(Normal(3.0, 0.5), 1, 4)
δ ~ truncated(Normal(1.0, 0.5), 0, 2)
α ~ truncated(Normal(1.5, 0.5); lower=0.5, upper=2.5)
β ~ truncated(Normal(1.2, 0.5); lower=0, upper=2)
γ ~ truncated(Normal(3.0, 0.5); lower=1, upper=4)
δ ~ truncated(Normal(1.0, 0.5); lower=0, upper=2)

# Simulate Lotka-Volterra model.
p = [α, β, γ, δ]
@@ -151,10 +151,10 @@ I.e., we fit the model only to the $y$ variable of the system without providing
@model function fitlv2(data::AbstractVector, prob)
# Prior distributions.
σ ~ InverseGamma(2, 3)
α ~ truncated(Normal(1.5, 0.5), 0.5, 2.5)
β ~ truncated(Normal(1.2, 0.5), 0, 2)
γ ~ truncated(Normal(3.0, 0.5), 1, 4)
δ ~ truncated(Normal(1.0, 0.5), 0, 2)
α ~ truncated(Normal(1.5, 0.5); lower=0.5, upper=2.5)
β ~ truncated(Normal(1.2, 0.5); lower=0, upper=2)
γ ~ truncated(Normal(3.0, 0.5); lower=1, upper=4)
δ ~ truncated(Normal(1.0, 0.5); lower=0, upper=2)

# Simulate Lotka-Volterra model but save only the second state of the system (predators).
p = [α, β, γ, δ]
@@ -251,10 +251,10 @@ Now we define the Turing model for the Lotka-Volterra model with delay and sample
@model function fitlv_dde(data, prob)
# Prior distributions.
σ ~ InverseGamma(2, 3)
α ~ Truncated(Normal(1.5, 0.5), 0.5, 2.5)
β ~ Truncated(Normal(1.2, 0.5), 0, 2)
γ ~ Truncated(Normal(3.0, 0.5), 1, 4)
δ ~ Truncated(Normal(1.0, 0.5), 0, 2)
α ~ truncated(Normal(1.5, 0.5); lower=0.5, upper=2.5)
β ~ truncated(Normal(1.2, 0.5); lower=0, upper=2)
γ ~ truncated(Normal(3.0, 0.5); lower=1, upper=4)
δ ~ truncated(Normal(1.0, 0.5); lower=0, upper=2)

# Simulate Lotka-Volterra model.
p = [α, β, γ, δ]
@@ -332,10 +332,10 @@ Here we will not choose a `sensealg` and let it use the default choice:
@model function fitlv_sensealg(data, prob)
# Prior distributions.
σ ~ InverseGamma(2, 3)
α ~ truncated(Normal(1.5, 0.5), 0.5, 2.5)
β ~ truncated(Normal(1.2, 0.5), 0, 2)
γ ~ truncated(Normal(3.0, 0.5), 1, 4)
δ ~ truncated(Normal(1.0, 0.5), 0, 2)
α ~ truncated(Normal(1.5, 0.5); lower=0.5, upper=2.5)
β ~ truncated(Normal(1.2, 0.5); lower=0, upper=2)
γ ~ truncated(Normal(3.0, 0.5); lower=1, upper=4)
δ ~ truncated(Normal(1.0, 0.5); lower=0, upper=2)

# Simulate Lotka-Volterra model and use a specific algorithm for computing sensitivities.
p = [α, β, γ, δ]
@@ -394,12 +394,12 @@ plot(EnsembleSummary(data))
@model function fitlv_sde(data, prob)
# Prior distributions.
σ ~ InverseGamma(2, 3)
α ~ truncated(Normal(1.3, 0.5), 0.5, 2.5)
β ~ truncated(Normal(1.2, 0.25), 0.5, 2)
γ ~ truncated(Normal(3.2, 0.25), 2.2, 4.0)
δ ~ truncated(Normal(1.2, 0.25), 0.5, 2.0)
ϕ1 ~ truncated(Normal(0.12, 0.3), 0.05, 0.25)
ϕ2 ~ truncated(Normal(0.12, 0.3), 0.05, 0.25)
α ~ truncated(Normal(1.3, 0.5); lower=0.5, upper=2.5)
β ~ truncated(Normal(1.2, 0.25); lower=0.5, upper=2)
γ ~ truncated(Normal(3.2, 0.25); lower=2.2, upper=4)
δ ~ truncated(Normal(1.2, 0.25); lower=0.5, upper=2)
ϕ1 ~ truncated(Normal(0.12, 0.3); lower=0.05, upper=0.25)
ϕ2 ~ truncated(Normal(0.12, 0.3); lower=0.05, upper=0.25)

# Simulate stochastic Lotka-Volterra model.
p = [α, β, γ, δ, ϕ1, ϕ2]
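
On the change from `Truncated` to `truncated` throughout this file: a short sketch (independent of the tutorial code) of the difference. The lowercase `truncated` is the public interface and can dispatch to specialised truncation types, while `Truncated` constructs the wrapper struct directly; for a plain truncated normal both give the same distribution:

```julia
using Distributions

d1 = Truncated(Normal(1.5, 0.5), 0.5, 2.5)               # direct struct constructor
d2 = truncated(Normal(1.5, 0.5); lower=0.5, upper=2.5)   # recommended interface

# Both describe the same truncated normal.
logpdf(d1, 1.0) ≈ logpdf(d2, 1.0)   # true
```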