Update truncated and MvNormal (#325)
* Update `truncated` and `MvNormal`

* Apply suggestions from code review

* Update tutorials/10-bayesian-differential-equations/10_bayesian-differential-equations.jmd

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update Manifest.toml

* Remove `Truncated`

* Remove use of `describe`

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
devmotion and github-actions[bot] authored Oct 10, 2022
1 parent 3cd94dd commit ca91841
Showing 17 changed files with 393 additions and 379 deletions.
2 changes: 0 additions & 2 deletions tutorials/02-logistic-regression/02_logistic-regression.jmd
@@ -131,8 +131,6 @@ n, _ = size(train)
# Sample using HMC.
m = logistic_regression(train, train_label, n, 1)
chain = sample(m, HMC(0.05, 10), MCMCThreads(), 1_500, 3)

describe(chain)
```

```julia; echo=false
@@ -12,7 +12,13 @@ We will begin with importing the relevant libraries.

```julia
# Import libraries.
using Turing, Flux, Plots, Random, ReverseDiff
using Turing
using Flux
using Plots
using ReverseDiff

using LinearAlgebra
using Random

# Hide sampling progress.
Turing.setprogress!(false);
@@ -87,14 +93,9 @@ length(parameters_initial) # number of parameters in NN
The probabilistic model specification below creates a `parameters` variable, which has IID normal variables. The `parameters` vector represents all parameters of our neural net (weights and biases).

```julia
# Create a regularization term and a Gaussian prior variance term.
alpha = 0.09
sig = sqrt(1.0 / alpha)

# Specify the probabilistic model.
@model function bayes_nn(xs, ts, nparameters, reconstruct)
@model function bayes_nn(xs, ts, nparameters, reconstruct; alpha=0.09)
# Create the weight and bias vector.
parameters ~ MvNormal(zeros(nparameters), sig .* ones(nparameters))
parameters ~ MvNormal(Zeros(nparameters), I / alpha)

# Construct NN from parameters
nn = reconstruct(parameters)
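(Editorial aside, not part of the commit.) The new prior can be sanity-checked against the old parameterization: passing `I / alpha` as a covariance gives every component of `parameters` the variance `1 / alpha`, the same spread as the old `sig = sqrt(1.0 / alpha)` standard deviation. A minimal sketch, assuming `Distributions`, `FillArrays`, and `LinearAlgebra` are available:

```julia
using Distributions, FillArrays, LinearAlgebra

alpha = 0.09
nparameters = 20

# New form: zero mean and covariance I / alpha, i.e. variance 1 / alpha per component.
d = MvNormal(Zeros(nparameters), I / alpha)

# The old form used the standard deviation sqrt(1 / alpha) on every component,
# so the per-component variances should agree.
var(d) ≈ fill(1 / alpha, nparameters)  # expected to be true
```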
3 changes: 3 additions & 0 deletions tutorials/03-bayesian-neural-network/Project.toml
@@ -1,7 +1,9 @@
[deps]
AdvancedVI = "b5ca4192-6429-45e5-a2d9-87aec30a685c"
Bijectors = "76274a88-744f-5084-9051-94815aaf08c4"
FillArrays = "1a297f60-69ca-5386-bcde-b61e274b549b"
Flux = "587475ba-b771-5e3f-ad9e-33799f191a9c"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
ReverseDiff = "37e2e3b7-166d-5795-8a7a-e32c996b4267"
@@ -10,6 +12,7 @@ Turing = "fce5fe82-541a-59a6-adf8-730c64b5f9a0"
[compat]
AdvancedVI = "0.1"
Bijectors = "0.9"
FillArrays = "0.11"
Flux = "0.12"
Plots = "1"
ReverseDiff = "1"
4 changes: 3 additions & 1 deletion tutorials/04-hidden-markov-model/04_hidden-markov-model.jmd
@@ -131,7 +131,9 @@ g = Gibbs(HMC(0.01, 50, :m, :T), PG(120, :s))
chn = sample(BayesHmm(y, 3), g, 1000);
```

Let's see how well our chain performed. Ordinarily, using the `describe` function from [MCMCChain](https://github.com/TuringLang/MCMCChain.jl) would be a good first step, but we have generated a lot of parameters here (`s[1]`, `s[2]`, `m[1]`, and so on). It's a bit easier to show how our model performed graphically.
Let's see how well our chain performed.
Ordinarily, using `display(chn)` would be a good first step, but we have generated a lot of parameters here (`s[1]`, `s[2]`, `m[1]`, and so on).
It's a bit easier to show how our model performed graphically.

The code below generates an animation showing the graph of the data above, and the data our model generates in each sample.

30 changes: 19 additions & 11 deletions tutorials/05-linear-regression/05_linear-regression.jmd
@@ -28,6 +28,12 @@ using MLDataUtils: shuffleobs, splitobs, rescale!
# Functionality for evaluating the model predictions.
using Distances

# Functionality for constructing arrays with identical elements efficiently.
using FillArrays

# Functionality for working with scaled identity matrices.
using LinearAlgebra

# Set a seed for reproducibility.
using Random
Random.seed!(0)
@@ -93,45 +99,47 @@ $$

where $\alpha$ is an intercept term common to all observations, $\boldsymbol{\beta}$ is a coefficient vector, $\boldsymbol{X_i}$ is the observed data for car $i$, and $\sigma^2$ is a common variance term.

For $\sigma^2$, we assign a prior of `truncated(Normal(0, 100), 0, Inf)`. This is consistent with [Andrew Gelman's recommendations](http://www.stat.columbia.edu/%7Egelman/research/published/taumain.pdf) on noninformative priors for variance. The intercept term ($\alpha$) is assumed to be normally distributed with a mean of zero and a variance of three. This represents our assumptions that miles per gallon can be explained mostly by our assorted variables, but a high variance term indicates our uncertainty about that. Each coefficient is assumed to be normally distributed with a mean of zero and a variance of 10. We do not know that our coefficients are different from zero, and we don't know which ones are likely to be the most important, so the variance term is quite high. Lastly, each observation $y_i$ is distributed according to the calculated `mu` term given by $\alpha + \boldsymbol{\beta}^\mathsf{T}\boldsymbol{X_i}$.
For $\sigma^2$, we assign a prior of `truncated(Normal(0, 100); lower=0)`.
This is consistent with [Andrew Gelman's recommendations](http://www.stat.columbia.edu/%7Egelman/research/published/taumain.pdf) on noninformative priors for variance.
The intercept term ($\alpha$) is assumed to be normally distributed with a mean of zero and a variance of three.
This represents our assumptions that miles per gallon can be explained mostly by our assorted variables, but a high variance term indicates our uncertainty about that.
Each coefficient is assumed to be normally distributed with a mean of zero and a variance of 10.
We do not know that our coefficients are different from zero, and we don't know which ones are likely to be the most important, so the variance term is quite high.
Lastly, each observation $y_i$ is distributed according to the calculated `mu` term given by $\alpha + \boldsymbol{\beta}^\mathsf{T}\boldsymbol{X_i}$.
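(Editorial aside, not part of the commit.) The keyword form `truncated(Normal(0, 100); lower=0)` describes the same distribution as the positional `truncated(Normal(0, 100), 0, Inf)` used previously; it simply avoids spelling out `Inf` for the unbounded side. A minimal sketch of that equivalence:

```julia
using Distributions

d_new = truncated(Normal(0, 100); lower=0)  # keyword form used in this commit
d_old = truncated(Normal(0, 100), 0, Inf)   # positional form used previously

mean(d_new) ≈ mean(d_old)  # expected to be true
minimum(d_new)             # 0.0, the lower truncation point
```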

```julia
# Bayesian linear regression.
@model function linear_regression(x, y)
# Set variance prior.
σ ~ truncated(Normal(0, 100), 0, Inf)
σ² ~ truncated(Normal(0, 100); lower=0)

# Set intercept prior.
intercept ~ Normal(0, sqrt(3))

# Set the priors on our coefficients.
nfeatures = size(x, 2)
coefficients ~ MvNormal(nfeatures, sqrt(10))
coefficients ~ MvNormal(Zeros(nfeatures), 10.0 * I)

# Calculate all the mu terms.
mu = intercept .+ x * coefficients
return y ~ MvNormal(mu, sqrt(σ₂))
return y ~ MvNormal(mu, σ² * I)
end
```
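(Editorial aside, not part of the commit.) Note that the scalar now enters the likelihood as a variance: `MvNormal(mu, σ² * I)` is the same as independent `Normal(mu[i], σ)` observations. A minimal sketch:

```julia
using Distributions, LinearAlgebra

mu = [1.0, 2.0, 3.0]
σ = 0.5
y = [1.1, 1.9, 3.2]

joint = MvNormal(mu, σ^2 * I)              # covariance parameterization
indep = [Normal(m, σ) for m in mu]         # equivalent independent normals

logpdf(joint, y) ≈ sum(logpdf.(indep, y))  # expected to be true
```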

With our model specified, we can call the sampler. We will use the No U-Turn Sampler ([NUTS](http://turing.ml/docs/library/#-turingnuts--type)) here.

```julia
model = linear_regression(train, train_target)
chain = sample(model, NUTS(0.65), 3_000);
chain = sample(model, NUTS(0.65), 3_000)
```

As a visual check to confirm that our coefficients have converged, we show the densities and trace plots for our parameters using the `plot` functionality.
We can also check the densities and traces of the parameters visually using the `plot` functionality.

```julia
plot(chain)
```

It looks like each of our parameters has converged. We can check our numerical esimates using `describe(chain)`, as below.

```julia
describe(chain)
```
It looks like all parameters have converged.

## Comparing to OLS

5 changes: 4 additions & 1 deletion tutorials/05-linear-regression/Project.toml
@@ -3,7 +3,9 @@ DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
Distances = "b4f34e82-e78d-54a5-968a-f98e89d6e8f7"
Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
FileIO = "5789e2e9-d7fb-5bc7-8068-2c6fae9b9549"
FillArrays = "1a297f60-69ca-5386-bcde-b61e274b549b"
GLM = "38e38edf-8417-5370-95a0-9cbb8c7f171a"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
MCMCChains = "c7f686f2-ff18-58e9-bc7b-31028e88f75d"
MLDataUtils = "cc2ba9b6-d476-5e6d-8eaf-a92d5412d41d"
NNlib = "872c559c-99b0-510c-b3b7-b6c96a88d5cd"
@@ -17,8 +19,9 @@ Turing = "fce5fe82-541a-59a6-adf8-730c64b5f9a0"
[compat]
DataFrames = "1"
Distances = "0.10"
Distributions = "0.25"
Distributions = "0.25.42"
FileIO = "1"
FillArrays = "0.12"
GLM = "1"
MCMCChains = "5"
MLDataUtils = "0.5"
17 changes: 6 additions & 11 deletions tutorials/07-poisson-regression/07_poisson-regression.jmd
@@ -202,19 +202,14 @@ The exponentiated mean of the coefficient `b1` is roughly half of that of `b2`.

# Removing the Warmup Samples

As can be seen from the plots above, the parameters converge to their final distributions after a few iterations. These initial values during the warmup phase increase the standard deviations of the parameters and are not required after we get the desired distributions. Thus, we remove these warmup values and once again view the diagnostics.

To remove these warmup values, we take all values except the first 200. This is because we set the second parameter of the NUTS sampler (which is the number of adaptations) to be equal to 200. `describe(chain)` is used to view the standard deviations in the estimates of the parameters. It also gives other useful information such as the means and the quantiles.

```julia
# Note the standard deviation before removing the warmup samples
describe(chain)
```
As can be seen from the plots above, the parameters converge to their final distributions after a few iterations.
The initial values during the warmup phase increase the standard deviations of the parameters and are not required after we get the desired distributions.
Thus, we remove these warmup values and once again view the diagnostics.
To remove these warmup values, we take all values except the first 200.
This is because we set the second parameter of the NUTS sampler (which is the number of adaptations) to be equal to 200.

```julia
# Removing the first 200 values of the chains.
chains_new = chain[201:2500, :, :]
describe(chains_new)
chains_new = chain[201:end, :, :]
```
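(Editorial aside, not part of the commit.) If a numerical summary is still wanted after dropping `describe`, MCMCChains exposes the two tables it used to print; a sketch, assuming the `summarystats` and `quantile` methods for `Chains` are available:

```julia
using MCMCChains

# Means, standard deviations, ESS, etc. of the post-warmup draws.
summarystats(chains_new)

# Posterior quantiles of the post-warmup draws.
quantile(chains_new)
```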

```julia
@@ -28,6 +28,12 @@ using MLDataUtils: shuffleobs, splitobs, rescale!
# We need a softmax function which is provided by NNlib.
using NNlib: softmax

# Functionality for constructing arrays with identical elements efficiently.
using FillArrays

# Functionality for working with scaled identity matrices.
using LinearAlgebra

# Set a seed for reproducibility.
using Random
Random.seed!(0)
@@ -100,8 +106,8 @@ We select the `setosa` species as the baseline class (the choice does not matter
# Priors of intercepts and coefficients.
intercept_versicolor ~ Normal(0, σ)
intercept_virginica ~ Normal(0, σ)
coefficients_versicolor ~ MvNormal(4, σ)
coefficients_virginica ~ MvNormal(4, σ)
coefficients_versicolor ~ MvNormal(Zeros(4), σ^2 * I)
coefficients_virginica ~ MvNormal(Zeros(4), σ^2 * I)

# Compute the likelihood of the observations.
values_versicolor = intercept_versicolor .+ x * coefficients_versicolor
3 changes: 3 additions & 0 deletions tutorials/08-multinomial-logistic-regression/Project.toml
@@ -1,4 +1,6 @@
[deps]
FillArrays = "1a297f60-69ca-5386-bcde-b61e274b549b"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
MLDataUtils = "cc2ba9b6-d476-5e6d-8eaf-a92d5412d41d"
NNlib = "872c559c-99b0-510c-b3b7-b6c96a88d5cd"
RDatasets = "ce6b1742-4840-55fa-b093-852dadbb1d8b"
@@ -7,6 +9,7 @@ StatsPlots = "f3b207a7-027a-5e70-b257-86293d7955fd"
Turing = "fce5fe82-541a-59a6-adf8-730c64b5f9a0"

[compat]
FillArrays = "0.13"
MLDataUtils = "0.5"
NNlib = "0.7, 0.8"
RDatasets = "0.7"
14 changes: 6 additions & 8 deletions tutorials/09-variational-inference/09_variational-inference.jmd
@@ -267,9 +267,11 @@ Random.seed!(1);
```

```julia
# Import RDatasets.
using FillArrays
using RDatasets

using LinearAlgebra

# Hide the progress prompt while sampling.
Turing.setprogress!(false);
```
@@ -333,17 +335,17 @@ test = Matrix(test_cut[:, remove_names]);
# Bayesian linear regression.
@model function linear_regression(x, y, n_obs, n_vars, ::Type{T}=Vector{Float64}) where {T}
# Set variance prior.
σ ~ truncated(Normal(0, 100), 0, Inf)
σ² ~ truncated(Normal(0, 100), 0, Inf)

# Set intercept prior.
intercept ~ Normal(0, 3)

# Set the priors on our coefficients.
coefficients ~ MvNormal(zeros(n_vars), 10 * ones(n_vars))
coefficients ~ MvNormal(Zeros(n_vars), 10.0 * I)

# Calculate all the mu terms.
mu = intercept .+ x * coefficients
return y ~ MvNormal(mu, σ)
return y ~ MvNormal(mu, σ² * I)
end;
```

@@ -682,10 +684,6 @@ But using this interface it becomes trivial to go beyond the mean-field assumpti

Here we'll instead consider the variational family to be a full non-diagonal multivariate Gaussian. As in the previous section we'll implement this by transforming a standard multivariate Gaussian using `Scale` and `Shift`, but now `Scale` will instead be using a lower-triangular matrix (representing the Cholesky of the covariance matrix of a multivariate normal) in contrast to the diagonal matrix we used in for the mean-field approximate posterior.
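(Editorial aside, not part of the commit.) The idea behind the lower-triangular scale is that if $z \sim \mathcal{N}(0, I)$, then $x = Lz$ has the full covariance $LL^\top$; a small sketch of that fact, without any Bijectors-specific API:

```julia
using LinearAlgebra

# A lower-triangular "scale" acting as a Cholesky factor of the covariance.
L = LowerTriangular([1.0 0.0 0.0; 0.4 1.2 0.0; -0.3 0.7 0.9])

# If z ~ N(0, I), then x = L * z has covariance L * L' — a full, non-diagonal
# Gaussian, unlike the diagonal scale used for the mean-field approximation.
Σ = L * L'
```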

```julia
using LinearAlgebra
```

```julia
# Using `ComponentArrays.jl` together with `UnPack.jl` makes our lives much easier.
using ComponentArrays, UnPack
3 changes: 3 additions & 0 deletions tutorials/09-variational-inference/Project.toml
@@ -3,8 +3,10 @@ Bijectors = "76274a88-744f-5084-9051-94815aaf08c4"
ComponentArrays = "b0b7db55-cfe3-40fc-9ded-d10e2dbeff66"
ConjugatePriors = "1624bea9-42b1-5fc1-afd3-e96f729c8d6c"
FileIO = "5789e2e9-d7fb-5bc7-8068-2c6fae9b9549"
FillArrays = "1a297f60-69ca-5386-bcde-b61e274b549b"
GLM = "38e38edf-8417-5370-95a0-9cbb8c7f171a"
LaTeXStrings = "b964fa9f-0449-5b57-a5c2-d3ea65f4040f"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
PyPlot = "d330b81b-6aea-500a-939a-2ce795aea3ee"
RDatasets = "ce6b1742-4840-55fa-b093-852dadbb1d8b"
@@ -18,6 +20,7 @@ Bijectors = "0.9"
ComponentArrays = "0.11"
ConjugatePriors = "0.4"
FileIO = "1.5"
FillArrays = "0.9"
GLM = "1.3"
LaTeXStrings = "1.2"
Plots = "1.6"
@@ -89,10 +89,10 @@ Therefore, we write the Lotka-Volterra parameter estimation problem using the Tu
@model function fitlv(data, prob)
# Prior distributions.
σ ~ InverseGamma(2, 3)
α ~ truncated(Normal(1.5, 0.5), 0.5, 2.5)
β ~ truncated(Normal(1.2, 0.5), 0, 2)
γ ~ truncated(Normal(3.0, 0.5), 1, 4)
δ ~ truncated(Normal(1.0, 0.5), 0, 2)
α ~ truncated(Normal(1.5, 0.5); lower=0.5, upper=2.5)
β ~ truncated(Normal(1.2, 0.5); lower=0, upper=2)
γ ~ truncated(Normal(3.0, 0.5); lower=1, upper=4)
δ ~ truncated(Normal(1.0, 0.5); lower=0, upper=2)

# Simulate Lotka-Volterra model.
p = [α, β, γ, δ]
@@ -151,10 +151,10 @@ I.e., we fit the model only to the $y$ variable of the system without providing
@model function fitlv2(data::AbstractVector, prob)
# Prior distributions.
σ ~ InverseGamma(2, 3)
α ~ truncated(Normal(1.5, 0.5), 0.5, 2.5)
β ~ truncated(Normal(1.2, 0.5), 0, 2)
γ ~ truncated(Normal(3.0, 0.5), 1, 4)
δ ~ truncated(Normal(1.0, 0.5), 0, 2)
α ~ truncated(Normal(1.5, 0.5); lower=0.5, upper=2.5)
β ~ truncated(Normal(1.2, 0.5); lower=0, upper=2)
γ ~ truncated(Normal(3.0, 0.5); lower=1, upper=4)
δ ~ truncated(Normal(1.0, 0.5); lower=0, upper=2)

# Simulate Lotka-Volterra model but save only the second state of the system (predators).
p = [α, β, γ, δ]
@@ -251,10 +251,10 @@ Now we define the Turing model for the Lotka-Volterra model with delay and sampl
@model function fitlv_dde(data, prob)
# Prior distributions.
σ ~ InverseGamma(2, 3)
α ~ Truncated(Normal(1.5, 0.5), 0.5, 2.5)
β ~ Truncated(Normal(1.2, 0.5), 0, 2)
γ ~ Truncated(Normal(3.0, 0.5), 1, 4)
δ ~ Truncated(Normal(1.0, 0.5), 0, 2)
α ~ truncated(Normal(1.5, 0.5); lower=0.5, upper=2.5)
β ~ truncated(Normal(1.2, 0.5); lower=0, upper=2)
γ ~ truncated(Normal(3.0, 0.5); lower=1, upper=4)
δ ~ truncated(Normal(1.0, 0.5); lower=0, upper=2)

# Simulate Lotka-Volterra model.
p = [α, β, γ, δ]
@@ -332,10 +332,10 @@ Here we will not choose a `sensealg` and let it use the default choice:
@model function fitlv_sensealg(data, prob)
# Prior distributions.
σ ~ InverseGamma(2, 3)
α ~ truncated(Normal(1.5, 0.5), 0.5, 2.5)
β ~ truncated(Normal(1.2, 0.5), 0, 2)
γ ~ truncated(Normal(3.0, 0.5), 1, 4)
δ ~ truncated(Normal(1.0, 0.5), 0, 2)
α ~ truncated(Normal(1.5, 0.5); lower=0.5, upper=2.5)
β ~ truncated(Normal(1.2, 0.5); lower=0, upper=2)
γ ~ truncated(Normal(3.0, 0.5); lower=1, upper=4)
δ ~ truncated(Normal(1.0, 0.5); lower=0, upper=2)

# Simulate Lotka-Volterra model and use a specific algorithm for computing sensitivities.
p = [α, β, γ, δ]
@@ -394,12 +394,12 @@ plot(EnsembleSummary(data))
@model function fitlv_sde(data, prob)
# Prior distributions.
σ ~ InverseGamma(2, 3)
α ~ truncated(Normal(1.3, 0.5), 0.5, 2.5)
β ~ truncated(Normal(1.2, 0.25), 0.5, 2)
γ ~ truncated(Normal(3.2, 0.25), 2.2, 4.0)
δ ~ truncated(Normal(1.2, 0.25), 0.5, 2.0)
ϕ1 ~ truncated(Normal(0.12, 0.3), 0.05, 0.25)
ϕ2 ~ truncated(Normal(0.12, 0.3), 0.05, 0.25)
α ~ truncated(Normal(1.3, 0.5); lower=0.5, upper=2.5)
β ~ truncated(Normal(1.2, 0.25); lower=0.5, upper=2)
γ ~ truncated(Normal(3.2, 0.25); lower=2.2, upper=4)
δ ~ truncated(Normal(1.2, 0.25); lower=0.5, upper=2)
ϕ1 ~ truncated(Normal(0.12, 0.3); lower=0.05, upper=0.25)
ϕ2 ~ truncated(Normal(0.12, 0.3); lower=0.05, upper=0.25)

# Simulate stochastic Lotka-Volterra model.
p = [α, β, γ, δ, ϕ1, ϕ2]
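(Editorial aside, not part of the commit.) The `Truncated` → `truncated` change in this file replaces direct use of the `Truncated` type with the public `truncated` function and its `lower`/`upper` keywords; the bounds behave as expected. A minimal sketch:

```julia
using Distributions

α_prior = truncated(Normal(1.5, 0.5); lower=0.5, upper=2.5)

minimum(α_prior), maximum(α_prior)  # (0.5, 2.5)
insupport(α_prior, 3.0)             # false: zero density outside the bounds
```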