From 5af5e99052f29c4976043bbfc53658048e71b9c0 Mon Sep 17 00:00:00 2001 From: David Widmann Date: Wed, 23 Mar 2022 02:35:30 +0100 Subject: [PATCH 1/7] Update Gaussian mixture tutorial --- .../01_gaussian-mixture-model.jmd | 168 ++++++------ .../01-gaussian-mixture-model/Manifest.toml | 255 ++++++++++-------- .../01-gaussian-mixture-model/Project.toml | 9 +- 3 files changed, 233 insertions(+), 199 deletions(-) diff --git a/tutorials/01-gaussian-mixture-model/01_gaussian-mixture-model.jmd b/tutorials/01-gaussian-mixture-model/01_gaussian-mixture-model.jmd index 14373a950..cf93e6607 100644 --- a/tutorials/01-gaussian-mixture-model/01_gaussian-mixture-model.jmd +++ b/tutorials/01-gaussian-mixture-model/01_gaussian-mixture-model.jmd @@ -4,162 +4,170 @@ permalink: /:collection/:name/ redirect_from: tutorials/1-gaussianmixturemodel/ --- -The following tutorial illustrates the use *Turing* for clustering data using a Bayesian mixture model. The aim of this task is to infer a latent grouping (hidden structure) from unlabelled data. +The following tutorial illustrates the use of *Turing* for clustering data using a Bayesian mixture model. +The aim of this task is to infer a latent grouping (hidden structure) from unlabelled data. -More specifically, we are interested in discovering the grouping illustrated in figure below. This example consists of 2-D data points, i.e. $\boldsymbol{x} = \\{x_i\\}_{i=1}^N, x_i \in \mathbb{R}^2$, which are distributed according to Gaussian distributions. For simplicity, we use isotropic Gaussian distributions but this assumption can easily be relaxed by introducing additional parameters. +## Synthetic data + +We generate a synthetic dataset of $N = 60$ two-dimensional points $x_i \in \mathbb{R}^2$ drawn from a Gaussian mixture model. +For simplicity, we use $K = 2$ clusters with +- equal weights, i.e., we use mixture weights $w = [0.5, 0.5]$, and +- isotropic Gaussian distributions of the points in each cluster.
+More concretely, we use the Gaussian distributions $\mathcal{N}([\mu_k, \mu_k]^\mathsf{T}, I)$ with parameters $\mu_1 = -3.5$ and $\mu_2 = 0.5$. ```julia -using Distributions, StatsPlots, Random +using Distributions +using FillArrays +using StatsPlots + +using LinearAlgebra +using Random # Set a random seed. Random.seed!(3) -# Construct 30 data points for each cluster. -N = 30 +# Define Gaussian mixture model. +w = [0.5, 0.5] +μ = [-3.5, 0.5] +mixturemodel = MixtureModel([MvNormal(Fill(μₖ, 2), I) for μₖ in μ], w) -# Parameters for each cluster, we assume that each cluster is Gaussian distributed in the example. -μs = [-3.5, 0.0] +# We draw the data points. +N = 60 +x = rand(mixturemodel, N) +``` -# Construct the data points. -x = mapreduce(c -> rand(MvNormal([μs[c], μs[c]], 1.0), N), hcat, 1:2) +The following plot shows the dataset. -# Visualization. +```julia scatter(x[1, :], x[2, :]; legend=false, title="Synthetic Dataset") ``` ## Gaussian Mixture Model in Turing -To cluster the data points shown above, we use a model that consists of two mixture components (clusters) and assigns each datum to one of the components. The assignment thereof determines the distribution that the data point is generated from. +We are interested in recovering the grouping from the dataset. +More precisely, we want to infer the mixture weights, the parameters $\mu_1$ and $\mu_2$, and the assignment of each datum to a cluster for the generative Gaussian mixture model. -In particular, in a Bayesian Gaussian mixture model with $1 \leq k \leq K$ components for 1-D data each data point $x_i$ with $1 \leq i \leq N$ is generated according to the following generative process. -First we draw the parameters for each cluster, i.e. in our example we draw location of the distributions from a Normal: +In a Bayesian Gaussian mixture model with $K$ components each data point $x_i$ ($i = 1,\ldots,N$) is generated according to the following generative process. 
+First we draw the parameters for each cluster, i.e., in our example we draw parameters $\mu_k$ for the mean of the isotropic normal distributions: $$ -\mu_k \sim \mathrm{Normal}() \, , \; \forall k +\mu_k \sim \mathcal{N}(0, 1) \qquad (k = 1,\ldots,K) $$ -and then draw mixing weight for the $K$ clusters from a Dirichlet distribution, i.e. +and then we draw mixture weights $w$ for the $K$ clusters from a Dirichlet distribution $$ -w \sim \mathrm{Dirichlet}(K, \alpha) \, . +w \sim \operatorname{Dirichlet}(K, \alpha). $$ After having constructed all the necessary model parameters, we can generate an observation by first selecting one of the clusters and then drawing the datum accordingly, i.e. $$ -z_i \sim \mathrm{Categorical}(w) \, , \; \forall i \\ -x_i \sim \mathrm{Normal}(\mu_{z_i}, 1.) \, , \; \forall i +z_i \sim \operatorname{Categorical}(w) \qquad (i = 1,\ldots,N) \\ +x_i \sim \mathcal{N}(\mu_{z_i}, I) \qquad (i = 1,\ldots,N). $$ - For more details on Gaussian mixture models, we refer to Christopher M. Bishop, *Pattern Recognition and Machine Learning*, Section 9. -```julia -using Turing, MCMCChains - -# Turn off the progress monitor. -Turing.setprogress!(false); -``` +We specify the model with Turing. ```julia -@model function GaussianMixtureModel(x) - D, N = size(x) - - # Draw the parameters for cluster 1. - μ1 ~ Normal() - - # Draw the parameters for cluster 2. - μ2 ~ Normal() +using Turing - μ = [μ1, μ2] +@model function gaussian_mixture_model(x) + # Draw the parameters for each of the K=2 clusters from a standard normal distribution. + K = 2 + μ ~ MvNormal(Zeros(K), I) - # Uncomment the following lines to draw the weights for the K clusters - # from a Dirichlet distribution. + # Draw the weights for the K clusters from a Dirichlet distribution. + w ~ Dirichlet(K, 1.0) + # Alternatively, one could use a fixed set of weights. + # w = fill(1/K, K) - # α = 1.0 - # w ~ Dirichlet(2, α) + # Construct categorical distribution of assignments.
+ distribution_assignments = Categorical(w) - # Comment out this line if you instead want to draw the weights. - w = [0.5, 0.5] + # Construct multivariate normal distributions of each cluster. + D, N = size(x) + distribution_clusters = [MvNormal(Fill(μₖ, D), I) for μₖ in μ] # Draw assignments for each datum and generate it from a multivariate normal. k = Vector{Int}(undef, N) for i in 1:N - k[i] ~ Categorical(w) - x[:, i] ~ MvNormal([μ[k[i]], μ[k[i]]], 1.0) + k[i] ~ distribution_assignments + x[:, i] ~ distribution_clusters[k[i]] end - return k -end; -``` After having specified the model in Turing, we can construct the model function and run a MCMC simulation to obtain assignments of the data points. + return k +end -```julia -gmm_model = GaussianMixtureModel(x); +model = gaussian_mixture_model(x); ``` -To draw observations from the posterior distribution, we use a [particle Gibbs](https://www.stats.ox.ac.uk/%7Edoucet/andrieu_doucet_holenstein_PMCMC.pdf) sampler to draw the discrete assignment parameters as well as a Hamiltonion Monte Carlo sampler for continous parameters. - -Note that we use a `Gibbs` sampler to combine both samplers for Bayesian inference in our model. -We are also calling `MCMCThreads` to generate multiple chains, particularly so we test for convergence. +We run an MCMC simulation to obtain an approximation of the posterior distribution of the parameters $\mu$ and $w$ and assignments $k$. +We use a `Gibbs` sampler that combines a [particle Gibbs](https://www.stats.ox.ac.uk/%7Edoucet/andrieu_doucet_holenstein_PMCMC.pdf) sampler for the discrete parameters (assignments $k$) and a Hamiltonian Monte Carlo sampler for the continuous parameters ($\mu$ and $w$). +We generate multiple chains in parallel using multi-threading.
```julia -gmm_sampler = Gibbs(PG(100, :k), HMC(0.05, 10, :μ1, :μ2)) -tchain = sample(gmm_model, gmm_sampler, MCMCThreads(), 100, 3); +sampler = Gibbs(PG(100, :k), HMC(0.05, 10, :μ, :w)) +chains = sample(model, sampler, MCMCThreads(), 100, 3); ``` ```julia; echo=false; error=false let - matrix = get(tchain, :μ1).μ1 - first_chain = matrix[:, 1] - actual = mean(first_chain) # Verify that the output of the chain is as expected. - # μ1 and μ2 appear to switch places, so that's why isapprox(...) || isapprox(...). - @assert isapprox(actual, -3.5; atol=1) || isapprox(actual, 0.2; atol=1) + for i in MCMCChains.chains(chains) + # μ[1] and μ[2] can switch places, so we sort the values first. + chain = Array(chains[:, ["μ[1]", "μ[2]"], i]) + μ_mean = vec(mean(chain; dims=1)) + @assert isapprox(sort(μ_mean), μ; rtol=0.1) + end end ``` ## Visualize the Density Region of the Mixture Model -After successfully doing posterior inference, we can first visualize the trace and density of the parameters of interest. +After sampling we can visualize the trace and density of the parameters of interest. -In particular, in this example we consider the sample values of the location parameter for the two clusters. +We consider the samples of the location parameters $\mu_1$ and $\mu_2$ for the two clusters. ```julia -ids = findall(map(name -> occursin("μ", string(name)), names(tchain))); -p = plot(tchain[:, ids, :]; legend=true, labels=["Mu 1" "Mu 2"], colordim=:parameter) +plot(chains[["μ[1]", "μ[2]"]]; colordim=:parameter, legend=true) ``` -You'll note here that it appears the location means are switching between chains. We will address this in future tutorials. For those who are keenly interested, see [this](https://mc-stan.org/users/documentation/case-studies/identifying_mixture_models.html) article on potential solutions. +It can happen that the modes of $\mu_1$ and $\mu_2$ switch between chains. 
+For more information see the [Stan documentation](https://mc-stan.org/users/documentation/case-studies/identifying_mixture_models.html) for potential solutions. -For the moment, we will just use the first chain to ensure the validity of our inference. +We also inspect the samples of the mixture weights $w$ visually. ```julia -tchain = tchain[:, :, 1]; +plot(chains[["w[1]", "w[2]"]]; colordim=:parameter, legend=true) ``` -As the samples for the location parameter for both clusters are unimodal, we can safely visualize the density region of our model using the average location. +In the following, we just use the first chain to ensure the validity of our inference. ```julia -# Helper function used for visualizing the density region. -function predict(x, y, w, μ) - # Use log-sum-exp trick for numeric stability. - return Turing.logaddexp( - log(w[1]) + logpdf(MvNormal([μ[1], μ[1]], 1.0), [x, y]), - log(w[2]) + logpdf(MvNormal([μ[2], μ[2]], 1.0), [x, y]), - ) -end; +chain = chains[:, :, 1]; ``` +As the distributions of the samples for the parameters $\mu_1$, $\mu_2$, $w_1$, and $w_2$ are unimodal, we can safely visualize the density region of our model using the average values. + ```julia +# Model with average parameters +μ_mean = [mean(chain, "μ[$i]") for i in 1:2] +w_mean = [mean(chain, "w[$i]") for i in 1:2] +mixturemodel_mean = MixtureModel([MvNormal(Fill(μₖ, 2), I) for μₖ in μ_mean], w_mean) + contour( - range(-5; stop=3), - range(-6; stop=2), - (x, y) -> predict(x, y, [0.5, 0.5], [mean(tchain[:μ1]), mean(tchain[:μ2])]), + range(-7.5, 3; length=1_000), + range(-6.5, 3; length=1_000), + (x, y) -> logpdf(mixturemodel_mean, [x, y]); + widen=false, ) scatter!(x[1, :], x[2, :]; legend=false, title="Synthetic Dataset") ``` ## Inferred Assignments -Finally, we can inspect the assignments of the data points inferred using Turing. As we can see, the dataset is partitioned into two distinct groups. 
+Finally, we can inspect the assignments of the data points inferred using Turing. +As we can see, the dataset is partitioned into two distinct groups. ```julia -assignments = mean(MCMCChains.group(tchain, :k)).nt.mean +assignments = [mean(chain, "k[$i]") for i in 1:N] scatter( x[1, :], x[2, :]; diff --git a/tutorials/01-gaussian-mixture-model/Manifest.toml b/tutorials/01-gaussian-mixture-model/Manifest.toml index 42f82c83d..168928f2b 100644 --- a/tutorials/01-gaussian-mixture-model/Manifest.toml +++ b/tutorials/01-gaussian-mixture-model/Manifest.toml @@ -8,15 +8,15 @@ version = "1.1.0" [[AbstractMCMC]] deps = ["BangBang", "ConsoleProgressMonitor", "Distributed", "Logging", "LoggingExtras", "ProgressLogging", "Random", "StatsBase", "TerminalLoggers", "Transducers"] -git-tree-sha1 = "db0a7ff3fbd987055c43b4e12d2fa30aaae8749c" +git-tree-sha1 = "47aca4cf0dc430f20f68f6992dc4af0e4dc8ebee" uuid = "80f14c24-f653-4e6a-9b94-39d6b0f70001" -version = "3.2.1" +version = "4.0.0" [[AbstractPPL]] -deps = ["AbstractMCMC"] -git-tree-sha1 = "15f34cc635546ac072d03fc2cc10083adb4df680" +deps = ["AbstractMCMC", "DensityInterface", "Setfield", "SparseArrays"] +git-tree-sha1 = "6320752437e9fbf49639a410017d862ad64415a5" uuid = "7a57a42e-76ec-4ea3-a279-07e840d6d9cf" -version = "0.2.0" +version = "0.5.2" [[AbstractTrees]] git-tree-sha1 = "03e0550477d86222521d254b741d470ba17ea0b5" @@ -31,27 +31,27 @@ version = "3.3.3" [[AdvancedHMC]] deps = ["AbstractMCMC", "ArgCheck", "DocStringExtensions", "InplaceOps", "LinearAlgebra", "ProgressMeter", "Random", "Requires", "Setfield", "Statistics", "StatsBase", "StatsFuns", "UnPack"] -git-tree-sha1 = "189473a73d664fe2496675775b6c8a732b8dfe26" +git-tree-sha1 = "68136ef13a2f549a20e3572c8f9f2b83b901ac1a" uuid = "0bf59076-c3b1-5ca4-86bd-e02cd72cde3d" -version = "0.3.3" +version = "0.3.4" [[AdvancedMH]] deps = ["AbstractMCMC", "Distributions", "Random", "Requires"] -git-tree-sha1 = "8ad8bfddf8bb627d689ecb91599c349cbf15e971" +git-tree-sha1 = 
"5d9e09a242d4cf222080398468244389c3428ed1" uuid = "5b7e9947-ddc0-4b3f-9b55-0d8042f74170" -version = "0.6.6" +version = "0.6.7" [[AdvancedPS]] deps = ["AbstractMCMC", "Distributions", "Libtask", "Random", "StatsFuns"] -git-tree-sha1 = "06da6c283cf17cf0f97ed2c07c29b6333ee83dc9" +git-tree-sha1 = "78620daebe1b87dfe17cac4bc08cec73b057eb0a" uuid = "576499cb-2369-40b2-a588-c64705576edc" -version = "0.2.4" +version = "0.3.7" [[AdvancedVI]] deps = ["Bijectors", "Distributions", "DistributionsAD", "DocStringExtensions", "ForwardDiff", "LinearAlgebra", "ProgressMeter", "Random", "Requires", "StatsBase", "StatsFuns", "Tracker"] -git-tree-sha1 = "130d6b17a3a9d420d9a6b37412cae03ffd6a64ff" +git-tree-sha1 = "2f0ddff49ae4c812ba7b348b8427636f8bbd6c05" uuid = "b5ca4192-6429-45e5-a2d9-87aec30a685c" -version = "0.1.3" +version = "0.1.4" [[ArgCheck]] git-tree-sha1 = "a3a402a35a2f7e0b87828ccabbd5ebfbebe356b4" @@ -75,9 +75,9 @@ version = "3.5.0+3" [[ArrayInterface]] deps = ["Compat", "IfElse", "LinearAlgebra", "Requires", "SparseArrays", "Static"] -git-tree-sha1 = "745233d77146ad221629590b6d82fe7f1ddb478f" +git-tree-sha1 = "6e8fada11bb015ecf9263f64b156f98b546918c7" uuid = "4fba245c-0d91-5ea0-9b3e-6abc04ee57a9" -version = "4.0.3" +version = "5.0.5" [[Artifacts]] uuid = "56f22d72-fd6d-98f1-02f0-08ddc0907c33" @@ -96,9 +96,9 @@ version = "0.4.4" [[BangBang]] deps = ["Compat", "ConstructionBase", "Future", "InitialValues", "LinearAlgebra", "Requires", "Setfield", "Tables", "ZygoteRules"] -git-tree-sha1 = "d648adb5e01b77358511fb95ea2e4d384109fac9" +git-tree-sha1 = "b15a6bc52594f5e4a3b825858d1089618871bf9d" uuid = "198e06fe-97b7-11e9-32a5-e1d131e6ad66" -version = "0.3.35" +version = "0.3.36" [[Base64]] uuid = "2a0f44e3-6c83-55bd-87e4-b1978d98bd5f" @@ -134,15 +134,15 @@ version = "0.5.1" [[ChainRules]] deps = ["ChainRulesCore", "Compat", "IrrationalConstants", "LinearAlgebra", "Random", "RealDot", "SparseArrays", "Statistics"] -git-tree-sha1 = "098b5eeb1170f569a45f363066b0e405868fc210" 
+git-tree-sha1 = "8b887daa6af5daf705081061e36386190204ac87" uuid = "082447d4-558c-5d27-93f4-14fc19e9eca2" -version = "1.27.0" +version = "1.28.1" [[ChainRulesCore]] deps = ["Compat", "LinearAlgebra", "SparseArrays"] -git-tree-sha1 = "7dd38532a1115a215de51775f9891f0f3e1bac6a" +git-tree-sha1 = "9950387274246d08af38f6eef8cb5480862a435f" uuid = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4" -version = "1.12.1" +version = "1.14.0" [[ChangesOfVariables]] deps = ["ChainRulesCore", "LinearAlgebra", "Test"] @@ -192,9 +192,9 @@ version = "0.3.0" [[Compat]] deps = ["Base64", "Dates", "DelimitedFiles", "Distributed", "InteractiveUtils", "LibGit2", "Libdl", "LinearAlgebra", "Markdown", "Mmap", "Pkg", "Printf", "REPL", "Random", "SHA", "Serialization", "SharedArrays", "Sockets", "SparseArrays", "Statistics", "Test", "UUIDs", "Unicode"] -git-tree-sha1 = "44c37b4636bc54afac5c574d2d02b625349d6582" +git-tree-sha1 = "96b0bc6c52df76506efc8a441c6cf1adcb1babc4" uuid = "34da2185-b29b-5c13-b0c7-acf172513d20" -version = "3.41.0" +version = "3.42.0" [[CompilerSupportLibraries_jll]] deps = ["Artifacts", "Libdl"] @@ -293,15 +293,15 @@ uuid = "8ba89e20-285c-5b6f-9357-94700520ee1b" [[Distributions]] deps = ["ChainRulesCore", "DensityInterface", "FillArrays", "LinearAlgebra", "PDMats", "Printf", "QuadGK", "Random", "SparseArrays", "SpecialFunctions", "Statistics", "StatsBase", "StatsFuns", "Test"] -git-tree-sha1 = "9d3c0c762d4666db9187f363a76b47f7346e673b" +git-tree-sha1 = "c43e992f186abaf9965cc45e372f4693b7754b22" uuid = "31c24e10-a181-5473-b8eb-7969acd0382f" -version = "0.25.49" +version = "0.25.52" [[DistributionsAD]] deps = ["Adapt", "ChainRules", "ChainRulesCore", "Compat", "DiffRules", "Distributions", "FillArrays", "LinearAlgebra", "NaNMath", "PDMats", "Random", "Requires", "SpecialFunctions", "StaticArrays", "StatsBase", "StatsFuns", "ZygoteRules"] -git-tree-sha1 = "61805bf57113a52435a13ca0bb588daf8848784d" +git-tree-sha1 = "b51ed93e06497fc4e7ff78bbca03c4f7951d2ec2" uuid = 
"ced4e74d-a319-5a8a-b0ac-84af2272839c" -version = "0.6.37" +version = "0.6.38" [[DocStringExtensions]] deps = ["LibGit2"] @@ -315,15 +315,15 @@ uuid = "f43a241f-c20a-4ad4-852c-f6b1247861c6" [[DualNumbers]] deps = ["Calculus", "NaNMath", "SpecialFunctions"] -git-tree-sha1 = "84f04fe68a3176a583b864e492578b9466d87f1e" +git-tree-sha1 = "90b158083179a6ccbce2c7eb1446d5bf9d7ae571" uuid = "fa6b7ba4-c1ee-5f82-b5fc-ecf0adba8f74" -version = "0.6.6" +version = "0.6.7" [[DynamicPPL]] -deps = ["AbstractMCMC", "AbstractPPL", "BangBang", "Bijectors", "ChainRulesCore", "Distributions", "MacroTools", "Random", "ZygoteRules"] -git-tree-sha1 = "532397f64ad49472fb60e328369ecd5dedeff02f" +deps = ["AbstractMCMC", "AbstractPPL", "BangBang", "Bijectors", "ChainRulesCore", "Distributions", "LinearAlgebra", "MacroTools", "Random", "Setfield", "Test", "ZygoteRules"] +git-tree-sha1 = "5d1704965e4bf0c910693b09ece8163d75e28806" uuid = "366bfd00-2699-11ea-058f-f148b4cae6d8" -version = "0.15.1" +version = "0.19.1" [[EarCut_jll]] deps = ["Artifacts", "JLLWrappers", "Libdl", "Pkg"] @@ -332,16 +332,15 @@ uuid = "5ae413db-bbd1-5e63-b57d-d24a61df00f5" version = "2.2.3+0" [[EllipsisNotation]] -deps = ["ArrayInterface"] -git-tree-sha1 = "d7ab55febfd0907b285fbf8dc0c73c0825d9d6aa" +git-tree-sha1 = "18ee049accec8763be17a933737c1dd0fdf8673a" uuid = "da5c29d0-fa7d-589e-88eb-ea29b0a81949" -version = "1.3.0" +version = "1.0.0" [[EllipticalSliceSampling]] deps = ["AbstractMCMC", "ArrayInterface", "Distributions", "Random", "Statistics"] -git-tree-sha1 = "c25a7254cf745720ddf9051cd0d2792b3baaca0e" +git-tree-sha1 = "bed775e32c6f38a19c1dbe0298480798e6be455f" uuid = "cad2338a-1db2-11e9-3401-43bc07c9ede2" -version = "0.4.6" +version = "0.5.0" [[Expat_jll]] deps = ["Artifacts", "JLLWrappers", "Libdl", "Pkg"] @@ -363,9 +362,9 @@ version = "4.4.0+0" [[FFTW]] deps = ["AbstractFFTs", "FFTW_jll", "LinearAlgebra", "MKL_jll", "Preferences", "Reexport"] -git-tree-sha1 = "463cb335fa22c4ebacfd1faba5fde14edb80d96c" +git-tree-sha1 
= "505876577b5481e50d089c1c68899dfb6faebc62" uuid = "7a1cc6ca-52ef-59f5-83cd-3a7055c09341" -version = "1.4.5" +version = "1.4.6" [[FFTW_jll]] deps = ["Artifacts", "JLLWrappers", "Libdl", "Pkg"] @@ -375,9 +374,9 @@ version = "3.3.10+0" [[FillArrays]] deps = ["LinearAlgebra", "Random", "SparseArrays", "Statistics"] -git-tree-sha1 = "deed294cde3de20ae0b2e0355a6c4e1c6a5ceffc" +git-tree-sha1 = "246621d23d1f43e3b9c368bf3b72b2331a27c286" uuid = "1a297f60-69ca-5386-bcde-b61e274b549b" -version = "0.12.8" +version = "0.13.2" [[FixedPointNumbers]] deps = ["Statistics"] @@ -444,9 +443,9 @@ version = "0.64.0+0" [[GeometryBasics]] deps = ["EarCut_jll", "IterTools", "LinearAlgebra", "StaticArrays", "StructArrays", "Tables"] -git-tree-sha1 = "58bcdf5ebc057b085e58d95c138725628dd7453c" +git-tree-sha1 = "83ea630384a13fc4f002b77690bc0afeb4255ac9" uuid = "5c1252a2-5f33-56bf-86c9-59e7332b4326" -version = "0.4.1" +version = "0.4.2" [[Gettext_jll]] deps = ["Artifacts", "CompilerSupportLibraries_jll", "JLLWrappers", "Libdl", "Libiconv_jll", "Pkg", "XML2_jll"] @@ -489,16 +488,21 @@ git-tree-sha1 = "65e4589030ef3c44d3b90bdc5aac462b4bb05567" uuid = "34004b35-14d8-5ef3-9330-4cdb6864b03a" version = "0.3.8" +[[IRTools]] +deps = ["InteractiveUtils", "MacroTools", "Test"] +git-tree-sha1 = "7f43342f8d5fd30ead0ba1b49ab1a3af3b787d24" +uuid = "7869d1d1-7146-5819-86e3-90919afe41df" +version = "0.4.5" + [[IfElse]] git-tree-sha1 = "debdd00ffef04665ccbb3e150747a77560e8fad1" uuid = "615f187c-cbe4-4ef1-ba3b-2fcf58d6d173" version = "0.1.1" [[IniFile]] -deps = ["Test"] -git-tree-sha1 = "098e4d2c533924c921f9f9847274f2ad89e018b8" +git-tree-sha1 = "f550e6e32074c939295eb5ea6de31849ac2c9625" uuid = "83e8ac13-25f8-5344-8a64-a9f2b223428f" -version = "0.5.0" +version = "0.5.1" [[InitialValues]] git-tree-sha1 = "4da0f88e9a39111c2fa3add390ab15f3a44f3ca3" @@ -529,15 +533,15 @@ version = "0.13.5" [[IntervalSets]] deps = ["Dates", "EllipsisNotation", "Statistics"] -git-tree-sha1 = 
"3cc368af3f110a767ac786560045dceddfc16758" +git-tree-sha1 = "bcf640979ee55b652f3b01650444eb7bbe3ea837" uuid = "8197267c-284f-5f27-9208-e0e47529a953" -version = "0.5.3" +version = "0.5.4" [[InverseFunctions]] deps = ["Test"] -git-tree-sha1 = "a7254c0acd8e62f1ac75ad24d5db43f5f19f3c65" +git-tree-sha1 = "91b5dcf362c5add98049e6c29ee756910b03051d" uuid = "3587e190-3f89-42d0-90ee-14403ec27112" -version = "0.1.2" +version = "0.1.3" [[InvertedIndices]] git-tree-sha1 = "bee5f1ef5bf65df56bdd2e40447590b272a5471f" @@ -589,6 +593,17 @@ git-tree-sha1 = "f6250b16881adf048549549fba48b1161acdac8c" uuid = "c1c5ebd0-6772-5130-a774-d5fcae4a789d" version = "3.100.1+0" +[[LERC_jll]] +deps = ["Artifacts", "JLLWrappers", "Libdl", "Pkg"] +git-tree-sha1 = "bf36f528eec6634efc60d7ec062008f171071434" +uuid = "88015f11-f218-50d7-93a8-a6af411a945d" +version = "3.0.0+1" + +[[LRUCache]] +git-tree-sha1 = "d64a0aff6691612ab9fb0117b0995270871c5dfc" +uuid = "8ac3fa9e-de4c-5943-b1dc-09c6b5f20637" +version = "1.3.0" + [[LZO_jll]] deps = ["Artifacts", "JLLWrappers", "Libdl", "Pkg"] git-tree-sha1 = "e5b909bcf985c5e2605737d2ce278ed791b89be6" @@ -602,9 +617,9 @@ version = "1.3.0" [[Latexify]] deps = ["Formatting", "InteractiveUtils", "LaTeXStrings", "MacroTools", "Markdown", "Printf", "Requires"] -git-tree-sha1 = "2a8650452c07a9c89e6a58f296fd638fadaca021" +git-tree-sha1 = "4f00cc36fede3c04b8acf9b2e2763decfdcecfa6" uuid = "23fbe1c1-3f47-55db-b15f-69d7ec21a316" -version = "0.15.11" +version = "0.15.13" [[LazyArtifacts]] deps = ["Artifacts", "Pkg"] @@ -672,22 +687,16 @@ uuid = "4b2f31a3-9ecc-558c-b454-b3730dcb73e9" version = "2.35.0+0" [[Libtask]] -deps = ["Libtask_jll", "LinearAlgebra", "Statistics"] -git-tree-sha1 = "90c6ed7f9ac449cddacd80d5c1fca59c97d203e7" +deps = ["IRTools", "LRUCache", "LinearAlgebra", "MacroTools", "Statistics"] +git-tree-sha1 = "ed1b54f6df6fb7af8b315cfdc288ab5572dbd3ba" uuid = "6f1fad26-d15e-5dc8-ae53-837a1d7b8c9f" -version = "0.5.3" - -[[Libtask_jll]] -deps = ["Artifacts", 
"JLLWrappers", "Libdl", "Pkg"] -git-tree-sha1 = "901fc8752bbc527a6006a951716d661baa9d54e9" -uuid = "3ae2931a-708c-5973-9c38-ccf7496fb450" -version = "0.4.3+0" +version = "0.7.0" [[Libtiff_jll]] -deps = ["Artifacts", "JLLWrappers", "JpegTurbo_jll", "Libdl", "Pkg", "Zlib_jll", "Zstd_jll"] -git-tree-sha1 = "340e257aada13f95f98ee352d316c3bed37c8ab9" +deps = ["Artifacts", "JLLWrappers", "JpegTurbo_jll", "LERC_jll", "Libdl", "Pkg", "Zlib_jll", "Zstd_jll"] +git-tree-sha1 = "c9551dd26e31ab17b86cbd00c2ede019c08758eb" uuid = "89763e89-9b03-5906-acba-b20f662cd828" -version = "4.3.0+0" +version = "4.3.0+1" [[Libuuid_jll]] deps = ["Artifacts", "JLLWrappers", "Libdl", "Pkg"] @@ -701,9 +710,9 @@ uuid = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e" [[LogExpFunctions]] deps = ["ChainRulesCore", "ChangesOfVariables", "DocStringExtensions", "InverseFunctions", "IrrationalConstants", "LinearAlgebra"] -git-tree-sha1 = "e5718a00af0ab9756305a0392832c8952c7426c1" +git-tree-sha1 = "58f25e56b706f95125dcb796f39e1fb01d913a71" uuid = "2ab3a3ac-af41-5b50-aa03-7779005ae688" -version = "0.3.6" +version = "0.3.10" [[Logging]] uuid = "56ddb016-857b-54e1-b83d-db4d58db5568" @@ -716,9 +725,9 @@ version = "0.4.7" [[MCMCChains]] deps = ["AbstractMCMC", "AxisArrays", "Compat", "Dates", "Distributions", "Formatting", "IteratorInterfaceExtensions", "KernelDensity", "LinearAlgebra", "MCMCDiagnosticTools", "MLJModelInterface", "NaturalSort", "OrderedCollections", "PrettyTables", "Random", "RecipesBase", "Serialization", "Statistics", "StatsBase", "StatsFuns", "TableTraits", "Tables"] -git-tree-sha1 = "ddafbd2a95114d13721f2b6ddeeaee9529d6bc2b" +git-tree-sha1 = "872da3b1f21fa79c66723225efabc878f18509ed" uuid = "c7f686f2-ff18-58e9-bc7b-31028e88f75d" -version = "5.0.3" +version = "5.1.0" [[MCMCDiagnosticTools]] deps = ["AbstractFFTs", "DataAPI", "Distributions", "LinearAlgebra", "MLJModelInterface", "Random", "SpecialFunctions", "Statistics", "StatsBase", "Tables"] @@ -728,15 +737,15 @@ version = "0.1.3" [[MKL_jll]] 
deps = ["Artifacts", "IntelOpenMP_jll", "JLLWrappers", "LazyArtifacts", "Libdl", "Pkg"] -git-tree-sha1 = "5455aef09b40e5020e1520f551fa3135040d4ed0" +git-tree-sha1 = "e595b205efd49508358f7dc670a940c790204629" uuid = "856f044c-d86e-5d09-b602-aeab76dc8ba7" -version = "2021.1.1+2" +version = "2022.0.0+0" [[MLJModelInterface]] deps = ["Random", "ScientificTypesBase", "StatisticalTraits"] -git-tree-sha1 = "8da86dcf5a9ea48413c7e920a990f0ea1869f9cb" +git-tree-sha1 = "74d7fb54c306af241c5f9d4816b735cb4051e125" uuid = "e80e1ace-859a-464e-9ed9-23947d8ae3ea" -version = "1.3.6" +version = "1.4.2" [[MacroTools]] deps = ["Markdown", "Random"] @@ -787,16 +796,16 @@ uuid = "a63ad114-7e13-5084-954f-fe012c677804" uuid = "14a3606d-f60d-562e-9121-12d972cd8159" [[MultivariateStats]] -deps = ["Arpack", "LinearAlgebra", "SparseArrays", "Statistics", "StatsBase"] -git-tree-sha1 = "6d019f5a0465522bbfdd68ecfad7f86b535d6935" +deps = ["Arpack", "LinearAlgebra", "SparseArrays", "Statistics", "StatsAPI", "StatsBase"] +git-tree-sha1 = "7008a3412d823e29d370ddc77411d593bd8a3d03" uuid = "6f286f6a-111f-5878-ab1e-185364afe411" -version = "0.9.0" +version = "0.9.1" [[NNlib]] deps = ["Adapt", "ChainRulesCore", "Compat", "LinearAlgebra", "Pkg", "Requires", "Statistics"] -git-tree-sha1 = "996a3dca9893cb0741bbd08e48b2e2aa0d551898" +git-tree-sha1 = "a59a614b8b4ea6dc1dcec8c6514e251f13ccbe10" uuid = "872c559c-99b0-510c-b3b7-b6c96a88d5cd" -version = "0.8.2" +version = "0.8.4" [[NaNMath]] git-tree-sha1 = "b086b7ea07f8e38cf122f5016af580881ac914fe" @@ -850,9 +859,9 @@ uuid = "05823500-19ac-5b8b-9628-191a04bc5112" [[OpenSSL_jll]] deps = ["Artifacts", "JLLWrappers", "Libdl", "Pkg"] -git-tree-sha1 = "648107615c15d4e09f7eca16307bc821c1f718d8" +git-tree-sha1 = "ab05aa4cc89736e95915b01e7279e61b1bfe33b8" uuid = "458c3c95-2e84-50aa-8efc-19380b2a3a95" -version = "1.1.13+0" +version = "1.1.14+0" [[OpenSpecFun_jll]] deps = ["Artifacts", "CompilerSupportLibraries_jll", "JLLWrappers", "Libdl", "Pkg"] @@ -879,15 +888,15 @@ 
version = "8.44.0+0" [[PDMats]] deps = ["LinearAlgebra", "SparseArrays", "SuiteSparse"] -git-tree-sha1 = "ee26b350276c51697c9c2d88a072b339f9f03d73" +git-tree-sha1 = "e8185b83b9fc56eb6456200e873ce598ebc7f262" uuid = "90014a1f-27ba-587c-ab20-58faa44d9150" -version = "0.11.5" +version = "0.11.7" [[Parsers]] deps = ["Dates"] -git-tree-sha1 = "13468f237353112a01b2d6b32f3d0f80219944aa" +git-tree-sha1 = "85b5da0fa43588c75bb1ff986493443f821c70b7" uuid = "69de0a69-1ddd-5017-9359-2bf0b02dc9f0" -version = "2.2.2" +version = "2.2.3" [[Pixman_jll]] deps = ["Artifacts", "JLLWrappers", "Libdl", "Pkg"] @@ -907,21 +916,21 @@ version = "2.0.1" [[PlotUtils]] deps = ["ColorSchemes", "Colors", "Dates", "Printf", "Random", "Reexport", "Statistics"] -git-tree-sha1 = "6f1b25e8ea06279b5689263cc538f51331d7ca17" +git-tree-sha1 = "bb16469fd5224100e422f0b027d26c5a25de1200" uuid = "995b91a9-d308-5afd-9ec6-746e21dbc043" -version = "1.1.3" +version = "1.2.0" [[Plots]] -deps = ["Base64", "Contour", "Dates", "Downloads", "FFMPEG", "FixedPointNumbers", "GR", "GeometryBasics", "JSON", "Latexify", "LinearAlgebra", "Measures", "NaNMath", "PlotThemes", "PlotUtils", "Printf", "REPL", "Random", "RecipesBase", "RecipesPipeline", "Reexport", "Requires", "Scratch", "Showoff", "SparseArrays", "Statistics", "StatsBase", "UUIDs", "UnicodeFun", "Unzip"] -git-tree-sha1 = "5c907bdee5966a9adb8a106807b7c387e51e4d6c" +deps = ["Base64", "Contour", "Dates", "Downloads", "FFMPEG", "FixedPointNumbers", "GR", "GeometryBasics", "JSON", "Latexify", "LinearAlgebra", "Measures", "NaNMath", "Pkg", "PlotThemes", "PlotUtils", "Printf", "REPL", "Random", "RecipesBase", "RecipesPipeline", "Reexport", "Requires", "Scratch", "Showoff", "SparseArrays", "Statistics", "StatsBase", "UUIDs", "UnicodeFun", "Unzip"] +git-tree-sha1 = "1690b713c3b460c955a2957cd7487b1b725878a7" uuid = "91a5bcdd-55d7-5caf-9e0b-520d859cae80" -version = "1.25.11" +version = "1.27.1" [[Preferences]] deps = ["TOML"] -git-tree-sha1 = 
"2cf929d64681236a2e074ffafb8d568733d2e6af" +git-tree-sha1 = "d3538e7f8a790dc8903519090857ef8e1283eecd" uuid = "21216c6a-2e73-6563-6e65-726566657250" -version = "1.2.3" +version = "1.2.5" [[PrettyTables]] deps = ["Crayons", "Formatting", "Markdown", "Reexport", "Tables"] @@ -972,9 +981,9 @@ version = "0.3.2" [[Ratios]] deps = ["Requires"] -git-tree-sha1 = "01d341f502250e81f6fec0afe662aa861392a3aa" +git-tree-sha1 = "dc84268fe0e3335a62e315a3a7cf2afa7178a734" uuid = "c84ed2f1-dad5-54f0-aa8e-dbefe2724439" -version = "0.4.2" +version = "0.4.3" [[RealDot]] deps = ["LinearAlgebra"] @@ -989,9 +998,15 @@ version = "1.2.1" [[RecipesPipeline]] deps = ["Dates", "NaNMath", "PlotUtils", "RecipesBase"] -git-tree-sha1 = "37c1631cb3cc36a535105e6d5557864c82cd8c2b" +git-tree-sha1 = "995a812c6f7edea7527bb570f0ac39d0fb15663c" uuid = "01d81517-befc-4cb6-b9ec-a95719d0359c" -version = "0.5.0" +version = "0.5.1" + +[[RecursiveArrayTools]] +deps = ["Adapt", "ArrayInterface", "ChainRulesCore", "DocStringExtensions", "FillArrays", "LinearAlgebra", "RecipesBase", "Requires", "StaticArrays", "Statistics", "ZygoteRules"] +git-tree-sha1 = "f5dd036acee4462949cc10c55544cc2bee2545d6" +uuid = "731186ca-8d62-57ce-b412-fbd966d074cd" +version = "2.25.1" [[Reexport]] git-tree-sha1 = "45e428421666073eab6f2da5c9d310d99bb12f9b" @@ -1024,13 +1039,19 @@ version = "0.3.0+0" [[Roots]] deps = ["CommonSolve", "Printf", "Setfield"] -git-tree-sha1 = "0abe7fc220977da88ad86d339335a4517944fea2" +git-tree-sha1 = "6085b8ac184add45b586ed8d74468310948dcfe8" uuid = "f2b01f46-fcfa-551c-844a-d8ac1e96c665" -version = "1.3.14" +version = "1.4.0" [[SHA]] uuid = "ea8e919c-243c-51af-8825-aaa63cd721ce" +[[SciMLBase]] +deps = ["ArrayInterface", "CommonSolve", "ConstructionBase", "Distributed", "DocStringExtensions", "IteratorInterfaceExtensions", "LinearAlgebra", "Logging", "RecipesBase", "RecursiveArrayTools", "StaticArrays", "Statistics", "Tables", "TreeViews"] +git-tree-sha1 = "c086056df381502621dc6b5f1d1a0a1c2d0185e7" +uuid = 
"0bca4576-84f4-4d90-8ffe-ffa030f20462" +version = "1.28.0" + [[ScientificTypesBase]] git-tree-sha1 = "a8e18eb383b5ecf1b5e6fc237eb39255044fd92b" uuid = "30f210dd-8aff-4c5f-94ba-8e64358c1161" @@ -1082,9 +1103,9 @@ uuid = "2f01184e-e22b-5df5-ae63-d93ebab69eaf" [[SpecialFunctions]] deps = ["ChainRulesCore", "IrrationalConstants", "LogExpFunctions", "OpenLibm_jll", "OpenSpecFun_jll"] -git-tree-sha1 = "2735e252e72ee0367ebdb10b6148343fd15c2481" +git-tree-sha1 = "5ba658aeecaaf96923dce0da9e703bd1fe7666f9" uuid = "276daf66-3868-5448-9aa4-cd146d93841b" -version = "1.8.3" +version = "2.1.4" [[SplittablesBase]] deps = ["Setfield", "Test"] @@ -1094,15 +1115,15 @@ version = "0.1.14" [[Static]] deps = ["IfElse"] -git-tree-sha1 = "00b725fffc9a7e9aac8850e4ed75b4c1acbe8cd2" +git-tree-sha1 = "87e9954dfa33fd145694e42337bdd3d5b07021a6" uuid = "aedffcd0-7271-4cad-89d0-dc628f76c6d3" -version = "0.5.5" +version = "0.6.0" [[StaticArrays]] deps = ["LinearAlgebra", "Random", "Statistics"] -git-tree-sha1 = "95c6a5d0e8c69555842fc4a927fc485040ccc31c" +git-tree-sha1 = "6976fab022fea2ffea3d945159317556e5dad87c" uuid = "90137ffa-7385-5640-81b9-e52037218182" -version = "1.3.5" +version = "1.4.2" [[StatisticalTraits]] deps = ["ScientificTypesBase"] @@ -1165,10 +1186,10 @@ uuid = "3783bdb8-4a98-5b6b-af9a-565f29a5fe9c" version = "1.0.1" [[Tables]] -deps = ["DataAPI", "DataValueInterfaces", "IteratorInterfaceExtensions", "LinearAlgebra", "TableTraits", "Test"] -git-tree-sha1 = "bb1064c9a84c52e277f1096cf41434b675cd368b" +deps = ["DataAPI", "DataValueInterfaces", "IteratorInterfaceExtensions", "LinearAlgebra", "OrderedCollections", "TableTraits", "Test"] +git-tree-sha1 = "5ce79ce186cc678bbb5c5681ca3379d1ddae11a1" uuid = "bd369af6-aec1-5ad0-b16a-f7cc5008161c" -version = "1.6.1" +version = "1.7.0" [[Tar]] deps = ["ArgTools", "SHA"] @@ -1186,21 +1207,27 @@ uuid = "8dfed614-e22c-5e08-85e1-65c5234f0b40" [[Tracker]] deps = ["Adapt", "DiffRules", "ForwardDiff", "LinearAlgebra", "LogExpFunctions", "MacroTools", 
"NNlib", "NaNMath", "Printf", "Random", "Requires", "SpecialFunctions", "Statistics"] -git-tree-sha1 = "434a953e6ad7abf6a07a1e0b99baaa704753cec0" +git-tree-sha1 = "0874c1b5de1b5529b776cfeca3ec0acfada97b1b" uuid = "9f7883ad-71c0-57eb-9f7f-b5c9e6d3789c" -version = "0.2.19" +version = "0.2.20" [[Transducers]] deps = ["Adapt", "ArgCheck", "BangBang", "Baselet", "CompositionsBase", "DefineSingletons", "Distributed", "InitialValues", "Logging", "Markdown", "MicroCollections", "Requires", "Setfield", "SplittablesBase", "Tables"] -git-tree-sha1 = "1cda71cc967e3ef78aa2593319f6c7379376f752" +git-tree-sha1 = "c76399a3bbe6f5a88faa33c8f8a65aa631d95013" uuid = "28d57a85-8fef-5791-bfe6-a80928e7c999" -version = "0.4.72" +version = "0.4.73" + +[[TreeViews]] +deps = ["Test"] +git-tree-sha1 = "8d0d7a3fe2f30d6a7f833a5f19f7c7a5b396eae6" +uuid = "a2a6695c-b41b-5b7d-aed9-dbfdeacea5d7" +version = "0.3.0" [[Turing]] -deps = ["AbstractMCMC", "AdvancedHMC", "AdvancedMH", "AdvancedPS", "AdvancedVI", "BangBang", "Bijectors", "DataStructures", "Distributions", "DistributionsAD", "DocStringExtensions", "DynamicPPL", "EllipticalSliceSampling", "ForwardDiff", "Libtask", "LinearAlgebra", "MCMCChains", "NamedArrays", "Printf", "Random", "Reexport", "Requires", "SpecialFunctions", "Statistics", "StatsBase", "StatsFuns", "Tracker", "ZygoteRules"] -git-tree-sha1 = "e22a11c2029137b35adf00a0e4842707c653938c" +deps = ["AbstractMCMC", "AdvancedHMC", "AdvancedMH", "AdvancedPS", "AdvancedVI", "BangBang", "Bijectors", "DataStructures", "Distributions", "DistributionsAD", "DocStringExtensions", "DynamicPPL", "EllipticalSliceSampling", "ForwardDiff", "Libtask", "LinearAlgebra", "MCMCChains", "NamedArrays", "Printf", "Random", "Reexport", "Requires", "SciMLBase", "SpecialFunctions", "Statistics", "StatsBase", "StatsFuns", "Tracker", "ZygoteRules"] +git-tree-sha1 = "ef0fdc72023c4480a9372f32db88cce68b186e8a" uuid = "fce5fe82-541a-59a6-adf8-730c64b5f9a0" -version = "0.18.0" +version = "0.21.1" [[URIs]] 
git-tree-sha1 = "97bbe755a53fe859669cd907f2d96aee8d2c1355" @@ -1238,9 +1265,9 @@ version = "1.19.0+0" [[Wayland_protocols_jll]] deps = ["Artifacts", "JLLWrappers", "Libdl", "Pkg"] -git-tree-sha1 = "66d72dc6fcc86352f01676e8f0f698562e60510f" +git-tree-sha1 = "4528479aa01ee1b3b4cd0e6faef0e04cf16466da" uuid = "2381bf8a-dfd0-557d-9999-79630e7b1b91" -version = "1.23.0+0" +version = "1.25.0+0" [[Widgets]] deps = ["Colors", "Dates", "Observables", "OrderedCollections"] diff --git a/tutorials/01-gaussian-mixture-model/Project.toml b/tutorials/01-gaussian-mixture-model/Project.toml index ee936c397..85d2cfcb9 100644 --- a/tutorials/01-gaussian-mixture-model/Project.toml +++ b/tutorials/01-gaussian-mixture-model/Project.toml @@ -1,14 +1,13 @@ [deps] Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f" -MCMCChains = "c7f686f2-ff18-58e9-bc7b-31028e88f75d" -Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80" +FillArrays = "1a297f60-69ca-5386-bcde-b61e274b549b" +LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e" Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c" StatsPlots = "f3b207a7-027a-5e70-b257-86293d7955fd" Turing = "fce5fe82-541a-59a6-adf8-730c64b5f9a0" [compat] Distributions = "0.25" -MCMCChains = "5" -Plots = "1" +FillArrays = "0.13" StatsPlots = "0.14" -Turing = "0.18" +Turing = "0.21" From 7d0aefc66d3f278b8efa01999b3c7f781255ab02 Mon Sep 17 00:00:00 2001 From: David Widmann Date: Wed, 23 Mar 2022 02:43:28 +0100 Subject: [PATCH 2/7] Apply suggestions from code review Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> --- .../01_gaussian-mixture-model.jmd | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/tutorials/01-gaussian-mixture-model/01_gaussian-mixture-model.jmd b/tutorials/01-gaussian-mixture-model/01_gaussian-mixture-model.jmd index cf93e6607..14e87e800 100644 --- a/tutorials/01-gaussian-mixture-model/01_gaussian-mixture-model.jmd +++ b/tutorials/01-gaussian-mixture-model/01_gaussian-mixture-model.jmd 
@@ -11,8 +11,10 @@ The aim of this task is to infer a latent grouping (hidden structure) from unlab We generate a synthetic dataset of $N = 60$ two-dimensional points $x_i \in \mathbb{R}^2$ drawn from a Gaussian mixture model. For simplicity, we use $K = 2$ clusters with -- equal weights, i.e., we use mixture weights $w = [0.5, 0.5]$, and -- isotropic Gaussian distributions of the points in each cluster. + + - equal weights, i.e., we use mixture weights $w = [0.5, 0.5]$, and + - isotropic Gaussian distributions of the points in each cluster. + More concretely, we use the Gaussian distributions $\mathcal{N}([\mu_k, \mu_k]^\mathsf{T}, I)$ with parameters $\mu_1 = -3.5$ and $\mu_2 = 0.5$. ```julia @@ -98,7 +100,7 @@ end model = gaussian_mixture_model(x); ``` -We run a MCMC simulation to obtain an approximation of the posterior distribution of the parameters $\mu$ and $w$ and assignments $k$. +We run a MCMC simulation to obtain an approximation of the posterior distribution of the parameters $\mu$ and $w$ and assignments $k$. We use a `Gibbs` sampler that combines a [particle Gibbs](https://www.stats.ox.ac.uk/%7Edoucet/andrieu_doucet_holenstein_PMCMC.pdf) sampler for the discrete parameters (assignments $k$) and a Hamiltonion Monte Carlo sampler for the continous parameters ($\mu$ and $w$). We generate multiple chains in parallel using multi-threading. 
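The alternation this patch describes — a particle Gibbs step for the discrete assignments $k$ followed by an HMC step for the continuous $\mu$ and $w$ — can be illustrated with a toy direct Gibbs sampler. The sketch below is hypothetical Python, not part of the tutorial (which uses Turing's `Gibbs(PG(100, :k), HMC(0.05, 10, :μ, :w))`): a 1-D two-component mixture with unit variances, fixed weights, and standard-normal priors on the means, chosen so that both conditional blocks are available in closed form; the name `gibbs_gmm_1d` is an illustrative stand-in.

```python
import math
import random

def gibbs_gmm_1d(xs, w=(0.5, 0.5), iters=200, seed=0):
    # Toy Gibbs sampler: alternate a discrete block (assignments z) with a
    # continuous block (means mu), mirroring the PG + HMC composition above.
    rng = random.Random(seed)
    K = len(w)
    mu = [rng.gauss(0.0, 1.0) for _ in range(K)]  # initialize from the prior
    for _ in range(iters):
        # Discrete block: z_i | mu with p(z_i = k) proportional to w_k * N(x_i; mu_k, 1).
        z = []
        for x in xs:
            logp = [math.log(w[k]) - 0.5 * (x - mu[k]) ** 2 for k in range(K)]
            top = max(logp)
            z.append(rng.choices(range(K), weights=[math.exp(lp - top) for lp in logp])[0])
        # Continuous block: with a N(0, 1) prior and unit-variance likelihood,
        # mu_k | z, x is N(sum_k / (1 + n_k), 1 / (1 + n_k)).
        for k in range(K):
            pts = [x for x, zi in zip(xs, z) if zi == k]
            prec = 1.0 + len(pts)
            mu[k] = rng.gauss(sum(pts) / prec, math.sqrt(1.0 / prec))
    return sorted(mu)

rng = random.Random(3)
data = [rng.gauss(-3.5, 1.0) for _ in range(30)] + [rng.gauss(0.5, 1.0) for _ in range(30)]
means = gibbs_gmm_1d(data)
```

With data this well separated the two sampled means typically settle near the empirical cluster means; returning them sorted sidesteps the label-switching issue that the tutorial addresses via the linked Stan documentation.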
From 49dba1da5216ba5a81073c3f74b456e80a161c90 Mon Sep 17 00:00:00 2001 From: David Widmann Date: Wed, 23 Mar 2022 09:13:49 +0100 Subject: [PATCH 3/7] Some additional fixes --- .../01_gaussian-mixture-model.jmd | 39 ++++++++++--------- 1 file changed, 20 insertions(+), 19 deletions(-) diff --git a/tutorials/01-gaussian-mixture-model/01_gaussian-mixture-model.jmd b/tutorials/01-gaussian-mixture-model/01_gaussian-mixture-model.jmd index 14e87e800..39901455c 100644 --- a/tutorials/01-gaussian-mixture-model/01_gaussian-mixture-model.jmd +++ b/tutorials/01-gaussian-mixture-model/01_gaussian-mixture-model.jmd @@ -4,7 +4,7 @@ permalink: /:collection/:name/ redirect_from: tutorials/1-gaussianmixturemodel/ --- -The following tutorial illustrates the use *Turing* for clustering data using a Bayesian mixture model. +The following tutorial illustrates the use of Turing for clustering data using a Bayesian mixture model. The aim of this task is to infer a latent grouping (hidden structure) from unlabelled data. ## Synthetic data @@ -50,19 +50,20 @@ We are interested in recovering the grouping from the dataset. More precisely, we want to infer the mixture weights, the parameters $\mu_1$ and $\mu_2$, and the assignment of each datum to a cluster for the generative Gaussian mixture model. In a Bayesian Gaussian mixture model with $K$ components each data point $x_i$ ($i = 1,\ldots,N$) is generated according to the following generative process. -First we draw the parameters for each cluster, i.e., in our example we draw parameters $\mu_k$ for the mean of the isotropic normal distributions: -$$ -\mu_k \sim \mathcal{N}(0, 1) \qquad (k = 1,\ldots,K) -$$ -and then we draw mixture weights $w$ for the $K$ clusters from a Dirichlet distribution -$$ -w \sim \operatorname{Dirichlet}(K, \alpha). -$$ -After having constructed all the necessary model parameters, we can generate an observation by first selecting one of the clusters and then drawing the datum accordingly, i.e. 
-$$ -z_i \sim \operatorname{Categorical}(w) \qquad (i = 1,\ldots,N) \\ -x_i \sim \mathcal{N}(\mu_{z_i}, I) \qquad (i=1,\dlots,N). -$$ +First we draw the model parameters, i.e., in our example we draw parameters $\mu_k$ for the mean of the isotropic normal distributions and the mixture weights $w$ of the $K$ clusters. +We use standard normal distributions as priors for $\mu_k$ and a Dirichlet distribution with parameters $\alpha_1 = \cdots = \alpha_K = 1$ as prior for $w$: +\begin{align*} +\mu_k &\sim \mathcal{N}(0, 1) \qquad (k = 1,\ldots,K)\\ +w &\sim \operatorname{Dirichlet}(\alpha_1, \ldots, \alpha_K) +\end{align*} +After having constructed all the necessary model parameters, we can generate an observation by first selecting one of the clusters +\begin{equation*} +z_i \sim \operatorname{Categorical}(w) \qquad (i = 1,\ldots,N), +\end{equation*} +and then drawing the datum accordingly, i.e., in our example drawing +\begin{equation*} +x_i \sim \mathcal{N}([\mu_{z_i}, \mu_{z_i}] \mathsf{T}, I) \qquad (i=1,\ldots,N). +\end{equation*} For more details on Gaussian mixture models, we refer to Christopher M. Bishop, *Pattern Recognition and Machine Learning*, Section 9. We specify the model with Turing. @@ -75,7 +76,7 @@ using Turing K = 2 μ ~ MvNormal(Zeros(K), I) - # Draw the weights for the K clusters from a Dirichlet distribution. + # Draw the weights for the K clusters from a Dirichlet distribution with parameters αₖ = 1. w ~ Dirichlet(K, 1.0) # Alternatively, one could use a fixed set of weights. # w = fill(1/K, K) @@ -83,11 +84,11 @@ using Turing # Construct categorical distribution of assignments. distribution_assignments = Categorical(w) - # Construct multivariate normal distributions of each cluster + # Construct multivariate normal distributions of each cluster. D, N = size(x) distribution_clusters = [MvNormal(Fill(μₖ, D), I) for μₖ in μ] - # Draw assignments for each datum and generate it from a multivariate normal. 
+ # Draw assignments for each datum and generate it from the multivariate normal distribution. k = Vector{Int}(undef, N) for i in 1:N k[i] ~ distribution_assignments @@ -134,7 +135,7 @@ plot(chains[["μ[1]", "μ[2]"]]; colordim=:parameter, legend=true) It can happen that the modes of $\mu_1$ and $\mu_2$ switch between chains. For more information see the [Stan documentation](https://mc-stan.org/users/documentation/case-studies/identifying_mixture_models.html) for potential solutions. -We also inspect the samples of the mixture weights $w$ visually. +We also inspect the samples of the mixture weights $w$. ```julia plot(chains[["w[1]", "w[2]"]]; colordim=:parameter, legend=true) @@ -149,7 +150,7 @@ chain = chains[:, :, 1]; As the distributions of the samples for the parameters $\mu_1$, $\mu_2$, $w_1$, and $w_2$ are unimodal, we can safely visualize the density region of our model using the average values. ```julia -# Model with average parameters +# Model with mean of samples as parameters. μ_mean = [mean(chain, "μ[$i]") for i in 1:2] w_mean = [mean(chain, "w[$i]") for i in 1:2] mixturemodel_mean = MixtureModel([MvNormal(Fill(μₖ, 2), I) for μₖ in μ_mean], w_mean) From 5ba72657001d43faf70a0af05f669d21e4dab967 Mon Sep 17 00:00:00 2001 From: David Widmann Date: Wed, 23 Mar 2022 10:58:01 +0100 Subject: [PATCH 4/7] More fixes --- .../01_gaussian-mixture-model.jmd | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/tutorials/01-gaussian-mixture-model/01_gaussian-mixture-model.jmd b/tutorials/01-gaussian-mixture-model/01_gaussian-mixture-model.jmd index 39901455c..abd53d5d2 100644 --- a/tutorials/01-gaussian-mixture-model/01_gaussian-mixture-model.jmd +++ b/tutorials/01-gaussian-mixture-model/01_gaussian-mixture-model.jmd @@ -35,7 +35,7 @@ mixturemodel = MixtureModel([MvNormal(Fill(μₖ, 2), I) for μₖ in μ], w) # We draw the data points. N = 60 -x = rand(mixturemodel, N) +x = rand(mixturemodel, N); ``` The following plot shows the dataset. 
@@ -52,18 +52,20 @@ More precisely, we want to infer the mixture weights, the parameters $\mu_1$ and In a Bayesian Gaussian mixture model with $K$ components each data point $x_i$ ($i = 1,\ldots,N$) is generated according to the following generative process. First we draw the model parameters, i.e., in our example we draw parameters $\mu_k$ for the mean of the isotropic normal distributions and the mixture weights $w$ of the $K$ clusters. We use standard normal distributions as priors for $\mu_k$ and a Dirichlet distribution with parameters $\alpha_1 = \cdots = \alpha_K = 1$ as prior for $w$: -\begin{align*} +$$ +\begin{aligned} \mu_k &\sim \mathcal{N}(0, 1) \qquad (k = 1,\ldots,K)\\ w &\sim \operatorname{Dirichlet}(\alpha_1, \ldots, \alpha_K) -\end{align*} +\end{aligned} +$$ After having constructed all the necessary model parameters, we can generate an observation by first selecting one of the clusters -\begin{equation*} +$$ z_i \sim \operatorname{Categorical}(w) \qquad (i = 1,\ldots,N), -\end{equation*} +$$ and then drawing the datum accordingly, i.e., in our example drawing -\begin{equation*} +$$ x_i \sim \mathcal{N}([\mu_{z_i}, \mu_{z_i}] \mathsf{T}, I) \qquad (i=1,\ldots,N). -\end{equation*} +$$ For more details on Gaussian mixture models, we refer to Christopher M. Bishop, *Pattern Recognition and Machine Learning*, Section 9. We specify the model with Turing. 
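The observation step of the generative process spelled out above — draw $z_i \sim \operatorname{Categorical}(w)$, then $x_i \sim \mathcal{N}([\mu_{z_i}, \mu_{z_i}]^\mathsf{T}, I)$ — can also be sketched outside Julia. The following hypothetical Python snippet (standard library only; `draw_mixture` and its defaults are illustrative, fixing the parameters to the tutorial's synthetic-data values $\mu = (-3.5, 0.5)$ and $w = (0.5, 0.5)$) is not part of the tutorial:

```python
import random

def draw_mixture(n, mu=(-3.5, 0.5), w=(0.5, 0.5), seed=3):
    # Observation step of the generative process: pick a cluster, then draw a
    # 2-D point whose coordinates are independent N(mu_z, 1) draws (isotropic).
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        z = rng.choices(range(len(w)), weights=w)[0]  # z_i ~ Categorical(w)
        points.append((z, rng.gauss(mu[z], 1.0), rng.gauss(mu[z], 1.0)))
    return points

sample = draw_mixture(60)
```

In the full Bayesian model the parameters themselves are drawn first, $\mu_k \sim \mathcal{N}(0, 1)$ and $w \sim \operatorname{Dirichlet}(\alpha_1, \ldots, \alpha_K)$; here they are held fixed, as in the synthetic-data section.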
From cc539efe729e4115c8f07b913047ab7127ba0865 Mon Sep 17 00:00:00 2001 From: David Widmann Date: Wed, 23 Mar 2022 11:02:52 +0100 Subject: [PATCH 5/7] Update tutorials/01-gaussian-mixture-model/01_gaussian-mixture-model.jmd Co-authored-by: Rik Huijzer --- .../01-gaussian-mixture-model/01_gaussian-mixture-model.jmd | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/tutorials/01-gaussian-mixture-model/01_gaussian-mixture-model.jmd b/tutorials/01-gaussian-mixture-model/01_gaussian-mixture-model.jmd index abd53d5d2..722d4f309 100644 --- a/tutorials/01-gaussian-mixture-model/01_gaussian-mixture-model.jmd +++ b/tutorials/01-gaussian-mixture-model/01_gaussian-mixture-model.jmd @@ -109,7 +109,9 @@ We generate multiple chains in parallel using multi-threading. ```julia sampler = Gibbs(PG(100, :k), HMC(0.05, 10, :μ, :w)) -chains = sample(model, sampler, MCMCThreads(), 100, 3); +nsamples = 100 +nchains = 3 +chains = sample(model, sampler, MCMCThreads(), nsamples, nchains); ``` ```julia; echo=false; error=false From 13b72e22308392d97a78c25237d5a9f273514ea4 Mon Sep 17 00:00:00 2001 From: David Widmann Date: Wed, 23 Mar 2022 15:32:57 +0100 Subject: [PATCH 6/7] Some minor changes --- .../01-gaussian-mixture-model/01_gaussian-mixture-model.jmd | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/tutorials/01-gaussian-mixture-model/01_gaussian-mixture-model.jmd b/tutorials/01-gaussian-mixture-model/01_gaussian-mixture-model.jmd index 722d4f309..9fbe73a29 100644 --- a/tutorials/01-gaussian-mixture-model/01_gaussian-mixture-model.jmd +++ b/tutorials/01-gaussian-mixture-model/01_gaussian-mixture-model.jmd @@ -7,7 +7,7 @@ redirect_from: tutorials/1-gaussianmixturemodel/ The following tutorial illustrates the use of Turing for clustering data using a Bayesian mixture model. The aim of this task is to infer a latent grouping (hidden structure) from unlabelled data. 
-## Synthetic data +## Synthetic Data We generate a synthetic dataset of $N = 60$ two-dimensional points $x_i \in \mathbb{R}^2$ drawn from a Gaussian mixture model. For simplicity, we use $K = 2$ clusters with @@ -64,7 +64,7 @@ z_i \sim \operatorname{Categorical}(w) \qquad (i = 1,\ldots,N), $$ and then drawing the datum accordingly, i.e., in our example drawing $$ -x_i \sim \mathcal{N}([\mu_{z_i}, \mu_{z_i}] \mathsf{T}, I) \qquad (i=1,\ldots,N). +x_i \sim \mathcal{N}([\mu_{z_i}, \mu_{z_i}]^\mathsf{T}, I) \qquad (i=1,\ldots,N). $$ For more details on Gaussian mixture models, we refer to Christopher M. Bishop, *Pattern Recognition and Machine Learning*, Section 9. @@ -126,7 +126,7 @@ let end ``` -## Visualize the Density Region of the Mixture Model +## Inferred Mixture Model After sampling we can visualize the trace and density of the parameters of interest. From dbca176a1208564f2d09636155ca69ce9733c8a9 Mon Sep 17 00:00:00 2001 From: David Widmann Date: Wed, 23 Mar 2022 16:20:31 +0100 Subject: [PATCH 7/7] Another typo --- .../01-gaussian-mixture-model/01_gaussian-mixture-model.jmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tutorials/01-gaussian-mixture-model/01_gaussian-mixture-model.jmd b/tutorials/01-gaussian-mixture-model/01_gaussian-mixture-model.jmd index 9fbe73a29..060fe2ab8 100644 --- a/tutorials/01-gaussian-mixture-model/01_gaussian-mixture-model.jmd +++ b/tutorials/01-gaussian-mixture-model/01_gaussian-mixture-model.jmd @@ -104,7 +104,7 @@ model = gaussian_mixture_model(x); ``` We run a MCMC simulation to obtain an approximation of the posterior distribution of the parameters $\mu$ and $w$ and assignments $k$. -We use a `Gibbs` sampler that combines a [particle Gibbs](https://www.stats.ox.ac.uk/%7Edoucet/andrieu_doucet_holenstein_PMCMC.pdf) sampler for the discrete parameters (assignments $k$) and a Hamiltonion Monte Carlo sampler for the continous parameters ($\mu$ and $w$). 
+We use a `Gibbs` sampler that combines a [particle Gibbs](https://www.stats.ox.ac.uk/%7Edoucet/andrieu_doucet_holenstein_PMCMC.pdf) sampler for the discrete parameters (assignments $k$) and a Hamiltonian Monte Carlo sampler for the continuous parameters ($\mu$ and $w$).
 We generate multiple chains in parallel using multi-threading.
 
 ```julia