Not sure if this is enough motivation to keep UnivariateGMM, but here's a little benchmark that seems to show that logpdf on UnivariateGMM is much more efficient than logpdf on a MixtureModel of Normals.
using Distributions
using StatsFuns
using BenchmarkTools
using Random
# Function to simulate some data for the GMM logpdf benchmark.
function gendata(nobs, nmix)
    x = randn(nobs)
    mu = collect(range(-3, 3, length=nmix))
    sig = rand(nmix)
    w = let
        _w = rand(nmix)
        _w / sum(_w)
    end
    return mu, sig, w, x
end

# Shorthand for the GMM logpdf.
gmm_lpdf(mu, sig, w, x; dims) = sum(logsumexp(normlogpdf.(mu, sig, x) .+ log.(w), dims=dims))

# Generate data.
Random.seed!(0);
mu, sig, w, x = gendata(100, 5)

### Benchmark ###
@btime gmm_lpdf(mu', sig', w', x[:, :], dims=2)                                  # V1: 15.5 μs
@btime sum(logsumexp(logpdf.(Normal.(mu', sig'), x[:, :]) .+ log.(w'), dims=2))  # V2: 20.6 μs
@btime sum(logpdf.(UnivariateGMM(mu, sig, Categorical(w)), x))                   # V3: 20.9 μs
@btime sum(logpdf.(MixtureModel(Normal.(mu, sig), w), x))                        # V4: 317.2 μs
I've timed four versions of the same computation here: the sum of the log density of a location-scale mixture of Normals evaluated at a vector of 100 univariate values.
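Concretely, every version computes

$$\sum_{i=1}^{100} \log\!\left(\sum_{k=1}^{5} w_k\,\mathcal{N}(x_i \mid \mu_k, \sigma_k^2)\right) = \sum_{i=1}^{100} \operatorname{logsumexp}_{k}\bigl(\log w_k + \log\mathcal{N}(x_i \mid \mu_k, \sigma_k^2)\bigr),$$

which is the logsumexp trick that V1 and V2 apply explicitly.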
The UnivariateGMM version (V3) is over 10x faster than the MixtureModel one (V4) in this example.
All of these versions are vectorized, so they allocate large broadcast temporaries, and there's definitely room to optimize further (see the loop sketch below). But I just want to illustrate the utility of having UnivariateGMM.
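For instance, here's a rough, untested sketch of a loop-based version that reuses a single small buffer per observation instead of building the big intermediates (gmm_lpdf_loop is just an illustrative name, not part of Distributions.jl):

using StatsFuns: normlogpdf, logsumexp

function gmm_lpdf_loop(mu, sig, w, x)
    logw = log.(w)
    lp = similar(logw)  # per-observation buffer of component log-densities
    total = 0.0
    for xi in x
        for k in eachindex(mu)
            lp[k] = normlogpdf(mu[k], sig[k], xi) + logw[k]
        end
        total += logsumexp(lp)  # log of the mixture density at xi
    end
    return total
end

gmm_lpdf_loop(mu, sig, w, x)  # should match gmm_lpdf(mu', sig', w', x[:, :], dims=2)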
I think it is completely unnecessary. It could just be implemented as an alias of MixtureModel. If the aim is to support storing it as vectors of means and stds, it would be better to do it via StructArrays.jl.
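Roughly, the StructArrays.jl idea would look something like this (an untested sketch; the final line mirrors V2 above):

using Distributions, StatsFuns, StructArrays

mu = collect(range(-3, 3, length=5))
sig = rand(5)
w = fill(0.2, 5)
x = randn(100)

# Store the component Normals as two parallel field vectors (μ and σ);
# indexing materializes a Normal on the fly instead of storing per-component objects.
components = StructArray{Normal{Float64}}((mu, sig))

# Mixture log-density via the same broadcast + logsumexp trick as V2.
sum(logsumexp(logpdf.(permutedims(components), x) .+ log.(w'), dims=2))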