Remove UnivariateGMM #844

Open

simonbyrne opened this issue Mar 19, 2019 · 2 comments

@simonbyrne (Member)

I think UnivariateGMM is completely unnecessary: it could just be implemented as an alias of MixtureModel.
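
For example, something like the following (the univariate_gmm name and signature are just illustrative, not an existing Distributions.jl API):

using Distributions

# Hypothetical constructor forwarding the UnivariateGMM arguments to
# MixtureModel, which already represents the same distribution.
univariate_gmm(means, stds, prior::Categorical) =
    MixtureModel(Normal.(means, stds), probs(prior))

d = univariate_gmm([-1.0, 1.0], [0.5, 2.0], Categorical([0.3, 0.7]))
logpdf(d, 0.0)  # log density of the two-component mixture at 0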

If the aim is to support storing the parameters as vectors of means and stds, it would be better to do that via StructArrays.jl.
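
For reference, a small sketch of the StructArrays.jl approach (assuming StructArray can materialize Normal elements from its positional (μ, σ) fields):

using Distributions, StructArrays

mu  = [-1.0, 0.0, 1.0]
sig = [0.5, 1.0, 2.0]

# Internally stores the components as two parallel vectors (μ and σ),
# while still behaving like a Vector{Normal{Float64}}.
comps = StructArray{Normal{Float64}}((mu, sig))
comps[2]  # Normal(0.0, 1.0)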

@simonbyrne (Member, Author)

(also, I apologise for opening this issue the day #615 was merged)

@luiarthur commented Aug 14, 2020

Hi,

Not sure if this is enough motivation to keep UnivariateGMM, but here's a little benchmark that seems to show that logpdf on UnivariateGMM is much more efficient than logpdf on a MixtureModel of Normals.

using Distributions
using StatsFuns
using BenchmarkTools
using Random

# Function to simulate some data for GMM logpdf.
function gendata(nobs, nmix)
  x = randn(nobs)
  mu = collect(range(-3, 3, length=nmix))
  sig = rand(nmix)
  w = let
    _w = rand(nmix)
    _w / sum(_w)
  end
  return mu, sig, w, x
end

# Shorthand for the GMM logpdf: logsumexp of the per-component normal
# log-densities plus log-weights, summed over observations.
gmm_lpdf(mu, sig, w, x; dims) = sum(logsumexp(normlogpdf.(mu, sig, x) .+ log.(w), dims=dims))

# Generate data.
Random.seed!(0);
mu, sig, w, x  = gendata(100, 5)

### Benchmark ###

@btime gmm_lpdf(mu', sig', w',  x[:, :], dims=2)  # V1: 15.5 μs

@btime sum(logsumexp(logpdf.(Normal.(mu', sig'), x[:, :]) .+ log.(w'), dims=2))  # V2: 20.6 μs

@btime sum(logpdf.(UnivariateGMM(mu, sig, Categorical(w)), x))  # V3: 20.9 μs

@btime sum(logpdf.(MixtureModel(Normal.(mu, sig), w), x))  # V4: 317.2 μs

I've timed four things that do the same thing here: compute the sum of the log density of a location-scale mixture of normals, evaluated at a vector of 100 univariate values.

The UnivariateGMM version (V3) is over 10x faster than the MixtureModel one (V4) in this example.

These are all vectorized implementations, so there's definitely room to optimize further here. But I just want to illustrate the utility of having UnivariateGMM.
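
For comparison, here is a non-vectorized sketch of the same computation that works directly on the parameter vectors, roughly the kind of loop that makes UnivariateGMM cheap (an illustration only, not the actual Distributions.jl implementation):

using StatsFuns: logsumexp, normlogpdf

# Scalar mixture log-density: logsumexp over log(w_k) + log N(x; mu_k, sig_k),
# without allocating per-component Normal objects.
mix_lpdf(mu, sig, logw, x::Real) =
    logsumexp(logw[k] + normlogpdf(mu[k], sig[k], x) for k in eachindex(mu))

logw = log.(w)
sum(xi -> mix_lpdf(mu, sig, logw, xi), x)  # same quantity as V1-V4 above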
