Add custom logsumexp for LdaModel optimization (#1745)
* inline a barebones version of logsumexp for improved performance

logsumexp accounts for 50% of the run time of ldamodel.  Much
of this time is spent on "robustness" checks performed by
scipy's logsumexp (e.g., _asarray_validated and checks for NaNs).

Removing these checks greatly improves the overall performance
of ldamodel.  For example, the run time when fitting an LDA model
on the Enron dataset (from UCI) decreases by 20-40%.

* converting to numpy style docstring

* remove trailing whitespaces
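
A rough way to check the overhead described in the message above is to time both versions on a short 1-D vector. This is only a sketch: the name barebones_logsumexp, the vector size, and the call count are illustrative assumptions, not values taken from the commit, and the timings will vary by machine and SciPy version.

```python
import timeit

import numpy as np

try:
    from scipy.special import logsumexp as scipy_logsumexp
except ImportError:  # SciPy missing or too old; skip the comparison
    scipy_logsumexp = None


def barebones_logsumexp(x):
    """Max-shifted log(sum(exp(x))) for a 1-D array, with no input validation."""
    x_max = np.max(x)
    return np.log(np.sum(np.exp(x - x_max))) + x_max


# Hypothetical workload: many calls on a short vector.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
n_calls = 2000

t_bare = timeit.timeit(lambda: barebones_logsumexp(x), number=n_calls)
print(f"barebones: {t_bare:.4f}s for {n_calls} calls")

if scipy_logsumexp is not None:
    t_scipy = timeit.timeit(lambda: scipy_logsumexp(x), number=n_calls)
    print(f"scipy:     {t_scipy:.4f}s for {n_calls} calls")
    # Both versions should agree on well-behaved input.
    assert np.isclose(barebones_logsumexp(x), scipy_logsumexp(x))
```

Short inputs are used here on the assumption that per-call validation cost matters most when the arrays are small and the function is called very often.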
arlenk authored and menshikh-iv committed Nov 30, 2017
1 parent a6605cb commit 08cfa93
Showing 1 changed file with 27 additions and 6 deletions.
33 changes: 27 additions & 6 deletions gensim/models/ldamodel.py
@@ -47,16 +47,37 @@
 from gensim.models import basemodel, CoherenceModel
 from gensim.models.callbacks import Callback

-# log(sum(exp(x))) that tries to avoid overflow
-try:
-    from scipy.special import logsumexp
-except ImportError:
-    from scipy.misc import logsumexp
-

 logger = logging.getLogger('gensim.models.ldamodel')


+def logsumexp(x):
+    """Log of sum of exponentials
+    Parameters
+    ----------
+    x : array_like
+        Input data
+    Returns
+    -------
+    float
+        log of sum of exponentials of elements in `x`
+    Notes
+    -----
+    for performance, does not support NaNs or > 1d arrays like
+    scipy.special.logsumexp()
+    """
+
+    x_max = np.max(x)
+    x = np.log(np.sum(np.exp(x - x_max)))
+    x += x_max
+
+    return x
+
+
 def update_dir_prior(prior, N, logphat, rho):
     """
     Updates a given prior using Newton's method, described in

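The inlined function relies on the usual max-shift identity, log(sum(exp(x))) = max(x) + log(sum(exp(x - max(x)))), to avoid overflow. The sketch below illustrates why the shift matters and where the missing checks show up. The function names naive_logsumexp and shifted_logsumexp are illustrative only; shifted_logsumexp mirrors the arithmetic in the diff but is not the gensim function itself.

```python
import numpy as np


def naive_logsumexp(x):
    """Direct evaluation; exp() overflows for large inputs."""
    return np.log(np.sum(np.exp(x)))


def shifted_logsumexp(x):
    """Same arithmetic as the inlined version: subtract the max before exp()."""
    x_max = np.max(x)
    return np.log(np.sum(np.exp(x - x_max))) + x_max


x = np.array([1000.0, 1000.0])

with np.errstate(over="ignore"):
    naive = naive_logsumexp(x)  # exp(1000) overflows to inf, so the result is inf

stable = shifted_logsumexp(x)  # exp(0) + exp(0) = 2, so the result is 1000 + log(2)

# Edge case the barebones version does not guard against:
# if every element is -inf, x - x_max is (-inf) - (-inf) = nan.
with np.errstate(invalid="ignore", divide="ignore"):
    all_neg_inf = shifted_logsumexp(np.array([-np.inf, -np.inf]))

print(naive, stable, all_neg_inf)
```

The last case is one example of the trade-off noted in the docstring: dropping the validation makes each call cheaper, but non-finite input is passed straight through the arithmetic.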