Add custom logsumexp for LdaModel optimization (#1745)

* Inline a barebones version of logsumexp for improved performance.

  logsumexp accounts for 50% of the run time of ldamodel. Much of this time is spent on "robustness" checks performed by scipy's logsumexp (e.g., _asarray_validated, checks for NaNs, etc.). Removing these checks greatly improves the overall performance of ldamodel. For example, the run time when fitting an LDA model on the Enron dataset (from UCI) decreases by 20-40%.
* Convert to numpy-style docstring.
* Remove trailing whitespace.
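The max-shift trick that the inlined function relies on can be illustrated with a small sketch. This is not part of the commit: the input arrays and the `naive_logsumexp` helper are hypothetical and chosen only for illustration, and `np.logaddexp.reduce` is used as an independent NumPy reference rather than scipy's implementation.

```python
import numpy as np


def logsumexp(x):
    """Barebones log(sum(exp(x))) for a 1-d array without NaNs, as in the diff below."""
    x_max = np.max(x)
    return np.log(np.sum(np.exp(x - x_max))) + x_max


def naive_logsumexp(x):
    """Direct formula; overflows once exp(x) exceeds the float64 range."""
    return np.log(np.sum(np.exp(x)))


# Hypothetical inputs chosen for illustration only.
small = np.array([0.1, 0.5, -1.2])
large = np.array([1000.0, 1000.5, 999.0])

# On well-scaled input both versions agree.
agree_small = np.isclose(logsumexp(small), naive_logsumexp(small))

# On large input the naive version overflows to inf, while the shifted
# version stays finite and matches NumPy's pairwise logaddexp reduction.
with np.errstate(over="ignore"):
    naive_large = naive_logsumexp(large)
shifted_large = logsumexp(large)
reference_large = np.logaddexp.reduce(large)
```

Subtracting the maximum before exponentiating keeps every exponent at or below zero, so the sum cannot overflow. That is presumably why the stripped-down version can drop scipy's input validation while still avoiding overflow on 1-d, NaN-free input.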
piskvorky · Nov 30, 2017 · 08cfa93
1 parent a6605cb
Showing 1 changed file with 27 additions and 6 deletions.
diff --git a/gensim/models/ldamodel.py b/gensim/models/ldamodel.py
@@ -47,16 +47,37 @@
 from gensim.models import basemodel, CoherenceModel
 from gensim.models.callbacks import Callback
 
-# log(sum(exp(x))) that tries to avoid overflow
-try:
-    from scipy.special import logsumexp
-except ImportError:
-    from scipy.misc import logsumexp
-
 
 logger = logging.getLogger('gensim.models.ldamodel')
 
 
+def logsumexp(x):
+    """Log of sum of exponentials
+
+    Parameters
+    ----------
+    x : array_like
+        Input data
+
+    Returns
+    -------
+    float
+        log of sum of exponentials of elements in `x`
+
+    Notes
+    -----
+        For performance, does not support NaNs or arrays with more than one
+        dimension, unlike scipy.special.logsumexp().
+
+    """
+
+    x_max = np.max(x)
+    x = np.log(np.sum(np.exp(x - x_max)))
+    x += x_max
+
+    return x
+
+
 def update_dir_prior(prior, N, logphat, rho):
     """
     Updates a given prior using Newton's method, described in