inline a barebones version of logsumexp for improved performance #1745
Conversation
logsumexp accounts for 50% of the run time of ldamodel. Much of this time is spent in "robustness" checks performed by scipy's logsumexp (e.g., _asarray_validated, checks for NaNs, etc.). Removing these checks greatly improves the overall performance of ldamodel: run time when fitting an LDA model on the Enron dataset (from UCI) decreases by 20-40%.
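For illustration, a rough micro-benchmark of that gap might look like the sketch below (assuming a recent scipy where logsumexp lives in scipy.special; the array size and repeat count are arbitrary):

```python
import timeit
import numpy as np
from scipy.special import logsumexp  # in older scipy: scipy.misc.logsumexp

x = np.random.rand(100).astype(np.float32)

# scipy's version validates input and supports axis/b/keepdims arguments
t_scipy = timeit.timeit(lambda: logsumexp(x), number=10000)

# barebones version: only the max-shift trick to avoid overflow in exp()
t_bare = timeit.timeit(
    lambda: np.log(np.sum(np.exp(x - x.max()))) + x.max(), number=10000)

print("scipy: %.3fs  barebones: %.3fs" % (t_scipy, t_bare))
```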
Hi @arlenk, can you attach files from cProfile before/after the change, plus the code that you used for profiling?
gensim/models/ldamodel.py (Outdated)

```python
logger = logging.getLogger('gensim.models.ldamodel')


def logsumexp(x):
    """
    barebones log-sum-exp that tries to avoid overflows
    """
    x_max = x.max()  # shift by the max so exp() cannot overflow
    return np.log(np.sum(np.exp(x - x_max))) + x_max
```
Please use numpy-style docstrings
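For example, a numpy-style docstring for this helper might look something like this (the wording is illustrative, not taken from the PR):

```python
import numpy as np

def logsumexp(x):
    """Log of sum of exponentials, computed with the max-shift trick.

    Parameters
    ----------
    x : numpy.ndarray
        Input array of log-space values.

    Returns
    -------
    float
        The value of log(sum(exp(x))).

    """
    x_max = x.max()
    return np.log(np.sum(np.exp(x - x_max))) + x_max
```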
can't seem to attach a py file, so here's a zip of the profiling code and the cProfile output. The speedup varies with the number of passes and other parameters; on my computer, I see speedups of 10-30% on average. Curious whether that's specific to my configuration or not. One other benefit is that this speeds up ldamulticore and ldaseqmodel as well.
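A harness along these lines could reproduce such a measurement (a sketch only; the toy corpus and parameters below are stand-ins, not the attached script, which used the Enron dataset):

```python
import cProfile
import pstats
from gensim.corpora import Dictionary
from gensim.models import LdaModel

# tiny stand-in corpus, just enough to exercise the training loop
texts = [["human", "machine", "interface"], ["graph", "trees", "minors"]] * 500
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

# profile a full training run and show the 20 most expensive calls
cProfile.run(
    "LdaModel(corpus, id2word=dictionary, num_topics=10, passes=5)",
    "lda.prof")
pstats.Stats("lda.prof").sort_stats("cumulative").print_stats(20)
```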
I see only a 5% speedup (because this method doesn't consume 50% of the time; the slowest method is …). @arlenk congratz with the first PR!
Rewriting the whole LDA inner loop in C (Cython) would be super cool. It should also be quite easy to do (unless I'm missing something). I expect the NumPy overhead is massive, with all the checks, dynamic allocations, Python overhead...
I took a stab at moving more of the hotspots into cython in the branch below. From some profiling, most of the time is spent in a handful of functions, which I broke out into a cython module. For now, I just created a fastldamodel.py module (a copy of ldamodel.py, but using the cython functions) to see if there is any interest in pursuing this. Based on some profiling on my PC (attached), the speedup is about 100% (meaning the new code takes half the time, though this depends on the number of topics, number of passes, etc.).

I also noticed that the current ldamodel code defaults to float32, but the user can change this if desired. If we go down the cython route, we'd have to decide whether to keep letting the user change the datatype, as this would (I believe) require a separate cython function for each data type. Alternatively, we could use the cython code when the datatype is float32 (and perhaps float64) and fall back to the slower versions otherwise; a sketch of that dispatch is below.
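A minimal sketch of that dispatch idea, with pure-python stand-ins (the kernel names are hypothetical, not gensim's actual API; Cython's fused types could likewise generate one specialization per dtype from a single function):

```python
import numpy as np

def _logsumexp_generic(x):
    # generic numpy path, works for any float dtype
    x_max = x.max()
    return np.log(np.sum(np.exp(x - x_max))) + x_max

def _logsumexp_fast_f32(x):
    # stand-in for a compiled (cython) float32 kernel
    return _logsumexp_generic(x)

def logsumexp(x):
    # route float32 input to the fast kernel, fall back to numpy otherwise
    if x.dtype == np.float32:
        return _logsumexp_fast_f32(x)
    return _logsumexp_generic(x)
```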
Awesome! Float32 is the default; that should be our focus (at least to start with). Double precision is not that important.
@arlenk can you create a PR with your cython optimization? It will be great!
moved to a separate pull request: |