Fix method `estimate_memory` from `gensim.models.FastText` & huge performance improvement. Fix #1824 (#1916)

* Cythonize fasttext.ft_hash for 100x performance improvement
* Cythonize fasttext.compute_ngrams for 2x performance improvement
* Reduce fasttext memory usage by computing ngrams on the fly
* Fix compute_ngrams for Python 2
* Store OOV vec in variable for more informative assertion error in testPersistenceForOldVersions
* Revert all changes to fasttext_wrapper
* Fix indentation for multi-line expressions
* Rename utils_any2vec_fast to _utils_any2vec
* fasttext: Cache ngram buckets for words during training

  This removes the expensive calls to `compute_ngrams` and `ft_hash` during training and uses a simple lookup in an int -> int[] mapping instead, resulting in a dramatic increase in training performance.

* Remove last occurrences of wv.ngrams_word and wv.ngrams
* fasttext: use buckets_word cache also for non-Cython training
* fasttext: Add buckets_ngram size to memory estimate
* fasttext: Don't store buckets_word with the model
* fasttext: Use smaller model for test_estimate_memory
* fasttext: Fix pure python training code
* fasttext: Fix asserts for test_estimate_memory
* fasttext: Fix typo and style errors
* fasttext: Simplify code as per @jayantj's review
* Update MANIFEST.in and documentation with utils_any2vec implementations
* Last fixes (add option for Cython compiler, fix descriptions, etc.)
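The idea behind the "Cache ngram buckets for words during training" item can be illustrated with a small pure-Python sketch. This is a minimal illustration under stated assumptions, not gensim's implementation. `ft_hash` is written here as a 32-bit FNV-1a hash over the word's UTF-8 bytes, the hash family fastText uses, although gensim's own `ft_hash` may treat non-ASCII characters differently. `compute_ngrams` follows the fastText convention of wrapping the word in `<` and `>`. `build_buckets_word` is a hypothetical helper name; only the name `buckets_word` comes from the commit. The per-word hashing runs once, and training then performs only a dictionary lookup.

```python
def ft_hash(s):
    """32-bit FNV-1a over the UTF-8 bytes of s (sketch; gensim's ft_hash may differ for non-ASCII)."""
    h = 2166136261  # FNV-1a 32-bit offset basis
    for b in s.encode("utf-8"):
        h ^= b
        h = (h * 16777619) & 0xFFFFFFFF  # multiply by the FNV prime, keep 32 bits
    return h


def compute_ngrams(word, min_n, max_n):
    """Character n-grams of lengths min_n..max_n of the word wrapped in '<' and '>'."""
    extended = "<" + word + ">"
    ngrams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(extended) - n + 1):
            ngrams.append(extended[i:i + n])
    return ngrams


def build_buckets_word(vocab, min_n, max_n, num_buckets):
    """Hypothetical helper: map word index -> list of ngram bucket indices, computed once."""
    buckets_word = {}
    for idx, word in enumerate(vocab):
        buckets_word[idx] = [ft_hash(ng) % num_buckets
                             for ng in compute_ngrams(word, min_n, max_n)]
    return buckets_word


# Usage: build the cache once, then look buckets up by word index during training
vocab = ["where", "here"]
buckets_word = build_buckets_word(vocab, min_n=3, max_n=6, num_buckets=2000000)
ngram_buckets_for_where = buckets_word[0]  # no hashing inside the training loop
```

The trade-off matches the commit's later items. The cache costs extra memory (hence a memory-estimate entry for it), and because it can be rebuilt from the vocabulary, it does not need to be saved with the model.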