
4.1.1

@mpenkov mpenkov released this 14 Sep 13:49
· 272 commits to develop since this release

4.1.1, 2021-09-14

This is a bugfix release that addresses compatibility issues with older versions of numpy.

4.1.0, 2021-08-15

Gensim 4.1 brings two major new functionalities:

There are several minor changes that are not backwards compatible with previous versions of Gensim.
The affected functionality is rarely used and unlikely to affect most users, so we have opted not to require a major version bump.
Nevertheless, we describe them below.

Improved parameter edge-case handling in KeyedVectors most_similar and most_similar_cosmul methods

We now handle both positive and negative keyword parameters consistently.
They may now be either:

  1. A string, in which case the value is reinterpreted as a list of one element (the string value)
  2. A vector, in which case the value is reinterpreted as a list of one element (the vector)
  3. A list of strings
  4. A list of vectors

So you can now simply do:

    model.most_similar(positive='war', negative='peace')

instead of the slightly more involved

    model.most_similar(positive=['war'], negative=['peace'])

Both invocations remain correct, so you can use whichever is most convenient.
If you were somehow expecting gensim to interpret the strings as a list of characters, e.g.

    model.most_similar(positive=['w', 'a', 'r'], negative=['p', 'e', 'a', 'c', 'e'])

then you will need to specify the lists explicitly in gensim 4.1.
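The normalization rule described above can be sketched in a few lines. This is an illustration only, not Gensim's actual implementation: `ensure_list` is a hypothetical helper name, a tuple of floats stands in for a NumPy vector so that the sketch runs with the standard library alone, and treating a missing value as an empty list is an assumption.

```python
def ensure_list(value):
    """Wrap a single key or vector in a list; pass lists through unchanged.

    Hypothetical helper for illustration only. A tuple of floats stands in
    for a NumPy vector, and None is assumed to mean "no value given".
    """
    if value is None:
        return []
    if isinstance(value, (str, tuple)):
        # A lone string or vector is reinterpreted as a one-element list.
        return [value]
    # Anything else is assumed to already be a list of keys or vectors.
    return list(value)


print(ensure_list("war"))             # single key
print(ensure_list(["war", "peace"]))  # list of keys
print(ensure_list((0.1, 0.2)))        # single vector
print(ensure_list(list("war")))       # characters, listed explicitly
```

Under this rule a bare string is never split into characters, which is why the character-list interpretation has to be spelled out by the caller.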

Deprecated obsolete steps parameter from doc2vec

With the newer version, do this:

    model.infer_vector(..., epochs=123)

instead of this:

    model.infer_vector(..., steps=123)
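If existing code builds its `infer_vector` keyword arguments in one place, a small shim can ease the transition. The sketch below is not part of Gensim: `migrate_infer_vector_kwargs` is a hypothetical helper that renames a legacy `steps` key to `epochs` and refuses to guess when both are given with different values.

```python
def migrate_infer_vector_kwargs(kwargs):
    """Return a copy of kwargs with a legacy `steps` key renamed to `epochs`.

    Hypothetical helper for illustration; it is not provided by Gensim.
    """
    migrated = dict(kwargs)  # leave the caller's dict untouched
    if "steps" in migrated:
        steps = migrated.pop("steps")
        # Refuse to pick a winner if both names carry different values.
        if migrated.get("epochs", steps) != steps:
            raise ValueError("conflicting values for `steps` and `epochs`")
        migrated["epochs"] = steps
    return migrated


legacy_kwargs = {"steps": 123}
print(migrate_infer_vector_kwargs(legacy_kwargs))
```

The result could then be passed along as `model.infer_vector(doc_words, **migrate_infer_vector_kwargs(legacy_kwargs))`, where `doc_words` is a placeholder for the tokenized document.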

Plus a large number of smaller improvements and fixes, as usual.

⚠️ If migrating from old Gensim 3.x, read the Migration guide first.

👍 New features

  • #3169: Implement shrink_windows argument for Word2Vec, by @M-Demay
  • #3163: Optimize word mover distance (WMD) computation, by @flowlight0
  • #3157: New KeyedVectors.vectors_for_all method for vectorizing all words in a dictionary, by @Witiko
  • #3153: Vectorize word2vec.predict_output_word for speed, by @M-Demay
  • #3146: Use FastSS for fast kNN over Levenshtein distance, by @Witiko
  • #3128: Materialize and copy the corpus passed to SoftCosineSimilarity, by @Witiko
  • #3115: Make LSI dispatcher CLI param for number of jobs optional, by @robguinness
  • #3091: LsiModel: Only log top words that actually exist in the dictionary, by @kmurphy4
  • #2980: Added EnsembleLda for stable LDA topics, by @sezanzeb
  • #2978: Optimize performance of Author-Topic model, by @horpto
  • #3000: Tidy up KeyedVectors.most_similar() API, by @simonwiles

📚 Tutorials and docs

🔴 Bug fixes

  • #3178: Fix Unicode string incompatibility in gensim.similarities.fastss.editdist, by @Witiko
  • #3174: Fix loading Phraser models stored in Gensim 3.x into Gensim 4.0, by @emgucv
  • #3136: Fix indexing error in word2vec_inner.pyx, by @bluekura
  • #3131: Add missing import to NMF docs and models/__init__.py, by @properGrammar
  • #3116: Fix bug where saved Phrases model did not load its connector_words, by @aloknayak29
  • #2830: Fixed KeyError in coherence model, by @pietrotrope

⚠️ Removed functionality & deprecations

  • #3176: Eliminate obsolete step parameter from doc2vec infer_vector and similarity_unseen_docs, by @rock420
  • #2965: Remove strip_punctuation2 alias of strip_punctuation, by @sciatro
  • #3180: Move preprocessing functions from gensim.corpora.textcorpus and gensim.corpora.lowcorpus to gensim.parsing.preprocessing, by @rock420

🔮 Testing, CI, housekeeping

  • #3156: Update Numpy minimum version to 1.17.0, by @PrimozGodec
  • #3143: replace _mul function with explicit casts, by @mpenkov
  • #2952: Allow newer versions of the Morfessor module for the tests, by @pabs3