accuracy function for analogy test error #2455

prokok · 2019-04-20T08:15:14Z

Problem description

Analogy test for accuracy function does not work properly because of the typo (topn = False) on line 1187 in keyedvectors.py script.
(https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/models/keyedvectors.py)

Solution

There are 2 solutions regarding this matter: changing the typo in accuracy function or changing the most_smiliar function

Change accuracy function in gensim/models/keyedvectors.py script on line 1187
(https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/models/keyedvectors.py)

sims = most_similar(self, positive=[b, c], negative=[a], topn=False, restrict_vocab=restrict_vocab)
==>
sims = most_similar(self, positive=[b, c], negative=[a], topn=None, restrict_vocab=restrict_vocab)

Change the most_similar function in gensim/models/keyedvectors.py script on line 514,556
(https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/models/keyedvectors.py)

if topn is not None and topn < 1: ==> if not topn and topn<1:

AND

if topn is None: ==> if not topn:

Versions

Darwin-18.5.0-x86_64-i386-64bit
Python 3.6.8 |Anaconda, Inc.| (default, Dec 29 2018, 19:04:46) 
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
NumPy 1.16.2
SciPy 1.2.1
gensim 3.7.2
FAST_VERSION 0

piskvorky · 2019-04-20T11:29:29Z

Thanks for reporting @vlfgns5.

The problem seems to have been introduced by #2356. @Witiko can you have a look, review the logic? Are there other places similarly affected by that change?

Witiko · 2019-04-23T11:36:38Z

@piskvorky You are correct. Whereas previously topn was only cast to a boolean, #2356 gave topn=0 different semantics (return an empty iterable) than topn=None (return all similarities, undocumented behavior). The PR missed callers that would use topn=False, my apologies.

According to a quick grep, there are no callers other than WordEmbeddingsKeyedVectors.accuracy that use topn=False. As @vlfgns5 suggests, we can either fix the caller by changing topn=False to topn=None, or we can make most_similar accept both False and None. Since #2356 may have broken some user code that depended on topn=False, I think changing both the caller (to enforce topn=None as the canonical usage) and most_similar (to support legacy user code) is the best option.

Note that due to changes in Python 3, @vlfgns5's fix (if not topn and topn < 1:) raises a type error when topn is None, since NoneType and int are mutually unorderable. I suggest the following fix instead: if isinstance(topn, int) and topn < 1:. A less explicit condition would be ideal.

Witiko · 2019-04-23T12:18:27Z

Digging deeper into this, properly supporting most_similar(…, topn=None) needs more work. When no indexer is provided to most_similar, then topn=None (and topn=False) return the expected result. However, when an indexer (such as Annoyindexer) is provided, then the call fails, because Annoy only supports integer topn. Therefore, we need an additional fix that disables the indexer when topn is None:

< if indexer is not None:
> if indexer is not None and isinstance(topn, int):
      return indexer.most_similar(mean, topn)

This is a separate issue that has not yet been reported, but it should perhaps be fixed together with this, see PR #2461.

prokok changed the title ~~accuracy function for analogy test gives back empty lists as result~~ accuracy function for analogy test gives back empty lists. Apr 20, 2019

prokok changed the title ~~accuracy function for analogy test gives back empty lists.~~ accuracy function for analogy test error Apr 20, 2019

Witiko mentioned this issue Apr 23, 2019

Fix WordEmbeddingsKeyedVectors.most_similar #2461

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

accuracy function for analogy test error #2455

accuracy function for analogy test error #2455

prokok commented Apr 20, 2019 •

edited

Loading

piskvorky commented Apr 20, 2019 •

edited

Loading

Witiko commented Apr 23, 2019 •

edited

Loading

Witiko commented Apr 23, 2019 •

edited

Loading

accuracy function for analogy test error #2455

accuracy function for analogy test error #2455

Comments

prokok commented Apr 20, 2019 • edited Loading

Problem description

Solution

Versions

piskvorky commented Apr 20, 2019 • edited Loading

Witiko commented Apr 23, 2019 • edited Loading

Witiko commented Apr 23, 2019 • edited Loading

prokok commented Apr 20, 2019 •

edited

Loading

piskvorky commented Apr 20, 2019 •

edited

Loading

Witiko commented Apr 23, 2019 •

edited

Loading

Witiko commented Apr 23, 2019 •

edited

Loading