Strange result when retrieving a not existing word embedding for a document #520

claudiogreco · 2015-11-10T12:32:50Z

Hello,

After creating a doc2vec model using the code from the tutorial (Python 3.5, gensim 0.12.3), I mistakenly tried to retrieve the vector of a missing document. Instead of an error, I received a strange numpy.ndarray of shape (1, 1235, 300), where 1235 is the number of documents and 300 is the size of the embeddings. Why does this happen? Shouldn't an exception be raised? Also, how can we check whether a document is missing?

Thank you in advance,
Claudio

gojomo · 2015-11-10T21:16:50Z

This is caused by an unfortunate interaction between our internal method that converts a string tag to an int index (which returns None when the tag is not present) and numpy array indexing, which returns that nested 1xCOUNTxSIZE result when passed None.
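
The numpy half of that interaction can be reproduced without gensim. The following is a minimal sketch: doc_vectors is a stand-in array with the shape from the report, not the model's real storage.

```python
import numpy as np

# Stand-in for the trained document vectors: 1235 documents, 300 dimensions.
doc_vectors = np.zeros((1235, 300), dtype=np.float32)

# Indexing with an int returns a single row.
row = doc_vectors[0]
print(row.shape)  # (300,)

# None is the same object as np.newaxis, so indexing with it prepends
# a new axis to the whole array instead of raising an error.
nested = doc_vectors[None]
print(nested.shape)  # (1, 1235, 300)
```

So a lookup that silently yields None for an unknown tag turns into a request for the whole array with an extra leading axis.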

Was a KeyError what you'd expected?

Until that's fixed, you can check whether a tag is in the trained set with the membership test key in model.docvecs.
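
A guarded lookup along those lines might look like the sketch below. The helper name lookup_docvec and its default parameter are hypothetical; it only assumes that the object passed in supports the in operator and [] indexing, as described above for model.docvecs.

```python
def lookup_docvec(docvecs, tag, default=None):
    """Return docvecs[tag] if the tag is known, otherwise `default`.

    `docvecs` is assumed to support `tag in docvecs` and `docvecs[tag]`.
    """
    if tag in docvecs:
        return docvecs[tag]
    return default
```

With a trained model this would be called as lookup_docvec(model.docvecs, some_tag); a None result then means the tag was not in the trained set.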

claudiogreco · 2015-11-10T22:49:16Z

Ok, I'll check whether a tag is in the trained set using your suggestion for now. Thank you for your help.

tmylk · 2016-01-09T20:34:22Z

@gojomo should this be closed as a workaround exists?

gojomo · 2016-01-10T09:49:12Z

@tmylk no, there should really be a KeyError (rather than a giant nested array) when the requested key isn't present. I'll prep a fix before next week.
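
To illustrate the intended behavior, here is a toy sketch of a tag-to-vector container that raises KeyError for unknown tags. It is not gensim's actual code: ToyDocvecs and its attributes are hypothetical names used only for this example.

```python
import numpy as np


class ToyDocvecs:
    """Toy tag-to-vector lookup (hypothetical, not gensim's class)."""

    def __init__(self, tags, vectors):
        # Map each string tag to its row index in the vector array.
        self._index = {tag: i for i, tag in enumerate(tags)}
        self._vectors = vectors

    def __contains__(self, tag):
        return tag in self._index

    def __getitem__(self, tag):
        index = self._index.get(tag)
        if index is None:
            # Fail loudly rather than passing None on to numpy indexing.
            raise KeyError("tag %r not present in trained set" % (tag,))
        return self._vectors[index]
```

Checking for None before indexing keeps the missing-tag case from ever reaching the array.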

gojomo · 2016-01-16T03:28:40Z

Fixed by #582.

gojomo self-assigned this Nov 10, 2015

gojomo added a commit that referenced this issue Jan 12, 2016

fix for #520: raise KeyError when no matching doctag

2e53063

gojomo added a commit that referenced this issue Jan 12, 2016

note for #520

812b75c

gojomo added a commit that referenced this issue Jan 16, 2016

Merge pull request #582 from piskvorky/docvecs_keyerror

1ab5df2

gojomo closed this as completed Jan 16, 2016
