Strange result when retrieving a non-existent word embedding for a document #520
Comments
This is caused by an unfortunate interaction between our internal method that converts a string tag to an int index (which returns None when the tag is not present) and numpy array indexing, which returns that nested 1xCOUNTxSIZE result when passed None. Was a KeyError what you'd expected? Until that's fixed, you can check whether a tag is in the trained set with |
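The indexing behaviour described above can be reproduced with plain NumPy, independent of gensim. The following is a minimal sketch: `doc_vectors` is a hypothetical stand-in for the model's trained document vectors, sized to match the numbers in the report (1235 documents, 300 dimensions), and is not gensim's internal attribute.

```python
import numpy as np

# Hypothetical stand-in for the trained document vectors:
# 1235 documents, each with a 300-dimensional embedding.
doc_vectors = np.zeros((1235, 300), dtype=np.float32)

# Indexing with None is equivalent to indexing with np.newaxis.
# It does not select a row; it adds a leading axis of length 1
# in front of the whole array.
result = doc_vectors[None]
print(result.shape)  # (1, 1235, 300)

# A valid integer index selects a single embedding, as expected.
valid = doc_vectors[0]
print(valid.shape)  # (300,)
```

So a tag lookup that silently yields `None` hands back a view of the entire array rather than failing.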
Ok, I'll check whether a tag is in the trained set using your suggestion for now. Thank you for your help.
@gojomo should this be closed as a workaround exists?
@tmylk no, there should really be a KeyError (rather than a giant nested array) when the requested key isn't present. I'll prep a fix before next week.
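To illustrate the behaviour being asked for, here is a rough sketch of a tag-to-index lookup that raises KeyError for an unknown tag instead of returning None. `TagIndex` and `row_for` are hypothetical names invented for this example; this is not gensim's code and not the actual fix.

```python
class TagIndex:
    """Hypothetical tag-to-row lookup (illustration only, not gensim's class)."""

    def __init__(self, tags):
        # Map each string tag to its row position in the vector array.
        self._index = {tag: i for i, tag in enumerate(tags)}

    def __contains__(self, tag):
        # Lets callers test membership before looking a tag up.
        return tag in self._index

    def row_for(self, tag):
        # Fail loudly for an unknown tag rather than returning None,
        # which numpy would treat as "add a new axis".
        if tag not in self._index:
            raise KeyError("tag %r not seen in training data" % (tag,))
        return self._index[tag]


index = TagIndex(["doc_0", "doc_1", "doc_2"])
print(index.row_for("doc_1"))   # 1
print("doc_99" in index)        # False

try:
    index.row_for("doc_99")
except KeyError as err:
    print("lookup failed:", err)
```

With a lookup like this, a missing document surfaces as an exception at the point of the request, and a membership test remains available for callers who prefer to check first.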
fix for #520: raise KeyError when no matching doctag
Fixed by #582.
Hello,
After creating a doc2vec model using the code from the tutorial with Python 3.5 and gensim 0.12.3, I mistakenly tried to retrieve a missing document, and I received a strange numpy.ndarray whose shape is (1, 1235, 300), where 1235 is the number of documents and 300 is the size of the embeddings. Why does this happen? Shouldn't an exception be raised? In any case, how can we check whether a document is missing?
Thank you in advance,
Claudio