WMD Earth mover Distance for sentences that contain a single word return inf #979

Closed
rbahumi opened this issue Oct 26, 2016 · 4 comments

@rbahumi
Contributor

rbahumi commented Oct 26, 2016

```python
import gensim
from reuse import load_model  # `reuse` appears to be the reporter's local helper module, not part of gensim

if __name__ == "__main__":
    print("gensim version: %s" % gensim.__version__)

    path = "models_no_short_text_20.10"
    model = load_model(path)

    sentence_obama = 'Obama speaks to the media in Illinois'.lower().split()
    sentence_president = 'The president greets the press in Chicago'.lower().split()
    print(model.wmdistance(sentence_president, sentence_obama))

    # A single-token sentence compared with itself
    obama = "Obama".lower().split()
    print('Result for sentence "%s": %s' % (" ".join(obama), model.wmdistance(obama, obama)))

    # The same token repeated five times
    obama = "Obama".lower().split() * 5
    print('Result for sentence "%s": %s' % (" ".join(obama), model.wmdistance(obama, obama)))
```

Output:

gensim version: 0.13.2
3.69295766061
Result for sentence "obama": inf
Result for sentence "obama obama obama obama obama": inf

@rbahumi
Contributor Author

rbahumi commented Oct 26, 2016

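This happens because `np_sum(distance_matrix)` equals 0 when the distance matrix is 1×1, which is the case whenever both sentences reduce to the same single token. `wmdistance` returns inf when that sum is 0.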

https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/models/word2vec.py#L1415
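
To illustrate why the sum is zero, here is a minimal pure-Python sketch of the idea. It is not gensim's code: `distance_matrix`, `euclidean`, and the two-dimensional vector are hypothetical stand-ins, and the nested-list sum plays the role of `np_sum`.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(doc1, doc2, vectors):
    """Pairwise distances over the combined vocabulary of two documents.

    Illustrative stand-in only; the real computation lives in gensim.
    """
    vocab = sorted(set(doc1) | set(doc2))
    matrix = [[0.0] * len(vocab) for _ in vocab]
    for i, t1 in enumerate(vocab):
        for j, t2 in enumerate(vocab):
            if t1 in doc1 and t2 in doc2:
                matrix[i][j] = euclidean(vectors[t1], vectors[t2])
    return vocab, matrix


# Hypothetical 2-d embedding; real models use many more dimensions.
vectors = {"obama": [0.3, -1.2]}
doc = ["obama"] * 5

vocab, matrix = distance_matrix(doc, doc, vectors)
total = sum(sum(row) for row in matrix)
print(vocab, matrix, total)  # ['obama'] [[0.0]] 0.0
```

With one unique token, the only entry is the distance from the word to itself, so an "all zeros" check cannot tell "identical documents" apart from a degenerate matrix.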

@tmylk
Contributor

tmylk commented Oct 28, 2016

Thanks for highlighting this. What would be a better value to return?

@rbahumi
Contributor Author

rbahumi commented Oct 31, 2016

@tmylk, thanks for your reply. I suggest it should return float 0.0. If you agree, I have attached a pull request that deals with it. Please take a look.

tmylk pushed a commit that referenced this issue Oct 31, 2016
Issue: WMD Earth mover Distance for sentences that contain a single word return inf
Fix: Return float 0.0 when the dictionary size is 1 (both sentences are composed with the same token)
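
A rough sketch of what such a guard could look like, under stated assumptions: `wmdistance_with_guard` and the `compute_wmd` callback are hypothetical names used for illustration, and the empty-document check is an assumption about ordering rather than a copy of the merged change.

```python
def wmdistance_with_guard(doc1, doc2, compute_wmd):
    """Return 0.0 when both documents consist of the same single token.

    `compute_wmd` is a hypothetical callback standing in for the full
    Word Mover's Distance computation.
    """
    # Assumption: empty documents are rejected before the guard runs.
    if not doc1 or not doc2:
        return float("inf")

    dictionary = set(doc1) | set(doc2)
    if len(dictionary) == 1:
        # Only one unique token overall, so no mass has to move.
        return 0.0

    return compute_wmd(doc1, doc2)


def always_inf(doc1, doc2):
    """Stub that mimics the reported behaviour (always returns inf)."""
    return float("inf")


single = wmdistance_with_guard(["obama"], ["obama"], always_inf)
repeated = wmdistance_with_guard(["obama"] * 5, ["obama"] * 5, always_inf)
mixed = wmdistance_with_guard(["obama"], ["president"], always_inf)
print(single, repeated, mixed)  # 0.0 0.0 inf
```

Checking the dictionary size, rather than the sum of the distance matrix, leaves the existing all-zeros check in place for any other case.
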
@tmylk
Contributor

tmylk commented Oct 31, 2016

Fixed in #986

tmylk closed this as completed Oct 31, 2016