Word2Vec and Doc2Vec do not update word embeddings if `negative` keyword is set to 0
#1983
Comments
Thanks for the report @swierh 👍 CC: @gojomo @manneshiva, is this expected behavior, or how should this work?
I agree it could be surprising, and there should be a warning or exception when this error is made. (The chief hint currently is the near-instantaneous training; you might get a similarly fast-but-useless result from other degenerate parameter settings.)

But note there's a level at which this behavior makes logical sense: with zero negative examples with which to do negative sampling, and with hierarchical softmax not enabled (left at its default of 0), there is no training objective left, so nothing gets updated.
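For contrast, here is a minimal sketch (not from the original thread; toy corpus and values are made up) of the two parameter combinations that do drive updates, using gensim's `hs`/`negative` keywords:

```python
from gensim.models import Word2Vec

sentences = [["the", "cat", "sat", "on", "the", "mat"]] * 50

# Negative sampling (gensim's default mode): negative > 0, hs left at 0.
model_ns = Word2Vec(sentences, min_count=1, negative=5, hs=0)

# Hierarchical softmax: hs=1; negative=0 is fine here because hs provides the objective.
model_hs = Word2Vec(sentences, min_count=1, negative=0, hs=1)
```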
@gojomo
A warning or even an exception sounds good to me.
The warning should also happen if this error is made with
I was looking for full softmax training (by setting `hs=0` and `negative=0`) in gensim's Word2Vec implementation and could not find any. Upon investigation, I discovered that gensim's Word2Vec does not support full softmax training. I believe the original Google C implementation also has no full softmax implementation. Since gensim only supports either hierarchical softmax or negative sampling, I propose adding a check that disallows setting `negative=0`. Or, even better, gensim should make sure `negative > 0`. If the maintainers are okay with it, I can submit a PR to raise `ValueError` when `negative < 0`, along the lines of the sketch below. Also, it would perhaps be good to update the documentation to note that full softmax training is not supported.
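As a rough illustration of the kind of check being proposed here, a minimal sketch (the function name `check_training_mode` is made up; this is not gensim's actual validation code):

```python
def check_training_mode(hs: int, negative: int) -> None:
    """Reject parameter combinations under which no training could happen.

    Illustrative sketch only, not gensim's real validation logic.
    """
    if negative < 0:
        raise ValueError("'negative' must be >= 0 (0 disables negative sampling).")
    if not hs and not negative:
        raise ValueError(
            "Both hierarchical softmax (hs=0) and negative sampling (negative=0) "
            "are disabled, so no training would occur. Use hs=1 or negative > 0; "
            "plain full-softmax training is not supported."
        )
```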
A PR for better user messaging when nonsensical parameters are used would be appreciated!

However, common word2vec implementations (outside of academic demos) often don't offer full softmax, as using the shortcuts of either negative-sampling or hierarchical-softmax was essential for word2vec to be practical with corpora/vocabularies of usual interest. The Google `word2vec.c` release likewise only offers those two modes.

So I think any expectation that "no negative sampling" means softmax would be used instead involves unwarranted assumptions not based on Gensim docs or similar-library precedents. A few words to armor against that assumption could still be beneficial, if kept minimal and in just the right place(s). But a good error message when a user tries `hs=0, negative=0` would be the main improvement.
Thank you for the clarification and explanation, I understand it better now. I will submit a PR doing what you suggest. I was looking at the Google C source code yesterday, and saw that the logic is essentially as in the sketch below.
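The code block referred to here isn't preserved above. As a made-up paraphrase of the structure being described (two independent branches per training pair; the names are illustrative, not the real functions in either codebase):

```python
from types import SimpleNamespace

def train_pair(model, center_idx, context_idx, alpha):
    """Illustrative control flow only; not gensim's or Google's actual code."""
    updated = False
    if model.hs:            # hierarchical-softmax branch
        updated = True      # the real code walks the word's Huffman-tree path here
    if model.negative:      # negative-sampling branch; negative=0 is falsy, so it is skipped
        updated = True      # the real code draws `negative` noise words here
    return updated          # False when hs=0 and negative=0: nothing is ever trained

print(train_pair(SimpleNamespace(hs=0, negative=0), 1, 2, 0.025))  # False
print(train_pair(SimpleNamespace(hs=0, negative=5), 1, 2, 0.025))  # True
```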
So hierarchical-softmax training and negative-sampling training are independent. Gensim also follows this logic, since gensim is, I believe, a direct port of that code. Also, the docs for `negative` say that setting it to 0 simply means no negative sampling is used.

I think that wording is confusing: with `hs=0`, setting `negative=0` doesn't just switch off negative sampling, it switches off training altogether. Correct me if I understand any part wrongly. All in all, I think the confusion comes from the not-so-intuitive API inherited from the Google C source code. There should be one argument to specify the loss function, so that we can't end up with no loss function or multiple loss functions at once (see the hypothetical sketch below).
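To make the single-argument suggestion concrete, a purely hypothetical sketch (this is not an API gensim offers; the `loss` argument and the `resolve_loss` helper are invented for illustration):

```python
def resolve_loss(loss="negative_sampling", negative=5):
    """Map one explicit loss choice onto the existing hs/negative flags (hypothetical)."""
    if loss == "negative_sampling":
        if negative <= 0:
            raise ValueError("negative_sampling requires negative > 0")
        return {"hs": 0, "negative": negative}
    if loss == "hierarchical_softmax":
        return {"hs": 1, "negative": 0}
    raise ValueError(f"unknown loss: {loss!r}")

print(resolve_loss("hierarchical_softmax"))  # {'hs': 1, 'negative': 0}
```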
Yes, Gensim's implementation started as a direct port from the Google C code, so it inherited its peculiarities. In practice, anyone who has enabled both modes at once has been doing both kinds of training, interleaved, which is probably not what they expected. But it's worked this way for so long that we'd not want to break it except as part of a more general, advance-advised API cleanup. So for now, while an error-that-must-be-fixed is appropriate if both modes are off, a warning seems sufficient for the both-on case.

(Similar confusion was seen in #2550 and #2844, and I thought we'd added some sort of warning for one or both of the confused cases, but maybe that was in some exploratory work that was never integrated.)
Description
Setting the `negative` keyword to `0` for Doc2Vec causes the training to not update word embeddings after the random initialisation. This happens silently and is behavior I wasn't expecting.
Steps/Code/Corpus to Reproduce
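The original reproduction snippet is not preserved above. The following is a minimal sketch of the kind of code that shows the reported behavior (toy corpus and parameter values are made up; on recent gensim releases the `hs=0, negative=0` combination may be rejected or warned about rather than silently doing nothing):

```python
from gensim.models import Word2Vec

sentences = [["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]] * 100

# negative=0 with hs=0 (the default): no training objective, vectors keep their random init.
# NOTE: newer gensim versions may raise or warn here instead of training silently.
model_a = Word2Vec(sentences, min_count=1, seed=1, workers=1, negative=0, hs=0)
before = model_a.wv["fox"].copy()
model_a.train(sentences, total_examples=len(sentences), epochs=2)
print((model_a.wv["fox"] == before).all())   # True: the vector never moved

# negative=1: vectors are actually updated during training.
model_b = Word2Vec(sentences, min_count=1, seed=1, workers=1, negative=1, hs=0)
before = model_b.wv["fox"].copy()
model_b.train(sentences, total_examples=len(sentences), epochs=2)
print((model_b.wv["fox"] == before).all())   # False: training changed it
```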
Results
As can be seen below, the models that have `negative=0` show the same results after 1 or 2 epochs of training, whereas the models with `negative=1` show different (and somewhat more sensible) results.

Doc2Vec:
Word2Vec:
Logs during training
Doc2Vec
model1a:
model1b:
model2a:
model2b:
Word2Vec
model1a:
model1b:
model2a:
model2b:
Versions