FastText with hs=1 and negative>0 #2550
I don't read that documentation text as necessarily implying exclusivity. Historically, it just so happens that the original implementation allowed both modes to be enabled at once.
Thanks @gojomo for clarifying the historical reasons behind it. I'll again try to follow the flow in the source, but it's quite confusing to do so for this specific case.
Essentially, hierarchical-softmax and negative-sampling are two different ways to interpret the output layer of the neural network, then assess the errors for back-propagation (through the "hidden layer" to the input word-vectors). If both are enabled, two sets of internal NN hidden-to-output weights are allocated. (Historically, in the word2vec.c source & earlier gensim versions, these were kept in two separately-named arrays.)

During the main, example-by-example training loop, each type of training was applied in turn, so it was like having two models that shared a single set of input word-vectors. One pass over the data therefore did, in effect, two models' worth of updates.

That could look like a benefit, if you weren't counting run-time. "Wow, 5 epochs with both enabled is better than either alone!" Perhaps, but probably not as good as giving either mode more total run-time, for example extra epochs.
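The "two models sharing one set of input vectors" idea can be sketched very loosely in plain numpy. This is an illustration only, not gensim's actual code: the array names, the one-inner-node "tree" for the HS branch, and the fixed negative count are all invented for brevity.

```python
# Toy sketch: one training step that applies BOTH a hierarchical-softmax
# update and a negative-sampling update. The two hidden-to-output weight
# matrices are distinct, but both back-propagate into the SAME shared
# input word-vectors. Names are illustrative, not gensim's.
import numpy as np

rng = np.random.default_rng(0)
vocab, dim = 10, 4

syn0 = rng.normal(0, 0.1, (vocab, dim))   # shared input word-vectors
out_hs = np.zeros((vocab, dim))           # hidden->output weights, HS mode
out_neg = np.zeros((vocab, dim))          # hidden->output weights, NS mode

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_pair(center, context, alpha=0.025):
    h = syn0[context].copy()              # "hidden layer" = input vector
    grad = np.zeros(dim)

    # hierarchical-softmax pass (toy: a single inner node per word)
    f = sigmoid(h @ out_hs[center])
    g = (1.0 - f) * alpha
    grad += g * out_hs[center]
    out_hs[center] += g * h

    # negative-sampling pass (1 positive target + 2 random negatives)
    for target, label in [(center, 1)] + [(int(rng.integers(vocab)), 0)
                                          for _ in range(2)]:
        f = sigmoid(h @ out_neg[target])
        g = (label - f) * alpha
        grad += g * out_neg[target]
        out_neg[target] += g * h

    syn0[context] += grad                 # one shared input-vector update

train_pair(3, 7)
```

Because each example touches both output matrices, a single "epoch" with both modes enabled does roughly two modes' worth of work per example, which is why it isn't a free improvement.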
Thanks a lot @gojomo, very clear now. I'll close this issue.
I get a bug if I use hs=1 with negative=0 when updating the vocab:

gives:

Maybe I should open an issue. Do you get the same error if you continue training with the above hyperparameters?
If you have a minimal example to reproduce this error, you should file a new issue with the complete set of steps. |
The docs say:

So I would expect that if `hs=1`, the model will use hierarchical softmax and the value of `negative` is irrelevant, right?

This doesn't seem to be the case: if I run two perfectly deterministic runs (i.e., `workers=1`, a fixed `seed`, and `PYTHONHASHSEED` set) on the same input with:

- `hs=1, negative=0`, and
- `hs=1, negative=5`

the resulting word vectors have different values.

How can `hs` and `negative` coexist? I've looked at the code but I couldn't find any place implementing the "exclusive" logic implied by the documentation above.