TransformerEncoder is not causal #2478
Comments
Yes, this has to be a bug. I tried using the resulting model to do some basic predictions and it was never able to give a prediction for the next word that made any sense, despite reaching a perplexity of 2.9 on the validation and test data. Without a mask, every word except the last (the one that isn't present in the input) should be easy for the model to predict flawlessly.
You can play with it here (a copy of the notebook with a cell added at the end for making predictions)
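To make the point about the leak concrete, here is a minimal sketch of how inputs and targets line up in next-word training. The toy sentence and the helper `leaked_positions` are hypothetical and not taken from the tutorial; the sketch only assumes the usual setup where the target is the input shifted by one token.

```python
# Hypothetical toy sequence; the tutorial works on batches of token ids instead.
tokens = ["the", "cat", "sat", "on", "the", "mat"]

# Next-word setup: the input is every token but the last,
# and the target is the same sequence shifted left by one.
inputs = tokens[:-1]
targets = tokens[1:]


def leaked_positions(inputs, targets):
    """Return positions whose target token sits at the next input position.

    With unmasked self-attention, position i can attend to position i + 1,
    so for these positions the answer is directly visible in the input.
    """
    return [i for i in range(len(inputs) - 1) if inputs[i + 1] == targets[i]]


leaked = leaked_positions(inputs, targets)
print(leaked)       # every position except the last one
print(targets[-1])  # the only target that is not visible in the input
```

Only the final position has a target that is absent from the input, which fits the observation that a low perplexity can coexist with useless predictions for the actual next word.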
At this point I noticed it is totally broken and found pytorch/tutorials#2478.
@bigheary Can you explain more specifically why you think src_mask should mask future words (tgt_mask)?
The text of the tutorial says:
The purpose of the mask is to prevent the model from seeing the token it's supposed to be predicting. Without a mask, the task is trivial and the model doesn't learn. Other tutorials also include this mask. Andrej Karpathy's video, for example, lingers for 15 minutes on the subject, starting here, and then later comes back and discusses it again, saying "you have to mask ... so that nodes from the future never talk with the past, because they would give away the answer". (He points out that non-predictive applications don't need the mask, but the tutorial at issue here begins "This is a tutorial on training a model to predict the next word[...].")
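As an illustration of what such a mask does, here is a dependency-free sketch in plain Python. It is not the tutorial's code: `causal_mask` and `masked_softmax` are hypothetical helpers, and the uniform attention scores are made up. It assumes the common additive convention, where the mask holds 0.0 for allowed positions and negative infinity for future positions and is added to the attention scores before the softmax.

```python
import math


def causal_mask(size):
    """Additive mask: 0.0 where attention is allowed, -inf for future positions."""
    return [
        [0.0 if j <= i else float("-inf") for j in range(size)]
        for i in range(size)
    ]


def masked_softmax(scores, mask):
    """Row-wise softmax over scores + mask."""
    weights = []
    for score_row, mask_row in zip(scores, mask):
        shifted = [s + m for s, m in zip(score_row, mask_row)]
        peak = max(shifted)  # finite, because the diagonal is always allowed
        exps = [math.exp(v - peak) for v in shifted]  # exp(-inf) is 0.0
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Made-up uniform scores for a sequence of four tokens.
scores = [[0.0] * 4 for _ in range(4)]
weights = masked_softmax(scores, causal_mask(4))

for row in weights:
    print([round(w, 3) for w in row])
```

Each row is the attention distribution of one position. Every entry above the diagonal is exactly zero, so no position receives information from a later token, which is the property the quoted explanation asks for.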
Add Link
https://pytorch.org/tutorials/beginner/transformer_tutorial.html
For language modeling, src_mask should mask future words.
Describe the bug
Is there anything wrong?
Describe your environment
colab
cc @pytorch/team-text-core @Nayef211 @sekyondaMeta @svekars @carljparker @NicolasHug @kit1980 @subramen