perhaps there is a misprint at line 40 #2111
Instead of "# self-attention layers in nn.TransformerEncoder are only allowed to attend", the comment should read "self-attention layers in nn.TransformerDecoder are only allowed to attend": Decoder rather than Encoder.

cc @svekars @carljparker
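The report does not link the file, but assuming the line in question is the word-language-model tutorial's comment about its use of nn.TransformerEncoder with a square causal mask, the pattern being described looks roughly like the sketch below (sizes and layer counts are illustrative, not the tutorial's exact code):

```python
import torch
import torch.nn as nn

d_model, nhead, seq_len, batch = 32, 4, 10, 2        # illustrative sizes only

layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead)
encoder = nn.TransformerEncoder(layer, num_layers=2)

# Square "subsequent" mask: -inf above the diagonal, 0 on and below it,
# so position i may only attend to positions <= i (autoregressive behaviour).
mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)

src = torch.randn(seq_len, batch, d_model)           # default layout: (seq, batch, d_model)
out = encoder(src, mask=mask)                        # encoder layers, masked to behave causally
print(out.shape)                                     # torch.Size([10, 2, 32])
```

In this pattern it is the mask, rather than the class name, that enforces the "only allowed to attend earlier positions" behaviour, which is the wording the issue questions.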
Comments
@weiguo-li Can you please add a link to the file here?
\assigntome
frasertajima added a commit to frasertajima/tutorials that referenced this issue on May 31, 2023:
fix to "perhaps there is a misprint at line 40 pytorch#2111"; review of referenced paper https://arxiv.org/pdf/1706.03762.pdf section 3.2.3 suggests: "Similarly, self-attention layers in the decoder allow each position in the decoder to attend to all positions in the decoder up to and including that position. We need to prevent leftward information flow in the decoder to preserve the auto-regressive property. We implement this inside of scaled dot-product attention by masking out (setting to −∞) all values in the input of the softmax which correspond to illegal connections. See Figure 2." Thus the suggested change in reference from nn.Transform.Encoder to nn.Transform.Decoder seems reasonable.
svekars pushed a commit that referenced this issue on May 31, 2023.
Fix to "perhaps there is a misprint at line 40 #2111"; review of referenced paper https://arxiv.org/pdf/1706.03762.pdf section 3.2.3 suggests: "Similarly, self-attention layers in the decoder allow each position in the decoder to attend to all positions in the decoder up to and including that position. We need to prevent leftward information flow in the decoder to preserve the auto-regressive property. We implement this inside of scaled dot-product attention by masking out (setting to −∞) all values in the input of the softmax which correspond to illegal connections. See Figure 2." Thus the suggested change in reference from nn.Transform.Encoder to nn.Transform.Decoder seems reasonable.
Closing as fixed by #2363.