
perhaps there is a misprint at line 40 #2111

Closed
weiguo-li opened this issue Nov 4, 2022 · 3 comments · Fixed by #2363
Labels
docathon-h1-2023, easy, grammar

Comments

weiguo-li commented Nov 4, 2022

Instead of

# self-attention layers in nn.TransformerEncoder are only allowed to attend

the comment should read

# self-attention layers in nn.TransformerDecoder are only allowed to attend ...

i.e. Decoder rather than Encoder.

cc @svekars @carljparker

svekars added the grammar label Dec 5, 2022
BaliDataMan commented

@weiguo-li Can you please add a link to the file here?

svekars added the easy and docathon-h1-2023 labels May 31, 2023
frasertajima (Contributor) commented

\assigntome

frasertajima added a commit to frasertajima/tutorials that referenced this issue May 31, 2023
fix to "perhaps there is a misprint at line 40 pytorch#2111";

review of referenced paper https://arxiv.org/pdf/1706.03762.pdf section 3.2.3 suggests:
"Similarly, self-attention layers in the decoder allow each position in the decoder to attend to
all positions in the decoder up to and including that position. We need to prevent leftward
information flow in the decoder to preserve the auto-regressive property. We implement this
inside of scaled dot-product attention by masking out (setting to −∞) all values in the input
of the softmax which correspond to illegal connections. See Figure 2."

Thus the suggested change from nn.TransformerEncoder to nn.TransformerDecoder seems reasonable.
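For readers following along, here is a minimal sketch of the kind of causal mask the quoted passage describes (the helper name and exact formulation are illustrative, not copied from the tutorial): positions strictly above the diagonal are set to -inf, so the softmax inside scaled dot-product attention assigns them zero weight and each position can only attend to itself and earlier positions.

```python
import torch

def generate_square_subsequent_mask(sz: int) -> torch.Tensor:
    # Entries strictly above the diagonal are -inf: position i may only
    # attend to positions j <= i, preserving the auto-regressive property.
    return torch.triu(torch.full((sz, sz), float('-inf')), diagonal=1)

mask = generate_square_subsequent_mask(4)
print(mask)
# tensor([[0., -inf, -inf, -inf],
#         [0., 0., -inf, -inf],
#         [0., 0., 0., -inf],
#         [0., 0., 0., 0.]])
```

A mask like this can be passed as the mask / tgt_mask argument of nn.TransformerEncoder or nn.TransformerDecoder; recent PyTorch releases also provide an equivalent helper, nn.Transformer.generate_square_subsequent_mask.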
svekars pushed a commit that referenced this issue May 31, 2023
Fix to "perhaps there is a misprint at line 40 #2111"
frasertajima mentioned this issue Jun 1, 2023
kit1980 (Member) commented Jun 2, 2023

Closing as fixed by #2363

kit1980 closed this as completed Jun 2, 2023