
TransformerEncoder is not causal #2478

Closed
bigheary opened this issue Jun 18, 2023 · 10 comments · Fixed by #2662

Comments


bigheary commented Jun 18, 2023

Add Link

https://pytorch.org/tutorials/beginner/transformer_tutorial.html

[screenshot from the tutorial]

For language modeling, src_mask should mask future words.

Describe the bug

Is there anything wrong here? The tutorial's src_mask does not appear to mask future positions, so the encoder is not causal.

Describe your environment

Colab

cc @pytorch/team-text-core @Nayef211 @sekyondaMeta @svekars @carljparker @NicolasHug @kit1980 @subramen

@bigheary bigheary added the bug label Jun 18, 2023

jorendorff commented Jun 25, 2023

Yes, this has to be a bug. I tried using the resulting model to do some basic predictions and it was never able to give a prediction for the next word that made any sense, despite reaching a perplexity of 2.9 on the validation and test data. Without a mask, every word except the last (the one that isn't present in the input) should be easy for the model to predict flawlessly.

@jorendorff

You can play with it here (a copy of the notebook with a cell added at the end for making predictions).

jorendorff added a commit to jorendorff/machine-learning that referenced this issue Jun 27, 2023
At this point I noticed it is totally broken and found pytorch/tutorials#2478.
@xanderex-sid

/assigntome

@nvs-abhilash

/assigntome

@nvs-abhilash nvs-abhilash removed their assignment Nov 3, 2023
@xanderex-sid

/assigntome

@xanderex-sid

@bigheary Can you tell me more specifically why you think src_mask should mask future words (i.e., act as a tgt_mask)?

@jorendorff

The text of the tutorial says:

> Along with the input sequence, a square attention mask is required because the self-attention layers in nn.TransformerDecoder are only allowed to attend the earlier positions in the sequence. For the language modeling task, any tokens on the future positions should be masked.
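
For reference, the kind of square mask the quoted text describes can be generated with PyTorch's built-in helper. This is a minimal sketch, not the tutorial's exact code:

```python
import torch.nn as nn

# Square causal ("subsequent") mask: position i may attend only to positions <= i.
# Entries above the diagonal are -inf, so softmax assigns them zero attention weight.
seq_len = 5
src_mask = nn.Transformer.generate_square_subsequent_mask(seq_len)
print(src_mask)
# tensor([[0., -inf, -inf, -inf, -inf],
#         [0., 0., -inf, -inf, -inf],
#         [0., 0., 0., -inf, -inf],
#         [0., 0., 0., 0., -inf],
#         [0., 0., 0., 0., 0.]])
```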


jorendorff commented Nov 6, 2023

The purpose of the mask is to prevent the model from seeing the token it's supposed to be predicting. Without a mask, the task is trivial and the model doesn't learn.

Other tutorials also include this mask. Andrej Karpathy's video, for example, lingers for 15 minutes on the subject, starting here, and then later comes back and discusses it again, saying "you have to mask ... so that nodes from the future never talk with the past, because they would give away the answer". (He points out that non-predictive applications don't need the mask, but the tutorial at issue here begins "This is a tutorial on training a model to predict the next word[...].")
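
To illustrate the point, here is a minimal sketch (with assumed layer sizes, not the tutorial's code) of passing a causal mask to nn.TransformerEncoder so that no position can attend to the token it is being trained to predict:

```python
import torch
import torch.nn as nn

seq_len, batch_size, d_model = 10, 2, 64

encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)

src = torch.randn(seq_len, batch_size, d_model)  # (seq, batch, d_model) layout

# Without a mask, every position attends to the whole sequence, including
# future tokens, which makes next-word prediction trivial during training.
out_no_mask = encoder(src)

# With a causal mask, position i can only attend to positions <= i.
causal_mask = nn.Transformer.generate_square_subsequent_mask(seq_len)
out_causal = encoder(src, mask=causal_mask)
```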

@xanderex-sid xanderex-sid removed their assignment Nov 7, 2023

ahoblitz commented Nov 8, 2023

/assigntome

@ahoblitz ahoblitz mentioned this issue Nov 8, 2023

ahoblitz commented Nov 8, 2023

Added a generate_square_subsequent_mask and trained for 30 epochs to get the following, based on some of the discussion here:
[screenshot of training results, 2023-11-07]

Let me know if this follows the spirit of the conversation or not.
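
For readers following along, the change presumably boils down to building the causal mask and passing it through the model's forward call during training. A rough sketch with hypothetical names, not the exact code from the fix:

```python
import torch
import torch.nn as nn

def train_step(model, data, targets, criterion, optimizer, device):
    """One training step with a causal mask (illustrative names, not the PR's code)."""
    seq_len = data.size(0)
    # Causal mask so the model cannot see the token it is asked to predict.
    src_mask = nn.Transformer.generate_square_subsequent_mask(seq_len).to(device)

    optimizer.zero_grad()
    output = model(data, src_mask)  # forward pass with the mask applied
    loss = criterion(output.view(-1, output.size(-1)), targets.view(-1))
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 0.5)
    optimizer.step()
    return loss.item()
```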
