Replies: 1 comment
WG feedback is that this is okay. No code change is required; this is up to submitters.
During the finetuning of the GPT-J model, an extra token (pad="[PAD]") was inadvertently introduced as part of the model, which increased the `lm_head` dimension by 1 (from 50400 to 50401). This extra token is now practically redundant (both in dataset preprocessing and in post-processing, if padding is used during batching).
Creating a thread to discuss making it optional to use 50401 vs 50400 in the `lm_head`.