Replies: 1 comment
WG feedback is that this is okay. No code change is required; this is up to submitters.
During the finetuning of the GPT-J model, an extra token (pad="[PAD]") was inadvertently introduced as part of the model, which increased the `lm_head` dimension by 1 (from 50400 to 50401). This extra token is now practically redundant (both in dataset preprocessing and in post-processing, if padding is used during batching).
Creating a thread to discuss making it optional to use 50401 vs 50400 in the `lm_head`.