diff --git a/docs/source/sft_trainer.mdx b/docs/source/sft_trainer.mdx
index 6d9ab661cb..a6ea9e84a0 100644
--- a/docs/source/sft_trainer.mdx
+++ b/docs/source/sft_trainer.mdx
@@ -111,6 +111,8 @@ trainer = SFTTrainer(
 trainer.train()
 ```
 
+Make sure to have a `pad_token_id` that is different from the `eos_token_id`; using the same token for both can result in the model not properly predicting EOS (End of Sentence) tokens during generation.
+
 #### Using token_ids directly for `response_template`
 
 Some tokenizers like Llama 2 (`meta-llama/Llama-2-XXb-hf`) tokenize sequences differently depending whether they have context or not. For example:
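
For illustration, here is a minimal sketch of one way to give a tokenizer a pad token distinct from its EOS token, as the added note recommends; the `facebook/opt-350m` checkpoint and the `[PAD]` literal are assumptions chosen for the example, not requirements:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint; substitute the model you are fine-tuning.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")

# If the tokenizer has no dedicated pad token, add one instead of
# reusing the EOS token, so EOS keeps its meaning during training.
if tokenizer.pad_token is None or tokenizer.pad_token_id == tokenizer.eos_token_id:
    tokenizer.add_special_tokens({"pad_token": "[PAD]"})
    # Grow the embedding matrix so the model has a row for the new token.
    model.resize_token_embeddings(len(tokenizer))

assert tokenizer.pad_token_id != tokenizer.eos_token_id
```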