[Bug] Tokenizer's BOS/EOS/PAD not set for inference #139
Following https://discord.com/channels/1104757954588196865/1111279858136383509/1113729100763381804, a user found that the tokenizer's BOS/EOS/PAD tokens are not set in inference mode.

We can fix this by setting these tokens following the YAML config, as in this block:
https://github.com/OpenAccess-AI-Collective/axolotl/blob/288fd62431be84a7112fd461feeb9322f1177d3c/scripts/finetune.py#L66-L68

We need to update this, as Alpaca is not the only method now.

Depends on #64
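A quick way to see the symptom (the model name is a placeholder, not a specific checkpoint; any tokenizer whose config omits these tokens behaves the same):

```python
from transformers import AutoTokenizer

# "some-org/some-model" is hypothetical; substitute the checkpoint under test.
tokenizer = AutoTokenizer.from_pretrained("some-org/some-model")

# Nothing sets these in inference mode, so they come back as None whenever
# the tokenizer's own config does not define them, which breaks generation
# code that relies on the eos/pad token ids.
print(tokenizer.bos_token, tokenizer.eos_token, tokenizer.pad_token)
```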
Comments

Actually, it's also seen when fine-tuning Falcon.
@utensil, which dataset format are you using?
We should probably change these to info and move them to after we add the special tokens from the config.
Doesn't https://github.com/OpenAccess-AI-Collective/axolotl/blob/main/src/axolotl/utils/models.py#L68-L70 already solve this?
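For reference, that block boils down to something like the following sketch (names are paraphrased, not copied from the linked source):

```python
from transformers import PreTrainedTokenizerBase

def apply_special_tokens(tokenizer: PreTrainedTokenizerBase, special_tokens: dict) -> None:
    """Apply the special tokens a YAML config supplies, e.g. {"bos_token": "<s>"}."""
    for name, value in special_tokens.items():
        # add_special_tokens({"bos_token": "<s>"}) sets tokenizer.bos_token and
        # registers the token with the vocabulary if it is missing.
        tokenizer.add_special_tokens({name: value})
```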
Hm, yes. I think the problem is due to that code block in the first post; it may overwrite previous settings. I think just removing it is enough? The main work is in #64.
If we were to remove the block in the first post, we need to make sure LLaMA configs/tokenizers have those tokens added somewhere to prevent a regression.
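One way to prevent that regression would be a fallback along these lines (a sketch only; the function name is made up, and the values are LLaMA's conventional tokens):

```python
from transformers import PreTrainedTokenizerBase

# LLaMA's conventional special tokens, used purely as a fallback.
LLAMA_SPECIAL_TOKENS = {
    "bos_token": "<s>",
    "eos_token": "</s>",
    "unk_token": "<unk>",
}

def ensure_llama_special_tokens(tokenizer: PreTrainedTokenizerBase) -> None:
    """Backfill any special token the tokenizer config left unset."""
    for name, value in LLAMA_SPECIAL_TOKENS.items():
        if getattr(tokenizer, name, None) is None:
            tokenizer.add_special_tokens({name: value})
```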
If I understand it right, the code from #180 already addresses this, so this can be closed?
Yes, this should be closed. Thank you!