This repository has been archived by the owner on Oct 25, 2024. It is now read-only.
[NeuralChat] Use unk token instead of eos token #1198
Merged
Type of Change
Bug fix
Description
In `train.py`, the `pad_token` is set to the `eos_token`. Because `eos_token`s are present in every conversation, the `total_len` calculation for a conversation effectively ignores one token per round, since `int(target.ne(tokenizer.pad_token_id).sum()) == int(target.ne(tokenizer.eos_token_id).sum())`. The missing counts therefore have to be added back with `+ len([rou for rou in rounds if rou != ""])`. This in itself isn't bad, but another model, such as teknium/OpenHermes-2.5-Mistral-7B, uses a different `eos_token`, so the `total_len` calculation ends up being wrong, where 123 is the correct `total_len` and 0 is typically the unk token id. This causes the tokenization mismatch warning to be thrown. A fix is to use the unk token as the pad token, as the original LLaVA code does, and then change the `total_len` calculation back to the original implementation.
Expected Behavior & Potential Risk
the expected behavior that is triggered by this PR
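To illustrate the expected behavior, here is a minimal plain-Python sketch comparing the two `total_len` strategies. It is not the actual `train.py` code. To keep it runnable without PyTorch, it replaces `target.ne(...).sum()` with a list count. All token ids are hypothetical: bos 1, round terminator 2, unk 0, and a separate eos 32000. It also assumes, for illustration only, that the conversation template ends each round with a token that differs from the model's `eos_token`. That is one way the `+ len(rounds)` compensation can over-count.

```python
# Hypothetical token ids chosen for illustration only; real ids depend on the tokenizer.
UNK_ID = 0
ROUND_END_ID = 2  # assumed token that closes each round in the conversation template


def count_not_equal(tokens, token_id):
    """Plain-Python stand-in for int(target.ne(token_id).sum())."""
    return sum(1 for t in tokens if t != token_id)


def total_len_pad_is_eos(target, eos_token_id, num_rounds):
    # Old approach: pad_token == eos_token, so every eos in the target is
    # excluded from the count and one token per round is added back.
    return count_not_equal(target, eos_token_id) + num_rounds


def total_len_pad_is_unk(target, unk_token_id=UNK_ID):
    # Fixed approach: pad_token == unk_token. Assuming unk never occurs in the
    # tokenized conversation, counting non-pad tokens gives the length directly.
    return count_not_equal(target, unk_token_id)


# Two rounds, each terminated by ROUND_END_ID (hypothetical token sequence).
target = [1, 10, 11, 12, ROUND_END_ID, 13, 14, ROUND_END_ID]
num_rounds = 2

# Case 1: eos_token_id is the round terminator, so the compensation
# happens to produce the right answer.
print(total_len_pad_is_eos(target, eos_token_id=ROUND_END_ID, num_rounds=num_rounds))  # 8

# Case 2: eos_token_id is a different token (hypothetically 32000) that never
# appears in the target, so nothing is excluded and "+ num_rounds" over-counts.
print(total_len_pad_is_eos(target, eos_token_id=32000, num_rounds=num_rounds))  # 10

# The unk-based count matches len(target) in both cases.
print(total_len_pad_is_unk(target))  # 8
```

In this sketch, the unk-based count does not depend on which token the model treats as `eos_token`. That independence is why switching the pad token and dropping the per-round correction removes the mismatch.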
How has this PR been tested?
how to reproduce the test (including hardware information)
Dependency Change?
any library dependency introduced or removed