Skip to content
This repository has been archived by the owner on Oct 25, 2024. It is now read-only.

[NeuralChat] Fixed off by one error on masking #1193

Merged
merged 1 commit into from
Jan 26, 2024
Merged

[NeuralChat] Fixed off by one error on masking #1193

merged 1 commit into from
Jan 26, 2024

Conversation

dillonalaird
Copy link
Contributor

Type of Change

Bug fix

Description

There's an off by one error in the preprocessor for generating the inputs and targets. This line sets the instruction_len = len(tokenizer_image_token(parts[0], tokenizer)) - 1 but if we do that we get the following mask:

ipdb> p target
tensor([ -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,
         -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,
         -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,
         -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,
         -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,
         1579,   297,   272,  3469,   349,  3075,   304,  2760, 28723,     2,
         1247, 28747,  1824,  4480,   541,   347,  2598,   356,   272,   852,
          302,   272,  1579, 28804, 21631, 28747,   415,   852,   302,   272,
         1579,  4190,   396, 21662,  1116, 28723,     2,  1247, 28747,  1691,
          272,  1579,  7810,  1060,   272,  5948,   442,  5822,   805,   298,
          272,  2081, 28804, 21631, 28747,   415,  1579,   349,  7810,  1060,
          272,  5948, 28725,   690,   349, 22558,   395,   905,   304,   799,
        11999, 28723,     2])

If we decode from the end of the mask to the first end of sentence, so 1579 to 2 we get:

ipdb> p tokenizer.decode([1579,   297,   272,  3469,   349,  3075,   304,  2760, 28723,     2])
'bus in the image is white and red.</s>'

but this is missing the first word "The" in the sentence, which can be seen if we print the entire conversation:

ipdb> p conversation
"A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions. User: <image>\nWhat are the colors of the bus in the image? Assistant: The bus in the image is white and red.</s>User: What feature can be seen on the back of the bus? Assistant: The back of the bus features an advertisement.</s>User: Is the bus driving down the street or pulled off to the side? Assistant: The bus is driving down the street, which is crowded with people and other vehicles.</s>"

The original LLaVA code uses the -2 when calculating the instruction_len here

Expected Behavior & Potential Risk

the expected behavior that triggered by this PR

How has this PR been tested?

how to reproduce the test (including hardware information)

Dependency Change?

any library dependency introduced or removed

Signed-off-by: Dillon Laird <dillonalaird@gmail.com>
@hshen14 hshen14 merged commit 525076d into intel:main Jan 26, 2024
9 checks passed
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants