[NeuralChat] Fixed off by one error on masking #1193

dillonalaird · 2024-01-26T03:56:20Z

Type of Change

Bug fix

Description

There's an off by one error in the preprocessor for generating the inputs and targets. This line sets the instruction_len = len(tokenizer_image_token(parts[0], tokenizer)) - 1 but if we do that we get the following mask:

ipdb> p target
tensor([ -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,
         -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,
         -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,
         -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,
         -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,  -100,
         1579,   297,   272,  3469,   349,  3075,   304,  2760, 28723,     2,
         1247, 28747,  1824,  4480,   541,   347,  2598,   356,   272,   852,
          302,   272,  1579, 28804, 21631, 28747,   415,   852,   302,   272,
         1579,  4190,   396, 21662,  1116, 28723,     2,  1247, 28747,  1691,
          272,  1579,  7810,  1060,   272,  5948,   442,  5822,   805,   298,
          272,  2081, 28804, 21631, 28747,   415,  1579,   349,  7810,  1060,
          272,  5948, 28725,   690,   349, 22558,   395,   905,   304,   799,
        11999, 28723,     2])

If we decode from the end of the mask to the first end of sentence, so 1579 to 2 we get:

ipdb> p tokenizer.decode([1579,   297,   272,  3469,   349,  3075,   304,  2760, 28723,     2])
'bus in the image is white and red.</s>'

but this is missing the first word "The" in the sentence, which can be seen if we print the entire conversation:

ipdb> p conversation
"A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions. User: <image>\nWhat are the colors of the bus in the image? Assistant: The bus in the image is white and red.</s>User: What feature can be seen on the back of the bus? Assistant: The back of the bus features an advertisement.</s>User: Is the bus driving down the street or pulled off to the side? Assistant: The bus is driving down the street, which is crowded with people and other vehicles.</s>"

The original LLaVA code uses the -2 when calculating the instruction_len here

Expected Behavior & Potential Risk

the expected behavior that triggered by this PR

How has this PR been tested?

how to reproduce the test (including hardware information)

Dependency Change?

any library dependency introduced or removed

Signed-off-by: Dillon Laird <dillonalaird@gmail.com>

dillonalaird requested a review from lvliang-intel as a code owner January 26, 2024 03:56

fixed off by one error on masking

c3b5973

Signed-off-by: Dillon Laird <dillonalaird@gmail.com>

hshen14 assigned hshen14 and lvliang-intel and unassigned hshen14 and lvliang-intel Jan 26, 2024

lvliang-intel requested a review from lkk12014402 January 26, 2024 05:11

lkk12014402 approved these changes Jan 26, 2024

View reviewed changes

hshen14 approved these changes Jan 26, 2024

View reviewed changes

hshen14 merged commit 525076d into intel:main Jan 26, 2024
9 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[NeuralChat] Fixed off by one error on masking #1193

[NeuralChat] Fixed off by one error on masking #1193

dillonalaird commented Jan 26, 2024

[NeuralChat] Fixed off by one error on masking #1193

[NeuralChat] Fixed off by one error on masking #1193

Conversation

dillonalaird commented Jan 26, 2024

Type of Change

Description

Expected Behavior & Potential Risk

How has this PR been tested?

Dependency Change?