The `LlamaIntegrationTest::test_conversion` test is failing #23400

ydshieh · 2023-05-16T13:12:06Z

The following command

```
RUN_SLOW=1 python3 -m pytest -v tests/models/llama/test_tokenization_llama.py::LlamaIntegrationTest::test_conversion
```

gives

```
>           self.assertEqual(old_serialized, new_serialized)
E           AssertionError: '{\n [1465 chars]   "Sequence": {\n          "id": "B",\n      [1794589 chars]}\n}' != '{\n [1465 chars]   "SpecialToken": {\n          "id": "<s>",\n[1794837 chars]}\n}'

tests/models/llama/test_tokenization_llama.py:337: AssertionError
```
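`unittest` truncates both strings in the message above, so the actual point of divergence is buried in about 1.8 million characters of JSON. Below is a minimal sketch of a helper for locating it; `first_json_difference` is a hypothetical name (not part of transformers or tokenizers), it assumes both serialized strings are valid JSON, and the inputs are toy fragments loosely modeled on the pieces visible in the assertion message, not the real files.

```python
import json


def _first_difference(a, b, path):
    """Recursively walk two parsed JSON values; return (path, a, b) at the first mismatch."""
    if type(a) is not type(b):
        return path, a, b
    if isinstance(a, dict):
        # Visit keys in a stable order so the result is deterministic.
        for key in sorted(set(a) | set(b)):
            if key not in a or key not in b:
                return f"{path}.{key}", a.get(key), b.get(key)
            found = _first_difference(a[key], b[key], f"{path}.{key}")
            if found is not None:
                return found
        return None
    if isinstance(a, list):
        for i, (x, y) in enumerate(zip(a, b)):
            found = _first_difference(x, y, f"{path}[{i}]")
            if found is not None:
                return found
        if len(a) != len(b):
            return f"{path}.length", len(a), len(b)
        return None
    return None if a == b else (path, a, b)


def first_json_difference(old_serialized: str, new_serialized: str):
    """Hypothetical helper: parse both strings and report the first differing JSON path, or None."""
    return _first_difference(json.loads(old_serialized), json.loads(new_serialized), "$")


# Toy inputs: the "new" pair template has an extra "<s>" before $B.
old = json.dumps({"post_processor": {"pair": [
    {"SpecialToken": {"id": "<s>", "type_id": 0}},
    {"Sequence": {"id": "A", "type_id": 0}},
    {"Sequence": {"id": "B", "type_id": 1}},
]}})
new = json.dumps({"post_processor": {"pair": [
    {"SpecialToken": {"id": "<s>", "type_id": 0}},
    {"Sequence": {"id": "A", "type_id": 0}},
    {"SpecialToken": {"id": "<s>", "type_id": 1}},
    {"Sequence": {"id": "B", "type_id": 1}},
]}})
print(first_json_difference(old, new))
```

Called as `first_json_difference(old_serialized, new_serialized)` inside the test, a helper like this would report the JSON path of the first mismatch instead of a truncated string.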

Who can help?

@ArthurZucker

Narsil · 2023-05-16T15:29:29Z

~~I looked into it.~~

~~The difference is that the newly converted tokenizer has ids 32000-32004 as special ids, which, if I'm not mistaken, correspond to the OpenAssistant llama fork.~~

~~Those do not seem to be declared here: https://huggingface.co/hf-internal-testing/llama-tokenizer/tree/main~~

~~I'm not sure which part of the code adds them to the slow tokenizer, but this does indeed seem like a bug.~~

I looked at the wrong file. Everything works; the only difference is a different `type_id` in the post-processor.

We simply need to update the `tokenizer.json` on the Hub with the correct value (1).
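The value in question lives in the `post_processor` section of `tokenizer.json`. A minimal sketch for inspecting it, assuming the post-processor is serialized as a `TemplateProcessing` whose `single`/`pair` entries are one-key objects such as `{"SpecialToken": {"id": "<s>", "type_id": 0}}` or `{"Sequence": {"id": "B", "type_id": 1}}` (the shapes visible in the assertion message). Both helper names are hypothetical, the sample is a toy fragment rather than the Hub file, and the check encodes an assumed convention: every piece after `$A` in the pair template belongs to the second segment and should carry `type_id` 1.

```python
import json


def template_pieces(tokenizer_json: str, template: str = "pair"):
    """Hypothetical helper: list (kind, id, type_id) for each piece of a TemplateProcessing template."""
    post = json.loads(tokenizer_json)["post_processor"]
    if post.get("type") != "TemplateProcessing":
        raise ValueError(f"unexpected post_processor type: {post.get('type')!r}")
    pieces = []
    for piece in post[template]:
        # Each piece is a one-key object: {"SpecialToken": {...}} or {"Sequence": {...}}.
        kind, body = next(iter(piece.items()))
        pieces.append((kind, body["id"], body["type_id"]))
    return pieces


def second_segment_uses_type_id_1(pieces):
    """Assumed convention: everything after the $A sequence belongs to segment 1."""
    start = next(i for i, (kind, piece_id, _) in enumerate(pieces)
                 if kind == "Sequence" and piece_id == "A") + 1
    return all(type_id == 1 for _, _, type_id in pieces[start:])


# Toy fragment in which the second "<s>" still has type_id 0.
sample = json.dumps({"post_processor": {
    "type": "TemplateProcessing",
    "single": [{"SpecialToken": {"id": "<s>", "type_id": 0}},
               {"Sequence": {"id": "A", "type_id": 0}}],
    "pair": [{"SpecialToken": {"id": "<s>", "type_id": 0}},
             {"Sequence": {"id": "A", "type_id": 0}},
             {"SpecialToken": {"id": "<s>", "type_id": 0}},
             {"Sequence": {"id": "B", "type_id": 1}}],
    "special_tokens": {"<s>": {"id": "<s>", "ids": [1], "tokens": ["<s>"]}},
}})

pieces = template_pieces(sample)
print(pieces)
print(second_segment_uses_type_id_1(pieces))
```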

Narsil · 2023-05-16T15:37:42Z

(There's also a slight issue: the EOS token is added to the processor for no reason.)
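One way to spot a special token that is registered in the post-processor but never emitted by its templates is to compare its `special_tokens` map against the `SpecialToken` pieces of `single` and `pair`. A minimal sketch, assuming a `TemplateProcessing` post-processor whose `special_tokens` is an object keyed by token string (e.g. `{"</s>": {"id": "</s>", "ids": [2], "tokens": ["</s>"]}}`); `unused_special_tokens` is a hypothetical helper and the sample is a toy fragment, not the actual Hub file.

```python
import json


def unused_special_tokens(tokenizer_json: str):
    """Hypothetical helper: special tokens declared by a TemplateProcessing post-processor
    but never referenced by its `single` or `pair` templates."""
    post = json.loads(tokenizer_json)["post_processor"]
    used = set()
    for template in ("single", "pair"):
        for piece in post.get(template, []):
            # Each piece is a one-key object: {"SpecialToken": {...}} or {"Sequence": {...}}.
            kind, body = next(iter(piece.items()))
            if kind == "SpecialToken":
                used.add(body["id"])
    declared = post.get("special_tokens", {})
    return sorted(token for token in declared if token not in used)


# Toy fragment: "</s>" is registered but neither template emits it.
sample = json.dumps({"post_processor": {
    "type": "TemplateProcessing",
    "single": [{"SpecialToken": {"id": "<s>", "type_id": 0}},
               {"Sequence": {"id": "A", "type_id": 0}}],
    "pair": [{"SpecialToken": {"id": "<s>", "type_id": 0}},
             {"Sequence": {"id": "A", "type_id": 0}},
             {"Sequence": {"id": "B", "type_id": 1}}],
    "special_tokens": {
        "<s>": {"id": "<s>", "ids": [1], "tokens": ["<s>"]},
        "</s>": {"id": "</s>", "ids": [2], "tokens": ["</s>"]},
    },
}})
print(unused_special_tokens(sample))
```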

Narsil · 2023-05-16T15:50:08Z

https://huggingface.co/hf-internal-testing/llama-tokenizer/discussions/3

Goes along with

#23400

ydshieh · 2023-05-16T16:06:04Z

Confirmed it works!

ydshieh mentioned this issue May 16, 2023

[Llama Tokenizer] Fast llama template #22959

Merged

Narsil mentioned this issue May 16, 2023

small fix to remove unused eos in processor when it's not used. #23408

Merged


Narsil closed this as completed in #23408 May 23, 2023
