Llama 3 - Regression with apostrophes #7006
Comments
Quality really degraded in the last week, especially with the Llama 3 models. I have pulled so many changes I'm not sure when it happened, but I agree with you: I think going back a week or two will make these problems go away. I'm on Linux running on Nvidia.
I'm seeing something similar on Command R+, except with mixed/doubled apostrophes (e.g. "They'`re"). I assume it's the pre-tokenizer, given the "missing pre-tokenizer type, using: 'default'" warning in the server log, with the big bold "GENERATION QUALITY WILL BE DEGRADED! CONSIDER REGENERATING THE MODEL" below it. Trying to generate a new GGUF from the HF weights for command-r-plus has failed since #6920, but it might help with Llama 3.
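A quick way to see whether a given GGUF carries the pre-tokenizer metadata that warning complains about is to read its key-value fields. This is a minimal sketch, assuming the `gguf` Python package from llama.cpp's gguf-py is installed; the model path is a placeholder and the value-decoding detail is best-effort:

```python
# Sketch: check whether a GGUF file carries a pre-tokenizer type.
# Assumes the `gguf` package (llama.cpp gguf-py) is installed; path is a placeholder.
from gguf import GGUFReader

MODEL_PATH = "command-r-plus.Q4_K_M.gguf"  # placeholder path

reader = GGUFReader(MODEL_PATH)
field = reader.fields.get("tokenizer.ggml.pre")

if field is None:
    print("No tokenizer.ggml.pre metadata -> llama.cpp falls back to 'default'")
else:
    # For a string KV, field.data holds the index of the value bytes in field.parts.
    try:
        value = bytes(field.parts[field.data[0]]).decode("utf-8")
        print(f"Pre-tokenizer type: {value}")
    except Exception:
        print("tokenizer.ggml.pre is present (could not decode the value here)")
```

If the field is missing, regenerating the GGUF from the HF weights with an up-to-date conversion script is the fix the warning is pointing at.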
Kafkaesque.
I'm not a developer, but using LM Studio I have noticed the same thing after upgrading to the latest version, which includes an updated llama.cpp, with Q8 Llama 3 70B models on an M3 Max. There are plenty of apostrophe errors, ranging from a space inserted between the apostrophe and an "s" (example: "Mary' s glass of water" instead of "Mary's glass of water") to an omitted "s" in the same context (example: "Mary' glass of water" instead of "Mary's glass of water"). I have also noticed double apostrophes here and there. I haven't changed my prompts, model settings, or model files, and this didn't occur with prior versions of LM Studio that used an older llama.cpp with Llama 3 70B models. Hope that helps diagnose the issue.
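For anyone who wants to scan generated text for these artifacts automatically, here is a minimal sketch covering only the patterns mentioned in this thread ("Mary' s", "Mary' glass", "They'`re", doubled apostrophes); it is a heuristic and can false-positive on legitimate plural possessives like "the dogs' bowls":

```python
# Sketch: flag the apostrophe artifacts described in this thread in generated text.
import re

ARTIFACT_PATTERNS = [
    re.compile(r"\w' s\b"),       # space between apostrophe and "s": "Mary' s"
    re.compile(r"\w' (?!s\b)\w"),  # dropped "s" after apostrophe: "Mary' glass"
                                   # (also matches plural possessives, so treat as a hint)
    re.compile(r"'`|`'"),          # mixed apostrophe/backtick: "They'`re"
    re.compile(r"''"),             # doubled apostrophes
]

def find_apostrophe_artifacts(text: str) -> list[str]:
    """Return every substring matching one of the artifact patterns."""
    hits = []
    for pattern in ARTIFACT_PATTERNS:
        hits.extend(m.group(0) for m in pattern.finditer(text))
    return hits

print(find_apostrophe_artifacts("Mary' s glass and They'`re here."))
```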
Is this issue still present with the latest version of llama.cpp?
Just tested this in KCPP 1.65. Backwards compatibility is a selling point for us, so we want the tokenizer behavior to be correct on old models too. From what I can see this is solved; others can confirm.
This issue was closed because it has been inactive for 14 days since being marked as stale. |
I am seeing a lot of improperly formatted apostrophes. Some examples I have seen:
I don't remember ever seeing this issue until about a week ago, so I suspect it might be a recently introduced bug. It happens fairly consistently (but randomly).
To reproduce this, I just ask the assistant to generate lots of random story text, and it eventually hits the bug after about 1k tokens.
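A scripted version of that reproduction might look like the sketch below: it assumes the llama.cpp example server is running locally at http://localhost:8080, requests about 1k tokens of story text from the `/completion` endpoint, and scans the output for the artifacts reported here. The prompt and artifact regex are illustrative, not part of the original report.

```python
# Sketch: ask a locally running llama.cpp example server for ~1k tokens of story
# text and scan the output for suspicious apostrophes. Standard library only.
import json
import re
import urllib.request

SERVER_URL = "http://localhost:8080/completion"  # llama.cpp example server endpoint
PROMPT = "Write a long, rambling story about a village by the sea.\n"

payload = json.dumps({"prompt": PROMPT, "n_predict": 1024}).encode("utf-8")
request = urllib.request.Request(
    SERVER_URL, data=payload, headers={"Content-Type": "application/json"}
)

with urllib.request.urlopen(request) as response:
    content = json.loads(response.read())["content"]

# Rough check for the artifacts reported in this issue.
suspects = re.findall(r"\w' s\b|'`|''", content)
print(f"Generated {len(content)} chars, suspicious apostrophes: {suspects}")
```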
Specs
Example
Request:
Response: