KeyError: 'Cache only has 0 layers, attempted to access layer with index 0' #37
Comments
The latest transformers release supports this. This is a snippet from the release notes:
Cool, thanks!
Do you know if Mixtral is also supported?
Looks like it!
It would be nice if a fast inference engine like vLLM supported attention sinks. Do you have any plans for that?
I agree. I'm not very familiar with the world of fast inference engines like vLLM, TGI, etc., so it would be hard to justify the time investment. At this time, I don't have plans for that.
In single-turn QA testing, something strange happened in this Colab. When setting the
There is an unknown user in the output with duplicated content. Could this be a limitation of the model itself, or an incorrect usage of StreamingLLM?
The latest transformers has more serious issues. Any chance of updating this repo for 4.36.1+?
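For context on the error in the issue title: transformers 4.36 replaced the tuple-of-tuples `past_key_values` with Cache objects, and indexing a layer that has not been populated yet raises a `KeyError` with exactly this message. A minimal sketch of the pattern below uses a simplified stand-in class, not the real transformers `DynamicCache`, to show how the error arises when a caller indexes an empty cache:

```python
class DynamicCacheSketch:
    """Simplified stand-in for transformers' DynamicCache (illustration only)."""

    def __init__(self):
        # One entry per layer, appended as layers are filled during a forward pass.
        self.key_cache = []
        self.value_cache = []

    def __getitem__(self, layer_idx):
        if layer_idx < len(self.key_cache):
            return self.key_cache[layer_idx], self.value_cache[layer_idx]
        # Mirrors the message reported in this issue.
        raise KeyError(
            f"Cache only has {len(self.key_cache)} layers, "
            f"attempted to access layer with index {layer_idx}"
        )


cache = DynamicCacheSketch()
try:
    cache[0]  # indexing a freshly created (empty) cache
except KeyError as e:
    print(e)  # the quoted message in the issue title
```

Code written against the pre-4.36 tuple format (as this repo was) can hit this path when it tries to read layer 0 before the new-style cache has been filled, which is why pinning an older transformers or updating the repo for 4.36.1+ are the two ways out.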