Add support for the Gemma 2 model #84
Conversation
@guoqingbao the model runs but seems to give garbage output. Do you see anything which stands out?
Perhaps there is a need for `attn_logit_softcapping` if `final_logit_softcapping` is used.
I found that the recent update #83 overwrote previous bug fixes, including #80. @EricLBuehler
I have submitted PR #86 to resolve the issue: both `attn_logit_softcapping` and `final_logit_softcapping` are necessary for Gemma 2 inference. The corresponding PA kernel has also been revised. I'm curious why this isn't supported in vLLM; Google only mentioned that softcapping is beneficial for training.
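For readers following along, logit softcapping is a tanh-based squashing of logits into a bounded range. The sketch below is a minimal illustration of the general technique, not the code from PR #86; the function name `soft_cap` and the cap values shown are assumptions for illustration (Gemma 2's config reportedly uses caps of 50.0 for attention logits and 30.0 for final logits, but check the model config to confirm):

```python
import math

def soft_cap(logit: float, cap: float) -> float:
    # Squash a logit into (-cap, cap) via tanh. For |logit| << cap the
    # mapping is nearly identity; for large |logit| it saturates at ±cap.
    return cap * math.tanh(logit / cap)

# Illustrative usage with assumed Gemma 2-style caps:
attn_cap = 50.0   # applied to attention scores before softmax
final_cap = 30.0  # applied to the final logits before sampling

capped_attn = soft_cap(5.0, attn_cap)     # small logit, barely changed
capped_final = soft_cap(1000.0, final_cap)  # huge logit, saturates near 30.0
```

Skipping either cap at inference time means the model sees a different effective logit distribution than it was trained with, which is consistent with the garbage output reported above.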
@guoqingbao sounds good, I'll close this PR!
Refs #79