[BUG] Flashinfer 0.0.3 compat with Sglang #283
Comments
@Qubitium Thanks for reminding us, I am not sure where is this
@hnyls2002 Thanks. For more context, the model is a Llama-architecture 6B model (yi-6b, to be exact). By "infinite loop" I mean that the model never stops calling forward despite the error, and the same error just repeats. So there appears to be a second bug: the RuntimeError is not properly handled inside the runtime server.
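To make the second bug concrete, here is a minimal sketch of the behavior one would expect instead: a loop that stops on the first RuntimeError rather than calling forward again. All names here (`run_forward_loop`, `forward_step`, `max_steps`) are hypothetical and for illustration only; this is not sglang's actual runtime server code.

```python
def run_forward_loop(forward_step, max_steps=1000):
    """Call forward_step() repeatedly; stop on the first RuntimeError.

    forward_step is a hypothetical callable that returns True while there
    is more work to do and False when generation has finished.
    """
    steps = 0
    while steps < max_steps:
        try:
            keep_going = forward_step()
        except RuntimeError as exc:
            # Report the failure once and stop, instead of retrying forever.
            return {"status": "failed", "steps": steps, "error": str(exc)}
        steps += 1
        if not keep_going:
            return {"status": "finished", "steps": steps, "error": None}
    # Safety cap so that a misbehaving step cannot spin indefinitely.
    return {"status": "max_steps", "steps": steps, "error": None}


def always_fails():
    # Stand-in for a forward pass that raises on every call.
    raise RuntimeError("kernel error")


result = run_forward_loop(always_fails)
print(result["status"], result["steps"])
```

With a step that always raises, the loop returns after a single attempt and carries the error message back to the caller, which is the opposite of the repeating error described above.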
@Qubitium Hi, I have just tested the latest main branch with this new PR (flashinfer-ai/flashinfer#177), and I believe this bug has been fixed by @yzh119. Today earlier, not only there will be Could you please try it and confirm that the bug is fixed? (The latest main branch has not been built into wheels yet, and building the package from source will probably take a long time.)
@hnyls2002 @yzh119's fix looks good on my end too. All errors resolved.
Using flashinfer 0.0.3 requires a one-line change (#282), but there is a compatibility issue: the same model runs fine on 0.0.2, while under 0.0.3 sglang throws the following error in an infinite loop:
I am unsure whether this is a compatibility issue caused by sglang or by flashinfer 0.0.3.
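When comparing behavior between 0.0.2 and 0.0.3, it can help to confirm which versions are actually installed in the environment. The sketch below uses only the standard library; the distribution names `flashinfer` and `sglang` are assumptions and may differ depending on how the packages were installed, and `installed_version` is a hypothetical helper.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string for dist_name, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution names are assumed; adjust them to match your install.
for name in ("flashinfer", "sglang"):
    print(name, installed_version(name))
```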
@merrymercy @yzh119