
[BUG] Flashinfer 0.0.3 compat with Sglang #283

Closed
Qubitium opened this issue Mar 12, 2024 · 4 comments · Fixed by #282

Comments

@Qubitium
Contributor

Qubitium commented Mar 12, 2024

Using flashinfer 0.0.3 requires a one-line change (#282), but there is a compat issue: the same model runs fine on 0.0.2, while under 0.0.3 sglang throws the following error in an infinite loop:

Exception in ModelRpcClient:
Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.11/site-packages/sglang/srt/managers/router/model_rpc.py", line 184, in exposed_step
    self.forward_step()
  File "/root/miniconda3/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/sglang/srt/managers/router/model_rpc.py", line 211, in forward_step
    self.forward_decode_batch(self.running_batch)
  File "/root/miniconda3/lib/python3.11/site-packages/sglang/srt/managers/router/model_rpc.py", line 505, in forward_decode_batch
    next_token_ids, _ = batch.sample(logits)
                        ^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/sglang/srt/managers/router/infer_batch.py", line 476, in sample
    sampled_index = torch.multinomial(probs_sort, num_samples=1)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
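For reference, the error itself is torch's own input validation: torch.multinomial raises exactly this RuntimeError when its probability tensor contains inf, nan, or negative entries. A minimal standalone repro of that check (illustrative only, not sglang code):

import torch

# torch.multinomial validates its input and raises the RuntimeError above
# when the probability tensor contains inf, nan, or negative values.
probs = torch.tensor([0.5, float("nan"), 0.3])

if torch.isfinite(probs).all() and (probs >= 0).all():
    sampled_index = torch.multinomial(probs, num_samples=1)
else:
    print("invalid probability tensor:", probs)

So the nans are presumably being produced upstream of the sampler, and the sample() call is just where they surface.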

I am unsure whether this is a compat issue on the sglang side or in flashinfer 0.0.3.

@merrymercy @yzh119

@hnyls2002
Collaborator

hnyls2002 commented Mar 12, 2024

@Qubitium Thanks for the report. I am not sure where this nan bug is happening; I will look into it soon.

@Qubitium
Contributor Author

@hnyls2002 Thanks. For more context, the model is a llama-style 6B (yi-6b, to be exact). By "infinite loop" I mean the model never stops calling forward despite the error; the error just repeats forever. So a second bug appears to be that the RuntimeError is not properly handled inside the runtime server.
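A hypothetical sketch of what such handling could look like (forward_decode_batch is the real method from the traceback above; the wrapper and the abort_batch callable are illustrative stand-ins, not sglang's actual API):

import logging

logger = logging.getLogger("model_rpc")

def forward_step_safe(batch, forward_decode_batch, abort_batch):
    # Hypothetical wrapper: catch the sampling failure once and drop the
    # affected batch, instead of letting the scheduler re-enter the step
    # forever with the same bad state.
    try:
        forward_decode_batch(batch)
    except RuntimeError as exc:
        # e.g. "probability tensor contains either `inf`, `nan` or element < 0"
        logger.error("decode step failed, aborting batch: %s", exc)
        abort_batch(batch)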

@hnyls2002
Collaborator

@Qubitium Hi, I have just tested the latest main branch together with this new PR (flashinfer-ai/flashinfer#177), and I believe this bug has been fixed by @yzh119. Earlier today, not only were there nan problems with infinite loops, but the decoding results were also a mess. Now everything seems to be working well.

Could you please try it and confirm that the bug is fixed? (The latest main branch has not been released as wheels yet, so building the package from source will probably take a while.)

hnyls2002 linked a pull request Mar 12, 2024 that will close this issue
@Qubitium
Contributor Author

@hnyls2002 @yzh119's fix looks good on my end too. All errors resolved.
