
[BUG] Flashinfer 0.0.3 compat with Sglang #283

Closed
Qubitium opened this issue Mar 12, 2024 · 4 comments · Fixed by #282

Comments

@Qubitium
Contributor

Qubitium commented Mar 12, 2024

Using flashinfer 0.0.3 requires a one-line change (#282), but there is a compat issue: the same model runs fine on 0.0.2, while under 0.0.3 sglang throws the following error in an infinite loop:

Exception in ModelRpcClient:
Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.11/site-packages/sglang/srt/managers/router/model_rpc.py", line 184, in exposed_step
    self.forward_step()
  File "/root/miniconda3/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/sglang/srt/managers/router/model_rpc.py", line 211, in forward_step
    self.forward_decode_batch(self.running_batch)
  File "/root/miniconda3/lib/python3.11/site-packages/sglang/srt/managers/router/model_rpc.py", line 505, in forward_decode_batch
    next_token_ids, _ = batch.sample(logits)
                        ^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/sglang/srt/managers/router/infer_batch.py", line 476, in sample
    sampled_index = torch.multinomial(probs_sort, num_samples=1)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
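For reference, the error itself is torch's own input validation: torch.multinomial raises exactly this RuntimeError when its probability tensor contains inf, nan, or negative entries. A minimal standalone repro of that check (illustrative only, not sglang code):

import torch

# torch.multinomial validates its input and raises the RuntimeError above
# when the probability tensor contains inf, nan, or negative values.
probs = torch.tensor([0.5, float("nan"), 0.3])

if torch.isfinite(probs).all() and (probs >= 0).all():
    sampled_index = torch.multinomial(probs, num_samples=1)
else:
    print("invalid probability tensor:", probs)

So the nans are presumably being produced upstream of the sampler, and the sample() call is just where they surface.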

I am unsure whether this is a compat issue on the sglang side or in flashinfer 0.0.3.

@merrymercy @yzh119

@hnyls2002
Collaborator

hnyls2002 commented Mar 12, 2024

@Qubitium Thanks for the report. I am not sure where this nan bug is happening; I will look into it soon.

@Qubitium
Contributor Author

@hnyls2002 Thanks. For more context, the model is a llama-style 6B (yi-6b, to be exact). By "infinite loop" I mean the model never stops calling forward despite the error; the error just repeats forever. So a second bug appears to be that the RuntimeError is not properly handled inside the runtime server.
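A hypothetical sketch of what such handling could look like (forward_decode_batch is the real method from the traceback above; the wrapper and the abort_batch callable are illustrative stand-ins, not sglang's actual API):

import logging

logger = logging.getLogger("model_rpc")

def forward_step_safe(batch, forward_decode_batch, abort_batch):
    # Hypothetical wrapper: catch the sampling failure once and drop the
    # affected batch, instead of letting the scheduler re-enter the step
    # forever with the same bad state.
    try:
        forward_decode_batch(batch)
    except RuntimeError as exc:
        # e.g. "probability tensor contains either `inf`, `nan` or element < 0"
        logger.error("decode step failed, aborting batch: %s", exc)
        abort_batch(batch)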

@hnyls2002
Collaborator

@Qubitium Hi, I have just tested the latest main branch together with this new PR (flashinfer-ai/flashinfer#177), and I believe this bug has been fixed by @yzh119. Earlier today, not only were there nan problems with infinite loops, but the decoding results were also a mess. Now everything seems to be working well.

Could you please try it and confirm that the bug is fixed? (The latest main branch has not been released as wheels yet, so building the package from source will probably take a while.)

hnyls2002 linked a pull request Mar 12, 2024 that will close this issue
@Qubitium
Contributor Author

@hnyls2002 @yzh119's fix looks good on my end too. All errors resolved.
