
Filter empty prompt in random bench serving #2011

Merged
merged 2 commits into sgl-project:main on Nov 12, 2024

Conversation

ispobock
Collaborator

Motivation

Fix issue:

 RuntimeWarning: divide by zero encountered in scalar floor_divide
  ratio = (input_lens[i] + prompt_len - 1) // prompt_len
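For context, the warning text ("scalar floor_divide") shows this is a NumPy operation: the benchmark's input lengths are NumPy integers, so when a dataset prompt tokenizes to zero tokens the floor division only warns and yields 0 instead of raising ZeroDivisionError. A minimal, illustrative repro (variable names here are for illustration, not the exact bench_serving code):

    import numpy as np

    # With a NumPy integer operand, dividing by a zero prompt length emits
    # "RuntimeWarning: divide by zero encountered in scalar floor_divide"
    # and returns 0 rather than raising.
    input_len = np.int64(128)
    prompt_len = 0  # prompt that tokenizes to zero tokens
    ratio = (input_len + prompt_len - 1) // prompt_len  # RuntimeWarning here
    print(ratio)  # 0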

Reproduce:

python3 -m sglang.bench_serving --backend sglang --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json --dataset-name random --random-input 128 --random-output 64 --num-prompts 3200 --request-rate 32 --random-range-ratio 1.0
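The fix is to drop dataset prompts that tokenize to zero tokens before computing the repetition ratio. A minimal sketch of the idea, with illustrative function and variable names rather than the exact diff:

    from typing import List, Tuple

    def build_random_prompts(prompts: List[str], tokenizer, input_lens) -> List[Tuple[str, int]]:
        # Hypothetical helper sketching the random-dataset loop: skip prompts
        # whose token length is zero so the floor division below never
        # divides by zero.
        requests = []
        i = 0
        for prompt in prompts:
            prompt_len = len(tokenizer.encode(prompt))
            if prompt_len == 0:
                continue  # filter empty prompts (the point of this PR)
            ratio = (input_lens[i] + prompt_len - 1) // prompt_len
            requests.append((prompt * ratio, int(input_lens[i])))
            i += 1
        return requests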

@ispobock ispobock requested review from merrymercy and zhyncs and removed request for merrymercy November 12, 2024 05:53
@ispobock
Collaborator Author

By the way, if an empty-prompt request is sent while other requests are decoding, the server fails in both the normal and overlap cases:

[2024-11-12 11:34:43 TP0] Traceback (most recent call last):
  File "/workdir/repos/sglang/python/sglang/srt/managers/scheduler.py", line 1210, in run_scheduler_process
    scheduler.event_loop_overlap()
  File "/workdir/tools/miniconda/miniconda3/envs/sgl-vl/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/workdir/repos/sglang/python/sglang/srt/managers/scheduler.py", line 368, in event_loop_overlap
    batch = self.get_next_batch_to_run()
  File "/workdir/repos/sglang/python/sglang/srt/managers/scheduler.py", line 615, in get_next_batch_to_run
    self.running_batch.merge_batch(self.last_batch)
  File "/workdir/repos/sglang/python/sglang/srt/managers/schedule_batch.py", line 959, in merge_batch
    raise e
  File "/workdir/repos/sglang/python/sglang/srt/managers/schedule_batch.py", line 956, in merge_batch
    self.output_ids = torch.concat([self.output_ids, other.output_ids])
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument tensors in method wrapper_CUDA_cat)
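The failure itself is a torch.concat device mismatch, presumably because the empty-prompt batch's output_ids never made it onto the GPU while the running batch's output_ids live on cuda:0. A standalone illustration of the same error class (not the scheduler code path; requires a CUDA device):

    import torch

    # torch.concat refuses to mix a CUDA tensor with a CPU tensor, which is
    # the situation merge_batch hits when one batch's output_ids stayed on CPU.
    if torch.cuda.is_available():
        gpu_ids = torch.tensor([11, 22, 33], device="cuda")  # e.g. running batch
        cpu_ids = torch.tensor([44])                          # e.g. batch left on CPU
        try:
            torch.concat([gpu_ids, cpu_ids])
        except RuntimeError as err:
            print(err)  # Expected all tensors to be on the same device ...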

cc: @merrymercy

@zhyncs zhyncs merged commit b808a38 into sgl-project:main Nov 12, 2024
11 of 13 checks passed