
[Bug] requires "python-multipart" to be installed with docker image #949

Closed
kerthcet opened this issue Aug 6, 2024 · 18 comments

@kerthcet

kerthcet commented Aug 6, 2024

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

I tested sglang on Kubernetes with the minimal configuration below:

  containers:
  - args:
    - --model-path
    - /workspace/models/models--facebook--opt-125m
    - --served-model-name
    - opt-125m
    - --host
    - 0.0.0.0
    - --port
    - "8080"
    command:
    - python3
    - -m
    - sglang.launch_server
    image: lmsysorg/sglang:v0.2.9-cu121

However, it emits an error like:

Form data requires "python-multipart" to be installed.
You can install "python-multipart" with:

pip install python-multipart

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/sgl-workspace/sglang/python/sglang/launch_server.py", line 5, in <module>
    from sglang.srt.server import launch_server
  File "/sgl-workspace/sglang/python/sglang/srt/server.py", line 162, in <module>
    async def openai_v1_files(file: UploadFile = File(...), purpose: str = Form("batch")):
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 944, in decorator
    self.add_api_route(
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 883, in add_api_route
    route = route_class(
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 519, in __init__
    self.body_field = get_body_field(dependant=self.dependant, name=self.unique_id)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/dependencies/utils.py", line 817, in get_body_field
    check_file_field(final_field)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/dependencies/utils.py", line 100, in check_file_field
    raise RuntimeError(multipart_not_installed_error) from None
RuntimeError: Form data requires "python-multipart" to be installed.
You can install "python-multipart" with:

pip install python-multipart

Based on my understanding, this dependency should be included in the Docker image. Is there anything I missed here?
Thanks!
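
As a stop-gap on the older tag, a quick way to confirm from inside the container whether the dependency is present (a minimal sketch; it checks the python-multipart distribution by name before launching sglang.launch_server):

# Minimal check, run inside the container, for the optional FastAPI
# form-data dependency that the openai_v1_files endpoint in sglang.srt.server needs.
from importlib import metadata

try:
    print("python-multipart", metadata.version("python-multipart"))
except metadata.PackageNotFoundError:
    raise SystemExit('Missing dependency: run "pip install python-multipart" in the image.')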

Reproduction

The relevant part of the YAML manifest is shown above.

Environment

Running with the Docker image `lmsysorg/sglang:v0.2.9-cu121`.
@zhyncs
Member

zhyncs commented Aug 6, 2024

Please use lmsysorg/sglang:latest.

@zhyncs zhyncs closed this as completed Aug 6, 2024
@zhyncs
Member

zhyncs commented Aug 6, 2024

fixed #895

@kerthcet
Author

kerthcet commented Aug 6, 2024

Thanks for the response. When will a new version be released?

@zhyncs
Member

zhyncs commented Aug 6, 2024

Hi @kerthcet, the new version has in fact already been released. See:
https://pypi.org/project/sglang/
https://hub.docker.com/r/lmsysorg/sglang/tags

@kerthcet
Author

kerthcet commented Aug 6, 2024

Sounds great, let me give v0.2.10-cu124 a try.

@kerthcet
Author

kerthcet commented Aug 6, 2024

I just hit another error with v0.2.10-cu124:

[gpu=0] Load weight end. type=Qwen2ForCausalLM, dtype=torch.bfloat16, avail mem=13.40 GB
[gpu=0] Memory pool end. avail mem=1.06 GB
[gpu=0] Capture cuda graph begin. This can take up to several minutes.
Initialization failed. controller_init_state: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/controller_single.py", line 150, in start_controller_process
    controller = ControllerSingle(
  File "/sgl-workspace/sglang/python/sglang/srt/managers/controller_single.py", line 84, in __init__
    self.tp_server = ModelTpServer(
  File "/sgl-workspace/sglang/python/sglang/srt/managers/tp_worker.py", line 92, in __init__
    self.model_runner = ModelRunner(
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 140, in __init__
    self.init_cuda_graphs()
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 341, in init_cuda_graphs
    self.cuda_graph_runner.capture(batch_size_list)
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 133, in capture
    ) = self.capture_one_batch_size(bs, forward)
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 196, in capture_one_batch_size
    run_once()
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 193, in run_once
    return forward(input_ids, input_metadata.positions, input_metadata)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/sgl-workspace/sglang/python/sglang/srt/models/qwen2.py", line 287, in forward
    hidden_states = self.model(input_ids, positions, input_metadata, input_embeds)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/sgl-workspace/sglang/python/sglang/srt/models/qwen2.py", line 255, in forward
    hidden_states, residual = layer(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/sgl-workspace/sglang/python/sglang/srt/models/qwen2.py", line 207, in forward
    hidden_states = self.self_attn(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/sgl-workspace/sglang/python/sglang/srt/models/qwen2.py", line 156, in forward
    attn_output = self.attn(q, k, v, input_metadata)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/sgl-workspace/sglang/python/sglang/srt/layers/radix_attention.py", line 182, in forward
    return self.decode_forward(q, k, v, input_metadata)
  File "/sgl-workspace/sglang/python/sglang/srt/layers/radix_attention.py", line 166, in decode_forward_flashinfer
    o = input_metadata.flashinfer_decode_wrapper.forward(
  File "/usr/local/lib/python3.8/dist-packages/flashinfer/decode.py", line 629, in forward
    out = self._wrapper.forward(
ValueError: FlashInfer Internal Error: Invalid configuration : num_frags_x=1 num_frags_y=4 num_frags_z=1 num_warps_x=1 num_warps_z=4 please create an issue (https://github.com/flashinfer-ai/flashinfer/issues) and report the issue to the developers.

@zhyncs zhyncs reopened this Aug 6, 2024
@zhyncs
Member

zhyncs commented Aug 6, 2024

Hi @kerthcet, thank you for reporting this issue; we will take a look. cc @yzh119

@kerthcet
Author

kerthcet commented Aug 6, 2024

Here is the configuration I used with v0.2.10-cu124:

containers:
  - args:
    - --model-path
    - /workspace/models/models--Qwen--Qwen2-0.5B-Instruct
    - --served-model-name
    - qwen2-05b
    - --host
    - 0.0.0.0
    - --port
    - "8080"
    command:
    - python3
    - -m
    - sglang.launch_server

@zhyncs
Member

zhyncs commented Aug 6, 2024

By the way, why did you start with cu121 and then switch to cu124 later?

@kerthcet
Author

kerthcet commented Aug 6, 2024

Well, I don't care much about the CUDA version; I just want to get it running on Kubernetes. Does that matter?

@zhyncs
Member

zhyncs commented Aug 6, 2024

If the CUDA version of your host machine is 12.1, then I still recommend you use the cu121 image.
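
A quick way to double-check the pairing (a minimal sketch; run inside the container, and assumes the image's bundled PyTorch):

# Print the CUDA runtime that the bundled PyTorch wheel was built against, so it
# can be matched to the image tag (cu121 vs cu124) and to the driver version that
# nvidia-smi reports on the host.
import torch

print("torch CUDA build:", torch.version.cuda)       # e.g. "12.1" for a cu121 image
print("CUDA available:", torch.cuda.is_available())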

@kerthcet
Author

kerthcet commented Aug 6, 2024

Makes sense, thanks for the input. I'll give it another try.

@yzh119
Collaborator

yzh119 commented Aug 7, 2024

@kerthcet what's your GPU architecture? I suppose you might be using sm_86, which has a smaller shared memory size. I'll fix FlashInfer's behavior for sm_86 in v0.1.4.
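
In case it helps, a minimal sketch (assumes PyTorch with CUDA access, e.g. run inside the sglang container on the node) to report the compute capability:

# Print the GPU name and compute capability; a T4 reports sm_75 and an A10 sm_86,
# which determines the shared memory per SM available to FlashInfer's kernels.
import torch

props = torch.cuda.get_device_properties(0)
print(f"{props.name}: sm_{props.major}{props.minor}")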

@yzh119
Collaborator

yzh119 commented Aug 7, 2024

I can't reproduce the error under sm_86's shared memory limit (96 KB/SM). It only happens when I limit the shared memory size to 64 KB/SM (Turing/sm_75's spec).

@kerthcet
Author

kerthcet commented Aug 7, 2024

Hi, I may be using a T4 or an A10; I can't remember. I can test later. Can you suggest which GPU architecture I should test with?

@yzh119
Collaborator

yzh119 commented Aug 9, 2024

T4 is sm_75, which is fairly old; I'll fix the behavior for it later. An A10 (sm_86) should work.

@kerthcet
Author

kerthcet commented Aug 16, 2024

Tested with an A10 and it works. I can test with a T4 once the fix for that arch lands. Thanks for the feedback.

@zhyncs
Member

zhyncs commented Aug 16, 2024

Thanks for verification!

@zhyncs zhyncs closed this as completed Aug 16, 2024