Is there a way to pass arguments to a backend? (vLLM specifically) #4313
Jordanb716 asked this question in Q&A (unanswered)
I'm trying to run a model through vLLM and getting:
```
err=ValueError('Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your NVIDIA P102-100 GPU has compute capability 6.1. You can use float16 instead by explicitly setting the `dtype` flag in CLI, for example: --dtype=half.')
```
But I can't for the life of me figure out how to pass that flag to vLLM. Is there something I could add to the model config file, an environment variable, or something like that? I'm running v2.23.0-cublas-cuda12-ffmpeg through Kubernetes.
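For reference, vLLM itself accepts this directly: `--dtype=half` on its own CLI, or `dtype="half"` as an engine argument in its Python API. What I'm missing is how to get LocalAI to forward it. Something like the following model config is what I was hoping would work; this is only a guess, since the `name`, `backend`, and `parameters.model` fields are standard but I haven't found a documented `dtype` field for the vLLM backend in this version:

```yaml
# models/my-model.yaml -- hypothetical filename
name: my-model
backend: vllm
parameters:
  # Model to load; substitute your own.
  model: "facebook/opt-125m"
# Assumed field: would need to be forwarded to vLLM
# as --dtype=half. Not confirmed against the v2.23.0
# config schema.
dtype: "half"
```

If there's no such field, a pointer to whatever mechanism does exist for passing backend-specific options (config key, env variable, or otherwise) would be just as helpful.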