Is there a way to pass arguments to a backend? (vLLM specifically) #4313
Jordanb716 asked this question in Q&A (unanswered)
I'm trying to run a model through vLLM and getting:
```
err=ValueError('Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your NVIDIA P102-100 GPU has compute capability 6.1. You can use float16 instead by explicitly setting the `dtype` flag in CLI, for example: --dtype=half.')
```
But I can't for the life of me figure out how to pass that flag to vLLM. Is there something I could add to the model config file, an environment variable, or something like that? I'm running v2.23.0-cublas-cuda12-ffmpeg through Kubernetes.
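For reference, vLLM itself accepts this directly: `--dtype=half` on its own CLI, or `dtype="half"` as an engine argument in its Python API. What I'm missing is how to get LocalAI to forward it. Something like the following model config is what I was hoping would work; this is only a guess, since the `name`, `backend`, and `parameters.model` fields are standard but I haven't found a documented `dtype` field for the vLLM backend in this version:

```yaml
# models/my-model.yaml -- hypothetical filename
name: my-model
backend: vllm
parameters:
  # Model to load; substitute your own.
  model: "facebook/opt-125m"
# Assumed field: would need to be forwarded to vLLM
# as --dtype=half. Not confirmed against the v2.23.0
# config schema.
dtype: "half"
```

If there's no such field, a pointer to whatever mechanism does exist for passing backend-specific options (config key, env variable, or otherwise) would be just as helpful.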