[Usage]: Bad Request with multiple multimodal inputs when using vision LLM. #8053
Closed
1 task done
Labels
usage
How to use vllm
Your current environment
How would you like to use vllm
I have tried InterVL and MimiCPM by requesting with multiple multimodal inputs, but both failed to response and it comes with bad request error. I have done some research and noticed some VLMs like phi-3 already support such inputs. #5820. Is this feature still under construction? or Did I miss anything?
ONLINE INFER EXAMPLE
Before submitting a new issue...
The text was updated successfully, but these errors were encountered: