[Bug]: Number of available GPU blocks drop significantly for Phi3-vision #6124
Comments
@ywang96 Can you share some insight? Does it have something to do with the recent changes in VLM support?
There used to be a bug in the model's memory profiling where it didn't actually pass in images. This underestimate of memory usage could cause OOM during inference. After the fix, the available block count is significantly lower, which better reflects the true memory usage of the model. Re: your problem, this is expected, as the model has a 128k context length. If it can't fit in your GPU, try reducing the context length via
Thanks for the explanation, @DarkLight1337!
Just for future reference: the bug was discovered and fixed in #5888 and #5214. We have also updated examples/phi3v_example.py. The current profiling strategy is rather conservative, but improving it is definitely part of the next milestone!
@ywang96 I get the same error using with Is there some way to fix it?
As stated in the error message, you may have to decrease
@DarkLight1337 Thanks, decreasing max_model_len solved the problem!
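For readers hitting the same error, the relationship between max_model_len and the number of available GPU blocks can be sketched with some simple arithmetic. This is only an illustration, not vLLM's actual profiling or validation code: the block size of 16 tokens, the reading of "128k" as 131072 tokens, the figure of 2000 available blocks, and the helper names are all assumptions made for the example.

```python
import math

# Assumed values for illustration only (not taken from this issue's logs).
ASSUMED_BLOCK_SIZE = 16          # tokens per KV-cache block
ASSUMED_CONTEXT_128K = 131072    # "128k" read as 128 * 1024 tokens


def blocks_for_context(max_model_len: int, block_size: int = ASSUMED_BLOCK_SIZE) -> int:
    """Blocks needed to hold one sequence of max_model_len tokens."""
    return math.ceil(max_model_len / block_size)


def largest_context_that_fits(num_gpu_blocks: int, block_size: int = ASSUMED_BLOCK_SIZE) -> int:
    """Longest single sequence (in tokens) the given number of blocks can hold."""
    return num_gpu_blocks * block_size


def fits(max_model_len: int, num_gpu_blocks: int, block_size: int = ASSUMED_BLOCK_SIZE) -> bool:
    """True if one full-length sequence fits in the available blocks."""
    return blocks_for_context(max_model_len, block_size) <= num_gpu_blocks


if __name__ == "__main__":
    hypothetical_blocks = 2000  # made-up block count reported after profiling
    print(blocks_for_context(ASSUMED_CONTEXT_128K))        # blocks a 128k sequence needs
    print(fits(ASSUMED_CONTEXT_128K, hypothetical_blocks))  # does the full context fit?
    print(largest_context_that_fits(hypothetical_blocks))   # upper bound for max_model_len
```

Under these assumed numbers, a full 128k-token sequence needs more blocks than are available, so lowering max_model_len to at most the capacity returned by the last helper is what makes the configuration fit.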
Your current environment
Two Docker containers based on images built from vLLM source at commits 3de6e6a and 3f3b6b2
🐛 Describe the bug
I passed the same model Phi-3-vision-128k-instruct to each docker container:
For the version that needs VLMConfig, here are the parameters:
With the container based on 3de6e6a (the more recent commit), it raises an error:
But with the container based on 3f3b6b2: