fix(vllm): do not set videos if we don't have any #3885
Merged
Description
This pull request to `backend/python/vllm/backend.py` includes changes that improve the handling of multi-modal data and the error management of the `load_image` method.

Improvements to multi-modal data handling:
`async def _predict(self, request, context, streaming=False)`: refactored the handling of `multi_modal_data` to use a dictionary that is conditionally populated with `image` and `video` data, simplifying the code and making it more readable.
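Conceptually, the new handling inside `_predict` looks roughly like the sketch below; the request field names (`request.Images`, `request.Videos`) and the way the prompt dict is assembled are assumptions for illustration, not the exact code from the diff.

```python
# Sketch: build multi_modal_data incrementally and only add keys for
# modalities that are actually present, so a "video" entry never reaches
# vLLM when no video was supplied.
multi_modal_data = {}

if request.Images:  # assumed request field name
    image = self.load_image(request.Images[0])
    if image is not None:
        multi_modal_data["image"] = image

if request.Videos:  # assumed request field name
    video = self.load_video(request.Videos[0])
    if video is not None:
        multi_modal_data["video"] = video

# Attach multi-modal data to the prompt only when the dict is non-empty.
inputs = {"prompt": prompt}
if multi_modal_data:
    inputs["multi_modal_data"] = multi_modal_data
```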
Error management improvements:

`def load_image(self, image_path: str)`: modified the method to return `None` instead of calling `load_video` when an error occurs, making the method's behavior more predictable and consistent.
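A minimal sketch of the revised error path, assuming a base64-encoded image payload and PIL for decoding (the actual loading logic in `backend.py` may differ):

```python
import base64
from io import BytesIO

from PIL import Image


def load_image(self, image_path: str):
    """Decode an image payload; return None if it cannot be loaded as an image."""
    try:
        return Image.open(BytesIO(base64.b64decode(image_path)))
    except Exception:
        # Previously this except branch fell back to load_video(), which could
        # make vLLM believe a video was present; returning None keeps image
        # and video handling independent.
        return None
```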
Notes for Reviewers

This fixes a bug where a model that does not support both video and image understanding would fail. vLLM is sensible here: if a "video" key is present in the arguments, it checks whether the model's capabilities include video understanding and fails otherwise.
Signed commits