[Model] Add multi-image support for minicpmv offline inference #7122

Merged (6 commits) on Aug 5, 2024

HwwwwwwwH
Contributor

Since this needs only a few changes outside minicpmv.py, I added multi-image support for it first. I'm still running more tests on it.

You can use multi-image inputs as follows:

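# Assumes `tokenizer`, `llm`, and `sampling_params` have already been created
# for a MiniCPM-V checkpoint, and that `image1` / `image2` are loaded images.
messages = [{
    "role": "user",
    # One "(<image>./</image>)" placeholder per image passed below.
    "content": "(<image>./</image>)\n"
               "(<image>./</image>)\n"
               "What's the contents of these two images?"
}]
prompt = tokenizer.apply_chat_template(messages,
                                       tokenize=False,
                                       add_generation_prompt=True)

# The images are passed as a list under the "image" key.
outputs = llm.generate({
    "prompt": prompt,
    "multi_modal_data": {
        "image": [image1, image2]
    }
}, sampling_params=sampling_params)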
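
The example above pairs one `(<image>./</image>)` placeholder with each image in the list. A minimal sketch of a helper that builds the message content for any number of images follows. The name `build_multi_image_content` is hypothetical and not part of vLLM or MiniCPM-V, and the sketch assumes the placeholder string is exactly the one shown above:

```python
# Placeholder string copied from the example above.
IMAGE_PLACEHOLDER = "(<image>./</image>)"


def build_multi_image_content(num_images: int, question: str) -> str:
    """Return message content: one placeholder line per image, then the question.

    Hypothetical helper for illustration only.
    """
    if num_images < 1:
        raise ValueError("num_images must be at least 1")
    placeholders = [IMAGE_PLACEHOLDER] * num_images
    return "\n".join(placeholders + [question])


content = build_multi_image_content(
    2, "What's the contents of these two images?")
messages = [{"role": "user", "content": content}]
```

The resulting `messages` can then go through `tokenizer.apply_chat_template` as in the original snippet, with the same number of images in `multi_modal_data`.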

github-actions bot commented Aug 4, 2024

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which consists of a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of the default ones by unblocking the steps in your fast-check build on the Buildkite UI.

Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge).

To run full CI, you can do one of these:

  • Comment /ready on the PR
  • Add ready label to the PR
  • Enable auto-merge.

🚀

@HwwwwwwwH
Contributor Author

@DarkLight1337 Do you have time to review this? Please check whether the modifications are fine for vLLM.

Member

@DarkLight1337 DarkLight1337 left a comment

Some comments to clean up the code.

Three review threads on vllm/model_executor/models/minicpmv.py (outdated, resolved)
HwwwwwwwH and others added 2 commits August 4, 2024 18:36
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
@DarkLight1337
Member

We can merge this after you have added tests that check the model's behaviour for multi-image input.

@HwwwwwwwH
Contributor Author

/ready

@github-actions github-actions bot added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) on Aug 4, 2024
@DarkLight1337
Member

DarkLight1337 commented Aug 4, 2024

Can you move NestedInputs outside of the test functions, since they are duplicated?

Member

@DarkLight1337 DarkLight1337 left a comment

There seem to be slight differences compared to the HF version, but overall the results are reasonable.

@DarkLight1337 DarkLight1337 merged commit 7b86e7c into vllm-project:main Aug 5, 2024
66 checks passed
@ywang96
Member

ywang96 commented Aug 5, 2024

Going to make a change on the title to indicate this only works for offline inference. Thank you for the PR!

@ywang96 ywang96 changed the title from "[Model] Add multi-image support for minicpmv" to "[Model] Add multi-image support for minicpmv offline inference" on Aug 5, 2024
@HwwwwwwwH HwwwwwwwH deleted the multi-images branch August 5, 2024 04:43
@cjfcsjt

cjfcsjt commented Aug 8, 2024

Thanks for implementing this! Does this support interleaved text-image reasoning?

@HwwwwwwwH
Contributor Author

> Thanks for implementing this! Does this support interleaved text-image reasoning?

Of course! Just skip the chat template and construct the interleaved prompt yourself.
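
A rough sketch of what building an interleaved prompt by hand could look like. It assumes the same `(<image>./</image>)` placeholder used in the PR description. `ImageSlot` and `build_interleaved_prompt` are hypothetical names, and the model-specific role markers that the chat template would normally add are deliberately left out because they are not given in this thread:

```python
from dataclasses import dataclass
from typing import Any, List, Tuple, Union

# Placeholder string copied from the PR description.
IMAGE_PLACEHOLDER = "(<image>./</image>)"


@dataclass
class ImageSlot:
    """Marks the position of an image in the prompt (hypothetical wrapper)."""
    image: Any  # in real use this would be a loaded image object


def build_interleaved_prompt(
        parts: List[Union[str, ImageSlot]]) -> Tuple[str, List[Any]]:
    """Join text and image parts in order; return the prompt body and images."""
    chunks: List[str] = []
    images: List[Any] = []
    for part in parts:
        if isinstance(part, ImageSlot):
            # Each image contributes one placeholder at its position.
            chunks.append(IMAGE_PLACEHOLDER)
            images.append(part.image)
        else:
            chunks.append(part)
    return "\n".join(chunks), images


# Stand-in strings are used instead of real images to keep this runnable.
prompt_body, images = build_interleaved_prompt([
    "Here is the first chart:",
    ImageSlot("img_a"),
    "And here is the second one:",
    ImageSlot("img_b"),
    "Which chart shows the larger increase?",
])
```

The prompt body would still need the model's own role markers around it before being passed to `llm.generate` together with `{"image": images}` under `multi_modal_data`.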

sfc-gh-mkeralapura pushed a commit to sfc-gh-mkeralapura/vllm that referenced this pull request Aug 12, 2024
Co-authored-by: hezhihui <hzh7269@modelbest.cn>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
kylesayrs pushed a commit to neuralmagic/vllm that referenced this pull request Aug 17, 2024
Co-authored-by: hezhihui <hzh7269@modelbest.cn>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
@seanzhang-zhichen

Is video inference supported with the vLLM OpenAI-compatible server? If so, could you share an example?

@DarkLight1337
Member

> Is video inference supported with the vLLM OpenAI-compatible server? If so, could you share an example?

Not supported yet. It will be addressed in another PR.

@DarkLight1337
Member

DarkLight1337 commented Sep 11, 2024

For now you can pass in a video via multi-image input. We already have an example of this in the docs about VLMs.
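
As an illustration of the idea only (this is not the docs example mentioned above), a video could be reduced to a handful of frames that are then passed as a multi-image input. The sketch below covers just the frame selection step. `sample_frame_indices` is a hypothetical helper, even spacing is an assumed sampling strategy, and decoding the frames into images is left out:

```python
from typing import List


def sample_frame_indices(total_frames: int, num_samples: int) -> List[int]:
    """Pick up to `num_samples` evenly spaced frame indices (hypothetical helper)."""
    if total_frames < 1 or num_samples < 1:
        raise ValueError("total_frames and num_samples must be at least 1")
    # Never ask for more frames than the video has.
    num_samples = min(num_samples, total_frames)
    step = total_frames / num_samples
    return [int(i * step) for i in range(num_samples)]


indices = sample_frame_indices(total_frames=100, num_samples=4)

# One placeholder per sampled frame, as in the multi-image example.
placeholders = "\n".join(["(<image>./</image>)"] * len(indices))
```

The decoded frames at `indices` would then go into `multi_modal_data["image"]` as a list, in the same way as `image1` and `image2` in the PR description.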

Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024
Co-authored-by: hezhihui <hzh7269@modelbest.cn>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Signed-off-by: Alvant <alvasian@yandex.ru>
KuntaiDu pushed a commit to KuntaiDu/vllm that referenced this pull request Nov 20, 2024
Co-authored-by: hezhihui <hzh7269@modelbest.cn>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Labels
ready ONLY add when PR is ready to merge/full CI is needed