[CI/Build] Update CPU tests to include all "standard" tests #5481
Conversation
@DarkLight1337 I noticed that running the LLaVA test takes much longer now. I'm afraid that this will significantly slow down the test suite. What do you think about this?
Is this specific to the LLaVA model, or does this also occur for the other models? If it's the latter, then I think this change would have a relatively small impact compared to the baseline.
Hmm, this may be because the vision tower and multi-modal projector have not been optimized for vLLM yet. Let's wait for that to be implemented as described in #4194.
@DarkLight1337 @Isotr0py Hi, thanks for looking into this. Initially, when enabling the CPU CI, I found some issues with the LLaVA CPU backend, so I disabled that part at first.
Hmm... maybe
@DarkLight1337 Hi, just did a quick check locally: with the latest code, the LLaVA test will fail due to a result mismatch between float16 and bfloat16. The diff below can help to fix this issue.
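As a rough sketch of that kind of dtype override — the helper name and the device flag are hypothetical, not vLLM's actual test code:

```python
def resolve_test_dtype(requested: str, on_cpu: bool) -> str:
    """Hypothetical helper: CPU backends often lack fast/accurate float16
    paths, so fall back to bfloat16 there to avoid result mismatches
    against the reference outputs."""
    if on_cpu and requested == "float16":
        return "bfloat16"
    return requested

# On CPU, a float16 request is downgraded to bfloat16:
print(resolve_test_dtype("float16", on_cpu=True))   # bfloat16
print(resolve_test_dtype("float16", on_cpu=False))  # float16
```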
According to the CI log, it currently takes 5-10 seconds for each LLaVA-1.5 iteration and 20-40 seconds for each LLaVA-NeXT iteration. This is much longer than the other models, which take less than 2 seconds (you can verify this by searching for the s/it string output by tqdm).
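As an illustrative one-off script (the log lines below are just tqdm's usual progress-bar format, assumed here), those per-iteration timings can be pulled out of a CI log with a regex:

```python
import re

def extract_iteration_times(log_text: str) -> list[float]:
    """Collect tqdm per-iteration timings such as '5.32s/it' from a CI log."""
    return [float(m) for m in re.findall(r"(\d+(?:\.\d+)?)s/it", log_text)]

sample = (
    "100%|####| 8/8 [00:42<00:00,  5.32s/it]\n"
    "100%|####| 4/4 [00:07<00:00,  1.85s/it]\n"
)
print(extract_iteration_times(sample))  # [5.32, 1.85]
```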
#5591 has been merged. Let's see the performance now...
Getting this error:
Error in calling custom op gelu_quick: '_OpNamespace' '_C' object has no attribute 'gelu_quick'
Does the CPU test not recompile vLLM? @WoosukKwon
Hmm... gelu_quick was actually added in #5591 as well, though I'm not sure how to add that to be compatible with CPU. Edit: I see
Hi Roger,
The patch in #5591 adds the CUDA kernel only; it should be OK to add the CPU-related kernel under csrc/cpu.
We could also help to do this if required.
CC @bigPYJ1151
Thanks, -yuan
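For context, gelu_quick is the "quick GELU" approximation, x * sigmoid(1.702 * x). Below is a pure-Python scalar reference of the math a csrc/cpu port would have to reproduce — the real kernel is vectorized C++ applied elementwise over tensors, so this is only a sketch of the formula:

```python
import math

def gelu_quick(x: float) -> float:
    """Scalar reference for the quick-GELU activation: x * sigmoid(1.702 * x)."""
    return x * (1.0 / (1.0 + math.exp(-1.702 * x)))

print(gelu_quick(0.0))  # 0.0
```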
@zhouyuan Yeah - I already made a PR #5717 and it's just waiting for review now.
There is no observable speed increase so far. Perhaps the multi-modal projector also has to be optimized?
Hmm... yeah - the other place to optimize is
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
The test duration has gone up from 18 minutes to 30 minutes. Given that we currently merge 10-20 PRs per day, and assuming the AWS CI is triggered 3x per commit (the minimum is 2x - once pre-merge and once post-merge - but it's unlikely that the CI passes on the first try), this adds up quickly. @bigPYJ1151 do you know whether it's possible to increase the number of agents to 2? Otherwise, I'll prune some tests from this PR.
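The back-of-the-envelope load behind that concern, taking the 3x-per-commit trigger rate as the assumption stated above:

```python
test_minutes = 30            # CPU test duration after this PR (was 18)
runs_per_pr = 3              # assumed: pre-merge + post-merge + one retry
prs_per_day = (10 + 20) / 2  # midpoint of the quoted 10-20 merged PRs/day

agent_hours = test_minutes * runs_per_pr * prs_per_day / 60
print(agent_hours)  # 22.5 hours of CPU-test time per day on a single agent
```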
@DarkLight1337 Looks like some audio language model tests require chunked prefill; I will open a PR for it soon.
After removing the tests for unsupported models (involving embedding and chunked prefill), the test duration is down to 26 minutes, which should be OK for now.
I have added
@Isotr0py PTAL and see if this looks ok to you as well.
LGTM!
This change should help catch issues related to VLMs that are specific to CPU (e.g. #5451, #7735, #8061).
Edit: Updated the list of related issues in light of recent PRs.