[Bug Regression] segfault in turbomind for OpenGVLab/InternVL2-Llama3-76B and OpenGVLab/InternVL-Chat-V1-5 #2164
Comments
If I try the pytorch backend, I get this on startup:
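For reference, a minimal sketch of how the pytorch backend gets selected, assuming lmdeploy's pipeline API; the model path and tp value here are illustrative:

```python
# Minimal sketch, assuming lmdeploy's pipeline API; the model path and
# tp (tensor-parallel) value are illustrative.
from lmdeploy import pipeline, PytorchEngineConfig

pipe = pipeline(
    "OpenGVLab/InternVL2-Llama3-76B",
    backend_config=PytorchEngineConfig(tp=4),  # shard across 4 GPUs
)
print(pipe("Hello"))
```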
Same for the InternVL 1-5 model. My nvidia-smi output, in case it's helpful:
This is after the model is already loaded on GPUs 0-3. But some suggest CUDA 11.8 is fine for sm_90: https://discuss.pytorch.org/t/cuda-version-conundrum/185714/3
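A quick way to check what the installed PyTorch build actually supports for sm_90 (a sketch; the printed values are whatever your build reports):

```python
# Sketch: verify which CUDA version and GPU architectures this PyTorch
# build was compiled for, and what the local GPU reports.
import torch

print(torch.version.cuda)                   # CUDA toolkit torch was built against
print(torch.cuda.get_arch_list())           # e.g. [..., 'sm_90'] if H100 is supported
print(torch.cuda.get_device_capability(0))  # (9, 0) on an H100
```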
Ah, even OpenGVLab/InternVL-Chat-V1-5 segfaults. So lmdeploy is broken somehow: I'm using the same docker build scripts as for cases that already run fine; the only difference is using the latest lmdeploy repo hash.
I'm also confused by this in the README:
But docker/Dockerfile still references cu118 and (I guess) uses a tritonserver image that only has CUDA 11.8. Is this a problem for deploying on H100? It worked on lmdeploy from (maybe) 2-3 weeks ago, so I guess not, but maybe the pytorch issue is related.
Why can't the docker image use an updated triton server image, e.g. https://docs.nvidia.com/deeplearning/triton-inference-server/release-notes/rel-24-06.html#rel-24-06, which uses CUDA 12.5? And why is the triton server image used as the base image at all? It seems overly complicated, and you don't even use the triton server. Why not just plain Ubuntu with Python 3.10?
The exact same build process on f613814 works fine, with no segfault, so this is definitely a regression.
@pseudotensor hi, thanks for your feedback. It looks like it only happens in the triton-server-based docker image. #1971 can fix it.
@pseudotensor hi, could you kindly try the updated dockerfile from #2182? Any feedback would be greatly appreciated.
The initial version of lmdeploy inherited code from FasterTransformer and the Triton Inference Server, which is why the triton image is used as the base.
Still hitting segfaults, though I'm unsure whether it's the same issue: #2223. Probably the same, so not fixed.
Hi, @pseudotensor |
Hi, I plan to do the debugging described in #2223 (comment); I've just been busy with other things. I'm unable to give access to the machine directly, but we can do a shared debugging session if that's helpful. You can email me at pseudotensor@gmail.com to set up the details.
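In the meantime, a minimal sketch of one way to get extra signal out of a native segfault, assuming the server is launched from a Python entry point (the crash itself is inside turbomind's native code, so a gdb backtrace would still be needed for full detail):

```python
# Minimal sketch: dump Python-level tracebacks when a native crash occurs.
# This won't show the C++ frames inside turbomind, but it pinpoints which
# Python call triggered the segfault.
import faulthandler
faulthandler.enable()  # installs handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS
```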
Checklist
Describe the bug
Text handling by the model is fine, but any image leads to a crash.
Reproduction
Just do any single-image request, as one would for other vision models. This is sufficient to cause a crash every time:
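For instance, something along these lines (a sketch, assuming lmdeploy's default api_server port and an OpenAI-compatible client; the model name and image URL are placeholders):

```python
# Sketch of a single-image chat request against lmdeploy's OpenAI-compatible
# server. Port 23333 is lmdeploy's default; model name and image URL are
# placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:23333/v1", api_key="none")
resp = client.chat.completions.create(
    model="OpenGVLab/InternVL2-Llama3-76B",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/any-image.png"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```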
I haven't used the latest lmdeploy with other models like InternVL 1-5, which work fine on the older version, so it's possible those are broken too. I'll try InternVL 1-5 to see whether lmdeploy is generally broken.
Environment
Using the latest docker image, with an extra build for the vision dependencies, on 4×H100.
See how the docker image is built here: #2163
Error traceback