-
-
Notifications
You must be signed in to change notification settings - Fork 5.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Hardware][NVIDIA] Add non-NVML CUDA mode for Jetson #9735
[Hardware][NVIDIA] Add non-NVML CUDA mode for Jetson #9735
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
🚀 |
e41c937
to
da79e3e
Compare
I don't think you need to create another platform class. if Jetson does not support nvlink, you can just detect Jetson and return false for the nvlink detection. |
creating a platform just for Jetson is too overkill. |
It's not just NVLink; currently that's only used for custom allreduce as far as I can tell. The CUDA platform class assumes NVML is available throughout to avoid initialising a CUDA context, but Jetson doesn't have NVML, so a different approach needs to be taken. I didn't want to pollute the CUDA platform by initializing a CUDA context, so created a different one. Happy to rework if you prefer a different approach? |
Alternatively, could try to abstract away the NVML calls to make calls to an alternative backend on Jetson - jetson_stats might be usable, but not sure if all features will be implemented (especially if we want to use other NVML features in future). |
I believe Jetson is also currently the only CUDA-supporting device to use a unified memory configuration, so some optimisations may be possible there (e.g. handover of weights between CPU and GPU without copy). However it could be argued to save the separate backend for when such optimisations are actually implemented. |
Looking at the logs, adding 8.7 to the ARCHS is not affecting the binary size at all: This PR:
Latest main:
Does this make sense? |
Possibly - as far as I know 8.6 and 8.7 are essentially compatible, so it's possible it's just the same kernels being linked twice? I did have to add the 8.7 capability to get vLLM running on Orin though; without, it'd complain that there was no compatible kernel available. |
400c3d1
to
ef7ff3c
Compare
Ok, I didn't catch this before, but this PR isn't adding the sm87 kernels to the wheel file (just adds support for compiling them). IIUC, Jetson is ARM-only and the vLLM wheels are only built for x86, so all that looks good to me for this PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I leave it to @youkaichao to decide whether it makes sense to have Jetson in its own platform, but lgtm otherwise
I can find https://github.com/dusty-nv/jetson-containers/tree/master/packages/llm/vllm who builds vllm on jetson. I don't see any manipulation like this. |
Just investigated this. NVIDIA JetPack 6 includes NVML support, so the alternate platform will not be needed on JetPack 6. However, JetPack 5 and older still do not support NVML, and JetPack 6 is only releasing for the Orin series, not any older Jetson devices. So if we want to support JetPack 5 devices we will need to use the alternate non-NVML approach. |
51085cf
to
c93ecc4
Compare
Yes, I am currently building this directly on Jetson outside of containers. Build process is fully specified by my nixpkgs branch; for Jetson support make sure to override the source hash to use this branch. You are welcome to build it directly using Nix (I am running jetpack-nixos) or adapt to whatever build scripts you are using. |
Bump, still pending review |
import os | ||
|
||
def cuda_is_jetson() -> bool: | ||
return os.path.isfile("/etc/nv_tegra_release") \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to check with nvidia folks, how robust it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check came from this thread:
rapidsai/dask-cuda#400 (comment)
vllm/platforms/cuda.py
Outdated
context.log_warnings() | ||
|
||
|
||
class CudaPlatform(Platform): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can have NvmlCudaPlatform
and NonNvmlCudaPlatform
inheriting from Platform
, and in the end of this file, based on jetson or not, define a variable CudaPlatform
to point to either NvmlCudaPlatform
or NonNvmlCudaPlatform
.
This pull request has merge conflicts that must be resolved before it can be |
Signed-off-by: Conroy Cheers <conroy@corncheese.org>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM now! Thanks for the fix!
@conroy-cheers can you please merge main to solve the conflict? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto thanks for the responses!
Merged main and retested on Jetson. Ready to go, assuming CI passes 👍 |
6644c52
to
040dc12
Compare
Signed-off-by: Conroy Cheers <conroy@corncheese.org>
Head branch was pushed to by a user without write access
040dc12
to
77daa0f
Compare
CI seems to be failing, not sure if it's related to the changes here? |
@conroy-cheers failing test is unrelated I'll ask for a force merge, thanks for your work! |
Signed-off-by: Conroy Cheers <conroy@corncheese.org>
Signed-off-by: Conroy Cheers <conroy@corncheese.org> Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
Signed-off-by: Conroy Cheers <conroy@corncheese.org>
Signed-off-by: Conroy Cheers <conroy@corncheese.org>
NVIDIA Jetson devices do not support the NVML protocol.
This PR adds a discrete
CudaJetsonPlatform
platform which does not rely on NVML, and is automatically usedon Jetson devices when NVML cannot be initialized. The Jetson platform supports all CUDA functionality, but does
not support NVLink; platform-specific checks are adjusted where appropriate to account for this.
Jetson Orin devices use CUDA capability 8.7 which is mostly identical to 8.6; capability 8.7 has been added to definitions
where 8.6 is used.
This has been built and tested on Jetson Orin AGX.
Fixes #9728