Simplify Nvidia dependencies #101

Closed
andyl opened this issue Nov 11, 2024 · 6 comments

@andyl

andyl commented Nov 11, 2024

For a given version of EXLA, I find it difficult to know which combination of CUDA drivers, CUDA runtime and cuDNN are needed to make everything work.

Just posting an idea here: Could EXLA/XLA install these dependencies automatically? Or emit a bash script to set things up properly? Or include some sort of healthcheck task?

@polvalente

The `XLA_TARGET` table in the README (https://github.com/elixir-nx/xla?tab=readme-ov-file#xla_target) does specify the expected versions. Beyond pointing to that table, I believe automatically installing these dependencies would be too invasive.

As a personal anecdote, every time my Linux kernel updates, I have to tweak the NVIDIA driver installation -- and the easiest way is to reinstall the NVIDIA drivers and CUDA. If EXLA started tampering with that as well, the system could become unstable.

I think we could, however, provide some sort of `mix xla.check_version --target cuda12` task or something similar, where we validate the dependencies listed on each line of that table and emit a warning or fail the task when the validation fails.
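As a rough illustration of the validation step such a task might perform, here is a minimal Python sketch (not Elixir, and not an existing XLA feature). The `REQUIREMENTS` table, the function names, and the version bounds are all hypothetical placeholders; the real bounds would have to come from the README table. How the installed versions get detected is deliberately left out.

```python
from typing import Dict, List, Optional, Tuple

Version = Tuple[int, ...]

# HYPOTHETICAL requirement table with placeholder bounds.
# These are NOT the values from the XLA README.
# Each dependency maps to (minimum inclusive, maximum exclusive or None).
REQUIREMENTS: Dict[str, Dict[str, Tuple[Version, Optional[Version]]]] = {
    "cuda12": {
        "cuda": ((12, 0, 0), None),
        "cudnn": ((9, 0, 0), (10, 0, 0)),
    },
}


def parse_version(text: str) -> Version:
    """Turn '12.4' into (12, 4, 0), padding to at least three components."""
    parts = [int(part) for part in text.strip().split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def check_versions(target: str, detected: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means everything passed."""
    if target not in REQUIREMENTS:
        return [f"unknown target: {target}"]

    problems: List[str] = []
    for dep, (minimum, maximum) in REQUIREMENTS[target].items():
        if dep not in detected:
            problems.append(f"{dep}: not detected")
            continue
        found = parse_version(detected[dep])
        if found < minimum:
            problems.append(f"{dep}: found {detected[dep]}, which is too old")
        elif maximum is not None and found >= maximum:
            problems.append(f"{dep}: found {detected[dep]}, which is too new")
    return problems
```

A caller could print each returned problem as a warning, or exit with a non-zero status when the list is non-empty, which matches the "warn or fail" behaviour suggested above.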

@andyl
Author

andyl commented Nov 11, 2024

Thanks @polvalente, a `check_version` task would be great! Would you like me to submit a PR for it? If so, let me know if there are features you would like, or guidelines you would like me to follow.

@polvalente

I'd like @jonatanklosko to chip in here as he might have some extra thoughts, before we move on with a PR.

I think the task should accept the same values as XLA_TARGET, or use its value directly, and then based on that, validate the key dependency versions.

However, I'm not sure if this is actually doable in a portable way for all of the supported targets.

@jonatanklosko
Member

I think portable checks would be problematic. For example, to find out the cuDNN version, I'm not sure if there's a more reliable way than searching installed system packages, which is OS-specific. Perhaps we could look for certain files, but I'm pretty sure that's going to be too brittle.
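To make the brittleness of the file-based approach concrete, here is a minimal Python sketch. It assumes a recent cuDNN release that ships a `cudnn_version.h` header defining `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL`. The candidate paths are guesses that differ between distributions and install methods, and the function names are hypothetical.

```python
import re
from typing import Iterable, Optional, Tuple

# ASSUMED locations; real installs vary by distro, package manager, and CUDA layout.
CANDIDATE_PATHS = (
    "/usr/include/cudnn_version.h",
    "/usr/include/x86_64-linux-gnu/cudnn_version.h",
    "/usr/local/cuda/include/cudnn_version.h",
)

# Matches lines such as "#define CUDNN_MAJOR 9".
_DEFINE = re.compile(
    r"^\s*#\s*define\s+CUDNN_(MAJOR|MINOR|PATCHLEVEL)\s+(\d+)", re.MULTILINE
)


def parse_cudnn_header(header_text: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from header text, or None if incomplete."""
    found = {name: int(value) for name, value in _DEFINE.findall(header_text)}
    try:
        return (found["MAJOR"], found["MINOR"], found["PATCHLEVEL"])
    except KeyError:
        return None


def detect_cudnn_version(
    candidates: Iterable[str] = CANDIDATE_PATHS,
) -> Optional[Tuple[int, int, int]]:
    """Try each candidate path in order; return the first version that parses."""
    for path in candidates:
        try:
            with open(path, "r", encoding="utf-8", errors="replace") as handle:
                version = parse_cudnn_header(handle.read())
        except OSError:
            continue  # Missing or unreadable file: try the next guess.
        if version is not None:
            return version
    return None
```

A `None` result here only means none of the guessed paths matched, not that cuDNN is absent, so a check built this way can produce false negatives. A runtime-only install without development headers would also go undetected.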

Maybe we should make the EXLA->XLA navigation version-aware. Currently the EXLA docs link to https://github.com/elixir-nx/xla, which always points to main. We could instead link to `XLA` and then include the version information in the XLA module docs. Or perhaps that's still too much and we should move all information to the EXLA docs altogether. @josevalim @polvalente thoughts?

@polvalente

I like the idea of linking to XLA and having the version-aware info there!

@jonatanklosko
Member

I opened PRs with docs improvements, so I'm going to close this.
