cross compile cuda support (cnt'd) #210
Conversation
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR.
Yes, we'll need those. They contain the headers, the generic unversioned symlinked shared library, the static libraries (which we may want specifically for CUDA because there are some implications, IIRC), and the pkgconfig files.
Argh, the test here cannot work because at test time, the "BUILD_PLATFORM" is already emulated.
Added in 91610af; not sure if my scripting is worth a damn, I'm not experienced with
recipe/cross_compile_support.sh
Outdated
# (names need "_" not "-" to match spelling in manifest)
declare -a DEVELS=(
  "cuda_cudart"
  "cuda_driver"
I believe you need to use the cuda_cudart version for cuda_driver since it doesn't have its own key.
recipe/cross_compile_support.sh
Outdated
declare -a DEVELS=(
  "cuda_cudart"
  "cuda_driver"
  "cuda_nvml"
  "cuda_nvrtc"
  "libcublas"
  "libcufft"
  "libcurand"
  "libcusolver"
  "libcusparse"
  "libnpp"
  "libnvjpeg"
)
Need to add: nvidia_driver, nvidia_fabric_manager, nvidia_libXNVCtrl. All three should use the version from the nvidia_driver key.
OK, this and the cuda_driver thing make it sound like we need to do a full mapping here. Currently it was just a simple scheme: xyz -> xyz_devel.
You're saying I need to change this to:
# new key... ...uses version from
cuda_cudart_devel -> cuda_cudart
cuda_driver_devel -> cuda_cudart # special
cuda_nvrtc_devel -> cuda_nvrtc
cuda_nvml_devel -> cuda_nvml_dev # special
libcublas_devel -> libcublas
libcufft_devel -> libcufft
libcurand_devel -> libcurand
libcusolver_devel -> libcusolver
libcusparse_devel -> libcusparse
libnpp_devel -> libnpp
libnvjpeg_devel -> libnvjpeg
nvidia_driver_devel -> nvidia_driver
nvidia_fabric_manager -> nvidia_driver # special
nvidia_libXNVCtrl -> nvidia_driver # special
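The special-cased mapping above could be sketched in bash roughly as follows. This is a hypothetical sketch, not the actual recipe script: the function name `version_key_for` and the default "strip `_devel`" rule are my own illustration of the scheme discussed, with the special cases as overrides.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the devel-package -> manifest-version-key mapping
# discussed above. Only the special cases need explicit entries; everything
# else falls back to stripping the "_devel" suffix. Requires bash 4+
# (associative arrays).
declare -A VERSION_KEY=(
  [cuda_driver_devel]=cuda_cudart      # special: no key of its own
  [cuda_nvml_devel]=cuda_nvml_dev      # special
  [nvidia_fabric_manager]=nvidia_driver   # special
  [nvidia_libXNVCtrl]=nvidia_driver       # special
)

# Look up the manifest key whose version a given package should use.
version_key_for() {
  local pkg="$1"
  echo "${VERSION_KEY[$pkg]:-${pkg%_devel}}"
}

version_key_for cuda_driver_devel   # -> cuda_cudart
version_key_for libcublas_devel     # -> libcublas
```

With this shape, adding a new regularly-named package costs nothing, and only divergences between the manifest and the RPM names need to be written down.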
Those are not needed.
Can you specify what you mean by "those"? The original list was based on the devel libs you had in your PR that weren't listed in the manifest.
nvidia_libXNVCtrl is GPL licensed (https://github.com/NVIDIA/nvidia-settings/blob/main/COPYING) and not governed under the EULA, so I think (I am not a lawyer!) we can actually ship that as a conda package if desired, so that can definitely be removed from the list in hindsight. My apologies.
I can't find a license or EULA for nvidia_fabric_manager (should actually have been nvidia_fabric_manager_devel, my bad again), so I assumed we weren't allowed to ship it as a conda package. I haven't seen anyone actually use this library before, so I'd be perfectly happy to leave it out.
There's other things that would potentially be needed that there aren't stub libraries available for:

- nvidia-driver-libs notably contains libnvoptix and a bunch of other libraries
- nvidia-driver-cuda-libs notably contains libnvidia-nvvm and libnvidia-ptxjitcompiler, and more generally contains the libraries that the symlinks in nvidia_driver_devel point to
I have seen a fair amount of software that uses these options, where I think it would make sense to add these libraries: nvidia_driver_devel, nvidia_driver_libs, and nvidia_driver_cuda_libs.
Notably, with the proposal above, we'll be reliant on stubs for the CUDA driver library libcuda and the NVIDIA Management Library libnvidia-ml. I'm not sure how build systems like CMake will react to using a mix of actual and stub libraries. Hopefully everything will be okay?
nvidia-libXNVCtrl{,-devel} and nvidia-fabric-manager{,-devel} don't exist under:
https://developer.download.nvidia.com/compute/cuda/repos/rhel8/cross-linux-sbsa/
https://developer.download.nvidia.com/compute/cuda/repos/rhel8/ppc64le/
Force-push: fbb536a to ca40910
This is way more complicated and downloads more things than needed. The packages I had in my PR were the only ones in $CUDA_HOME/targets/sbsa-linux, which is what the cross compiler looks at. All the other libraries are irrelevant.
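One way to filter a candidate list down to the packages that actually ship files under the cross target could be sketched as below. This is an illustrative sketch, not part of the recipe: the function name `relevant_pkgs` and the tab-separated "package, filepath" input format (as one could produce from `rpm -qlp` per package) are my own assumptions.

```shell
# Sketch: given "package<TAB>filepath" lines, keep only packages that
# install something under the cross target the compiler looks at
# (targets/sbsa-linux). Input format is assumed, not prescribed.
relevant_pkgs() {
  grep 'targets/sbsa-linux' | cut -f1 | sort -u
}

printf 'cuda-cudart-devel\t/usr/local/cuda/targets/sbsa-linux/include/cuda.h\nlibfoo\t/usr/lib/libfoo.so\n' \
  | relevant_pkgs   # -> cuda-cudart-devel
```

Something of this shape would let the script start from the full manifest but only keep the packages that matter for cross-compilation.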
Fair enough, I was just following the instructions to download packages based on the manifest. I started with downloading everything (plus what Keith told me to add), but if we can filter/shorten the list, all the better.
I think there's a handful of other libraries that end up in the sysroot (I think?) that there aren't stubs for, which we'd also want but which weren't included in your PR. I did my best to break them down in my comment in the thread.
I'll let you two decide what should be included. My goal here was just to make it possible to use the versions from the manifest. As feared, this turned into a fair amount of code, due to various small and not-so-small divergences between the manifest and the actual RPMs. It's conceivable that someone might prefer to just hardcode the list of deps, but now that it works (with the ability to map in additional RPMs, and the obvious possibility to filter out unwanted packages), I tend to think it's probably a good thing going forward.

@isuruf, I managed to get this to run through to the point where you had added at the end of the loop, but this fails with
@isuruf @kkraus14 At the moment the list is as follows:

More compact list:
I'm pretty sure now that this happens because I was running this in the recipe section for debugging purposes, where we don't actually have root rights. I think that if this script is run where it should be, it would actually run through. In any case, the fact that it works up until this point shows that the downloads are fine, which means I'm marking this as ready (happy to clean up the commit history if desired).
docker_image: # [os.environ.get("BUILD_PLATFORM", "").startswith("linux") and (ppc64le or aarch64)]
# case: native compilation (build == target)
- quay.io/condaforge/linux-anvil-ppc64le-cuda:11.2 # [ppc64le and os.environ.get("BUILD_PLATFORM") == "linux-ppc64le"]
- quay.io/condaforge/linux-anvil-aarch64-cuda:11.2 # [aarch64 and os.environ.get("BUILD_PLATFORM") == "linux-aarch64"]
# case: cross-compilation (build != target)
- quay.io/condaforge/linux-anvil-cuda:11.2 # [ppc64le and os.environ.get("BUILD_PLATFORM") == "linux-64"]
- quay.io/condaforge/linux-anvil-cuda:11.2 # [aarch64 and os.environ.get("BUILD_PLATFORM") == "linux-64"]
Not sure if adding this migration is strictly speaking necessary, but since the linux builds have both cuda / non-cuda, I think we should do the same for aarch/ppc?
This reflects the changes in conda-forge/conda-forge-pinning-feedstock#3624, which we should ideally merge beforehand (then we can skip the use_local: true here).
Please remove this until nvcc-feedstock is ready
No problem, I'll remove it; though I wonder if we'll still be able to cross-compile cuda without this (which is the point of this PR, no?)
How do you propose we solve the issue in #210 (comment)?
> How do you propose we solve the issue in #210 (comment)?
I don't know yet, I'm trying to help work through the issues as they appear. But the overarching goal remains to cross-compile CUDA, so if we're falling short of that, we should keep this open IMO.
This turned out to be wrong... So now I'm not sure what we need to tweak (chown?) to get the packages into

Also worth noting: not everything gets installed in
Force-push: f8078db to 7b2c42b
…nda-forge-pinning 2023.01.30.13.41.09
This reverts commit f0ba50f.
…nda-forge-pinning 2023.02.01.21.26.02
OK, I'll keep this in mind. For context, I often need to trawl through feedstock history, and I find sequences of commits like these really annoying (aside from the content-free commit messages that the GH UI encourages), because they tackle one specific issue and semantically should be grouped together. This is not a dig at your commits here, I do the same during development as well; I just have the habit of cleaning up my commit history as I go, or at least before merging, so that it's easier to navigate down the line. But now that I know that you don't like that, I'll make an exception for your commits.
If you want, you can clean up the commit message when there's no message, but keep my commits as they are in the future.
So what are the next steps we need to tackle for cross-compiling CUDA, and where do we track them (if not here)?
We can track them in nvcc-feedstock.
New PR since I cannot commit into #209
Closes #209