-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rocThrust for thrust functionality in HIP #560
Conversation
Unclear to me why this is failing:
Locally I have rocPRIM, but I can make it download with |
OK I ran locally with the Docker container. It complains that:
So we need a new container with rocm + clang. Trying it out with 22.04 |
This should now compile with acts-project/machines#96, so wait until ubuntu2204_rocm_clang is available. |
🤔 I'm not all too sight about this... 😦 If rocThrust can't deal with HIP not being the thing used for the host code build, then we have a problem. 😦 Since they don't just want "any version" of clang. 🤔 They want the clang version shipped with HIP. (It's a longer story, that AMD's "HIP language support in CMake" makes use of the underlying More than anything else, this looks like a shortcoming in their CMake configuration. I'm not at all convinced that the basic problem is on our side. Even if updating the Docker images might well be due. |
We could potentially get round this by installing rocprim and maybe some other developer tools in advance. That's the setup I have locally. |
Actually rocThrust demands clang:
despite being perfectly happy to build without it. There is an issue: |
No response on the issue above from AMD. However, there is probably some value in running a clang build in the CI even if we are doing it for the wrong reasons. The CI looks OK now - @krasznaa, @stephenswat could I have a review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of things... 😉
.github/workflows/builds.yml
Outdated
@@ -30,6 +30,9 @@ jobs: | |||
- name: CPU | |||
container: ghcr.io/acts-project/ubuntu2004:v30 | |||
options: -DTRACCC_USE_ROOT=FALSE | |||
- name: HIP | |||
container: ghcr.io/acts-project/ubuntu2204_rocm_clang |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to put a particular version number here. We can't have the CI using the "latest" image. (Allowing the CI to break by somebody updating the image in some unforeseen way.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that we need a new tag of machines
for this. I could change this to ubuntu2204_rocm_clang:sha-3a6b0b2
for now, which matches latest. Or we can wait for a new machines tag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then we should make a new tag of it. Though again, I believe that once we turn off the unit test builds of rocThrust, we could possibly survive with something as old as ghcr.io/acts-project/ubuntu1804_rocm:v11
. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately not. It still insists on running its compiler checks with this change...
I followed up on the cmake warnings in ROCm/rocm-cmake#154 |
Stewart, how would you feel about me intervening here? 🤔 I'd like to put krasznaa@1c26d46 on top of your developments. Technically I can do it, but didn't want to modify your branch before running it by you. Long story short, I spent some time yesterday and today to force rocThrust into submission. To force its CMake configuration to work with our existing ROCm/HIP setups. Mind you, it now works for me with both Though there is still one glaring issue with all of this. We absolutely must depend on rocThrust privately only. As I don't see yet how we could export rocThrust properly with the rest of our code. (Their CMake kung-fu is not an amazing one.) 😦 |
Looks good to me. I did consider patching it but thought this might be getting a bit complicated |
So... We're on a good track, but still one issue to sort out... 🤔 This setup now works with older versions of ROCm/HIP, and with HIP's NVIDIA backend. (Even when using a newer version of HIP.) But at least with HIP 6.0.2, which I tested just now on my home PC that has an AMD card in it, this current setup fails to build with:
This is an interesting issue. The missing symbols come from https://github.com/acts-project/vecmem/blob/main/cmake/FindHIPToolkit.cmake doesn't look for it, it only sets up Long story short, we'll need a new VecMem tag as well before this PR would be in a really good state. 😦 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out, that no. 🤔 It's rather a misconfiguration on my machine.
I have multiple different versions of ROCm installed here. With the "default" being version 5.4. (Since oneAPI's AMD plugin supports that version officially.) But in this build I was trying to rather use 6.0. But clearly my environment setting breaks down with that. 😦
As soon as I use version 5.4 (which has the same library setup), things work fine. So there is probably some need to make the VecMem code smarter later on, but this PR does not need to wait for that.
So that it could be used with our existing ROCm/HIP Docker images as well.
Add test of rocThrust library so we can use thrust on AMD GPUs in the future. Conditional on new TRACCC_BUILD_HIP flag: we don't have a plain HIP implementation of any algorithms so it's slight overkill.
Note that test_thrust.hip is just a copy of test_thrust.cu. So we could achieve the same via ifdefs and conditional CMake, although the directory
test/cuda
would ideally change name.Surprisingly the two thrust libraries can coexist since only NVidia thrust defines
THRUST_ENABLE_INSTALL_RULES
and therefore the CMake commands defined by both do not clash. This is more or less by accident, but hopefully OK for now.