Updates for CUDA versions >= 12 #87

Merged: 6 commits merged into main from fix-cuda-16 on Oct 23, 2024
Conversation

@The9Cat (Contributor) commented Oct 10, 2024:

Summary of updates

  • As of CUDA 12.4, the header-only version of NVTX (nvtx3) is the default. The header for the previous NVTX version is still provided but appears to conflict with nvtx3, so preprocessor flags are used to select the compatible version (a sketch of the selection follows this list).
  • As of CUDA 12.6, the thrust and cub ABIs encode the NVIDIA architecture and are inconsistent with the same code compiled by the host compiler. The solution here is to compile all code referencing thrust types through nvcc (a build-recipe sketch also follows this list). Many thanks to Georgia Stuart, who contributed the CMakeLists.txt recipes for doing this!
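Two minimal sketches of the approach. First, the version selection; switching on CUDART_VERSION from cuda_runtime_api.h is an illustrative assumption here, not necessarily the exact flag the PR uses:

```cpp
// Hedged sketch: CUDART_VERSION encodes major*1000 + minor*10, so 12040
// corresponds to CUDA 12.4, where header-only nvtx3 became the default.
#include <cuda_runtime_api.h>

#if CUDART_VERSION >= 12040
#include <nvtx3/nvToolsExt.h>   // header-only NVTX v3 shipped with the toolkit
#else
#include <nvToolsExt.h>         // legacy NVTX header from earlier toolkits
#endif
```

Second, the build-side idea: route any translation unit that touches thrust/cub types through nvcc. The file name below is hypothetical; the actual recipes are in this PR's CMakeLists.txt changes contributed by Georgia Stuart:

```cmake
enable_language(CUDA)

# Compile this host-side .cc file as CUDA so the thrust/cub ABI, which
# now encodes the target architecture, stays consistent across the build.
set_source_files_properties(ParticleReader.cc
  PROPERTIES LANGUAGE CUDA)
```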

@michael-petersen (Member) left a comment:
This compiles for me on CUDA 12.6 and runs a basic regression simulation. I have not tried on 11.x again to make sure we are backwards compatible. Has anyone else tried this?

@michael-petersen (Member):

This is a nice maintenance change, forced by the new CUDA compile strategy, but I think it also makes the codebase simpler.

@The9Cat (Contributor, author):

Right. The goal here was to get the implementation out of the header where it required NVTX-specific structures.

@michael-petersen (Member):

I assume this is still for testing?

@The9Cat (Contributor, author):

Looks like cruft to me.

@michael-petersen (Member):

Are we planning on stylistically removing this everywhere? Fine if so, and I'll start when I touch any other code.

@The9Cat (Contributor, author):

Not really stylistic. In most compilers, `#include ""` first checks the local directory and, if it doesn't find a match, moves on to the system paths, while `#include <>` starts the search with the system headers. NVTX.H should always be local, so I would say: use "" when we really mean the local directory.
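For illustration, a hedged example of the two forms (the header names are just ones mentioned in this thread; the search order described is common compiler behavior rather than a standard guarantee):

```cpp
#include "NVTX.H"                  // "": search begins in the including file's
                                   //     directory, then the system paths
#include <thrust/device_vector.h>  // <>: search begins with the system paths
```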

@michael-petersen (Member):

Is there a reason `extern int cudaGlobalDevice;` now comes earlier, or is it just convenience?

@The9Cat (Contributor, author):

Yes: cudaGlobalDevice should be declared independently of whether nvcc is guiding the compile. It's not about coming earlier but about getting it out of the `__NVCC__` block.
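A minimal sketch of the intent, with the surrounding code assumed:

```cpp
// Declared unconditionally so host-compiled and nvcc-compiled translation
// units both see the same symbol; one source file provides the definition.
extern int cudaGlobalDevice;

#if defined(__NVCC__)
// CUDA-specific declarations remain inside the guard.
#endif
```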

@michael-petersen (Member):

Just checking: is this true for all of CUDA 12, or just 12.4 and up?

@The9Cat (Contributor, author):

From the release notes, it's true for all of CUDA 12. It could be that some of this was deprecated for a few point releases before finally not working at 12.4.

@michael-petersen (Member):

Clean fix here, thanks!

@The9Cat (Contributor, author):

Yeah, that's old cruft from Emacs.

@michael-petersen (Member):

Somewhat related, GPU runners are now available (see here), so I could imagine working this compile into CI so that we have some advance notice of issues.

georgiastuart and others added 3 commits October 13, 2024 23:03
@michael-petersen (Member) left a comment:

I've been working with this on CUDA 12.6, and everything seems to be working as expected. Anything left to do before merge?

@@ -53,6 +53,7 @@ public:
//! Finish and clean-up (caching data necessary for restart)
virtual void finish() {}

// #if HAVE_LIBCUDA==1
@michael-petersen (Member):
Suggested change: delete the commented-out line `// #if HAVE_LIBCUDA==1`.

@The9Cat (Contributor, author) commented Oct 23, 2024:

I think we can merge this after quickly checking that a compile with CUDA 11.x still works. I'm a little nervous that this PR is really a stop-gap measure. In the end, we should probably move the whole compile to C++20 modules to avoid this nvcc morass. But that's a big job.

@The9Cat (Contributor, author) commented Oct 23, 2024:

Okay, compiles on CUDA < 12. It's a wrap.

@The9Cat merged commit 181b8d1 into main on Oct 23, 2024 (2 checks passed) and deleted the fix-cuda-16 branch on October 23, 2024 at 21:00.