Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove util_device.cuh from iterator headers to enable online compilation #412

Merged
merged 4 commits into from
Sep 11, 2023

Conversation

leofang
Copy link
Member

@leofang leofang commented Sep 7, 2023

Description

Related to #403.

When using NVRTC (+Jitify) to compile these two headers

#include <cub/block/block_reduce.cuh>
#include <cub/block/block_load.cuh>

util_device.cuh ended up being included by one of the CUB headers, and this function

// Host code, only safely usable in C++11 or newer, where thread-safe
// initialization of static locals is guaranteed. This is a separate function
// to avoid defining a local static in a host/device function.
__host__ inline int DeviceCountCachedValue()
{
static ValueCache<int, DeviceCountUncached> cache;
return cache.value;
}
causes the following compiler error

../util_device.cuh(191): error: dynamic initialization is not supported for a function-scope static __device__ variable within a __device__/__global__ function

Upon a closer look, I realized that by commenting out DeviceCountCachedValue, the struct PerDeviceAttributeCache, and everything depending on them (in the same file, so the radius of blast is small), I was able to make NVRTC happy. Basically, hiding all __host__ only functions and structs from NVRTC should suffice.

@jrhemstad pointed out offline that this header doesn't seem to be used in cache_modified_input_iterator.cuh, and I checked locally that it's true for all iterator headers. Hence this patch.

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@leofang leofang requested review from a team as code owners September 7, 2023 13:58
@leofang leofang requested review from elstehle and ericniebler and removed request for a team September 7, 2023 13:58
@copy-pr-bot
Copy link

copy-pr-bot bot commented Sep 7, 2023

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@jrhemstad
Copy link
Collaborator

/ok to test

@leofang leofang requested a review from a team as a code owner September 7, 2023 16:44
@leofang
Copy link
Member Author

leofang commented Sep 7, 2023

Findings:

  • Several of the device tuning/dispatch headers need ChainedPolicy, so I made those include util_device.cuh directly, as device algorithms contain host API calls and cannot be compiled by NVRTC anyway.
  • In other places, util_device.cuh was included only to access AliasTemporaries. Since it's __host__ __device__ and compilable by NVRTC, I moved it to a separate util header and made it replace the include of util_device.cuh

@leofang
Copy link
Member Author

leofang commented Sep 8, 2023

Can anyone help kick off CI to validate the changes?

@miscco
Copy link
Collaborator

miscco commented Sep 8, 2023

/ok to test

@leofang
Copy link
Member Author

leofang commented Sep 8, 2023

Thanks, Michael, it seems the tests are green 🙂

Copy link
Collaborator

@miscco miscco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nits

cub/cub/util_temporary_storage.cuh Show resolved Hide resolved
cub/cub/util_temporary_storage.cuh Outdated Show resolved Hide resolved
Copy link
Collaborator

@gevtushenko gevtushenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for cleaning up the headers! It's much better now. A minor comment below.

cub/cub/util_device.cuh Show resolved Hide resolved
@miscco
Copy link
Collaborator

miscco commented Sep 11, 2023

/ok to test

@gevtushenko gevtushenko merged commit 137f5c7 into NVIDIA:main Sep 11, 2023
464 checks passed
@leofang leofang deleted the remove_redundant_include branch September 11, 2023 23:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

4 participants