Add libcu++ dependency; initial round of `NV_IF_TARGET` ports. #448

alliepiper · 2022-03-24T13:09:15Z

This PR contains an initial set of changes necessary to migrate Thrust and CUB to NV_IF_TARGET and remove dependence on __CUDA_ARCH__. It does not fully remove all usages of __CUDA_ARCH__, but rather focuses on the following:

- Establish the libcu++ dependency for both Thrust and CUB.
- Remove obsolete checks for unsupported CUDA architectures.
- Migrate host/device divergent code from #ifdef __CUDA_ARCH__ to use NV_IF_TARGET.

This also includes various bug fixes for issues exposed by the above.

Future PRs will address the remaining usages of __CUDA_ARCH__ in the CDP macros and the kernel dispatch infrastructure.
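
As an illustration of the host/device migration described above, here is a minimal sketch (not code from this PR) that contrasts an `#ifdef __CUDA_ARCH__` branch with the equivalent `NV_IF_TARGET` form. The function names and the `SKETCH_*` macros are hypothetical. The fallback definition of `NV_IF_TARGET` is a host-only stand-in, assumed here so that the snippet also builds where libcu++ is not installed; it is not how `<nv/target>` is implemented.

```cpp
// Use the real <nv/target> from libcu++ when it is on the include path.
#ifdef __has_include
#  if __has_include(<nv/target>)
#    include <nv/target>
#  endif
#endif

#ifndef NV_IF_TARGET
// Host-only stand-in (an assumption for this sketch, not libcu++ code):
// it always expands the third argument, which is the host branch below.
#  define SKETCH_UNWRAP(...) __VA_ARGS__
#  define NV_IF_TARGET(cond, if_true, if_false) SKETCH_UNWRAP if_false
#endif

// Annotate for CUDA compilers; expand to nothing for plain C++ builds.
#ifdef __CUDACC__
#  define SKETCH_HOST_DEVICE __host__ __device__
#else
#  define SKETCH_HOST_DEVICE
#endif

// Before: divergence is decided by the preprocessor through __CUDA_ARCH__.
SKETCH_HOST_DEVICE inline int legacy_is_device()
{
#ifdef __CUDA_ARCH__
  return 1;
#else
  return 0;
#endif
}

// After: divergence is expressed with NV_IF_TARGET. Each branch is wrapped
// in parentheses and is emitted only for the matching compilation target.
SKETCH_HOST_DEVICE inline int ported_is_device()
{
  int result = 0;
  NV_IF_TARGET(NV_IS_DEVICE, (result = 1;), (result = 0;));
  return result;
}
```

Called from host code, both functions return 0. The second form does not mention `__CUDA_ARCH__`, so it keeps working with compilers that do not define an architecture macro for device code.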

Pre-written Release Notes

Breaking Changes

- #448: Add libcu++ dependency.
- #448: The following macros are no longer defined by default. They can be re-enabled by defining CUB_PROVIDE_LEGACY_ARCH_MACROS. These will be completely removed in a future release.
  - CUB_IS_HOST_CODE: Replace with NV_IF_TARGET.
  - CUB_IS_DEVICE_CODE: Replace with NV_IF_TARGET.
  - CUB_INCLUDE_HOST_CODE: Replace with NV_IF_TARGET.
  - CUB_INCLUDE_DEVICE_CODE: Replace with NV_IF_TARGET.

Other Enhancements

- #448: Removed special case code for unsupported CUDA architectures.
- #448: Replace several usages of __CUDA_ARCH__ with <nv/target> to handle host/device code divergence.
- #448: Mark unused PTX arch parameters as legacy.

cub/cmake/cub-config.cmake

gevtushenko

A lot of code is much cleaner now, thanks! There are a few minor changes that need to be addressed.

cub/agent/agent_sub_warp_merge_sort.cuh

cub/block/specializations/block_histogram_sort.cuh

cub/block/block_reduce.cuh

cub/detail/target.cuh

cub/device/dispatch/dispatch_segmented_sort.cuh

cub/device/dispatch/dispatch_spmv_orig.cuh

cub/util_arch.cuh

cub/util_debug.cuh

experimental/defunct/test_device_seg_reduce.cu

gevtushenko

Is there a plan for handling dynamic shared memory allocation without PTX_ARCH once we support redux? If so, this can be merged.

test/test_util.h

test/test_warp_reduce.cu

alliepiper · 2022-05-16T20:57:49Z

> Is there a plan on how to address dynamic shared memory allocation without PTX_ARCH when we support redux?

We'll need to use NV_IF_TARGET, details TBD.

nvc++ will stop defining __NVCOMPILER_CUDA_ARCH__ soon, removing the ability to determine the PTX arch at compile time. This change updates agents and collective algorithms so that they no longer require the PTX_ARCH template parameter, and changes CUB_WARP_SIZE(PTX_ARCH) and similar helpers to ignore their argument. These macros only produced different values on obsolete architectures, so ignoring the argument has no effect on currently supported architectures.
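
To make the "ignore their argument" idea concrete, here is a small sketch under stated assumptions. The `SKETCH_*` macros and `SketchWarpReduce` are hypothetical stand-ins, not CUB's actual definitions. They assume a warp size of 32 threads, which holds for all currently supported NVIDIA architectures.

```cpp
// Hypothetical helpers: the parameter is accepted for source compatibility
// with older call sites but never influences the result.
#define SKETCH_LOG_WARP_THREADS(unused) (5)
#define SKETCH_WARP_THREADS(unused) (1 << SKETCH_LOG_WARP_THREADS(0))

// A collective that once needed a PTX_ARCH template parameter to look up the
// warp size can take its default from the constant helper instead.
template <int LogicalWarpThreads = SKETCH_WARP_THREADS(0)>
struct SketchWarpReduce
{
  static constexpr int  warp_threads = LogicalWarpThreads;
  static constexpr bool is_full_warp =
    (LogicalWarpThreads == SKETCH_WARP_THREADS(0));
};

// Any architecture value yields the same warp size.
static_assert(SKETCH_WARP_THREADS(350) == SKETCH_WARP_THREADS(800),
              "the argument must not change the result");
```

Existing callers that still pass an architecture value keep compiling, and new code can pass any placeholder.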

alliepiper marked this pull request as draft March 24, 2022 13:09

alliepiper added this to the 1.17.0 milestone Mar 24, 2022

alliepiper added the blocked label (Currently cannot make progress.) Mar 24, 2022

alliepiper mentioned this pull request Mar 24, 2022

Add libcu++ dependency; initial round of NV_IF_TARGET ports. NVIDIA/thrust#1605

Merged

alliepiper force-pushed the if_target_prep branch from 3886111 to 9414e43 March 24, 2022 21:34

alliepiper changed the title ~~libcudacxx, if-target prep~~ Add libcu++ dependency; initial round of NV_IF_TARGET ports. Mar 24, 2022

alliepiper marked this pull request as ready for review March 24, 2022 21:42

robertmaynard reviewed Apr 1, 2022

cub/cmake/cub-config.cmake (outdated, resolved)

alliepiper force-pushed the if_target_prep branch from 9414e43 to b1bbe02 April 4, 2022 13:40

alliepiper requested review from gevtushenko and robertmaynard April 4, 2022 17:09

gevtushenko suggested changes Apr 11, 2022

alliepiper force-pushed the if_target_prep branch from b1bbe02 to 3efed83 April 13, 2022 21:09

alliepiper modified the milestones: 1.17.0, 2.0.0 Apr 25, 2022

robertmaynard approved these changes May 3, 2022

alliepiper force-pushed the if_target_prep branch from 3efed83 to b523fc5 May 10, 2022 21:20

gevtushenko approved these changes May 11, 2022

test/test_util.h (outdated, resolved)

test/test_warp_reduce.cu (outdated, resolved)

Add libcu++ dependency.

f42070e

alliepiper force-pushed the if_target_prep branch from b523fc5 to f037174 May 16, 2022 21:26

Remove checks for obsolete architectures.

f9beaa5

alliepiper added 5 commits May 16, 2022 18:04

Use NV_IF_TARGET to select between host/device/sm implementations.

c4299c4

Don't use host-only functions in host-device contexts.

476c1b8

This fixes the issue reported in NVIDIA#299. There's no clear reason why this should use `RandomBits` unconditionally.

Use Thrust's kernel launch helper in DispatchRadixSort.

f4d61fb

Skip large allocation tests that exceed device memory.

4de961a

The merge sort test with pow2 >20 fails on GTX 1650. Detect bad_alloc failures and skip those tests. Tests for smaller problem sizes will still fail if there's a bad_alloc.
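
The skip policy described in this commit message could be sketched roughly as follows. This is an illustration and not the test code from the PR: `TestStatus`, `kSkipAbovePow2`, and `run_sized_test` are hypothetical names, and the sketch assumes the allocation failure surfaces as an exception derived from `std::bad_alloc`.

```cpp
#include <cstddef>
#include <functional>
#include <new>

enum class TestStatus { Passed, Skipped };

// Hypothetical threshold: only problem sizes above 2^20 items may be skipped.
constexpr int kSkipAbovePow2 = 20;

// Runs `body` with 2^pow2 items. A bad_alloc on a large problem is reported
// as a skip; on a small problem it is rethrown so the test still fails.
inline TestStatus run_sized_test(int pow2,
                                 const std::function<void(std::size_t)>& body)
{
  const std::size_t num_items = std::size_t{1} << pow2;
  try
  {
    body(num_items);
  }
  catch (const std::bad_alloc&)
  {
    if (pow2 > kSkipAbovePow2)
    {
      return TestStatus::Skipped; // not enough device memory for this size
    }
    throw; // small sizes must fit, so this remains a real failure
  }
  return TestStatus::Passed;
}
```

Rethrowing for small sizes keeps genuine allocation bugs visible while letting memory-limited devices get through the rest of the suite.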

alliepiper force-pushed the if_target_prep branch from f037174 to 4de961a May 16, 2022 22:05

alliepiper merged commit 5571258 into NVIDIA:main May 17, 2022

alliepiper deleted the if_target_prep branch May 17, 2022 17:48
