Skip to content
This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Update DeviceScan to pass Thrust's scan tests #210

Merged

Conversation

alliepiper
Copy link
Collaborator

@alliepiper alliepiper commented Oct 7, 2020

Thrust will be switching to cub::DeviceScan to replace its custom scan
implementation. This patch addresses some issues found by the Thrust
tests:

  • Initialize unused BlockLoad items with values known to be in the input
    set. This fixes the TestInclusiveScanWithIndirection Thrust test by
    keeping the plus_mod3 functor indices valid.
  • Use OffsetT instead of int to hold indicies in AgentScan. This
    fixes the Test*ScanWithBigIndexes Thrust tests by not truncating
    the input problem size.
  • Use BLOCK_[STORE|LOAD]_WARP_TRANSPOSED_TIMESLICED instead of
    BLOCK_[STORE|LOAD]_WARP_TRANSPOSED when the intermediate type is
    larger than 128 bytes. This keeps shared memory buffers from growing
    too large in the TestScanWithLargeTypes Thrust test.

@alliepiper alliepiper added this to the 1.11.0 milestone Oct 7, 2020
@alliepiper alliepiper self-assigned this Oct 7, 2020
@alliepiper alliepiper marked this pull request as draft October 7, 2020 15:14
@alliepiper
Copy link
Collaborator Author

Requires NVIDIA/thrust#1304

@alliepiper alliepiper force-pushed the bug/use_cub_scan_in_thrust/gh.thrust1301 branch from 6b125b3 to 51fc06b Compare October 7, 2020 18:55
@alliepiper alliepiper force-pushed the bug/use_cub_scan_in_thrust/gh.thrust1301 branch 2 times, most recently from dd02b11 to f88e0c7 Compare October 13, 2020 20:59
@alliepiper alliepiper changed the title WIP Integrate CUB's scan into thrust. Update DeviceScan to pass Thrust's scan tests Oct 13, 2020
@alliepiper alliepiper removed their assignment Oct 13, 2020
@alliepiper alliepiper marked this pull request as ready for review October 13, 2020 21:06
@alliepiper alliepiper added the testing: gpuCI in progress Started gpuCI testing. label Oct 13, 2020
@alliepiper
Copy link
Collaborator Author

DVS CL 29194365

@alliepiper alliepiper added testing: internal ci in progress Currently testing on internal NVIDIA CI (DVS). testing: gpuCI passed Passed gpuCI testing. and removed testing: gpuCI in progress Started gpuCI testing. labels Oct 13, 2020
@RAMitchell
Copy link
Contributor

Not sure if this is covered already by this PR, but discard iterators typicaly don't work with cub interfaces because they can have value_type=void which propagates to OutputT, which cub tries to instantiate. I have been working around this as follows:

// Change the value type of thrust discard iterator so we can use it with cub
template <typename T>
class TypedDiscard : public thrust::discard_iterator<T> {
public:
  using value_type = T;  // NOLINT
};

@alliepiper
Copy link
Collaborator Author

@RAMitchell As of 2fba463, DeviceScan uses the input iterator's type for the scan accumulator, rather than the output iterator's type. But even before that, it had explicit checks for output type == void, so it appears that this was fixed at some point earlier.

Let us know in a new issue if you notice this happening in other algorithms.

@alliepiper alliepiper removed the testing: internal ci in progress Currently testing on internal NVIDIA CI (DVS). label Oct 19, 2020
@alliepiper alliepiper force-pushed the bug/use_cub_scan_in_thrust/gh.thrust1301 branch from f88e0c7 to 3742dea Compare October 24, 2020 15:04
@alliepiper
Copy link
Collaborator Author

DVS CL: 29264972

@alliepiper alliepiper added the testing: internal ci in progress Currently testing on internal NVIDIA CI (DVS). label Oct 30, 2020
@alliepiper alliepiper added testing: internal ci passed Passed internal NVIDIA CI (DVS). and removed testing: internal ci in progress Currently testing on internal NVIDIA CI (DVS). labels Nov 2, 2020
Thrust will be switching to `cub::DeviceScan` to replace its custom scan
implementation. This patch addresses some issues found by the Thrust
tests:

- Initialize unused `BlockLoad` items with values known to be in the input
  set. This fixes the `TestInclusiveScanWithIndirection` Thrust test by
  keeping the `plus_mod3` functor indices valid.
- Use `OffsetT` instead of `int` to hold indicies in `AgentScan`. This
  fixes the `Test*ScanWithBigIndexes` Thrust tests by not truncating
  the input problem size.
- Use `BLOCK_[STORE|LOAD]_WARP_TRANSPOSED_TIMESLICED` instead of
  `BLOCK_[STORE|LOAD]_WARP_TRANSPOSED` when the intermediate type is
  larger than 128 bytes. This keeps shared memory buffers from growing
  too large in the `TestScanWithLargeTypes` Thrust test.
@alliepiper alliepiper force-pushed the bug/use_cub_scan_in_thrust/gh.thrust1301 branch from 3742dea to b24ced2 Compare November 2, 2020 19:35
@alliepiper alliepiper merged commit 7e6f33b into NVIDIA:main Nov 2, 2020
@alliepiper alliepiper deleted the bug/use_cub_scan_in_thrust/gh.thrust1301 branch November 2, 2020 19:36
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
testing: gpuCI passed Passed gpuCI testing. testing: internal ci passed Passed internal NVIDIA CI (DVS).
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants