Asynchronously copy table data to the host during shuffle #11280

jlowe · 2024-07-31T18:21:55Z

Leverages rapidsai/cudf#16429 to asynchronously copy the partitioned table data to the host. This avoids an unnecessary stream synchronization after each device buffer is copied back to the host, and better overlaps CPU and GPU work during sliceInternalOnCpu. This makes no measurable performance difference on NDS runs because the schema of the shuffled data is relatively narrow (i.e., there are not many separate buffers to copy back to the host and thus not many unnecessary CUDA stream synchronizations to save), but for wide shuffled schemas it can make a significant difference. For example, sliceInternalOnCpu for a repartition of a table with 512 integer columns takes half the time with this asynchronous copy.

Signed-off-by: Jason Lowe <jlowe@nvidia.com>
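
To illustrate the idea in the description above, here is a minimal sketch in plain Java. It is a toy model and not the code from this PR or the cudf API: `FakeStream`, `copyAsync`, `sync`, and both `copyWith...` helpers are hypothetical names, and a single-threaded executor stands in for a CUDA stream that runs enqueued work in order. It only shows how the number of blocking synchronizations changes when every buffer copy is enqueued first and the caller waits once at the end.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class AsyncCopySketch {
    /** Hypothetical stand-in for a CUDA stream: tasks run in submission order on one thread. */
    static final class FakeStream implements AutoCloseable {
        private final ExecutorService worker = Executors.newSingleThreadExecutor();
        private Future<?> last = null;
        int syncCount = 0;

        /** Enqueue a "device to host" copy and return without waiting for it. */
        void copyAsync(int[] device, int[] host) {
            last = worker.submit(() -> System.arraycopy(device, 0, host, 0, device.length));
        }

        /** Block until all work enqueued so far has finished (in-order execution). */
        void sync() throws Exception {
            syncCount++;
            if (last != null) {
                last.get();
            }
        }

        @Override
        public void close() {
            worker.shutdown();
        }
    }

    /** Baseline: the caller blocks after every buffer, one sync per buffer. */
    static int[][] copyWithSyncPerBuffer(FakeStream stream, int[][] device) throws Exception {
        int[][] host = new int[device.length][];
        for (int i = 0; i < device.length; i++) {
            host[i] = new int[device[i].length];
            stream.copyAsync(device[i], host[i]);
            stream.sync();
        }
        return host;
    }

    /** Asynchronous variant: enqueue every copy, then block once at the end. */
    static int[][] copyWithSingleSync(FakeStream stream, int[][] device) throws Exception {
        int[][] host = new int[device.length][];
        for (int i = 0; i < device.length; i++) {
            host[i] = new int[device[i].length];
            stream.copyAsync(device[i], host[i]);
        }
        stream.sync();
        return host;
    }

    public static void main(String[] args) throws Exception {
        // 512 small buffers, loosely mirroring the 512-column example in the description.
        int[][] device = new int[512][4];
        for (int i = 0; i < device.length; i++) {
            java.util.Arrays.fill(device[i], i);
        }
        try (FakeStream perBuffer = new FakeStream(); FakeStream single = new FakeStream()) {
            copyWithSyncPerBuffer(perBuffer, device);
            copyWithSingleSync(single, device);
            System.out.println("sync per buffer: " + perBuffer.syncCount + " syncs");
            System.out.println("single sync: " + single.syncCount + " syncs");
        }
    }
}
```

In this model the per-buffer variant blocks once per buffer while the single-sync variant blocks once in total, which is why a wide schema (many buffers) benefits and a narrow one barely changes. The sketch does not model pinned memory, buffer lifetimes, or the GPU semaphore.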

jlowe · 2024-07-31T18:52:39Z

build

jlowe added 3 commits July 31, 2024 09:11

Copy table columns back to the host asynchronously

b570982

Signed-off-by: Jason Lowe <jlowe@nvidia.com>

Avoid synchronizing until after the device buffers have been freed

0b0fd89

Use withResource

19b4947

jlowe added the performance label (A performance related task/issue) Jul 31, 2024

jlowe self-assigned this Jul 31, 2024

revans2 approved these changes Jul 31, 2024

jlowe mentioned this pull request Jul 31, 2024

Consider releasing the GPU semaphore earlier during shuffle partitioning #11281

Open

abellina approved these changes Jul 31, 2024

jlowe merged commit dfcff71 into NVIDIA:branch-24.10 Jul 31, 2024
44 checks passed

jlowe deleted the copy-host-async branch July 31, 2024 22:03
