Efficient cuda kernels for reductions #382

nkoppel · 2023-01-20T17:51:29Z

So far, this implements a more efficient sum_to kernel whose maximum write contention equals the number of blocks (groups of 1024 threads) running concurrently. The number of operations within each block scales with log2(min(chunk_size, block_size)). Resolves #332, and will depend on #380 for @ViliamVadocz's fix to atomicMaxf and atomicMinf.
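To illustrate the scaling claim above, here is a minimal CPU-side sketch in Rust that mimics a shared-memory tree reduction. It is not the kernel from this PR: the function names `block_tree_sum` and `sum_with_block_reduction` are hypothetical, the whole input is reduced to a single scalar rather than per chunk, and the sequential inner loop stands in for additions that a GPU block would perform in parallel.

```rust
/// Tree-reduce one block's worth of values, the way a block of GPU threads
/// would reduce a shared-memory buffer. Returns the sum and the number of
/// halving steps taken (hypothetical helper, for illustration only).
fn block_tree_sum(values: &[f32]) -> (f32, u32) {
    if values.is_empty() {
        return (0.0, 0);
    }
    // Pad to the next power of two with the additive identity.
    let width = values.len().next_power_of_two();
    let mut shared = vec![0.0f32; width];
    shared[..values.len()].copy_from_slice(values);

    let mut steps = 0;
    let mut stride = width / 2;
    while stride > 0 {
        // On a GPU these `stride` additions run in parallel, followed by a
        // barrier; here they run one after another.
        for i in 0..stride {
            shared[i] += shared[i + stride];
        }
        stride /= 2;
        steps += 1;
    }
    (shared[0], steps)
}

/// Sum `input` in blocks of `block_size` (must be non-zero). Each block
/// contributes exactly one write to the accumulator, standing in for a single
/// atomic add per block. Returns the total and the number of such writes.
fn sum_with_block_reduction(input: &[f32], block_size: usize) -> (f32, usize) {
    let mut total = 0.0f32;
    let mut writes = 0;
    for block in input.chunks(block_size) {
        let (partial, _steps) = block_tree_sum(block);
        total += partial;
        writes += 1;
    }
    (total, writes)
}
```

In this sketch a block of 1024 values takes 10 halving steps, and summing 2048 values with a block size of 1024 performs only 2 writes to the shared accumulator instead of 2048.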

nkoppel · 2023-01-20T19:24:02Z

I've now finished the min_to and max_to kernels.
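A max reduction needs an atomic maximum on floats to combine per-block results. The fix to atomicMaxf in #380 is not shown in this conversation, so the following is only a generic sketch of the common compare-and-swap approach, written as a CPU-side analogy in Rust. The name `atomic_max_f32` and the choice to store the float's bits in an `AtomicU32` are assumptions made for illustration.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Atomically replace the f32 stored (as raw bits) in `cell` with
/// `max(current, value)`. Returns the value observed before the update.
/// Hypothetical helper: a generic CAS loop, not the code from #380.
fn atomic_max_f32(cell: &AtomicU32, value: f32) -> f32 {
    let mut current = cell.load(Ordering::Relaxed);
    loop {
        let current_f = f32::from_bits(current);
        // Nothing to do if the stored value is already at least as large
        // (this also leaves the cell untouched when `value` is NaN).
        if !(value > current_f) {
            return current_f;
        }
        match cell.compare_exchange_weak(
            current,
            value.to_bits(),
            Ordering::Relaxed,
            Ordering::Relaxed,
        ) {
            // We won the race: the cell now holds `value`.
            Ok(_) => return current_f,
            // Another thread changed the cell (or the exchange failed
            // spuriously); retry against the freshly observed bits.
            Err(actual) => current = actual,
        }
    }
}
```

Comparing the decoded floats rather than the raw bits keeps the ordering correct for negative values. A min variant would flip the comparison.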

coreylowman

Looks great, thanks for this contribution! Just have some questions to make sure I'm following 🚀

src/tensor_ops/max_to/cuda_kernel.rs

src/tensor_ops/internal_reshapes.rs

src/tensor_ops/max_to/max_to.cu

coreylowman

Awesome changes, thanks for the contribution!

nkoppel added 7 commits January 18, 2023 10:03

Implement adam optimizer cuda kernel

89bb4c5

add chunk_sum to sum cuda kernel and modify supporting code to use it

baf7fef

fix stride computation for sum_to cuda kernel

0161a3a

run cargo fmt; efficiency/readibility changes

a7248fd

rename funciton; fix bugs in chunk_sum; add test for chunk_sum

6fe431c

rename funciton; fix bugs in chunk_sum; more tests for sum

6e79ba5

simplify, document, and rename permute_for_reductions

88cf72e

nkoppel marked this pull request as draft January 20, 2023 17:52

nkoppel added 4 commits January 20, 2023 12:19

move permute_for_reductions; optimize max_to cuda kernel

6cc6278

readability tweaks

0cd9285

implement min_to cuda kernel

fc8c55c

run cargo fmt

025d883

nkoppel marked this pull request as ready for review January 20, 2023 19:23

nkoppel changed the title from "WIP Efficient cuda kernels for reductions" to "Efficient cuda kernels for reductions" Jan 20, 2023

Merge branch 'main' into fast_reductions

2aa56d5

coreylowman reviewed Jan 22, 2023


src/tensor_ops/max_to/cuda_kernel.rs (resolved)

src/tensor_ops/internal_reshapes.rs (resolved)

src/tensor_ops/max_to/max_to.cu (resolved)

coreylowman approved these changes Jan 23, 2023


coreylowman merged commit 1fedba0 into coreylowman:main Jan 23, 2023
