New findmin/max implementation using single-pass reduction #576

maleadt · 2020-11-27T14:12:47Z

This PR replaces the two-kernel findminmax implementation to a single-pass one by @tkf, #484 (comment), as proposed by @Ellipse0934, #320 (comment). I only rebased it and cleaned-up a little.

Performance is about as @Ellipse0934 reported, i.e. massive improvements in some cases, and a small regression in others (which was much less pronounced with my GPU). As the implementation also fixes a bug, I'd say that's worth it. FWIW, I don't think the performance penalty comes from not using warp intrinsics, but due to the bloaty code generated by a tuple-heavy reduce closure. And recent/beefy GPUs like mine are often less sensitive to that.

Fixes #553, fixes #320 (comment)

codecov · 2020-11-27T15:33:11Z

Codecov Report

Merging #576 (042ab92) into master (5f6231d) will decrease coverage by 0.08%.
The diff coverage is 94.44%.

@@            Coverage Diff             @@
##           master     #576      +/-   ##
==========================================
- Coverage   80.20%   80.11%   -0.09%     
==========================================
  Files         116      116              
  Lines        6900     6875      -25     
==========================================
- Hits         5534     5508      -26     
- Misses       1366     1367       +1

Impacted Files	Coverage Δ
src/indexing.jl	`98.18% <94.44%> (-1.82%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5f6231d...042ab92. Read the comment docs.

New findmin/max implementation using single-pass reduction

Ellipse0934 and others added 2 commits November 27, 2020 15:12

Reimplement find* using a single mapreduce.

4078f0f

Merge findmin and findmax.

042ab92

maleadt added cuda array Stuff about CuArray. bugfix This gets something working again. performance How fast can we go? labels Nov 27, 2020

maleadt merged commit 3515056 into master Nov 27, 2020

maleadt deleted the tb/findminmax branch November 27, 2020 16:36

maleadt added a commit that referenced this pull request Jan 5, 2021

Merge pull request #576 from JuliaGPU/tb/findminmax

720c4ba

New findmin/max implementation using single-pass reduction

maleadt added a commit that referenced this pull request Jan 6, 2021

Merge pull request #576 from JuliaGPU/tb/findminmax

d85977b

New findmin/max implementation using single-pass reduction

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

New findmin/max implementation using single-pass reduction #576

New findmin/max implementation using single-pass reduction #576

maleadt commented Nov 27, 2020 •

edited

Loading

codecov bot commented Nov 27, 2020 •

edited

Loading

New findmin/max implementation using single-pass reduction #576

New findmin/max implementation using single-pass reduction #576

Conversation

maleadt commented Nov 27, 2020 • edited Loading

codecov bot commented Nov 27, 2020 • edited Loading

Codecov Report

maleadt commented Nov 27, 2020 •

edited

Loading

codecov bot commented Nov 27, 2020 •

edited

Loading