Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New findmin/max implementation using single-pass reduction #576

Merged
merged 2 commits into from
Nov 27, 2020

Conversation

maleadt
Copy link
Member

@maleadt maleadt commented Nov 27, 2020

This PR replaces the two-kernel findminmax implementation to a single-pass one by @tkf, #484 (comment), as proposed by @Ellipse0934, #320 (comment). I only rebased it and cleaned-up a little.

Performance is about as @Ellipse0934 reported, i.e. massive improvements in some cases, and a small regression in others (which was much less pronounced with my GPU). As the implementation also fixes a bug, I'd say that's worth it. FWIW, I don't think the performance penalty comes from not using warp intrinsics, but due to the bloaty code generated by a tuple-heavy reduce closure. And recent/beefy GPUs like mine are often less sensitive to that.

Fixes #553, fixes #320 (comment)

@maleadt maleadt added cuda array Stuff about CuArray. bugfix This gets something working again. performance How fast can we go? labels Nov 27, 2020
@codecov
Copy link

codecov bot commented Nov 27, 2020

Codecov Report

Merging #576 (042ab92) into master (5f6231d) will decrease coverage by 0.08%.
The diff coverage is 94.44%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #576      +/-   ##
==========================================
- Coverage   80.20%   80.11%   -0.09%     
==========================================
  Files         116      116              
  Lines        6900     6875      -25     
==========================================
- Hits         5534     5508      -26     
- Misses       1366     1367       +1     
Impacted Files Coverage Δ
src/indexing.jl 98.18% <94.44%> (-1.82%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5f6231d...042ab92. Read the comment docs.

@maleadt maleadt merged commit 3515056 into master Nov 27, 2020
@maleadt maleadt deleted the tb/findminmax branch November 27, 2020 16:36
maleadt added a commit that referenced this pull request Jan 5, 2021
New findmin/max implementation using single-pass reduction
maleadt added a commit that referenced this pull request Jan 6, 2021
New findmin/max implementation using single-pass reduction
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix This gets something working again. cuda array Stuff about CuArray. performance How fast can we go?
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants