-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
use vendored version of cupy.pad with added performance optimizations #482
Conversation
Codecov ReportBase: 92.95% // Head: 92.89% // Decreases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## branch-23.02 #482 +/- ##
================================================
- Coverage 92.95% 92.89% -0.06%
================================================
Files 130 131 +1
Lines 9775 9905 +130
================================================
+ Hits 9086 9201 +115
- Misses 689 704 +15
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
There is a typo but I think it can be corrected later if we don't want to spend extra time triggering CI/CD again.
This version provides faster, elementwise kernel implementations for common padding modes. This version should also be submited upstream to CuPy itself.
apply isort fix typo
ab5a4db
to
b557cf8
Compare
/merge |
Overview
This version provides faster, elementwise kernel implementations for common padding modes.
It is under
_vendored
because most ofpad.py
is copied from CuPy itself. The only new part there is the_use_elementwise_kernel
utility and the conditional branch where it evaluates to True. The newly written code is mostly inpad_elementwise.py
.I could potentially further refactor
pad.py
to remove most of the code and just call out tocupy.pad
instead whenever we aren't using the elementwise kernels.This version should also be submited upstream to CuPy itself.
Padding performance is substantially improved for modes
edge
,symmetric
,reflect
andwrap
. Most places in cuCIM where we use padding, it is not the bottleneck, but it should still provide a small performance improvement in several places. I ran some benchmarks, and the largest impact I saw was around 25% reduction in run-time forchan_vese
.Benchmark Results (vs.
cupy.pad
)In the following, the next-to-last column is the overall acceleration observed. It is large for small 2D or 3D images (>5x) and becomes relatively small for larger images (e.g. ~10% for 4k images).
The final column only relates to the amount of time spent on the host. That "accel. CPU" number always strongly favors the new implementation. It has lower host overhead because everything is done in a single kernel call rather than potentially using multiple kernels for each axis in turn. This kernel launch overhead explains why the overall benefit is much higher for the smaller image sizes.