-
Notifications
You must be signed in to change notification settings - Fork 7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up equalize transform: use bincount instead of histc #3493
Speed up equalize transform: use bincount instead of histc #3493
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3493 +/- ##
=======================================
Coverage 78.75% 78.75%
=======================================
Files 105 105
Lines 9748 9750 +2
Branches 1565 1566 +1
=======================================
+ Hits 7677 7679 +2
Misses 1581 1581
Partials 490 490
Continue to review full report at Codecov.
|
There is a huge difference on CUDA. I think it would be worth opening an issue in PyTorch to point out that the implementation of Also, it would be good to check the number of CUDA again, but this time using |
if img_chan.is_cuda: | ||
hist = torch.histc(img_chan.to(torch.float32), bins=256, min=0, max=255) | ||
else: | ||
hist = torch.bincount(img_chan.view(-1), minlength=256) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need to clip values between 0 and 255 to keep things consistent between cuda/cpu?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passing minlength=256
ensures that bincount
will count the number of values for all values in [0, 255]
, so histc
and bincount
would be equivalent.
In the histc
call we have bins=256, min=0, max=255
, which IIUC assumes that the image is within [0, 255]
already, is this not the case?
I created a new "Improvement" label, in accordance with #3351 (comment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
…3493) Summary: * use bincount instead of hist * only use bincount when on CPU * Added equality test for CPU vs cuda * Fix flake8 and tests * tuple instead of int for size Reviewed By: NicolasHug, cpuhrsch Differential Revision: D26945736 fbshipit-source-id: 5b13a01e1b04d8c92317d3478bf9c9bb1c7d1375
bincount
is faster thanhistc
on CPU, so this should speed up (~2X) theequalize
transforms.This is sort of a follow-up to #3334 and #3173
On GPU:
on CPU (different machine):