Color equalizer OpenCL implementation #17372
Force-pushed: a) fixing a possible sqrt of a negative value and b) using the new
Using CPU:
Using GPU dt crashes for me:
A pre-requisite for efficient color equalizer OpenCL support.
New opencl kernel functions for 2-channel cl_mem images and support in gaussian API.
1. lookup_gamut might be used elsewhere, so it has been moved to colorspace.h
2. kernel_interpolate_bilinear can be used with 1-4 channels; its last parameter has been changed for correctness and callers have been modified accordingly.
3. In laplacian there was one call to (2) that missed a kernel parameter. So far the result was likely correct ... Checked the code there and switched it to the modern dt_opencl_enqueue_kernel_2d_args() usage.
4. Bumped the CL kernel version, enforcing a recompilation.
5. Introduce dt_opencl_duplicate_image(const int devid, const cl_mem src)
1. Independent of chosen white level.
2. Generally darker so better viewing.
Force-pushed some updates. There were race conditions with released cl images that mostly caused no problems here on Intel (Intel seems to late-release, so data might still be valid for some time; this has been observed elsewhere before). But I could reproduce something like what @TurboGit reported, so maybe another round. If it is still crashing we will have to investigate interpolate_bilinear and gaussian on 2-ch images.
When I see "[dev_pixelpipe] took 0,060 secs (0,388 CPU) [full] processed `colorequal' on GPU, blended on GPU", it's running correctly on the OpenCL path, right? So far no issues on ROCm 6.2.0 with my 6700 XT.
Added kernels plus some missing OpenCL colorspace conversion inline functions.
Should be output-identical compared with the CPU code after testing, except for minimal differences due to some use of native OpenCL functions for performance (as we do elsewhere while converting pixel colorspace).
Exactly.
Just force-pushed a reduction in OpenCL memory consumption by simple reordering, plus the removal of one kernel call, which might help performance.
BTW if OpenCL runs fine I would be very interested in the performance on your system.
Testing this new version: good news, no crash on my side. The perf is far better than CPU:
I'll review and do more testing tomorrow. Thanks for the hard work @jenshannoschwalm, I know I keep saying that... but that's your fault, you keep doing great stuff for darktable :)
Speed improvements are massive. Sliders feel so much smoother. CPU: Ryzen 9 7950X.
vs.
Apple M1 Max:
So good news: on all supported platforms it seems to be running fine with some perf gains (except for me, currently on Intel 620 for the week).
Thanks, all good to me!
It works and gives good results, but there seems to be quite a high diff between the CPU and GPU versions:
For other tests, CPU and GPU differ by about 30000 pixels. So maybe some calculation is not fully equivalent? @jenshannoschwalm: Any idea?
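How the test suite arrives at its diff-pixel count is not shown in this thread. As a rough sketch of the idea, assuming two float buffers with interleaved channels and a per-channel absolute tolerance (the function name `count_diff_pixels` and the tolerance parameter are hypothetical), differing pixels could be counted like this:

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical comparison routine: count pixels where at least one
   channel differs by more than `tol` between buffers a and b. */
static size_t count_diff_pixels(const float *a, const float *b,
                                const size_t npixels, const int ch,
                                const float tol)
{
  size_t count = 0;
  for(size_t p = 0; p < npixels; p++)
  {
    for(int c = 0; c < ch; c++)
    {
      if(fabsf(a[p * ch + c] - b[p * ch + c]) > tol)
      {
        count++;
        break; /* count each pixel at most once */
      }
    }
  }
  return count;
}
```

With a metric of this kind, small per-channel deviations from native OpenCL math functions can push many pixels over the threshold without pointing to a logic error, which is why the absolute count needs a reference from a comparable module.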
For 3 we can see in my report above that the non-guided filter already has almost 1e6 diff pixels. The guided filter has 3x more diff pixels. I would say that we need to fix the non-guided part first, and then we will have figures for the guided filter part.
I checked the whole cl source again and can't find an obvious problem in principle. The whole UCS color space handling in OpenCL is full of native functions. The other module I am aware of that makes use of them would be colorbalancergb. Do we have a test case with that module in UCS space? EDIT: there is 93
We have a test for color balance RGB in UCS mode, and the diff CPU vs GPU is ok (~32000 pixels):
Currently I only have a pretty low-power notebook with shared Intel 620 OpenCL graphics, so I can't tell anything about performance gains vs CPU.
There is some trickier new stuff here, so I would appreciate
darktable --bench-module colorequal
vs
darktable --disable-opencl --bench-module colorequal
if you don't use other ways