Developer's guide
Some useful resources to get started with digital color management, editing pipelines, calibration, view transforms, etc.:
- https://www.visualeffectssociety.com/sites/default/files/files/cinematic_color_ves.pdf
- https://acescentral.com/
- http://last.hit.bme.hu/download/firtha/video/Colorimetry/Fairchild_M._Color_appearance_models__2005.pdf
Pixels are essentially 4D RGBA vectors. Since the early 2000s, x86 processors have been able to process vectors by applying a Single Instruction to Multiple Data (SIMD). This speeds up computations by processing from 1 pixel (SSE2, 4 floats) to 4 pixels (AVX-512, 16 floats) at the same time, saving a lot of CPU cycles.
darktable has 3 versions of its IOPs: pure C (scalar), SSE2 (vectorized for 4 floats) and OpenCL (vectorized on GPU). This creates some redundancy in the code. However, modern compilers and OpenMP offer auto-vectorization options that can optimize pure C, provided the code is written in a vectorizable way and uses some pragmas to give hints to the compiler.
Writing vectorizable code: https://info.ornl.gov/sites/publications/files/Pub69214.pdf
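As a minimal sketch of what "pure C written in a vectorizable way" can look like, the function below scales an RGBA float buffer by a constant. `apply_gain` and its parameters are hypothetical names chosen for illustration, not part of darktable. The inner loop covers the 4 floats of one pixel, which is what fits in a single SSE2 register; whether the compiler actually vectorizes it depends on the compiler and its flags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of darktable: multiply every channel of an
 * RGBA float buffer by a constant gain. */
void apply_gain(const float *const restrict in, float *const restrict out,
                const size_t npixels, const float gain)
{
  assert(in != out); /* restrict promises that the buffers do not overlap */
  const size_t ch = 4; /* R, G, B, A */

  /* flat indexing: k walks the buffer one pixel (ch floats) at a time */
  for(size_t k = 0; k < ch * npixels; k += ch)
  {
    /* 4 independent multiplications with no branch: a hint to the compiler
     * that they can be issued as one SIMD instruction */
#ifdef _OPENMP
#pragma omp simd
#endif
    for(size_t c = 0; c < ch; c++) out[k + c] = in[k + c] * gain;
  }
}
```

Without OpenMP enabled, the pragma is skipped and the code remains valid scalar C.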
Best practices for auto-vectorization:
- avoid branches in loops that change the control flow. Use inline conditional statements like `absolute = (x > 0) ? x : -x;` so they can be converted to masks in SIMD,
- try to reference pixels from the base pointer of their array instead of declaring intermediate pointers that could look like aliasing to the compiler,
- if you use nested loops (e.g. loop on the width and height of the array), declare the pixel pointers in the innermost loop and use `collapse(2)` in the OpenMP pragma so the compiler will be able to optimize the cache/memory use and split the loop more evenly between the different threads,
- use flat indexing of arrays whenever possible (`for(size_t k = 0 ; k < ch * width * height ; k += ch)`) instead of nested width/height/channels loops,
- use the `restrict` keyword on image/pixels pointers to avoid aliasing and avoid in-place operations on pixels (`*out` must always be different from `*in`) so you don't trigger variable dependencies between threads,
- align arrays on 64 bytes and pixels on 16-byte blocks so the memory is contiguous and the CPU can load full cache lines (and avoid segfaults),
- write small functions and optimize locally (one loop/function), using OpenMP and/or compiler pragmas,
- keep your code stupid simple, systematic and avoid smart-ass pointer arithmetic because it will only lead the compiler to detect variable dependencies and pointer aliasing where there are none,
- avoid type casts,
- declare input/output pointers as `*const` and variables as `const` to avoid false sharing in parallel loops (using the `shared(variable)` OpenMP clause),
- look at the RawTherapee source code because these guys got it right.
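Several of the practices above can be combined in one small function. The following is only a sketch under stated assumptions: `alloc_pixels` and `abs_rgb` are hypothetical helpers, not darktable functions, and the allocation relies on C11 `aligned_alloc` being available. It shows 64-byte aligned buffers, `restrict` and `const` qualifiers, a branch-free ternary, indexing from the base pointer, and nested loops merged with `collapse(2)`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate npixels RGBA float pixels on a 64-byte
 * boundary. The caller releases the buffer with free(). */
float *alloc_pixels(const size_t npixels)
{
  const size_t bytes = npixels * 4 * sizeof(float);
  /* C11 aligned_alloc expects a size that is a multiple of the alignment */
  const size_t padded = (bytes + 63) / 64 * 64;
  return aligned_alloc(64, padded);
}

/* Hypothetical filter: absolute value of R, G, B; alpha is copied as-is. */
void abs_rgb(const float *const restrict in, float *const restrict out,
             const size_t width, const size_t height)
{
  assert(in != out); /* no in-place processing */
  assert((uintptr_t)in % 64 == 0 && (uintptr_t)out % 64 == 0);
  const size_t ch = 4;

  /* the two loops are perfectly nested, so collapse(2) may merge them into
   * one iteration space that is split evenly between threads */
#ifdef _OPENMP
#pragma omp parallel for collapse(2) schedule(static)
#endif
  for(size_t i = 0; i < height; i++)
    for(size_t j = 0; j < width; j++)
    {
      /* index from the base pointers instead of keeping running pointers */
      const size_t k = (i * width + j) * ch;
      for(size_t c = 0; c < 3; c++)
      {
        const float x = in[k + c];
        out[k + c] = (x > 0.0f) ? x : -x; /* ternary instead of if/else */
      }
      out[k + 3] = in[k + 3];
    }
}
```

The alignment assertion is a debugging aid for this sketch; with pixels of 4 floats, a 64-byte aligned buffer also keeps every pixel on a 16-byte boundary.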
Modules are the interfaces for IOPs, i.e. image-processing filters stacked in the pixelpipe. IOPs can be found in src/iop and the IOP API can be found in the header src/iop/iop_api.h.
Most IOPs have 3 variants of their pixel-filtering part:
- a pure C implementation, in `process()`,
- a C optimized version, with SSE2 intrinsics, in `process_sse2()`,
- an OpenCL version, offloading the computation to the GPU, in `process_opencl()`.
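To give an idea of the shape of a pure C pixel filter, here is a simplified sketch. It deliberately does not reproduce the real `process()` prototype, which is declared in src/iop/iop_api.h; `demo_roi_t`, `demo_process` and the explicit `ch` argument are stand-ins invented for this example. The filter itself (inverting every channel) is also just a placeholder.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a region of interest; NOT darktable's real type. */
typedef struct demo_roi_t
{
  int width, height;
} demo_roi_t;

/* Pure C variant of a hypothetical "invert" filter. The untyped buffers
 * mimic an API that hands over raw input/output memory; ch is the number
 * of floats per pixel. */
void demo_process(const void *const ivoid, void *const ovoid,
                  const demo_roi_t *const roi_out, const int ch)
{
  assert(ivoid != ovoid); /* input and output must be distinct buffers */
  const float *const restrict in = (const float *)ivoid;
  float *const restrict out = (float *)ovoid;
  const size_t n = (size_t)ch * roi_out->width * roi_out->height;

  /* one flat loop over every float of the region, split between threads
   * and vectorized when OpenMP is enabled */
#ifdef _OPENMP
#pragma omp parallel for simd schedule(static)
#endif
  for(size_t k = 0; k < n; k++) out[k] = 1.0f - in[k];
}
```

For a real module, src/iop/useless.c remains the reference boilerplate.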
An example of a dummy IOP can be found in src/iop/useless.c and used as a boilerplate.
If you add a new IOP, be sure to add the C file in src/iop/CMakeLists.txt#L69 and set its priority in the pixelpipe by adding a new node in tools/iop_dependencies.py.
darktable wiki is licensed under the Creative Commons BY-SA 4.0 terms.