Aurélien PIERRE edited this page Mar 5, 2019 · 24 revisions

Understanding color and color management

Some useful resources to get started with digital color management, the editing pipeline, calibration, view transforms, etc.:

Writing efficient code

Pixels are essentially 4D RGBA vectors. Since 2004, processors have been able to process whole vectors at once by applying a Single Instruction to Multiple Data (SIMD). This speeds up computations by processing from 1 pixel (SSE2, 4 floats) to 4 pixels (AVX-512, 16 floats) at the same time, saving a lot of CPU cycles.

darktable has 3 versions of its IOPs: pure C (scalar), SSE2 (vectorized for 4 floats) and OpenCL (vectorized on GPU). This leads to some redundancy in the code. However, modern compilers and OpenMP have auto-vectorization options that can optimize pure C, provided the code is written in a vectorizable way and uses pragmas to give hints to the compiler.
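
As a rough illustration of pure C written so that a compiler may vectorize it, here is a minimal sketch. The function name scale_pixels and its signature are hypothetical and are not taken from darktable. The pragma is only a hint: it takes effect when OpenMP SIMD support is enabled at build time (e.g. -fopenmp or -fopenmp-simd with GCC/Clang) and is ignored otherwise.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of darktable: multiply every RGBA pixel
 * by a per-channel gain. Each pixel is 4 contiguous floats. */
void scale_pixels(const float *const restrict in, float *const restrict out,
                  const size_t num_pixels, const float gain[4])
{
  assert(in != out); /* the sketch assumes no in-place processing */

  for(size_t k = 0; k < 4 * num_pixels; k += 4)
  {
    /* The 4 channels are independent, so they fit one 128-bit register. */
#pragma omp simd
    for(size_t c = 0; c < 4; c++)
      out[k + c] = in[k + c] * gain[c];
  }
}
```

Whether the compiler actually emits SIMD instructions depends on the optimization flags and the target, so it is worth checking the vectorization report (for example -fopt-info-vec with GCC) rather than assuming it happened.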

Writing vectorizable code: https://info.ornl.gov/sites/publications/files/Pub69214.pdf

Best practices for auto-vectorization:

  • avoid branches that change the control flow inside loops. Use inline conditional expressions like absolute = (x > 0) ? x : -x; so they can be converted to byte masks in SIMD,
  • try to reference pixels from the base pointer of their array instead of declaring intermediate pointers, which can look like aliasing to the compiler,
  • if you use nested loops (e.g. looping over the width and height of the array), declare the pixel pointers in the innermost loop and use collapse(2) in the OpenMP pragma so the compiler can optimize cache/memory use and split the loop more evenly between threads,
  • use flat indexing of arrays whenever possible (for(size_t k = 0 ; k < ch * width * height ; k += ch)) instead of nested width/height/channels loops,
  • use the restrict keyword on image/pixel pointers to rule out aliasing, and avoid in-place operations on pixels (*out must always be different from *in) so you don't create data dependencies between threads,
  • align arrays on 64 bytes and pixels on 16-byte blocks so the memory is contiguous and the CPU can load full cache lines (and to avoid the segfaults that aligned SIMD loads trigger on misaligned addresses),
  • write small functions and optimize locally (one loop per function), using OpenMP and/or compiler pragmas,
  • keep your code stupid simple and systematic, and avoid smart-ass pointer arithmetic, because it will only lead the compiler to detect variable dependencies and pointer aliasing where there are none,
  • avoid type casts,
  • declare input/output pointers as *const and variables as const to avoid false sharing in parallel loops (using the shared(variable) OpenMP clause),
  • look at the RawTherapee source code because these guys got it right.
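
Several of the practices above can be combined in one short sketch. The names alloc_pixels, abs_pixels and CH are hypothetical and do not come from darktable. The sketch assumes a C11 standard library that provides aligned_alloc(), and the OpenMP pragma only takes effect when OpenMP is enabled at build time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CH 4 /* floats per RGBA pixel, i.e. 16 bytes per pixel */

/* Hypothetical helper: allocate a width x height RGBA float buffer
 * aligned on 64 bytes. Release it with free(). */
float *alloc_pixels(const size_t width, const size_t height)
{
  const size_t bytes = width * height * CH * sizeof(float);
  /* aligned_alloc() (C11) expects a size that is a multiple of the alignment */
  const size_t padded = (bytes + 63) / 64 * 64;
  return aligned_alloc(64, padded);
}

/* Hypothetical filter: absolute value of each channel, without branches. */
void abs_pixels(const float *const restrict in, float *const restrict out,
                const size_t width, const size_t height)
{
  assert(in != out); /* no in-place processing */
  const size_t size = CH * width * height;

  /* Flat indexing from the base pointers, one pixel per iteration. */
#pragma omp parallel for simd schedule(static)
  for(size_t k = 0; k < size; k += CH)
  {
    for(size_t c = 0; c < CH; c++)
    {
      const float x = in[k + c];
      out[k + c] = (x > 0.0f) ? x : -x; /* conditional expression, not an if */
    }
  }
}
```

The input and output buffers are separate allocations, the pointers are const and restrict-qualified, and every access is an offset from the base pointer, which leaves the compiler little room to suspect aliasing.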

Views

Preferences

Modules

Modules are the interfaces for IOPs, i.e. image-processing filters stacked in the pixelpipe. IOPs can be found in src/iop and the IOP API can be found in the header src/iop/iop_api.h.

Most IOPs have 3 variants of their pixel-filtering part:

  1. a pure C implementation, in process(),
  2. a C version optimized with SSE2 intrinsics, in process_sse2(),
  3. an OpenCL version, offloading the computation to the GPU, in process_opencl().
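
To give an idea of how the first two variants relate, here is a simplified sketch of the same filter written as a scalar function and as an SSE2 function. The names gain_scalar, gain_sse2 and gain_filter and their signatures are hypothetical: they are not darktable's process() or process_sse2() signatures, which are declared in the IOP API header. The sketch selects a variant at build time through the __SSE2__ macro defined by GCC and Clang, which is an assumption of this example and may differ from how darktable dispatches. The OpenCL variant is left out because it needs a GPU runtime and a separate kernel.

```c
#include <assert.h>
#include <stddef.h>

#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Hypothetical filter: multiply all channels of every RGBA pixel by a gain. */

/* 1. pure C variant */
void gain_scalar(const float *const restrict in, float *const restrict out,
                 const size_t num_pixels, const float gain)
{
  assert(in != out);
  for(size_t k = 0; k < 4 * num_pixels; k++) out[k] = in[k] * gain;
}

#ifdef __SSE2__
/* 2. SSE2 variant: one 128-bit register holds the 4 floats of one pixel */
void gain_sse2(const float *const restrict in, float *const restrict out,
               const size_t num_pixels, const float gain)
{
  assert(in != out);
  const __m128 g = _mm_set1_ps(gain);
  for(size_t k = 0; k < 4 * num_pixels; k += 4)
  {
    /* unaligned load/store so the sketch works on any buffer */
    const __m128 pixel = _mm_loadu_ps(in + k);
    _mm_storeu_ps(out + k, _mm_mul_ps(pixel, g));
  }
}
#endif

/* Use the SSE2 variant when the build target supports it. */
void gain_filter(const float *const restrict in, float *const restrict out,
                 const size_t num_pixels, const float gain)
{
#ifdef __SSE2__
  gain_sse2(in, out, num_pixels, gain);
#else
  gain_scalar(in, out, num_pixels, gain);
#endif
}
```

Both variants must produce the same pixels, which is why the scalar version is a convenient reference when testing the optimized one.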

An example of a dummy IOP can be found in src/iop/useless.c and can be used as boilerplate.

If you add a new IOP, be sure to add the C file in src/iop/CMakeLists.txt#L69 and to set its priority in the pixelpipe by adding a new node in tools/iop_dependencies.py.

Libs