Image processing toolchain with AVX-2 support

kiclu/aor2p

AOR2p

1. Description

2. Functions / command line arguments

Usage: ./aor2p input-file [options]

2.1. Basic arithmetic operations

2.1.1. Arithmetic add
px[r,g,b] <= px[r,g,b] + c

-a=<const> / --add=<const>


2.1.2. Arithmetic sub
px[r,g,b] <= px[r,g,b] - c

-s=<const> / --sub=<const>


2.1.3. Arithmetic inverse sub
px[r,g,b] <= c - px[r,g,b]

-is=<const> / --isub=<const>


2.1.4. Arithmetic mul
px[r,g,b] <= px[r,g,b] * c

-m=<const> / --mul=<const>


2.1.5. Arithmetic div
px[r,g,b] <= px[r,g,b] / c

-d=<const> / --div=<const>


2.1.6. Arithmetic inverse div
px[r,g,b] <= c / px[r,g,b]

-id=<const> / --idiv=<const>


2.2. Arithmetic operations with saturation

2.2.1. Add with saturation
px[r,g,b] <= (px[r,g,b] + c > PX_MAX) ? PX_MAX : px[r,g,b] + c

-as=<const> / --add-saturate=<const>


2.2.2. Sub with saturation
px[r,g,b] <= (px[r,g,b] - c < 0) ? 0 : px[r,g,b] - c

-ss=<const> / --sub-saturate=<const>


2.2.3. Inverse sub with saturation
px[r,g,b] <= (c - px[r,g,b] < 0) ? 0 : c - px[r,g,b]

-iss=<const> / --isub-saturate=<const>
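The difference between the plain operations in 2.1 and the saturating ones in 2.2 can be sketched in scalar C. This is only an illustration, not the tool's actual code: it assumes 8-bit channels (so `PX_MAX` is 255) and assumes the plain add wraps around on overflow. The function names are hypothetical.

```c
#include <stdint.h>

#define PX_MAX 255 /* assumption: 8-bit channels */

/* Plain add, assuming wraparound: the sum is truncated to 8 bits. */
static uint8_t px_add_wrap(uint8_t px, uint8_t c) {
    return (uint8_t)(px + c);
}

/* Add with saturation: clamp to PX_MAX instead of wrapping. */
static uint8_t px_add_sat(uint8_t px, uint8_t c) {
    int sum = px + c; /* computed in int, so it cannot overflow here */
    return (uint8_t)(sum > PX_MAX ? PX_MAX : sum);
}

/* Sub with saturation: clamp to 0 instead of going negative. */
static uint8_t px_sub_sat(uint8_t px, uint8_t c) {
    int diff = px - c;
    return (uint8_t)(diff < 0 ? 0 : diff);
}

/* Inverse sub with saturation: c - px, clamped to 0. */
static uint8_t px_isub_sat(uint8_t px, uint8_t c) {
    int diff = c - px;
    return (uint8_t)(diff < 0 ? 0 : diff);
}
```

Under these assumptions, adding 100 to a channel value of 200 gives 44 with the wrapping add and 255 with the saturating add.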


2.3. Misc arithmetic operations

2.3.1. Pow
px[r,g,b] <= px[r,g,b]**c

-p=<const> / --pow=<const>


2.3.2. Log
px[r,g,b] <= log2(px[r,g,b])

-l / --log


2.3.3. Abs
px[r,g,b] <= abs(px[r,g,b])

--abs


2.3.4. Min
px[r,g,b] <= min(px[r,g,b], c)

--min=<const>


2.3.5. Max
px[r,g,b] <= max(px[r,g,b], c)

--max=<const>


2.4. Image processing

2.4.1. Negative
px[r,g,b] <= PX_MAX - px[r,g,b]

-n / --neg


2.4.2. Greyscale
px[r,g,b] <= 0.299 * px[r] + 0.587 * px[g] + 0.114 * px[b]

-gs / --greyscale
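As a scalar reference for the two formulas above, the following sketch applies them to a single pixel. It assumes 8-bit channels and a hypothetical `rgb8` struct. Rounding the greyscale value to nearest is also an assumption, since the formula does not say how the result is converted back to an integer.

```c
#include <stdint.h>

#define PX_MAX 255 /* assumption: 8-bit channels */

typedef struct { uint8_t r, g, b; } rgb8; /* hypothetical pixel layout */

/* Negative: PX_MAX - px for every channel. */
static rgb8 px_negative(rgb8 p) {
    rgb8 out = { (uint8_t)(PX_MAX - p.r),
                 (uint8_t)(PX_MAX - p.g),
                 (uint8_t)(PX_MAX - p.b) };
    return out;
}

/* Greyscale: weighted sum of the channels, written to all three. */
static rgb8 px_greyscale(rgb8 p) {
    float y = 0.299f * p.r + 0.587f * p.g + 0.114f * p.b;
    uint8_t v = (uint8_t)(y + 0.5f); /* assumed: round to nearest */
    rgb8 out = { v, v, v };
    return out;
}
```

The weights sum to 1, so a white pixel stays white and a black pixel stays black.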


2.5. Kernel filtering

-k=<file> / --kern=<file>
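The README does not describe the kernel file format or how image borders are handled, so the sketch below only illustrates the general idea of kernel filtering on an in-memory channel. It assumes 8-bit samples, a square kernel of odd size stored row by row as floats, and border pixels that are copied unchanged. All names are hypothetical. The kernel is applied without flipping, which is strictly a correlation and equals a convolution for symmetric kernels.

```c
#include <stddef.h>
#include <stdint.h>

/* Round a float to the nearest value in the 8-bit range [0, 255]. */
static uint8_t clamp_u8(float v) {
    if (v < 0.0f) return 0;
    if (v > 255.0f) return 255;
    return (uint8_t)(v + 0.5f);
}

/* Filter one w*h channel with a k*k kernel (k odd).
   Pixels without a full neighbourhood are copied unchanged (assumption). */
static void filter_channel(const uint8_t *src, uint8_t *dst,
                           size_t w, size_t h,
                           const float *kern, size_t k) {
    size_t r = k / 2; /* kernel radius */
    for (size_t y = 0; y < h; ++y) {
        for (size_t x = 0; x < w; ++x) {
            if (y < r || x < r || y + r >= h || x + r >= w) {
                dst[y * w + x] = src[y * w + x];
                continue;
            }
            float acc = 0.0f;
            for (size_t ky = 0; ky < k; ++ky)
                for (size_t kx = 0; kx < k; ++kx)
                    acc += kern[ky * k + kx] *
                           src[(y + ky - r) * w + (x + kx - r)];
            dst[y * w + x] = clamp_u8(acc);
        }
    }
}
```

The work per pixel grows with k*k, which is consistent with the kernel operations being the slowest rows in the performance table.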


2.6. File operations

-o=<file>


2.7. Optimization level

--thread-count=<n>
--no-pipeline
--no-simd
-s0 - no optimizations
-s2 - SIMD, no pipeline
-s3 - SIMD & pipeline, default
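How the tool distributes work across the threads requested with the thread-count option is not documented here. One common approach for per-pixel operations, shown purely as an assumption, is to give each worker a contiguous band of image rows. The helper below is hypothetical and only computes the band for a given worker.

```c
#include <stddef.h>

typedef struct { size_t begin, end; } row_range; /* rows [begin, end) */

/* Split `rows` image rows into `threads` near-equal contiguous bands and
   return the band for worker `id` (0-based). The first rows % threads
   workers receive one extra row. */
static row_range rows_for_thread(size_t rows, size_t threads, size_t id) {
    size_t base = rows / threads;
    size_t extra = rows % threads;
    row_range r;
    r.begin = id * base + (id < extra ? id : extra);
    r.end = r.begin + base + (id < extra ? 1 : 0);
    return r;
}
```

For an 838-row image and 4 workers, this yields the bands [0, 210), [210, 420), [420, 629) and [629, 838).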

3. Performance

Time taken for a single operation on a 1280x838 sample image, averaged over 10 test runs.

Tested on:

  • AMD Ryzen 3700U, 4 cores / 8 threads, 8GB RAM, Debian 11
  • AMD Ryzen 2200G, 4 cores / 8 threads, 16GB RAM, Debian 12
  • AMD Epyc 7R32, 16 cores / 32 threads, 64GB RAM, Debian 11 (AWS c5a.8xlarge instance)

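| Operation | Avg. time, SIMD [ns] | Avg. time, no SIMD [ns] | Relative speedup |
|-----------|---------------------:|------------------------:|-----------------:|
| add       |              208,966 |               9,048,538 |           43.30x |
| sub       |              234,554 |               9,139,344 |           38.96x |
| subi      |              167,414 |               7,725,080 |           46.14x |
| mul       |              242,342 |               8,503,376 |           35.09x |
| div       |              527,300 |              15,933,050 |           30.22x |
| divi      |              560,900 |              13,873,760 |           24.73x |
| adds      |              198,910 |               8,159,120 |           41.02x |
| subs      |              187,672 |               8,236,660 |           43.89x |
| subis     |              177,922 |               6,982,600 |           39.25x |
| pow       |              244,310 |              59,400,760 |          243.14x |
| log       |              467,486 |              27,075,790 |           57.92x |
| abs       |              199,296 |               7,703,620 |           38.65x |
| min       |              204,034 |               6,366,710 |           31.20x |
| max       |              181,212 |               6,475,020 |           35.73x |
| neg       |              192,072 |               7,619,170 |           39.67x |
| gs        |              465,720 |               8,371,089 |           17.97x |
| kern3x3   |            7,548,670 |              28,859,810 |            3.82x |
| kern5x5   |           14,156,100 |              71,436,110 |            5.05x |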

* Performance depends heavily on both hardware and OS. The results shown are the best case. Other hardware or operating systems may yield a somewhat lower relative speedup, though still much faster than the naive implementation.
** Testing has shown that there is no real benefit from running with more than 4 threads.
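The last column of the table is the non-SIMD time divided by the SIMD time. The small sketch below reproduces that arithmetic and also derives a throughput figure from the 1280x838 image size. The function names are hypothetical, and the throughput number is a derived illustration, not a measurement reported by the author.

```c
/* Speedup = time without SIMD / time with SIMD (same unit for both). */
static double speedup(double t_nosimd_ns, double t_simd_ns) {
    return t_nosimd_ns / t_simd_ns;
}

/* Throughput in megapixels per second for one pass over the image.
   pixels / (t_ns * 1e-9 s) / 1e6 simplifies to pixels * 1000 / t_ns. */
static double megapixels_per_second(double pixels, double t_ns) {
    return pixels * 1000.0 / t_ns;
}
```

For the add row, `speedup(9048538, 208966)` gives roughly 43.30, matching the table, and `megapixels_per_second(1280.0 * 838.0, 208966)` comes out at a little over 5,100 megapixels per second.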


Belgrade, June 2023.
