AOR2p

1. Description

2. Functions / command line arguments

Usage: ./aor2p input-file [options]

2.1. Basic arithmetic operations

2.1.1. Arithmetic add
px[r,g,b] <= px[r,g,b] + c

-a=<const> / --add=<const>

2.1.2. Arithmetic sub
px[r,g,b] <= px[r,g,b] - c

-s=<const> / --sub=<const>

2.1.3. Arithmetic inverse sub
px[r,g,b] <= c - px[r,g,b]

-is=<const> / --isub=<const>

2.1.4. Arithmetic mul
px[r,g,b] <= px[r,g,b] * c

-m=<const> / --mul=<const>

2.1.5. Arithmetic div
px[r,g,b] <= px[r,g,b] / c

-d=<const> / --div=<const>

2.1.7. Arithmetic inverse div
px[r,g,b] <= c / px[r,g,b]

-id=<const> / --idiv=<const>

2.2. Arithmetic operations with saturation

2.2.1. Add with saturation
px[r,g,b] <= px[r,g,b] + c > PX_MAX ? PX_MAX : px[r,g,b] + c

-as=<const> / --add-saturate=<const>

2.2.2. Sub with saturation
px[r,g,b] <= px[r,g,b] - c < 0 ? 0 : px[r,g,b] - c

-ss=<const> / --sub-saturate=<const>

2.2.3. Inverse sub with saturation
px[r,g,b] <= c - px[r,g,b] < 0 ? 0 : c - px[r,g,b]

-iss=<const> / --isub-saturate=<const>

2.3. Misc arithmetic operations

2.3.1. Pow
px[r,g,b] <= px[r,g,b]**c

-p=<const> / --pow=<const>

2.3.2. Log
px[r,g,b] <= log2(px[r,g,b])

-l / --log

2.3.3. Abs
px[r,g,b] <= abs(px[r,g,b])

--abs

2.3.4. Min
px[r,g,b] <= min(px[r,g,b], c)

--min=<const>

2.3.5. Max
px[r,g,b] <= max(px[r,g,b], c)

--max=<const>

2.4. Image processing

2.4.1. Negative
px[r,g,b] <= PX_MAX - px[r,g,b]

-n / --neg

2.4.2. Greyscale
px[r,g,b] <= 0.299 * px[r] + 0.587 * px[g] + 0.114 * px[b]

-gs / --greyscale

2.5. Kernel filtering

-k=<file> / --kern=<file>

2.6. File operations

-o=<file>

2.7. Optimization level

--thread-count=n
--no-pipeline
--no-simd
-s0 - no optimizations
-s2 - SIMD, no pipeline
-s3 - SIMD & pipeline, default

3. Performance

Time taken for a single operation on 1280x838 sample image, averaged over 10 tests.

Tested on:

AMD Ryzen 3700U, 4 cores / 8 threads, 8GB RAM, Debian 11
AMD Ryzen 2200G, 4 cores / 8 threads, 16GB RAM, Debian 12
AMD Epyc 7R32, 16 cores / 32 threads, 64GB RAM, Debian 11 (AWS c5a.8xlarge instance)

op	t avg simd[ns]	t avg nosimd[ns]	relative
add	208,966	9,048,538	43.30x
sub	234,554	9,139,344	38.96x
subi	167,414	7,725,080	46.14x
mul	242,342	8,503,376	35.09x
div	527,300	15,933,050	30.22x
divi	560,900	13,873,760	24.73x
adds	198,910	8,159,120	41.02x
subs	187,672	8,236,660	43.89x
subis	177,922	6,982,600	39.25x
pow	244,310	59,400,760	243.14x
log	467,486	27,075,790	57.92x
abs	199,296	7,703,620	38.65x
min	204,034	6,366,710	31.20x
max	181,212	6,475,020	35.73x
neg	192,072	7,619,170	39.67x
gs	465,720	8,371,089	17.97x
kern3x3	7,548,670	28,859,810	3.82x
kern5x5	14,156,100	71,436,110	5.05x

* Performance is heavily dependent on both hardware and OS. Results shown are best-case scenario. Running on other hardware and OS may yield slightly worse relative performance, but still much faster than naive implementation.
** Testing has show that there is no real benefit from running with more than 4 threads.

Belgrade, June 2023.

Name		Name	Last commit message	Last commit date
Latest commit History 94 Commits
benchmark		benchmark
include		include
kern		kern
lib/stb		lib/stb
res		res
src		src
valgrind		valgrind
Makefile		Makefile
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AOR2p

1. Description

2. Functions / command line arguments

2.1. Basic arithmetic operations

2.2. Arithmetic operations with saturation

2.3. Misc arithmetic operations

2.4. Image processing

2.5. Kernel filtering

2.6. File operations

2.7. Optimization level

3. Performance

About

Languages

kiclu/aor2p

Folders and files

Latest commit

History

Repository files navigation

AOR2p

1. Description

2. Functions / command line arguments

2.1. Basic arithmetic operations

2.2. Arithmetic operations with saturation

2.3. Misc arithmetic operations

2.4. Image processing

2.5. Kernel filtering

2.6. File operations

2.7. Optimization level

3. Performance

About

Topics

Resources

Stars

Watchers

Forks

Languages