Arithmetic cleanup and additions #164
Conversation
It is now split into arithmetic.jl, elementary.jl and minmax.jl. The corresponding tests have also been split into three files. The file arithmetic.jl contains +, -, *, / and ^ as well as sign, abs, real, imag and conj. The file elementary.jl contains elementary functions such as exp, sin and cos. The file minmax.jl contains min, max, minmax, minimum, maximum and extrema.
Adds the following new methods: - Multi argument versions of + and * - fma and muladd - cube - signbit It also adds a special case for binary operations with irrationals, reducing the number of allocations.
Adds multi argument versions for min and max.
Force-pushed from b0b62df to db6745f
src/arithmetic.jl
Outdated
```julia
# Define more efficient multi argument versions, similar to BigFloat
# TODO: Now precision is determined by first two arguments
for T in (MagOrRef, ArfOrRef, ArbOrRef, AcbOrRef)
    @eval function Base.$jf(a::$T, b::$T, c::$T)
```
Maybe redefine `_precision` to handle multiple arguments:

```julia
_precision(a, b, args...) = max(_precision(a), _precision(b, args...))
```

and then

```julia
function Base.$jf(a::$T, b::$T, args...)
    z = $T(prec = _precision(a, b, args...))
    z = $af(z, a, b)
    for x in args
        z = $af(z, z, x)
    end
    return z
end
```

?
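As a self-contained illustration of the recursive reduction suggested above, here is a sketch with a hypothetical stand-in type `Prec` carrying only a precision field (this is not Arblib's actual type or API):

```julia
# `Prec` is a hypothetical stand-in for an Arb-like value with a precision.
struct Prec
    prec::Int
end

_precision(a::Prec) = a.prec
# Recursive reduction: the result precision is the max over all arguments.
_precision(a, b, args...) = max(_precision(a), _precision(b, args...))
```

With this, `_precision(Prec(64), Prec(128), Prec(256))` reduces to `max(64, max(128, 256))`, i.e. the maximum over all arguments rather than just the first two.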
We could also abuse this pattern and define

```julia
$af(z, a, b, args...) = (z = $af(z, z, a); $af(z, b, args...))
```

but that changes the semantics of `$af`, so better not to do this directly (a function wrapper in the middle would be OK?).
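The "function wrapper in the middle" idea could look roughly like this; `mul!` here is a stand-in in-place multiply acting on `Ref`s, and `mul_many!` is a hypothetical wrapper name, neither is Arblib's actual API:

```julia
# Stand-in two-argument in-place multiply; its semantics stay untouched.
mul!(z::Ref, x::Ref, y::Ref) = (z[] = x[] * y[]; z)

# Hypothetical wrapper: fold any extra arguments into z without touching mul!.
function mul_many!(z::Ref, a::Ref, b::Ref, args::Ref...)
    mul!(z, a, b)
    for x in args
        mul!(z, z, x)
    end
    return z
end
```

This keeps the two-argument in-place operation's contract intact while still offering a variadic entry point.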
Leaving aside the precision question for now. I got interested in the different ways to write a method handling any number of arguments. The results were somewhat surprising and differ a lot between Julia 1.9.2 and 1.10-beta. The methods I tried were
```julia
function mul_test_1(a::Arb, b::Arb, c::Arb, d::Arb, e::Arb)
    z = a * b
    mul!(z, z, c)
    mul!(z, z, d)
    mul!(z, z, e)
    return z
end

function mul_test_2(a::Arb, b::Arb, c::Arb, rest::Vararg{Arb})
    z = a * b
    mul!(z, z, c)
    for x in rest
        mul!(z, z, x)
    end
    return z
end

mul_test_3(a::Arb, b::Arb) = a * b
function mul_test_3(a::Arb, b::Arb, c::Arb, rest::Vararg{Arb})
    z = mul_test_3(b, c, rest...)
    mul!(z, a, z)
    return z
end
```
The tests I ran are

```julia
a = Arb(1)
b1 = @benchmarkable Arblib.mul_test_1($a, $a, $a, $a, $a)
b2 = @benchmarkable Arblib.mul_test_2($a, $a, $a, $a, $a)
b3 = @benchmarkable Arblib.mul_test_3($a, $a, $a, $a, $a)
```
In Julia 1.9.2 I get the following results

```julia
julia> run(b1, evals=1000)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  68.075 ns … 54.788 μs  ┊ GC (min … max): 0.00% … 58.35%
 Time  (median):     70.518 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   147.215 ns ± 1.955 μs  ┊ GC (mean ± σ):  29.33% ± 2.20%

  ▁▅▇█▇▅▄▃▃▃▂▃▄▄▃▂▁ ▁▂▃▃▃▂▁▁                                  ▂
  ██████████████████▇█▇▇▇▆▆▆▆▅▆▅▅▄▄▄▆██████████▇▇▇▆▆▆▆▅▅▅▆▆▅▄▅ █
  68.1 ns       Histogram: log(frequency) by time       102 ns <

 Memory estimate: 64 bytes, allocs estimate: 1.

julia> run(b2, evals=1000)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  97.290 ns … 15.608 μs  ┊ GC (min … max): 0.00% … 61.43%
 Time  (median):     100.489 ns             ┊ GC (median):    0.00%
 Time  (mean ± σ):   184.264 ns ± 1.084 μs  ┊ GC (mean ± σ):  27.50% ± 4.62%

  ▃▇█▆▄▄▃▄▅▄▃▂▂▂▂▂▁▁                                          ▂
  ███████████████████▇█▇▇▇▆▆▆▆▄▄▄▅▅▄▄▄▁▁▃▄▄▃▁▃▁▁▃▄▆▆▅▆▇▇▇▆▃▆▆▆ █
  97.3 ns       Histogram: log(frequency) by time       168 ns <

 Memory estimate: 256 bytes, allocs estimate: 7.

julia> run(b3, evals=1000)
BenchmarkTools.Trial: 8418 samples with 1000 evaluations.
 Range (min … max):  465.435 ns … 8.043 μs    ┊ GC (min … max): 0.00% … 67.40%
 Time  (median):     486.168 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   592.378 ns ± 847.952 ns  ┊ GC (mean ± σ):  12.32% ± 7.95%

  █                                                           ▁
  ██▆▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▆▇ █
  465 ns       Histogram: log(frequency) by time       7.54 μs <

 Memory estimate: 656 bytes, allocs estimate: 18.
```
So version 1 is the fastest and the only one that doesn't do extra allocations. The third one allocates a lot and is much slower!
With Julia 1.10-beta I instead get

```julia
julia> run(b1, evals=1000)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  76.832 ns … 378.336 μs  ┊ GC (min … max): 0.00% … 56.04%
 Time  (median):     98.380 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   136.132 ns ± 3.782 μs   ┊ GC (mean ± σ):  15.58% ± 0.56%

                                  ▂▅██▃
  ▃▄▅▄▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▁▁▂▁▂▂▁▂▂▄▆██████▆▅▅▅▄▄▃▃▃▃▃▃▃▃▃▂▂▂▂▂▂ ▃
  76.8 ns        Histogram: frequency by time         112 ns <

 Memory estimate: 64 bytes, allocs estimate: 1.

julia> run(b2, evals=1000)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  108.310 ns … 324.235 μs  ┊ GC (min … max): 0.00% … 67.75%
 Time  (median):     194.815 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   287.768 ns ± 5.343 μs    ┊ GC (mean ± σ):  24.01% ± 1.35%

   ▄                                          ▁▆█▆▆▆▇▄
  ▄█▃▃▄▃▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▂▁▂▁▁▁▂▂▃▄▃▄▅▄▃▄▄▄▄▅████████▇▇▅▄▃▃▂▂▂▂▂ ▄
  108 ns         Histogram: frequency by time         225 ns <

 Memory estimate: 256 bytes, allocs estimate: 7.

julia> run(b3, evals=1000)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  78.538 ns … 379.233 μs  ┊ GC (min … max): 0.00% … 55.86%
 Time  (median):     100.804 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   139.085 ns ± 3.791 μs   ┊ GC (mean ± σ):  15.23% ± 0.56%

                                 ▃▆█▄
  ▂▄▄▄▂▂▂▂▂▂▂▂▂▂▂▁▂▁▁▂▂▂▁▁▂▁▂▂▂▂▃▄██████▅▅▄▄▄▃▃▃▂▃▃▃▄▄▃▂▂▂▂▂▂▂▂ ▃
  78.5 ns        Histogram: frequency by time         117 ns <

 Memory estimate: 64 bytes, allocs estimate: 1.
```
So now versions 1 and 3 are more or less the same; in particular, version 3 no longer does any extra allocations. The second version is still slightly slower. Note also that version 1 is slightly faster in 1.9.2 than in 1.10-beta; I'm not sure why this is the case (can you reproduce it?).

My conclusion from this is that it is maybe not the best idea to write a generic multi argument version, but to stick with manual implementations for 3, 4 and 5 arguments. This is also what BigFloat does.
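The manual small-arity approach could be generated with an `@eval` loop over arities. Below is a self-contained sketch: `mul!` is a stand-in in-place multiply on `Ref`s and `mul_n` an illustrative name; neither is Arblib's actual API:

```julia
# Stand-in in-place multiply on Refs (hypothetical, not Arblib's mul!).
mul!(z::Ref, x::Ref, y::Ref) = (z[] = x[] * y[]; z)

# Base case: allocate once for two arguments.
mul_n(a::Ref, b::Ref) = Ref(a[] * b[])

# Generate explicit 3-, 4- and 5-argument methods instead of a Vararg one.
for n in 3:5
    args = Symbol.('a' .+ (0:n-1))  # :a, :b, :c, ...
    body = [:(mul!(z, z, $(args[i]))) for i in 3:n]
    @eval function mul_n($(args...))
        z = mul_n($(args[1]), $(args[2]))
        $(body...)
        return z
    end
end
```

Each generated method allocates exactly once (in the two-argument base case) and accumulates the remaining factors in place, avoiding the Vararg specialization question entirely.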
I think this is an inlining issue; on Julia 1.9.2, with

```julia
mul_test_3(a::Arb, b::Arb) = a * b
function mul_test_3(a::Arb, b::Arb, c::Arb, rest::Vararg{Arb})
    z = mul_test_3(b, c, rest...)
    Arblib.mul!(z, a, z)
    return z
end

mul_test_4(a::Arb, b::Arb) = a * b
@inline function mul_test_4(a::Arb, b::Arb, c::Arb, rest::Vararg{Arb})
    z = mul_test_4(b, c, rest...)
    Arblib.mul!(z, a, z)
    return z
end

a = Arb(1)
b3 = @benchmarkable mul_test_3($a, $a, $a, $a, $a)
b4 = @benchmarkable mul_test_4($a, $a, $a, $a, $a)
run(b3, evals=1000)
run(b4, evals=1000)
```
I get

```julia
julia> run(b3, evals=1000)
BenchmarkTools.Trial: 5755 samples with 1000 evaluations.
 Range (min … max):  668.101 ns … 16.772 μs  ┊ GC (min … max): 0.00% … 74.94%
 Time  (median):     695.130 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   852.314 ns ± 1.251 μs   ┊ GC (mean ± σ):  11.42% ± 7.25%

  █                                                          ▁
  █▇█▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▆▆ █
  668 ns       Histogram: log(frequency) by time       10.6 μs <

 Memory estimate: 656 bytes, allocs estimate: 18.

julia> run(b4, evals=1000)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  88.698 ns … 87.804 μs  ┊ GC (min … max): 0.00% … 59.67%
 Time  (median):     91.143 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   202.562 ns ± 2.812 μs  ┊ GC (mean ± σ):  28.17% ± 2.02%

  ▃██▆▄▂ ▁▁▂▃▂▂▁▁                                             ▂
  ███████▆▅▆▆▄▆██▇█▇▇▇▅▆▆▆▆▆▅▅▇▇▆▅▅▄▅▆▅▄▄▅▆▆▇██████████▇▆▆▇▆▇▇ █
  88.7 ns       Histogram: log(frequency) by time       150 ns <

 Memory estimate: 64 bytes, allocs estimate: 1.
```
I tried really hard to get a …

We can possibly add a …
Force-pushed from 4f600c0 to 9a10de8
I added optimized implementations of arithmetic operations with … I'll merge it sometime tomorrow in case you have any final comments on the last change!
Force-pushed from 730f0b0 to dc93618
This is a slightly larger change than the previous PR. For some time I have wanted to split the `arithmetic.jl` file into several, more focused, files. I here split the file into three parts:

- `arithmetic.jl`: contains +, -, *, / and ^ as well as sign, abs, real, imag and conj.
- `elementary.jl`: contains elementary functions such as exp, sin and cos.
- `minmax.jl`: contains min, max, minmax, minimum, maximum and extrema.

The test file is also split in the same way.

For `arithmetic.jl` and `minmax.jl` I also add some more methods. For `arithmetic.jl` it's

- multi argument versions of `+` and `*`
- `fma` and `muladd`
- `cube`
- `signbit`

For `minmax.jl` it adds multi argument versions of `min` and `max`.

Most of these are straightforward additions. There is however one thing I'm not sure how to handle: precision for multi argument methods. For unary methods we in general preserve the precision of the input, and binary methods take the maximum of the precisions of the two inputs. However, for methods with more inputs it gets a bit clunky to take the precision of all the inputs into account. For now I have opted to use the precision of the first argument, or the maximum of the precisions of the first two arguments, as the precision for the multi argument methods. I'm however not sure if this is the best thing to do. The affected methods are
- `fma(x, y, z)` and `muladd(x, y, z)`
- `+(a, b, c)`, `+(a, b, c, d)`, `+(a, b, c, d, e)`
- `*(a, b, c)`, `*(a, b, c, d)`, `*(a, b, c, d, e)`
- `min(a, b, c)`, `min(a, b, c, d)`, `min(a, b, c, d, e)`
- `max(a, b, c)`, `max(a, b, c, d)`, `max(a, b, c, d, e)`
Note that this slightly changes the behavior from earlier, where for example `+(a, b, c)` would be computed as `((a + b) + c)` and each addition would take the maximum of the precisions of its two arguments. Now the maximum of the precisions of `a` and `b` is used. I'm open to comments on how to handle this!

In general I have realized it is quite annoying to handle precision correctly in all cases. I'm now more okay with, in most cases, assuming all inputs have the same precision and that this is the same as the global precision. We have very few tests that check that the precision is preserved as expected anyway.
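The behavior change described above can be illustrated with a stand-in type `P` that only tracks precision (hypothetical, not Arblib's types); the old pairwise fold propagates the overall maximum, while the new rule only considers the first two arguments:

```julia
# `P` is a hypothetical stand-in carrying only a precision field.
struct P
    prec::Int
end

# Old behavior: ((a + b) + c), each step taking the max of its two operands,
# so the result ends up with the max precision over all three arguments.
old_plus(a::P, b::P, c::P) = P(max(max(a.prec, b.prec), c.prec))

# New behavior: the result precision is the max of the first two arguments.
new_plus(a::P, b::P, c::P) = P(max(a.prec, b.prec))
```

For arguments with precisions 64, 128 and 256 the old rule yields a 256-bit result while the new rule yields 128 bits, which is exactly the discrepancy being discussed.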