
Arithmetic cleanup and additions #164

Merged
merged 6 commits into from
Aug 21, 2023
Conversation

Joel-Dahne
Collaborator

This is a slightly larger change than the previous PR. For some time I have wanted to split the arithmetic.jl file into several more focused files. Here I split the file into three parts:

  • arithmetic.jl: contains +, -, *, / and ^ as well as sign, abs, real, imag and conj.
  • elementary.jl: contains elementary functions such as exp, sin and cos.
  • minmax.jl: contains min, max, minmax, minimum, maximum and extrema.

The test file is also split in the same way.

For arithmetic.jl and minmax.jl I also add some more methods. For arithmetic.jl these are:

  • Multi-argument versions of + and *
  • fma and muladd
  • cube
  • signbit

For minmax.jl it adds multi-argument versions of min and max.

Most of these are straightforward additions. There is, however, one thing I'm not sure how to handle: precision for multi-argument methods. Currently, unary methods in general preserve the precision of the input, and binary methods take the maximum of the precisions of the two inputs. For methods with more inputs, however, it gets a bit clunky to take the precision of all the inputs into account. For now I have opted to use the precision of the first argument, or the maximum of the precisions of the first two arguments, as the precision for the multi-argument methods. I'm not sure this is the best thing to do, though. The affected methods are

  • fma(x, y, z)
  • muladd(x, y, z)
  • +(a, b, c), +(a, b, c, d), +(a, b, c, d, e)
  • *(a, b, c), *(a, b, c, d), *(a, b, c, d, e)
  • min(a, b, c), min(a, b, c, d), min(a, b, c, d, e)
  • max(a, b, c), max(a, b, c, d), max(a, b, c, d, e)

Note that this slightly changes the behavior from earlier, where for example +(a, b, c) would be computed as ((a + b) + c) and each addition would take the maximum of the precisions of its two arguments. Now only the maximum of the precisions of a and b is used. I'm open to comments on how to handle this!

In general I have realized it is quite annoying to handle precision correctly in all cases. I'm now more okay with, in most cases, assuming all inputs have the same precision and that this is the same as the global precision. We have very few tests that check that the precision is preserved as expected anyway.
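To make the behavior change concrete, here is a small Python sketch (not Arblib code) contrasting the two policies; `precs` is just a list of the hypothetical precisions of the inputs, in order:

```python
def prec_pairwise_fold(precs):
    """Old behavior: ((a + b) + c) takes the pairwise maximum at each
    step, so the result precision is the maximum over all inputs."""
    result = precs[0]
    for p in precs[1:]:
        result = max(result, p)
    return result

def prec_first_two(precs):
    """New behavior for multi-argument methods: only the first two
    arguments determine the result precision."""
    return max(precs[0], precs[1])

# With precisions (128, 128, 256) the two policies disagree:
print(prec_pairwise_fold([128, 128, 256]))  # 256
print(prec_first_two([128, 128, 256]))      # 128
```

The disagreement only shows up when a later argument has strictly higher precision than the first two.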

It is now split into arithmetic.jl, elementary.jl and minmax.jl. The
corresponding tests have also been split into three files.

The file arithmetic.jl contains +, -, *, / and ^ as well as sign,
abs, real, imag and conj.

The file elementary.jl contains elementary functions such as exp, sin
and cos.

The file minmax.jl contains min, max, minmax, minimum, maximum and
extrema.
Adds the following new methods:
- Multi-argument versions of + and *
- fma and muladd
- cube
- signbit

It also adds a special case for binary operations with irrationals,
reducing the number of allocations.
Adds multi-argument versions for min and max.
Comment on lines 36 to 39
# Define more efficient multi argument versions, similar to BigFloat
# TODO: Now precision is determined by first two arguments
for T in (MagOrRef, ArfOrRef, ArbOrRef, AcbOrRef)
@eval function Base.$jf(a::$T, b::$T, c::$T)
Owner
Maybe redefine _precision to handle multiple arguments:

_precision(a, b, args...) =
    max(_precision(a), _precision(b, args...))

and then

function Base.$jf(a::$T, b::$T, args...)
    z = $T(prec=_precision(a, b, args...))
    z = $af(z, a, b)
    for x in args
        z = $af(z, z, x)
    end
    return z
end

?
We could also abuse this pattern and define

$af(z, a, b, args...) = (z = $af(z, z, a); $af(z, b, args...))

but that changes the semantics of $af, so better not do this directly (function wrapper in the middle would be ok?)
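As a language-neutral illustration of the suggested pattern, here is a Python sketch (names hypothetical, precisions simplified to plain ints, and `*` standing in for the in-place `$af` call):

```python
def combined_precision(a, b, *args):
    # Recursive maximum over all argument precisions, mirroring the
    # suggested multi-argument _precision.
    if not args:
        return max(a, b)
    return max(a, combined_precision(b, *args))

def mul_many(a, b, *args):
    z = a * b          # stands in for z = $af(z, a, b)
    for x in args:
        z = z * x      # stands in for the in-place $af(z, z, x)
    return z

print(combined_precision(53, 256, 128))  # 256
print(mul_many(2, 3, 4, 5))              # 120
```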

Collaborator Author

Leaving aside the precision question for now. I got interested in the different ways to write a method handling any number of arguments. The results were somewhat surprising and differ a lot between Julia 1.9.2 and 1.10-beta. The methods I tried were

function mul_test_1(a::Arb, b::Arb, c::Arb, d::Arb, e::Arb)
    z = a * b
    mul!(z, z, c)
    mul!(z, z, d)
    mul!(z, z, e)
    return z
end

function mul_test_2(a::Arb, b::Arb, c::Arb, rest::Vararg{Arb})
    z = a * b
    mul!(z, z, c)
    for x in rest
        mul!(z, z, x)
    end
    return z
end

mul_test_3(a::Arb, b::Arb) = a * b
function mul_test_3(a::Arb, b::Arb, c::Arb, rest::Vararg{Arb})
    z = mul_test_3(b, c, rest...)
    mul!(z, a, z)
    return z
end

The tests I ran are

a = Arb(1)
b1 = @benchmarkable Arblib.mul_test_1($a, $a, $a, $a, $a)
b2 = @benchmarkable Arblib.mul_test_2($a, $a, $a, $a, $a)
b3 = @benchmarkable Arblib.mul_test_3($a, $a, $a, $a, $a)

In Julia 1.9.2 I get the following results

julia> run(b1, evals=1000)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):   68.075 ns … 54.788 μs  ┊ GC (min … max):  0.00% … 58.35%
 Time  (median):      70.518 ns              ┊ GC (median):     0.00%
 Time  (mean ± σ):   147.215 ns ±  1.955 μs  ┊ GC (mean ± σ):  29.33% ±  2.20%

  ▁▅▇█▇▅▄▃▃▃▂▃▄▄▃▂▁                  ▁▂▃▃▃▂▁▁                  ▂
  ██████████████████▇█▇▇▇▆▆▆▆▅▆▅▅▄▄▄▆██████████▇▇▇▆▆▆▆▅▅▅▆▆▅▄▅ █
  68.1 ns       Histogram: log(frequency) by time       102 ns <

 Memory estimate: 64 bytes, allocs estimate: 1.

julia> run(b2, evals=1000)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):   97.290 ns … 15.608 μs  ┊ GC (min … max):  0.00% … 61.43%
 Time  (median):     100.489 ns              ┊ GC (median):     0.00%
 Time  (mean ± σ):   184.264 ns ±  1.084 μs  ┊ GC (mean ± σ):  27.50% ±  4.62%

  ▃▇█▆▄▄▃▄▅▄▃▂▂▂▂▂▁▁                                           ▂
  ███████████████████▇█▇▇▇▆▆▆▆▄▄▄▅▅▄▄▄▁▁▃▄▄▃▁▃▁▁▃▄▆▆▅▆▇▇▇▆▃▆▆▆ █
  97.3 ns       Histogram: log(frequency) by time       168 ns <

 Memory estimate: 256 bytes, allocs estimate: 7.

julia> run(b3, evals=1000)
BenchmarkTools.Trial: 8418 samples with 1000 evaluations.
 Range (min … max):  465.435 ns …   8.043 μs  ┊ GC (min … max):  0.00% … 67.40%
 Time  (median):     486.168 ns               ┊ GC (median):     0.00%
 Time  (mean ± σ):   592.378 ns ± 847.952 ns  ┊ GC (mean ± σ):  12.32% ±  7.95%

  █                                                             ▁
  ██▆▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▆▇ █
  465 ns        Histogram: log(frequency) by time       7.54 μs <

 Memory estimate: 656 bytes, allocs estimate: 18.

So version 1 is the fastest and the only one that doesn't do extra allocations. The third one allocates a lot and is much slower!

With Julia 1.10-beta I instead get

julia> run(b1, evals=1000)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):   76.832 ns … 378.336 μs  ┊ GC (min … max):  0.00% … 56.04%
 Time  (median):      98.380 ns               ┊ GC (median):     0.00%
 Time  (mean ± σ):   136.132 ns ±   3.782 μs  ┊ GC (mean ± σ):  15.58% ±  0.56%

                                    ▂▅██▃                        
  ▃▄▅▄▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▁▁▂▁▂▂▁▂▂▄▆██████▆▅▅▅▄▄▃▃▃▃▃▃▃▃▃▂▂▂▂▂▂ ▃
  76.8 ns          Histogram: frequency by time          112 ns <

 Memory estimate: 64 bytes, allocs estimate: 1.

julia> run(b2, evals=1000)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  108.310 ns … 324.235 μs  ┊ GC (min … max):  0.00% … 67.75%
 Time  (median):     194.815 ns               ┊ GC (median):     0.00%
 Time  (mean ± σ):   287.768 ns ±   5.343 μs  ┊ GC (mean ± σ):  24.01% ±  1.35%

   ▄                                        ▁▆█▆▆▆▇▄             
  ▄█▃▃▄▃▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▂▁▂▁▁▁▂▂▃▄▃▄▅▄▃▄▄▄▄▅████████▇▇▅▄▃▃▂▂▂▂▂ ▄
  108 ns           Histogram: frequency by time          225 ns <

 Memory estimate: 256 bytes, allocs estimate: 7.

julia> run(b3, evals=1000)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):   78.538 ns … 379.233 μs  ┊ GC (min … max):  0.00% … 55.86%
 Time  (median):     100.804 ns               ┊ GC (median):     0.00%
 Time  (mean ± σ):   139.085 ns ±   3.791 μs  ┊ GC (mean ± σ):  15.23% ±  0.56%

                                   ▃▆█▄                          
  ▂▄▄▄▂▂▂▂▂▂▂▂▂▂▂▁▂▁▁▂▂▂▁▁▂▁▂▂▂▂▃▄██████▅▅▄▄▄▃▃▃▂▃▃▃▄▄▃▂▂▂▂▂▂▂▂ ▃
  78.5 ns          Histogram: frequency by time          117 ns <

 Memory estimate: 64 bytes, allocs estimate: 1.

So now versions 1 and 3 are more or less the same; in particular, version 3 is not doing any extra allocations anymore. The second version is still slightly slower. Note also that version 1 is slightly faster in 1.9.2 than in 1.10-beta; I'm not sure why this is the case (can you reproduce it?).

My conclusion from this is that it is maybe not the best idea to write a generic multi-argument version, but to stick with manual implementations for 3, 4 and 5 arguments. This is also what BigFloat does.
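The manual-arity strategy can be sketched in Python terms (helper names hypothetical; in-place `*=` stands in for Arblib's `mul!`):

```python
# Each arity gets its own explicit definition, as for BigFloat, so no
# varargs machinery is involved and one result value is reused.
def mul3(a, b, c):
    z = a * b
    z *= c            # would be an in-place mul!(z, z, c) in Arblib
    return z

def mul4(a, b, c, d):
    z = a * b
    z *= c
    z *= d
    return z

print(mul3(2, 3, 4))     # 24
print(mul4(2, 3, 4, 5))  # 120
```

The cost is some repetition across arities, but each definition compiles to a straight-line sequence with a single allocation.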

Owner

I think this is an inlining issue; on Julia 1.9.2, with

mul_test_3(a::Arb, b::Arb) = a * b
function mul_test_3(a::Arb, b::Arb, c::Arb, rest::Vararg{Arb})
    z = mul_test_3(b, c, rest...)
    Arblib.mul!(z, a, z)
    return z
end

mul_test_4(a::Arb, b::Arb) = a * b
@inline function mul_test_4(a::Arb, b::Arb, c::Arb, rest::Vararg{Arb})
    z = mul_test_4(b, c, rest...)
    Arblib.mul!(z, a, z)
    return z
end

a = Arb(1)
b3 = @benchmarkable mul_test_3($a, $a, $a, $a, $a)
b4 = @benchmarkable mul_test_4($a, $a, $a, $a, $a)

run(b3, evals=1000)
run(b4, evals=1000)

I get

julia> run(b3, evals=1000)
BenchmarkTools.Trial: 5755 samples with 1000 evaluations.
 Range (min … max):  668.101 ns … 16.772 μs  ┊ GC (min … max):  0.00% … 74.94%
 Time  (median):     695.130 ns              ┊ GC (median):     0.00%
 Time  (mean ± σ):   852.314 ns ±  1.251 μs  ┊ GC (mean ± σ):  11.42% ±  7.25%

  █                                                            ▁
  █▇█▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▆▆ █
  668 ns        Histogram: log(frequency) by time      10.6 μs <

 Memory estimate: 656 bytes, allocs estimate: 18.

julia> run(b4, evals=1000)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):   88.698 ns … 87.804 μs  ┊ GC (min … max):  0.00% … 59.67%
 Time  (median):      91.143 ns              ┊ GC (median):     0.00%
 Time  (mean ± σ):   202.562 ns ±  2.812 μs  ┊ GC (mean ± σ):  28.17% ±  2.02%

  ▃██▆▄▂                                      ▁▁▂▃▂▂▁▁         ▂
  ███████▆▅▆▆▄▆██▇█▇▇▇▅▆▆▆▆▆▅▅▇▇▆▅▅▄▅▆▅▄▄▅▆▆▇██████████▇▆▆▇▆▇▇ █
  88.7 ns       Histogram: log(frequency) by time       150 ns <

 Memory estimate: 64 bytes, allocs estimate: 1.

@Joel-Dahne
Collaborator Author

Joel-Dahne commented Aug 19, 2023

I tried really hard to get a Vararg implementation of the multi-argument methods but failed... I'm not sure if I just missed something or what happened, but I could not get a version that didn't allocate extra on 1.9.2 and 1.10.0-beta2. I now added a multi-argument version of _precision (which I could get to work) and factored out the multi-argument methods to a separate file. I have plans to extend these methods to polynomials and matrices, so having them in a separate file makes sense, I think.

We can possibly add a Vararg version in the future, but for now this is at least better than what we had before!

@Joel-Dahne
Collaborator Author

I added optimized implementations of arithmetic operations with Rational{<:_BitInteger}, as well as optimizations for when the left operand of x - y and x / y is a _BitInteger. With this we should only need one allocation for all arithmetic between Arb / Acb and irrationals, rationals and integers (unless I have missed some case). Irrationals, rationals and integers more or less cover every number you can represent exactly in Julia without allocations, so this seems like a good thing to optimize.
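As a language-neutral illustration of why these types are worth special-casing: integers and rationals are exact, so feeding them into an operation introduces no representation error of their own (Python sketch using `fractions.Fraction` as the stand-in exact rational type):

```python
from fractions import Fraction

# Rationals and integers are exact, so e.g. an integer on the left of
# a subtraction (the x - y case mentioned above) loses no information
# before the single rounding of the final result.
print(Fraction(1, 3) + Fraction(1, 6))  # 1/2
print(3 - Fraction(1, 2))               # 5/2
```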

I'll merge it sometime tomorrow in case you have any final comments on the last change!

@Joel-Dahne Joel-Dahne merged commit f52153f into master Aug 21, 2023
12 checks passed
@Joel-Dahne Joel-Dahne deleted the arithmetic-cleanup branch August 21, 2023 07:22