Arithmetic cleanup and additions #164
Conversation
It is now split into arithmetic.jl, elementary.jl and minmax.jl. The corresponding tests have also been split into three files. The file arithmetic.jl contains +, -, *, / and ^ as well as sign, abs, real, imag and conj. The file elementary.jl contains elementary functions such as exp, sin and cos. The file minmax.jl contains min, max, minmax, minimum, maximum and extrema.
Adds the following new methods: - Multi argument versions of + and * - fma and muladd - cube - signbit It also adds a special case for binary operations with irrationals, reducing the number of allocations.
Adds multi argument versions for min and max.
Force-pushed from b0b62df to db6745f
src/arithmetic.jl
Outdated
```julia
# Define more efficient multi argument versions, similar to BigFloat
# TODO: Now precision is determined by first two arguments
for T in (MagOrRef, ArfOrRef, ArbOrRef, AcbOrRef)
    @eval function Base.$jf(a::$T, b::$T, c::$T)
```
Maybe redefine `_precision` to handle multiple arguments:

```julia
_precision(a, b, args...) = max(_precision(a), _precision(b, args...))
```

and then

```julia
function Base.$jf(a::$T, b::$T, args...)
    z = $T(prec = _precision(a, b, args...))
    z = $af(z, a, b)
    for x in args
        z = $af(z, z, x)
    end
    return z
end
```

?
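As a self-contained illustration of the recursive reduction suggested above, here is a sketch with a hypothetical stand-in type `Prec` carrying only a precision field (this is not Arblib's actual type or API):

```julia
# `Prec` is a hypothetical stand-in for an Arb-like value with a precision.
struct Prec
    prec::Int
end

_precision(a::Prec) = a.prec
# Recursive reduction: the result precision is the max over all arguments.
_precision(a, b, args...) = max(_precision(a), _precision(b, args...))
```

With this, `_precision(Prec(64), Prec(128), Prec(256))` reduces to `max(64, max(128, 256))`, i.e. the maximum over all arguments rather than just the first two.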
We could also abuse this pattern and define

```julia
$af(z, a, b, args...) = (z = $af(z, z, a); $af(z, b, args...))
```

but that changes the semantics of `$af`, so better not to do this directly (a function wrapper in the middle would be OK?).
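The "function wrapper in the middle" idea could look roughly like this; `mul!` here is a stand-in in-place multiply acting on `Ref`s, and `mul_many!` is a hypothetical wrapper name, neither is Arblib's actual API:

```julia
# Stand-in two-argument in-place multiply; its semantics stay untouched.
mul!(z::Ref, x::Ref, y::Ref) = (z[] = x[] * y[]; z)

# Hypothetical wrapper: fold any extra arguments into z without touching mul!.
function mul_many!(z::Ref, a::Ref, b::Ref, args::Ref...)
    mul!(z, a, b)
    for x in args
        mul!(z, z, x)
    end
    return z
end
```

This keeps the two-argument in-place operation's contract intact while still offering a variadic entry point.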
Leaving aside the precision question for now. I got interested in the different ways to write a method handling any number of arguments. The results were somewhat surprising and differ a lot between Julia 1.9.2 and 1.10-beta. The methods I tried were
```julia
function mul_test_1(a::Arb, b::Arb, c::Arb, d::Arb, e::Arb)
    z = a * b
    mul!(z, z, c)
    mul!(z, z, d)
    mul!(z, z, e)
    return z
end

function mul_test_2(a::Arb, b::Arb, c::Arb, rest::Vararg{Arb})
    z = a * b
    mul!(z, z, c)
    for x in rest
        mul!(z, z, x)
    end
    return z
end

mul_test_3(a::Arb, b::Arb) = a * b
function mul_test_3(a::Arb, b::Arb, c::Arb, rest::Vararg{Arb})
    z = mul_test_3(b, c, rest...)
    mul!(z, a, z)
    return z
end
```
The tests I ran are

```julia
a = Arb(1)
b1 = @benchmarkable Arblib.mul_test_1($a, $a, $a, $a, $a)
b2 = @benchmarkable Arblib.mul_test_2($a, $a, $a, $a, $a)
b3 = @benchmarkable Arblib.mul_test_3($a, $a, $a, $a, $a)
```
In Julia 1.9.2 I get the following results

```julia
julia> run(b1, evals=1000)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  68.075 ns … 54.788 μs  ┊ GC (min … max): 0.00% … 58.35%
 Time  (median):     70.518 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   147.215 ns ± 1.955 μs  ┊ GC (mean ± σ):  29.33% ± 2.20%

  ▁▅▇█▇▅▄▃▃▃▂▃▄▄▃▂▁ ▁▂▃▃▃▂▁▁                                  ▂
  ██████████████████▇█▇▇▇▆▆▆▆▅▆▅▅▄▄▄▆██████████▇▇▇▆▆▆▆▅▅▅▆▆▅▄▅ █
  68.1 ns       Histogram: log(frequency) by time       102 ns <

 Memory estimate: 64 bytes, allocs estimate: 1.

julia> run(b2, evals=1000)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  97.290 ns … 15.608 μs  ┊ GC (min … max): 0.00% … 61.43%
 Time  (median):     100.489 ns             ┊ GC (median):    0.00%
 Time  (mean ± σ):   184.264 ns ± 1.084 μs  ┊ GC (mean ± σ):  27.50% ± 4.62%

  ▃▇█▆▄▄▃▄▅▄▃▂▂▂▂▂▁▁                                          ▂
  ███████████████████▇█▇▇▇▆▆▆▆▄▄▄▅▅▄▄▄▁▁▃▄▄▃▁▃▁▁▃▄▆▆▅▆▇▇▇▆▃▆▆▆ █
  97.3 ns       Histogram: log(frequency) by time       168 ns <

 Memory estimate: 256 bytes, allocs estimate: 7.

julia> run(b3, evals=1000)
BenchmarkTools.Trial: 8418 samples with 1000 evaluations.
 Range (min … max):  465.435 ns … 8.043 μs    ┊ GC (min … max): 0.00% … 67.40%
 Time  (median):     486.168 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   592.378 ns ± 847.952 ns  ┊ GC (mean ± σ):  12.32% ± 7.95%

  █                                                           ▁
  ██▆▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▆▇ █
  465 ns       Histogram: log(frequency) by time       7.54 μs <

 Memory estimate: 656 bytes, allocs estimate: 18.
```
So version 1 is the fastest and the only one that doesn't do extra allocations. The third one allocates a lot and is much slower!
With Julia 1.10-beta I instead get

```julia
julia> run(b1, evals=1000)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  76.832 ns … 378.336 μs  ┊ GC (min … max): 0.00% … 56.04%
 Time  (median):     98.380 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   136.132 ns ± 3.782 μs   ┊ GC (mean ± σ):  15.58% ± 0.56%

                                  ▂▅██▃
  ▃▄▅▄▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▁▁▂▁▂▂▁▂▂▄▆██████▆▅▅▅▄▄▃▃▃▃▃▃▃▃▃▂▂▂▂▂▂ ▃
  76.8 ns        Histogram: frequency by time         112 ns <

 Memory estimate: 64 bytes, allocs estimate: 1.

julia> run(b2, evals=1000)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  108.310 ns … 324.235 μs  ┊ GC (min … max): 0.00% … 67.75%
 Time  (median):     194.815 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   287.768 ns ± 5.343 μs    ┊ GC (mean ± σ):  24.01% ± 1.35%

   ▄                                          ▁▆█▆▆▆▇▄
  ▄█▃▃▄▃▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▂▁▂▁▁▁▂▂▃▄▃▄▅▄▃▄▄▄▄▅████████▇▇▅▄▃▃▂▂▂▂▂ ▄
  108 ns         Histogram: frequency by time         225 ns <

 Memory estimate: 256 bytes, allocs estimate: 7.

julia> run(b3, evals=1000)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  78.538 ns … 379.233 μs  ┊ GC (min … max): 0.00% … 55.86%
 Time  (median):     100.804 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   139.085 ns ± 3.791 μs   ┊ GC (mean ± σ):  15.23% ± 0.56%

                                 ▃▆█▄
  ▂▄▄▄▂▂▂▂▂▂▂▂▂▂▂▁▂▁▁▂▂▂▁▁▂▁▂▂▂▂▃▄██████▅▅▄▄▄▃▃▃▂▃▃▃▄▄▃▂▂▂▂▂▂▂▂ ▃
  78.5 ns        Histogram: frequency by time         117 ns <

 Memory estimate: 64 bytes, allocs estimate: 1.
```
So now versions 1 and 3 are more or less the same; in particular, version 3 no longer does any extra allocations. The second version is still slightly slower. Note also that version 1 is slightly faster in 1.9.2 than in 1.10-beta; I'm not sure why this is the case (can you reproduce it?).

My conclusion from this is that it is maybe not the best idea to write a generic multi argument version, but to stick with manual implementations for 3, 4 and 5 arguments. This is also what BigFloat does.
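The manual small-arity approach could be generated with an `@eval` loop over arities. Below is a self-contained sketch: `mul!` is a stand-in in-place multiply on `Ref`s and `mul_n` an illustrative name; neither is Arblib's actual API:

```julia
# Stand-in in-place multiply on Refs (hypothetical, not Arblib's mul!).
mul!(z::Ref, x::Ref, y::Ref) = (z[] = x[] * y[]; z)

# Base case: allocate once for two arguments.
mul_n(a::Ref, b::Ref) = Ref(a[] * b[])

# Generate explicit 3-, 4- and 5-argument methods instead of a Vararg one.
for n in 3:5
    args = Symbol.('a' .+ (0:n-1))  # :a, :b, :c, ...
    body = [:(mul!(z, z, $(args[i]))) for i in 3:n]
    @eval function mul_n($(args...))
        z = mul_n($(args[1]), $(args[2]))
        $(body...)
        return z
    end
end
```

Each generated method allocates exactly once (in the two-argument base case) and accumulates the remaining factors in place, avoiding the Vararg specialization question entirely.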
I think this is an inlining issue; on Julia 1.9.2, with

```julia
mul_test_3(a::Arb, b::Arb) = a * b
function mul_test_3(a::Arb, b::Arb, c::Arb, rest::Vararg{Arb})
    z = mul_test_3(b, c, rest...)
    Arblib.mul!(z, a, z)
    return z
end

mul_test_4(a::Arb, b::Arb) = a * b
@inline function mul_test_4(a::Arb, b::Arb, c::Arb, rest::Vararg{Arb})
    z = mul_test_4(b, c, rest...)
    Arblib.mul!(z, a, z)
    return z
end

a = Arb(1)
b3 = @benchmarkable mul_test_3($a, $a, $a, $a, $a)
b4 = @benchmarkable mul_test_4($a, $a, $a, $a, $a)
run(b3, evals=1000)
run(b4, evals=1000)
```
I get

```julia
julia> run(b3, evals=1000)
BenchmarkTools.Trial: 5755 samples with 1000 evaluations.
 Range (min … max):  668.101 ns … 16.772 μs  ┊ GC (min … max): 0.00% … 74.94%
 Time  (median):     695.130 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   852.314 ns ± 1.251 μs   ┊ GC (mean ± σ):  11.42% ± 7.25%

  █                                                          ▁
  █▇█▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▆▆ █
  668 ns       Histogram: log(frequency) by time       10.6 μs <

 Memory estimate: 656 bytes, allocs estimate: 18.

julia> run(b4, evals=1000)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  88.698 ns … 87.804 μs  ┊ GC (min … max): 0.00% … 59.67%
 Time  (median):     91.143 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   202.562 ns ± 2.812 μs  ┊ GC (mean ± σ):  28.17% ± 2.02%

  ▃██▆▄▂ ▁▁▂▃▂▂▁▁                                             ▂
  ███████▆▅▆▆▄▆██▇█▇▇▇▅▆▆▆▆▆▅▅▇▇▆▅▅▄▅▆▅▄▄▅▆▆▇██████████▇▆▆▇▆▇▇ █
  88.7 ns       Histogram: log(frequency) by time       150 ns <

 Memory estimate: 64 bytes, allocs estimate: 1.
```
I tried really hard to get a …

We can possibly add a …
Force-pushed from 4f600c0 to 9a10de8
I added optimized implementations of arithmetic operations with … I'll merge it sometime tomorrow in case you have any final comments on the last change!
Force-pushed from 730f0b0 to dc93618
This is a slightly larger change than the previous PR. For some time I have wanted to split the `arithmetic.jl` file into several, more focused, files. I here split the file into three parts:

- `arithmetic.jl`: contains +, -, *, / and ^ as well as sign, abs, real, imag and conj.
- `elementary.jl`: contains elementary functions such as exp, sin and cos.
- `minmax.jl`: contains min, max, minmax, minimum, maximum and extrema.

The test file is also split in the same way.

For `arithmetic.jl` and `minmax.jl` I also add some more methods. For `arithmetic.jl` it's

- multi argument versions of `+` and `*`
- `fma` and `muladd`
- `cube`
- `signbit`

For `minmax.jl` it adds multi argument versions of `min` and `max`.

Most of these are straightforward additions. There is however one thing I'm not sure how to handle: precision for multi argument methods. For unary methods we in general preserve the precision of the input, and binary methods take the maximum of the precisions of the two inputs. However, for methods with more inputs it gets a bit clunky to take the precision of all the inputs into account. For now I have opted to use the precision of the first argument, or the maximum of the precisions of the first two arguments, as the precision for the multi argument methods. I'm however not sure if this is the best thing to do. The affected methods are
- `fma(x, y, z)` and `muladd(x, y, z)`
- `+(a, b, c)`, `+(a, b, c, d)`, `+(a, b, c, d, e)`
- `*(a, b, c)`, `*(a, b, c, d)`, `*(a, b, c, d, e)`
- `min(a, b, c)`, `min(a, b, c, d)`, `min(a, b, c, d, e)`
- `max(a, b, c)`, `max(a, b, c, d)`, `max(a, b, c, d, e)`
Note that this slightly changes the behavior from earlier, where for example `+(a, b, c)` would be computed as `((a + b) + c)` and each addition would take the maximum of the precisions of its two arguments. Now the maximum of the precisions of `a` and `b` is used. I'm open to comments on how to handle this!

In general I have realized it is quite annoying to handle precision correctly in all cases. I'm now more okay with, in most cases, assuming all inputs have the same precision and that this is the same as the global precision. We have very few tests that check that the precision is preserved as expected anyway.
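The behavior change described above can be illustrated with a stand-in type `P` that only tracks precision (hypothetical, not Arblib's types); the old pairwise fold propagates the overall maximum, while the new rule only considers the first two arguments:

```julia
# `P` is a hypothetical stand-in carrying only a precision field.
struct P
    prec::Int
end

# Old behavior: ((a + b) + c), each step taking the max of its two operands,
# so the result ends up with the max precision over all three arguments.
old_plus(a::P, b::P, c::P) = P(max(max(a.prec, b.prec), c.prec))

# New behavior: the result precision is the max of the first two arguments.
new_plus(a::P, b::P, c::P) = P(max(a.prec, b.prec))
```

For arguments with precisions 64, 128 and 256 the old rule yields a 256-bit result while the new rule yields 128 bits, which is exactly the discrepancy being discussed.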