-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make map() and broadcast() faster for Union{T, Missing/Nothing} element types #25828
Conversation
Storing typeof(el)/eltype(B) in a variable in the hot part of the loop kills performance because despite being inferred as equal to Union{Type{T}, Type{Missing/Nothing}}, it is marked as ::Any. Only using typeof()/eltype() when isa() fails works around the problem and makes map/broadcast(x->x, ::Vector{Union{Int, Missing}}) about 10 times faster.
@@ -573,12 +573,11 @@ function collect_to!(dest::AbstractArray{T}, itr, offs, st) where T | |||
i = offs | |||
while !done(itr, st) | |||
el, st = next(itr, st) | |||
S = typeof(el) | |||
if S === T || S <: T | |||
if el isa T || typeof(el) === T |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The need for typeof(el) === T
is a bit of a mystery, but it was weird to need it in the existing code already. See #25799.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume that was some attempt at providing a fast-path for uninferred code. But S === T
is often much harder to infer than S <: T
. And since isconcretetype(S)
, it won't be faster at runtime either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yet if I remove that check I get a worse performance than before with map(identity, y)
(see what I noted at #25799). That's really weird, but...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keno recently changed the result of el isa T
from Const(true)
to Conditional(:el, T, Union{})
– maybe there's was a mistake in the implementation of that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where's the relevant commit? Can I try reverting it?
# store the result | ||
if S <: eltype(B) | ||
if V isa eltype(B) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
This syntax allows the compiler to see / "prove" that S <: T
if this branch is taken, and thus likely eliminate dispatch in the assignment below
@@ -573,12 +573,11 @@ function collect_to!(dest::AbstractArray{T}, itr, offs, st) where T | |||
i = offs | |||
while !done(itr, st) | |||
el, st = next(itr, st) | |||
S = typeof(el) | |||
if S === T || S <: T | |||
if el isa T || typeof(el) === T | |||
@inbounds dest[i] = el::T |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you remove the typeof(el) === T
condition above, inference will be able to infer the el::T
result and you can remove this redundant type-check
@nanosoldier |
That was after upgrading to the more recent HTTP version, which was supposed to help the intermittent ignoring, but instead it segfaults... So I've pinned at an older HTTP version for now. @nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan |
Those are some nice improvements! |
Most of these conditions were introduced in #25828 and #30480 for some performance reasons atm, but now they seem just unnecessary or even harmful in terms of inferrability. There doesn't seem to be any performance difference in the benchmark used at #25828: ```julia using BenchmarkTools x = rand(Int, 100_000); y = convert(Vector{Union{Int,Missing}}, x); z = copy(y); z[2] = missing; ``` > master: ```julia julia> @Btime map(identity, x); 57.814 μs (3 allocations: 781.31 KiB) julia> @Btime map(identity, y); 94.040 μs (3 allocations: 781.31 KiB) julia> @Btime map(identity, z); 127.554 μs (5 allocations: 1.62 MiB) julia> @Btime broadcast(x->x, x); 59.248 μs (2 allocations: 781.30 KiB) julia> @Btime broadcast(x->x, y); 74.693 μs (2 allocations: 781.30 KiB) julia> @Btime broadcast(x->x, z); 126.262 μs (4 allocations: 1.62 MiB) ``` > this commit: ``` julia> @Btime map(identity, x); 58.668 μs (3 allocations: 781.31 KiB) julia> @Btime map(identity, y); 94.013 μs (3 allocations: 781.31 KiB) julia> @Btime map(identity, z); 126.600 μs (5 allocations: 1.62 MiB) julia> @Btime broadcast(x->x, x); 57.531 μs (2 allocations: 781.30 KiB) julia> @Btime broadcast(x->x, y); 69.561 μs (2 allocations: 781.30 KiB) julia> @Btime broadcast(x->x, z); 125.578 μs (4 allocations: 1.62 MiB) ```
Most of these conditions were introduced in #25828 and #30480 for some performance reasons atm, but now they seem just unnecessary or even harmful in terms of inferrability. There doesn't seem to be any performance difference in the benchmark used at #25828: ```julia using BenchmarkTools x = rand(Int, 100_000); y = convert(Vector{Union{Int,Missing}}, x); z = copy(y); z[2] = missing; ``` > master: ```julia julia> @Btime map(identity, x); 57.814 μs (3 allocations: 781.31 KiB) julia> @Btime map(identity, y); 94.040 μs (3 allocations: 781.31 KiB) julia> @Btime map(identity, z); 127.554 μs (5 allocations: 1.62 MiB) julia> @Btime broadcast(x->x, x); 59.248 μs (2 allocations: 781.30 KiB) julia> @Btime broadcast(x->x, y); 74.693 μs (2 allocations: 781.30 KiB) julia> @Btime broadcast(x->x, z); 126.262 μs (4 allocations: 1.62 MiB) ``` > this commit: ```julia julia> @Btime map(identity, x); 58.668 μs (3 allocations: 781.31 KiB) julia> @Btime map(identity, y); 94.013 μs (3 allocations: 781.31 KiB) julia> @Btime map(identity, z); 126.600 μs (5 allocations: 1.62 MiB) julia> @Btime broadcast(x->x, x); 57.531 μs (2 allocations: 781.30 KiB) julia> @Btime broadcast(x->x, y); 69.561 μs (2 allocations: 781.30 KiB) julia> @Btime broadcast(x->x, z); 125.578 μs (4 allocations: 1.62 MiB) ```
Storing
typeof(el)
/eltype(B)
in a variable in the hot part of the loop kills performance because despite being inferred as equal toUnion{Type{T}, Type{Missing/Nothing}}
, it is marked as ::Any. Only usingtypeof
/eltype
whenisa
fails works around the problem and makesmap/broadcast(x->x, ::Vector{Union{Int, Missing}})
about 10 times faster.Fixes #25799.
Before this PR:
After this PR:
I will add these benchmarks to BaseBenchmarks to ensure we don't regress. EDIT: see JuliaCI/BaseBenchmarks.jl#176.
EDIT: FWIW,
DataArray
takes 11ms withmap
in the benchmarks above, but only 300μs withbroadcast
. R (with C implementation) takes about 400μs (measured withx^2
since there's no vectorized identity operation, but timings are similar toidentity
in Julia). So we're not far.