broadcast of short-circuiting boolean operator || generates runtime dispatch and per-element allocations #54141
Done. |
Fascinating; I was about to punt this to StaticArrays, but neither
Interestingly all of |
JET is clean on the OP in 1.11.0-beta1. It makes sense that the broadcasted

The allocations are unexpected and remain present even in 1.11.0-beta1. Unexpected allocations are typically worth investigating, and in a task like this, they are also likely the bottleneck.

Aside: IMO it is more idiomatic to define a complex scalar function and broadcast that than to define a complex broadcast, but this does not make the issue you found invalid.

    g_bitwise(a, b) = (first(a) > 0) | (norm(b) != 0)
    g_bitwise.(ab.a, ab.b)

I concur that flatten is an issue here. I suspect specialization/compiler heuristics are hurting this:

    julia> using Chairmarks, StaticArrays, LinearAlgebra; using .Broadcast: instantiate, broadcasted, Broadcasted, oror

    julia> bc = Base.Broadcast.flatten(broadcasted(>, broadcasted(norm, zeros(SVector{3, Float64}, 1000000)), 0))
    Broadcasted(#12, (SVector{3, Float64}[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0] … [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], 0))

    julia> inp = first.(bc.args)
    ([0.0, 0.0, 0.0], 0)

    julia> function slow(bc)
               bcf = Base.Broadcast.flatten(bc)
               broadcasted((args...) -> bcf.f(args...), bcf.args...)
           end
    slow (generic function with 1 method)

    julia> function fast(bc)
               bcf = Base.Broadcast.flatten(bc)
               broadcasted(bcf.f, bcf.args...)
           end
    fast (generic function with 1 method)

    julia> @b slow(bc) _.f($inp...)
    20.280 ns (2 allocs: 64 bytes)

    julia> @b fast(bc) _.f($inp...)
    2.840 ns
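As a rough illustration of the "define a scalar function and broadcast it" suggestion above, here is a minimal sketch. It assumes plain tuples as stand-ins for the `SVector` elements so that only the standard library is needed; the data in `ab` and the name `g_shortcircuit` are made up for this example and are not taken from the original report.

```julia
using LinearAlgebra

# Hypothetical stand-in data: tuples instead of SVector, so no extra package is required.
ab = (a = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
      b = [(0.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 0.0)])

# Scalar predicates. The short-circuit lives inside the scalar function,
# so the broadcast only sees a single ordinary function call per element.
g_shortcircuit(a, b) = (first(a) > 0) || (norm(b) != 0)  # hypothetical name
g_bitwise(a, b) = (first(a) > 0) | (norm(b) != 0)

r_short = g_shortcircuit.(ab.a, ab.b)
r_bit = g_bitwise.(ab.a, ab.b)
```

Both calls should produce the same `BitVector`; whether the scalar form also avoids the allocations on a given Julia version would need to be measured, for example with `@b` as in the session above.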
This fixes the allocations and major runtime discrepancies:

    julia> @eval Base.Broadcast function broadcasted(::OrOr, a, bc::Broadcasted)
               bcf = flatten(bc)
               broadcasted(((a, args::Vararg{Any, N}) where {N}) -> a || bcf.f(args...), a, bcf.args...)
           end

See also: https://docs.julialang.org/en/v1/manual/performance-tips/#Be-aware-of-when-Julia-avoids-specializing
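To isolate the pattern used in that fix, here is a small sketch outside of `Base.Broadcast`. The helpers `wrap_splat` and `wrap_vararg` are hypothetical names introduced only for illustration. The assumption, based on the linked performance tip, is that a plain `args...` that is merely passed through may not be specialized on, while an explicit `Vararg{Any, N}` with a `where {N}` parameter requests specialization.

```julia
# Hypothetical helpers, not part of Base.Broadcast.

# Plain splat: the arguments are only passed through to `f`, so Julia's
# heuristics may choose not to specialize the closure on them.
wrap_splat(f) = (a, args...) -> a || f(args...)

# Explicit Vararg{Any, N} with a `where {N}` parameter: same behavior,
# but the method signature asks the compiler to specialize on the argument count.
wrap_vararg(f) = ((a, args::Vararg{Any, N}) where {N}) -> a || f(args...)

gt(x, y) = x > y  # hypothetical example predicate

wrap_splat(gt)(false, 2, 1)   # evaluates gt(2, 1)
wrap_vararg(gt)(false, 1, 2)  # evaluates gt(1, 2)
wrap_vararg(gt)(true, 1, 2)   # short-circuits; gt is not called
```

The two wrappers are semantically identical; any difference would only show up in generated code and timings, which this sketch does not measure.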
Creating a BitVector from a slightly non-trivial expression using the short-circuiting boolean operator `.||` results in runtime dispatch, per-element allocations, and worse performance than the equivalent expression using the bitwise operator `.|`. The arrays are both in a named tuple and of SVector element type, and LinearAlgebra.norm is called on the right-hand side.

Here is the input code only: