-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Couple of tuple speed-ups and fixes #20720
Conversation
base/tuple.jl
Outdated
@@ -160,7 +160,8 @@ heads(t::Tuple, ts::Tuple...) = (t[1], heads(ts...)...) | |||
tails() = () | |||
tails(t::Tuple, ts::Tuple...) = (tail(t), tails(ts...)...) | |||
map(f, ::Tuple{}, ts::Tuple...) = () |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tangentially, this method seems incongruous with the spirit of map
?
julia> map(+, (), (1,2))
()
julia> map(+, [], [1,2])
ERROR: DimensionMismatch("dimensions must match")
Stacktrace:
[1] promote_shape(::Tuple{Base.OneTo{Int64}}, ::Tuple{Base.OneTo{Int64}}) at ./indices.jl:79
[2] _array_for(::Type{Any}, ::Base.Iterators.Zip2{Array{Any,1},Array{Int64,1}}, ::Base.HasShape) at ./array.jl:406
[3] collect(::Base.Generator{Base.Iterators.Zip2{Array{Any,1},Array{Int64,1}},Base.##3#4{Base.#+}}) at ./array.jl:416
[4] map(::Function, ::Array{Any,1}, ::Array{Int64,1}) at ./abstractarray.jl:1888
base/tuple.jl
Outdated
@@ -160,7 +160,8 @@ heads(t::Tuple, ts::Tuple...) = (t[1], heads(ts...)...) | |||
tails() = () | |||
tails(t::Tuple, ts::Tuple...) = (tail(t), tails(ts...)...) | |||
map(f, ::Tuple{}, ts::Tuple...) = () | |||
map(f, t1::Tuple, t2::Tuple, ts::Tuple...) = (f(heads(t1, t2, ts...)...), map(f, tails(t1, t2, ts...)...)...) | |||
map(f, t1::Tuple, t2::Tuple, ts::Tuple...) = (@_inline_meta; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this definition not result in arbitrarily deep inlining in the same manner that the two argument form does without this additional definition that limits inlining?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, I looked at this backwards, is heads
and tails
what should be fine to inline. I'll try another solution.
25dc48e
to
5625cb5
Compare
Updated to avoid code blowup for large tuples. This should also fix #20723 now. Unlike
|
@@ -132,7 +132,7 @@ All16{T,N} = Tuple{T,T,T,T,T,T,T,T, | |||
T,T,T,T,T,T,T,T,Vararg{T,N}} | |||
function map(f, t::Any16) | |||
n = length(t) | |||
A = Array{Any}(n) | |||
A = Array{Any,1}(n) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps Vector
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't use Vector
here because of bootstrapping.
@@ -148,19 +148,22 @@ function map(f, t::Tuple, s::Tuple) | |||
end | |||
function map(f, t::Any16, s::Any16) | |||
n = length(t) | |||
A = Array{Any}(n) | |||
A = Array{Any,1}(n) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise here, perhaps Vector
?
base/tuple.jl
Outdated
map(f, t1::Tuple, t2::Tuple, ts::Tuple...) = (f(heads(t1, t2, ts...)...), map(f, tails(t1, t2, ts...)...)...) | ||
tails(t::Tuple, ts::Tuple...) = (@_inline_meta; (tail(t), tails(ts...)...)) | ||
map(f, ::Tuple{}...) = () | ||
map(f, t::Tuple, ts::Tuple...) = (@_inline_meta; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these definitions exceed the line-length recommendation? If not, perhaps condense these definitions to single lines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is 92 characters, right? (I don't recall exactly) If so, then indeed this exceeds the line-length recommendation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I see the first definition exceeds the limit by a few characters though the second does not; I assume you split the second definition for consistency, so cheers :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these should be written as long-form methods if they need to split across multiple lines
base/tuple.jl
Outdated
@@ -20,7 +20,7 @@ endof(t::Tuple) = length(t) | |||
size(t::Tuple, d) = d==1 ? length(t) : throw(ArgumentError("invalid tuple dimension $d")) | |||
getindex(t::Tuple, i::Int) = getfield(t, i) | |||
getindex(t::Tuple, i::Real) = getfield(t, convert(Int, i)) | |||
getindex(t::Tuple, r::AbstractArray{<:Any,1}) = tuple([t[ri] for ri in r]...) | |||
getindex(t::Tuple, r::AbstractArray{<:Any,1}) = (n = length(eachindex(r)); ntuple(i->t[r[i]], n)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain this change's purpose? Could indexing r
with indices 1:n
(rather than iterating over r
's elements, or indexing r
more genernically) be an issue? If not, why length(eachindex(r))
rather than simply length(r)
? Does using ntuple
here not potentially hurt inference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could indexing
r
with indices1:n
be an issue?
I feel like we are slowly moving to support non traditional indexing, to handle things like, e.g., OffsetArrays
. That's why I wrote it like this.
Does using
ntuple
here not potentially hurt inference?
This is already non-inferable so it doesn't hurt changing it to make it faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like we are slowly moving to support non traditional indexing, to handle things like, e.g., OffsetArrays. That's why I wrote it like this.
Exactly :). The proposed definition indexes r
from 1:n
, as i
iterates over 1:n
in ntuple
. So if indexing r
from 1:n
is not correct, the proposed definition will not be correct.
This is already non-inferable so it doesn't hurt changing it to make it faster.
Consider
julia> foo(t::Tuple, r::AbstractArray{<:Any,1}) = tuple([t[ri] for ri in r]...)
foo (generic function with 1 method)
julia> bar(t::Tuple, r::AbstractArray{<:Any,1}) = (n = length(eachindex(r)); ntuple(i->t[r[i]], n))
bar (generic function with 1 method)
julia> @code_warntype foo((1,2,3), [1])
Variables:
#self#::#foo
t::Tuple{Int64,Int64,Int64}
r::Array{Int64,1}
#1::##1#2{Tuple{Int64,Int64,Int64}}
Body:
begin
#1::##1#2{Tuple{Int64,Int64,Int64}} = $(Expr(:new, :($(QuoteNode(##1#2{Tuple{Int64,Int64,Int64}}))), :(t)))
SSAValue(0) = #1::##1#2{Tuple{Int64,Int64,Int64}}
SSAValue(1) = $(Expr(:new, :($(QuoteNode(Base.Generator{Array{Int64,1},##1#2{Tuple{Int64,Int64,Int64}}}))), SSAValue(0), :(r)))
SSAValue(2) = $(Expr(:invoke, MethodInstance for collect(::Base.Generator{Array{Int64,1},##1#2{Tuple{Int64,Int64,Int64}}}), :(Base.collect), SSAValue(1)))
return (Core._apply)(Main.tuple, SSAValue(2))::Tuple{Vararg{Int64,N} where N}
end::julia> foo(t::Tuple, r::AbstractArray{<:Any,1}) = tuple([t[ri] for ri in r]...)
foo (generic function with 1 method)
julia> bar(t::Tuple, r::AbstractArray{<:Any,1}) = (n = length(eachindex(r)); ntuple(i->t[r[i]], n))
bar (generic function with 1 method)
julia> @code_warntype foo((1,2,3), [1])
Variables:
#self#::#foo
t::Tuple{Int64,Int64,Int64}
r::Array{Int64,1}
#1::##1#2{Tuple{Int64,Int64,Int64}}
Body:
begin
#1::##1#2{Tuple{Int64,Int64,Int64}} = $(Expr(:new, :($(QuoteNode(##1#2{Tuple{Int64,Int64,Int64}}))), :(t)))
SSAValue(0) = #1::##1#2{Tuple{Int64,Int64,Int64}}
SSAValue(1) = $(Expr(:new, :($(QuoteNode(Base.Generator{Array{Int64,1},##1#2{Tuple{Int64,Int64,Int64}}}))), SSAValue(0), :(r)))
SSAValue(2) = $(Expr(:invoke, MethodInstance for collect(::Base.Generator{Array{Int64,1},##1#2{Tuple{Int64,Int64,Int64}}}), :(Base.collect), SSAValue(1)))
return (Core._apply)(Main.tuple, SSAValue(2))::Tuple{Vararg{Int64,N} where N}
end::Tuple{Vararg{Int64,N} where N}
julia> @code_warntype bar((1,2,3), [1])
Variables:
#self#::#bar
t::Tuple{Int64,Int64,Int64}
r::Array{Int64,1}
n::Int64
#3::##3#4{Tuple{Int64,Int64,Int64},Array{Int64,1}}
Body:
begin
$(Expr(:inbounds, false))
# meta: location abstractarray.jl eachindex 744
# meta: location abstractarray.jl indices1 63
# meta: location abstractarray.jl indices 56
SSAValue(3) = (Base.arraysize)(r::Array{Int64,1}, 1)::Int64
# meta: location tuple.jl map 124
SSAValue(2) = SSAValue(3)
# meta: pop location
# meta: pop location
# meta: pop location
# meta: pop location
$(Expr(:inbounds, :pop))
n::Int64 = (Core.typeassert)((Base.select_value)((Base.slt_int)(SSAValue(2), 0)::Bool, 0, SSAValue(2))::Int64, Int64)::Int64
#3::##3#4{Tuple{Int64,Int64,Int64},Array{Int64,1}} = $(Expr(:new, :($(QuoteNode(##3#4{Tuple{Int64,Int64,Int64},Array{Int64,1}}))), :(t), :(r)))
SSAValue(0) = #3::##3#4{Tuple{Int64,Int64,Int64},Array{Int64,1}}
return $(Expr(:invoke, MethodInstance for ntuple(::##3#4{Tuple{Int64,Int64,Int64},Array{Int64,1}}, ::Int64), :(Main.ntuple), SSAValue(0), :(n)))
end::Tuple
julia>
julia> @code_warntype bar((1,2,3), [1])
Variables:
#self#::#bar
t::Tuple{Int64,Int64,Int64}
r::Array{Int64,1}
n::Int64
#3::##3#4{Tuple{Int64,Int64,Int64},Array{Int64,1}}
Body:
begin
$(Expr(:inbounds, false))
# meta: location abstractarray.jl eachindex 744
# meta: location abstractarray.jl indices1 63
# meta: location abstractarray.jl indices 56
SSAValue(3) = (Base.arraysize)(r::Array{Int64,1}, 1)::Int64
# meta: location tuple.jl map 124
SSAValue(2) = SSAValue(3)
# meta: pop location
# meta: pop location
# meta: pop location
# meta: pop location
$(Expr(:inbounds, :pop))
n::Int64 = (Core.typeassert)((Base.select_value)((Base.slt_int)(SSAValue(2), 0)::Bool, 0, SSAValue(2))::Int64, Int64)::Int64
#3::##3#4{Tuple{Int64,Int64,Int64},Array{Int64,1}} = $(Expr(:new, :($(QuoteNode(##3#4{Tuple{Int64,Int64,Int64},Array{Int64,1}}))), :(t), :(r)))
SSAValue(0) = #3::##3#4{Tuple{Int64,Int64,Int64},Array{Int64,1}}
return $(Expr(:invoke, MethodInstance for ntuple(::##3#4{Tuple{Int64,Int64,Int64},Array{Int64,1}}, ::Int64), :(Main.ntuple), SSAValue(0), :(n)))
end::Tuple
The existing definition yields Tuple{Vararg{Int64,N} where N}
whereas the proposed definition yields only Tuple
; some information is lost?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somehow I forgot that eachindex
does not necessarily return an iterable that can be indexed with UnitRange
s. I updated the code with an alternative.
Re. inferability: I see that we'd loose some information, but I'd argue that when working with tuples what we most commonly want to infer is the length, so loosing the eltype
here is not that bad, if it buy us some speed. Though, I could be convinced otherwise.
5625cb5
to
aaa7ec9
Compare
@nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels |
Those |
aaa7ec9
to
7d7373a
Compare
base/tuple.jl
Outdated
function getindex(t::Tuple, r::AbstractArray{<:Any,1}) | ||
s = Ref(start(r)) | ||
ntuple(j -> ((i, s[]) = next(r, s[]); t[i]), length(r)) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change seems orthogonal to the other changes in this pull request. Perhaps split this change into a separate pull request for additional discussion and to attract additional review eyes? (I remain concerned about losing eltype
information with this change, and see no reason that that information is not important in general.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the changes to ntuple
, the eltype
for this would be inferred. But I'll probably left this aside anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting :). Out of curiosity, which of the simultaneous ntuple
changes make the new getindex
version's eltype
inferable? (The generator splat to array splat change?)
If this version does not lose eltype
inferability, then cheers form this end. (Though having another pair of eyes on these changes would still be worthwhile.) Best!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out that inference works different when you directly return something like c1 ? r1 : c2 ? r2 : c3 ? r3 ...
to when you assign that to a variable and then return the variable. So basically what improved things was assigning the tuple before returning it.
I won't leave this version as is as it messes inference when using this function inside macros in some circumstances (because of the closure). But I have a slightly different version that doesn't have that limitation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice find! Inference's sensitivity to such things is somewhat disconcerting. Best!
base/tuple.jl
Outdated
return t | ||
end | ||
|
||
_ntuple(f::Function, n::Integer) = (@_noinline_meta; ([f(i) for i = 1:n]...)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why switch from splatting a generator to splatting an array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the time being, is faster than using the generator.
n == 7 ? (f(1), f(2), f(3), f(4), f(5), f(6), f(7)) : | ||
n == 8 ? (f(1), f(2), f(3), f(4), f(5), f(6), f(7), f(8)) : | ||
n == 9 ? (f(1), f(2), f(3), f(4), f(5), f(6), f(7), f(8), f(9)) : | ||
n == 10 ? (f(1), f(2), f(3), f(4), f(5), f(6), f(7), f(8), f(9), f(10)) : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this change impact behavior for 10 < n < 16
? (Out of curiosity, what motivates this change?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is faster in some cases or at worst has equal performance than the existing implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting that LLVM doesn't reduce this to a calculated-jump. If you care enough to try, would a binary tree implementation be better? (It's less nice from a code-simplicity standpoint, so don't feel you have to do this.)
@nanosoldier |
7a711b7
to
cbc0bc9
Compare
@nanosoldier |
3ad3c51
to
26e8fed
Compare
@nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels |
Once more to see if anything is noise @nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
base/range.jl
Outdated
@@ -151,6 +151,10 @@ unitrange_last{T}(start::T, stop::T) = | |||
ifelse(stop >= start, convert(T,start+floor(stop-start)), | |||
convert(T,start-oneunit(stop-start))) | |||
|
|||
if isdefined(Main, :Base) | |||
getindex(t::Tuple, r::UnitRange) = (o = r.start - 1; ntuple(n -> t[o + n], length(r))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe AbstractUnitRange
? And use first(r)
rather than r.start
?
n == 7 ? (f(1), f(2), f(3), f(4), f(5), f(6), f(7)) : | ||
n == 8 ? (f(1), f(2), f(3), f(4), f(5), f(6), f(7), f(8)) : | ||
n == 9 ? (f(1), f(2), f(3), f(4), f(5), f(6), f(7), f(8), f(9)) : | ||
n == 10 ? (f(1), f(2), f(3), f(4), f(5), f(6), f(7), f(8), f(9), f(10)) : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting that LLVM doesn't reduce this to a calculated-jump. If you care enough to try, would a binary tree implementation be better? (It's less nice from a code-simplicity standpoint, so don't feel you have to do this.)
map(f, t1::Tuple, t2::Tuple, ts::Tuple...) = (f(heads(t1, t2, ts...)...), map(f, tails(t1, t2, ts...)...)...) | ||
heads(ts::Tuple...) = map(t -> t[1], ts) | ||
tails(ts::Tuple...) = map(tail, ts) | ||
map(f, ::Tuple{}...) = () |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a breaking change because of this:
julia> map(+, (), (3,), (4,))
()
Consequently, this PR will have to wait until 1.0-dev.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should that be a dimension mismatch error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought this was a bug. X-ref: #20723
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really like that behavior (I like this better), but I'd be surprised if there isn't code out there that counts on the current behavior. So even though I'd call this an improvement, I think it's dangerous to introduce a behavior change at this stage of the release process unless there's reason (1) to be certain it's a bug, or (2) be almost certain no one had been using this. But don't take that as official policy, it's just my viewpoint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we had an alpha tag, I plan on running a pkgeval comparison before tagging a beta, so we could find out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of these are flaky tolerances or timeouts, but AxisArrays failure looks related https://gist.github.com/2290f8d80dea51c9dc4aa3b4de59daf2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we merge this then, or should we wait until a new version of AxisArrays is tagged?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@timholy what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Go for it. I just tagged a new release.
It does if I remove the assignment and return directly, but then inference isn't able to at least figure out the
For now I'd like to keep the code simplicity and leave other approaches for another occasion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for sticking with this @pabloferz! :)
Should fix #20718