
Refactor API for unconventionally-indexed arrays #17137

Merged (15 commits) on Jul 8, 2016

Conversation

@timholy (Sponsor Member, Author) commented Jun 27, 2016

This is effectively "cleanup" from #16260. It gets rid of shape and renames allocate_for to similar. It achieves these by introducing a new Range type, OneTo(n), which is equivalent (in a value sense) to 1:n. These are changes recommended by @JeffBezanson. I'm hoping to avoid having OneTo "leak" out into user space, so it's not exported, but I suspect we may need to export it at some point (let's first see how far we can get without doing that).
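For illustration, here is roughly how such a one-based range behaves at the REPL (a hedged sketch; OneTo is unexported, and the exact printing may differ across Julia versions):

julia> r = Base.OneTo(5)
Base.OneTo(5)

julia> r == 1:5              # value-equivalent to 1:5
true

julia> (first(r), last(r), length(r))
(1, 5, 5)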

Despite the name of this branch, it does not make unconventional indices safer (ref @arraysafe in #16973); it seems better to do that in a separate PR.

A key thing here is going to be to find out whether I've done something nasty for performance, so @nanosoldier runbenchmarks(ALL, vs=:master).

_similar(::IndicesBehavior, a::AbstractArray, T::Type) = similar(a, T, indices(a))
to_shape(::Tuple{}) = ()
to_shape(dims::Dims) = dims
to_shape(dims::DimsOrInds) = map(to_shape, dims)
Review comment from a Contributor:

Another idea (I don't know how convenient or cumbersome it'd be) would be to have a type Shape or Shp. That way these could be changed to convert methods, so one could write Shape(dims) and convert(Shape, dims).

@timholy (Sponsor Member, Author) replied Jun 27, 2016:


to_shape is basically a compatibility call, packaging tuples of AbstractUnitRange objects as Dims tuples when possible (i.e., when the range can be guaranteed, via the type system, to start at 1). In that sense we already have the type (alias).
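A hedged illustration of that behavior (to_shape is an unexported Base internal; output as on a later Julia version):

julia> Base.to_shape((Base.OneTo(3), Base.OneTo(4)))   # guaranteed 1-based, so collapse to lengths
(3, 4)

julia> Base.to_shape((Base.OneTo(3), 0:4))             # general range: kept as-is
(3, 0:4)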

@timholy (Sponsor Member, Author) commented Jun 27, 2016

I seem to have messed up the nanosoldier call, so let's try again after adding quotes: @nanosoldier runbenchmarks(ALL, vs=":master").

@nanosoldier (Collaborator) commented:

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@timholy (Sponsor Member, Author) commented Jun 29, 2016

I've canceled the CI here; while the runtime looks good, 1b684d5 ("eliminate the indicesbehavior trait") absolutely hammers compile time.

Master:

julia> using Base.Test

shell> cd test/linalg/
/home/tim/src/julia-0.5b/test/linalg

julia> @time include("schur.jl")
 17.479364 seconds (13.13 M allocations: 516.591 MB, 1.47% gc time)

julia> @time include("schur.jl")
  1.011176 seconds (445.60 k allocations: 23.307 MB, 1.33% gc time)

This PR:

julia> using Base.Test

shell> cd test/linalg/
/home/tim/src/julia-0.5/test/linalg

julia> @time include("schur.jl")
 44.836752 seconds (72.31 M allocations: 3.040 GB, 2.34% gc time)

julia> @time include("schur.jl")
  0.902519 seconds (447.55 k allocations: 23.480 MB, 1.02% gc time)

I'll see what I can figure out. Meanwhile, it's a nice test case for folks who like to look at compiler performance 😉 .

@timholy (Sponsor Member, Author) commented Jun 29, 2016

Wow, the ProfileView looks like cathedrals of colorful inference.jl sand, reaching towards the heavens. So pretty and spiritual, I think I'll frame it.
[ProfileView flame-graph image]
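(For reference, a profile like this can be produced along these lines; a sketch assuming ProfileView.jl is installed, and noting that on Julia 1.0+ the Profile stdlib must be loaded explicitly:)

julia> using Profile, ProfileView

julia> @profile include("schur.jl");

julia> ProfileView.view()   # opens the flame-graph window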

@timholy (Sponsor Member, Author) commented Jun 29, 2016

Interesting observation: if I comment out these loops and just manually set the values, then master and this PR have similar compile times. But set eltya = Int and leave the atype loop intact, and there's a nearly 3x difference in compile times. Also possibly-relevant, changing the for loop to a while loop,

diff --git a/test/linalg/schur.jl b/test/linalg/schur.jl
index 5370afd..02a07dc 100644
--- a/test/linalg/schur.jl
+++ b/test/linalg/schur.jl
@@ -16,11 +16,17 @@ srand(1234321)
 areal = randn(n,n)/2
 aimg  = randn(n,n)/2

-for eltya in (Float32, Float64, Complex64, Complex128, Int)
+# for eltya in (Float32, Float64, Complex64, Complex128, Int)
+eltya = Int
     a = eltya == Int ? rand(1:7, n, n) : convert(Matrix{eltya}, eltya <: Complex ? complex(areal, aimg) : areal)
     asym = a'+a                  # symmetric indefinite
     apd  = a'*a                 # symmetric positive-definite
-    for atype in ("Array", "SubArray")
+    indx = 1
+    atypes = ["Array", "SubArray"]
+    while indx <= length(atypes)
+#    for atype in ("Array", "SubArray")
+        atype = atypes[indx]
+        indx += 1
         if atype == "Array"
             a = a
         else
@@ -96,4 +102,4 @@ for eltya in (Float32, Float64, Complex64, Complex128, Int)
         @test NS[:S] ≈ sS
         @test NS[:Z] ≈ sZ
     end
-end
+# end

does not fix the performance, so this seems different from #16122.

@timholy (Sponsor Member, Author) commented Jun 29, 2016

Here's a minimal reproducer:

using Base.Test

a = rand(1:7, 10, 10)
for atype in ("Array", "SubArray")
    d,v = eig(a)
end

Master:

julia> @time include("/tmp/tim/schur.jl")
  1.740690 seconds (1.93 M allocations: 84.396 MB, 2.91% gc time)

This PR:

julia> @time include("/tmp/tim/schur.jl")
 13.267818 seconds (46.09 M allocations: 1.985 GB, 5.30% gc time)

But now comment out the for loop, and they are the same.

@JeffBezanson (Sponsor Member) commented:

Wow, interesting. What happens if you use an array instead of a tuple: for atype in ["Array", "SubArray"]?

@timholy (Sponsor Member, Author) commented Jun 29, 2016

Wow, interesting.

In the Chinese curse-sense, yes. 😄

What happens if you use an array instead of a tuple: for atype in ["Array", "SubArray"]?

Still slow.

I've debugged this a little further; one relevant point is that I can call

a = rand(1:7, 10, 10)
eig(a)

before including the file with the loop (or copy/pasting it into the REPL), and it's still dirt-slow. But I can't take the eig call out of the loop without making it fast. Finally, it's all one giant toplevel-inference call: this diff

+
+type InfRef
+    val::Bool
+end
+const debug = InfRef(false)
 function typeinf_ext(linfo::LambdaInfo)
     if isdefined(linfo, :def)
+        local tstart
+        if debug.val
+            print(linfo.def.name, " start: ", ccall(:jl_clock_now, Float64, ()))
+        end
         # method lambda - infer this specialization via the method cache
         (code, _t, inferred) = typeinf_edge(linfo.def, linfo.specTypes, linfo.sparam_vals, true, true, true, linfo)
         if inferred && code.inferred && linfo !== code
@@ -1567,13 +1576,22 @@ function typeinf_ext(linfo::LambdaInfo)
             linfo.inferred = true
             linfo.inInference = false
         end
+        if debug.val
+            print(linfo.def.name, " stop: ", ccall(:jl_clock_now, Float64, ()))
+        end
         return code
     else
         # toplevel lambda - infer directly
+        if debug.val
+            println("toplevel inference")
+        end
         linfo.inInference = true
         frame = InferenceState(linfo, true, true)
         typeinf_loop(frame)
         @assert frame.inferred # TODO: deal with this better
+        if debug.val
+            println("toplevel inference done")
+        end
         return linfo
     end
 end

gives this output (having already called eig once):

julia> for atype in ("Array", "SubArray")
           d,v = eig(a)
       end
toplevel inference
toplevel inference done
:put! start: 1.46724e+09:put! stop: 1.46724e+09:start start: 1.46724e+09:start stop: 1.46724e+09:indexed_next start: 1.46724e+09:indexed_next stop: 1.46724e+09

and all the waiting is before the "toplevel inference done".

@StirlingNewberry commented:

宁為太平犬,莫做亂离人 ("Better to be a dog in times of peace than a person in times of chaos.")

Though the actual curse is English.


@JeffBezanson (Sponsor Member) commented:

Inference seems to be encountering a large number of different signatures for _indices, all similar to

Tuple{Base.#_indices, Tuple{Base.OneTo{Int64}, Base.OneTo, Base.OneTo, Base.OneTo, Base.OneTo, Base.OneTo, Base.OneTo, Base.OneTo, Base.OneTo}, Int64, Base.SubArray{Base.Complex{Float32}, 2, A<:Union{Base.ReshapedArray{T<:Any, N<:Any, A<:DenseArray, MI<:Tuple{Vararg{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}, N<:Any}}}, DenseArray}, Tuple{Union{Base.UnitRange{Int64}, Base.Colon}, Vararg{Union{Base.AbstractCartesianIndex, Base.Range{Int64}, Int64, Base.Colon}, N<:Any}}, true}, Union{Base.AbstractCartesianIndex{N<:Any}, Base.Range{Int64}, Int64, Base.Colon}, Vararg{Union{Base.AbstractCartesianIndex{N<:Any}, Base.Range{Int64}, Int64, Base.Colon}, N<:Any}}

@timholy (Sponsor Member, Author) commented Jun 30, 2016

Surprising, because there are exactly 6 definitions in this PR, fewer than the 8 there are in current Base. But of course they're called differently. The example you showed suggests something is calling it with 9 indices...given that we're talking about matrix algebra (2 dimensions), what the heck is that from?

@JeffBezanson (Sponsor Member) commented:

It's from recursion in the inference process itself.

@timholy (Sponsor Member, Author) commented Jul 6, 2016

@nanosoldier runbenchmarks(ALL, vs = ":master").

@timholy (Sponsor Member, Author) commented Jul 6, 2016

@jrevels, OK to start nanosoldier now? (It's this PR I'm most interested in right now.)

@jrevels (Member) commented Jul 6, 2016

The daily build ran successfully, so I think it's good now:

@nanosoldier runbenchmarks(ALL, vs = ":master")

@nanosoldier (Collaborator) commented:

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@timholy (Sponsor Member, Author) commented Jul 7, 2016

@nanosoldier runbenchmarks(ALL, vs = ":master")

@nanosoldier (Collaborator) commented:

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@timholy (Sponsor Member, Author) commented Jul 7, 2016

I have a good feeling (based on local benchmarking) about this one: @nanosoldier runbenchmarks(ALL, vs = ":master").

@nanosoldier (Collaborator) commented:

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@timholy (Sponsor Member, Author) commented Jul 8, 2016

Fewer exports, cleaner API, fewer lines of code, and more performant---what's not to like? Advance warning, I'll be merging as soon as CI finishes---I don't have any reason to suspect that the one test failure (so far) has anything to do with this PR.

@timholy timholy merged commit efa23ad into master Jul 8, 2016
@timholy timholy deleted the teh/safer_indices branch July 8, 2016 09:18
@tkelman (Contributor) commented Jul 9, 2016

It looks like this broke several packages with a MethodError: no method matching DataArrays.DataArray{T,N}(::Type{Bool}, ::Tuple{Base.OneTo{Int64}}) or MethodError: no method matching Array{T,N}(::Type{Any}, ::Tuple{Base.OneTo{Int64}}), etc.; see the PkgEval results for DataArrays and others.

@tkelman (Contributor) commented Jul 10, 2016

Looks like the issue in DataArrays stems from its use of broadcast_shape, which now returns OneTo objects that packages don't know what to do with. Tying into the unexported broadcast internals isn't ideal, but I suspect DataArrays is not the only package that may have needed to do that. I'm leaning in the direction of walking back on trying to support unconventionally indexed arrays for 0.5, since we really would like a package ecosystem that at least half works as it does on 0.4 by the time the final 0.5.0 is done.

@timholy (Sponsor Member, Author) commented Jul 10, 2016

If they need a quick-fix, they should be able to just call Base.to_shape on the output of broadcast_shape. That said, just waiting for some consensus might be the better option.
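A hedged sketch of that quick-fix (both functions are unexported Base internals, the exact home and signature of broadcast_shape vary across Julia versions, and a and b are placeholder arrays):

# Assumes the Julia 0.5-era internals named in this thread.
shp  = Base.Broadcast.broadcast_shape(a, b)  # may contain Base.OneTo ranges
dims = Base.to_shape(shp)                    # plain (Int...) dims tuple
out  = similar(a, Bool, dims)                # allocate using conventional dims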

@tkelman (Contributor) commented Jul 11, 2016

I'm stuck at JuliaStats/DataArrays.jl#205. Help appreciated.
