
Revise checkbounds again #17355

Merged: 6 commits merged into master, Jul 14, 2016

Conversation

@timholy commented Jul 9, 2016

checkbounds was revised in #17137 and #17340, but the latter left a few dangling uncalled _chkbnds in multidimensional.jl. Here's another proposed revision that will hopefully be performant (I haven't had time to test yet) and also provides a cleaner & more general API. As a bonus, this version is the first (to my knowledge) to support arrays of CartesianIndexes, which will be important if we ever decide to make find* return such arrays. Finally, lots of tests have been added.

CC @JeffBezanson, @mbauman, @andreasnoack (ref JuliaParallel/DistributedArrays.jl#74).

@timholy commented Jul 9, 2016

Might as well let @nanosoldier runbenchmarks(ALL, vs = ":master") do most of the work.

@tkelman commented Jul 9, 2016

Does this help resolve the method errors for OneTo seen in several packages? Wasn't that intended to be mostly an internal detail? It looks like that's more of a broadcast issue than a bounds-checking change, though.

@nanosoldier commented:

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@timholy commented Jul 10, 2016

@nanosoldier runbenchmarks(ALL, vs = ":master")

@nanosoldier commented:

Your benchmark job has completed - no performance regressions were detected. A full report can be found here. cc @jrevels

@timholy commented Jul 10, 2016

Tests pass and benchmarks look great, so it's just a question of whether we want this.

checkindex(Bool, inds1, I[1]) & checkbounds_indices(indsrest, tail(I))
end

@inline split{N}(inds, V::Type{Val{N}}) = _split((), inds, V)
Contributor:

should this be inside the IteratorsMD module?

@timholy (author):

If we don't think this is broadly useful, moving it would indeed be a good idea.

Contributor:

There is already a split function for strings; defining these methods in Base would be extending that same function, but I think this is sufficiently different that it should be a separate function.
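
For reference, here is the behavior I'd assume from the diff above (the output values are my reading of _split, not something this PR asserts):

front, rest = split((2, 3, 4, 5), Val{2})
# front == (2, 3)  -- the first N entries
# rest  == (4, 5)  -- everything after them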

@mbauman commented Jul 11, 2016

Could you add a little documentation on the intended purpose of all these functions? As I understand it (see the sketch after this list):

  • checkbounds(A, I...): allows A to implement its own bounds checking behaviors, but by default it calls:
  • Edit: checkbounds(::Type{Bool}, A, I...): which is really what A should implement if it needs to. By default it calls:
  • checkbounds_indices(::Tuple{T1, …}, ::Tuple{T2, …}), which allows T2 to implement its own index expansion (e.g., for CartesianIndices where one element maps to more than one dimension). This guy is undocumented and not exported. By default it iteratively calls:
  • checkindex(::Type{Bool}, ::T1, ::T2), which allows T2 to define its own method for determining if it's within the bounds of T1. Is there a reason to preserve the ::Bool first argument?
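
To make the cascade concrete, here is a minimal sketch of where a package would typically hook in. The EveryOther index type is hypothetical and purely illustrative; it is not part of this PR.

immutable EveryOther   # refers to the (2i-1)-th element along a dimension
    i::Int
end

# Usually the innermost hook is the only one a package needs to extend:
# decide whether the index falls inside the allowed range `inds` of one dimension.
Base.checkindex(::Type{Bool}, inds::AbstractUnitRange, idx::EveryOther) =
    Base.checkindex(Bool, inds, 2*idx.i - 1)

# The outer layers then work unchanged, roughly:
#   checkbounds(A, EveryOther(2))             # throws on failure
#   -> checkbounds(Bool, A, EveryOther(2))    # returns a Bool
#   -> checkbounds_indices(...)               # walks the indices tuple
#   -> checkindex(Bool, inds, EveryOther(2))  # the method defined above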

@timholy commented Jul 11, 2016

Yes, you have the idea. EDIT: In particular, you seem to understand the ambiguity-reducing cascade: extend checkbounds if you absolutely need to specialize on the array type, and extend either checkindex or (in hopefully-rare cases) checkbounds_indices if you need to specialize on index types. Moreover, since the behaviors of indices-tuples are presumably less diverse than the behaviors of arrays, this represents a rapid "narrowing" of the number of types the compiler needs to consider. One small addition to your comments: checkindex also allows specialization on T1 (yes, at the risk of ambiguities), and in the future novel range types seem likely. Of course, all AbstractUnitRanges can be handled by a single generic dispatch, so hopefully it will not typically be necessary to specialize checkindex on T1.

checkbounds_indices ... This guy is undocumented and not exported

As for it not being documented, I did provide this comment. Is this insufficient? Would converting that into a docstring address your concern?

It definitely isn't exported, and I'm not yet convinced it should be. For almost all arrays it seems like an internal detail---as long as the array provides useful indices, I can't see why extending checkbounds_indices should be necessary in any case except for novel index types that consume ≠1 entry in the indices-tuple. However, I'm happy to export it if you think that's better.

checkindex ... Is there a reason to preserve the ::Bool first argument?

I think I did it for consistency with checkbounds, and it leaves room for a checkindex(::T1, ::T2) that throws when its conditions are not met, but I've never found a need to introduce this variant. I'm happy to drop the Bool if that seems preferable.

@timholy commented Jul 11, 2016

@nanosoldier runbenchmarks(ALL, vs = ":master")

@timholy commented Jul 11, 2016

OK, I think I've figured out what was happening: it was a lack of call-site specialization for the splatted checkbounds. By adding a 1-index specialization (eff2593), locally I recovered the performance. I'm not entirely happy with this solution, because it's one more definition to specialize for a custom array type, but I don't see a way around this (I even annotated checkbounds with @pure, but that didn't help; in contrast, annotating the caller, getindex, with @pure does fix the problem).
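
Roughly, the shape of that workaround looks like the following; the function name is hypothetical and this is not the actual content of eff2593:

# A dedicated single-index method, so the common A[i] path does not have to go
# through the splatted varargs definition.
@inline checkbounds_single(A::AbstractArray, i::Integer) =
    Base.checkindex(Bool, 1:length(A), i) || throw(BoundsError(A, i))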

However, detailed investigation revealed something relatively extraordinary that I had not yet noticed, and while it's a little long (apologies) I think it's kind of fun:

include(Pkg.dir("BaseBenchmarks","src","array","sumindex.jl"))
a = rand(1000,1000)
b = ArrayLS(a)             # LinearSlow wrapper type from sumindex.jl
s = sub(a, 1:999, 1:1000)  # LinearSlow SubArray
perf_sumlinear(a)          # compile each method before timing
perf_sumlinear(b)
perf_sumlinear(s)
@time 1                    # warm up @time itself
@time perf_sumlinear(a)
@time perf_sumlinear(b)
@time perf_sumlinear(s)

julia-0.4:

  0.000002 seconds (148 allocations: 10.151 KB)
  0.001049 seconds (5 allocations: 176 bytes)
  0.016766 seconds (5 allocations: 176 bytes)
  0.035747 seconds (2.00 M allocations: 30.479 MB, 12.65% gc time)

master:

  0.000003 seconds (137 allocations: 8.438 KB)
  0.001039 seconds (5 allocations: 176 bytes)
  0.001045 seconds (5 allocations: 176 bytes)
  0.009655 seconds (5 allocations: 176 bytes)

If you look carefully, there's a >10x performance improvement in linear indexing for ArrayLS. That initially seems rather exciting. However, my guess is that what's happening is that there's enough inlining now that LLVM notices that it can elide the div, so we're unlikely to get that kind of performance boost in any situation in which we actually need linear indexing (as in the SubArray case). We can test this theory like this:

immutable ArrayLS1{T,N} <: MyArray{T,N}  # LinearSlow
    data::Array{T,N}
end

@inline Base.size{T}(A::ArrayLS1{T,2}) = (sz = size(A.data); (sz[1]-1,sz[2]-1)) # this line forces LLVM to compute the div

@inline Base.getindex(A::ArrayLS1, i::Int, i2::Int) = getindex(A.data, i, i2)
@inline Base.unsafe_getindex(A::ArrayLS1, i::Int, j::Int) = Base.unsafe_getindex(A.data, i, j)

c = ArrayLS1(a)
perf_sumlinear(c)
@time perf_sumlinear(c)

which yields julia-0.4:

  0.016899 seconds (5 allocations: 176 bytes)

and master:

  0.009288 seconds (5 allocations: 176 bytes)

So all that fighting I did against the 10x regression seems a little bit pointless, since it now looks like a (new) artifact of the benchmark. If we take that viewpoint seriously, should I drop eff2593, since I'd rather not have it? Or is it worth having for those corner cases where something is declared LinearSlow but in reality it's LinearFast?

Presumably it would make sense to revise the benchmarks to make them like ArrayLS1?

Of course, we're still doing considerably better than julia-0.4, including a 4x improvement for SubArray. So there's definitely some genuine good news here, but less dramatic than it initially seemed.

@mbauman commented Jul 11, 2016

Really, what I'm after is a higher-level strategy. The call graphs are really deep… and the only reason is to simplify dispatch and allow for easier extensions by custom arrays/indices. Specifically calling out what you expect to be specialized at each step would be very helpful, as would my bullet list above. I actually missed a step above.

I don't think checkbounds_indices needs to be exported or officially supported, either. I was just trying to describe how things work… and thinking about how to document the strategy here.

I don't feel strongly about checkindex(::Bool, …). I'm fine with the status quo… it's just really hard to change these guys since they're designed to be extended, so I want to limit future changes.

@timholy commented Jul 11, 2016

Really, what I'm after is a higher level strategy.

This is one of those cases where I confess I'm struggling with "author blindness" to what are doubtlessly holes in the documentation. To see if I understand, is your description in #17355 (comment) basically a summary of what you'd like in terms of documentation? (I'm happy to copy/paste/edit that into the source code.) Doesn't that comment I linked to above (conveniently here again) basically say the same thing? Would it be clearer if I described it like

    custom array type -> take its indices tuple -> call `checkindex` on matching index pairs

?

Or are you looking for "strategic vision"? This can be a little hard to articulate (and I should probably try harder), but here are a couple:

  • checkbounds->checkbounds_indices: while we should (and do) provide an array-specific method (checkbounds), in general bounds-checking is about comparing the allowed indices against the requested indices. For that reason, it seems cleaner to phrase the main bounds-checking logic purely in terms of comparisons between a pair of indices (hence checkbounds_indices).
  • by taking the indices tuple and getting rid of the specific array type at the first stage, we avoid needing to recompile the whole bounds-checking callgraph for each array type. (I suspect this is the main reason the subarray tests have gotten considerably faster of late.)
  • pairing elements of tuples, rather than carrying around the array, seems to make it a tad easier to provide flexibility: it seems just a little easier to "advance the counter" selectively on one tuple or the other. For example, now we can do this:
julia> A = rand(3,4)
3×4 Array{Float64,2}:
 0.546929   0.092763  0.158707  0.405476
 0.0441324  0.536253  0.596574  0.79432 
 0.976893   0.760052  0.56792   0.396879

julia> I = [CartesianIndex((1,2)), CartesianIndex((3,4))]
2-element Array{CartesianIndex{2},1}:
 CartesianIndex{2}((1,2))
 CartesianIndex{2}((3,4))

julia> A[I]
2-element Array{Float64,1}:
 0.092763
 0.396879

whereas on julia-0.4 this yields

julia> A[I]
ERROR: ArgumentError: unable to check bounds for indices of type CartesianIndex{2}
 in getindex at abstractarray.jl:488

The call graphs are really deep… and the only reason is to simplify dispatch and allow for easier extensions by custom arrays/indices.

Yes, this is a big bummer. If you see a way to avoid that design pattern, I'm all ears. I'll point out that I don't think it's any worse than index_lengths, index_shape, reindex, and others with which you may be more familiar. Heck, it's not exactly like _internal_checkbounds and related functions are particularly simple on julia-0.4: if I ditch eff2593 (which I'm tempted to do for other reasons), delete all blank lines, comments, and docstrings, and don't count the 16 lines of code needed to support that CartesianIndex example above (which is a new feature), then bounds-checking is exactly 50 lines on both julia-0.4 and here. That doesn't mean there isn't a difference in complexity; I don't really see one, but I can't judge well since I only recently wrote this one.

@mbauman commented Jul 11, 2016

I don't have objections to this PR; I definitely like these names much better than what I had done. And that comment is great. My documentation request was more for the global call-chain like I described than the local "this method does x". My comment above would hopefully be sufficient for a naive reader… somewhere… but I'm not sure where.

I really just think I got confused with the crazy history of changes from checkbounds to _checkbounds to _internal_checkbounds to _chkbnds to checkbounds_indices. This final cleanup is probably more than sufficient in solving that issue.

@timholy commented Jul 11, 2016

Thanks for taking the time to look it over! I agree it's been a somewhat chaotic process.

I will copy/paste some version of your analysis at the top of the bounds-checking code in abstractarray.jl.

@timholy commented Jul 12, 2016

Now documented out the wazoo. I moved some code around and renamed some variables for consistency from one function to the next, so the churn is more apparent than real.

@nanosoldier runbenchmarks(ALL, vs = ":master")

@timholy commented Jul 12, 2016

With nanosoldier feeling better, let's try again: @nanosoldier runbenchmarks(ALL, vs = ":master")

@nanosoldier commented:

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@timholy commented Jul 12, 2016

Seems like JuliaCI/BaseBenchmarks.jl#16 wasn't used on the nanosoldier run; I've checked locally and can't replicate any of the regressions. I think this is ready to go.

@@ -326,12 +326,17 @@ step(r::AbstractUnitRange) = 1
step(r::FloatRange) = r.step/r.divisor
step{T}(r::LinSpace{T}) = ifelse(r.len <= 0, convert(T,NaN), (r.stop-r.start)/r.divisor)

function length(r::StepRange)
unsafe_length(r::Range) = length(r) # generic fallback
Contributor:

what is potentially unsafe about unsafe_length?

@timholy (author), Jul 12, 2016:

See the checked_sub variants a little farther down in the file. The checked versions are necessary if you want to avoid being fooled when you do this:

julia> r = typemin(Int):typemax(Int)
-9223372036854775808:9223372036854775807

julia> length(r)
ERROR: OverflowError()
 in length(::UnitRange{Int64}) at ./range.jl:354
 in eval(::Module, ::Any) at ./boot.jl:234
 in macro expansion at ./REPL.jl:92 [inlined]
 in (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at ./event.jl:46

The downside of checked arithmetic is that it's god-awful slow if you use it for something performance-critical like bounds checking.

Fortunately, since all array sizes have to be a non-negative Int, it's not a cost you need to pay here. So it's worth having a fast path for carefully chosen applications.
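
To illustrate the trade-off (these are simplified stand-ins, not the actual Base definitions):

# Checked version: correct even for ranges like typemin(Int):typemax(Int),
# but pays for an overflow check on every call.
range_length_checked(r::UnitRange{Int}) =
    Base.checked_add(Base.checked_sub(r.stop, r.start), 1)

# Unchecked fast path: fine when the range is known to be an array axis,
# whose length is a non-negative Int and cannot overflow.
range_length_unchecked(r::UnitRange{Int}) = r.stop - r.start + 1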

@timholy commented Jul 12, 2016

Thanks for the review as always, @tkelman.

@tkelman commented Jul 12, 2016

Note that there are several (new) Sphinx warnings from the formatting in your RST.

This also reorganizes code to make the order follow the hierarchy; makes more sense for the "big picture" documentation to be near the top node.
@timholy commented Jul 13, 2016

Hmm, I already encountered an application where it was really sweet to be able to call checkbounds_indices directly. (I'm creating a ColorView array type, which takes a 3-by-m-by-n Array{T} and acts like an m-by-n RGB{T} array; it was nice to be able to check bounds without needing the actual array itself.) The unexpected utility of this design materialized rather quickly!
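
A rough sketch of what I mean (the sizes here are made up; the point is that only an indices tuple is needed, never the array itself):

parentinds  = (Base.OneTo(3), Base.OneTo(5), Base.OneTo(7))  # the 3-by-m-by-n parent
logicalinds = (parentinds[2], parentinds[3])                 # the m-by-n view the user sees
Base.checkbounds_indices(Bool, logicalinds, (4, 6))          # true
Base.checkbounds_indices(Bool, logicalinds, (6, 6))          # false: 6 is outside 1:5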

For that reason, I decided I had better symmetrize everything: now all functions have both foo(Bool, args...) (returns a Bool) and foo(args...) (throws if out-of-bounds) variants.
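
At the user level the two conventions look like this (using checkbounds, which keeps both forms):

A = rand(3, 4)
checkbounds(Bool, A, 2, 3)   # true: in bounds, just returns a Bool
checkbounds(Bool, A, 2, 5)   # false: out of bounds, still just returns a Bool
checkbounds(A, 2, 3)         # in bounds: returns quietly
checkbounds(A, 2, 5)         # out of bounds: throws a BoundsError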

@@ -147,11 +147,13 @@ function showerror(io::IO, ex::BoundsError)
print(io, ": attempt to access ")
if isa(ex.a, AbstractArray)
print(io, summary(ex.a))
elseif isa(ex.a, Tuple)
print(io, "array with indices ", ex.a)
@tkelman, Jul 13, 2016:

is a BoundsError with a Tuple field always going to mean the indices?

@timholy (author):

Amazing we didn't have a test for that. Added.

Contributor:

Ah, I see: with your new version, you won't usually be indexing with an i that is a tuple.

@@ -147,11 +147,13 @@ function showerror(io::IO, ex::BoundsError)
print(io, ": attempt to access ")
if isa(ex.a, AbstractArray)
print(io, summary(ex.a))
elseif isa(ex.a, Tuple) && isdefined(ex, :i) && isa(ex.i, Tuple)
Member:

ex.a is also a Tuple when you index a tuple out of bounds.

Contributor:

i won't be, though; see the collapsed discussion right above.

Member:

Ah, I see. Technically unambiguous, but kind of obscure.
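
To spell out the collision (the second call is a schematic of the new code path, not something a user would write directly):

# Case 1: indexing a tuple out of bounds; ex.a is the tuple, ex.i is an integer.
# (10, 20, 30)[5]  -> BoundsError with a == (10, 20, 30) and i == 5

# Case 2: the new indices-tuple path; both a and i are tuples.
# throw(BoundsError((Base.OneTo(3), Base.OneTo(4)), (2, 5)))

# The showerror test `isa(ex.a, Tuple) && isdefined(ex, :i) && isa(ex.i, Tuple)`
# distinguishes them because in case 1 the index is not a tuple.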

@timholy (author), Jul 13, 2016:

I don't like the ambiguity either. Some options I see:

  • create a new exception type (yuck)
  • add another field to BoundsError
  • create an internal ArrayIndices type that wraps the tuple-of-indices and stick it in the a slot

Of these, I like the third the best.

@timholy (author), Jul 13, 2016:

Option 4: don't support the throwing version of checkbounds_indices. Really, I was most concerned about setting up an expectation that checkbounds_indices(IA, I) would throw, and then surprising the user when it doesn't. If that yields a MethodError but checkbounds_indices(Bool, IA, I) works, any user will expect it to return a Bool and can then handle the throwing themselves. Maybe this is the best choice?

Member:

Yes, option 4 sounds pretty good.

@JeffBezanson commented:

I'm basically ok with this (especially since @mbauman is :) ). Overloading the meaning of the fields in BoundsError is kind of unfortunate --- won't that lead to unpredictable messages for bounds errors, where sometimes the array type is given and sometimes it isn't?

@timholy commented Jul 14, 2016

Test failure is #17409.

@timholy merged commit 828f7ae into master on Jul 14, 2016
@timholy deleted the teh/checkbounds_revisited branch July 14, 2016 09:21
length(A) == length(I)
end
function checkbounds_logical(::Type{Bool}, A::AbstractVector, I::AbstractVector{Bool})
indices(A) == indices(I)
@tkelman, Jul 15, 2016:

Are this and checkbounds_logical(::Type{Bool}, A::AbstractVector, I::AbstractArray{Bool}) swapped? Should the vector[array] case be checking indices, and vector[vector] be checking length?

@timholy (author):

As I'm thinking about it, no, but I could be missing something so I welcome discussion. The idea is that A[i] is retained if I[i] == true. Consequently, the indices of A and I have to match. The exception to the requirement for exact matching is when we're using linear indexing. But when both of them are vectors, linear indexing is trumped by Cartesian indexing.
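
A quick illustration, assuming the checkbounds_logical methods in this diff:

A = rand(3, 4)
A[trues(3, 4)]                 # mask shaped like A: indices must match, and they do
A[trues(12)]                   # Bool vector into a matrix: linear indexing, only lengths must match
v = rand(4)
v[[true, false, true, false]]  # vector[vector{Bool}]: Cartesian rules win, so indices must match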

Contributor:

Okay, makes sense. I'm not entirely sure what you mean by "linear indexing is trumped by Cartesian indexing" though.

@timholy (author):

julia/base/abstractarray.jl

Lines 1242 to 1245 in 3627ed2

# In 1d, there's a question of whether we're doing cartesian indexing
# or linear indexing. Support only the former.
sub2ind(inds::Indices{1}, I::Integer...) = throw(ArgumentError("Linear indexing is not defined for one-dimensional arrays"))
sub2ind(inds::Tuple{OneTo}, I::Integer...) = (@_inline_meta; _sub2ind(inds, 1, 1, I...)) # only OneTo is safe
