Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BoundsError when indexing an array with 2d bit array #18271

Closed
mmagnuski opened this issue Aug 28, 2016 · 29 comments
Closed

BoundsError when indexing an array with 2d bit array #18271

mmagnuski opened this issue Aug 28, 2016 · 29 comments

Comments

@mmagnuski
Copy link

The code below:

A = rand((5, 5))
b = mean(A, 1) .> 0.5
A[:, b]

produces the following error on julia-0.5.0-rc3:

ERROR: BoundsError: attempt to access 5×5 Array{Float64,2} at index [Colon(),Bool[true true false true false]]
 in throw_boundserror(::Array{Float64,2}, ::Tuple{Colon,BitArray{2}}) at .\abstractarray.jl:355
 in checkbounds at .\abstractarray.jl:284 [inlined]
 in _getindex at .\multidimensional.jl:264 [inlined]
 in getindex(::Array{Float64,2}, ::Colon, ::BitArray{2}) at .\abstractarray.jl:752

This used to work on julia 0.4 - is the change intended?
Doing A[:, vec(b)] works of course.

@mbauman
Copy link
Member

mbauman commented Aug 29, 2016

Yes, this is intentional. With the new APL indexing behaviors, the shapes of the indices are significant, and so the meaning of indexing by a multidimensional logical array is a little muddled (unless it's the only index provided). See the discussion at #15431 (comment) and preceding comments which led to this decision if you're curious.

I think this could be reverted to a deprecation warning to suggest vec as you found, but it seems a little late in the game to do so… and I'm not sure I'll have time to get it in place quickly enough. Thoughts?

@mbauman mbauman changed the title BoundsError when idexing an array with 2d bit array BoundsError when indexing an array with 2d bit array Aug 29, 2016
@tkelman
Copy link
Contributor

tkelman commented Aug 29, 2016

In retrospect it would have been nicer to have this go through a deprecation, I'm a little surprised no one said anything earlier. But it is getting very late to put that in, and some of other closely related indexing changes don't really have deprecations. Could just document the behavior change as clearly as we can?

@mmagnuski
Copy link
Author

mmagnuski commented Aug 29, 2016

Ok, I looked through the discussion. I'm still a bit sad and surprised that things like A[:, mean(A, 1) .> 0.5] or B[bitrand(3, 4), :] don't work (as they seem very common and convenient) but I will handle it. Thanks for clarifications!

@JeffBezanson
Copy link
Member

Maybe part of the problem there is that mean(A, 1) should return a vector (#16606).

@timholy
Copy link
Member

timholy commented Aug 29, 2016

This is probably the strongest argument in favor of dropping dimensions from reductions. The harm to broadcasting is quite unfortunate, though. I'll post other discussion over at #16606.

@andreasnoack
Copy link
Member

andreasnoack commented Aug 29, 2016

What about advertising mean(A, 1:1) in the same way we advertise A[1:1,:] if you want to preserve the dimensions? Never mind. I forgot to read #16606 (comment) first

@mbauman
Copy link
Member

mbauman commented Aug 29, 2016

The restriction here is purely artificial, just aimed at making it easier to catch mistakes and giving us some leeway in deciding what to do in the future. Clearly, this wasn't an mistake.

We could certainly remove this restriction. Or we could loosen it such that it allows arrays with only one non-singleton dimension. I kinda like the latter option.

@timholy
Copy link
Member

timholy commented Aug 29, 2016

@andreasnoack, because you can also reduce over multiple dimensions at once, for example mean(A, 1:2) will calculate the mean over dimensions 1 and 2 of a 3d array.

@timholy
Copy link
Member

timholy commented Aug 29, 2016

@mbauman, won't that be inconsistent with "the output dimensions are equal to the sum of the dimensions of the indices"?

@mbauman
Copy link
Member

mbauman commented Aug 29, 2016

Yup, but logical indexing is already a fairly strange special case. It's a trade-off. I'm not convinced either way at the moment; just brainstorming possible solutions.

@StefanKarpinski
Copy link
Member

StefanKarpinski commented Sep 7, 2016

Logical indexing with a non-vector does seem like a fairly strange and marginal case, so allowing/requiring boolean indexing with non-vectors with all but one dimension singular seems fairly sensible.

On the other hand, I feel like we're going to get the "why doesn't produce a vector" question over and over again and the sum(A, 1:1) solution does what one wants. I feel like we all the more need a keep(1) operation that wraps a scalar as a lightweight single-dimensional "vector". There are actually a bunch of unary operators available, but I'm not sure we want to write something like sum(A,^1) and A[^1,:], sum(A,&1) and A[&1,:] (reclaimed), sum(A,\1) and A[\1,:]. If &x is taken to mean Ref(x) and Ref(x) is considered a single-element vector containing x then the &1 syntax could be made to work.

@nalimilan
Copy link
Member

nalimilan commented Sep 7, 2016

Maybe indexing with a one-element tuple, e.g. (1,)? That solution has been suggested at #18379 to prevent broadcasting.

mbauman added a commit to mbauman/julia that referenced this issue Sep 8, 2016
Fixes JuliaLang#18271. Allow multidimensional logical arrays as indices into a particular dimension so long as there is only one non-singleton dimension.
@mmagnuski
Copy link
Author

Logical indexing with a non-vector does seem like a fairly strange and marginal case, so allowing/requiring boolean indexing with non-vectors with all but one dimension singular seems fairly sensible.

I don't think it's that uncommon, I've been using it pretty often in numpy and julia. I would be really sad if it's gone forever in Julia. :(

@timholy
Copy link
Member

timholy commented Sep 9, 2016

Stefan meant that it already violates some of our ordinary "rules," and was supporting the fix that's already been merged in #18401. In other words, your example works again on master.

@mmagnuski
Copy link
Author

mmagnuski commented Sep 9, 2016

I meant something different too - that I often do things like indexing 3d matrices with 2d BitArray and a normal index for the last dimension (A[A[:, :, 1] .> 0, 1] += 1). And that would not be possible in julia 0.5, right?

@mmagnuski
Copy link
Author

(but thanks for making my example work again!)

@timholy
Copy link
Member

timholy commented Sep 9, 2016

That doesn't work for 0.4, either. Hmm. It's a completely reasonable operation to want to support.

@timholy timholy reopened this Sep 9, 2016
@mmagnuski
Copy link
Author

@timholy Ah, you are right - it doesn't work for 0.4, I got it mixed up with numpy, where it works. Would be really nice if 2d boolean indexing was supported :).

@mbauman
Copy link
Member

mbauman commented Sep 9, 2016

Thanks for reporting it, @mmagnuski! It's not something I intended to break.

Adding support for things like A[A[:, :, 1] .> 0, 1] is definitely something I'd like to see, but unfortunately I think it's at odds with ignoring the singleton dimensions. On the plus side, it's really easy to implement since we now support vectors of CartesianIndexs:

function findcartesian{N}(B::AbstractArray{Bool,N})
    idxs = Vector{CartesianIndex{N}}()
    for i in CartesianRange(indices(B))
        B[i] && push!(idxs, i)
    end
    idxs
end

julia> A = rand(-5:5, 4, 3, 2)
4×3×2 Array{Int64,3}:
[:, :, 1] =
  4   1   0
  0  -2   5
  0  -1   3
 -1  -3  -3

[:, :, 2] =
  5  -2   4
 -3  -4  -4
 -2  -3   5
 -5  -2   0

julia> A[findcartesian(A[:,:,1] .> 0), 1] += 10;
       A
4×3×2 Array{Int64,3}:
[:, :, 1] =
 14  11   0
  0  -2  15
  0  -1  13
 -1  -3  -3

[:, :, 2] =
  5  -2   4
 -3  -4  -4
 -2  -3   5
 -5  -2   0

@mmagnuski
Copy link
Author

@mbauman I like the trick with CartesianIndex, I didn't have much time before to play with it, thanks!
However - going back to boolean indexing and ignoring singleton dimensions - wouldn't it be possible (in principle) to check if the number of non-singleton dimensions is correct instead of allowing only boolean indexing with BitArrays that have only one non-singleton dimension?

@mbauman
Copy link
Member

mbauman commented Sep 9, 2016

I'm afraid there are some weird edge cases in combining these two functionalities… particularly when there's no non-singleton dimensions.

Here's a rather contrived example:

julia> A = reshape([-2, 5, -4, -1, -4, 5], 1, 1, 2, 3)
       A[A[:,:,1,2] .< 0, 1, 2] # This works on master! One singleton dimension is dropped, making it A[[true], 1, 2]
1-element Array{Int64,1}:
 5

julia> A[findcartesian(A[:,:,1,2] .< 0), 1, 2] # This actually returns the element below 0 as intended.
1-element Array{Int64,1}:
 -4

I don't think we can tell that case apart from the case where someone used a reduction and they wanted the singleton dimensions dropped. I think there exist less contrived examples, too.

I think we need to make reductions drop dimensions, or we need to go through a deprecation cycle to instruct users to use vec themselves to drop singleton dimensions. And then we can implement the nifty general multidimensional logical indexing.

@StefanKarpinski
Copy link
Member

I guess dropping singleton dimensions everywhere is the most consistent approach. That makes me want a concise syntax for keeping them all the more since now it's not just for slicing.

@timholy
Copy link
Member

timholy commented Sep 9, 2016

I should say I like your &i proposal better than anything else, if that is available.

@mbauman
Copy link
Member

mbauman commented Sep 9, 2016

Another option would be a binding intended for postfix multiplication — optimized multiplication with something that's effectively [1] does what we want.

@timholy
Copy link
Member

timholy commented Sep 17, 2016

Yes, that's a good idea too. Seems to be a common way to save on parentheses.

tkelman pushed a commit that referenced this issue Sep 17, 2016
Fixes #18271. Allow multidimensional logical arrays as indices into a particular dimension so long as there is only one non-singleton dimension.

(cherry picked from commit 5fa5b7f)
ref #18401

Fixup inline comment

(cherry picked from commit 2bbcd80)

Update devdocs for bounds checking following #18401

I had forgotten that this behavior was documented in the developer section of the manual when authoring #18401.

(cherry picked from commit b8406ff)
ref #18552
@mbauman
Copy link
Member

mbauman commented Dec 13, 2016

This issue followed a bit of a meandering path that's a little hard to follow in retrospect.

For clarity, let's call "ignoring singleton dimensions in logical indexing" the behavior that allows A[:, mean(A, 1) .> 0.5]. This was permitted in 0.4. In 0.5, alongside APL indexing, we restricted logical indices such that they either must be vectors or match the entire size of the array.

In response to this issue, I restored the 0.4 behavior to allow singleton dimensions in logical indices in #18401, but it was merged after 0.5 was branched and didn't make the cut for a backport. And so it was subsequently reverted in #18573, once again disallowing singleton dimensions in logical indexing.

The plus side is that breaking this behavior allows us to move towards a much more general and more consistent interpretation of multidimensional logical indices, such that the additional dimensions in a logical mask actually represent indexing into additional dimensions into the source array. This generalizes the whole-array masking operations A[mask] to multiple- and single-dimensions. In short, in this scheme logical indices are always transformed by findcartesian as I describe above.

@mbauman mbauman closed this as completed Dec 13, 2016
@mmagnuski
Copy link
Author

Thanks for the update @mbauman!
So would things like A[A[:, :, 1] .> 0, 1] += 1 be allowed in Julia?

@mbauman
Copy link
Member

mbauman commented Dec 13, 2016

Yes, that's my vision here, and I think it's the consensus.

@mmagnuski
Copy link
Author

happy to hear that! 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants