speed up logical indexing by bitarray #29746

Merged
merged 3 commits into from
Oct 24, 2018
Changes from 1 commit
1 change: 1 addition & 0 deletions base/bitarray.jl
@@ -84,6 +84,7 @@ IndexStyle(::Type{<:BitArray}) = IndexLinear()
const _msk64 = ~UInt64(0)
@inline _div64(l) = l >> 6
@inline _mod64(l) = l & 63
+@inline _BLSR(x)= x & (x-1) #zeros the last set bit. Has native instruction on many archs. needed in multidimensional.jl
Member

The pickiest possible nit: maybe _blsr to match the surrounding capitalization?

Contributor Author

Amended.
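
For readers unfamiliar with the `x & (x-1)` identity, here is a minimal sketch of what it does. The names `clear_lowest_set_bit` and `count_by_clearing` are hypothetical and exist only for this illustration; they are not the helper defined in the diff.

```julia
# Hypothetical helper (illustration only): clear the lowest set bit of x.
# Subtracting 1 flips the lowest set bit and every bit below it, so the
# bitwise AND keeps only the bits above it.
clear_lowest_set_bit(x::Unsigned) = x & (x - one(x))

# Count the set bits by clearing them one at a time. The loop runs once
# per set bit rather than once per bit position.
function count_by_clearing(x::Unsigned)
    n = 0
    while x != 0
        x = clear_lowest_set_bit(x)
        n += 1
    end
    return n
end

x = 0b10110000               # UInt8 literal
y = clear_lowest_set_bit(x)  # 0b10100000
```

The loop performs as many steps as there are set bits, which is presumably why the same trick is attractive for sparse masks.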

@inline _msk_end(l::Integer) = _msk64 >>> _mod64(-l)
@inline _msk_end(B::BitArray) = _msk_end(length(B))
num_bit_chunks(n::Int) = _div64(n+63)
26 changes: 15 additions & 11 deletions base/multidimensional.jl
@@ -523,19 +523,23 @@ end
L.mask[idx] && return (idx, s)
end
end
-# When wrapping a BitArray, lean heavily upon its internals -- this is a common
-# case. Just use the Int index and count as its state.
-@inline function iterate(L::LogicalIndex{Int,<:BitArray}, s=(0,1))
-    s[2] > length(L) && return nothing
-    i, n = s
+# When wrapping a BitArray, lean heavily upon its internals.
+@inline function iterate(L::Base.LogicalIndex{Int,<:BitArray})
+    L.sum == 0 && return nothing
     Bc = L.mask.chunks
-    while true
-        if Bc[_div64(i)+1] & (UInt64(1)<<_mod64(i)) != 0
-            i += 1
-            return (i, (i, n+1))
-        end
-        i += 1
+    return iterate(L::Base.LogicalIndex{Int,<:BitArray}, (1, @inbounds Bc[1]))
chethega marked this conversation as resolved.
end
+@propagate_inbounds function iterate(L::Base.LogicalIndex{Int,<:BitArray}, s)
+    Bc = L.mask.chunks
+    i1, c = s
+    while c==0
+        i1 == length(Bc) && return nothing
Member

Instead of @propagate_inbounds, it'd be a bit nicer to do i1 >= length(Bc) (or even that unsigned trick to account for negative i1s). Then we could just always use @inbounds Bc[i1].
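
The "unsigned trick" mentioned here can be sketched as follows. The function name `in_range_unsigned` is hypothetical, and the sketch assumes `n` is non-negative (as a length always is).

```julia
# Hypothetical helper (illustration only): true exactly when 1 <= i <= n,
# using a single comparison. Reinterpreting i as unsigned maps zero and
# negative values to huge numbers after the subtraction, so they fail the
# same test that rejects i > n.
in_range_unsigned(i::Int, n::Int) = (i % UInt) - 1 < (n % UInt)

in_range_unsigned(1, 10)    # true
in_range_unsigned(10, 10)   # true
in_range_unsigned(11, 10)   # false
in_range_unsigned(0, 10)    # false: 0 - 1 wraps around to typemax(UInt)
in_range_unsigned(-4, 10)   # false
```

With such a check in front, an indexing expression guarded by it can be marked `@inbounds` without relying on the caller to pass a valid state.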

Contributor Author

I am not entirely sure about the desired safety here. If the iteration interface is used correctly, this will always be inbounds. If we get passed an OOB state, do we need to raise a correct error, are we allowed to return garbage (as long as no OOB read occurs), or are we allowed to perform an OOB read (possibly with a segfault) and then return garbage?

The TEST instead of CMP was actually cribbed from the iteration code for vectors. Benchmarks and Agner Fog agree that there is no price difference between these instructions. I could change to i1 >= length(Bc), with the disadvantage that misuse silently returns nothing instead of throwing an OOB error in julia --check-bounds=yes, and the advantage that we won't read OOB on most misuses, even in julia --check-bounds=no. I think we should prefer the OOB error?

For some reason the UInt version is much slower. I tried the following, with no success:

@inline function Base.iterate(L::Base.LogicalIndex{Int,<:BitArray})
    L.sum == 0 && return nothing
    Bc = L.mask.chunks
    return Base.iterate(L::Base.LogicalIndex{Int,<:BitArray}, (UInt(0), @inbounds Bc[1]))
end


@inline function Base.iterate(L::Base.LogicalIndex{Int,<:BitArray}, s::Tuple{UInt,UInt64})
    Bc = L.mask.chunks
    i1, c = s
    while c==0 
        i1 += 1
        i1 < length(Bc) || return nothing        
        @inbounds c = Bc[i1+1] 
    end
    tz = trailing_zeros(c)+1
    c = _BLSR(c)
    return (i1<<6 +tz, (i1, c))
end
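
As background for the question of what "used correctly" means: a `for` loop only ever passes back states that a previous `iterate` call returned. The following sketch spells that out by hand; `collect_via_protocol` is a hypothetical name used only for this illustration.

```julia
# Hypothetical helper (illustration only): drive the iteration protocol
# manually, the way a `for` loop does. Every state handed to `iterate`
# was produced by the previous call, so a correct implementation never
# sees an out-of-bounds state on this path.
function collect_via_protocol(itr)
    out = eltype(itr)[]
    next = iterate(itr)
    while next !== nothing
        item, state = next
        push!(out, item)
        next = iterate(itr, state)
    end
    return out
end

collect_via_protocol(1:3)            # [1, 2, 3]
collect_via_protocol([10, 20, 30])   # [10, 20, 30]
```

The safety question in this thread only arises when a caller invents a state value and passes it to the two-argument `iterate` directly.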

Member

"with the disadvantage that misuse silently returns nothing instead of throwing an OOB error"

julia> iterate([1], 2)

julia> iterate((1,), 2)

totally fine to return nothing.

Contributor Author

Ok, I have a fast UInt version now. The only open question is our preferred behavior when given an invalid state: return nothing, or throw an OOB error under --check-bounds=yes and risk an OOB read (with a possible segfault) under @inbounds. I still prefer the OOB error, tbh. Do we have a policy for that kind of question?

@inline _BLSR(x)= x & (x-1) 
@inline function Base.iterate(L::Base.LogicalIndex{Int,<:BitArray})
    L.sum == 0 && return nothing
    Bc = L.mask.chunks
    return Base.iterate(L::Base.LogicalIndex{Int,<:BitArray}, (1, @inbounds Bc[1]))
end

@inline function Base.iterate(L::Base.LogicalIndex{Int,<:BitArray}, s)
    Bc = L.mask.chunks
    i1, c = s
    while c==0 
        i1%UInt >= length(Bc)%UInt && return nothing
        i1 += 1
        @inbounds c = Bc[i1] 
    end
    tz = trailing_zeros(c)+1
    c = _BLSR(c)
    return ((i1-1)<<6 +tz, (i1, c))
end
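
The chunk-walking idea in the snippet above can also be tried outside of `Base`. The following is a minimal standalone sketch, not the PR's implementation: `set_bit_positions` is a hypothetical name, and it reads the internal `chunks` field of a `BitVector`, assuming (as the PR code does) that it is a `Vector{UInt64}` whose unused padding bits are zero.

```julia
# Hypothetical helper (illustration only): collect the 1-based positions
# of all set bits in a vector of 64-bit chunks.
function set_bit_positions(chunks::Vector{UInt64})
    positions = Int[]
    for (k, chunk) in enumerate(chunks)
        c = chunk
        while c != 0
            tz = trailing_zeros(c)                      # offset of the lowest set bit
            push!(positions, ((k - 1) << 6) + tz + 1)   # 64 bits per chunk
            c &= c - 1                                  # clear that bit and continue
        end
    end
    return positions
end

mask = falses(200)
mask[[1, 64, 65, 130, 200]] .= true

# `chunks` is an internal field of BitArray; it is accessed here only to
# mirror what the PR code does with `L.mask.chunks`.
set_bit_positions(mask.chunks) == findall(mask)
```

Each inner-loop step consumes one set bit, and an all-zero chunk costs a single comparison, so the work scales with the number of chunks plus the number of `true` entries instead of the number of bits.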

Member

"When in doubt, do as the Arrays".

So couldn't something like

iterate(A::Array, i=1) = (@_inline_meta; (i % UInt) - 1 < length(A) ? (@inbounds A[i], i + 1) : nothing)

be used?

Contributor Author

Arrays currently give nothing for OOB states, and ranges silently return garbage:

julia> iterate(1:10, 17)
(18, 18)
julia> iterate(collect(1:10), 17) === nothing
true

In view of that, I guess it would be consistent enough to return nothing and reap the speedup when there is no @inbounds annotation. So I'll change that.

+        i1 += 1
+        c = Bc[i1]
+    end
+    tz = trailing_zeros(c) + 1
+    c = _BLSR(c)
Member

This is a really clever idea. Nice work.

+    return ((i1-1)<<6 + tz, (i1, c))
end

@inline checkbounds(::Type{Bool}, A::AbstractArray, I::LogicalIndex{<:Any,<:AbstractArray{Bool,1}}) =