Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improving indexing operations of ranges of bit integers #26608

Closed
haampie opened this issue Mar 24, 2018 · 3 comments
Closed

Improving indexing operations of ranges of bit integers #26608

haampie opened this issue Mar 24, 2018 · 3 comments
Labels
arrays [a, r, r, a, y, s]

Comments

@haampie
Copy link
Contributor

haampie commented Mar 24, 2018

Currently, computations in getindex(::UnitRange, ::Integer) can overflow during the bounds check. If so, it throws an InexactError; not a BoundsError. Compare:

> (-100:100)[400] # ok
ERROR: BoundsError: attempt to access 201-element UnitRange{Int64} at index [400]
> (Int8(-100):Int8(100))[400] # :(
ERROR: InexactError: trunc(Int8, 299)
> (-100:100)[typemax(UInt)] # :(
ERROR: InexactError: check_top_bit(UInt64, 18446744073709551514)

I think I have fixed this for all bit integer types (both the range type and index type) with this hack:

using Base: BitInteger

function getindex(r::UnitRange{Tr}, i::Integer) where {Tr<:BitInteger}
    @_inline_meta
    # if i ≥ 1 we can safely do i - 1 without ending up at typemax(i)
    # if r.start ≤ r.stop we could compute r.stop - r.start, but this number may not be representable in signed integers, so switch to unsigned integers with a hack.
    @boundscheck i  1 && r.start  r.stop && i - 1  unsigned(r.stop) - unsigned(r.start) || throw_boundserror(r, i)

    return (r.start + (i - 1)) % Tr
end

But I'm not 100% sure this is the way to go and if I'm covering everything. At least it fixes the problems listed above:

> (Int8(-100):Int8(100))[400]
ERROR: BoundsError: attempt to access 201-element UnitRange{Int8} at index [400]
> (-100:100)[typemax(UInt)]
ERROR: BoundsError: attempt to access 201-element UnitRange{Int64} at index [18446744073709551615]
@haampie
Copy link
Contributor Author

haampie commented Mar 25, 2018

Another issue not solved by the above:

> (false : true)[3]
ERROR: InexactError()
Stacktrace:
 [1] convert(::Type{Bool}, ::Int64) at ./bool.jl:7
 [2] getindex(::UnitRange{Bool}, ::Int64) at ./range.jl:476

@mbauman
Copy link
Member

mbauman commented Mar 25, 2018

The other thing to keep in mind here is just how performance-sensitive bounds checks are. That really constrains our ability to do fancy things. I'd love for this to be better, though! The abstract checkbounds works in all your error cases, but I'm certain that we're using a custom bounds checking implementation as an optimization. It might be worth doing an updated check on the difference in performance there.

@mbauman mbauman added the arrays [a, r, r, a, y, s] label Mar 25, 2018
@haampie
Copy link
Contributor Author

haampie commented Mar 26, 2018

Thanks for the reply! A few observations:

  1. My implementation suggested above is indeed slower than the current implementation
  2. The checkbounds method is nearly twice as fast as the current implementation, but unfortunately fails when length(r) overflows
    > checkbounds(typemin(Int) : typemax(Int), 10)
    ERROR: OverflowError: 9223372036854775807 - -9223372036854775808 overflowed for type Int64
    Now length(r) will always overflow for typemin(Int) : typemax(Int), since the length is typemax(UInt)+1. I think this issue was the rationale for the current implementation rather than performance.
  3. The current implementation has overhead even when @inbounds is used; the bottleneck seems convert(T, ...) which is called outside of @boundscheck.

Considering all this, I think it's best to keep the current implementation, but remove the convert call:

function new_getindex(v::UnitRange{T}, i::Integer) where T
    @_inline_meta
    value = v.start + i - 1 # Overflow is well-defined behaviour; if this overflows then value < start regardless
    @boundscheck ((i > 0) & (value  v.start) & (value  v.stop)) || throw_boundserror(v, i)
    value % T # This improves performance
end

It works for all edge cases above. I've benchmarked it with r = Int8(-128):Int8(127):

function sum_impl_1(r::UnitRange{T}) where {T}
    sum = zero(T)
    for idx = linearindices(r) # idx of type Int64
        sum += r[idx]
    end
    sum
end

function sum_impl_2(r::UnitRange{T}) where {T}
    sum = zero(T)
    for idx = linearindices(r) # idx of type Int64
        sum += new_getindex(r, idx)
    end
    sum
end

function sum_impl_3(r::UnitRange{T}) where {T}
    sum = zero(T)
    val = r.start
    while true
        sum += val
        val += T(1)

        if val == r.stop
            break
        end
    end
    sum
end

Which gives:

  • With bounds checking: 141.935 ns vs 74.925 ns vs 1.451 ns respectively;
  • Without bounds checking: 142.058 ns vs 1.452 ns vs 1.452 ns respectively.

Interpretation: new implementation is faster with bounds checking, and without bounds checking the compiler is able to optimize the whole for-loop out. I'll just make this a PR and see whether it passes tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrays [a, r, r, a, y, s]
Projects
None yet
Development

No branches or pull requests

2 participants