Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Safer, extensible ﹫inbounds #8227

Closed
wants to merge 1 commit into from
Closed

Conversation

simonster
Copy link
Member

This implements my proposal from #7799 (comment). unsafe_getindex/unsafe_setindex! provide getindex/setindex! functionality without bounds checks, and @inbounds rewrites the AST to call the unsafe versions of these functions. @inbounds only affects the expression it's applied to, not all inlined function calls. @boundscheck still works and provides the old behavior in case there's a really good reason to use it. As a bonus, this variant of @inbounds could be made to pass through the value of the enclosed expression, although at the moment I don't think the behavior is as expected for setindex! expressions.

This would have been a lot easier if it didn't require reimplementing the expansion of expressions containing ref into getindex/setindex! in Julia before array.jl is loaded. It may be a better idea to adjust the femtolisp code so that it can perform this expansion without expanding anything else. Being able to rewrite getindex/setindex! with a macro may also be useful for the transition to arrays as views.

@JeffBezanson
Copy link
Member

I'm amazed you managed to reimplement that part of the expansion pass. Having that code duplicated is not good. It probably would have been easier to replace ref with unsafe_ref and do the rest by modifying the existing front-end code. It also might work to call expand from the macro, then replace calls.

I think this is also a sign that this macro is slightly sketchy (admittedly the existing @inbounds is pretty sketchy too). Replacing identifiers is inherently unhygienic, as in this contrived example:

let getindex(x) = 2x
    @inbounds getindex(...)
end

So I think there is still something subtly unsatisfying about this approach.

@simonster
Copy link
Member Author

Yes, modifying the front-end is probably a better solution. Maybe we could have a custom_ref Expr that accepts the substitute getindex and setindex! in its args, so we could reuse it for a future array indexing backwards compatibility macro. I tried using expand, but splicing the resulting Exprs into the AST did not always seem to work. It also seemed like a bad idea because other macros might not expect an expanded AST.

This macro doesn't actually replace any identifiers; it just expands ref differently, and shouldn't do anything to your example. It is true that:

let getindex(x) = 2x
    @inbounds y = x[1]
end

would end up calling unsafe_getindex from outside the let block. It might be better to fall back to getindex if it's from a different scope than unsafe_getindex, if that's even possible. Or we could automatically define unsafe_getindex(x...) = getindex(x) whenever getindex is redefined, although that doesn't seem so pretty. But I suspect code that redefines getindex and uses @inbounds is quite rare in practice, and at least the custom ref expansion is easy to reason about.

@simonbyrne
Copy link
Contributor

It would be great if this pattern could be easily applied to other functions, as this sort of problem pops up in other cases. Two that come to mind:

  • @checked for replacing arithmetic operations (+,*, ...) by their checked equivalents (checked_mul, checked_add, ...). Or conversely, @unchecked if we end up having checked arithmetic by default.
  • Specifying output destinations for array operations. I've already had a stab at this with InplaceOps.jl

@simonster
Copy link
Member Author

Here's another crack at changing how ref is expanded, this time by replacing ref nodes with custom_ref and letting the front-end expand it. That approach is indeed much less code and seems to work. But I'm not all that confident in my Scheme programming abilities and I also don't understand the indentation convention. (Why both tabs and spaces?)

@timholy
Copy link
Member

timholy commented Sep 10, 2014

👍. I can't review the Scheme code changes, but I have no complaints about the Julia code.

Looking forward to having @inbounds work on Ranges!

@@ -342,8 +342,9 @@ end
## Indexing: getindex ##

function unsafe_bitgetindex(Bc::Vector{Uint64}, i::Int)
return (Bc[@_div64(i-1)+1] & (uint64(1)<<@_mod64(i-1))) != 0
return @inbounds (Bc[@_div64(i-1)+1] & (uint64(1)<<@_mod64(i-1))) != 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's great if this does no longer cause the function to return nothing

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup:

julia> x = [1, 2];

julia> @inbounds x[1]
1

@vtjnash
Copy link
Member

vtjnash commented Sep 12, 2014

what if these were to go into a special module. I may need to write a longer Julep for this, but here's my basic idea for it:

Instead of unsafe_getindex & friends, implement:

module Base
  baremodule Unsafe
    getindex(x...) = Base.getindex(x...)
    setindex!(x...) = Base.setindex!(x...)
    # unsafe versions go here
  end
  # the existing definitions remain here
end

And perhaps add a few new intrinsics to do the unsafe operations (or perhaps just enhance pointerref to handle multiple dimensions and accept an Array type)

If the user wanted, they could then access these directly, or with an importall

But in typical usage, it would just @inline set a flag. Then in Base.resolve_globals and codegen.cpp::emit_var, this would select between looking in Base directly, or in Base.Unsafe.

But really, ideally, the following could probably be made just as efficient, without too much effort:

macro inbounds(expr)
    quote
        let getindex = Base.Unsafe.getindex, setindex! = Base.Unsafe.setindex!
            $(esc(expr))
        end
    end
end

Which would be really sweet, since it doesn't require any special handling.

@simonster
Copy link
Member Author

@vtjnash That's an interesting approach. If we set a flag we could presumably choose to substitute Base.Unsafe.getindex only for Base.getindex and not a user-defined getindex function in another scope, thus avoiding @JeffBezanson's hygiene complaint above. We might also have to tweak the frontend to get x == @inbounds x. let would be cleanest in terms of implementation but it seems to have the same hygiene problem, unless you want to make let getindex = $(esc(getindex)) === Base.getindex ? Base.Unsafe.getindex : $(esc(getindex)) efficient too.

@vtjnash
Copy link
Member

vtjnash commented Sep 12, 2014

A slightly-less contrived example is:

module MyModule
  global getindex
  getindex(x...) = ...
  function  f()
      @inline x[1]  = 2
  end
end

But what I perhaps like most about the let block approach is that it requires absolutely nothing special added to the compiler, just a new, general inference pass for elidding let expressions. And as such, it is easy to document that it is doing nothing special. So, if you want the previous version to work in a customized manner, it's still completely transparent and simple:

module MyModule
    macro inbounds(expr...)
        :( let getindex = my_getindex; :(esc(expr)); nothing; end )
    end
    # now @inbounds is something different
end

p.s. I might as well mention this too: I think the module approach also generalizes really nicely for other checked/unchecked operations (cough_arithmetic_cough). So you could rewrite the default arithmetic in Base to be always do checked math, and then have an @unchecked_math flag that worked the same, but listed all of the relevant math operators (+,-,*,/,div,^,.+,.-,.*,./,.^)

p.p.s I think Unchecked might be a better name than Unsafe for the module, because it is more evocative of why it is potentially dangerous, than simply a generic warning.

p.p.s. while my approach may seems similar, note that it would have significantly behavior in the following case, which may not be entirely desirable:

function f()
    @inbounds map(getindex, [[1],[2],[3]])
end

But then again, having the following operations calling different functions is perhaps a bit odd too:

function f()
    @inbounds begin
        let gi = getindex
            getindex(x) == gi(x) == x[]
        end
     end
end

p.p.p.s. I corrected my above @inbounds macro to have a result of nothing, like the existing macro

@simonster
Copy link
Member Author

A problem with the let approach is that it creates a new scope for whatever's inside @inbounds, so you'd get:

@inbounds x = a[1]
x # undefined

Also, I don't think we want @inbounds to have a result of nothing. That seems like an undesirable consequence of the current implementation. I think we want it to pass through the result of whatever's inside it.

@timholy timholy mentioned this pull request Sep 22, 2014
15 tasks
@simonster
Copy link
Member Author

So given the scoping issue with let, I think we have three options:

  • Use this strategy currently implemented in this PR, which only touches the frontend.
  • Add a new type of Expr that tells the codegen/type inference to substitute calls to f with calls to g, so that @inbounds $ex is Expr(:substitutecall, {Base.getindex, Base.unsafe_getindex, Base.setindex!, Base.unsafe_setindex!}, ex) or something like that.
  • Add a new type of block (with or without syntax) that behaves like let but does not introduce a new scope, and make unscoped_let getindex = Base.unsafe_getindex, setindex! = Base.unsafe_setindex! fast.

@vtjnash
Copy link
Member

vtjnash commented Sep 22, 2014

Options two and three are the same. I'll let Jeff make the final call.

@simonster
Copy link
Member Author

@vtjnash At least as I'm envisioning them, they're not quite the same: option 2 would only substitute functions for functions, while option 3 would work on at the level of identifiers. Option 2 would avoid the hygiene issue (we could substitute Base.unsafe_getindex for Base.getindex only, and not for getindex from some other scope) while option 3 wouldn't, but option 3 might be useful in some other context.

@timholy
Copy link
Member

timholy commented Nov 20, 2014

Bump. Now that we have most of the other critical elements of #7941, this (in my opinion) is the key outstanding issue. It's not obvious to me that one wants to turn on getindex(A, I...) = sub(A, I...) until we have bounds-checking.

@vtjnash vtjnash mentioned this pull request Dec 19, 2014
@timholy
Copy link
Member

timholy commented Feb 25, 2015

Given the interest in moving towards finalizing 0.4, @JeffBezanson, input/code review is needed here. In my opinion this is the main obstacle for returning views from indexing in 0.4. Most of the work is already done: in addition to the SubArray and multidimensional indexing revamp, there's #9150 waiting in the wings. But we need to make a couple of decisions before we can move forward.

Two comments:

  • I'm not clear on whether the Base.Unsafe solution is truly "user-extensible," meaning that a package author could also make use of it.
  • The question of whether @inbounds should recurse is now slightly more interesting in light of Should reshaping a SubArray produce another SubArray? #9874 (comment). If both ReshapeArray and SubArray check bounds upon construction, then checking bounds on the "outer" composition is a guarantee that it's safe all the way down. SubArray{Array} is of course going to be very common; presumably at least one ReshapeArray (so 3 layers in total) will not be exactly rare. However, one might need a boundschecked_upon_construction trait to be safe in general.

@simonster
Copy link
Member Author

Some more thoughts:

  • In RFC: Give AbstractArrays smart and performant indexing behaviors for free #10525 (comment) @mbauman proposes getindex(::Type{BoundsCheckOn}, x, I...), which has some elegance, but see the caveats in my post below in that issue.
  • Yet another way to express this is to have an Expr that says that we should first look for imported bindings from module A in module B, so Base.getindex could instead get looked up as Base.Unsafe.getindex. This is another variant of option 2 above but might be a bit less ugly, and could also be useful for @fastmath. But this doesn't solve the issue of potential code duplication between getindex and Unsafe.getindex.

@mbauman
Copy link
Member

mbauman commented Mar 17, 2015

Moving the discussion here, referencing #10525 (comment):

The first problem is that it makes defining getindex uglier for cases where performance doesn't matter

Couldn't we have ref create calls with BoundsCheckOn() by default? Here's what Base julia could define:

abstract BoundsCheck
immutable BoundsCheckOn <: BoundsCheck; end
immutable BoundsCheckOff <: BoundsCheck; end
getindex(::BoundsCheck, x, I...) = getindex(x, I...)

Then users can opt into getting bounds check information when defining their own methods. It makes calling getindex by name more complicated, but I think it could be ok… it's already a bit of an expert syntax over x[I...].

The second problem is that, at least as far as I can tell, there's still a lowering issue.

I was thinking of that part as solved by the code you've already written here. :)

@simonster
Copy link
Member Author

@mbauman With the fallback, it seems that any method defined as getindex(::BoundsCheckOn, x, I...) would take precedence over all methods defined as getindex(x, I...). Does that still help with #10525? Wouldn't you still need to define all general-purpose methods with bounds checks as getindex(x, I...) rather than getindex(::BoundsCheckOn, x, I...) to avoid shadowing user definitions?

@mbauman
Copy link
Member

mbauman commented Mar 17, 2015

The reason I liked having this as an aspect of multiple dispatch is not for defining different methods, but rather as a way to allow the compiler to elide bounds checks. But it seems like the optimization that I thought would occur isn't happening. Here's how I would write #10525 with this proposal:

Bool(::BoundsCheckOn) = true
Bool(::BoundsCheckOff) = false
getindex(b::BoundsCheck, x::AbstractArray, I...) = _getindex(b, linearindexing(x), x, I...)
function _getindex(b::BoundsCheck, ::LinearFast, x, I::Union(Real, AbstractArray, Colon)...)
    Bool(b) && checkbounds(x, I...) # I initially tried isa(b, BoundsCheckOn), but of course that doesn't work
     # compute i::Int from I...
    getindex(BoundsCheckOff(), x, i) # If the user hasn't defined this, it will fall back to the next method
end
_getindex(::BoundsCheck, ::LinearFast, x, I::Int) = getindex(x, I) # And now this will throw the appropriate no method error!

Edit: Oh, I see what you mean now. I think the same general strategy that I'm already using will work just fine with this setup. I've updated my code sketch.

@mbauman
Copy link
Member

mbauman commented Mar 17, 2015

Oooh, right you are. This will cause a strange middle-ground where getindex(::MyArray, ::Int) = 1 works but getindex(::MyArray, ::Float) = 2 doesn't. I don't like that. :\

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants