Implement ReinterpretArray #23750

Merged: 3 commits into master, Oct 9, 2017
Conversation

@Keno (Member) commented Sep 18, 2017

This redoes reinterpret in Julia rather than punning the memory
of the actual array. The motivation is to avoid the API
limitations of the current reinterpret implementation (it works on
Array only, prevents strong TBAA, and has alignment problems). The
surface API is essentially unchanged, though the shape argument to
reinterpret is removed, since those concepts are now orthogonal. The
return type of reinterpret is now ReinterpretArray, which implements
the AbstractArray interface and does the reinterpreting lazily on
demand. The compiler is able to fold away the abstraction and
generate very tight IR:

julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000));

julia> typeof(ar)
Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}}

julia> f(ar) = @inbounds return ar[1]
f (generic function with 1 method)

julia> @code_llvm f(ar)

; Function f
; Location: REPL[2]
define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 {
top:
; Location: REPL[2]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)*
  %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8
  %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)*
  %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)*
  %7 = load i64*, i64* addrspace(11)* %6, align 8
  %8 = load i64, i64* %7, align 8
  %9 = getelementptr i64, i64* %7, i64 1
  %10 = load i64, i64* %9, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}

julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1]
g (generic function with 1 method)

julia> @code_llvm g(randn(1000))

; Function g
; Location: REPL[4]
define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 {
top:
; Location: REPL[4]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)*
  %4 = load double*, double* addrspace(11)* %3, align 8
  %5 = bitcast double* %4 to i64*
  %6 = load i64, i64* %5, align 8
  %7 = getelementptr double, double* %4, i64 1
  %8 = bitcast double* %7 to i64*
  %9 = load i64, i64* %8, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}

In addition, the new reinterpret implementation is able to handle any AbstractArray
(whether that is useful or not is a separate decision):

invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10))
5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}:
 1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im

The remaining to-do is to audit the uses of reinterpret in Base. I've fixed up the uses themselves, but there's
code deeper in the array code that needs to be broadened to allow ReinterpretArray.

Fixes #22849
Fixes #19238

@Keno added this to the 1.0 milestone Sep 18, 2017

From the commit message under review:

we always emit that as a memcpy rather than a load/store pair. However,
this can give worse optimization results in certain cases because some
optimizations that can handle load/store pairs cannot handle memcpys.
Mem2reg is one of these optimizations. This patch adds rudamentary
Review comment (Member):
rudimentary

// appropriate coercion manually.
AllocaInst *AI = cast<AllocaInst>(p);
Type *AllocType = AI->getAllocatedType();
if (!AI->isArrayAllocation() && !AllocType->isAggregateType() && !AllocType->isVectorTy()) {
Review comment (Member):
maybe test for what it can handle directly?

(AllocType->isFloatingPointTy() || AllocType->isIntegerTy() || AllocType->isPointerTy()) &&
(to->isFloatingPointTy() || to->isIntegerTy() || to->isPointerTy())

}
if (!dest)
return unboxed;
Type *dest_ty = unboxed->getType()->getPointerTo();
Review comment (Member):
leave dest handling here

src/cgutils.cpp (outdated):

-        Value *src_ptr = data_pointer(ctx, src, T_pint8);
-        if (dest->getType() != T_pint8)
-            dest = emit_bitcast(ctx, dest, T_pint8);
+        Value *src_ptr = data_pointer(ctx, src);
         if (skip) // copy dest -> dest to simulate an undef value / conditional copy
             src_ptr = ctx.builder.CreateSelect(skip, dest, src_ptr);
Review comment (Member):
this is broken now?

"""
struct ReinterpretArray{T,S,N,A<:AbstractArray{S, N}} <: AbstractArray{T, N}
parent::A
Base.reinterpret(::Type{T}, a::A) where {T,S,N,A<:AbstractArray{S, N}} = new{T, S, N, A}(a)
Review comment (Member):
this code assumes isbits(T) && isbits(S), it should be asserted here on construction (like we did before)

end
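A sketch of what that check could look like (using the 0.7-era isbits predicate; the error messages are illustrative, not the exact text that landed):

struct ReinterpretArray{T,S,N,A<:AbstractArray{S, N}} <: AbstractArray{T, N}
    parent::A
    function Base.reinterpret(::Type{T}, a::A) where {T,S,N,A<:AbstractArray{S, N}}
        # assert isbits element types up front, as the rest of the code assumes
        isbits(T) || throw(ArgumentError("cannot reinterpret to $T, which is not a bits type"))
        isbits(S) || throw(ArgumentError("cannot reinterpret an array with element type $S, which is not a bits type"))
        return new{T, S, N, A}(a)
    end
end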

Base.eltype(a::ReinterpretArray{T}) where {T} = T
Review comment (Member):
Don't use Base. for code in Base.

return reinterpret(T, a[inds...])
elseif sizeof(T) > sizeof(S)
nels = div(sizeof(T), sizeof(S))
ind_off = (inds[1]-1) * nels
Review comment (Member):
not inbounds if called as ReinterpretArray(a)[]

Review comment (Member):
maybe define that method specifically? Base.getindex(a::ReinterpretArray) = a[1], if that's right?

o = Ref{T}()
optr = Base.unsafe_convert(Ref{T}, o)
for i = 1:nels
unsafe_store!(convert(Ptr{S}, optr)+(i-1)*sizeof(S), a.parent[ind_off + i, tail(inds)...])
Review comment (Member):
unsafe_store! can already handle this pointer offset computation:

optr = Ptr{S}(Base.unsafe_convert(Ref{T}, o))
for i in 1:nels
    unsafe_store!(optr, a.parent[ind_off + i, tail(inds)...], i)
end

r = Ref{S}(a.parent[1+ind, Base.tail(inds)...])
@gc_preserve r begin
rptr = Base.unsafe_convert(Ref{S}, r)
ret = unsafe_load(convert(Ptr{T}, rptr) + sub*sizeof(T))
Review comment (Member):
similarly here, use:

ret = unsafe_load(Ptr{T}(rptr), sub)

@timholy (Member) left a review:
👍

function Base.size(a::ReinterpretArray{T,S}) where {T,S}
psize = size(a.parent)
if sizeof(T) > sizeof(S)
size1 = div(psize[1], div(sizeof(T), sizeof(S)))
Review comment (Member):
Presumably this first div can't be const-ified by inference/LLVM. Since size is used in bounds-checks, and since div is achingly slow, I wonder if we need to cache this value in the struct.

Reply (@Keno, Member, Author):
div(sizeof(T), sizeof(S)) is constant-folded. The other div is generally folded to shifts by LLVM.

end
return o[]
else
ind, sub = divrem(inds[1]-1, div(sizeof(S), sizeof(T)))
Review comment (@timholy, Member, Sep 18, 2017):
Also a likely performance bottleneck. Consider using a cached SignedMultiplicativeInverse here.

EDIT: ...unless LLVM can perform this optimization itself. This is a case where the denominator is known at compile time.
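For reference, a sketch of the caching idea using the helper that already exists in Base (assuming the inverse is computed once, e.g. stored in the struct, and that divrem accepts the inverse type as defined in base/multinverses.jl):

using Base.MultiplicativeInverses: SignedMultiplicativeInverse

ratio = SignedMultiplicativeInverse(4)  # e.g. div(sizeof(S), sizeof(T)) == 4, computed once
ind, sub = divrem(12, ratio)            # same result as divrem(12, 4), but via
                                        # multiply+shift rather than a hardware div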

Reply (@Keno, Member, Author):
Yes, llvm does very well here, since we're dealing with integers and known divisors.
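For instance (an illustrative check one can run; 4 stands in for the known div(sizeof(S), sizeof(T))):

q(i) = divrem(i - 1, 4)
# @code_llvm q(7) shows shift sequences (for power-of-two divisors) or a
# multiply-by-magic-constant sequence for other constants, rather than an
# sdiv instruction.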

If the size of `T` differs from the size of `S`, the array will be compressed/expanded in
the first dimension.
"""
struct ReinterpretArray{T,S,N,A<:AbstractArray{S, N}} <: AbstractArray{T, N}
Review comment (Member):
Perhaps switch places for S and N? Typically for AbstractArrays the element type is the first parameter, and the rank the second parameter.

@kshyatt added the arrays label Sep 19, 2017
ind, sub = divrem(inds[1]-1, div(sizeof(S), sizeof(T)))
r = Ref{S}(a.parent[1+ind, tail(inds)...])
rptr = unsafe_convert(Ref{S}, r)
unsafe_store!(Ptr{T}(rptr), v, sub+1)
Review comment (Member):
rptr is not valid here, since r is not gc_preserve_begin'd

Reply (@Keno, Member, Author):
r is used after, but I can put in a gc preserve just for completeness.

Review comment (Member):
appearing later in the source code does not gc-preserve a value

Reply (@Keno, Member, Author):
fair enough
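Sketched with this PR's @gc_preserve macro, mirroring the getindex path quoted earlier:

r = Ref{S}(a.parent[1+ind, tail(inds)...])
@gc_preserve r begin
    rptr = unsafe_convert(Ref{S}, r)
    unsafe_store!(Ptr{T}(rptr), v, sub+1)
end
a.parent[1+ind, tail(inds)...] = r[]   # the later use of r mentioned above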

o = Ref{T}()
optr = unsafe_convert(Ref{T}, o)
for i = 1:nels
unsafe_store!(Ptr{S}(optr), a.parent[ind_off + i, tail(inds)...], i)
Review comment (Member):
same for optr here

@Keno force-pushed the kf/reinterpretarray branch from 89834c6 to afc6839 (September 20, 2017 02:43)
@Keno (Member, Author) commented Sep 20, 2017

Rebased, review comments addressed. The upstream LLVM patch isn't looking too good. The recommendation is to get rid of the early mem2reg in favor of SROA. We can't do that on LLVM versions before 5.0, so I think carrying this patch might be fine. I opened #23772 to track looking into possible changes to the pass pipeline.

@Keno changed the title from "WIP: Implement ReinterpretArray" to "Implement ReinterpretArray" Sep 20, 2017
@Keno force-pushed the kf/reinterpretarray branch 7 times, most recently from e62c1fb to 6e9056e (September 27, 2017 18:46)
@Keno force-pushed the kf/reinterpretarray branch from 6e9056e to 11e941c (September 28, 2017 21:12)
@Keno (Member, Author) commented Sep 29, 2017

Looks like my comment on this got lost. In any case, I think this is good to go. I had to update it to deal with some corner cases (e.g. reinterpreting a length-3 array of Complex{Int64} as a length-2 array of NTuple{3, Int64}), which lost some performance, but I'm confident that can be recovered at the LLVM level. I do want to get this in, since it's fairly large and on the 1.0 path.

base/io.jl (outdated):

@@ -267,15 +267,16 @@ readlines(s=STDIN; chomp::Bool=true) = collect(eachline(s, chomp=chomp))

 ## byte-order mark, ntoh & hton ##

-let endian_boms = reinterpret(UInt8, UInt32[0x01020304])
+a = UInt32[0x01020304]
Review comment (Member):
This should also be let-bound.
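A sketch of the suggested shape (keeping the temporary array alive inside the let, since the ReinterpretArray now refers to it lazily):

let a = UInt32[0x01020304], endian_boms = reinterpret(UInt8, a)
    # ... definitions using endian_boms; `a` stays bound for as long
    # as the lazy reinterpreted view is in use ...
end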

reinterpret(T, a, size(a))
end

function reinterpret(::Type{T}, a::Array{S}, dims::NTuple{N,Int}) where T where S where N
Review comment (Member):
This needs a deprecation.
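One way to spell that (a sketch; not necessarily the exact text that landed in base/deprecated.jl):

function reinterpret(::Type{T}, a::Array{S}, dims::NTuple{N,Int}) where {T,S,N}
    depwarn("`reinterpret(T, a, dims)` is deprecated, use `reshape(reinterpret(T, vec(a)), dims)` instead", :reinterpret)
    return reshape(reinterpret(T, vec(a)), dims)
end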

@JeffBezanson added the "needs news" label (a NEWS entry is required for this change) Sep 29, 2017
@Keno (Member, Author) commented Sep 29, 2017

@JeffBezanson made a good point about the sparse array stuff. reinterpret used to work fine on sparse arrays and would reinterpret the structural non-zeros. However, in general a reinterpreted zero is no longer zero. It's not entirely clear that that was good behavior, or what to do about it.
For this PR, I'm inclined to leave it alone (but put back the restriction requiring equal element sizes).

@Keno force-pushed the kf/reinterpretarray branch 4 times, most recently from 56dcd47 to 8fd5d61 (September 29, 2017 21:52)

@inline @propagate_inbounds function setindex!(a::ReinterpretArray{T,N,S}, v, inds::Vararg{Int, N}) where {T,N,S}
v = convert(T, v)::T
if sizeof(T) == sizeof(S)
Review comment (Member):
same here

sptr = Ptr{UInt8}(unsafe_convert(Ref{S}, s))
nbytes_copied = 0
i = 1
@inline function copy_element()
Review comment (Member):
probably shouldn't use a closure, since this isn't inferrable
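A minimal illustration (not the PR's code) of why this pattern defeats inference: a captured variable that is reassigned gets boxed, so all of its uses infer as Any:

function total()
    nbytes_copied = 0
    bump() = (nbytes_copied += 1)   # captures and reassigns -> Core.Box
    bump(); bump()
    return nbytes_copied            # inferred as Any, not Int
end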

@@ -2,7 +2,7 @@

abstract type AbstractSparseArray{Tv,Ti,N} <: AbstractArray{Tv,N} end

-const AbstractSparseVector{Tv,Ti} = AbstractSparseArray{Tv,Ti,1}
+const AbstractSparseVector{Tv,Ti} = Union{AbstractSparseArray{Tv,Ti,1}, Base.ReinterpretArray{Tv,1,T,<:AbstractSparseArray{T,Ti,1}} where T}
Review comment (Member):
ewww

Review comment (Member):
also, this was true with the old definition of reinterpret on SparseArrays, it's not true with the new definition (see comment on nonzeroinds)

ind * div(sizeof(S), sizeof(T))
end
end
end
Review comment (Member):
This method assumes that reinterpret(S, zero(T)) === zero(S). I don't think we should define it.

Reply (@Keno, Member, Author):
So did the old reinterpret, kind of, which is why I didn't change it. Happy to get rid of this, though.

Review comment (Member):
The old definition returned a sparse reinterpreted array (SparseArray{ReinterpretedArray{parent}}), while the new one returns a reinterpreted sparse array (Reinterpreted{A}). The main point where they differ is the handling of this method: the old one would return a strong zero (zero(S)), whereas the new one returns a casted zero (reinterpret(S, zero(T))).
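The distinction only bites for bits types whose zero is not the all-zeros bit pattern; a hypothetical illustration:

primitive type Biased8 8 end                          # stores x + 128 in a byte
Base.zero(::Type{Biased8}) = reinterpret(Biased8, 0x80)

# casted zero: reinterpret(UInt8, zero(Biased8)) == 0x80 != zero(UInt8),
# so reinterpreting a structural zero need not yield a zero of the new eltype.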

Reply (@Keno, Member, Author):
That's true, but since nonzeroinds returns structural non-zeros, that still seems fine (if odd when reinterpret(S, zero(T)) != zero(S)). Would be good to have some input from people who care about sparse matrices.

@Keno (Member, Author) commented Oct 7, 2017

Seems like the preferred solution is to disallow reinterpret on SparseArrays. I'll back out the sparse parts of this and update NEWS accordingly.

@Keno (Member, Author) commented Oct 8, 2017

Is this agreeable with everyone now?

@Keno (Member, Author) commented Oct 9, 2017

Seeing no objections, this is getting merged.

@Keno merged commit 8273529 into master Oct 9, 2017
@StefanKarpinski deleted the kf/reinterpretarray branch October 9, 2017 03:00
@ararslan added the "breaking" label (this change will break code) and removed the "needs news" label Oct 9, 2017
@ararslan (Member) commented Oct 9, 2017

@Keno (Member, Author) commented Oct 9, 2017

I'll take a look.

@Keno (Member, Author) commented Oct 9, 2017

Looks like JLD is the thing that's borked. Also looks like JLD hasn't been updated to support bitsunions yet.

@ararslan (Member) commented:
I don't know who's actively maintaining JLD at this point. Migrating Nanosoldier's serialization format from JLD to JSON is on my to-do list but it isn't at the top.

@nalimilan (Member) commented:
Is this failure expected?

julia> x = reinterpret(Int, 1:4)
4-element reinterpret(Int64, ::UnitRange{Int64}):
 1
 2
 3
 4

julia> next(x, endof(x))
ERROR: BoundsError
Stacktrace:
 [1] getindex at ./number.jl:57 [inlined]
 [2] next(::Base.ReinterpretArray{Int64,1,Int64,UnitRange{Int64}}, ::Int64) at ./abstractarray.jl:763

This breaks code which uses while i <= endof(x) and then next(x, i), expecting this to work on all vectors. I'm not sure whether that's supposed to be supported or not.

@Keno (Member, Author) commented Oct 22, 2017

Since the default start for an AbstractArray is

julia> start(x)
(Base.OneTo(4), 1)

I don't think that particular feature is part of the AbstractArray contract. Does it hold for, say, AxisArrays?
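That is, with the 0.7-era iteration protocol the state returned by start is not a linear index, so indexing with the state is a coincidence of Vector rather than a contract:

x = reinterpret(Int, 1:4)
s = start(x)            # (Base.OneTo(4), 1) -- a tuple, not an Int
el, s = next(x, s)      # correct protocol usage
done(x, s)              # false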

@nalimilan (Member) commented:
Yeah, I guess we're just used to Vector, BitVector, and ranges being the only common AbstractArray implementations. AxisArray doesn't support it either.

unsigned alignment = julia_alignment(typ, 0);
emit_memcpy(ctx, dest, src_ptr, jl_datatype_size(typ), alignment, isVolatile, tbaa);
Value *nbytes = ConstantInt::get(T_size, nb);
if (skip) // copy dest -> dest to simulate an undef value / conditional copy
Review comment (Member):
It appears that LLVM recently hit this problem too, and they are working on a real fix (especially due to the performance problems this change may cause): https://reviews.llvm.org/D86815

Labels: arrays, breaking
Successfully merging this pull request may close these issues:

- Get rid of reinterpret in its current form
- Multiplication of a SharedArray and an Array

9 participants