Implement ReinterpretArray #23750

Keno · 2017-09-18T02:26:49Z

This redoes reinterpret in julia rather than punning the memory
of the actual array. The motivation for this is to avoid the API
limitations of the current reinterpret implementation (Array only,
preventing strong TBAA, alignment problems). The surface API
essentially unchanged, though the shape argument to reinterpret
is removed, since those concepts are now orthogonal. The return
type from reinterpret is now ReinterpretArray, which implements
the AbstractArray interface and does the reinterpreting lazily on
demand. The compiler is able to fold away the abstraction and
generate very tight IR:

julia> ar = reinterpret(Complex{Int64}, rand(Int64, 1000));

julia> typeof(ar)
Base.ReinterpretArray{Complex{Int64},Int64,1,Array{Int64,1}}

julia> f(ar) = @inbounds return ar[1]
f (generic function with 1 method)

julia> @code_llvm f(ar)

; Function f
; Location: REPL[2]
define void @julia_f_63575({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(8)) #0 {
top:
; Location: REPL[2]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to %jl_value_t addrspace(10)* addrspace(11)*
  %4 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %3, align 8
  %5 = addrspacecast %jl_value_t addrspace(10)* %4 to %jl_value_t addrspace(11)*
  %6 = bitcast %jl_value_t addrspace(11)* %5 to i64* addrspace(11)*
  %7 = load i64*, i64* addrspace(11)* %6, align 8
  %8 = load i64, i64* %7, align 8
  %9 = getelementptr i64, i64* %7, i64 1
  %10 = load i64, i64* %9, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %8, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %10, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}

julia> g(a) = @inbounds return reinterpret(Complex{Int64}, a)[1]
g (generic function with 1 method)

julia> @code_llvm g(randn(1000))

; Function g
; Location: REPL[4]
define void @julia_g_63642({ i64, i64 } addrspace(11)* noalias nocapture sret, %jl_value_t addrspace(10)* dereferenceable(40)) #0 {
top:
; Location: REPL[4]:1
; Function getindex; {
; Location: reinterpretarray.jl:31
  %2 = addrspacecast %jl_value_t addrspace(10)* %1 to %jl_value_t addrspace(11)*
  %3 = bitcast %jl_value_t addrspace(11)* %2 to double* addrspace(11)*
  %4 = load double*, double* addrspace(11)* %3, align 8
  %5 = bitcast double* %4 to i64*
  %6 = load i64, i64* %5, align 8
  %7 = getelementptr double, double* %4, i64 1
  %8 = bitcast double* %7 to i64*
  %9 = load i64, i64* %8, align 8
  %.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 0
  store i64 %6, i64 addrspace(11)* %.sroa.0.0..sroa_idx, align 8
  %.sroa.3.0..sroa_idx13 = getelementptr inbounds { i64, i64 }, { i64, i64 } addrspace(11)* %0, i64 0, i32 1
  store i64 %9, i64 addrspace(11)* %.sroa.3.0..sroa_idx13, align 8
;}
  ret void
}

In addition, the new reinterpret implementation is able to handle any AbstractArray
(whether useful or not is a separate decision):

invoke(reinterpret, Tuple{Type{Complex{Float64}}, AbstractArray}, Complex{Float64}, speye(10))
5×10 Base.ReinterpretArray{Complex{Float64},Float64,2,SparseMatrixCSC{Float64,Int64}}:
 1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im  0.0+0.0im  0.0+0.0im
 0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  0.0+0.0im  1.0+0.0im  0.0+1.0im

The remaining todo is to audit the uses of reinterpret in base. I've fixed up the uses themselves, but there's
code deeper in the array code that needs to be broadened to allow ReinterpretArray.

Fixes #22849
Fixes #19238

quinnj · 2017-09-18T02:54:02Z

deps/patches/llvm-D37939-Mem2Reg-Also-handle-memcpy.patch

+we always emit that as a memcpy rather than a load/store pair. However,
+this can give worse optimization results in certain cases because some
+optimizations that can handle load/store pairs cannot handle memcpys.
+Mem2reg is one of these optimizations. This patch adds rudamentary


rudimentary

vtjnash · 2017-09-18T15:41:02Z

src/intrinsics.cpp

+            // appropriate coercion manually.
+            AllocaInst *AI = cast<AllocaInst>(p);
+            Type *AllocType = AI->getAllocatedType();
+            if (!AI->isArrayAllocation() && !AllocType->isAggregateType() && !AllocType->isVectorTy()) {


maybe test for what it can handle directly?

(AllocType->isFloatingPointTy() || AllocType->isIntegerTy() || AllocType->isPointerTy()) && (to->isFloatingPointTy() || AllocType->isIntegerTy() || AllocType->isPointerTy())

vtjnash · 2017-09-18T15:44:44Z

src/intrinsics.cpp

-        }
-        if (!dest)
-            return unboxed;
-        Type *dest_ty = unboxed->getType()->getPointerTo();


leave dest handling here

vtjnash · 2017-09-18T15:48:44Z

src/cgutils.cpp

-                Value *src_ptr = data_pointer(ctx, src, T_pint8);
-                if (dest->getType() != T_pint8)
-                    dest = emit_bitcast(ctx, dest, T_pint8);
+                Value *src_ptr = data_pointer(ctx, src);
                if (skip) // copy dest -> dest to simulate an undef value / conditional copy
                    src_ptr = ctx.builder.CreateSelect(skip, dest, src_ptr);


this is broken now?

vtjnash · 2017-09-18T15:54:50Z

base/reinterpretarray.jl

+"""
+struct ReinterpretArray{T,S,N,A<:AbstractArray{S, N}} <: AbstractArray{T, N}
+    parent::A
+    Base.reinterpret(::Type{T}, a::A) where {T,S,N,A<:AbstractArray{S, N}} = new{T, S, N, A}(a)


this code assumes isbits(T) && isbits(S), it should be asserted here on construction (like we did before)

vtjnash · 2017-09-18T15:57:26Z

base/reinterpretarray.jl

+    Base.reinterpret(::Type{T}, a::A) where {T,S,N,A<:AbstractArray{S, N}} = new{T, S, N, A}(a)
+end
+
+Base.eltype(a::ReinterpretArray{T}) where {T} = T


Don't use Base. for code in Base.

vtjnash · 2017-09-18T16:03:02Z

base/reinterpretarray.jl

+        return reinterpret(T, a[inds...])
+    elseif sizeof(T) > sizeof(S)
+        nels = div(sizeof(T), sizeof(S))
+        ind_off = (inds[1]-1) * nels


not inbounds if called as ReinterpretArray(a)[]

maybe define that method specifically? Base.getindex(a::ReinterpretArray) = a[1], if that's right?

vtjnash · 2017-09-18T16:05:06Z

base/reinterpretarray.jl

+        o = Ref{T}()
+        optr = Base.unsafe_convert(Ref{T}, o)
+        for i = 1:nels
+            unsafe_store!(convert(Ptr{S}, optr)+(i-1)*sizeof(S), a.parent[ind_off + i, tail(inds)...])


unsafe_store! can already handle this pointer offset computation:

optr = Ptr{S}(Base.unsafe_convert(Ref{T}, o)) for i in 1:nels unsafe_store!(optr, a.parent[ind_off + i, tail(inds)...], i) end

vtjnash · 2017-09-18T16:08:14Z

base/reinterpretarray.jl

+        r = Ref{S}(a.parent[1+ind, Base.tail(inds)...])
+        @gc_preserve r begin
+            rptr = Base.unsafe_convert(Ref{S}, r)
+            ret = unsafe_load(convert(Ptr{T}, rptr) + sub*sizeof(T))


similarly here, use:

ret = unsafe_load(Ptr{T}(rptr), sub)

timholy

👍

timholy · 2017-09-18T16:28:36Z

base/reinterpretarray.jl

+function Base.size(a::ReinterpretArray{T,S}) where {T,S}
+    psize = size(a.parent)
+    if sizeof(T) > sizeof(S)
+        size1 = div(psize[1], div(sizeof(T), sizeof(S)))


Presumably this first div can't be const-ified by inference/LLVM. Since size is used in bounds-checks, and since div is achingly slow, I wonder if we need to cache this value in the struct.

div(T, S) is constant folded. The other div is generally folded to shifts by LLVM.

timholy · 2017-09-18T16:30:57Z

base/reinterpretarray.jl

+        end
+        return o[]
+    else
+        ind, sub = divrem(inds[1]-1, div(sizeof(S), sizeof(T)))


Also a likely performance bottleneck. Consider using a cached SignedMultiplicativeInverse here.

EDIT: ...unless LLVM can perform this optimization itself. This is a case where the denominator is known at compile time.

Yes, llvm does very well here, since we're dealing with integers and known divisors.

fredrikekre · 2017-09-18T17:43:13Z

base/reinterpretarray.jl

+If the size of `T` differs from the size of `S`, the array will be compressed/expanded in
+the first dimension.
+"""
+struct ReinterpretArray{T,S,N,A<:AbstractArray{S, N}} <: AbstractArray{T, N}


Perhaps switch places for S and N? Typically for AbstractArrays the element type is the first parameter, and the rank the second parameter.

vtjnash · 2017-09-19T23:53:55Z

base/reinterpretarray.jl

+        ind, sub = divrem(inds[1]-1, div(sizeof(S), sizeof(T)))
+        r = Ref{S}(a.parent[1+ind, tail(inds)...])
+        rptr = unsafe_convert(Ref{S}, r)
+        unsafe_store!(Ptr{T}(rptr), v, sub+1)


rptr is not valid here, since r is not gc_preserve_begin'd

r is used after, but I can put in a gc preserve just for completeness.

appearing in the source code after does not gc-preserve a value

fair enough

vtjnash · 2017-09-19T23:54:21Z

base/reinterpretarray.jl

+        o = Ref{T}()
+        optr = unsafe_convert(Ref{T}, o)
+        for i = 1:nels
+            unsafe_store!(Ptr{S}(optr), a.parent[ind_off + i, tail(inds)...], i)


same for optr here

Keno · 2017-09-20T02:45:10Z

Rebased, review addressed. Upstream LLVM patch isn't looking to good. The recommendation is to get rid of the early mem2reg in favor of SROA. We can't do that on pre-0.5, so I think carrying this patch might be fine. I opened #23772 to track looking into possible changes to the pass pipeline.

Keno · 2017-09-29T17:20:34Z

Looks like my comment on this got lost. In any case, I think this is good to go. I had to update it to deal with some corner cases (length 3 array of Complex{Int64} to length 2 array of NTuple{3, Int64}), which lost some performance, but I'm confident that can be recovered at the LLVM level. I do want to get this in since it's fairly large and on the 1.0 path.

JeffBezanson · 2017-09-29T17:49:53Z

base/io.jl

@@ -267,15 +267,16 @@ readlines(s=STDIN; chomp::Bool=true) = collect(eachline(s, chomp=chomp))

 ## byte-order mark, ntoh & hton ##

-let endian_boms = reinterpret(UInt8, UInt32[0x01020304])
+a = UInt32[0x01020304]


This should also be let-bound.

JeffBezanson · 2017-09-29T18:09:36Z

base/array.jl

-    reinterpret(T, a, size(a))
-end
-
-function reinterpret(::Type{T}, a::Array{S}, dims::NTuple{N,Int}) where T where S where N


This needs a deprecation.

Keno · 2017-09-29T19:21:22Z

@JeffBezanson made a good point about the sparse array stuff. Reinterpret used to work fine on sparse arrays and would reinterpret the structural non-zeros. However in general, a reinterpreted zero is no longer zero. It's not entirely clear that that was a good behavior and what to do about it.
For this PR, I'm inclined to leave it alone (but put back the restriction on requiring equal element size).

vtjnash · 2017-10-05T20:58:18Z

base/reinterpretarray.jl

+
+@inline @propagate_inbounds function setindex!(a::ReinterpretArray{T,N,S}, v, inds::Vararg{Int, N}) where {T,N,S}
+    v = convert(T, v)::T
+    if sizeof(T) == sizeof(S)


vtjnash · 2017-10-05T20:59:17Z

base/reinterpretarray.jl

+            sptr = Ptr{UInt8}(unsafe_convert(Ref{S}, s))
+            nbytes_copied = 0
+            i = 1
+            @inline function copy_element()


probably shouldn't use a closure, since this isn't inferrable

vtjnash · 2017-10-05T21:00:01Z

base/sparse/abstractsparse.jl

@@ -2,7 +2,7 @@

 abstract type AbstractSparseArray{Tv,Ti,N} <: AbstractArray{Tv,N} end

-const AbstractSparseVector{Tv,Ti} = AbstractSparseArray{Tv,Ti,1}
+const AbstractSparseVector{Tv,Ti} = Union{AbstractSparseArray{Tv,Ti,1}, Base.ReinterpretArray{Tv,1,T,<:AbstractSparseArray{T,Ti,1}} where T}


also, this was true with the old definition of reinterpret on SparseArrays, it's not true with the new definition (see comment on nonzeroinds)

vtjnash · 2017-10-05T21:01:56Z

base/sparse/abstractsparse.jl

+            ind * div(sizeof(S), sizeof(T))
+        end
+    end
+end


This method assumes that reinterpret(zero(T)) === zero(S). I don't think we should define it.

So did the old reinterpret kind off, which is why I didn't change it. Happy to get rid of this though.

The old definition returned a sparse reinterpreted array (SparseArray{ReinterpretedArray{parent}}), the new one returns a reinterpreted sparse array (Reinterpreted{A}). The main point where they differ is in the handling of this method, since the old one would return a strong zero (zero(S)), where the new one would return a casted zero (reinterpret(S, zero(T)))

That's true, but since nonzeroinds returns structural non-zeros, that still seems fine to (if odd if reinterpret(S, zero(T)) != zero(S). Would be good to have some input from people who care about sparse matrices.

Keno · 2017-10-07T22:15:23Z

Seems like the preferred solution is to disallow reinterpret on SparseArrays. I'll back out the sparse parts of this and update NEWS accordingly.

Keno · 2017-10-08T02:39:43Z

Is this agreeable with everyone now?

Keno · 2017-10-09T02:54:33Z

Seeing no objections, this is getting merged.

ararslan · 2017-10-09T05:59:08Z

This seems to have broken HDF5. From the Nanosoldier log: https://github.com/JuliaCI/BaseBenchmarkReports/blob/9c503da1dad07ebbe4f7e2e9dadf19be6eb5b1f5/daily_2017_10_9/logs/8273529b08bd23e17be0234ce976bb137d375708_primary.err#L8567

Keno · 2017-10-09T17:54:31Z

I'll take a look.

Keno · 2017-10-09T18:15:01Z

Looks like JLD is the thing that's borked. Also looks like JLD hasn't been updated to support bitsunions yet.

ararslan · 2017-10-10T02:53:27Z

I don't know who's actively maintaining JLD at this point. Migrating Nanosoldier's serialization format from JLD to JSON is on my to-do list but it isn't at the top.

nalimilan · 2017-10-22T19:44:43Z

Is this failure expected?

julia> x = reinterpret(Int, 1:4)
4-element reinterpret(Int64, ::UnitRange{Int64}):
 1
 2
 3
 4

julia> next(x, endof(x))
ERROR: BoundsError
Stacktrace:
 [1] getindex at ./number.jl:57 [inlined]
 [2] next(::Base.ReinterpretArray{Int64,1,Int64,UnitRange{Int64}}, ::Int64) at ./abstractarray.jl:763

This breaks code which uses while i <= endof(x) and then next(x, i) and expects this to work on all vectors. I'm not sure whether it's supposed to be supported or not.

Keno · 2017-10-22T20:18:30Z

Since start of AbstractArray is

julia> start(x)
(Base.OneTo(4), 1)

I don't think that particular feature is part of the AbstractArray contract. Does it hold for say AxisArrays?

nalimilan · 2017-10-22T20:43:19Z

Yeah, I guess we're just used to Vector, BitVector and ranges being the only AbstractArray common implementations. AxisArray doesn't support it either.

vtjnash · 2020-09-23T16:54:16Z

src/cgutils.cpp

                unsigned alignment = julia_alignment(typ, 0);
-                emit_memcpy(ctx, dest, src_ptr, jl_datatype_size(typ), alignment, isVolatile, tbaa);
+                Value *nbytes = ConstantInt::get(T_size, nb);
+                if (skip) // copy dest -> dest to simulate an undef value / conditional copy


It appears that LLVM recently hit this problem too, and are working on a real fix (especially due the performance problems this change may cause) https://reviews.llvm.org/D86815

Keno added this to the 1.0 milestone Sep 18, 2017

quinnj reviewed Sep 18, 2017

View reviewed changes

vtjnash reviewed Sep 18, 2017

View reviewed changes

timholy reviewed Sep 18, 2017

View reviewed changes

fredrikekre reviewed Sep 18, 2017

View reviewed changes

kshyatt added the arrays [a, r, r, a, y, s] label Sep 19, 2017

vtjnash reviewed Sep 19, 2017

View reviewed changes

Keno force-pushed the kf/reinterpretarray branch from 89834c6 to afc6839 Compare September 20, 2017 02:43

Keno changed the title ~~WIP: Implement ReinterpretArray~~ Implement ReinterpretArray Sep 20, 2017

Keno force-pushed the kf/reinterpretarray branch 7 times, most recently from e62c1fb to 6e9056e Compare September 27, 2017 18:46

Keno force-pushed the kf/reinterpretarray branch from 6e9056e to 11e941c Compare September 28, 2017 21:12

JeffBezanson reviewed Sep 29, 2017

View reviewed changes

JeffBezanson added the needs news A NEWS entry is required for this change label Sep 29, 2017

Keno force-pushed the kf/reinterpretarray branch 4 times, most recently from 56dcd47 to 8fd5d61 Compare September 29, 2017 21:52

vtjnash reviewed Oct 5, 2017

View reviewed changes

Discontinue reinterpret on sparse arrays

0054051

Keno merged commit 8273529 into master Oct 9, 2017

StefanKarpinski deleted the kf/reinterpretarray branch October 9, 2017 03:00

ararslan added breaking This change will break code and removed needs news A NEWS entry is required for this change labels Oct 9, 2017

Sacha0 mentioned this pull request Oct 10, 2017

[WIP] improved dense-sparse and sparse-dense matrix multiplication kernels #24045

Closed

nalimilan mentioned this pull request Oct 22, 2017

move DirectIndexString from Base JuliaStrings/LegacyStrings.jl#24

Merged

maleadt mentioned this pull request Dec 6, 2017

LLVM assertion error on release-0.6 while building docs #24855

Closed

pablosanjose mentioned this pull request Dec 6, 2017

Fixing ReinterpretArray breakage - first PR KristofferC/NearestNeighbors.jl#59

Closed

timholy mentioned this pull request Dec 27, 2017

Rewrite for julia 0.7 JuliaImages/ImageCore.jl#52

Merged

fredrikekre mentioned this pull request Jan 9, 2018

move SparseArrays to stdlib #25249

Merged

4 tasks

cormullion mentioned this pull request Mar 18, 2018

NEWS.md is getting a bit untidy #26508

Closed

mschauer mentioned this pull request May 21, 2018

Reinterpret an Array of Float64 as an Array of SVector{Float64} ? JuliaArrays/StaticArrays.jl#410

Closed

vtjnash reviewed Sep 23, 2020

View reviewed changes

This was referenced Nov 24, 2022

Why is reinterpret prohibited? JuliaSparse/SparseArrays.jl#289

Closed

Fix v[i] = -0.0 and support reinterpret JuliaSparse/SparseArrays.jl#296

Merged

Implement ReinterpretArray #23750

Implement ReinterpretArray #23750

Conversation

Keno commented Sep 18, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

timholy left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

timholy Sep 18, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Keno commented Sep 20, 2017

Keno commented Sep 29, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Keno commented Sep 29, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Keno commented Oct 7, 2017

Keno commented Oct 8, 2017

Keno commented Oct 9, 2017

ararslan commented Oct 9, 2017

Keno commented Oct 9, 2017

Keno commented Oct 9, 2017

ararslan commented Oct 10, 2017

nalimilan commented Oct 22, 2017

Keno commented Oct 22, 2017

nalimilan commented Oct 22, 2017

Choose a reason for hiding this comment

Keno commented Sep 18, 2017 •

edited

Loading

timholy Sep 18, 2017 •

edited

Loading