
RFC: Add native Array slices/views with shared data #8809

Closed
wants to merge 1 commit

Conversation

@gummif (Contributor) commented Oct 25, 2014

This PR adds a method arrayslice(A, pos, dims) that returns a slice of type Array and is memory safe. Slices cover contiguous parts of A's memory and can have any dimensionality. The advantage of this special case over ArrayViews/SubArrays is performance and compatibility with any code written for Array.

The C function jl_slice_owned_array is actually a generalization of jl_reshape_array, so there is a question of whether the code duplication is acceptable, or whether the reshape function should call the slice function.
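The sharing semantics can be sketched outside Julia with Python's stdlib buffer protocol. This is only an analogy, not the PR's implementation: `array_slice` below is a hypothetical counterpart of the proposed `arrayslice`, taking a 1-based `pos` and a length, and returning a writable view that aliases the parent's memory rather than copying it.

```python
from array import array

def array_slice(a, pos, n):
    """Hypothetical analogue of arrayslice(A, pos, n): return a writable
    view of n elements of `a` starting at 1-based index `pos`, sharing
    memory with `a` (no copy is made)."""
    if pos < 1 or pos + n - 1 > len(a):
        raise IndexError("slice exceeds parent array bounds")
    return memoryview(a)[pos - 1 : pos - 1 + n]

x = array('d', [float(i) for i in range(15)])  # parent buffer: 0.0 .. 14.0
y = array_slice(x, 7, 9)   # elements 7..15 of x, shared memory
y[0] = -1.0                # write through the view...
assert x[6] == -1.0        # ...and the parent sees the change
```

Writes through the view mutate the parent, which is exactly the behaviour the `copy!(y, I)` example below demonstrates for the real `arrayslice`.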

Memory check (same as with reshape):

n=int(1e8)
x=rand(n);
gc() # nothing happens
y=arrayslice(x,101,20)
gc() # nothing happens
x=1
gc() # nothing happens
y=1
gc() # memory usage reduces
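The keep-alive behaviour checked above (the parent's memory survives as long as any slice references it, and is reclaimed only once both are gone) can be mimicked with Python's buffer protocol. A small sketch, relying on CPython reference counting rather than Julia's GC:

```python
from array import array

x = array('d', [0.0] * 1000)
y = memoryview(x)[100:120]   # roughly analogous to arrayslice(x, 101, 20)

del x                # drop the only name bound to the parent...
first = y[0]         # ...but the view keeps the buffer alive and readable
y.release()          # once the view is released, the buffer can be freed
```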

Code example:

julia> x=rand(3,5)
3x5 Array{Float64,2}:
 0.839011   0.954     0.185297  0.236349  0.0152856
 0.0864201  0.561014  0.215385  0.219099  0.948724 
 0.708425   0.489022  0.20037   0.907779  0.399409 

julia> y=arrayslice(x,7,(3,3))
3x3 Array{Float64,2}:
 0.185297  0.236349  0.0152856
 0.215385  0.219099  0.948724 
 0.20037   0.907779  0.399409 

julia> I=eye(3);

julia> copy!(y,I);

julia> x
3x5 Array{Float64,2}:
 0.839011   0.954     1.0  0.0  0.0
 0.0864201  0.561014  0.0  1.0  0.0
 0.708425   0.489022  0.0  0.0  1.0

julia> y=arrayslice(x,1,3)
3-element Array{Float64,1}:
 0.839011
 0.0864201
 0.708425

julia> scale!(100,y);

julia> x
3x5 Array{Float64,2}:
 83.9011   0.954     1.0  0.0  0.0
  8.64201  0.561014  0.0  1.0  0.0
 70.8425   0.489022  0.0  0.0  1.0

See also #8795

@gummif (Contributor, author) commented Oct 25, 2014

My only concern is that a->isaligned = data->isaligned might not be correct for all data types (e.g. Uint8).

@JeffBezanson (Member)

I suppose alignment is an issue for any implementation of array views.

@timholy (Member) commented Oct 26, 2014

How much better is this than ContiguousViews? (In terms of both speed and memory allocation.)

For run-time error checking of user inputs, it's better to use something other than @assert.

@gummif (Contributor, author) commented Oct 26, 2014

Benchmarks: arrayslice on the latest Julia compared to view on Julia 0.3 (ArrayViews didn't load on the latest).

  • 1.25x speedup for a simple for-loop.
  • 3.00x speedup for scale!
  • constructor speed and allocation very similar
  • ArrayViews does not bounds-check view accesses, e.g. xview[-1] succeeds here, which is usually undesirable
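For contrast with that unchecked behaviour, here is a small Python sketch of a bounds-checked view access (`checked_get` is a hypothetical helper, not part of ArrayViews or this PR) that rejects out-of-range indices such as -1 instead of silently reading past the view:

```python
from array import array

def checked_get(view, i):
    """Hypothetical 1-based, bounds-checked read from a view; rejects
    indices (e.g. 0 or -1) that an unchecked pointer view would accept."""
    if not 1 <= i <= len(view):
        raise IndexError(f"index {i} out of bounds for length {len(view)}")
    return view[i - 1]

x = array('d', [10.0, 20.0, 30.0, 40.0])
v = memoryview(x)[1:3]          # view of elements 2 and 3

assert checked_get(v, 1) == 20.0
try:
    checked_get(v, -1)          # would read out of range if unchecked
    caught = False
except IndexError:
    caught = True
```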

Parameters:

n = 600
tn = 10000
x = randn(n,3)
v = 1.00001
function f1(x,v)
    for j=1:length(x)
        x[j] += v
    end
end

Typical output:

julia> @time xview = view(x,n+1:n*2)
elapsed time: 1.2021e-5 seconds (216 bytes allocated)

julia> @time for i=1:tn
           f1(xview,v)
       end
elapsed time: 0.02136735 seconds (623688 bytes allocated)

julia> @time for i=1:tn
           scale!(v,xview)
       end
elapsed time: 0.024176824 seconds (623688 bytes allocated)

julia> @time xslice = arrayslice(x,n+1,n)
elapsed time: 1.0543e-5 seconds (200 bytes allocated)

julia> @time for i=1:tn
           f1(xslice,v)
       end
elapsed time: 0.017143545 seconds (623688 bytes allocated)

julia> @time for i=1:tn
           scale!(v,xslice)
       end
elapsed time: 0.007889941 seconds (623688 bytes allocated)

Commit: Native method for Array slices using shared memory
@timholy (Member) commented Oct 26, 2014

I wouldn't take the construction timing too seriously unless you do it in a (function) loop. But the rest is probably OK (it might be best to do the looping inside a function, too). Also, boundschecking should come to views in the not too distant future, see #7941.

But these are details; overall, I suspect there is real merit in this: the functionality-to-size ratio for this patch is quite favorable.

@Jutho (Contributor) commented Oct 27, 2014

Where is the allocation coming from in f1 and scale!? Shouldn't these be allocation-free? Did you run this in global scope? I don't currently have ArrayViews installed, nor did I bother to check out this branch, but I wouldn't expect that much allocation unless I missed something.

@timholy (Member) commented Oct 27, 2014

I suspect that if you put the loop inside a function, you won't see it; it's probably just the allocation for the return value. But definitely worth testing.

@andreasnoack (Member)

Indeed. I get

julia> let
       n = 600
       tn = 10^7
       x = randn(n,3)
       v = 1.00001
       @time xview = view(x,n+1:n*2)
       @time for i=1:tn
                  f1(xview,v)
              end
       @time for i=1:tn
                  scale!(v,xview)
              end
       @time xslice = arrayslice(x,n+1,n)
       @time for i=1:tn
                  f1(xslice,v)
              end
       @time for i=1:tn
                  scale!(v,xslice)
              end
       end
elapsed time: 4.84e-7 seconds (80 bytes allocated)
elapsed time: 5.808660996 seconds (0 bytes allocated)
elapsed time: 4.907682724 seconds (0 bytes allocated)
elapsed time: 1.129e-6 seconds (144 bytes allocated)
elapsed time: 3.759845979 seconds (0 bytes allocated)
elapsed time: 1.651122024 seconds (0 bytes allocated)

@Jutho (Contributor) commented Oct 27, 2014

The big difference for scale! is remarkable. Is the arrayslice being scaled by BLAS whereas the view is not?

Anyway, a lightweight construction for array slices could be a useful addition to Base, even if it were just to interface with certain libraries (e.g. it would help with #7907).

@andreasnoack (Member)

Yes, unfortunately that is the difference. Calling BLAS.scal! directly gives quite a different result:

julia> let
       n = 600
       tn = 10^7
       x = randn(n,3)
       v = 1.00001
       @time xview = view(x,n+1:n*2)
       @time for i=1:tn
                  f1(xview,v)
              end
       @time for i=1:tn
                  BLAS.scal!(n,v,xview,1)
              end
       @time xslice = arrayslice(x,n+1,n)
       @time for i=1:tn
                  f1(xslice,v)
              end
       @time for i=1:tn
                  BLAS.scal!(n,v,xslice,1)
              end
       end
elapsed time: 6.71e-7 seconds (80 bytes allocated)
elapsed time: 5.802042786 seconds (0 bytes allocated)
elapsed time: 0.913455843 seconds (0 bytes allocated)
elapsed time: 1.146e-6 seconds (144 bytes allocated)
elapsed time: 3.786127559 seconds (0 bytes allocated)
elapsed time: 0.865561249 seconds (0 bytes allocated)

The scale! functions need a general cleanup. It is a bit unfortunate that we have

julia> @which scale!(1.1, randn(2))
scale!(s::Number,X::AbstractArray{T,N}) at linalg/generic.jl:33

julia> @which scale!(randn(2), 1.1)
scale!{T<:Union(Complex{Float32},Float64,Complex{Float64},Float32)}(X::Array{T<:Union(Complex{Float32},Float64,Complex{Float64},Float32),N},s::T<:Union(Complex{Float32},Float64,Complex{Float64},Float32)) at linalg/dense.jl:11
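The cleanup alluded to here would presumably give one generic entry point that dispatches internally rather than two asymmetric method signatures. A rough Python sketch of that shape (`scale_inplace` is a hypothetical name; a real implementation would dispatch to BLAS scal for contiguous float buffers, which the generic loop below stands in for):

```python
from array import array

def scale_inplace(s, x):
    """Multiply every element of x by s, in place, and return x.
    Single public entry point; a fast path for contiguous BLAS-compatible
    buffers could be selected here (hypothetical in this sketch)."""
    for i in range(len(x)):
        x[i] *= s
    return x

x = array('d', [1.0, 2.0, 3.0])
scale_inplace(10.0, x)
assert list(x) == [10.0, 20.0, 30.0]
```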

@gummif (Contributor, author) commented Oct 27, 2014

OK, thanks for that. Construction with arrayslice allocates a bit more, but it looks like there are considerable gains for non-BLAS functions.

@timholy (Member) commented Oct 27, 2014

At least on my machine, there's an even bigger difference if you use @inbounds in the definition of f1 (I get a 6-fold difference; I was testing with a regular Vector and not the arrayslice patch, but I imagine they perform identically). Presumably this is due to automatic SIMD-ification.

Of course, for an ideal compiler there really shouldn't be any difference between these. So one option is to take this as evidence that other areas need to be improved. Another is to say that, in the short term, we should do this and strive to eventually eliminate the gap.

@simonster (Member)

With #8827, a tweak to annotate loads from immutables as constant (not sure if this is legal in all cases, but it should be legal in this case), and @inbounds, ContiguousViews inexplicably become faster than arrays on my system:

elapsed time: 1.159e-6 seconds (80 bytes allocated)
elapsed time: 1.487991947 seconds (0 bytes allocated)
elapsed time: 1.495554059 seconds (0 bytes allocated)
elapsed time: 2.862e-6 seconds (144 bytes allocated)
elapsed time: 1.815113036 seconds (0 bytes allocated)
elapsed time: 1.832906432 seconds (0 bytes allocated)

@timholy (Member) commented Oct 27, 2014

Dratted slog-through-accumulated-email delay...

Very nice!

@lindahua (Contributor) commented Nov 1, 2014

I am not completely sure whether this is necessary. As @simonster shows, when the compiler does the right thing for immutables, ArrayViews is just as fast, but it is more general.

@gummif (Contributor, author) commented Nov 3, 2014

Right. I guess this is equivalent to ArrayViews + compiler magic.

@gummif closed this Nov 3, 2014