
RFC: Add native Array slices/views with shared data #8809

Closed
wants to merge 1 commit

Conversation

@gummif (Contributor) commented Oct 25, 2014

This PR adds a method arrayslice(A, pos, dims) that returns a slice of type Array and is memory safe. Slices cover contiguous parts of A's memory and can have any dimensionality. The advantage of this special case over ArrayViews/SubArrays is performance and compatibility with any code written for Array.

The C function jl_slice_owned_array is actually a generalization of jl_reshape_array, so there is a question of whether the code duplication is acceptable, or whether the reshape function should call the slice function.
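The sharing semantics can be sketched outside Julia with Python's stdlib buffer protocol. This is only an analogy, not the PR's implementation: `array_slice` below is a hypothetical counterpart of the proposed `arrayslice`, taking a 1-based `pos` and a length, and returning a writable view that aliases the parent's memory rather than copying it.

```python
from array import array

def array_slice(a, pos, n):
    """Hypothetical analogue of arrayslice(A, pos, n): return a writable
    view of n elements of `a` starting at 1-based index `pos`, sharing
    memory with `a` (no copy is made)."""
    if pos < 1 or pos + n - 1 > len(a):
        raise IndexError("slice exceeds parent array bounds")
    return memoryview(a)[pos - 1 : pos - 1 + n]

x = array('d', [float(i) for i in range(15)])  # parent buffer: 0.0 .. 14.0
y = array_slice(x, 7, 9)   # elements 7..15 of x, shared memory
y[0] = -1.0                # write through the view...
assert x[6] == -1.0        # ...and the parent sees the change
```

Writes through the view mutate the parent, which is exactly the behaviour the `copy!(y, I)` example below demonstrates for the real `arrayslice`.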

Memory check (same as with reshape):

n=int(1e8)
x=rand(n);
gc() # nothing happens
y=arrayslice(x,101,20)
gc() # nothing happens
x=1
gc() # nothing happens
y=1
gc() # memory usage reduces
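The keep-alive behaviour checked above (the parent's memory survives as long as any slice references it, and is reclaimed only once both are gone) can be mimicked with Python's buffer protocol. A small sketch, relying on CPython reference counting rather than Julia's GC:

```python
from array import array

x = array('d', [0.0] * 1000)
y = memoryview(x)[100:120]   # roughly analogous to arrayslice(x, 101, 20)

del x                # drop the only name bound to the parent...
first = y[0]         # ...but the view keeps the buffer alive and readable
y.release()          # once the view is released, the buffer can be freed
```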

Code example:

julia> x=rand(3,5)
3x5 Array{Float64,2}:
 0.839011   0.954     0.185297  0.236349  0.0152856
 0.0864201  0.561014  0.215385  0.219099  0.948724 
 0.708425   0.489022  0.20037   0.907779  0.399409 

julia> y=arrayslice(x,7,(3,3))
3x3 Array{Float64,2}:
 0.185297  0.236349  0.0152856
 0.215385  0.219099  0.948724 
 0.20037   0.907779  0.399409 

julia> I=eye(3);

julia> copy!(y,I);

julia> x
3x5 Array{Float64,2}:
 0.839011   0.954     1.0  0.0  0.0
 0.0864201  0.561014  0.0  1.0  0.0
 0.708425   0.489022  0.0  0.0  1.0

julia> y=arrayslice(x,1,3)
3-element Array{Float64,1}:
 0.839011
 0.0864201
 0.708425

julia> scale!(100,y);

julia> x
3x5 Array{Float64,2}:
 83.9011   0.954     1.0  0.0  0.0
  8.64201  0.561014  0.0  1.0  0.0
 70.8425   0.489022  0.0  0.0  1.0

See also #8795

@gummif (Contributor, author) commented Oct 25, 2014

My only concern is that a->isaligned = data->isaligned might not be correct for all data types (e.g. Uint8).

@JeffBezanson (Member)

I suppose alignment is an issue for any implementation of array views.

@timholy (Member) commented Oct 26, 2014

How much better is this than ContiguousViews? (In terms of both speed and memory allocation.)

For run-time error checking of user inputs, it's better to use something other than @assert.

@gummif (Contributor, author) commented Oct 26, 2014

Benchmarks: arrayslice on the latest Julia compared to view on Julia 0.3 (ArrayViews didn't load on the latest).

  • 1.25x speedup for a simple for-loop.
  • 3.00x speedup for scale!
  • constructor speed and allocation very similar
  • ArrayViews does not bounds-check view accesses, e.g. xview[-1] succeeds here, which is usually undesirable
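For contrast with that unchecked behaviour, here is a small Python sketch of a bounds-checked view access (`checked_get` is a hypothetical helper, not part of ArrayViews or this PR) that rejects out-of-range indices such as -1 instead of silently reading past the view:

```python
from array import array

def checked_get(view, i):
    """Hypothetical 1-based, bounds-checked read from a view; rejects
    indices (e.g. 0 or -1) that an unchecked pointer view would accept."""
    if not 1 <= i <= len(view):
        raise IndexError(f"index {i} out of bounds for length {len(view)}")
    return view[i - 1]

x = array('d', [10.0, 20.0, 30.0, 40.0])
v = memoryview(x)[1:3]          # view of elements 2 and 3

assert checked_get(v, 1) == 20.0
try:
    checked_get(v, -1)          # would read out of range if unchecked
    caught = False
except IndexError:
    caught = True
```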

Parameters:

n = 600
tn = 10000
x = randn(n,3)
v = 1.00001
function f1(x,v)
    for j=1:length(x)
        x[j] += v
    end
end

Typical output:

julia> @time xview = view(x,n+1:n*2)
elapsed time: 1.2021e-5 seconds (216 bytes allocated)

julia> @time for i=1:tn
           f1(xview,v)
       end
elapsed time: 0.02136735 seconds (623688 bytes allocated)

julia> @time for i=1:tn
           scale!(v,xview)
       end
elapsed time: 0.024176824 seconds (623688 bytes allocated)

julia> @time xslice = arrayslice(x,n+1,n)
elapsed time: 1.0543e-5 seconds (200 bytes allocated)

julia> @time for i=1:tn
           f1(xslice,v)
       end
elapsed time: 0.017143545 seconds (623688 bytes allocated)

julia> @time for i=1:tn
           scale!(v,xslice)
       end
elapsed time: 0.007889941 seconds (623688 bytes allocated)

Commit: Native method for Array slices using shared memory
@timholy (Member) commented Oct 26, 2014

I wouldn't take the construction timing too seriously unless you do it in a (function) loop. But the rest is probably OK (it might be best to do the looping inside a function, too). Also, boundschecking should come to views in the not too distant future, see #7941.

But these are details; overall, I suspect there is real merit in this: the functionality-to-size ratio for this patch is quite favorable.

@Jutho (Contributor) commented Oct 27, 2014

Where is the allocation coming from in f1 and scale!? Shouldn't these be allocation-free? Did you run this in global scope? I don't currently have ArrayViews installed, nor did I bother to check out this branch, but I wouldn't expect that much allocation unless I missed something.

@timholy (Member) commented Oct 27, 2014

I suspect that if you put the loop inside a function, you won't see it; it's probably just the allocation for the return value. But definitely worth testing.

@andreasnoack (Member)

Indeed. I get

julia> let
       n = 600
       tn = 10^7
       x = randn(n,3)
       v = 1.00001
       @time xview = view(x,n+1:n*2)
       @time for i=1:tn
                  f1(xview,v)
              end
       @time for i=1:tn
                  scale!(v,xview)
              end
       @time xslice = arrayslice(x,n+1,n)
       @time for i=1:tn
                  f1(xslice,v)
              end
       @time for i=1:tn
                  scale!(v,xslice)
              end
       end
elapsed time: 4.84e-7 seconds (80 bytes allocated)
elapsed time: 5.808660996 seconds (0 bytes allocated)
elapsed time: 4.907682724 seconds (0 bytes allocated)
elapsed time: 1.129e-6 seconds (144 bytes allocated)
elapsed time: 3.759845979 seconds (0 bytes allocated)
elapsed time: 1.651122024 seconds (0 bytes allocated)

@Jutho (Contributor) commented Oct 27, 2014

The big difference for scale! is remarkable. Is the arrayslice being scaled by BLAS whereas the view is not?

Anyway, a lightweight construction for array slices could be a useful addition to Base, even if it were just to interface with certain libraries (e.g. it would help with #7907).

@andreasnoack (Member)

Yes, unfortunately that is the difference. Calling BLAS.scal! directly gives quite a different result:

julia> let
       n = 600
       tn = 10^7
       x = randn(n,3)
       v = 1.00001
       @time xview = view(x,n+1:n*2)
       @time for i=1:tn
                  f1(xview,v)
              end
       @time for i=1:tn
                  BLAS.scal!(n,v,xview,1)
              end
       @time xslice = arrayslice(x,n+1,n)
       @time for i=1:tn
                  f1(xslice,v)
              end
       @time for i=1:tn
                  BLAS.scal!(n,v,xslice,1)
              end
       end
elapsed time: 6.71e-7 seconds (80 bytes allocated)
elapsed time: 5.802042786 seconds (0 bytes allocated)
elapsed time: 0.913455843 seconds (0 bytes allocated)
elapsed time: 1.146e-6 seconds (144 bytes allocated)
elapsed time: 3.786127559 seconds (0 bytes allocated)
elapsed time: 0.865561249 seconds (0 bytes allocated)

The scale! functions need a general cleanup. It is a bit unfortunate that we have

julia> @which scale!(1.1, randn(2))
scale!(s::Number,X::AbstractArray{T,N}) at linalg/generic.jl:33

julia> @which scale!(randn(2), 1.1)
scale!{T<:Union(Complex{Float32},Float64,Complex{Float64},Float32)}(X::Array{T<:Union(Complex{Float32},Float64,Complex{Float64},Float32),N},s::T<:Union(Complex{Float32},Float64,Complex{Float64},Float32)) at linalg/dense.jl:11
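The cleanup alluded to here would presumably give one generic entry point that dispatches internally rather than two asymmetric method signatures. A rough Python sketch of that shape (`scale_inplace` is a hypothetical name; a real implementation would dispatch to BLAS scal for contiguous float buffers, which the generic loop below stands in for):

```python
from array import array

def scale_inplace(s, x):
    """Multiply every element of x by s, in place, and return x.
    Single public entry point; a fast path for contiguous BLAS-compatible
    buffers could be selected here (hypothetical in this sketch)."""
    for i in range(len(x)):
        x[i] *= s
    return x

x = array('d', [1.0, 2.0, 3.0])
scale_inplace(10.0, x)
assert list(x) == [10.0, 20.0, 30.0]
```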

@gummif (Contributor, author) commented Oct 27, 2014

OK, thanks for that. Construction with arrayslice allocates a bit more, but it looks like there are considerable gains for non-BLAS functions.

@timholy (Member) commented Oct 27, 2014

At least on my machine, there's an even bigger difference if you use @inbounds in the definition of f1 (I get a 6-fold difference; I was testing with a regular Vector and not the arrayslice patch, but I imagine they perform identically). Presumably this is due to automatic SIMD-ification.

Of course, for an ideal compiler there really shouldn't be any difference between these. So one option is to take this as evidence that other areas need to be improved. Another is to say that, in the short term, we should do this and strive to eventually eliminate the gap.

@simonster (Member)

With #8827, a tweak to annotate loads from immutables as constant (not sure if this is legal in all cases, but it should be legal in this case), and @inbounds, ContiguousViews inexplicably become faster than arrays on my system:

elapsed time: 1.159e-6 seconds (80 bytes allocated)
elapsed time: 1.487991947 seconds (0 bytes allocated)
elapsed time: 1.495554059 seconds (0 bytes allocated)
elapsed time: 2.862e-6 seconds (144 bytes allocated)
elapsed time: 1.815113036 seconds (0 bytes allocated)
elapsed time: 1.832906432 seconds (0 bytes allocated)

@timholy (Member) commented Oct 27, 2014

Dratted slog-through-accumulated-email delay...

Very nice!

@lindahua (Contributor) commented Nov 1, 2014

I am not completely sure whether this is necessary. As @simonster shows, when the compiler does the right thing for immutables, ArrayViews is just as fast, but it is more general.

@gummif (Contributor, author) commented Nov 3, 2014

Right. I guess this is equivalent to ArrayViews + compiler magic.

@gummif closed this Nov 3, 2014