Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

slow SubMatrix copy #3098

Closed
bsxfan opened this issue May 14, 2013 · 10 comments
Closed

slow SubMatrix copy #3098

bsxfan opened this issue May 14, 2013 · 10 comments
Labels
performance Must go faster

Comments

@bsxfan
Copy link

bsxfan commented May 14, 2013

For comparison to copy!, define:

loopcopy!(D,S) = (for i=1:length(D);D[i]=S[i];end;D)

Create regular and strided source and destination matrices:

S = randn(1000,1000); 
Ssub = sub(randn(2000,1000),1:2:2000,1:1000);
D = zeros(1000,1000); 
Dsub = sub(zeros(2000,1000),1:2:2000,1:1000);

Time loopcopy! vs copy!. Timings below are for 32-bit windows, but behaviour on 64-bit linux is similar:

julia> sum([@elapsed copy!(D,S) for i=1:100])
0.452816014
julia> sum([@elapsed loopcopy!(D,S) for i=1:100])
0.4669809420000002

OK, very close, but for SubMatrix source:

julia> sum([@elapsed loopcopy!(D,Ssub) for i=1:100])
1.8858308039999998
julia> sum([@elapsed copy!(D,Ssub) for i=1:100])
10.769275100000002

copy! is much slower. Similarly:

julia> sum([@elapsed loopcopy!(Dsub,Ssub) for i=1:100])
3.675649855999999
julia> sum([@elapsed copy!(Dsub,Ssub) for i=1:100])
12.968587926999998

Counterexample:

julia> sum([@elapsed loopcopy!(Dsub,S) for i=1:100])
2.3974653309999994
julia> sum([@elapsed copy!(Dsub,S) for i=1:100])
2.401044135
@timholy
Copy link
Member

timholy commented May 14, 2013

In the case of arrays, on my 64-bit linux (x86_64) I get slightly better performance from copy! than loopcopy! (0.25s vs 0.31s). But by-and-large I replicate your findings.

The problem seems to come from abstractarray.jl:copy!'s use of the construct for x in src. If I replace that with for j = 1:length(src)... then they work out the same.

I'll also note that one can do vastly better with subarrays:

parent(S::SubArray) = S.parent
parent(A::Array) = A

function fastcopy!{T}(pdest::Matrix{T}, sdest::NTuple{2,Int}, psrc::Matrix{T}, ssrc::NTuple{2,Int}, sz::NTuple{2,Int})
    m = sz[1]
    sdest1 = sdest[1]
    ssrc1 = ssrc[1]
    for j = 0:sz[2]-1
        kdest = j*sdest[2]+1
        ksrc = j*ssrc[2]+1
        for i = 0:m-1
            idest = kdest+i*sdest1
            isrc = ksrc+i*ssrc1
            pdest[idest] = psrc[isrc]
        end
    end
end

function fastcopy!{T}(dest::StridedMatrix{T}, src::StridedMatrix{T})
    @assert size(dest) == size(src)
    sdest = strides(dest)
    ssrc = strides(src)
    pdest = parent(dest)
    psrc = parent(src)
    fastcopy!(pdest, sdest, psrc, ssrc, size(dest))
    dest
end

Test:

julia> sum([@elapsed loopcopy!(D,Ssub) for i=1:100])
3.1146607700000004

julia> sum([@elapsed fastcopy!(D,Ssub) for i=1:100])
0.4422959499999998

Images.jl uses some of these tricks, although I haven't yet profiled to see if they are working as expected.
https://github.com/timholy/Images.jl/blob/master/src/algorithms.jl#L117

@timholy
Copy link
Member

timholy commented May 14, 2013

(Basically, subarrays need the kind of love that I gave array.jl long ago. A variant of make_arrayind_loop_nest might need to be written to handle two arrays.)

@ViralBShah
Copy link
Member

On a slightly orthogonal but not completely unrelated note, there has been the proposal of making array indexing return views. That has benefits of greatly reducing copying and GC, and also removes the need to use subarrays in many cases. Now that we can do stuff like pointer_to_array(pointer(a, m), dims), it should be pretty easy to get high performance array views.

@timholy
Copy link
Member

timholy commented May 14, 2013

Definitely related. I even started working on something like that within the first few days of the merging of immutable types, but got bit by challenges stemming from the need for better inlining. I haven't looked recently.

@toivoh
Copy link
Contributor

toivoh commented May 14, 2013

@ViralBShah: I thought that those views would be subarrays themselves? In my view, producing array views will increase the need to be able to deal with subarrays/strided arrays.

Also, it should be straightforward to get machinery such as what I propose for fast bsxfun in #3100 to create loops like @timholy's fastcopy above, on demand.

@bsxfan
Copy link
Author

bsxfan commented May 15, 2013

Thanks @timholy for your nice fastcopy!() example. That clears up some mysteries for me. But yes, you are right, SubArray could definitely benefit from some more attention. I've just logged two more issues involving subarray. See #3114 and JuliaLang/LinearAlgebra.jl#14.

@poulson
Copy link

poulson commented Jan 27, 2014

Have there been any updates on the creation of submatrix views? I am experimenting with using Julia in a linear algebra course that I'm teaching and would find it very helpful for providing high-performance code samples for factorizations.

@timholy
Copy link
Member

timholy commented Jan 27, 2014

Gimmee a couple more days, and a faster version of copy! should land. I've just been swamped with other things.

Meanwhile @lindahua has submitted the latest proposal for a faster ArrayView (#5556), which I also haven't had time to look at yet but which sounds promising.

@ViralBShah
Copy link
Member

Can we close this with the new SubArrays, or should we wait for the updates to getindex and such?

@timholy
Copy link
Member

timholy commented Nov 23, 2014

We've actually had a fast copy! for months (shortly after Base.Cartesian landed).

@timholy timholy closed this as completed Nov 23, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Projects
None yet
Development

No branches or pull requests

5 participants