Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

setindex #75

Open
michaellindon opened this issue Feb 22, 2016 · 25 comments
Open

setindex #75

michaellindon opened this issue Feb 22, 2016 · 25 comments

Comments

@michaellindon
Copy link

I have an application where I'm doing an enormous amount of computation on 3x3 and 4x4 matrices, multiplications, transposes, choleskys, determinants etc. My implementation in Julia is very vanilla and I guess that all of these operations are being sent off to BLAS or LAPACK which has quite a large overhead for such small matrices. I considered rewriting these computationally expensive functions in Eigen because this has a small vectorized fixed size type for matrices and vectors. I was linked to this julia package from my question on stackoverflow. It looks exactly what I'm looking for, except for cholesky or ldl factorization, but I can't edit the elements of the vector or matrix once it has been created (they are immutable?). I'm hoping to allocate memory for an array of fixed size matrices or vectors and then change them at each iteration. It looks like setindex is on the to do list. Is there any other way to change the elements without setindex? How hard would it be to implement, or are there any other packages which are suitable for my application. Thanks!

Context, I'm implementing a forward filtering backward sampling algorithm and my chain is very long, and this happens at every iteration of an mcmc algorithm. The dimensionality of my state space is either 3 or 4, meaning that for each time point in my chain I have to do the linear algebra to generate a 4 dimensional random vector from a conditional normal distribtion.

@SimonDanisch
Copy link
Owner

Hi, looks like this package could indeed help you! If they're in an array, it should be quite a bit easier to modify the values.
@c42f, what would be needed to adapt your slicing and DestructuredArray code to work for this?

@mschauer
Copy link
Collaborator

Ah, great! I plan to add the Cholesky factorization, I am also using FixedSizedArrays for filtering and MCMC in the widest sense. I could also add the randn()-functor from https://github.com/mschauer/Bridge.jl/blob/master/src/fsa.jl, I guess you have use for that too

@c42f
Copy link
Collaborator

c42f commented Feb 23, 2016

@michaellindon the destructure() function may already do what you want (apologies for the lack of documentation visible on the web. There is some docs available in the repl.). Demo:

julia> using FixedSizeArrays

help?> destructure
search: destructure

  Destructure the elements of an array A to create M=ndims(eltype(A))
  additional dimensions prepended to the dimensions of A.  The returned
  array is a view onto the original elements; additional dimensions occur
  first for consistency with the natural memory ordering.

  For example, AbstractArray{F<:FixedArray{T,M,SIZE},N} appears as an
  AbstractArray{T,M+N} after destructuring.

julia> v = [Vec(i,-i,1) for i=1:10]
10-element Array{FixedSizeArrays.Vec{3,Int64},1}:
 Vec(1,-1,1)
 Vec(2,-2,1)
 Vec(3,-3,1)
 Vec(4,-4,1)
 Vec(5,-5,1)
 Vec(6,-6,1)
 Vec(7,-7,1)
 Vec(8,-8,1)
 Vec(9,-9,1)
 Vec(10,-10,1)

julia> destructure(v)
3x10 Array{Int64,2}:
  1   2   3   4   5   6   7   8   9   10
 -1  -2  -3  -4  -5  -6  -7  -8  -9  -10
  1   1   1   1   1   1   1   1   1    1

julia> @fslice v[1:2,:]
2x10 Array{Int64,2}:
  1   2   3   4   5   6   7   8   9   10
 -1  -2  -3  -4  -5  -6  -7  -8  -9  -10

When writing the DestructuredArray view type for destructuring of non-dense arrays, I found that setindex() doesn't currently optimize very well, and it wasn't clear how it can be improved prior to JuliaLang/julia#11902 being worked out. (Having said that, it is implemented.). The destructure() and @fslice tools above currently use reinterpret() where possible to avoid this performance pitfall.

@michaellindon
Copy link
Author

I'm not sure its exactly what I am looking for. I have n 2 vectors and n 2x2 matrices. At the moment I am storing them as Arrays like:

m=Array{FixedSizeArrays.Vec{2,Float64},1}()
M=Array{FixedSizeArrays.Mat{2,2,Float64},1}()
x=Array{FixedSizeArrays.Vec{2,Float64},1}()

Note if I specify the size, then these guys get initialized and I cannot change their values. Nor can I create an empty matrix and use push!(x,Mat(...)) because the algorithm requires I start at the end of x and go backwards, x[n-1] is computed from x[n].

I did away with the Arrays and replaced them with Dicts, so I have x[n-1]=f(x[n]) for f some linear function. Under profiling my code is spending most of its time in constructor.jl, which is frustrating, because the only way I can change these elements is an assignment from some Vec((a,b)),Mat((a,b),(c,d)) type thing as opposed to setting the indices of x[n-1].

In addition, x,m,M will change at every iteration, profiling my code yields memory allocation of the order of gigabytes. Overall my code is taking 7x as long using FixedSizeArrays because I cannot preallocate stuff in advance.

@michaellindon
Copy link
Author

@c42f it seems I missed your point about fslice, but does it work for matrices?

v = [Mat(rand(2,2)) for i=1:10]
julia> @fslice v[1:2,1:2,:]
ERROR: MethodError: `size` has no method matching size(::Type{Any})
Closest candidates are:
  size(::Any, ::Integer, ::Integer, ::Integer...)
 in destructure at /home/grad/msl33/.julia/v0.4/FixedSizeArrays/src/destructure.jl:56

@c42f
Copy link
Collaborator

c42f commented Feb 25, 2016

Your problem is that the list comprehension has failed type inference, so v is an array of Any. Eventually I'd like to make destructure() into a more general tool (eventually probably in a separate package DestructuredArrays.jl) and make it work for arrays of all types, including Any, but I haven't had time to do this yet.

For now it's easy enough to work around:

v = Mat{2,2,Float64}[Mat(rand(2,2)) for i=1:10]
@fslice v[1:2,1:2,:]

@c42f
Copy link
Collaborator

c42f commented Feb 25, 2016

By the way, you can definitely change the values inside your m=Array{Vec{2,Float64},1}(), you just need to replace whole elements at once rather than individual components:

m = Array{Vec{2,Float64},1}()
for i=1:10
    push!(m, Vec(1.0,i))
end
for j=1:10
    m[j] = Vec(2.0,j)
end

@michaellindon
Copy link
Author

@c42f Thanks!! I've been able to implement it now as I had hoped, preallocating everything in advance, but sadly it still runs 1 second slower than using native linear algebra operations :/ Profile indicates the most burdensome lines are:

                AMAQ[i-1]=A[i-1]*M[i-1]*A[i-1]'+ρ²*Q                                                                                                                                                               
                m[i]=A[i-1]*m[i-1]+Vec(AMAQ[i-1][1:d,1])*(y[i]-μ-dot(Vec(A[i-1][1,1:d]),m[i-1]))/(σ²+AMAQ[i-1][1,1])                                                                                               
                M[i]=AMAQ[i-1]-(Mat(AMAQ[i-1][1:end,1])*Mat(AMAQ[i-1][1,1:end])')/(σ²+AMAQ[i-1][1,1])    

and

                Σ=M[i]-M[i]*A[i]'*inv(AMAQ[i])*A[i]*M[i]                                                                                                                                                           
                Σ=0.5*(Σ+Σ')                                                                                                                                                                                       
                x[i]=m[i]+M[i]*A[i]'*\(AMAQ[i],x[i+1]-A[i]*m[i])+chol(Σ+Mat(0.0000000001*eye(d)))'*Vec(rand(Normal(0,1),d))       

I feel like wrapping things in Vec and Mat, especially for the submatrix views, may be slow. Is there anything that jumps out at you for why this is slow? All my matrices are 2x2

@michaellindon
Copy link
Author

Native Julia:

function FFBS(y,x,t,m,M,A,AMAQ,μ,σ²,ł,ρ²)
    n=length(y)
    resize!(m,n)
    resize!(M,n)
    resize!(x,n)
    Cₛ=statcorr(ł)
    m[1]=reshape(ρ²*Cₛ[:,1]*(y[1]-μ)/(σ²+ρ²*Cₛ[1,1]),d,1)
    M[1]=ρ²*Cₛ-ρ²*Cₛ[:,1]*Cₛ[1,:]*ρ²/(σ²+ρ²*Cₛ[1,1])
    resize!(AMAQ,n-1)
    resize!(A,n-1)
    Q=Array{Float64}(d,d)
    for i=2:n
        Δ=t[i]-t[i-1]
        if(isdefined(A,i-1))
            transition!(A[i-1],Δ,ł)
        else
            A[i-1]=transition(Δ,ł)
        end
        innovation!(Q,Δ,ł)
        broadcast!(+,AMAQ[i-1],A[i-1]*(M[i-1]*A[i-1]'),ρ²*Q)
        BLAS.gemv!('N', 1.0, A[i-1], vec(m[i-1]), 0.0, vec(m[i]))
        BLAS.axpy!(length(m[i]),1.0,sub(AMAQ[i-1],:,1)*(y[i]-μ-sub(A[i-1],1,:)*m[i-1])/(σ²+AMAQ[i-1][1,1]),1,m[i],1);
        @inbounds M[i][:,:]=AMAQ[i-1]
        BLAS.ger!(-1.0/(σ²+AMAQ[i-1][1,1]), sub(AMAQ[i-1],:,1), sub(AMAQ[i-1],:,1), M[i]);
    end
    #Backward Sampling
    Σ=M[n]+eps()*eye(d)
    Σ=cholfact!(Σ,:L,Val{true})
    @inbounds x[n][:,:]=m[n]+Σ[:P]*Σ[:L]*rand(Normal(0,1),d)
    for i=(n-1):-1:1
        Σ=M[i]-M[i]*A[i]'*\(AMAQ[i],A[i]*M[i])+eps()*eye(d)
        Σ=cholfact!(Σ,:L,Val{true})
        @inbounds x[i][:,:]=m[i]+M[i]*(A[i]'*\(AMAQ[i],x[i+1]-A[i]*m[i]))+Σ[:P]*Σ[:L]*rand(Normal(0,1),d)
    end
end

Timing:

julia> @time FFBS(y,x,t,m,M,A,AMAQ,μ₀,σ²,10,ρ²₀);
  1.900887 seconds (10.33 M allocations: 519.606 MB, 9.82% gc time)

julia> @time FFBS(y,x,t,m,M,A,AMAQ,μ₀,σ²,10,ρ²₀);
  2.647288 seconds (10.33 M allocations: 519.606 MB, 8.16% gc time)

FixedSizeArrays

function FFBS(y,x,t,m,M,A,AMAQ,μ,σ²,ł,ρ²)
    n=length(y)
    resize!(m,n)
    resize!(M,n)
    resize!(x,n)
    resize!(AMAQ,n-1)
    resize!(A,n-1)
    Cₛ=statcorr(ł)
    m[1]=Vec(ρ²*Cₛ[:,1]*(y[1]-μ)/(σ²+ρ²*Cₛ[1,1]))
    M[1]=Mat(ρ²*Cₛ-ρ²*Cₛ[:,1]*Cₛ[1,:]*ρ²/(σ²+ρ²*Cₛ[1,1]))
    for i=2:n
        Δ=t[i]-t[i-1]
        @inbounds A[i-1]=transition(Δ,ł)
        Q=innovation(Δ,ł)
        @inbounds AMAQ[i-1]=A[i-1]*M[i-1]*A[i-1]'+ρ²*Q
        @inbounds m[i]=A[i-1]*m[i-1]+Vec(AMAQ[i-1][1:d,1])*(y[i]-μ-dot(Vec(A[i-1][1,1:d]),m[i-1]))/(σ²+AMAQ[i-1][1,1])
        @inbounds M[i]=AMAQ[i-1]-(Mat(AMAQ[i-1][1:end,1])*Mat(AMAQ[i-1][1,1:end])')/(σ²+AMAQ[i-1][1,1])
    end

    Σ=M[n]
    Σ=0.5*(Σ+Σ')
    x[n]=m[n]+chol(Σ)'*Vec(rand(Normal(0,1),d))
    for i=(n-1):-1:1
        Σ=M[i]-M[i]*A[i]'*inv(AMAQ[i])*A[i]*M[i]
        Σ=0.5*(Σ+Σ')
        @inbounds x[i]=m[i]+M[i]*A[i]'*\(AMAQ[i],x[i+1]-A[i]*m[i])+chol(Σ+Mat(0.0000000001*eye(d)))'*Vec(rand(Normal(0,1),d))
    end
end

Timing:

julia> @time FFBS(y,x,t,m,M,A,AMAQ,μ₀,σ²,10,ρ²₀);
  5.032666 seconds (24.97 M allocations: 786.711 MB, 12.78% gc time)

julia> @time FFBS(y,x,t,m,M,A,AMAQ,μ₀,σ²,10,ρ²₀);
  5.010241 seconds (24.97 M allocations: 786.711 MB, 14.79% gc time)

@c42f
Copy link
Collaborator

c42f commented Feb 25, 2016

That's rather a lot of symbols to digest, and it's a bit hard to diagnose without any type information for the inputs to FFBS. To focus our attention, could we reduce the test case to just the second loop, and get a fully runnable snippet (including allocation of representative inputs - could be randomly generated or whatever)?

To give you some pointers, consider the expression

x[i]=m[i]+M[i]*A[i]'*\(AMAQ[i],x[i+1]-A[i]*m[i])+chol+Mat(0.0000000001*eye(d)))'*Vec(rand(Normal(0,1),d))
  • What's d? It doesn't seem to be a function input parameter. eye(d) presumably creates a 2x2 dense matrix and then turns it into a fixed size array. I guess this might be replaced with eye(Mat{d,d,Float64}), you should make sure d is known at function compile time.
  • chol() is currently implemented in terms of Base.chol() for dimension greater than 2, I guess that's ok for you if d=2?
  • rand() probably allocates a Base.Vector which is then turned into a Vec{2} - you'll need to avoid this to make things go fast
  • I'm not sure what \ does for FSAs, would be worth checking that it's going through a fast path and not allocating

@mschauer
Copy link
Collaborator

I can answer some of this from memory:

  • just use (Σ+0.0000000001*I)
  • rand(Vec{d,Float64}) is native, see src/constructors.jl#L142
  • \ uses native inversion

@c42f
Copy link
Collaborator

c42f commented Feb 25, 2016

Ah, now how did I not know about I, that's very nice! However... UniformScaling doesn't seem to interoperate with Mat yet. Ah you added it just the other day... so @michaellindon will need to clone the latest - there's no tagged version with that yet. The same goes for \.

For rand, I believe @michaellindon is using Distributions, and FSAs currently have no way to deal with this. The following roughly works:

immutable RandFunctor2{T} ; dist::T ; end
@inline call{T}(rf::RandFunctor2{T}, i...) = rand(rf.dist)
Base.rand{F<:FixedArray}(d::Distribution, ::Type{F}) = map(RandFunctor2(d), F)

# After which you can call
rand(Normal(0,1), Vec{2,Float64})

I should do a PR with something like the above, but FSAs probably shouldn't depend on Distributions.jl, and the existing RandFunctor doesn't make sense for use with Distributions.Distribution at the moment.

@mschauer
Copy link
Collaborator

Ah, yes. For now I would rather add randn(Vec{d,Float64}) from https://github.com/mschauer/Bridge.jl/blob/master/src/fsa.jl

immutable RandnFunctor{T} <: FixedSizeArrays.Func{1} end
@inline call{T}(rf::Type{RandnFunctor{T}}, i...) = randn(T)
@inline randn{FSA <: FixedArray}(x::Type{FSA}) = map(RandnFunctor{eltype(FSA)}, FSA)

@mschauer
Copy link
Collaborator

@michaellindon You see, it is work in progress, but your "trying the ice to see if it holds" is very helpful for us.

@timholy
Copy link
Contributor

timholy commented Feb 26, 2016

@michaellindon,

Note if I specify the size, then these guys get initialized and I cannot change their values.

Not true:

julia> using FixedSizeArrays

julia> a = Vec{2,Float64}[rand(2), rand(2)]
2-element Array{FixedSizeArrays.Vec{2,Float64},1}:
 Vec(0.4211134132103669,0.015044855516675781)
 Vec(0.6822470581398004,0.6019663948045095)  

julia> a[1] = Vec(rand(2))
FixedSizeArrays.Vec{2,Float64}((0.2785146610286493,0.9090981367087467))

julia> a
2-element Array{FixedSizeArrays.Vec{2,Float64},1}:
 Vec(0.2785146610286493,0.9090981367087467)
 Vec(0.6822470581398004,0.6019663948045095)

For the bigger question, I fear your code will be pretty hard for anyone but you to debug: it's complicated, it's not complete, and we don't know the types of the inputs you're using. But generally speaking, @profile and @code_warntype are your friends here.

@michaellindon
Copy link
Author

Hi Everyone, thanks for your comments. I profiled my code and it is spending the majority of time in constructors.jl:
screenshot from 2016-02-26 12 55 50

The other blocks on that line are ctranspose, line 100 of ops.jl. This block occurs 4 times. The others are convert line 9 from indexing.jl

In short the most expensive operations are:
_fill_tuples
ctranspose
convert

As for the time spent constructing, at every iteration transition(Δ,ł) and innovation(Δ,ł) returns a 3x3 Mat, for example:

                return Mat(((q*(3+em2Δλ*(-3-2*Δ*λ*(3+Δ*λ*(3+Δ*λ*(2+Δ*λ))))))/(16*λ^5),1/8*em2Δλ*q*Δ^4,-((em2Δλ*q*(-1+e2Δλ+2*Δ*λ*(-1+Δ*λ*(-1+Δ*λ*(-2+Δ*λ)))))/(16*λ^3))),                                           
                (1/8*em2Δλ*q*Δ^4,(q*(1+em2Δλ*(-1-2*Δ*λ*(1+Δ*λ*(-1+Δ*λ)^2))))/(16*λ^3),1/8*em2Δλ*q*Δ^2*(-2+Δ*λ)^2),                                                                                                 
                (-((em2Δλ*q*(-1+e2Δλ+2*Δ*λ*(-1+Δ*λ*(-1+Δ*λ*(-2+Δ*λ)))))/(16*λ^3)),1/8*em2Δλ*q*Δ^2*(-2+Δ*λ)^2,(q*(3+em2Δλ*(-3-2*Δ*λ*(-5+Δ*λ*(11+Δ*λ*(-6+Δ*λ))))))/(16*λ))) 

ctranspose I cannot explain. I would have hoped that A_B' is as fast as A_B like in BLAS

convert
I'm guessing this comes from wrapping things in Vec() and Mat(), but this seems unavoidable when looking at submatrix views like (Mat(AMAQ[i-1][1:end,1]).

@mschauer Can you push your randn(Vec{d,Float64}) function to the repo?

@michaellindon
Copy link
Author

@timholy I just noticed something from your post, you specify the return type. I just tried the following:


julia> @time for i=1:1000000 A=Mat(rand(2,2)) end
 11.588897 seconds (56.00 M allocations: 1.714 GB, 6.98% gc time)

julia> @time for i=1:1000000 A=Mat{2,2,Float64}(rand(2,2)) end
  0.166086 seconds (3.00 M allocations: 167.847 MB, 35.28% gc time)

Never realized that this could have such a huge impact

@SimonDanisch
Copy link
Owner

constructors line 52 seems to be the one for Vec(::Vector)...Ideally, you wouldn't construct Vec's from arrays at all!
Also, there is a rand(Mat{2,2,Float64}) which should be way faster...

@SimonDanisch
Copy link
Owner

And the conversion code doesn't seem to be optimal as well ;) I could try to make it faster

@SimonDanisch
Copy link
Owner

supplying the type of the Mat helps because the dimensionality can then be fixed at compile time. Otherwise, it would depend on the dimension of the array.

@SimonDanisch
Copy link
Owner

Okay nevermind, the typed code path is fairly optimal :D
since @inbounds doesn't seem to change much, we could leave away the size check to gain a 1.2 performance increase...

@timholy
Copy link
Contributor

timholy commented Feb 26, 2016

@SimonDanisch, if you got rid of the size check then you could construct a Mat{M,N} from any matrix with size bigger than (M,N). I'm not sure that's a good thing (though it is consistent with OpenGL's conversions of, e.g., vec4->vec3).

@michaellindon, with that much red at the top of your profile, you have big bad type-stability problems. Specifying the sizes of your Mat is surely a step in the right direction, but you should look again and see if there are any others. Try @code_warntype and look for variables that are marked in red.

@mschauer
Copy link
Collaborator

I make a PR this weekend if c42f does not beat me to it.

@SimonDanisch
Copy link
Owner

@timholy I'm not sure either... It's super convenient in OpenGL ;) The best reason for it would be, that it is a valid use case and it will be hard to offer this use case in a concise way otherwise.
Obviously, it's bad for bug hunting ;)

@mschauer mschauer mentioned this issue Feb 29, 2016
@c42f
Copy link
Collaborator

c42f commented Mar 10, 2016

@mschauer thanks for making that PR with randn, I've been away on holiday for the last couple of weeks or so.

@SimonDanisch I agree with Tim regarding size checks. IMO GLSL conversion semantics only make sense in the context of the specific things you're likely to do in a shading language (and closely related graphics tasks on the CPU side). For generality I think FixedSizeArrays should be aiming for API compatibility with other dense arrays as much as possible which would suggest explicit slicing for changing shape. (But until julia has some combination of constant propagation and inlining mixed into type inference that's not going to work well - ref #17.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants