
Reducing memory allocation #16

Closed
Jutho opened this issue Dec 17, 2013 · 5 comments

Comments

@Jutho

Jutho commented Dec 17, 2013

I have completed a first pass through the code of this package. I would like to point out that, if speed and efficiency are going to be high on the focus list, excessive memory allocation should be avoided. One example is the orthogonalize routine, which will be the backbone of many solvers and will be called many times:

    elseif method == :ModifiedGramSchmidt || method == :Householder
        # The numerical equivalence of ModifiedGramSchmidt and Householder was
        # established in doi:10.1137/0613015
        cs = zeros(T, p)
        for (i, e) in enumerate(Kk)
            cs[i] = dot(v, e)
            v -= cs[i] * e
        end
    else

I strongly believe the `v -= cs[i] * e` line should call Base.LinAlg.BLAS.axpy! instead. This will of course modify v in place, so orthogonalize should become orthogonalize!, which I think is fine. I would be happy to make this change if everybody agrees.
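A minimal sketch of what such an in-place variant could look like (the name `orthogonalize!` and the return values are assumptions, and it uses the generic `axpy!` from the modern LinearAlgebra stdlib, which dispatches to BLAS for BLAS-compatible element types, rather than the old `Base.LinAlg.BLAS` path):

```julia
using LinearAlgebra

# Hypothetical in-place modified Gram-Schmidt: mutates v via axpy!
# instead of allocating a fresh vector for every `v - cs[i] * e`.
function orthogonalize!(v::AbstractVector{T}, Kk) where {T}
    cs = zeros(T, length(Kk))
    for (i, e) in enumerate(Kk)
        cs[i] = dot(v, e)
        axpy!(-cs[i], e, v)  # v .-= cs[i] .* e, in place, no temporary
    end
    return v, cs
end
```

For a vector already containing a component along the first basis vector, this subtracts that component in place and leaves the orthogonal remainder in `v`.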

On a side note: even though ModifiedGramSchmidt is numerically equivalent to some form of Householder orthogonalization applied to the matrix stacked with zeros on top of it (which is useful for obtaining good error bounds and better insight into how to modify existing algorithms), this doesn't imply that it is equivalent to Householder applied to the original matrix, nor that things like a second reorthogonalization round in MGS are useless. (Although I certainly agree that MGS is sufficient for many algorithms which only use small Krylov subspaces.)

@jiahao
Contributor

jiahao commented Dec 17, 2013

Yes, orthogonalize should eventually become mutating to save memory. And also yes, we should have a proper Householder orthogonalization.

@jiahao
Contributor

jiahao commented Dec 17, 2013

`v -= cs[i] * e` line should call Base.LinAlg.BLAS.axpy!

This should certainly be the case for BlasFloat eltypes, so we should write a specialized method for BlasFloats.

@stevengj
Member

You could certainly write a fallback axpy! that works in-place on arbitrary abstract arrays. In fact, that fallback should probably already be in Base.
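One way the generic fallback plus BlasFloat specialization could look (a sketch under assumed names, not the actual Base definitions; the dispatch split mirrors the suggestion above):

```julia
using LinearAlgebra

# Generic in-place fallback: works on any abstract arrays with
# compatible indices, no BLAS requirement on the element type.
function my_axpy!(a, x::AbstractArray, y::AbstractArray)
    @inbounds for i in eachindex(x, y)
        y[i] += a * x[i]
    end
    return y
end

# Specialization that forwards to the optimized BLAS routine when the
# element type is one BLAS actually supports (Float32/64, Complex{F32/64}).
my_axpy!(a::Number, x::StridedVector{T}, y::StridedVector{T}) where {T<:LinearAlgebra.BlasFloat} =
    BLAS.axpy!(T(a), x, y)
```

With `Float64` strided vectors the second method wins dispatch and calls into BLAS; with, say, `Int` or `BigFloat` arrays the generic loop is used instead.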

@timholy
Member

timholy commented Dec 21, 2013

My recent refactoring of the API to perform mutating updates on an initial guess was motivated substantially by this concern. That patch, while large, didn't begin to touch the internal temporaries. But at least the way is clearer now.

I suspect we're not far from the day where b could be a SharedArray and A a SparseMatrixCSC using SharedArray storage (SparseMatrixCSC will need some generalization to support this). Then we could parallelize our matrix multiplies, if both inputs and the output are SharedArrays. Since I expect that allocating a SharedArray will be more expensive than allocating a regular Array, it will become even more important to reduce re-allocation of temporaries inside the algorithms. That means more usage of mutating operations.

@haampie
Member

haampie commented Dec 6, 2017

Orthogonalization no longer copies, so we can close this issue. See src/orthogonalize.jl.

@haampie haampie closed this as completed Dec 6, 2017