
Avoiding repeated allocations in mul! for CompositeMap #90

Closed
JeffFessler opened this issue Jun 16, 2020 · 9 comments · Fixed by #92

Comments

@JeffFessler
Member

Replacing y=A*x with mul!(y, A, x) gives the (perhaps naive) user (e.g., me) the sense that no memory allocations will need to be made, and often that is the case.
But if A is a CompositeMap, e.g., A = B * C, then internally A*x = B*(C*x) and the current implementation will allocate a temporary variable for the product C*x every time we do A*x or the mul! version. This overhead adds up when iterating.
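The effect is easy to see with plain matrices standing in for the maps (a minimal sketch; `apply!` is a hypothetical helper mimicking what `CompositeMap`'s `mul!` does internally, not part of LinearMaps.jl):

```julia
using LinearAlgebra: mul!

B, C = rand(4, 4), rand(4, 4)
x, y = rand(4), zeros(4)

# Mimics CompositeMap's mul!: the inner product C*x allocates a fresh
# temporary on every call, even though the output y is preallocated.
apply!(y, B, C, x) = mul!(y, B, C * x)

apply!(y, B, C, x)  # warm up to exclude compilation allocations
@assert @allocated(apply!(y, B, C, x)) > 0
```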

We know the size of C at the time we make the CompositeMap, so in principle the LM package could allocate a work space for it and store it as part of the struct, so that mul! operations would need no new allocations. (We'd also need a work space for B' for the adjoint.)
Perhaps this could be a user-selectable option when creating a LinearMap.

Thoughts on this feature suggestion? I might try it myself unless anyone sees flaws in it...

@dkarrasch
Member

Very much yes! Reducing temporary allocations is something I started working on in #74, but got stuck. We have something in that direction for LinearCombinations of LinearMaps that don't have an allocation-free 5-arg mul!. The issue is that we will need to go beyond overloading Base or LinearAlgebra functions. But I guess that in a specific enough context there is no generic function anyway, so using a specific LinearMaps.jl function is okay.

At first, I was skeptical about storing the temporary vectors in the struct, because of the resize! operation in case the CompositeMap has three or more factors. But the following works:

using LinearMaps

struct MyMap{T,S<:LinearMap{T},V<:AbstractVector} <: LinearMap{T}
    map::S
    temp::V
end

A = MyMap(LinearMap(rand(3,3)), Vector{Float64}(undef, 3))
resize!(A.temp, 5)

I wonder though if people would like to use that temp memory also for other purposes. Storing them in the struct would kind of hide it from everything else.

Another option is to let multiplication happen in internal helper functions, which are provided with memory by some external caller. If you just call mul!, there'd be an outer helper function that creates the intermediate storage. But if you know exactly what you're doing, you could hook into the innermost helper function, providing vectors of the appropriate size. Again, roughly something like that is currently done in the LinearCombination code...

@JeffFessler
Member Author

Thanks for the encouragement. Is resize! essential?
For A = B*C I was thinking of allocating a single temp = zeros(maximum(size(C)))
and then using @view to get an appropriately sized working vector.

For A = B*C*D*... I think I need two such vectors to ping-pong between, again using @view.

I agree that in the long run it would be nice to be able to recycle such memory. One option is to make the temp variable(s) accessible as A.temp1 via getproperty. Another way would be to let the user provide temp1 (and temp2 if needed) as an optional kwarg when they create any of B, C, D, etc. and then internally use those if provided, otherwise default to allocating one internally to store in the struct. I'd lean towards this latter option because it avoids needing to expose any internals (other than at the top level with the kwarg).
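A rough sketch of that ping-pong idea with two buffers and `@view` (plain matrices for illustration; `composite_mul!` and its argument layout are hypothetical, not a LinearMaps.jl API):

```julia
using LinearAlgebra: mul!

# Apply maps[1]*maps[2]*...*maps[n] to x in place, alternating between two
# preallocated buffers; @view trims each buffer to the current factor's
# output length, so one maximal-size allocation per buffer suffices.
function composite_mul!(y, maps, x, temp1, temp2)
    n = length(maps)
    src = x
    for i in n:-1:2
        dst = iseven(n - i) ? temp1 : temp2
        d = @view dst[1:size(maps[i], 1)]
        mul!(d, maps[i], src)
        src = d
    end
    return mul!(y, maps[1], src)
end

B, C, D = rand(3, 4), rand(4, 5), rand(5, 6)
x, y = rand(6), zeros(3)
t1, t2 = zeros(5), zeros(5)  # sized to the largest intermediate
composite_mul!(y, (B, C, D), x, t1, t2)
@assert y ≈ B * (C * (D * x))
```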

@Jutho
Collaborator

Jutho commented Jun 16, 2020

I think this is a good idea in theory. In practice, there could be some issues. The first I see is regarding element type. If, say, C itself is Float64, but you multiply it with a complex vector one time and a BigFloat vector the next, what does the temporary variable need to be?

As you try to push this further and optimize more (like how you could reuse that temporary variable in other places, what the minimal number of temporary variables is, how large they should be), you are essentially implementing your own memory handler. That does seem like something that should be provided by the language. And it is: it's the garbage collector. Is there clear evidence that the extra allocations have a major impact in your use case @JeffFessler ? I will be the first to admit that Julia's GC has some problems with large temporary variables, which is why I also have a cache for temporaries in TensorOperations.jl. But there too it feels like reinventing the wheel (and probably in a bad way).

@JeffFessler
Member Author

@Jutho thanks for the comments and questions. I can do some more careful timing tests for my case and see what impact it might have.

Most of your concerns also apply to basic mul! though, right? The basic user types y = A * x and does not worry about the element types.
The advanced user invokes the scarier looking mul!(y, A, x) because they want the (memory) performance benefit, and then the onus is on that advanced user to provide an appropriately typed and sized y for their situation (otherwise they get an error message).
As long as mul! has a benefit to Julia users over * then doesn't it seem worth extending that benefit to all of LinearMaps types?

@Jutho
Collaborator

Jutho commented Jun 16, 2020

Yes, mul! certainly has an advantage, but as a user you know the eltype of x and thus should take care of making sure y has a compatible eltype to store the result of the computation. If you allocate a temporary, then, if I understood correctly, you do this when creating the matrix / linear map, before x is known, and so you cannot know the correct eltype.
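Concretely, a buffer whose eltype is fixed at construction time can fail later (a hypothetical illustration of the mismatch):

```julia
using LinearAlgebra: mul!

C = rand(3, 3)
temp = Vector{Float64}(undef, 3)  # allocated when the map is built, before x exists
x = rand(ComplexF64, 3)

# Storing a genuinely complex result into a Float64 buffer throws an InexactError.
failed = try
    mul!(temp, C, x)
    false
catch err
    err isa InexactError
end
@assert failed
```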

@JeffFessler
Member Author

Ah, good point. It would put even more onus on the advanced user to anticipate the appropriate type they want their CompositeMap to support. I think the user who invokes this proposed feature will know if they are dealing with real or complex data. Instead of thinking about "correct" type, I think of it as the "desired" precision for the application at hand. I had to check to see if this would work and it does:

using LinearAlgebra: mul!

x = Complex{BigFloat}.(randn(ComplexF32, 9))
A = rand(1:5, 9, 9)      # Int array
y = randn(ComplexF16, 9) # want a low-precision result
mul!(y, A, x)

@JeffFessler
Member Author

Here's a MWE that illustrates the benefits of in-place mul! for CompositeMaps.
It was about a 25% difference in time, which is not huge, but something.
I chose this because one of my real use cases is a product of 3 LinearMaps, 2 of which are essentially diagonal.

using LinearAlgebra: mul!, Diagonal
using LinearMaps
using BenchmarkTools

N = 2^14
T = Float32
X = Diagonal(T.(1:N))
Y = copy(X)
Z = copy(X)
x = randn(T, N)
Xm = LinearMap(X)
Ym = LinearMap(Y)
Zm = LinearMap(Z)
Lm = Xm * Ym * Zm
@assert Lm * x == Xm * (Ym * (Zm * x))
y1 = zeros(T,N) # temp space
y2 = zeros(T,N)
y3 = zeros(T,N)
ym = zeros(T,N)
mul!(ym, Lm, x)
mul!(y3, Xm, mul!(y2, Ym, mul!(y1, Zm, x)))
@assert ym == y3
@btime mul!($y3, $Xm, mul!($y2, $Ym, mul!($y1, $Zm, $x)))
@btime mul!($ym, $Lm, $x) # requires allocations :(

@dkarrasch
Member

Hm, I hadn't taken into account the eltype issue. So, #92 introduces a helper function that allows passing temporary memory to the computation, albeit not at the generic level. Since this package is rather generic, I am concerned that there are many issues to handle. For example, all * calls run through mul!, so we can't really distinguish the "naive" from the "expert" uses.

I do think, however, that in a specific library like the one @JeffFessler may have in mind, these generic issues may not come up, and you have control over output types, the number of maps in the composition, etc., maybe because these issues don't even surface in the API. So one has two options: (i) call LinearMaps.compositemul! directly if you know that at that point in your code you will only ever handle composite maps, or (ii) introduce a LinearMap subtype that has temporary memory fields and passes them to LinearMaps.compositemul! in its own mul!. The latter preserves genericity in your code. I could help with the latter.
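A minimal sketch of option (ii) with plain matrices, without tying it to the actual #92 API (`CachedComposite` and its fields are hypothetical names, not part of LinearMaps.jl):

```julia
using LinearAlgebra

# A two-factor composite that carries its own preallocated buffer for the
# intermediate product in B*(C*x).
struct CachedComposite{TB,TC,V<:AbstractVector}
    B::TB
    C::TC
    temp::V
end

function LinearAlgebra.mul!(y::AbstractVector, A::CachedComposite, x::AbstractVector)
    mul!(A.temp, A.C, x)     # intermediate C*x lands in the cached buffer
    return mul!(y, A.B, A.temp)
end

B, C = rand(3, 4), rand(4, 5)
A = CachedComposite(B, C, zeros(4))
x, y = rand(5), zeros(3)
mul!(y, A, x)
@assert y ≈ B * (C * x)
```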

@JeffFessler
Member Author

Thanks for all the comments. I'll try to just make one last note about @Jutho 's comment that the eltype of x is unknown when we create the LinearMap, and then I'll move on, hopefully :)
In functional analysis, a linear operator is a mapping between two function spaces, and each of those function spaces is a vector space defined over some field (e.g., the complex numbers), so arguably we should know the field at the time we create the linear operator.
Of course computing has different precisions (Float16, BigFloat) to represent numbers from the same field so the analogy between math and computing is imperfect, but I believe that the creator of a LinearMap knows both the field and the precision of interest.
A small perk of having this built into mul! is the ability to use InPlaceOps.jl to write @! y = A * x, which looks more like the math than LinearMaps.compositemul!(y, A, x), but that's minor.
I might revisit this with LinearMapsAA.jl after #74 is complete.
