-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: make SparseMatrixCSC immutable #16371
Conversation
Tagging some people that might want to look at this @tkelman @Sacha0 @martinholters |
@nanosoldier |
@inbounds for col in 1:n | ||
rrange = colptr[col]:(colptr[col+1]-1) | ||
(nadd > 0) && (colptr[col] = colptr[col] + nadd) | ||
for col in 1:n |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@inbounds
decoration drop: Accidental or intentional? If the latter, why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accidental, thanks for spotting. Removed when testing some things and forgot to add back. @jrevels, What happens if I force push to this branch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean w.r.t. @nanosoldier? It pulls down the merge commit of the PR as soon as the job runs, so force pushing after it's started won't have any effect on the current job. The easiest thing for us to do here is just submit the new job whenever the PR is ready (currently, there's not a good way to cancel jobs without manually hacking around the submission system).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thanks for the information. I updated the PR with the forgotten @inbounds
but we can wait for CI to be green to restart the benchmark to make sure I am not wasting your and @nanosoldier's time
@@ -2458,11 +2461,14 @@ function setindex!{Tv,Ti,T<:Integer}(A::SparseMatrixCSC{Tv,Ti}, B::SparseMatrixC | |||
colptrB = B.colptr; rowvalB = B.rowval; nzvalB = B.nzval |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems you've removed all references to colptrB
, rowvalB
, and nzvalB
in this method. Perhaps nix this line? Or did I miss remaining references?
Edit: The same seems to hold for colptrA
, rowvalA
, and nzvalA
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I mixed up two commits a bit. That was supposed to go in the (potential) later PR that removes manual hoistings. I will change back to using the colptrB
variables.
84e4563
to
8aa787e
Compare
rowvalA = copy(rowvalB) | ||
nzvalA = copy(nzvalB) | ||
resize!(rowvalB, length(rowvalA)+memreq) | ||
resize!(nzvalB, length(rowvalA)+memreq) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I understand correctly that in this method colptrB
, rowvalB
, and nzvalB
are now persistent references to A.colptr
, A.rowval
, and A.nzval
, while colptrA
, rowvalA
, and nzvalA
can become disjoint storage for the original contents of A.colptr
, A.rowval
, and A.nzval
? If so this seems a bit labyrinthine, and I might advocate for renaming things commensurate with their use (as in the preceding methods).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You basically want to swap xxxA
and xxxB
?
I agree, these functions are indeed difficult to reason about. They try to reuse the storage in the input matrix but if they find out that they need temporary data they allocate copies of the matrix fields and rebind the variables. It makes it difficult to keep track on what data each variable points to. I didn't want to dig down too much into these functions so I mostly mechanically made them set the new values inplace the matrix fields instead of the newly allocated vectors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You basically want to swap xxxA and xxxB?
Eventually something along those lines might be nice, yes. But for now please ignore this comment (see below.)
I agree, these functions are indeed difficult to reason about. They try to reuse the storage in the input matrix but if they find out that they need temporary data they allocate copies of the matrix fields and rebind the variables. It makes it difficult to keep track on what data each variable points to. I didn't want to dig down too much into these functions so I mostly mechanically made them set the new values inplace the matrix fields instead of the newly allocated vectors.
Cheers, I would have done the same and for the same reasons. I like how you've minimized modifications in this PR. Better to defer my suggestion above to a separate PR to retain that approach and see this merged efficiently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's open issue about it later so we don't forget it. Hopefully making SparseMatrixCSC an immutable opens up for some clean up possibilities when dereferencing fields can be hoisted out of, for example, utility functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good! :)
Your benchmark job has completed - no performance regressions were detected. A full report can be found here. cc @jrevels |
I'm a little surprised this was all that needed to change. I have code that has needed to modify |
The previous versions also copied? rowvalA = Array(Ti, nnzA); copy!(rowvalA, 1, rowval, 1, length(rowval)) But yes, some functions didn't have to copy the whole thing though, only up to a certain point. When immutables referencing heap allocated objects can be stack allocated it would likely be as fast as modifying the field, but until then you will hit an extra allocation from allocating the new type. How much work are you doing per matrix resize? |
Well, depends whether the newly added rows/columns are going to have anything in them. Adding many empty columns one by one in a loop happens occasionally, depends on the structure of the problem. |
Some sort of benchmark... function change_size{Tv, Ti}(A::SparseMatrixCSC{Tv, Ti}, m, n)
return SparseMatrixCSC{Tv, Ti}(m, n, A.colptr, A.rowval, A.nzval)
end
A = sprand(10^3, 10^3, 0.01)
t1 = @elapsed for i in 1:10^5
a = abs(rand(Int))
b = abs(rand(Int))
change_size(A, a, b)
end
t2 = @elapsed for i in 1:10^5
A.m = abs(rand(Int))
A.n = abs(rand(Int))
end
t3 = @elapsed for i in 1:10^5
a = abs(rand(Int))
b = abs(rand(Int))
end
println((t1 - t3) / (t2 - t3))
end
julia> test()
10.512425670729362 So yes, a lot slower. However, can that really be a bottleneck in your code? |
@tkelman, could your case be handled by just assembling Another option is to have "assembly" types specifically designed for creation. I've wondered about this in the context of implementing generic algorithms that create new matrices, and noting that not all sparse-like types seem well suited for this. |
I'm getting a bit off topic here but another thing that I thought could be interesting is to refactor the |
I don't have easily benchmarkable examples handy right now, but a lot of what this code was doing was modifying and assembling sparse matrices and the latency of that can be an issue. It ends up building a sparse matrix representation of an expression operation by operation.
No, I very rarely call
I'm skeptical on this, additional levels of nesting don't seem worth it. It may not entirely work right now but it might be possible to make SparseMatrixCSC work with things like |
OK, but in any case you don't need to create the |
I need it to be usable as a matrix midway through the process too, so not really. |
If you are using it as a matrix then the cost of recreating a new matrix with the correct dimension should be insignificant? |
You showed above, it's 10x slower to create a new matrix every time you need to change dimensions. It isn't being used as a matrix every time the dimensions change, but it needs to always be possible, preferably without the slowdown of creating a new matrix every time I need to use it. We could make the dimensions a Ref, or store them in a separate array, or make one of them implicit based on colptr length, but all of those would be noticeable data structure changes to the internal fields. Are any of the performance issues being immutable fixes not possible to work around by manual hoisting? Leaving out the hoisting isn't usually making a factor of 10 difference, is it? |
Of course the relative slowdown here is large because you are comparing creating a new type (that has an error checking inner constructor) with something that takes almost no time to do. However, comparing the relative slowdown to a function that does actual work is ridiculous. You can create 100 million new sparse matrices in less than a second. Just run your code and post some real numbers... If the matrix changes size it is no longer the same matrix and should not be represented by the same object. This discussion has already been had many times for other types the conclusion has always been that we shouldn't bend our API to something that is purely a compiler optimization, in this case stack allocating immutables with refs to heap allocated objects. If you have a personal exceptional use case then it is in my view better to keep a local patch that changes |
You call The port of this particular code to Julia was not finished, I stopped writing research code a while ago. If it were complete, a representative use would be changing dimensions tens or hundreds of thousands of times, calling functions that do useful work several hundred or thousands of times, then repeating the process again with more in place modifications to the existing data. My successors may need to revisit the approach if JuMP's startup time doesn't improve. I don't think modifying sparse matrices in place is a niche application that should require a fork of Julia to be possible to do efficiently. Any optimization algorithm that supports problem modification would need to do something similar. Most solvers that have to call C aren't doing this directly with SparseMatrixCSC, but a pure Julia solver would take full advantage. |
Manual hoisting implies writing code in Matlab style, aka big monolithic functions that might as well be a |
It's the repeated wrapper allocation I'd be worried about, if I do no work for a few thousand modifications. Though I might be able to cheat and use |
Joining this party a bit late, I just took a brief glance at the changes, and my main take-away is ... yes, some clean-up (in a seperate PR) sounds like a really good idea. Being too lazy to figure things out in detail, I can just conclude the changes don't look likely to introduce any really subtle bugs, so if the tests pass, things are likely to be OK. More generally, I like the idea of |
Still a bit synthetic, but doing at least some work on the array after modifying it a bunch of times: https://gist.github.com/6de7e873bc4067dd145d1ff0aa39e6f1
|
Everything there can be done with just the three vectors. The whole point was that you needed the matrix so you could do matrix operations? You only need to convert to a matrix when you actually have an operation to do. |
Which could be at any time. To do what you suggest (without resorting to |
You need it to be in matrix form to know if you want to do work? What is the check to know if work needs to be done? I just want to see some real code where recreating 100 million sparse matrices per second is a bottle neck... |
On that benchmark, I guess one could just write out |
The application is problem assembly, sort of like an eager version of JuMP. Any time you add a constraint it's a new row, any time you add a variable it's a new column (or the reverse, if you work with the transpose / CSR). Convex.jl is actually pretty slow at model formulation because it creates a lot of unnecessary sparse matrices that it immediately throws away - it's not doing exactly my trick of adding empty rows or columns in-place, but some of the behavior is similar in spirit. When you are warm starting the solver from a good initial guess that only takes a handful of iterations, the latency of problem formulation and modification before re-solve can be noticeable.
There is no such check. The problem remains valid after any one of these modifications and may need to be solved immediately at any point. Your suggested approach of "only create a SparseMatrixCSC when you need to call one of its methods" would probably be okay performance-wise since the wrapping penalty of creating a new immutable for every operation will only be noticeable relative to the delegated operation for small sizes. But it's going to be inelegant to actually write if the number of different methods you need is large. Manually hoisting and inlining is inelegant too, and I would like to see the iteration problem addressed. Sparse vs dense iteration isn't all that useful of a comparison, but immutable-sparse vs mutable-sparse would be. This has benefits, but there are use cases that it breaks. |
Right. Having spent too much time today getting that benchmark working in But here's a simple, more artificial, demo: @noinline fetchval(A::SparseMatrixCSC, i) = A.nzval[i]
@noinline fetchval(nz, i) = nz[i]
function sumall1(A, n)
s = 0.0
for k = 1:n
for i = 1:length(A.nzval)
s += fetchval(A, i)
end
end
s
end
function sumall2(A, n)
s = 0.0
nz = A.nzval
for k = 1:n
for i = 1:length(nz)
s += fetchval(nz, i)
end
end
s
end
julia> include("/tmp/sumall.jl")
sumall2 (generic function with 1 method)
julia> A = sprand(10^4, 10^4, 1e-3);
julia> sumall1(A, 1)
49987.660418004125
julia> sumall2(A, 1)
49987.660418004125
julia> @time 1
0.000013 seconds (150 allocations: 8.949 KB)
1
julia> @time sumall1(A, 1000)
0.704123 seconds (5 allocations: 176 bytes)
4.998766041778719e7
julia> @time sumall2(A, 1000)
0.535493 seconds (5 allocations: 176 bytes)
4.998766041778719e7 |
I've just repeated @timholy's benchmark with master, once with immutable ImmutableSparseMatrixCSC{Tv,Ti<:Integer} <: AbstractSparseMatrix{Tv,Ti}
m::Int # Number of rows
n::Int # Number of columns
colptr::Vector{Ti} # Column i is in colptr[i]:(colptr[i+1]-1)
rowval::Vector{Ti} # Row values of nonzeros
nzval::Vector{Tv} # Nonzero values
function ImmutableSparseMatrixCSC(m::Integer, n::Integer, colptr::Vector{Ti}, rowval::Vector{Ti}, nzval::Vector{Tv})
m < 0 && throw(ArgumentError("number of rows (m) must be ≥ 0, got $m"))
n < 0 && throw(ArgumentError("number of columns (n) must be ≥ 0, got $n"))
new(Int(m), Int(n), colptr, rowval, nzval)
end
end
@noinline fetchval(A::ImmutableSparseMatrixCSC, i) = A.nzval[i]
# ...
B = ImmutableSparseMatrixCSC{Float64,Int}(A.m, A.n, A.colptr, A.rowval, A.nzval)
# ...
@time sumall1(B, 1000)
@time sumall2(B, 1000) Result:
The difference between the |
Isn't the whole point that the function should get inlined and the compiler can see the load in the loop and then hoist it? By hiding the load behind a noinline this will not happen and the same code is probably generated. What happens if you inline it instead? |
Replacing |
Since there are no stores in the function the load can be manually hoisted no matter what aliasing information the compiler has. This is in my view a better example: @inline fetchval(A::SparseMatrixCSC, i) = A.nzval[i]
@inline fetchval(A::ImmutableSparseMatrixCSC, i) = A.nzval[i]
@inline setval(A::SparseMatrixCSC, i) = A.rowval[i] = i
@inline setval(A::ImmutableSparseMatrixCSC, i) = A.rowval[i] = i
function sumall1(A, n)
s = 0.0
for k = 1:n
@inbounds for i = 1:length(A.nzval)
s += fetchval(A, i)
setval(A, i)
end
end
s
end A = sprand(10^4, 10^4, 1e-3);
B = ImmutableSparseMatrixCSC{Float64,Int}(A.m, A.n, A.colptr, A.rowval, A.nzval)
julia> @time sumall1(A, 1000)
0.110465 seconds (5 allocations: 176 bytes)
5.005743903297553e7
julia> @time sumall1(B, 1000)
0.071842 seconds (5 allocations: 176 bytes)
5.005743903297553e7 Looking at if7: ; preds = %if7.preheader, %if7
%s.110 = phi double [ %28, %if7 ], [ %s.012, %if7.preheader ]
%"#temp#1.09" = phi i64 [ %21, %if7 ], [ 1, %if7.preheader ]
%21 = add i64 %"#temp#1.09", 1
%22 = load %jl_value_t*, %jl_value_t** %11, align 8
store %jl_value_t* %22, %jl_value_t** %4, align 8
%23 = add i64 %"#temp#1.09", -1
%24 = bitcast %jl_value_t* %22 to double**
%25 = load double*, double** %24, align 8
%26 = getelementptr double, double* %25, i64 %23
%27 = load double, double* %26, align 8
%28 = fadd double %s.110, %27
%29 = load %jl_value_t*, %jl_value_t** %12, align 8
store %jl_value_t* %29, %jl_value_t** %5, align 8
%30 = bitcast %jl_value_t* %29 to i64**
%31 = load i64*, i64** %30, align 8
%32 = getelementptr i64, i64* %31, i64 %23
store i64 %"#temp#1.09", i64* %32, align 8
%33 = icmp eq i64 %"#temp#1.09", %19
br i1 %33, label %L.loopexit.loopexit, label %if7
} while for the immutable: if7: ; preds = %if7.lr.ph, %if7
%s.110 = phi double [ %s.012, %if7.lr.ph ], [ %34, %if7 ]
%"#temp#1.09" = phi i64 [ 1, %if7.lr.ph ], [ %30, %if7 ]
%30 = add i64 %"#temp#1.09", 1
store %jl_value_t* %24, %jl_value_t** %4, align 8
%31 = add i64 %"#temp#1.09", -1
%32 = getelementptr double, double* %26, i64 %31
%33 = load double, double* %32, align 8
%34 = fadd double %s.110, %33
store %jl_value_t* %27, %jl_value_t** %5, align 8
%35 = getelementptr i64, i64* %29, i64 %31
store i64 %"#temp#1.09", i64* %35, align 8
%36 = icmp eq i64 %"#temp#1.09", %16
br i1 %36, label %L.loopexit, label %if7
} For the mutable type we are doing extra loads (two of them, one for each field). |
It gets even worse when we have two matrices at the same time in the loop. function sumall1(A, B, n)
s = 0.0
for k = 1:n
@inbounds for i = 1:length(A.nzval)
s += fetchval(A, i)
setval(B, i)
end
end
s
end
A = sprand(10^4, 10^4, 1e-3);
A2 = sprand(10^4, 10^4, 1e-3);
B = ImmutableSparseMatrixCSC{Float64,Int}(A.m, A.n, A.colptr, A.rowval, A.nzval)
B2 = ImmutableSparseMatrixCSC{Float64,Int}(A2.m, A2.n, A2.colptr, A2.rowval, A2.nzval)
sumall1(A, A2, 1000)
sumall1(B, B2, 1000) julia> @time sumall1(A, A2, 1000)
0.197629 seconds (5 allocations: 176 bytes)
5.020867352413504e7
julia> @time sumall1(B, B2, 1000)
0.071756 seconds (5 allocations: 176 bytes)
5.020867352413504e7 Here the mutable version has 5 (1 + 2 matrices * 2 fields, I guess) loads compared to 1 for the immutable one. |
Yes, there is clear benefit in having it
A drastic measure, but it had crossed my mind, too. However, all the hopefully upcoming code clean-up would then probably make operations on the hypothetical |
We have two compiler inefficiencies here pulling in different directions. One is that the immutable Assuming both of these are fixed we probably still want it to be an |
Same opinion, same bias here. @timholy seems to be similar-minded, @tkelman is biased in the opposite direction. Anyone else care to weigh in? |
We do allow vectors to change size, and in-place appending of new columns to a dense matrix or analogous generalizations to higher dimensions should be possible, it just hasn't been implemented in Other ways of obtaining the same aliasing compiler optimization without sacrificing the mutability of the scalar fields would be #9448 / #11430, but the discussion there recommends using a Without |
My memories of old mailing-list posts suggest that the plan was to never implement it, on the grounds that it makes automatic removal of bounds checks much harder. On julia-0.5 you should be able to grow a matrix by columns (haven't yet tested) by maintaining the data in a I have to say, I really don't understand why @tkelman can't just split out the current mutable implementation into a package. Heck, I suspect that all the sparse matrix implementations will be in packages soon, so this does not make anything a second-class citizen. |
Yes, copy-pasting all the existing code is also an option, though even more labor intensive and undesirable from a maintainability perspective than wrapping and delegation. The fact that copy-pasting that much code is even being suggested means the functionality of being able to modify the scalar fields is valuable enough to be worth keeping (doesn't have to be in base though, sure). Abandoning all code reuse to achieve it seems drastic. |
In support if that, I've found https://groups.google.com/forum/#!msg/julia-users/B4OUYPFM5L8/VKV_LyQJWT8J. However, if the plan has changed and we will see resizable arrays anytime soon, having sparse matrices not resizable would also be awkward.
I really wouldn't like to see two competing sparse matrix packages. Once forked, it is likely to see features being implemented in only one of them. Having users choose a sparse matrix package depending on their specific needs - or worse, having them convert between different representations during computations to be able to call into the right package - dose not sound exactly attractive. I have no experience with |
It is only being suggested because you want to keep it though... You can't make an argument for something and when someone suggests a solution you go "See! People do want that functionality, we need it in Base!". Regarding If we ever decide to support changing dimensions of normal matrices there is no problem adding |
Jebus, what's the point in passive aggressively adding a thumb down emote? Write an answer or don't, but don't go around thumbing down discussion posts that directly involve you when they are not of the survey type. |
Community standards please. The thumbs down is because you're misrepresenting what I'm saying, and I disagree with "The fact that resizing was possible can basically be seen as a bug."
Come on. I explicitly said "doesn't have to be in base though, sure" - just that abandoning code reuse is generally considered unfortunate. And I never claimed to speak for anyone but myself, though packages like Convex.jl could be written to leverage some of these CSC tricks if they wanted to improve performance. Same with any optimization solver that wanted to tie in more closely with the modeling layer to support problem modification. Might also be useful for mesh refinement type algorithms.
Do you have a branch? I can take a look at finishing it up. Directly changing fields is considered poor API design when generality over multiple types is desired, but there's plenty of code out there that modifies fields of mutable types in a non-generic way and that isn't automatically a bug. We break code when there's good reason to, and this is probably enough of a speedup to be worth it, but in-place modification of the scalar fields is valuable functionality that we're losing in exchange. I've basically been fine with merging this since I got |
My suggestion is that we should merge this for now, since changing dimensions of a matrix on the fly is not really a generally needed and supported use case. One can always create a new SparseMatrixCSC and for cases that is not fast enough, create a custom type (by perhaps copying the code). I think by and large this is a good change. |
I only have the branch locally at my office but I will push it tomorrow. |
Thanks. Ref may end having as much or more overhead and would break code that reads from the fields directly instead of using size, but I would like to try it out. If you've got code lying around as a head start that would be useful. |
https://github.com/KristofferC/julia/tree/kc/refwip I acidentally put this on top of two other commits but maybe you can cherry pick it cleanly.. |
…n anticipation of introducing related `permute[!]` methods and taking advantage of JuliaLang#16371 for better structuring. Also adds several tests for these methods.
…n anticipation of introducing related `permute[!]` methods and taking advantage of JuliaLang#16371 for better structuring. Also adds several tests for these methods.
This changes
SparseMatrixCSC
to be an immutable type. The reasons for this are outlined in #15668.I also have commits that remove a lot of the manual hoisting but I believe a two PR approach is better here, both to simplify review and to make benchmarking easier.
If someone could give a wink to @nanosoldier to run the sparse benchmarks I would be grateful.
Fixes #15668