
RFC: make SparseMatrixCSC immutable #16371

Merged May 18, 2016 (1 commit)

Conversation

KristofferC
Member

This changes SparseMatrixCSC to be an immutable type. The reasons for this are outlined in #15668.

I also have commits that remove a lot of the manual hoisting but I believe a two PR approach is better here, both to simplify review and to make benchmarking easier.

If someone could give a wink to @nanosoldier to run the sparse benchmarks I would be grateful.

Fixes #15668
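For context, the "manual hoisting" referred to here is the pattern, common in the sparse code at the time, of loading a field into a local before a hot loop, because the compiler must assume a mutable type's fields can change behind its back. A minimal sketch of the pattern (function names made up for illustration):

```julia
# Hoisted style: the field is loaded once, outside the loop. With a mutable
# SparseMatrixCSC this has to be done by hand all over the sparse code.
function sumnz_hoisted(A::SparseMatrixCSC)
    nzval = A.nzval                    # manual hoist of the field load
    s = zero(eltype(nzval))
    @inbounds for i in 1:length(nzval)
        s += nzval[i]
    end
    return s
end

# Direct style: once the type is immutable, the compiler itself can hoist
# the A.nzval load out of the loop, so this should be just as fast.
function sumnz_direct(A::SparseMatrixCSC)
    s = zero(eltype(A.nzval))
    @inbounds for i in 1:length(A.nzval)
        s += A.nzval[i]
    end
    return s
end
```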

@KristofferC
Member Author

Tagging some people that might want to look at this @tkelman @Sacha0 @martinholters

@jrevels
Member

jrevels commented May 14, 2016

@nanosoldier runbenchmarks("sparse", vs = ":master")

@ViralBShah ViralBShah added the sparse (Sparse arrays) label May 14, 2016
@inbounds for col in 1:n
rrange = colptr[col]:(colptr[col+1]-1)
(nadd > 0) && (colptr[col] = colptr[col] + nadd)
for col in 1:n
Member

@inbounds decoration drop: Accidental or intentional? If the latter, why?

Member Author

Accidental, thanks for spotting. I removed it when testing some things and forgot to add it back. @jrevels, what happens if I force push to this branch?

Member

You mean w.r.t. @nanosoldier? It pulls down the merge commit of the PR as soon as the job runs, so force pushing after it's started won't have any effect on the current job. The easiest thing for us to do here is just submit the new job whenever the PR is ready (currently, there's not a good way to cancel jobs without manually hacking around the submission system).

Member Author

Ok, thanks for the information. I updated the PR with the forgotten @inbounds, but we can wait for CI to be green before restarting the benchmark, to make sure I am not wasting your and @nanosoldier's time.

@ViralBShah ViralBShah added the linear algebra (Linear algebra) label May 14, 2016
@@ -2458,11 +2461,14 @@ function setindex!{Tv,Ti,T<:Integer}(A::SparseMatrixCSC{Tv,Ti}, B::SparseMatrixC
colptrB = B.colptr; rowvalB = B.rowval; nzvalB = B.nzval
Member

@Sacha0 Sacha0 May 14, 2016

It seems you've removed all references to colptrB, rowvalB, and nzvalB in this method. Perhaps nix this line? Or did I miss remaining references?

Edit: The same seems to hold for colptrA, rowvalA, and nzvalA?

Member Author

@KristofferC KristofferC May 14, 2016

I think I mixed up two commits a bit. That was supposed to go in the (potential) later PR that removes manual hoistings. I will change back to using the colptrB variables.

@KristofferC KristofferC force-pushed the kc/sparse_mat_immut branch from 84e4563 to 8aa787e on May 14, 2016 16:14
rowvalA = copy(rowvalB)
nzvalA = copy(nzvalB)
resize!(rowvalB, length(rowvalA)+memreq)
resize!(nzvalB, length(rowvalA)+memreq)
Member

@Sacha0 Sacha0 May 14, 2016

Do I understand correctly that in this method colptrB, rowvalB, and nzvalB are now persistent references to A.colptr, A.rowval, and A.nzval, while colptrA, rowvalA, and nzvalA can become disjoint storage for the original contents of A.colptr, A.rowval, and A.nzval? If so this seems a bit labyrinthine, and I might advocate for renaming things commensurate with their use (as in the preceding methods).

Member Author

You basically want to swap xxxA and xxxB?

I agree, these functions are indeed difficult to reason about. They try to reuse the storage in the input matrix, but if they find out that they need temporary data they allocate copies of the matrix fields and rebind the variables. It makes it difficult to keep track of what data each variable points to. I didn't want to dig down too much into these functions, so I mostly mechanically made them set the new values in place in the matrix fields instead of in the newly allocated vectors.

Member

You basically want to swap xxxA and xxxB?

Eventually something along those lines might be nice, yes. But for now please ignore this comment (see below).

I agree, these functions are indeed difficult to reason about. They try to reuse the storage in the input matrix, but if they find out that they need temporary data they allocate copies of the matrix fields and rebind the variables. It makes it difficult to keep track of what data each variable points to. I didn't want to dig down too much into these functions, so I mostly mechanically made them set the new values in place in the matrix fields instead of in the newly allocated vectors.

Cheers, I would have done the same and for the same reasons. I like how you've minimized modifications in this PR. Better to defer my suggestion above to a separate PR to retain that approach and see this merged efficiently.

Member Author

Let's open an issue about it later so we don't forget. Hopefully making SparseMatrixCSC an immutable opens up some cleanup possibilities, since dereferencing fields can then be hoisted out of, for example, utility functions.

Member

Sounds good! :)

@nanosoldier
Collaborator

Your benchmark job has completed - no performance regressions were detected. A full report can be found here. cc @jrevels

@tkelman
Contributor

tkelman commented May 15, 2016

I'm a little surprised this was all that needed to change. And that the difference between copy and allocating an uninitialized Array is apparently not noticeable (or maybe not covered by the existing benchmarks).

I have code that has needed to modify m and n of a sparse matrix in place as it's being incrementally assembled, which this would break. Though it should be possible to recreate a new SparseMatrixCSC that refers to the same index arrays with different m and n values, I do wonder whether the compiler would handle that as well as in-place modification of integer fields in a mutable type.

@KristofferC
Member Author

The previous versions also copied?

rowvalA = Array(Ti, nnzA); copy!(rowvalA, 1, rowval, 1, length(rowval))

But yes, some functions didn't have to copy the whole thing, only up to a certain point.

When immutables referencing heap allocated objects can be stack allocated it would likely be as fast as modifying the field, but until then you will hit an extra allocation from allocating the new type. How much work are you doing per matrix resize?

@tkelman
Contributor

tkelman commented May 15, 2016

Well, depends whether the newly added rows/columns are going to have anything in them. Adding many empty columns one by one in a loop happens occasionally, depends on the structure of the problem.

@KristofferC
Member Author

KristofferC commented May 15, 2016

Some sort of benchmark...

function change_size{Tv, Ti}(A::SparseMatrixCSC{Tv, Ti}, m, n)
    return SparseMatrixCSC{Tv, Ti}(m, n, A.colptr, A.rowval, A.nzval)
end

function test()
    A = sprand(10^3, 10^3, 0.01)
    t1 = @elapsed for i in 1:10^5
        a = abs(rand(Int))
        b = abs(rand(Int))
        change_size(A, a, b)
    end

    t2 = @elapsed for i in 1:10^5
        A.m = abs(rand(Int))
        A.n = abs(rand(Int))
    end

    t3 = @elapsed for i in 1:10^5
        a = abs(rand(Int))
        b = abs(rand(Int))
    end

    println((t1 - t3) / (t2 - t3))
end

julia> test()
10.512425670729362

So yes, a lot slower. However, can that really be a bottleneck in your code?

@timholy
Member

timholy commented May 15, 2016

@tkelman, could your case be handled by just assembling I, J, and V as vectors and then calling sparse at the end?

Another option is to have "assembly" types specifically designed for creation. I've wondered about this in the context of implementing generic algorithms that create new matrices, and noting that not all sparse-like types seem well suited for this.

@KristofferC
Member Author

I'm getting a bit off topic here, but another thing I thought could be interesting is to refactor colptr and rowval into a new type called NonZeroStructure or SparsityPattern or something like that. The reason is that it is quite common to have multiple matrices with the same nonzero structure, and having them share a SparsityPattern object would be a good way to save memory. Of course you can just have the fields point to the same underlying vectors, but having it as a type would be cleaner. Being able to write functions purely in terms of the SparsityPattern type, without needing to know the actual values in the matrix, could also be considered cleaner.
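To make the idea concrete, a rough sketch of what such a refactoring could look like (all names hypothetical, nothing like this exists in Base; written in the 0.5-era syntax used elsewhere in this thread):

```julia
# Hypothetical: the structural part of CSC factored into its own type, so
# several matrices with the same nonzero structure can share one object.
immutable SparsityPattern{Ti<:Integer}
    m::Int
    n::Int
    colptr::Vector{Ti}  # column j's entries are colptr[j]:(colptr[j+1]-1)
    rowval::Vector{Ti}  # row indices of the stored entries
end

immutable PatternSparseMatrix{Tv,Ti<:Integer}
    pattern::SparsityPattern{Ti}  # possibly shared with other matrices
    nzval::Vector{Tv}             # one value per stored entry
end

# Purely structural queries can now be written without mentioning values:
storedentries(p::SparsityPattern) = length(p.rowval)
```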

@tkelman
Contributor

tkelman commented May 15, 2016

However, can that really be a bottleneck in your code?

I don't have easily benchmarkable examples handy right now, but a lot of what this code was doing was modifying and assembling sparse matrices and the latency of that can be an issue. It ends up building a sparse matrix representation of an expression operation by operation.

could your case be handled by just assembling I, J, and V as vectors and then calling sparse at the end?

No, I very rarely call sparse since it's too slow (and it makes copies) when you know you're assembling the array columnwise and in row-sorted order.

having it as a type would be cleaner

I'm skeptical on this, additional levels of nesting don't seem worth it. It may not entirely work right now but it might be possible to make SparseMatrixCSC work with things like repeat(true, nnz) or Array{Void} as its nzvals.

@timholy
Member

timholy commented May 15, 2016

No, I very rarely call sparse since it's too slow (and it makes copies) when you know you're assembling the array columnwise and in row-sorted order.

OK, but in any case you don't need to create the SparseMatrixCSC at the beginning, you can create it at the end from pre-assembled vectors. In other words, this should not become a reason to avoid merging this.
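What is being suggested here, sketched under the assumption (stated by @tkelman above) that assembly is columnwise and row-sorted; the helper name is made up:

```julia
# Grow the three raw CSC vectors directly, one column at a time.
function append_column!(colptr, rowval, nzval, rows, vals)
    append!(rowval, rows)     # rows must already be sorted
    append!(nzval, vals)
    push!(colptr, colptr[end] + length(rows))
end

colptr = [1]; rowval = Int[]; nzval = Float64[]
append_column!(colptr, rowval, nzval, [1, 3], [2.0, 4.0])
append_column!(colptr, rowval, nzval, [2], [5.0])

# Only wrap in a SparseMatrixCSC at the end, when a matrix is needed:
m = 3
A = SparseMatrixCSC(m, length(colptr) - 1, colptr, rowval, nzval)
```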

@tkelman
Contributor

tkelman commented May 15, 2016

I need it to be usable as a matrix midway through the process too, so not really.

@KristofferC
Member Author

If you are using it as a matrix then the cost of recreating a new matrix with the correct dimension should be insignificant?

@tkelman
Contributor

tkelman commented May 15, 2016

You showed above that it's 10x slower to create a new matrix every time you need to change dimensions. It isn't being used as a matrix every time the dimensions change, but it needs to always be possible, preferably without the slowdown of creating a new matrix every time I need to use it.

We could make the dimensions a Ref, or store them in a separate array, or make one of them implicit based on colptr length, but all of those would be noticeable data structure changes to the internal fields.

Are any of the performance issues that immutability fixes impossible to work around by manual hoisting? Leaving out the hoisting isn't usually making a factor-of-10 difference, is it?
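One of the options mentioned above, storing the dimensions in a Ref, would look roughly like this (a hypothetical sketch, not something proposed in this PR; 0.5-era syntax):

```julia
immutable RefDimSparseMatrixCSC{Tv,Ti<:Integer}
    m::Base.RefValue{Int}   # the dimensions live in mutable cells...
    n::Base.RefValue{Int}
    colptr::Vector{Ti}      # ...while the wrapper itself stays immutable,
    rowval::Vector{Ti}      # so field loads can still be hoisted
    nzval::Vector{Tv}
end

# Appending an empty column then needs no new wrapper:
function add_empty_column!(A::RefDimSparseMatrixCSC)
    push!(A.colptr, A.colptr[end])  # duplicate the last column pointer
    A.n[] += 1                      # bump the column count in place
    return A
end
```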

@KristofferC
Member Author

Of course the relative slowdown here is large, because you are comparing creating a new type (that has an error-checking inner constructor) with something that takes almost no time to do. However, extrapolating that relative slowdown to a function that does actual work is ridiculous. You can create 100 million new sparse matrices in less than a second. Just run your code and post some real numbers...

If the matrix changes size it is no longer the same matrix and should not be represented by the same object. This discussion has already been had many times for other types, and the conclusion has always been that we shouldn't bend our API to what is purely a compiler optimization, in this case stack allocating immutables with refs to heap-allocated objects.

If you have an exceptional personal use case then it is, in my view, better to keep a local patch that changes immutable to type in one place than to force every single Base and user function operating on a sparse matrix to hoist loads and manually inline all auxiliary functions to avoid the dereference.

@tkelman
Contributor

tkelman commented May 15, 2016

You call push! on a vector, it changes size and is still the same object. We shouldn't change the API for a compiler optimization, but we should make things that currently work efficiently more expensive, for a compiler optimization elsewhere? Is immutability accomplishing more than field reference convenience?

The port of this particular code to Julia was not finished, I stopped writing research code a while ago. If it were complete, a representative use would be changing dimensions tens or hundreds of thousands of times, calling functions that do useful work several hundred or thousands of times, then repeating the process again with more in place modifications to the existing data. My successors may need to revisit the approach if JuMP's startup time doesn't improve. I don't think modifying sparse matrices in place is a niche application that should require a fork of Julia to be possible to do efficiently. Any optimization algorithm that supports problem modification would need to do something similar. Most solvers that have to call C aren't doing this directly with SparseMatrixCSC, but a pure Julia solver would take full advantage.

@timholy
Member

timholy commented May 15, 2016

Are any of the performance issues that immutability fixes impossible to work around by manual hoisting? Leaving out the hoisting isn't usually making a factor-of-10 difference, is it?

Manual hoisting implies writing code in Matlab style, aka big monolithic functions that might as well be a ccall. I'm hoping to move to a more "julian" style with the visit-only-the-stored-entries iterators in ArrayIteration.jl. None of those can use manual hoisting. And yes, it's a big performance difference (don't remember how much off the top of my head), and it affects each and every element access. The penalty @KristofferC measured was basically benchmarking creation against nothing; turn that benchmark into one where you actually do something with the array (e.g., multiply with it), and that tenfold ratio will turn into 1 + eps().

@tkelman
Contributor

tkelman commented May 15, 2016

It's the repeated wrapper allocation I'd be worried about, if I do no work for a few thousand modifications.

Though I might be able to cheat and use unsafe_store!. That's liable to trigger UB and the compiler may not cooperate with attempts to trick it though.

@martinholters
Member

Joining this party a bit late, I just took a brief glance at the changes, and my main take-away is ... yes, some clean-up (in a separate PR) sounds like a really good idea. Being too lazy to figure things out in detail, I can only conclude that the changes don't look likely to introduce any really subtle bugs, so if the tests pass, things are likely to be OK.

More generally, I like the idea of SparseMatrixCSC being immutable. The fact that relatively few changes were necessary to allow this seems indicative that one can usually do quite well without mutating it. I guess the only reason one would like to mutate is to change the size like @tkelman did, or is there any compelling reason one would rather replace the internal vectors than just replace their contents? It would be interesting, though, to know how many cases there are in the wild mutating SparseMatrixCSC in addition to @tkelman's. My gut feeling is there are very few, and most of them can probably be fixed as easily as those in Base handled by this PR. So I'd say we gain more than we (or @tkelman) lose(s) by merging this. But before jumping to any conclusions, maybe @tkelman can try to adapt his algorithm to work with this PR to compare the performance of the whole thing? How bad is it really?

@tkelman
Contributor

tkelman commented May 15, 2016

Still a bit synthetic, but doing at least some work on the array after modifying it a bunch of times: https://gist.github.com/6de7e873bc4067dd145d1ff0aa39e6f1

~/Julia/julia-0.5$ julia-8aa787e36b/bin/julia sparsemutate.jl
@benchmark(test_unsafe!(A)) = ================ Benchmark Results ========================
     Time per evaluation: 11.26 ms [11.06 ms, 11.47 ms]
Proportion of time in GC: 0.68% [0.00%, 1.84%]
        Memory allocated: 2.50 mb
   Number of allocations: 163818 allocations
       Number of samples: 100
   Number of evaluations: 100
 Time spent benchmarking: 1.40 s

@benchmark(test_immut!(A)) = ================ Benchmark Results ========================
     Time per evaluation: 30.87 ms [30.32 ms, 31.42 ms]
Proportion of time in GC: 0.56% [0.00%, 1.37%]
        Memory allocated: 3.42 mb
   Number of allocations: 183818 allocations
       Number of samples: 100
   Number of evaluations: 100
 Time spent benchmarking: 3.30 s

~/Julia/julia-0.5$ ./julia sparsemutate.jl
INFO: Recompiling stale cache file /home/tkelman/.julia/lib/v0.5/Compat.ji for module Compat.
@benchmark(test_mut!(A)) = ================ Benchmark Results ========================
     Time per evaluation: 11.27 ms [11.10 ms, 11.43 ms]
Proportion of time in GC: 0.66% [0.00%, 1.78%]
        Memory allocated: 2.50 mb
   Number of allocations: 163818 allocations
       Number of samples: 100
   Number of evaluations: 100
 Time spent benchmarking: 1.35 s

@benchmark(test_unsafe!(A)) = ================ Benchmark Results ========================
     Time per evaluation: 11.47 ms [11.25 ms, 11.69 ms]
Proportion of time in GC: 0.74% [0.00%, 2.01%]
        Memory allocated: 2.50 mb
   Number of allocations: 163818 allocations
       Number of samples: 100
   Number of evaluations: 100
 Time spent benchmarking: 1.34 s

@benchmark(test_immut!(A)) = ================ Benchmark Results ========================
     Time per evaluation: 36.78 ms [35.79 ms, 37.76 ms]
Proportion of time in GC: 0.48% [0.00%, 1.16%]
        Memory allocated: 3.42 mb
   Number of allocations: 183818 allocations
       Number of samples: 100
   Number of evaluations: 100
 Time spent benchmarking: 3.89 s

@KristofferC
Member Author

KristofferC commented May 15, 2016

Everything there can be done with just the three vectors. The whole point was that you needed the matrix so you could do matrix operations? You only need to convert to a matrix when you actually have an operation to do.

@tkelman
Contributor

tkelman commented May 15, 2016

Which could be at any time. To do what you suggest (without resorting to unsafe_store! hacks) while maintaining readable array operation syntax would require defining a local mutable version of the SparseMatrixCSC type and re-delegating every single possible method signature with a wrapper. Unless there were an AbstractSparseCSC type where the interface is defined as exactly the current set of fields, and we change all the current methods to refer to that.
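For concreteness, the wrap-and-delegate approach being described would look roughly like this (hypothetical names, 0.5-era syntax; the pain is that every needed method signature must be forwarded by hand):

```julia
import Base: getindex, *

type MutableCSC{Tv,Ti<:Integer}   # `type`, so m and n stay assignable
    m::Int
    n::Int
    colptr::Vector{Ti}
    rowval::Vector{Ti}
    nzval::Vector{Tv}
end

# Cheap (but allocating) immutable view, created only when delegating:
ascsc{Tv,Ti}(A::MutableCSC{Tv,Ti}) =
    SparseMatrixCSC{Tv,Ti}(A.m, A.n, A.colptr, A.rowval, A.nzval)

# One forwarding definition per method signature that is actually used:
getindex(A::MutableCSC, i::Integer, j::Integer) = ascsc(A)[i, j]
*(A::MutableCSC, x::AbstractVector) = ascsc(A) * x
```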

@KristofferC
Member Author

KristofferC commented May 15, 2016

You need it to be in matrix form to know if you want to do work? What is the check to know if work needs to be done?

I just want to see some real code where recreating 100 million sparse matrices per second is a bottleneck...

@timholy
Member

timholy commented May 15, 2016

On that benchmark, I guess one could just write out findnext explicitly, see these lines.

@tkelman
Contributor

tkelman commented May 15, 2016

The application is problem assembly, sort of like an eager version of JuMP. Any time you add a constraint it's a new row, any time you add a variable it's a new column (or the reverse, if you work with the transpose / CSR). Convex.jl is actually pretty slow at model formulation because it creates a lot of unnecessary sparse matrices that it immediately throws away - it's not doing exactly my trick of adding empty rows or columns in-place, but some of the behavior is similar in spirit. When you are warm starting the solver from a good initial guess that only takes a handful of iterations, the latency of problem formulation and modification before re-solve can be noticeable.

What is the check to know if work needs to be done?

There is no such check. The problem remains valid after any one of these modifications and may need to be solved immediately at any point. Your suggested approach of "only create a SparseMatrixCSC when you need to call one of its methods" would probably be okay performance-wise since the wrapping penalty of creating a new immutable for every operation will only be noticeable relative to the delegated operation for small sizes. But it's going to be inelegant to actually write if the number of different methods you need is large.

Manually hoisting and inlining is inelegant too, and I would like to see the iteration problem addressed. Sparse vs dense iteration isn't all that useful of a comparison, but immutable-sparse vs mutable-sparse would be. This has benefits, but there are use cases that it breaks.

@timholy
Member

timholy commented May 15, 2016

Sparse vs dense iteration isn't all that useful of a comparison, but immutable-sparse vs mutable-sparse would be.

Right. Having spent too much time today getting that benchmark working in ArrayIteration (I had column iteration working, but not whole-matrix iteration), my point was that someone could benchmark sumstored on both this branch and master.

But here's a simple, more artificial, demo:

@noinline fetchval(A::SparseMatrixCSC, i) = A.nzval[i]
@noinline fetchval(nz, i) = nz[i]

function sumall1(A, n)
    s = 0.0
    for k = 1:n
        for i = 1:length(A.nzval)
            s += fetchval(A, i)
        end
    end
    s
end

function sumall2(A, n)
    s = 0.0
    nz = A.nzval
    for k = 1:n
        for i = 1:length(nz)
            s += fetchval(nz, i)
        end
    end
    s
end

julia> include("/tmp/sumall.jl")
sumall2 (generic function with 1 method)

julia> A = sprand(10^4, 10^4, 1e-3);

julia> sumall1(A, 1)
49987.660418004125

julia> sumall2(A, 1)
49987.660418004125

julia> @time 1
  0.000013 seconds (150 allocations: 8.949 KB)
1

julia> @time sumall1(A, 1000)
  0.704123 seconds (5 allocations: 176 bytes)
4.998766041778719e7

julia> @time sumall2(A, 1000)
  0.535493 seconds (5 allocations: 176 bytes)
4.998766041778719e7

@martinholters
Member

I've just repeated @timholy's benchmark with master, once with SparseMatrixCSC, once with an ad-hoc defined ImmutableSparseMatrixCSC, being equivalent except for being immutable, i.e. I've added:

immutable ImmutableSparseMatrixCSC{Tv,Ti<:Integer} <: AbstractSparseMatrix{Tv,Ti}
    m::Int                  # Number of rows
    n::Int                  # Number of columns
    colptr::Vector{Ti}      # Column i is in colptr[i]:(colptr[i+1]-1)
    rowval::Vector{Ti}      # Row values of nonzeros
    nzval::Vector{Tv}       # Nonzero values

    function ImmutableSparseMatrixCSC(m::Integer, n::Integer, colptr::Vector{Ti}, rowval::Vector{Ti}, nzval::Vector{Tv})
        m < 0 && throw(ArgumentError("number of rows (m) must be ≥ 0, got $m"))
        n < 0 && throw(ArgumentError("number of columns (n) must be ≥ 0, got $n"))
        new(Int(m), Int(n), colptr, rowval, nzval)
    end
end

@noinline fetchval(A::ImmutableSparseMatrixCSC, i) = A.nzval[i]

# ...

B = ImmutableSparseMatrixCSC{Float64,Int}(A.m, A.n, A.colptr, A.rowval, A.nzval)

# ...

@time sumall1(B, 1000)
@time sumall2(B, 1000)

Result:

           sumall1    sumall2
type       420.90 ms  388.68 ms
immutable  419.90 ms  390.37 ms

The difference between the type and the immutable version is within measurement error. Interestingly, the relative difference between sumall1 and sumall2 is smaller than what @timholy had.

@KristofferC
Member Author

KristofferC commented May 17, 2016

Isn't the whole point that the function should get inlined so the compiler can see the load in the loop and then hoist it? By hiding the load behind a @noinline this will not happen, and the same code is probably generated for both. What happens if you inline it instead?

@martinholters
Member

Replacing @noinline with @inline, I get about 78 ms for all four cases. Same if I drop the annotation completely.

@KristofferC
Member Author

KristofferC commented May 17, 2016

Since there are no stores in the function, the load can be manually hoisted no matter what aliasing information the compiler has. This is, in my view, a better example:

@inline fetchval(A::SparseMatrixCSC, i) = A.nzval[i]
@inline fetchval(A::ImmutableSparseMatrixCSC, i) = A.nzval[i]
@inline setval(A::SparseMatrixCSC, i) = A.rowval[i] = i
@inline setval(A::ImmutableSparseMatrixCSC, i) = A.rowval[i] = i


function sumall1(A, n)
    s = 0.0
    for k = 1:n
        @inbounds for i = 1:length(A.nzval)
            s += fetchval(A, i)
            setval(A, i)
        end
    end
    s
end
A = sprand(10^4, 10^4, 1e-3);
B = ImmutableSparseMatrixCSC{Float64,Int}(A.m, A.n, A.colptr, A.rowval, A.nzval)


julia> @time sumall1(A, 1000)
  0.110465 seconds (5 allocations: 176 bytes)
5.005743903297553e7

julia> @time sumall1(B, 1000)
  0.071842 seconds (5 allocations: 176 bytes)
5.005743903297553e7

Looking at the @code_llvm for the loop, for the mutable type we have:

if7:                                              ; preds = %if7.preheader, %if7
  %s.110 = phi double [ %28, %if7 ], [ %s.012, %if7.preheader ]
  %"#temp#1.09" = phi i64 [ %21, %if7 ], [ 1, %if7.preheader ]
  %21 = add i64 %"#temp#1.09", 1
  %22 = load %jl_value_t*, %jl_value_t** %11, align 8
  store %jl_value_t* %22, %jl_value_t** %4, align 8
  %23 = add i64 %"#temp#1.09", -1
  %24 = bitcast %jl_value_t* %22 to double**
  %25 = load double*, double** %24, align 8
  %26 = getelementptr double, double* %25, i64 %23
  %27 = load double, double* %26, align 8
  %28 = fadd double %s.110, %27
  %29 = load %jl_value_t*, %jl_value_t** %12, align 8
  store %jl_value_t* %29, %jl_value_t** %5, align 8
  %30 = bitcast %jl_value_t* %29 to i64**
  %31 = load i64*, i64** %30, align 8
  %32 = getelementptr i64, i64* %31, i64 %23
  store i64 %"#temp#1.09", i64* %32, align 8
  %33 = icmp eq i64 %"#temp#1.09", %19
  br i1 %33, label %L.loopexit.loopexit, label %if7
}

while for the immutable:

if7:                                              ; preds = %if7.lr.ph, %if7
  %s.110 = phi double [ %s.012, %if7.lr.ph ], [ %34, %if7 ]
  %"#temp#1.09" = phi i64 [ 1, %if7.lr.ph ], [ %30, %if7 ]
  %30 = add i64 %"#temp#1.09", 1
  store %jl_value_t* %24, %jl_value_t** %4, align 8
  %31 = add i64 %"#temp#1.09", -1
  %32 = getelementptr double, double* %26, i64 %31
  %33 = load double, double* %32, align 8
  %34 = fadd double %s.110, %33
  store %jl_value_t* %27, %jl_value_t** %5, align 8
  %35 = getelementptr i64, i64* %29, i64 %31
  store i64 %"#temp#1.09", i64* %35, align 8
  %36 = icmp eq i64 %"#temp#1.09", %16
  br i1 %36, label %L.loopexit, label %if7
}

For the mutable type we are doing extra loads (two of them, one for each field).

@KristofferC
Member Author

KristofferC commented May 17, 2016

It gets even worse when we have two matrices at the same time in the loop.

function sumall1(A, B, n)
    s = 0.0
    for k = 1:n
        @inbounds for i = 1:length(A.nzval)
            s += fetchval(A, i)
            setval(B, i)
        end
    end
    s
end

A = sprand(10^4, 10^4, 1e-3);
A2 = sprand(10^4, 10^4, 1e-3);
B = ImmutableSparseMatrixCSC{Float64,Int}(A.m, A.n, A.colptr, A.rowval, A.nzval)
B2 = ImmutableSparseMatrixCSC{Float64,Int}(A2.m, A2.n, A2.colptr, A2.rowval, A2.nzval)

sumall1(A, A2, 1000)
sumall1(B, B2, 1000)
julia> @time sumall1(A, A2, 1000)
  0.197629 seconds (5 allocations: 176 bytes)
5.020867352413504e7

julia> @time sumall1(B, B2, 1000)
  0.071756 seconds (5 allocations: 176 bytes)
5.020867352413504e7

Here the mutable version has 5 loads (1 + 2 matrices × 2 fields, I guess) compared to 1 for the immutable one.
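For comparison, the extra loads in the mutable case can be removed by the manual hoisting discussed earlier, at the cost of exactly the verbosity this PR is trying to make unnecessary (a sketch against the same two-matrix setup as above):

```julia
function sumall1_hoisted(A, B, n)
    s = 0.0
    Anzval = A.nzval      # hoist the field loads once, outside the loops...
    Browval = B.rowval
    for k = 1:n
        @inbounds for i = 1:length(Anzval)
            s += Anzval[i]       # ...so the loop body only touches locals
            Browval[i] = i
        end
    end
    s
end
```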

@martinholters
Member

Yes, there is a clear benefit in having it immutable (as was to be expected), and it will probably allow quite some code clean-up without the performance penalty that would arise for a mutable type. But then there is the matrix resizing, where things get worse. I don't see a way out of this.

Unless there were an AbstractSparseCSC type where the interface is defined as exactly the current set of fields, and we change all the current methods to refer to that.

A drastic measure, but it had crossed my mind, too. However, all the hopefully upcoming code clean-up would then probably make operations on the hypothetical MutableSparseMatrixCSC slower. Once that outweighs the cost of constructing an ImmutableSparseMatrixCSC instance, nothing is gained.

@KristofferC
Member Author

We have two compiler inefficiencies here pulling in different directions. One is that the immutable SparseMatrixCSC type currently can't be stack allocated. This leads to a performance penalty when creating a new matrix from an old matrix's fields. The other is that the compiler seems to think that arrays can alias Julia objects, which means it reloads them after a store if the matrix is not an immutable.

Assuming both of these are fixed we probably still want it to be an immutable since we don't allow other matrices to change size. The fact that it also allows us to start cleaning up some of the sparse code right now tips the scale in my opinion. But I never resize my matrices so I'm biased :P

@martinholters
Member

Assuming both of these are fixed we probably still want it to be an immutable since we don't allow other matrices to change size. The fact that it also allows us to start cleaning up some of the sparse code right now tips the scale in my opinion. But I never resize my matrices so I'm biased :P

Same opinion, same bias here. @timholy seems to be similar-minded, @tkelman is biased in the opposite direction. Anyone else care to weigh in?

@tkelman
Contributor

tkelman commented May 18, 2016

we don't allow other matrices to change size

We do allow vectors to change size, and in-place appending of new columns to a dense matrix or analogous generalizations to higher dimensions should be possible, it just hasn't been implemented in array.c yet.

Other ways of obtaining the same aliasing compiler optimization without sacrificing the mutability of the scalar fields would be #9448 / #11430, but the discussion there recommends using a Ref for this. It's worth testing whether Ref is a good option to preserve this flexibility.

Without Ref (or if it does not help performance) there are other workarounds for changing size, either via unsafe_store! (dangerous) or wrapping every needed computational method to create an immutable SparseMatrixCSC out of varying fields only when delegating to existing methods (verbose/inconvenient). Not pleasant, but hopefully we'll work on stack allocation sometime during the 0.6 timeframe as it's increasingly responsible for abstraction penalties in many places.
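For reference, the Ref-based workaround mentioned above could look roughly like this sketch (hedged: `RefCSC` and `addcol!` are invented names, and this is not what the PR implements) — the struct stays immutable, but the scalar dimensions become one-element mutable cells:

```julia
# Hypothetical variant keeping the dimensions mutable inside an immutable.
immutable RefCSC{Tv,Ti<:Integer}
    m::Base.RefValue{Int}   # mutable cell holding the row count
    n::Base.RefValue{Int}   # mutable cell holding the column count
    colptr::Vector{Ti}
    rowval::Vector{Ti}
    nzval::Vector{Tv}
end

Base.size(A::RefCSC) = (A.m[], A.n[])

# In-place appending of an empty column becomes possible again, at the cost
# of an extra indirection on every size query.
function addcol!(A::RefCSC)
    push!(A.colptr, A.colptr[end])
    A.n[] += 1
    return A
end
```

Whether the extra indirection on `size` shows up in hot loops is exactly what would need benchmarking.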

@timholy
Copy link
Member

timholy commented May 18, 2016

it just hasn't been implemented in array.c yet.

My memories of old mailing-list posts suggest that the plan was to never implement it, on the grounds that it makes automatic removal of bounds checks much harder. On julia-0.5 you should be able to grow a matrix by columns (haven't yet tested) by maintaining the data in a Vector and then calling ReshapedArray on the result. But note that's creating a new wrapper each time, which is along the lines of what is being proposed here.

I have to say, I really don't understand why @tkelman can't just split out the current mutable implementation into a package. Heck, I suspect that all the sparse matrix implementations will be in packages soon, so this does not make anything a second-class citizen.

@tkelman
Copy link
Contributor

tkelman commented May 18, 2016

Yes, copy-pasting all the existing code is also an option, though even more labor intensive and undesirable from a maintainability perspective than wrapping and delegation. The fact that copy-pasting that much code is even being suggested means the functionality of being able to modify the scalar fields is valuable enough to be worth keeping (doesn't have to be in base though, sure). Abandoning all code reuse to achieve it seems drastic.

@martinholters
Copy link
Member

My memories of old mailing-list posts suggest that the plan was to never implement it, on the grounds that it makes automatic removal of bounds checks much harder

In support of that, I've found https://groups.google.com/forum/#!msg/julia-users/B4OUYPFM5L8/VKV_LyQJWT8J. However, if the plan has changed and we will see resizable arrays anytime soon, having sparse matrices that are not resizable would also be awkward.

I have to say, I really don't understand why @tkelman can't just split out the current mutable implementation into a package. Heck, I suspect that all the sparse matrix implementations will be in packages soon, so this does not make anything a second-class citizen.

I really wouldn't like to see two competing sparse matrix packages. Once forked, features are likely to be implemented in only one of them. Having users choose a sparse matrix package depending on their specific needs - or worse, having them convert between different representations during computations to be able to call into the right package - does not sound exactly attractive.

I have no experience with Refs. Would that be a way to have our cake and eat it, too?

@KristofferC
Copy link
Member Author

The fact that copy-pasting that much code is even being suggested means the functionality of being able to modify the scalar fields is valuable enough to be worth keeping

It is only being suggested because you want to keep it though... You can't make an argument for something and then, when someone suggests a solution, go "See! People do want that functionality, we need it in Base!".

Regarding Ref: @martinholters, AFAIU Ref is (at least right now) basically syntactic sugar for an array of length one, so you can modify the value even though it is inside an immutable. I started adding the Ref stuff and got most of sparsematrix.jl done, but there is more in sparsevector.jl, and then SparseVector should get the same treatment. Very boring work.
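As a quick illustration of that "length-one array" semantics (a toy snippet, not from the branch):

```julia
r = Ref(3)   # a mutable cell holding one Int
r[] = 5      # the cell can be updated even if r is a field of an immutable
r[]          # dereference: reads back 5
```

So wrapping `m` and `n` in `Ref`s would restore mutability of the dimensions, at the price of an indirection and of touching every place that reads those fields.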

If we ever decide to support changing dimensions of normal matrices there is no problem adding Ref but I don't see a point now. Directly changing the fields is not supported and anyone who uses it can expect their code to break. Until there is an API like resize! for matrices nothing is being broken by this. The fact that resizing was possible can basically be seen as a bug that this PR fixes.

@KristofferC
Copy link
Member Author

Jebus, what's the point in passive-aggressively adding a thumbs-down emote? Write an answer or don't, but don't go around thumbing down discussion posts that directly involve you when they are not of the survey type.

@tkelman
Copy link
Contributor

tkelman commented May 18, 2016

Jebus, what's the point in passive aggressively adding a thumb down emote?

Community standards please. The thumbs down is because you're misrepresenting what I'm saying, and I disagree with "The fact that resizing was possible can basically be seen as a bug."

People do want that functionality, we need it in Base!

Come on. I explicitly said "doesn't have to be in base though, sure" - just that abandoning code reuse is generally considered unfortunate. And I never claimed to speak for anyone but myself, though packages like Convex.jl could be written to leverage some of these CSC tricks if they wanted to improve performance. Same with any optimization solver that wanted to tie in more closely with the modeling layer to support problem modification. Might also be useful for mesh refinement type algorithms.

I started adding the Ref stuff

Do you have a branch? I can take a look at finishing it up.

Directly changing fields is considered poor API design when generality over multiple types is desired, but there's plenty of code out there that modifies fields of mutable types in a non-generic way and that isn't automatically a bug. We break code when there's good reason to, and this is probably enough of a speedup to be worth it, but in-place modification of the scalar fields is valuable functionality that we're losing in exchange. I've basically been fine with merging this since I got unsafe_store! to work, since the speedup for the immutable code will be worth it and I'll be able to drop the dangerous hack once stack allocation happens.

@ViralBShah
Copy link
Member

My suggestion is that we should merge this for now, since changing dimensions of a matrix on the fly is not really a generally needed and supported use case. One can always create a new SparseMatrixCSC and for cases that is not fast enough, create a custom type (by perhaps copying the code).

I think by and large this is a good change.

@ViralBShah ViralBShah merged commit fa41010 into JuliaLang:master May 18, 2016
@KristofferC
Copy link
Member Author

I only have the branch locally at my office but I will push it tomorrow.

@KristofferC KristofferC deleted the kc/sparse_mat_immut branch May 18, 2016 17:07
@tkelman
Copy link
Contributor

tkelman commented May 18, 2016

Thanks. Ref may end up having as much or more overhead, and it would break code that reads from the fields directly instead of using size, but I would like to try it out. If you've got code lying around as a head start that would be useful.

@KristofferC
Copy link
Member Author

KristofferC commented May 19, 2016

https://github.com/KristofferC/julia/tree/kc/refwip

I accidentally put this on top of two other commits, but maybe you can cherry-pick it cleanly.

Sacha0 added a commit to Sacha0/julia that referenced this pull request Jun 14, 2016
…n anticipation of introducing related `permute[!]` methods and taking advantage of JuliaLang#16371 for better structuring. Also adds several tests for these methods.
Sacha0 added a commit to Sacha0/julia that referenced this pull request Jun 20, 2016
…n anticipation of introducing related `permute[!]` methods and taking advantage of JuliaLang#16371 for better structuring. Also adds several tests for these methods.
Labels
linear algebra Linear algebra sparse Sparse arrays