MIT-licensed SparseMatrixCSC transposition methods #14631

Sacha0 · 2016-01-11T00:39:22Z

Followup to #13001. This pull request replaces the LGPL-licensed SparseMatrixCSC transposition methods in base/sparse/csparse.jl ([c|f]transpose[!]) with MIT-licensed versions.

Specifically, these methods implement the HALFPERM algorithm described in F. Gustavson, "Two fast algorithms for sparse matrices: multiplication and permuted transposition," ACM TOMS 4(3), 250-269 (1978). The algorithm runs in O(A.m, A.n, nnz(A)) time and requires no space beyond that passed in. If you know of a faster-in-practice algorithm, please let me know. The new methods' performance is nearly identical to the old methods' performance; the new methods might have a slight edge.

When editing base/sparse/csparse.jl to remove the existing methods, I folded all blocks to avoid peaking at the LGPL code. So someone should check that the removal was graceful given it was blind but for method signatures.

@dmbates (or anyone interested) From what I see when I grep base/sparse/csparse.jl for long-form method signatures, I'm guessing that triu! and tril! are short children of fkeep!; is that so? Are there short-form children of fkeep! that I'm not seeing? (I've avoided grepping for short-form methods in case doing so reveals some implementation detail that I should not see.) Additionally, I'm guessing that f takes an element (and its position?) and determines whether that element should be kept; what is the implicit signature of f? What is the purpose and form of other? I might pick off the preceding set of methods next. Concerning sparse, I see multiple possible space/time complexity tradeoffs. How much storage does the existing sparse method allocate, and does it achieve O(A.m, A.n, nnz(A)) time or does it trade time to save space? Thanks, and best!

ViralBShah · 2016-01-11T03:36:08Z

Could you provide some benchmarks just to ensure no performance regressions?

Cc @jrevels

ViralBShah · 2016-01-11T03:36:40Z

This is great to have. Looking forward!

Sacha0 · 2016-01-11T05:02:55Z

Could you provide some benchmarks just to ensure no performance regressions?

Gladly! Something like...

using Benchmarks: @benchmark, SummaryStatistics

function prettytimes(bench)
    stats = SummaryStatistics(bench)
    timecenter = stats.elapsed_time_center
    timelower = get(stats.elapsed_time_lower)
    timeupper = get(stats.elapsed_time_upper)
    # based on Benchmarks.pretty_time_string
    timecenter < 1_000.0 ? (scalefactor = 1.0; units = "ns") :
        timecenter < 1_000_000.0 ? (scalefactor = 1_000.0; units = "μs") :
            timecenter < 1_000_000_000.0 ? (scalefactor = 1_000_000.0; units = "ms") :
                (scalefactor = 1_000_000_000.0; units = " s")
    @sprintf("%6.2f %s [%6.2f,%6.2f]", timecenter/scalefactor, units, timelower/scalefactor, timeupper/scalefactor)
end

smallN, smallerN = 600, 400;
smallsqrA = sprand(smallN, smallN, 0.01);
smallrectA = sprand(smallN, smallerN, 0.01);
smallsqrC = transpose(smallsqrA);
smallrectC = transpose(smallrectA);

largeN, largerN = 100000, 200000;
largesqrA = sprand(largeN, largeN, 0.001);
largerectA = sprand(largeN, largerN, 0.001);
largesqrC = transpose(largesqrA);
largerectC = transpose(largerectA);

wm, wt = 12, 25;
println(" $(lpad("method ", wm)) $(lpad("small square A", wt)), $(lpad("small rect A", wt)) | $(lpad("large square A", wt)), $(lpad("large rect A", wt)) ")
@printf("%s: %s , %s | %s, %s\n", lpad("transpose!", wm),
    prettytimes(@benchmark transpose!(smallsqrC, smallsqrA)),
    prettytimes(@benchmark transpose!(smallrectC, smallrectA)),
    prettytimes(@benchmark transpose!(largesqrC, largesqrA)),
    prettytimes(@benchmark transpose!(largerectC, largerectA)) )
@printf("%s: %s , %s | %s, %s\n", lpad("ctranspose!", wm),
    prettytimes(@benchmark ctranspose!(smallsqrC, smallsqrA)),
    prettytimes(@benchmark ctranspose!(smallrectC, smallrectA)),
    prettytimes(@benchmark ctranspose!(largesqrC, largesqrA)),
    prettytimes(@benchmark ctranspose!(largerectC, largerectA)) )
@printf("%s: %s , %s | %s, %s\n", lpad("transpose", wm),
    prettytimes(@benchmark transpose(smallsqrA)),
    prettytimes(@benchmark transpose(smallrectA)),
    prettytimes(@benchmark transpose(largesqrA)),
    prettytimes(@benchmark transpose(largerectA)) )
@printf("%s: %s , %s | %s, %s\n", lpad("ctranspose", wm),
    prettytimes(@benchmark ctranspose(smallsqrA)),
    prettytimes(@benchmark ctranspose(smallrectA)),
    prettytimes(@benchmark ctranspose(largesqrA)),
    prettytimes(@benchmark ctranspose(largerectA)) )

On master:

     method             small square A,              small rect A |            large square A,              large rect A
 transpose!:  24.18 μs [ 21.62, 26.75] ,  20.14 μs [ 19.75, 20.52] | 472.69 ms [466.36,479.02],   1.02  s [  1.00,  1.04]
ctranspose!:  24.26 μs [ 23.80, 24.71] ,  20.81 μs [ 20.43, 21.19] | 479.05 ms [469.00,489.11],   1.02  s [  0.99,  1.04]
  transpose:  42.98 μs [ 41.80, 44.15] ,  27.63 μs [ 26.82, 28.44] | 518.65 ms [501.76,535.54],   1.19  s [  1.17,  1.20]
 ctranspose:  42.76 μs [ 41.46, 44.07] ,  28.11 μs [ 27.27, 28.95] | 529.80 ms [505.16,554.45],   1.19  s [  1.17,  1.21]

On this PR's branch:

     method             small square A,              small rect A |            large square A,              large rect A
 transpose!:  22.42 μs [ 21.84, 23.00] ,  15.99 μs [ 15.63, 16.34] | 460.70 ms [450.73,470.68],   1.01  s [  0.98,  1.03]
ctranspose!:  22.43 μs [ 21.88, 22.98] ,  15.93 μs [ 15.61, 16.25] | 468.47 ms [462.66,474.28],   1.04  s [  1.00,  1.08]
  transpose:  41.57 μs [ 40.51, 42.63] ,  28.04 μs [ 27.33, 28.75] | 500.71 ms [492.20,509.22],   1.18  s [  1.17,  1.20]
 ctranspose:  40.19 μs [ 39.04, 41.33] ,  28.99 μs [ 28.29, 29.69] | 509.75 ms [500.48,519.03],   1.19  s [  1.16,  1.22]

? Thanks!

…se/sparse/csparse.jl ([c|f]transpose[!]) with MIT-licensed versions. See JuliaLang#13001.

Sacha0 · 2016-01-11T05:29:21Z

Thanks for the line-length feedback @tkelman! Sorry for somehow nixing the related comments. PR updated accordingly.

dmbates · 2016-01-12T17:35:55Z

@Sacha0 The signature of f in a call to fkeep! is

f{Tv,Ti}(i::Ti, j::Ti, x::Tv, other::Any) → Bool

The purpose of the other argument is to pass information from the call to fkeep! through to the call to f. As you suspect triu! and tril! are short children of fkeep!. In those cases the other argument is the diagonal below which or above which to zero the contents. It defaults to zero, indicating the main diagonal.

Another use of the other argument is in the droptol! function where it specifies the tolerance on the absolute value of the non-zeros below which they are dropped.

For details on the sparse function I think @ViralBShah is a better source than I.

Sacha0 · 2016-01-13T00:22:15Z

Thanks @dmbates! I will have a go at those methods (fkeep!/triu!/tril!/droptol!) next.

ViralBShah · 2016-01-13T19:29:39Z

The sparse algorithm should work in O(max(m,n,nnz(A))) time, IIRC. It is worth exploring different methods that may take more space and are significantly faster than what we have now.

MIT-licensed SparseMatrixCSC transposition methods

Sacha0 · 2016-01-13T23:51:38Z

Thanks for the review / merge!

KristofferC · 2016-01-14T07:31:38Z

Would be nice to add some of these benchmarks to BaseBenchmarks.jl

…tril!, triu!, droptol!, and dropzeros[!] with MIT-licensed versions. See JuliaLang#13001 and JuliaLang#14631.