MIT-licensed SparseMatrixCSC transposition methods #14631
Could you provide some benchmarks just to ensure no performance regressions? Cc @jrevels
This is great to have. Looking forward!
Gladly! Something like...
using Benchmarks: @benchmark, SummaryStatistics
function prettytimes(bench)
    stats = SummaryStatistics(bench)
    timecenter = stats.elapsed_time_center
    timelower = get(stats.elapsed_time_lower)
    timeupper = get(stats.elapsed_time_upper)
    # based on Benchmarks.pretty_time_string
    timecenter < 1_000.0 ? (scalefactor = 1.0; units = "ns") :
    timecenter < 1_000_000.0 ? (scalefactor = 1_000.0; units = "μs") :
    timecenter < 1_000_000_000.0 ? (scalefactor = 1_000_000.0; units = "ms") :
    (scalefactor = 1_000_000_000.0; units = " s")
    @sprintf("%6.2f %s [%6.2f,%6.2f]", timecenter/scalefactor, units, timelower/scalefactor, timeupper/scalefactor)
end
smallN, smallerN = 600, 400;
smallsqrA = sprand(smallN, smallN, 0.01);
smallrectA = sprand(smallN, smallerN, 0.01);
smallsqrC = transpose(smallsqrA);
smallrectC = transpose(smallrectA);
largeN, largerN = 100000, 200000;
largesqrA = sprand(largeN, largeN, 0.001);
largerectA = sprand(largeN, largerN, 0.001);
largesqrC = transpose(largesqrA);
largerectC = transpose(largerectA);
wm, wt = 12, 25;
println(" $(lpad("method ", wm)) $(lpad("small square A", wt)), $(lpad("small rect A", wt)) | $(lpad("large square A", wt)), $(lpad("large rect A", wt)) ")
@printf("%s: %s , %s | %s, %s\n", lpad("transpose!", wm),
prettytimes(@benchmark transpose!(smallsqrC, smallsqrA)),
prettytimes(@benchmark transpose!(smallrectC, smallrectA)),
prettytimes(@benchmark transpose!(largesqrC, largesqrA)),
prettytimes(@benchmark transpose!(largerectC, largerectA)) )
@printf("%s: %s , %s | %s, %s\n", lpad("ctranspose!", wm),
prettytimes(@benchmark ctranspose!(smallsqrC, smallsqrA)),
prettytimes(@benchmark ctranspose!(smallrectC, smallrectA)),
prettytimes(@benchmark ctranspose!(largesqrC, largesqrA)),
prettytimes(@benchmark ctranspose!(largerectC, largerectA)) )
@printf("%s: %s , %s | %s, %s\n", lpad("transpose", wm),
prettytimes(@benchmark transpose(smallsqrA)),
prettytimes(@benchmark transpose(smallrectA)),
prettytimes(@benchmark transpose(largesqrA)),
prettytimes(@benchmark transpose(largerectA)) )
@printf("%s: %s , %s | %s, %s\n", lpad("ctranspose", wm),
prettytimes(@benchmark ctranspose(smallsqrA)),
prettytimes(@benchmark ctranspose(smallrectA)),
prettytimes(@benchmark ctranspose(largesqrA)),
prettytimes(@benchmark ctranspose(largerectA)) )

On master:

      method             small square A,              small rect A |            large square A,              large rect A
  transpose!:  24.18 μs [ 21.62, 26.75] ,  20.14 μs [ 19.75, 20.52] | 472.69 ms [466.36,479.02],   1.02  s [  1.00,  1.04]
 ctranspose!:  24.26 μs [ 23.80, 24.71] ,  20.81 μs [ 20.43, 21.19] | 479.05 ms [469.00,489.11],   1.02  s [  0.99,  1.04]
   transpose:  42.98 μs [ 41.80, 44.15] ,  27.63 μs [ 26.82, 28.44] | 518.65 ms [501.76,535.54],   1.19  s [  1.17,  1.20]
  ctranspose:  42.76 μs [ 41.46, 44.07] ,  28.11 μs [ 27.27, 28.95] | 529.80 ms [505.16,554.45],   1.19  s [  1.17,  1.21]

On this PR's branch:

      method             small square A,              small rect A |            large square A,              large rect A
  transpose!:  22.42 μs [ 21.84, 23.00] ,  15.99 μs [ 15.63, 16.34] | 460.70 ms [450.73,470.68],   1.01  s [  0.98,  1.03]
 ctranspose!:  22.43 μs [ 21.88, 22.98] ,  15.93 μs [ 15.61, 16.25] | 468.47 ms [462.66,474.28],   1.04  s [  1.00,  1.08]
   transpose:  41.57 μs [ 40.51, 42.63] ,  28.04 μs [ 27.33, 28.75] | 500.71 ms [492.20,509.22],   1.18  s [  1.17,  1.20]
  ctranspose:  40.19 μs [ 39.04, 41.33] ,  28.99 μs [ 28.29, 29.69] | 509.75 ms [500.48,519.03],   1.19  s [  1.16,  1.22]

Something like this? Thanks!
…se/sparse/csparse.jl ([c|f]transpose[!]) with MIT-licensed versions. See JuliaLang#13001.
Thanks for the line-length feedback @tkelman! Sorry for somehow nixing the related comments. PR updated accordingly.
@Sacha0 The signature of f is f{Tv,Ti}(i::Ti, j::Ti, x::Tv, other::Any) → Bool. The purpose of the […] Another use of the […] For details on the […]
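To make the predicate signature quoted above concrete, here is a minimal sketch of an in-place filter over the stored entries of a SparseMatrixCSC. It is not the fkeep! implementation from base/sparse/csparse.jl: the name sketch_filter! is hypothetical, the code is written for Julia 1.x (where SparseArrays is a standard library), and it assumes that other is simply an opaque extra argument forwarded to f.

```julia
using SparseArrays

# Hypothetical helper, not the Base implementation: keep only the stored
# entries of A for which f(i, j, x, other) returns true, compacting in place.
# Assumption: `other` is an opaque value that is passed through to `f`.
function sketch_filter!(A::SparseMatrixCSC, f, other)
    colptr, rowval, nzval = A.colptr, rowvals(A), nonzeros(A)
    n = size(A, 2)
    w = 1                  # next write position in rowval/nzval
    start = colptr[1]      # first stored entry of the current column
    for j in 1:n
        stop = colptr[j + 1] - 1   # read before colptr[j] is overwritten
        colptr[j] = w
        for k in start:stop
            i, x = rowval[k], nzval[k]
            if f(i, j, x, other)
                rowval[w] = i
                nzval[w] = x
                w += 1
            end
        end
        start = stop + 1
    end
    colptr[n + 1] = w
    resize!(rowval, w - 1)
    resize!(nzval, w - 1)
    return A
end

# Usage: a tril!-like filter, with the diagonal offset passed through `other`.
A = sparse([1.0 2.0 0.0; 3.0 4.0 5.0; 0.0 6.0 7.0])
L = sketch_filter!(copy(A), (i, j, x, k) -> j - i <= k, 0)
```

Under this reading, a lower-triangle filter and a drop-small-entries filter would differ only in the predicate and in what is passed as other (a diagonal offset in one case, a tolerance in the other).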
Thanks @dmbates! I will have a go at those methods ( |
The |
MIT-licensed SparseMatrixCSC transposition methods
Thanks for the review / merge!
Would be nice to add some of these benchmarks to BaseBenchmarks.jl
…tril!, triu!, droptol!, and dropzeros[!] with MIT-licensed versions. See JuliaLang#13001 and JuliaLang#14631.
…tril!, triu!, droptol!, and dropzeros[!] with MIT-licensed versions. See JuliaLang#13001 and JuliaLang#14631. Also add a test for dropzeros!.
Followup to #13001. This pull request replaces the LGPL-licensed SparseMatrixCSC transposition methods in base/sparse/csparse.jl ([c|f]transpose[!]) with MIT-licensed versions.
Specifically, these methods implement the HALFPERM algorithm described in F. Gustavson, "Two fast algorithms for sparse matrices: multiplication and permuted transposition," ACM TOMS 4(3), 250-269 (1978). The algorithm runs in O(A.m, A.n, nnz(A)) time and requires no space beyond that passed in. If you know of a faster-in-practice algorithm, please let me know. The new methods' performance is nearly identical to the old methods' performance; the new methods might have a slight edge.
When editing base/sparse/csparse.jl to remove the existing methods, I folded all blocks to avoid peeking at the LGPL code. So someone should check that the removal was graceful, given that it was blind but for method signatures.
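For readers unfamiliar with this family of algorithms, the following is a minimal sketch of a counting-sort style CSC transposition in the spirit of HALFPERM. It is not the code in this pull request: the name sketch_transpose is hypothetical, the code targets Julia 1.x with the SparseArrays standard library, and for readability it allocates its output plus a length-(m+1) cursor vector, whereas the methods described above require no space beyond what is passed in.

```julia
using SparseArrays

# Hypothetical helper, not the PR's code: transpose a CSC matrix by counting
# the entries in each row, prefix-summing the counts into column pointers,
# and scattering each entry to its destination.
function sketch_transpose(A::SparseMatrixCSC{Tv,Ti}) where {Tv,Ti}
    m, n = size(A)
    colptr, rowval, nzval = A.colptr, rowvals(A), nonzeros(A)
    nz = Int(colptr[n + 1]) - 1

    # Pass 1: count the stored entries in each row of A (each column of C).
    Ccolptr = zeros(Ti, m + 1)
    Ccolptr[1] = 1
    for k in 1:nz
        Ccolptr[rowval[k] + 1] += 1
    end
    # Pass 2: a running sum turns the counts into column start positions.
    for i in 1:m
        Ccolptr[i + 1] += Ccolptr[i]
    end

    # Pass 3: scatter. `next[i]` is the next free slot in column i of C.
    # This cursor vector is extra workspace that the PR's methods avoid.
    next = copy(Ccolptr)
    Crowval = Vector{Ti}(undef, nz)
    Cnzval = Vector{Tv}(undef, nz)
    for j in 1:n
        for k in colptr[j]:(colptr[j + 1] - 1)
            i = rowval[k]
            p = next[i]
            Crowval[p] = j
            Cnzval[p] = nzval[k]
            next[i] = p + 1
        end
    end
    return SparseMatrixCSC(n, m, Ccolptr, Crowval, Cnzval)
end

# Usage: a 3x4 matrix with four stored entries.
A = sparse([1, 3, 2, 3], [1, 1, 2, 4], [1.0, 2.0, 3.0, 4.0], 3, 4)
C = sketch_transpose(A)
```

Each pass touches every row, column, or stored entry a constant number of times, which is where a cost proportional to A.m, A.n, and nnz(A) comes from. Because columns of A are visited in increasing order, the row indices within each column of C come out sorted.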
@dmbates (or anyone interested) From what I see when I grep base/sparse/csparse.jl for long-form method signatures, I'm guessing that triu! and tril! are short children of fkeep!; is that so? Are there short-form children of fkeep! that I'm not seeing? (I've avoided grepping for short-form methods in case doing so reveals some implementation detail that I should not see.) Additionally, I'm guessing that f takes an element (and its position?) and determines whether that element should be kept; what is the implicit signature of f? What is the purpose and form of other? I might pick off the preceding set of methods next.
Concerning sparse, I see multiple possible space/time complexity tradeoffs. How much storage does the existing sparse method allocate, and does it achieve O(A.m, A.n, nnz(A)) time or does it trade time to save space? Thanks, and best!