Sparse versions of repeat for matrices and vectors #532

mjacobse · 2024-04-08T14:49:32Z

Calling repeat for sparse matrices or vectors currently calls the Base function, which ends up building the result rather inefficiently with indexing brackets.

This adds sparse versions that exploit the fact that the inputs are in CSC format to provide more efficient implementations. To keep it simple, it is limited to the straightforward cases of outer repetition along the first two dimensions. Row-wise repetition is intentionally kept close to the implementation for vcat, also using its helper function stuffcol! (with slight modifications). Column-wise repetition is kept simple by deferring to Base.repeat on the colptr, rowval, and nzval components of the CSC format.
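As a rough illustration of the column-wise case, the sketch below builds the repeated matrix directly from the CSC arrays. It is not the code from this PR: `repeat_cols_sketch` is a hypothetical name, and the explicit shifting of `colptr` is an assumption about what a valid construction needs, since each repeated copy has to point past the stored entries of the previous one.

```julia
using SparseArrays

# Hypothetical sketch (not the PR's implementation): repeat the columns of A
# n times by working on the three CSC arrays directly.
function repeat_cols_sketch(A::SparseMatrixCSC, n::Integer)
    colptr = SparseArrays.getcolptr(A)
    nnzA = nnz(A)
    ncolsA = size(A, 2)

    # Each copy reuses the original column pointers, shifted by the number
    # of stored entries in all previous copies.
    newcolptr = Vector{eltype(colptr)}(undef, n * ncolsA + 1)
    for k in 0:(n - 1), j in 1:ncolsA
        newcolptr[k * ncolsA + j] = colptr[j] + k * nnzA
    end
    newcolptr[end] = n * nnzA + 1

    # Row indices and values are simply repeated; only the first nnzA
    # entries are used in case the buffers hold excess storage.
    newrowval = repeat(rowvals(A)[1:nnzA], n)
    newnzval = repeat(nonzeros(A)[1:nnzA], n)

    return SparseMatrixCSC(size(A, 1), n * ncolsA, newcolptr, newrowval, newnzval)
end

A = sparse([1, 3, 2], [1, 1, 3], [1.0, 2.0, 3.0], 3, 3)
B = repeat_cols_sketch(A, 3)   # 3×9 sparse matrix, same entries as hcat(A, A, A)
```

The point of the sketch is that no element-wise indexing into the result is needed: the cost is proportional to the number of stored entries and columns rather than to the dense size.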

The col_length argument of stuffcol! was confusing: it expected not the number of nonzero elements in the column, but one less than that. This moves the offset by one into the for-loop that needs it, which clarifies the meaning and also simplifies call sites.

Instead of taking the whole input matrix, the helper functions now take only the actually required parts (row indices and nonzero values). That allows using them in more contexts.

codecov · 2024-04-08T15:05:42Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.07%. Comparing base (33fbc75) to head (6db2712).
Report is 4 commits behind head on main.


@@            Coverage Diff             @@
##             main     #532      +/-   ##
==========================================
+ Coverage   76.42%   84.07%   +7.64%     
==========================================
  Files          12       12              
  Lines        8969     9068      +99     
==========================================
+ Hits         6855     7624     +769     
+ Misses       2114     1444     -670


SobhanMP · 2024-04-08T19:44:51Z

was vcat using the wrong length?

mjacobse · 2024-04-08T20:11:36Z

> was vcat using the wrong length?

Are you referring to col_length and the change in 60eff16? No, in effect it was used correctly. Basically the helper stuffcol! iterates from 0:(n-1) where n is the number of nonzeros in the column. Before 60eff16, it expected you to pass in n-1 for the argument called col_length and then iterated 0:col_length. After 60eff16 it expects you to pass in n for the argument called col_length and then iterates 0:(col_length-1). So the behavior does not change, but to my mind it is much clearer and consistent with the name of the argument. And as I intended to reuse the function in the new repeat, I thought it would be better to make this clarifying change instead of reproducing the somewhat confusing usage.
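The convention described above can be sketched as follows. `copycol_sketch!` is a hypothetical, simplified stand-in for stuffcol! and does not reproduce its actual signature; `repeat_rows_sketch` is likewise only an assumed illustration of how such a helper might be reused for row-wise repetition. The helper takes the number of stored entries `n` as `col_length`, iterates `0:(col_length - 1)`, and returns the next insert position.

```julia
using SparseArrays

# Hypothetical stand-in for a column-copy helper: copies col_length stored
# entries and shifts their row indices by row_offset.
function copycol_sketch!(dst_rows, dst_vals, dst_ptr, src_rows, src_vals,
                         src_ptr, col_length, row_offset)
    for k in 0:(col_length - 1)   # col_length is the number of stored entries
        dst_rows[dst_ptr + k] = src_rows[src_ptr + k] + row_offset
        dst_vals[dst_ptr + k] = src_vals[src_ptr + k]
    end
    return dst_ptr + col_length   # next insert position
end

# Hypothetical row-wise repetition built on the helper above.
function repeat_rows_sketch(A::SparseMatrixCSC{Tv,Ti}, m::Integer) where {Tv,Ti}
    nrows, ncols = size(A)
    colptrA = SparseArrays.getcolptr(A)
    colptr = Vector{Ti}(undef, ncols + 1)
    rowval = Vector{Ti}(undef, m * nnz(A))
    nzval = Vector{Tv}(undef, m * nnz(A))
    ptr = 1
    for c in 1:ncols
        colptr[c] = ptr
        col_length = colptrA[c + 1] - colptrA[c]   # n, not n - 1
        for i in 0:(m - 1)
            ptr = copycol_sketch!(rowval, nzval, ptr, rowvals(A), nonzeros(A),
                                  colptrA[c], col_length, i * nrows)
        end
    end
    colptr[ncols + 1] = ptr
    return SparseMatrixCSC(m * nrows, ncols, colptr, rowval, nzval)
end

A = sparse([1, 3, 2], [1, 1, 3], [1.0, 2.0, 3.0], 3, 3)
B = repeat_rows_sketch(A, 2)   # same entries as vcat(A, A)
```

Passing `n` rather than `n - 1` means an empty column is simply `col_length == 0`, and the loop range `0:-1` is empty without any special handling.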

ViralBShah · 2024-04-20T01:25:28Z

@SobhanMP Can you add your review to this PR?

SobhanMP · 2024-04-20T20:09:21Z

src/sparsematrix.jl

@@ -3908,27 +3908,24 @@ function vcat(X::AbstractSparseMatrixCSC...)
        ptr_res = colptr[c]
        for i = 1 : num
            colptrXi = getcolptr(X[i])
-            col_length = (colptrXi[c + 1] - 1) - colptrXi[c]
+            rowvalXi = rowvals(X[i])


I think at some point we agreed not to create temporary variables for this. Use rowvals(X[i]) directly, as it gets optimized away by the compiler.

mjacobse · 2024-04-21

Addressed by b961a7f. I kept the getcolptr stuff in vcat as it was before, to avoid touching lines unnecessarily. Let me know if you disagree and would like it changed by this PR too.

SobhanMP · 2024-04-21

Since it's just style, let's keep the pull request focused on just repeat.

mjacobse · 2024-05-12T12:20:29Z

Thanks for the review! Is there anything left that I can do to help move this forward?

ViralBShah · 2024-05-20T21:32:34Z

@SobhanMP Good to merge?

SobhanMP · 2024-05-21T02:35:17Z

yes, sorry I was behind a deadline 😅 @mjacobse, thanks for the code.

mjacobse added 4 commits April 8, 2024 15:55

Simplify arguments of helper functions

2ae8981

Instead of taking the whole input matrix, only take the actually required parts (row indices and nonzero values). That allows using it in more contexts.

Add efficient repeat for sparse matrices

fdda5e5

Add efficient repeat for sparse vectors

deda6e8

SobhanMP reviewed Apr 20, 2024


mjacobse added 2 commits April 21, 2024 11:44

Avoid using temporary vars for sparse data arrays

b961a7f

Just use the accessor functions rowvals, nonzeros, getcolptr whenever needed instead. Especially in the sparse vector case avoid using findnz, which creates a copy of the data. Use nonzeroinds and nonzeros instead.
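The difference between the two approaches can be seen in a small sketch. `SparseArrays.nonzeroinds` is an unexported accessor, and `repeat_vec_sketch` is a hypothetical name used here only to illustrate how a sparse vector might be repeated from its index and value arrays; it is not the code from this commit.

```julia
using SparseArrays

v = sparsevec([2, 5], [1.5, -2.0], 6)

# findnz returns freshly allocated copies of the indices and values.
idx_copy, val_copy = findnz(v)
val_copy[1] = 100.0            # modifies only the copy; v is unchanged

# The accessors return the arrays stored inside v, without copying.
inds = SparseArrays.nonzeroinds(v)
vals = nonzeros(v)

# Hypothetical sketch: repeat a sparse vector m times using the accessors.
function repeat_vec_sketch(v::SparseVector, m::Integer)
    n = length(v)
    inds = SparseArrays.nonzeroinds(v)
    newinds = Vector{eltype(inds)}(undef, m * length(inds))
    for k in 0:(m - 1), (j, ind) in enumerate(inds)
        # the k-th copy has its indices shifted by k full vector lengths
        newinds[k * length(inds) + j] = ind + k * n
    end
    return SparseVector(m * n, newinds, repeat(nonzeros(v), m))
end

w = repeat_vec_sketch(v, 3)    # length-18 sparse vector with 6 stored entries
```

Because the accessors alias the internal storage, they should be treated as read-only in code like this unless mutation of `v` is intended.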

Return next insert position from stuffcol helper

6db2712

Simplifies updating the insert position at the call site.

SobhanMP closed this May 10, 2024

SobhanMP reopened this May 10, 2024

SobhanMP merged commit 9d4397f into JuliaSparse:main May 21, 2024
9 of 18 checks passed
