Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sparse versions of repeat for matrices and vectors #532

Merged
merged 6 commits into from
May 21, 2024

Conversation

mjacobse
Copy link
Contributor

@mjacobse mjacobse commented Apr 8, 2024

Calling repeat for sparse matrices or vectors currently calls the Base function, which ends up building the result rather inefficiently with indexing brackets.

This would add sparse versions that use the info that inputs are in CSC format to provide more efficient implementations. To keep it simple it is limited to the straightforward cases of outer repetition along the first two dimensions. Row-wise repetition is intentionally kept close to the implementation for vcat, also using its helper function stuffcol! (with slight modifications). Column-wise repetition is kept simple by deferring to Base.repeat on the colptr, rowval, and nzval components of the CSC format.

The col_length function argument was pretty confusing, as it expected
not the number of nonzero elements in the column, but rather one less
than that. This moves this offset by one to the actual for-loop that it
is for, which clarifies the meaning and also simplifies callsites.
Instead of taking the whole input matrix, only take the actuallly
required parts (row indices and nonzero values). That allows using it in
more contexts.
Copy link

codecov bot commented Apr 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.07%. Comparing base (33fbc75) to head (6db2712).
Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #532      +/-   ##
==========================================
+ Coverage   76.42%   84.07%   +7.64%     
==========================================
  Files          12       12              
  Lines        8969     9068      +99     
==========================================
+ Hits         6855     7624     +769     
+ Misses       2114     1444     -670     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@SobhanMP
Copy link
Member

SobhanMP commented Apr 8, 2024

was vcat using the wrong length?

@mjacobse
Copy link
Contributor Author

mjacobse commented Apr 8, 2024

was vcat using the wrong length?

Are you referring to col_length and the change in 60eff16? No, in effect it was used correctly. Basically the helper stuffcol! iterates from 0:(n-1) where n is the number of nonzeros in the column. Before 60eff16, it expected you to pass in n-1 for the argument called col_length and then iterated 0:col_length. After 60eff16 it expects you to pass in n for the argument called col_length and then iterates 0:(col_length-1). So the behavior does not change, but to my mind it is much clearer and consistent with the name of the argument. And as I intended to reuse the function in the new repeat, I thought it would be better to make this clarifying change instead of reproducing the somewhat confusing usage.

@ViralBShah
Copy link
Member

@SobhanMP Can you add your review to this PR?

@@ -3908,27 +3908,24 @@ function vcat(X::AbstractSparseMatrixCSC...)
ptr_res = colptr[c]
for i = 1 : num
colptrXi = getcolptr(X[i])
col_length = (colptrXi[c + 1] - 1) - colptrXi[c]
rowvalXi = rowvals(X[i])
Copy link
Member

@SobhanMP SobhanMP Apr 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think at some point we agreed not to create temporary variables for this. Use rowvals(X[i]) directly as it gets optimized away by the compiler

Copy link
Contributor Author

@mjacobse mjacobse Apr 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed by b961a7f. I kept the getcolptr stuff in vcat as it was before, to avoid touching lines unnecessarily. Let me know if you disagree and would like it changed by this PR too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since it's jus style, let's keep the pull request focused on just repeat

src/sparsematrix.jl Show resolved Hide resolved
src/sparsematrix.jl Show resolved Hide resolved
src/sparsematrix.jl Outdated Show resolved Hide resolved
src/sparsematrix.jl Outdated Show resolved Hide resolved
src/sparsevector.jl Outdated Show resolved Hide resolved
src/sparsevector.jl Outdated Show resolved Hide resolved
Just use the accessor functions rowvals, nonzeros, getcolptr whenever
needed instead. Especially in the sparse vector case avoid using findnz,
which creates a copy of the data. Use nonzeroinds and nonzeros instead.
Simplifies updating the insert position at the call site.
@SobhanMP SobhanMP closed this May 10, 2024
@SobhanMP SobhanMP reopened this May 10, 2024
@mjacobse
Copy link
Contributor Author

mjacobse commented May 12, 2024

Thanks for the review! Anything left that I can do to help moving this forward?

@ViralBShah
Copy link
Member

@SobhanMP Good to merge?

@SobhanMP SobhanMP merged commit 9d4397f into JuliaSparse:main May 21, 2024
9 of 18 checks passed
@SobhanMP
Copy link
Member

yes, sorry I was behind a deadline 😅 @mjacobse, thanks for the code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants