Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SparseArrays: export nonzeroinds for sparse vectors and sparse column views #40

Open
chethega opened this issue Mar 13, 2019 · 9 comments

Comments

@chethega
Copy link

As far as I know, we currently export access to nonzero values to sparse vectors and sparse column views, via SparseArrays.nonzeros. However, the corresponding SparseArrays.nonzeroinds is not exported, which makes these two methods of SparseArrays.nonzeros quite useless.

I think we should just write docs and export it: Users of sparse vectors / sparse column views are likely to rely on SparseArrays.nonzeroinds already, and would already be affected by any reorganization.

@Sacha0
Copy link
Member

Sacha0 commented Mar 13, 2019

It might be worth simultaneously executing on the exported-API nomenclature shift from nonzeros to stored that various parties have been discussing for some time; i.e., rather than a new nonzeroinds export, export storedinds and simultaneously introduce and export storedvals. Best!

@ndinsmore
Copy link

If this is done I would advocate it should be in a slightly more descriptive manner. Such that there is also something for SparseMatrixCSC and potentially per JuliaLang/julia#31329 SparseMatrixCSR.

An interface that got directly at the column pointers would be great as well.

@ndinsmore
Copy link

@Sacha0 for matrices I think that we should also make sure that the methods are representative of the stored structures. So rowvals I think makes sense but nzrange is a little ambiguous.

For vectors what about indvals since that is more in line with the matrix notation.

@chethega
Copy link
Author

@Sacha0 So you are saying new functions storedinds and storedvals that are only defined for sparse vectors, sparse column views (and potentially transposes of sparse vectors / sparse column views)? And nonzeroinds gets retained for the time, until eventually nonzeros for sparse vectors and nonzeroinds get deprecated?

@KlausC
Copy link
Contributor

KlausC commented Mar 16, 2019

IMHO for SparseMatrixCSC we should stick to the exported interface nonzeros - rowvals - nzrange and adapt the interface for SparseVector accordingly.

That means: replace SparseArrays.nonzeroinds by rowvals and formally introduce nzrange(V, col) = 1:length(V) where V<:SparseVector.

That would allow to handle Union{SparseVector,SparseMatrixCSC} uniformly.
The unexported storedinds, storedvals, ... which were introduced to achieve this uniformity, would not be needed anymore.

@ndinsmore
Copy link

It seems to me that Julia has been trying to move towards semantic correctness. While it is semantically correct to ask for the rowvals(vec::SparseColumnView) , it is not correct to ask for the rowvals(vec::SparseVector) as it stands today where a vector is the analog of a "row matrix" or "row vector".

If you want an orientation agnostic method lets try to name it so.

@Sacha0
Copy link
Member

Sacha0 commented Mar 16, 2019

I can't find the relevant discussions immediately --- if someone has the bandwidth to do the necessary issue tracker spelunking, that would be great! --- so in brief I'll reiterate the motivations for moving towards storedinds / storedvals:

  1. Some time ago we moved away from eagerly stripping numerical zeros from the structure of SparseMatrixCSCs, and in conjunction began moving away from the nonzeros terminology / towards the now-more-correct stored.

  2. The SparseVector and SparseMatrixCSC data structures are sufficiently similar that it's possible to build an abstraction layer (abstracting away their field name differences, and otherwise hiding e.g. the absence of column pointers in SparseVector) that enables unification of implementation of a nice set of operations applicable to both SparseVectors and SparseMatrixCSCs. Unifying field names between these data structures insofar as possible facilitates and simplifies realization of that abstraction layer.

  3. Supporting SparseMatrixCSR at some point would be great. Minimizing the amount of implementation duplication between SparseVector, SparseMatrixCSC, and SparseMatrixCSR in so doing would be great. One way to facilitate that implementation deduplication is to extend the preceding abstraction layer to include the hypothetical SparseMatrixCSR, and unifying the field names of those three types insofar as possible would facilitate and simplify the realization of that abstraction layer.

    (More concretely, the data structure underlying CSC is fundamentally the same as that for CSR. So one can construct a type for that data structure, e.g. SparseMatrixCSX, with a parameter indicating whether the X is C or R, and make SparseMatrixCSC and SparseMatrixCSR corresponding type aliases. One could generalize the thing still further and implement SparseVector on top of it too.)

  4. Going a step further, it would be great to extend that abstraction layer to Adjoint, Transpose, and other wrapped types of sparse matrices/vectors, such that... you see where this is going :) ... generic implementations can be written for operations common to all of those types. And so it would be great if the nomenclature could make sense for all these things.

Hence storeinds and storedvals, and potentially pointers or something similar. IIRC, we didn't have enough deprecation cycles and developer bandwidth to make this happen prior to 1.0, but the plan was to use dot-overloading to introduce aliases for the present field names, deprecate the present field names, and then in a future release cycle swap the aliases for the field names. Best!

@KlausC
Copy link
Contributor

KlausC commented Mar 16, 2019

@Sacha0 this vision of a general, concise, and consistent sparse object has my full support. I proposed the (as I know now) legacy interface nonzeros - rowvals - nzrange because that are currently (v1.2) the only exported functions.

I also find it necessary to make a difference between the field names (which should not be used by the user at all), and the accessor methods which could be named storedvals - storedinds - indrange or similar. These may be consistently extended to SparseMatrixCSX, SparseVector, and more or less straight forward to SubArray{,Sparse...}. The last one is subject of a PR I am currently busy with.
I use indrange instead of pointer_range to keep the analogy with nzrange.

There should typically be no need to access the fields directly. All sparse or wrapped sparse types supporting this interface could make use of the standard column-wise (for CSR row-wise) processing loop:
(col, row, and the constant 2 should be replaced by generic names for CSR)

for col in axes(A, 2), j in indrange(A, col)
    row = storedinds(A, k)
    val = storedvals(A, k)
   ... do the job for (col, row, val) ...
end

@ndinsmore
Copy link

@Sacha0 I couldn't agree with this more:

(More concretely, the data structure underlying CSC is fundamentally the same as that for CSR. So one can construct a type for that data structure, e.g. SparseMatrixCSX, with a parameter indicating whether the X is C or R, and make SparseMatrixCSC and SparseMatrixCSR corresponding type aliases. One could generalize the thing still further and implement SparseVector on top of it too.)

To take it further, beyond using the same data structure for matrix and vectors (you are only saving 3 Int), it strikes me that without much difficulty you could also use the same data structure for uncompressed sparse where you leave room in a given"column" to insert more rows.

@KristofferC KristofferC transferred this issue from JuliaLang/julia Jan 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants