A keepzeros option for sparse(I, J, V) #12605

Closed
matthieugomez opened this issue Aug 13, 2015 · 16 comments
Labels
sparse Sparse arrays

Comments

@matthieugomez
Contributor

sparse currently treats zero elements in V as structural zeros:

I = [1, 2]
J = [1,2]
V = [0, 1]
M = sparse(I, J, V)
# 2x2 sparse matrix with 1 Int64 entries:
# [2, 2]  =  1

This means the sparsity structure of M depends on the values in V, and not only on the pairs (I, J). So there is no way to initialize a sparse matrix with the "correct" sparsity structure independently of its values V. This is problematic because modifying the structure of a sparse matrix after construction is inefficient.

Why can't sparse treat zero elements in V as stored (non-structural) zeros? I'm not sure how other programming languages handle this situation, so I'm looking for some feedback before making a pull request.

@mbauman added the sparse (Sparse arrays) label on Aug 13, 2015
@KristofferC
Member

Related issues: #9906 and #9928

What I do (for finite elements) is put a placeholder such as 1.0 in V and then call fill!(A.nzval, 0.0), which leaves the sparsity structure in place with 0.0 values.

+1 for this proposal though.
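
The workaround described in this comment can be sketched roughly as follows. This is only an illustration, assuming a Julia 1.x release with the SparseArrays standard library; the helper name zero_pattern is hypothetical and not part of any library, and nonzeros(A) is used in place of direct access to A.nzval.

```julia
using SparseArrays

# Hypothetical helper (not a library function): build a matrix whose stored
# entries are exactly the (I, J) pairs, with every stored value set to zero.
function zero_pattern(I::Vector{Int}, J::Vector{Int})
    # Placeholder values of 1.0 guarantee that each (I, J) pair is stored,
    # whatever the constructor does with zero values.
    A = sparse(I, J, ones(Float64, length(I)))
    # Overwrite the stored values in place; the structure is left untouched.
    fill!(nonzeros(A), 0.0)
    return A
end

A = zero_pattern([1, 2], [1, 2])
println(nnz(A))  # number of stored entries, including stored zeros
```

The point of the two-step construction is that the structure is decided by (I, J) alone, and the values can then be filled in later without reallocating.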

@jakebolewski
Member

Thanks for raising this issue @matthieugomez, please continue the discussion at #9906 and #9928.

@ViralBShah
Member

We could have an option to sparse to do this.

@matthieugomez
Contributor Author

Why couldn't zeros always be retained? This is the current behavior for sparsevec:

I = [1, 2]
V = [0, 1]
M = sparsevec(I, V)
#2x1 sparse matrix with 2 Int64 entries:
#   [1, 1]  =  0
#   [2, 1]  =  1

The other solution, as you say, is to add an option to sparse, like keepzeros. If you think it's better, should the default behavior for sparsevec be keepzeros = false too?

I don't think my issue is related to #9906. I'm perfectly happy to work on the internal representation using nonzeros, nzrange and rowvals. I just need a function to initialize the structure in a clean way.
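
For readers unfamiliar with the accessors mentioned here, the following is a minimal sketch of walking the stored entries of a SparseMatrixCSC with nonzeros, nzrange and rowvals. It assumes a Julia 1.x release with the SparseArrays standard library; the function name stored_entries is hypothetical and chosen only for illustration.

```julia
using SparseArrays

# Hypothetical helper: collect every stored entry as a (row, column, value)
# tuple, in column-major storage order.
function stored_entries(A::SparseMatrixCSC)
    rows = rowvals(A)    # row index of each stored entry
    vals = nonzeros(A)   # value of each stored entry (may include stored zeros)
    entries = Tuple{Int, Int, eltype(A)}[]
    for j in 1:size(A, 2)
        # nzrange(A, j) gives the positions in rows/vals that belong to column j
        for k in nzrange(A, j)
            push!(entries, (rows[k], j, vals[k]))
        end
    end
    return entries
end

A = sparse([1, 2, 2], [1, 1, 2], [10.0, 20.0, 30.0])
for (i, j, v) in stored_entries(A)
    println("A[$i, $j] = $v")
end
```

Code written this way only depends on which entries are stored, which is why a predictable structure matters for it.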

@ViralBShah
Member

The SparseMatrixCSC data structure and its operations have been currently designed to squeeze out zeros. This is being debated in the other issues. The sparsevec thing you reported is a bug, and that whole implementation is being revisited with a real SparseVector type in 0.5. For now, in 0.5, I would just introduce a keepzeros option.

@ViralBShah changed the title from "sparse(I, J, V)" to "A keepzeros option for sparse(I, J, V)" on Aug 13, 2015
@ViralBShah
Member

I am reopening this issue to track the keepzeros option, which is different enough from the other related sparse issues. I have renamed this issue accordingly.

@tkelman
Contributor

tkelman commented Aug 13, 2015

The SparseMatrixCSC data structure and its operations have been currently designed to squeeze out zeros.

This is not an accurate description of the data structure. Many of the operations remove zeros, but not all, and there are corner cases as with sparse.

@ViralBShah
Member

Like I clearly said, the cases where the zeros don't get squeezed out are bugs - that is not the design. The data structure does not care what you store in it, but the operations are all designed to squeeze out zeros.

@tkelman
Contributor

tkelman commented Aug 14, 2015

And the conclusion in #9928 from pretty much all heavy users of sparse matrices is that we want better support for stored zeros, not to work harder to remove them. The design should shift away from Matlab and closer to SciPy, but with adjustments anywhere we think SciPy's behavior is inconsistent and we could do better.

It's a little bit like type stability actually - whether an entry has storage allocated for it in the sparse matrix output from many mathematical routines really shouldn't depend on the numerical values, it should depend primarily on the nonzero structures of its inputs. This is far more amenable to inner-loop performance-critical calculations since it's much more predictable than allocating a differently-sized array when certain numerical elements happen to exactly equal zero.

@timholy
Member

timholy commented Aug 14, 2015

Agreed with @tkelman. It would be much easier to ignore stored zeros in most routines, and provide a purge_zeros for those cases where you need to get rid of them.

@KristofferC
Member

We have an unexported function called dropzeros! that does that.
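
As a rough illustration of the "keep stored zeros, purge on request" approach discussed above, the sketch below creates a stored zero and then removes it explicitly. It assumes a Julia 1.x release, where dropzeros! is available from the SparseArrays standard library (at the time of this thread it was unexported); the helper name count_before_after is hypothetical.

```julia
using SparseArrays

# Hypothetical helper: return the number of stored entries before and after
# explicitly purging stored zeros from a copy of A.
function count_before_after(A::SparseMatrixCSC)
    B = copy(A)
    before = nnz(B)   # counts stored entries, including stored zeros
    dropzeros!(B)     # removes stored entries whose value is zero
    return before, nnz(B)
end

A = sparse([1, 2], [1, 2], [1.0, 1.0])
nonzeros(A)[1] = 0.0  # overwrite a stored value: the entry stays stored
println(count_before_after(A))
```

With this split, the structure changes only when the caller asks for it, rather than as a side effect of a value happening to equal zero.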

@ViralBShah
Member

I am fine with the proposal - I just don't want to do it piecemeal. In 0.5, we can update the whole implementation in one shot.

@StefanKarpinski
Member

In 0.5, we can update the whole implementation in one shot.

+1

@tkelman
Contributor

tkelman commented Aug 14, 2015

Sounds like a plan.

@KristofferC
Member

This can be closed now?

@matthieugomez
Contributor Author

#14798


8 participants