allow explicit stored zeros in SparseMatrixCSC #5538

mlubin · 2014-01-26T01:14:19Z

Following JuliaLang/LinearAlgebra.jl#60, this PR introduces nfilled to get the number of stored elements in a sparse matrix and updates the documentation. I've also renamed nnz to numnz to reflect the fact that it shouldn't be used as frequently. Maybe we could even put in a special deprecation warning letting users know about nfilled.

mlubin · 2014-01-26T01:15:09Z

CC: @ViralBShah @lindahua

lindahua · 2014-01-26T01:35:42Z

This looks good to me. However, I am not completely sure about the need to rename nnz to numnz.

mlubin · 2014-01-26T01:38:44Z

I'm worried about silently making user code do the wrong thing.

ViralBShah · 2014-01-26T04:42:34Z

I have been thinking about this. Given that our nnz has a different runtime complexity, it is a good idea to rename it, with a deprecation warning. I think it would be useful for the deprecation warning to talk about both, nfilled and nnz for the first occurrence.

On the name nfilled, how about using numalloc that represents the space allocated?

JeffBezanson · 2014-01-26T04:49:12Z

This all seems like a high price in complexity to pay. But if you're really sure you want to put up with all of this, I'll go back to my corner.

mlubin · 2014-01-26T05:10:18Z

The only extra complexity here is in the naming conventions; it doesn't affect the implementation of algorithms at all.

numalloc sounds too general. This could also refer to the space allocated for a vector, e.g., with sizehint.

JeffBezanson · 2014-01-26T05:14:56Z

I find the cognitive load goes up surprisingly quickly as you split single concepts into two. Any one case of it seems small, but they tend to pile up. For example see the recent suggestions to add eccall and interruptable_ccall.

mlubin · 2014-01-26T05:18:42Z

An alternative is just to redefine nnz to return the number of elements in the matrix, and not expose any direct way to count the exact number of nonzero elements. 99% of the time when dealing with sparse matrices, you want the former, not the latter.

mlubin · 2014-01-26T05:21:06Z

This shifts the extra cognitive load to the minority use cases and has the extra benefit of likely not breaking existing code.

ViralBShah · 2014-01-26T06:29:51Z

@mlubin Do you expect actually returning sparse matrices with zero elements in them to the user for general use, or are these for internal use in libraries only?

lindahua · 2014-01-26T13:40:02Z

I have to note that nnz has complexity of O(n) when applied to a dense array. I don't find it a problem if nnz has a linear complexity when applied to a sparse matrix, as its behavior is consistent with that for dense arrays.

We can also have another function called nstored (or nfilled etc). This is a different concept, and would usually only be used by people who understand the internal representation of a sparse matrix.
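To make the two concepts concrete, here is a minimal sketch in Julia. `MiniCSC`, `nstored_sketch` and `countnz_sketch` are hypothetical names used only for illustration (the `rowval` field is assumed as well); they are not the functions this PR adds. The sketch assumes a compressed-column layout in which the last column pointer marks the end of the stored entries.

```julia
# Hypothetical minimal CSC container, for illustration only.
struct MiniCSC
    m::Int                  # number of rows
    n::Int                  # number of columns
    colptr::Vector{Int}     # column j occupies nzval[colptr[j]:colptr[j+1]-1]
    rowval::Vector{Int}     # row index of each stored entry
    nzval::Vector{Float64}  # stored values, which may include explicit zeros
end

# Number of stored entries: O(1), read off the last column pointer.
nstored_sketch(S::MiniCSC) = S.colptr[end] - 1

# Number of entries whose value is nonzero: a linear scan over the stored values.
countnz_sketch(S::MiniCSC) = count(!iszero, view(S.nzval, 1:nstored_sketch(S)))

# 3x3 matrix with stored entries (1,1) = 1.0, (3,1) = 0.0 (explicit zero), (2,3) = 5.0
S = MiniCSC(3, 3, [1, 3, 3, 4], [1, 3, 2], [1.0, 0.0, 5.0])

println(nstored_sketch(S))  # prints 3
println(countnz_sketch(S))  # prints 2
```

The two counts differ only when explicit zeros are stored, which is exactly the case this PR allows.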

lindahua · 2014-01-26T13:42:31Z

I actually find nnz quite a cryptic name. How about we rename it to countnz, which explicitly indicates what the function is doing, and completely deprecate nnz for both dense & sparse arrays?

johnmyleswhite · 2014-01-26T15:53:23Z

+1 for countnz

mlubin · 2014-01-26T16:34:55Z

Good idea, I've updated the PR.

My deprecation doesn't seem to work correctly:

julia> x = [1,2,3];

julia> nnz(x)
WARNING: nnz has been renamed to countnz and is no longer computed in constant time for sparse matrices. Instead, use nfilled() for the number of elements in a sparse matrix.
 in nnz at deprecated.jl:391
WARNING: nnz has been renamed to countnz and is no longer computed in constant time for sparse matrices. Instead, use nfilled() for the number of elements in a sparse matrix.
 in nnz at deprecated.jl:391
 in nnz at deprecated.jl:392
fatal: error thrown and no exception handler available.
<?::Segmentation fault (core dumped)

mlubin · 2014-01-26T18:43:09Z

@ViralBShah I don't expect that Base will return sparse matrices with explicit zeros, but user code should be able to handle it. Going forward, we might want to return these matrices in particular cases, like via a keyword option to sparse so that it doesn't remove explicit zeros.

mlubin · 2014-01-26T18:48:42Z

The deprecated function nnz was accidentally recursive, fixed now.
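The thread does not show the faulty definition, so the following is only a sketch of one way a deprecation shim can end up calling itself, and of the non-recursive shape. `nnz_old` and `countnz_sketch` are hypothetical stand-ins, not the definitions in deprecated.jl.

```julia
# Hypothetical stand-in for the new name.
countnz_sketch(a) = count(!iszero, a)

# A shim that warns and then calls the OLD name again would recurse forever:
#   nnz_old(a) = (warn_somehow(); nnz_old(a))   # never returns
#
# The non-recursive shape warns once and forwards to the NEW name.
function nnz_old(a)
    @warn "nnz_old is deprecated, use countnz_sketch instead" maxlog=1
    return countnz_sketch(a)
end

x = [1, 2, 3]
println(nnz_old(x))  # prints 3
```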

StefanKarpinski · 2014-01-27T16:42:25Z

Perhaps I should just toss in here that I've always thought that having sparse matrices with a non-zero default would be a very useful thing to have. You can implement them quite simply by storing the default and the delta from the default and noting, e.g. that

(sparse1 + default1) + (sparse2 + default2) = (sparse1 + sparse2) + (default1 + default2)
(sparse1 + default1) * (sparse2 + default2) = (sparse1*sparse2 + default1*sparse2 + default2*sparse1) + (default1*default2)

and so on. The nice thing about sparse matrices with non-zero defaults is that they're closed under basic algebraic operations with themselves and scalars.
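A rough sketch of the "default plus delta" idea, reading `*` as the elementwise product, so that the cross terms are default1*sparse2 and default2*sparse1. `DefaultSparse`, `value`, `add` and `emul` are hypothetical names invented for this illustration, and a `Dict` stands in for real sparse storage.

```julia
# Hypothetical type: a constant default plus a sparse table of deltas from it.
struct DefaultSparse
    default::Int
    delta::Dict{Tuple{Int,Int},Int}  # (row, col) => value minus default
end

# Value at (i, j): the default plus the stored delta, if any.
value(A::DefaultSparse, i, j) = A.default + get(A.delta, (i, j), 0)

# Sum: deltas add, defaults add.
add(A::DefaultSparse, B::DefaultSparse) =
    DefaultSparse(A.default + B.default, merge(+, A.delta, B.delta))

# Elementwise product: s1*s2 + d1*s2 + d2*s1 is the new delta, d1*d2 the new default.
function emul(A::DefaultSparse, B::DefaultSparse)
    delta = Dict{Tuple{Int,Int},Int}()
    for k in keys(merge(A.delta, B.delta))  # every position stored in A or B
        s1 = get(A.delta, k, 0)
        s2 = get(B.delta, k, 0)
        delta[k] = s1 * s2 + A.default * s2 + B.default * s1
    end
    return DefaultSparse(A.default * B.default, delta)
end

A = DefaultSparse(2, Dict((1, 1) => 3))               # 5 at (1,1), 2 elsewhere
B = DefaultSparse(5, Dict((1, 1) => 1, (2, 2) => 4))  # 6 at (1,1), 9 at (2,2), 5 elsewhere

println(value(add(A, B), 1, 1))   # prints 11
println(value(emul(A, B), 1, 1))  # prints 30
println(value(emul(A, B), 3, 3))  # prints 10
```

Both results are again a default plus a delta whose stored positions are drawn from the operands' stored positions, which is the closure property described above.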

mlubin · 2014-01-27T19:14:21Z

That could be useful in some abstract settings, not sure if it's worth making everyone who uses sparse matrices deal with that extra complexity though.

mlubin · 2014-01-28T20:48:27Z

@ViralBShah any more issues?

StefanKarpinski · 2014-01-28T20:52:17Z

I just thought it was a useful forcing function when thinking about the API – what functions make sense regardless of whether the default value is zero or something else?

mlubin · 2014-01-28T21:02:42Z

This seems orthogonal to the pull request. Supposing we supported non-zero defaults, storing explicit default values in the sparse matrix shouldn't affect the implementation of basic linear algebra operations. nfilled still makes sense. countnz doesn't, but then again it's hard to see why someone would want to count the number of non-default values.

kmsquire · 2014-01-28T21:52:55Z

One use for non-zero defaults is for Yates correction in count data, for which it might be interesting to know the number of non-default values.

mlubin · 2014-01-29T03:01:40Z

Fair enough. Is there a name that captures nnz/countnz for non-zero defaults? Nothing reasonable is coming to mind.

JasonPries · 2014-01-29T03:47:18Z

I usually refer to explicitly stored entries of a sparse matrix as "structurally non-zero", whether or not their value is actually zero. I believe this is how they are referred to in Tim Davis' "Direct Methods for Sparse Linear Systems". So, something like structnz might be appropriate.

mlubin · 2014-01-29T04:30:10Z

That makes perfect sense for sparse matrices, but doesn't read very well for counting non-zeros in a plain vector, which nnz/countnz does also.

mlubin · 2014-01-29T04:35:19Z

I misread that -- structnz would correspond to nfilled from this PR. Also structnz is more zero-centric than nfilled, so Stefan disapproves. ;)

ViralBShah · 2014-01-31T10:54:09Z

How about countfilled and structfilled? A little too long for my taste, given that nnz was only 3 letters.

lindahua · 2014-01-31T12:29:13Z

@ViralBShah: I think the meaning of countfilled should be counting the number of non-default values? Then honestly, countfilled is a little bit misleading.

What about this?

countne(a, v)  # the number of values not equal to v
countnz(a) = countne(a, 0)  # applies to both sparse matrix & dense array
nstored(a)    # the number of explicitly stored elements 
                    # (I feel stored sounds more accurate than filled)

# for a variant of sparse matrix with default values, we may
# count the non-default values in the following way
countne(a, default(a))
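As a rough illustration of how the proposed names could behave, here is a sketch for a dense array and for a toy compressed-column container. All names carry a `_sketch` suffix or are otherwise hypothetical (`MiniCSC` and its `rowval` field included); nothing here is the API merged by this PR. The sparse method assumes that every entry not explicitly stored is zero.

```julia
# Dense case: one pass over all elements.
countne_sketch(a::AbstractArray, v) = count(x -> x != v, a)

# Hypothetical minimal CSC container, for illustration only.
struct MiniCSC
    m::Int
    n::Int
    colptr::Vector{Int}
    rowval::Vector{Int}
    nzval::Vector{Float64}
end

nstored_sketch(S::MiniCSC) = S.colptr[end] - 1

# Sparse case: scan the stored values, then account for the implicit zeros,
# which differ from v exactly when v is not zero.
function countne_sketch(S::MiniCSC, v)
    k = nstored_sketch(S)
    stored = count(x -> x != v, view(S.nzval, 1:k))
    implicit = v == 0 ? 0 : S.m * S.n - k
    return stored + implicit
end

countnz_sketch(a) = countne_sketch(a, 0)

a = [1, 0, 2, 0]
S = MiniCSC(3, 3, [1, 3, 3, 4], [1, 3, 2], [1.0, 0.0, 5.0])  # one explicit zero

println(countnz_sketch(a))       # prints 2
println(countnz_sketch(S))       # prints 2
println(countne_sketch(S, 5.0))  # prints 8
```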

mlubin · 2014-01-31T15:26:43Z

structfilled(x) isn't much more useful than length(x.nzval). nstored seems a bit confusing; it seems like it could also have a meaning for plain vectors. Would it be crazy to not even export a function for this and let people use length(x.nzval)?

It also seems strange to export countne given that nobody has requested this feature. Why not just keep countnz and add countne or its equivalent at a later point if/when sparse matrices with non-default values are added to Base?

mlubin · 2014-01-31T15:32:07Z

To clarify, @kmsquire, did you want countne on general matrices now, or for non-default sparse matrices if/when they exist?

lindahua · 2014-01-31T15:44:56Z

To me, the best approach is to not worry about the fancy sparse matrices with non-zero default values until a real need for this arises in practice. Each time I see a sparse matrix being talked about, people always think of it as a matrix where a dominant portion of the elements are zeros.

If we focus on the standard notion of sparse matrices, then countnz and nfilled sound like a perfect solution to me.

mlubin · 2014-01-31T17:27:26Z

I'd tend to agree with this.

kmsquire · 2014-01-31T19:10:49Z

> To clarify, @kmsquire, did you want countne on general matrices now, or for non-default sparse matrices if/when they exist?

I think on non-default sparse matrices if/when they exist is fine.

kmsquire · 2014-01-31T19:11:34Z

I also agree with @lindahua's last comment.

ViralBShah · 2014-02-03T05:25:43Z

I agree about just focussing on the common case of sparse matrices.

On stored zeros, we also need to think about find and nonzeros, and the impact on performance, and potential API renaming.

mlubin · 2014-02-03T06:27:52Z

@ViralBShah:

find already checks for stored zeros, no changes are needed here.
nonzeros isn't specialized for sparse matrices. Seems like it should be, but that can happen after this is merged.
No algorithms need to be modified, as far as I'm aware, so there's no impact on performance for the base case with no explicit zeros. The only impact is if the user knowingly decides to store zeros.
It's inconvenient to rename nnz, but it seems necessary. Nobody has been opposed to @lindahua's suggestion of countnz.
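The point about find can be pictured with a small sketch: a traversal of the stored entries that skips any whose value is zero. `MiniCSC` and `findnz_sketch` are hypothetical names for illustration (the `rowval` field is assumed); this is not the Base implementation of find.

```julia
# Hypothetical minimal CSC container, for illustration only.
struct MiniCSC
    m::Int
    n::Int
    colptr::Vector{Int}
    rowval::Vector{Int}
    nzval::Vector{Float64}
end

# Collect (row, column) indices of stored entries whose value is nonzero.
function findnz_sketch(S::MiniCSC)
    rows = Int[]
    cols = Int[]
    for j in 1:S.n, p in S.colptr[j]:(S.colptr[j+1] - 1)
        if S.nzval[p] != 0  # skip explicitly stored zeros
            push!(rows, S.rowval[p])
            push!(cols, j)
        end
    end
    return rows, cols
end

# Stored entries: (1,1) = 1.0, (3,1) = 0.0 (explicit zero), (2,3) = 5.0
S = MiniCSC(3, 3, [1, 3, 3, 4], [1, 3, 2], [1.0, 0.0, 5.0])

println(findnz_sketch(S))  # prints ([1, 2], [1, 3])
```

When no zeros are stored the check never fires, so the only extra work is one comparison per stored entry.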

ViralBShah · 2014-02-04T18:28:34Z

Ok, this is good enough to merge then. Could you rebase and merge?

mlubin · 2014-02-04T19:36:46Z

Done.

staticfloat · 2014-02-04T19:49:29Z

I'll pull the trigger for you.

allow explicit stored zeros in SparseMatrixCSC

nolta · 2014-02-04T20:54:50Z

I'm late to the party, but MATLAB compatibility is a nice feature to have, and I don't see what we gain by deprecating nnz. Also, why wasn't nzmax considered as an option for nfilled?

kmsquire · 2014-02-04T21:17:35Z

> Also, why wasn't nzmax considered as an option for nfilled?

Because no one suggested it until now. ;-)

mlubin · 2014-02-04T22:19:30Z

If we didn't deprecate nnz, code that is currently correct would suddenly be iterating through the whole matrix on each call. Silently making user code do unintended operations seemed like something to be avoided. For the same reason, I also think that the name nnz is deceiving... code that deals with sparse matrices should almost always use nfilled instead, so why reserve a three-letter function name for nnz? countnz is much more clear on what it does.

nzmax is actually not exactly the same concept as nfilled. nzmax refers to the length of the storage allocated for the sparse matrix, which you can easily access with length(S.nzval). nfilled(S) = int(S.colptr[end]-1) is the number of elements actually represented in the matrix. It could be that S.colptr[end]-1 < length(S.nzval) if there's extra storage left at the end for some reason (not recommended since it's easy to resize S.nzval anyway).
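The difference between allocated length and stored count can be shown with plain vectors. This is only a sketch: `colptr` and `nzval` are standalone arrays standing in for the fields of a sparse matrix, and `nfilled_sketch` and `allocated` are hypothetical variable names.

```julia
# Column pointers for a 3-column matrix holding 3 stored entries.
colptr = [1, 3, 3, 4]

# Value storage with two unused slots of spare capacity at the end.
nzval = [1.0, 0.0, 5.0, 0.0, 0.0]

nfilled_sketch = colptr[end] - 1  # entries actually represented
allocated = length(nzval)         # storage allocated

println(nfilled_sketch)  # prints 3
println(allocated)       # prints 5

# Trimming the spare capacity makes the two numbers agree.
resize!(nzval, nfilled_sketch)
println(length(nzval))   # prints 3
```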

allow explicit stored zeros in SparseMatrixCSC. closes #5424

c93ed16

staticfloat added a commit that referenced this pull request Feb 4, 2014

Merge pull request #5538 from mlubin/spzeros

427c114

allow explicit stored zeros in SparseMatrixCSC

staticfloat merged commit 427c114 into JuliaLang:master Feb 4, 2014

mlubin mentioned this pull request May 7, 2014

nnz and nonzeros #6769

Closed

mlubin deleted the spzeros branch May 7, 2014 18:32

tknopp mentioned this pull request May 26, 2014

RFC: Reintroduce nnz and nonzeros #6963

Merged

Sacha0 mentioned this pull request Jul 13, 2016

Make setindex! for sparse matrices and vectors not purge stored entries on zero assignment #17404

Merged
