allow explicit stored zeros in SparseMatrixCSC #5538

Merged
merged 1 commit into from
Feb 4, 2014

mlubin
Member

mlubin commented Jan 26, 2014

Following JuliaLang/LinearAlgebra.jl#60, this PR introduces nfilled to get the number of stored elements in a sparse matrix and updates the documentation. I've also renamed nnz to numnz to reflect the fact that it shouldn't be used as frequently. Maybe we could even put in a special deprecation warning letting users know about nfilled.
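
A minimal sketch of the distinction this PR draws, written against a current Julia with the SparseArrays standard library rather than the 2014 Base. The helper names `nstored_entries` and `nonzero_values` are hypothetical and stand in for the two concepts discussed here (stored entries versus entries whose value is actually nonzero).

```julia
using SparseArrays

# Build a 3x3 matrix with three stored entries, one of which is an explicit zero.
# In current Julia, sparse(I, J, V, m, n) keeps explicit zeros as stored entries.
rows = [1, 2, 3]
cols = [1, 2, 3]
vals = [4.0, 0.0, 5.0]
S = sparse(rows, cols, vals, 3, 3)

# Hypothetical helper: number of stored entries (the concept the PR calls nfilled).
# This reads one element of colptr, so it does not depend on the matrix size.
nstored_entries(A::SparseMatrixCSC) = Int(A.colptr[end]) - 1

# Hypothetical helper: number of stored values that are actually nonzero.
# This has to scan every stored value.
nonzero_values(A::SparseMatrixCSC) = count(!iszero, A.nzval)

println(nstored_entries(S))
println(nonzero_values(S))
```

The two counts differ exactly when the matrix holds explicit zeros, which is the case this PR starts to allow.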

@mlubin
Member Author

mlubin commented Jan 26, 2014

CC: @ViralBShah @lindahua

@lindahua
Contributor

This looks good to me. However, I am not completely sure about the need to rename nnz to numnz.

@mlubin
Member Author

mlubin commented Jan 26, 2014

I'm worried about silently making user code do the wrong thing.

@ViralBShah
Member

I have been thinking about this. Given that our nnz has a different runtime complexity, it is a good idea to rename it, with a deprecation warning. I think it would be useful for the deprecation warning to talk about both, nfilled and nnz for the first occurrence.

On the name nfilled, how about using numalloc that represents the space allocated?

@JeffBezanson
Member

This all seems like a high price in complexity to pay. But if you're really sure you want to put up with all of this, I'll go back to my corner.

@mlubin
Member Author

mlubin commented Jan 26, 2014

The only extra complexity here is in the naming conventions; it doesn't affect the implementation of algorithms at all.

numalloc sounds too general. This could also refer to the space allocated for a vector, e.g., with sizehint.

@JeffBezanson
Member

I find the cognitive load goes up surprisingly quickly as you split single concepts into two. Any one case of it seems small, but they tend to pile up. For example see the recent suggestions to add eccall and interruptable_ccall.

@mlubin
Member Author

mlubin commented Jan 26, 2014

An alternative is just to redefine nnz to return the number of stored elements in the matrix, and not expose any direct way to count the exact number of nonzero elements. 99% of the time when dealing with sparse matrices, you want the former, not the latter.

@mlubin
Member Author

mlubin commented Jan 26, 2014

This shifts the extra cognitive load to the minority use cases and has the extra benefit of likely not breaking existing code.

@ViralBShah
Member

@mlubin Do you actually expect to return sparse matrices with stored zeros to the user for general use, or are these for internal use in libraries only?

@lindahua
Contributor

I have to note that nnz has complexity of O(n) when applied to a dense array. I don't find it a problem if nnz has a linear complexity when applied to a sparse matrix, as its behavior is consistent with that for dense arrays.

We can also have another function called nstored (or nfilled etc.). This is a different concept, and it would usually be used only by people who understand the internal representation of a sparse matrix.

@lindahua
Contributor

I actually find nnz quite a cryptic name. How about we rename it to countnz, which explicitly indicates what the function is doing, and completely deprecate nnz for both dense and sparse arrays?

@johnmyleswhite
Member

+1 for countnz

@mlubin
Member Author

mlubin commented Jan 26, 2014

Good idea, I've updated the PR.

My deprecation doesn't seem to work correctly:

julia> x = [1,2,3];

julia> nnz(x)
WARNING: nnz has been renamed to countnz and is no longer computed in constant time for sparse matrices. Instead, use nfilled() for the number of elements in a sparse matrix.
 in nnz at deprecated.jl:391
WARNING: nnz has been renamed to countnz and is no longer computed in constant time for sparse matrices. Instead, use nfilled() for the number of elements in a sparse matrix.
 in nnz at deprecated.jl:391
 in nnz at deprecated.jl:392
fatal: error thrown and no exception handler available.
<?::Segmentation fault (core dumped)

@mlubin
Member Author

mlubin commented Jan 26, 2014

@ViralBShah I don't expect that Base will return sparse matrices with explicit zeros, but user code should be able to handle it. Going forward, we might want to return these matrices in particular cases, like via a keyword option to sparse so that it doesn't remove explicit zeros.

@mlubin
Member Author

mlubin commented Jan 26, 2014

The deprecated function nnz was accidentally recursive, fixed now.
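
The source does not show the faulty definition, so the following is only a sketch of the general pattern, with hypothetical names (`count_nonzero`, `old_nnz`) rather than the code from deprecated.jl. The point it illustrates is that a deprecated alias has to forward to the new name; if it forwards to itself, each call warns and then calls itself again.

```julia
# Hypothetical new name: count the elements that are not zero.
count_nonzero(x) = count(!iszero, x)

# Hypothetical deprecated alias: warn once, then forward to the NEW function.
# Writing `return old_nnz(x)` here instead would recurse without end.
function old_nnz(x)
    @warn "old_nnz is deprecated, use count_nonzero instead" maxlog=1
    return count_nonzero(x)
end

result = old_nnz([1, 0, 3])
println(result)
```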

@StefanKarpinski
Member

Perhaps I should just toss in here that I've always thought that having sparse matrices with a non-zero default would be a very useful thing to have. You can implement them quite simply by storing the default and the delta from the default and noting, e.g. that

(sparse1 + default1) + (sparse2 + default2) = (sparse1 + sparse2) + (default1 + default2)
(sparse1 + default1) * (sparse2 + default2) = (sparse1*sparse2 + default1*sparse2 + default2*sparse1) + (default1*default2)

and so on. The nice thing about sparse matrices with non-zero defaults is that they're closed under basic algebraic operations with themselves and scalars.
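
A minimal sketch of this idea in current Julia, assuming `*` is read as the elementwise product and the default is a scalar. The type `DefaultSparse` and the functions `elementwise_mul` and `dense` are hypothetical names introduced for illustration; nothing like them is part of this PR. Expanding the product term by term gives s1*s2 + d1*s2 + d2*s1 + d1*d2, where only the last term is a pure default.

```julia
using SparseArrays

# Hypothetical type: a constant default plus a sparse matrix of deltas,
# so that element (i, j) equals default + delta[i, j].
struct DefaultSparse{T}
    default::T
    delta::SparseMatrixCSC{T,Int}
end

# Addition: the deltas add and the defaults add.
Base.:+(a::DefaultSparse, b::DefaultSparse) =
    DefaultSparse(a.default + b.default, a.delta + b.delta)

# Elementwise product: every term except default*default stays sparse.
elementwise_mul(a::DefaultSparse, b::DefaultSparse) =
    DefaultSparse(a.default * b.default,
                  a.delta .* b.delta + a.default * b.delta + b.default * a.delta)

# Dense view, used here only to check the identities.
dense(a::DefaultSparse) = Matrix(a.delta) .+ a.default

a = DefaultSparse(2.0, sparse([1, 2], [1, 2], [1.0, 3.0], 2, 2))
b = DefaultSparse(5.0, sparse([1], [2], [4.0], 2, 2))

println(dense(a + b) == dense(a) .+ dense(b))
println(dense(elementwise_mul(a, b)) == dense(a) .* dense(b))
```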

@mlubin
Member Author

mlubin commented Jan 27, 2014

That could be useful in some abstract settings, not sure if it's worth making everyone who uses sparse matrices deal with that extra complexity though.

@mlubin
Member Author

mlubin commented Jan 28, 2014

@ViralBShah any more issues?

@StefanKarpinski
Member

I just thought it was a useful forcing function when thinking about the API – what functions make sense regardless of whether the default value is zero or something else?

@mlubin
Member Author

mlubin commented Jan 28, 2014

This seems orthogonal to the pull request. Supposing we supported non-zero defaults, storing explicit default values in the sparse matrix shouldn't affect the implementation of basic linear algebra operations. nfilled still makes sense. countnz doesn't, but then again it's hard to see why someone would want to count the number of non-default values.

@kmsquire
Member

One use for non-zero defaults is for Yates correction in count data, for
which it might be interesting to know the number of non-default values.

@mlubin
Member Author

mlubin commented Jan 29, 2014

Fair enough. Is there a name that captures nnz/countnz for non-zero defaults? Nothing reasonable is coming to mind.

@JasonPries
Contributor

I usually refer to explicitly stored entries of a sparse matrix as "structurally non-zero", whether or not their value is actually zero. I believe this is how they are referred to in Tim Davis' "Direct Methods for Sparse Linear Systems". So, something like structnz might be appropriate.

@mlubin
Member Author

mlubin commented Jan 29, 2014

That makes perfect sense for sparse matrices, but doesn't read very well for counting non-zeros in a plain vector, which nnz/countnz does also.

@mlubin
Member Author

mlubin commented Jan 29, 2014

I misread that -- structnz would correspond to nfilled from this PR. Also structnz is more zero-centric than nfilled, so Stefan disapproves. ;)

@ViralBShah
Member

How about countfilled and structfilled? A little too long for my taste, given that nnz was only 3 letters.

@lindahua
Contributor

@ViralBShah: I think the meaning of countfilled should be counting the number of non-default values? Then honestly, countfilled is a little bit misleading.

What about this?

countne(a, v)               # the number of values not equal to v
countnz(a) = countne(a, 0)  # applies to both sparse matrices and dense arrays
nstored(a)                  # the number of explicitly stored elements
                            # (I feel "stored" sounds more accurate than "filled")

# for a variant of sparse matrix with default values, we may
# count the non-default values in the following way
countne(a, default(a))
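
The proposal above can be turned into a runnable sketch in current Julia. The names `countne`, `countnz_sketch`, and `nstored_sketch` are illustrative only; the thread does not settle on an implementation, and `default(a)` is left out because no such sparse type exists here.

```julia
using SparseArrays

# Hypothetical generic definition: count the elements not equal to v.
countne(a, v) = count(x -> x != v, a)

# Counting nonzeros is then the special case v == 0.
countnz_sketch(a) = countne(a, 0)

# Hypothetical name for the number of explicitly stored entries.
nstored_sketch(a::SparseMatrixCSC) = Int(a.colptr[end]) - 1

dense_vec = [1, 0, 3, 0]
S = sparse([1, 2], [1, 2], [7.0, 0.0], 2, 2)  # second entry is a stored zero

println(countnz_sketch(dense_vec))
println(countnz_sketch(S))
println(nstored_sketch(S))
```

The same `countnz_sketch` works for the dense vector and the sparse matrix, while `nstored_sketch` is only meaningful for the sparse representation.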

@mlubin
Member Author

mlubin commented Jan 31, 2014

structfilled(x) isn't much more useful than length(x.nzval). nstored seems a bit confusing; it seems like it could also have a meaning for plain vectors. Would it be crazy to not even export a function for this and let people use length(x.nzval)?

It also seems strange to export countne given that nobody has requested this feature. Why not just keep countnz and add countne or its equivalent at a later point if/when sparse matrices with non-default values are added to Base?

@mlubin
Member Author

mlubin commented Jan 31, 2014

To clarify, @kmsquire, did you want countne on general matrices now, or for non-default sparse matrices if/when they exist?

@lindahua
Contributor

To me, the best approach is to not worry about the fancy sparse matrices with non-zero default values until a real need for them arises in practice. Each time I see a sparse matrix being talked about, people always think of it as a matrix where a dominant portion of the elements are zeros.

If we focus on the standard notion of sparse matrices, then countnz and nfilled sound like a perfect solution to me.

@mlubin
Member Author

mlubin commented Jan 31, 2014

I'd tend to agree with this.

@kmsquire
Member

To clarify, @kmsquire, did you want countne on general matrices now, or for non-default sparse matrices if/when they exist?

I think on non-default sparse matrices if/when they exist is fine.

@kmsquire
Member

I also agree with @lindahua's last comment.

@ViralBShah
Member

I agree about just focussing on the common case of sparse matrices.

On stored zeros, we also need to think about find and nonzeros, and the impact on performance, and potential API renaming.

@mlubin
Member Author

mlubin commented Feb 3, 2014

@ViralBShah:

  • find already checks for stored zeros, no changes are needed here.
  • nonzeros isn't specialized for sparse matrices. Seems like it should be, but that can happen after this is merged.
  • No algorithms need to be modified, as far as I'm aware, so there's no impact on performance for the base case with no explicit zeros. The only impact is if the user knowingly decides to store zeros.
  • It's inconvenient to rename nnz, but it seems necessary. Nobody has been opposed to @lindahua's suggestion of countnz.
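
As a hedged illustration of the first point, here is how a value-based search treats a stored zero. It uses `findall` from a current Julia with SparseArrays, not the 2014 `find`, so it shows the behaviour described rather than the code under review.

```julia
using SparseArrays

# 3x2 matrix with three stored entries; the entry at (2, 1) is an explicit zero.
S = sparse([1, 2, 3], [1, 1, 2], [1.5, 0.0, 2.5], 3, 2)

# Number of stored entries, read from the column pointer array.
stored = Int(S.colptr[end]) - 1

# Positions whose value is nonzero; the stored zero at (2, 1) is skipped.
nonzero_positions = findall(!iszero, S)

println(stored)
println(nonzero_positions)
```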

@ViralBShah
Member

Ok, this is good enough to merge then. Could you rebase and merge?

@mlubin
Member Author

mlubin commented Feb 4, 2014

Done.

@staticfloat
Member

I'll pull the trigger for you.

staticfloat added a commit that referenced this pull request Feb 4, 2014
allow explicit stored zeros in SparseMatrixCSC
@staticfloat staticfloat merged commit 427c114 into JuliaLang:master Feb 4, 2014
@nolta
Member

nolta commented Feb 4, 2014

I'm late to the party, but MATLAB compatibility is a nice feature to have, and I don't see what we gain by deprecating nnz. Also, why wasn't nzmax considered as an option for nfilled?

@kmsquire
Member

kmsquire commented Feb 4, 2014

Also, why wasn't nzmax considered as an option for nfilled?

Because no one suggested it until now. ;-)

@mlubin
Member Author

mlubin commented Feb 4, 2014

If we didn't deprecate nnz, code that is currently correct would suddenly be iterating through the whole matrix on each call. Silently making user code do unintended operations seemed like something to be avoided. For the same reason, I also think that the name nnz is deceiving... code that deals with sparse matrices should almost always use nfilled instead, so why reserve a three-letter function name for nnz? countnz is much more clear on what it does.

nzmax is actually not exactly the same concept as nfilled. nzmax refers to the length of the storage allocated for the sparse matrix, which you can easily access with length(S.nzval). nfilled(S) = int(S.colptr[end]-1) is the number of elements actually represented in the matrix. It could be that S.colptr[end]-1 < length(S.nzval) if there's extra storage left at the end for some reason (not recommended since it's easy to resize S.nzval anyway).
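
To make the difference concrete, here is a small sketch using raw compressed-column buffers instead of a SparseMatrixCSC object, so it does not depend on what any particular Julia version's constructor accepts. The arrays describe a hypothetical 2x2 matrix whose value buffer has two unused slots at the end.

```julia
# Raw CSC buffers for a hypothetical 2x2 matrix with spare capacity at the end.
colptr = [1, 3, 4]                   # column 1 uses slots 1:2, column 2 uses slot 3
rowval = [1, 2, 2, 0, 0]             # last two slots are unused padding
nzval  = [1.0, 2.0, 3.0, 0.0, 0.0]   # same length as rowval

filled   = colptr[end] - 1   # entries actually represented in the matrix
capacity = length(nzval)     # storage allocated for entries

println(filled)
println(capacity)
```

Here `filled` is smaller than `capacity`, which is the situation described above where the two notions disagree.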
