Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SparseMatrixCSC() allows creating invalid matrices #14489

Closed
nalimilan opened this issue Dec 26, 2015 · 22 comments
Closed

SparseMatrixCSC() allows creating invalid matrices #14489

nalimilan opened this issue Dec 26, 2015 · 22 comments
Labels
sparse Sparse arrays

Comments

@nalimilan
Copy link
Member

Looks like the constructor should throw an error when trying to create invalid SparseMatrixCSC objects:

julia> x=SparseMatrixCSC(4, 4, Int[], Int[], Int[]);

julia> x
Error showing value of type SparseMatrixCSC{Int64,Int64}:
ERROR: BoundsError: attempt to access 0-element Array{Int64,1}
  at index [0]
 [inlined code] from sparse/sparsematrix.jl:34
 in showarray at sparse/sparsematrix.jl:85
 in anonymous at replutil.jl:38
 in with_output_limit at ./show.jl:1341
 [inlined code] from show.jl:1337
 in writemime at replutil.jl:37
 [inlined code] from expr.jl:8 (repeats 2 times)
 in display at REPL.jl:114
 [inlined code] from multimedia.jl:151
 in display at multimedia.jl:163
 in print_response at REPL.jl:134
 in print_response at REPL.jl:121
 in anonymous at REPL.jl:624
 in run_interface at ./LineEdit.jl:1610
 in run_frontend at ./REPL.jl:864
 in run_repl at ./REPL.jl:167
 in _start at ./client.jl:343

julia> x[1,1]
ERROR: BoundsError: attempt to access 0-element Array{Int64,1}
  at index [1]
 in getindex at sparse/sparsematrix.jl:1348
 in eval at ./boot.jl:265
@nalimilan nalimilan added sparse Sparse arrays good first issue Indicates a good issue for first-time contributors to Julia labels Dec 26, 2015
@tkelman
Copy link
Contributor

tkelman commented Dec 26, 2015

I actually need to be able to do this at times. I think sparse should do validity checking but SparseMatrixCSC shouldn't.

@jiahao
Copy link
Member

jiahao commented Dec 26, 2015

Why is this an invalid matrix? The OP constructs a valid 4x4 sparse matrix of zeros. I think the error is that we don't display an empty matrix correctly.

@tkelman
Copy link
Contributor

tkelman commented Dec 26, 2015

@jiahao the column pointers should never be empty (in a valid sparse matrix).

@jiahao
Copy link
Member

jiahao commented Dec 27, 2015

And why not? How else can one represent a zero matrix with no stored zeros?

Edit: The original papers on the Harwell-Boeing format do not mention whether or not an empty matrix is allowed. Matlab, the de facto reference implementation, allows an empty sparse matrix to be created with sparse() and it behaves like any other sparse matrix. I wasn't able to figure out how to dump the internal representation of the matrix.

@jiahao
Copy link
Member

jiahao commented Dec 27, 2015

Oh, I see the problem now. The column pointers should be a vector of all zeros or something like that, terminated by nnz+1.

@JaredCrean2
Copy link
Contributor

Looking at the documentation of the Yale Sparse Matrix Package here (upon which CSC and CSR are based), colptrs should be all zeros, except for the last entry, which should be 1.

@jiahao
Copy link
Member

jiahao commented Dec 27, 2015

Sorry for the noise; getting rusty.

@tkelman what's your use case for allowing invalid matrices to be constructed?

@tkelman
Copy link
Contributor

tkelman commented Dec 27, 2015

Constructing them incrementally, or one of the constituent vectors at a time.

Remember the Julia column pointers are 1-based.

@nalimilan
Copy link
Member Author

If show and getindex can be made more tolerant about what colptr without losing performance, that would be an even better fix.

@nalimilan nalimilan removed the good first issue Indicates a good issue for first-time contributors to Julia label Dec 27, 2015
@ViralBShah
Copy link
Member

Originally, SparseMatrixCSC was designed to just be an internal constructor, but it is quite popular, and I suspect we want to have a check. Perhaps do it by default, but disable it internally within the sparse codes?

@tkelman
Copy link
Contributor

tkelman commented Dec 27, 2015

I consider using the SparseMatrixCSC constructor directly to be the I-know-what-I'm-doing interface. People use it to skip the overhead and checking of sparse, for example to create matrices with non-sorted rows, or explicit zeros, things like that. I'd be fine with making show do some validity checks on the assumptions it makes before trying to show the matrix, but having a lenient constructor as a direct way of saying "trust me, wrap this data in a SparseMatrixCSC" is a valuable thing.

@nalimilan
Copy link
Member Author

The problem is that most types can be constructed directly via their constructor, except for SparseMatrixCSC for which you're advised to use sparse. This is confusing IMHO. But we could probably have a user-friendly constructor more similar to sparse, and the current ones which require knowing the implementation details and do not perform checks.

@tkelman
Copy link
Contributor

tkelman commented Dec 27, 2015

You can construct a SparseMatrixCSC directly, you just have to know what the fields mean. I dislike a lot of the Matlab heritage in the way sparse works, I like the SciPy design better where you ask for a COO matrix that's a direct wrapper around your triplet inputs (if you have triplet inputs to start with - if you happen to know you're getting things column by column then you shouldn't bother going through COO at all), then ask specifically for a conversion to CSC or CSR before doing operations that would be faster in the other format. Though I think it's been established that this kind of design with a large number of sparse formats probably won't be happening in base.

@ViralBShah
Copy link
Member

Yes, but we have this naming issue in many places - arrays, rngs, etc. The real issue, I think, is that SparseMatrixCSC is an exported function, and perhaps for that reason should have the default be the safe version.

@tkelman
Copy link
Contributor

tkelman commented Dec 27, 2015

It's an exported type (doesn't have to be though), and only really documented in terms of its fields. The problem with "safe" is how much checking are you proposing? Verifying the sortedness of the row indices in each column? That's really expensive to do on every single call to the constructor, you'd absolutely need an escape hatch way of disabling that level of checking, and to use it everywhere the constructor gets called in base if you don't want to slow down all sparse code. I don't think protecting users from the possibility of creating invalid matrices is worth the trouble here.

@nalimilan
Copy link
Member Author

Maybe a @nocheck macro (similar to @inbounds) would be useful in that case and in others?

@JaredCrean2
Copy link
Contributor

Having a generic 'check that this data structure complies with the applicable standards' function in Base could be useful, whether or not it is called by constructors.

Personally, I would prefer the constructor not be checked, and let users call the checker function if they want to. Sometimes creating an initially invalid data structure is ok (for example, pre-allocating the structure of the matrix) because it will be made valid later.

@andreasnoack
Copy link
Member

It's difficult to find the right balance between protecting the user from potential segfaults and have minimal overhead when the user knows what he is doing. There are quite a few @inbounds in the sparse code so segfaults can easily happen and the default should be that Julia code is safe.

However, I'm wondering if a check in the show method for SparseMatrixCSC would catch most of the unintentionally invalid matrices. It could then just print that the instance is invalid. This wouldn't hurt performance. If SparseMatrixCSC will remain unsafe, I think we should consider not exporting it.

@nalimilan
Copy link
Member Author

@andreasnoack Make show more robust is certainly a good idea. The problem is more with getindex, which is a performance-critical function.

@gajomi
Copy link
Contributor

gajomi commented Jan 20, 2016

There is related problem with SparseVector which allows the construction of vectors with more elements than the total length of the vector

julia> sparsevec([1,2,3], [1,1,2],2)
ERROR: ArgumentError: An index is out of bound.
 [inlined code] from essentials.jl:61
 in sparsevec at sparse/sparsevector.jl:145
 in sparsevec at sparse/sparsevector.jl:153
julia> x = SparseVector(2, [1,2,3], [1,1,2])
Sparse vector of length 2, with 3 Int64 nonzero entries:
  [1]  =  1
  [2]  =  1
  [3]  =  2

As one more data point, I also found myself wanting to work with temporarily invalid sparse matrices, so I agree with the above statement that the behavior should be allowed (it is a mutable type at any rate, so there is room to shoot yourself in the foot after constructing things), but agree that the non-safe constructor method should not be exported.

lvnguyen added a commit to lvnguyen/julia that referenced this issue Feb 12, 2016
…ing SparseVector)

For indices that are non positive or exceed the size of the vector, we should throw an error.
@Sacha0
Copy link
Member

Sacha0 commented Jun 28, 2016

Consensus appears to have formed that: (1) being able to construct invalid SparseMatrixCSCs via the SparseMatrixCSC constructor has value; and (2) show tolerating SparseMatrixCSCs with invalid structure, and somehow indicating that invalidity, would be an improvement.

Consensus has yet to form around: (3) whether SparseMatrixCSC should remain exported, potentially being unsafe; and (4) whether getindex/setindex! should check SparseMatrixCSC validity, what validity checking they should perform if so, and how they should respond to invalidity in any case.

Discussion in #16371, particularly the use case @tkelman describes, is worth noting here, the primary relevant conclusion being that SparseMatrixCSC constructor performance is critical and likely to become moreso with time.

Tangentially, I like @JaredCrean2's suggestion:

Having a generic 'check that this data structure complies with the applicable standards' function in Base could be useful, whether or not it is called by constructors.

@ViralBShah
Copy link
Member

I think #22529 implements all the bits we need. This issue by itself can be closed, since we do need the ability to create invalid structures and then fill them up to eventually be valid (which is how the internal implementations often work).

Perhaps we can have an optional validity checking parameter, that can over time become default.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sparse Sparse arrays
Projects
None yet
Development

No branches or pull requests

8 participants