-
-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for sparse matrices #34
Comments
Hello, thank you for raising the issue. I am not very familiar with sparse matrices. Could you provide a bit more context - what kind of support is needed for them? Are the current functions not working when used on sparse matrices? |
Thanks for your quick reply! Sparse matrices are often used in R to represent large amounts of data that may contain a fair number of zeros, such as those derived from Poisson, negative binomial and related distributions. The The main advantage of using sparse matrices is the reduction in memory usage. This is a good introduction to help make oneself more familiar with the topic: https://slowkow.com/notes/sparse-matrix/ Here is some code to create sparse matrices:
Here is some code to test
In terms of which kind of support is needed for sparse matrices, I would refer to the great BioConductor EDIT: I am mentioning these packages just because they may provide useful functions for matrix operations that could be used to support sparse matrices as well. I will tag the authors of those other two packages, but I can obviously only hope that they might one day find time in their busy schedules to comment if they find it appropriate: @HenrikBengtsson @const-ae |
Thanks for an elaborate reply. Here are a few quick thoughts and questions from my side:
|
Thanks for your elaborate reply as well.
PS: You obviously worked sparse matrix in the past. I just don't understand why you would say you "not very familiar" with them. My reply is completely out of place now and I spent more time on it than I would want to admit... |
Hello, maintainer of matrixStats here. Thanks for the dicussions. I'd like to add that, although it seems quick to add support for sparse matrices, the extra burden added can be significantly. There are so many corner cases you need to account for and test for. Having maintained 'matrixStats', I'm still not suprised if someone reports on a corner case of input data that we didn't account for. Writing good tests for numerical methods is tricky and a lot of work. Then there's the extra work coming from additional feature requests that are now possible, because of the expanded scope. As mentioned, for matrixStats, there's SparseMatrixStats which replicates the matrixStats API for sparse matrices. I'm glad I/we didn't have to maintain that too, because of all the ins and outs that comes with that. Although I'm using sparse matrices too, I wouldn't have had the experience to cover all grounds. I would probably learn a lot along the way, but that would be even more work. We found a nice balance with matrixStats and SparseMatrixStats, where matrixStats sets the de facto API standard and SparseMatrixStats follows. This way I/we developing matrixStats can focus on the core business, but we do have to make sure we don't break anything. Sometimes the tests of SparseMatrixStats caught breaking changes that we would have missed otherwise. It's a dance of tango, and in each iteration things gets better and better, and it benefits the community (often without them noticing).
From my experience, it's okay to be "experimental" and taking risks when you're starting out, but when you get to a point when there are lots of users and packages depending on your package, you need to start thinking about the impact design or implementation bugs has. If the package provides statistical tools, it's then also likely it's used in science, and then the implications from a mistake in your code can be huge. The risk for some mistakes to go unnoticed for months and years should not be taken lightly. There are high-impact papers out there whose significant results are solely due to bugs. People use these core tools trusting them to work and, sometimes, they're used indirectly through nested package dependencies. For example, in matrixStats there is So, when we get the privileged to maintain heavily-used, central packages, we need to be careful. It also provides us with means to pay back to the community and the sciences, e.g. in some cases we can add internal assertions checking for user mistakes and that way early on help catch mistakes in long-running scientific projects. My $.02 of coffee comments |
Thank you for the comment, @HenrikBengtsson, I much appreciate you posting and sharing your experience here. I am trying to leave myself out of the picture and go with what is best for matrixTests. My decision for now is to leave this issue open but set it at low priority. This way matrixTests can decide on some kind of support in the future, and meanwhile its users can find this place and read everything here themselves, or even come up with their own proposals and solutions. My current (but subject to change) opinion: I see one way of supporting sparse matrices from within matrixTests and that is by turning all the methods here into generics that dispatch differently in cases when the first argument is a sparse matrix object. Then, when the default dispatch calls (e.g.) However, I am still reluctant because:
Hence this issue will remain open as a feature request but, for now, will not be prioritized. |
Thanks for this amazing package! I have used it in so many projects already!
I just wanted to check if there are plans to support sparse matrices as well any time soon?
The text was updated successfully, but these errors were encountered: