Sparse arrays #1375
Comments
Yes, I would say this is in scope, as long as we can keep most of the data-type specific logic out of xarray's core (which seems doable). Currently, we define most of our operations on duck arrays in https://github.com/pydata/xarray/blob/master/xarray/core/duck_array_ops.py There are a few other hacks throughout the codebase, which you can find by searching for "dask_array_type": https://github.com/pydata/xarray/search?p=1&q=dask_array_type&type=&utf8=%E2%9C%93 It's pretty crude, but basically this would need to be extended to implement many of these methods for sparse arrays, too. Ideally we would refactor xarray's adapter logic into more cleanly separated submodules, perhaps using multiple dispatch. Even better, we would make this public API, so you can write something like It looks like See also #1118 |
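To illustrate the dispatch idea in the comment above, here is a minimal sketch using only the standard library. It is not xarray's code: `ToySparse` and `total` are hypothetical names, and `functools.singledispatch` dispatches on the first argument only, so it is a simplification of true multiple dispatch.

```python
from functools import singledispatch


class ToySparse:
    """Hypothetical stand-in for a sparse array: maps index -> nonzero value."""

    def __init__(self, values, size):
        self.values = dict(values)
        self.size = size


@singledispatch
def total(array):
    # Fallback for array types that have no registered implementation.
    raise TypeError(f"unsupported array type: {type(array).__name__}")


@total.register
def _(array: list):
    # Dense case: visit every element.
    return sum(array)


@total.register
def _(array: ToySparse):
    # Sparse case: only the stored (nonzero) values contribute.
    return sum(array.values.values())


dense_total = total([0, 3, 0, 4])
sparse_total = total(ToySparse({1: 3, 3: 4}, size=4))
```

With this layout, supporting a new array type means registering one more implementation per operation, and the calling code stays unchanged.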
👍 to the scipy.sparse array suggestion [While we are discussing supporting other array types, we should keep gpu arrays on the radar] |
Although I don't know much about SciDB, it seems to be another possible application for |
Here is a brief attempt at a multi-dimensional sparse array: https://github.com/mrocklin/sparse It depends on numpy and scipy.sparse and, with the exception of a bit of in-memory data movement and copies, should run at scipy speeds (though I haven't done any benchmarking). @rabernat do you have an application that we could use to drive this? |
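For readers unfamiliar with multi-dimensional sparse storage, the following toy sketch shows the coordinate (COO) layout in pure Python: one index tuple and one value per nonzero entry, plus the overall shape. It is an illustration only and does not reflect how the linked library is implemented; `shape_of` and `to_coo` are hypothetical helpers.

```python
from itertools import product


def shape_of(nested):
    """Return the shape of a regular (non-ragged) nested list."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return tuple(shape)


def to_coo(nested):
    """Convert a nested list into (coords, data, shape), keeping nonzeros only."""
    shape = shape_of(nested)
    coords, data = [], []
    for index in product(*(range(n) for n in shape)):
        value = nested
        for i in index:
            value = value[i]
        if value != 0:
            coords.append(index)
            data.append(value)
    return coords, data, shape


# A 2 x 2 x 2 array with only two nonzero entries.
dense = [[[0, 0], [0, 5]], [[7, 0], [0, 0]]]
coords, data, shape = to_coo(dense)
```

Only two of the eight elements are stored here, which is the property that makes this layout attractive for mostly-empty data.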
Nothing comes to mind immediately. My data are unfortunately quite dense! 😜 |
In case you're still looking for an application, gene expression from single cells (see Here is an example of using Hope this is a good example for sparse arrays! |
Other examples where labeled sparse arrays would be useful are,
|
Sparse Xarray DataArrays would be useful for the linear regridding operations discussed in JiaweiZhuang/xESMF#3. |
I'm interested to see if there have been any developments on this. I currently have an application where I'm working with multiple dask arrays, some of which are sparse (text data). It'd be worth my time to move my project to xarray, so I'd be interested in contributing something here if there is a need. |
I know of a project which could make perfect use of xarray, if it supported sparse tensors: Currently I have to work with both xarray and anndata to store counts in sparse arrays separate from other dependent data, which is a little bit annoying :) |
See also: #1938 The major challenge now is the dispatching mechanism, which hopefully http://www.numpy.org/neps/nep-0018-array-function-protocol.html will solve. |
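As a rough illustration of the dispatching mechanism NEP 18 describes, the sketch below defines a duck array that intercepts `np.sum` through `__array_function__`. It assumes NumPy 1.17 or later, where the protocol is enabled by default; `DiagonalArray` is a hypothetical class invented for this example, not part of NumPy, sparse, or xarray.

```python
import numpy as np


class DiagonalArray:
    """Hypothetical duck array: an n x n matrix with `value` on the diagonal."""

    def __init__(self, n, value):
        self.n = n
        self.value = value

    def __array_function__(self, func, types, args, kwargs):
        # NumPy calls this hook instead of running its own implementation.
        if func is np.sum:
            # Only the n diagonal entries are nonzero.
            return self.n * self.value
        # Returning NotImplemented makes NumPy raise TypeError for other functions.
        return NotImplemented


arr = DiagonalArray(5, 2.0)
total = np.sum(arr)  # dispatched to DiagonalArray.__array_function__
```

A library such as xarray can then call ordinary NumPy functions on wrapped data and let the array type decide how to execute them.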
Would it be an option to use dask's sparse support? Currently I load everything into a dask array by hand and pass this dask array to xarray. |
How should these sparse arrays get stored in NetCDF4? |
In principle this would work, though I would prefer to support it directly in xarray, too.
Yes, we would need to implement a convention for handling sparse array data. |
Given the recent improvements in numpy duck array typing, how close are we to being able to just wrap a pydata/sparse array in an xarray Dataset? |
It will need some experimentation, but I think things should be pretty close after NumPy 1.17 is released. Potentially it could be as easy as adjusting the rules xarray uses for casting in |
If someone who is good at numpy shows up at our sprint tomorrow, this could be a good issue to try out. |
@rgommers might be able to recommend someone |
I haven't talked to anyone at SciPy'19 yet who was interested in sparse arrays, but I'll keep an eye out today. And yes, this is a fun issue to work on and would be really nice to have! |
I personally use the new sparse project for my day-to-day research. I am motivated to work on this, but I probably won't have time today to dive deep into it. Maybe CuPy would be more exciting. |
Wondering what the status of this is? Is there a branch with this functionality implemented? I would love to give it a spin! |
This is working now on the Once we get a few more kinks worked out, it will be in the next release. I've started another issue for discussing how xarray could integrate sparse arrays better into its API: #3213 |
@shoyer |
@fjanoos there isn't any formal documentation yet but you can look at test_sparse.py for examples. That file will also tell you what works and doesn't work currently. |
I would like to have an XArray that has scipy.sparse arrays rather than numpy arrays. Is this in scope?
What would need to happen within XArray to support this?