Xarray already has `unstack(sparse=True)`, which is quite awesome.
However, in many cases it is costly to convert a very dense array (existing values >> missing values) to a sparse representation. Also, many calculations require converting the sparse array back into a dense array and manually masking the missing values (e.g. Keras).
Logically, a sparse array is equal to a masked dense array.
They only differ in their internal data representation.
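To illustrate the equivalence, here is a minimal sketch using only NumPy (not xarray or the `sparse` package). The array contents and variable names are made up for illustration. It stores the same data once as a dense array with a boolean mask and once as a COO-style list of coordinates and values, then rebuilds the masked form from the sparse form.

```python
import numpy as np

# Dense values plus a boolean mask (True marks a missing entry).
values = np.array([[1, 2, 0], [0, 5, 6]])
missing = np.array([[False, False, True], [True, False, False]])
masked = np.ma.masked_array(values, mask=missing)

# COO-style sparse form of the same data: coordinates and values
# of the entries that are actually present.
coords = np.argwhere(~missing)  # shape (n_present, 2)
data = values[~missing]         # 1-D array of the present values

# Rebuild the masked dense array from the sparse form.
rebuilt = np.ma.masked_all(values.shape, dtype=values.dtype)
rebuilt[tuple(coords.T)] = data  # assigning a value unmasks that entry

print(masked.dtype == rebuilt.dtype)  # the integer dtype is kept in both forms
print(np.array_equal(np.ma.getmaskarray(rebuilt), missing))
```

Both forms carry the same information; for a mostly dense array the masked form avoids storing one coordinate tuple per present value.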
Therefore, I propose a `masked=True` option for all operations that can create missing values. These include (among others):
- `.unstack([...], masked=True)`
- `.where(<multi-dimensional array>, masked=True)`
- `.align([...], masked=True)`
This would solve a number of problems:
- No more conversion of int -> float
- An explicit value for missingness
- When stacking data with missing values, the missing values can simply be dropped
- When converting data with missing values to a DataFrame, the missing values can simply be dropped
MCVE Code Sample
An example would be outer joins with slightly different coordinates (taken from the documentation):
While searching for issues related to #1887 I came across this and see no one replied.
More support for missing values would be great; they are a constant source of complications. That said, this proposal would likely require a lot of work to implement and would add some complications to the API.
Considering it alongside other approaches to missing-value support in the community (e.g. pandas' Int type) would give it more context.
Non-masked outer join:
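A minimal sketch of the current (non-masked) behaviour, assuming two small integer arrays whose `lat` coordinates differ slightly, in the style of the `xarray.align` documentation. The values are illustrative and not necessarily those of the original report.

```python
import xarray as xr

# Two integer arrays whose "lat" coordinates differ in one label.
x = xr.DataArray(
    [[25, 35], [10, 24]],
    dims=("lat", "lon"),
    coords={"lat": [35.0, 40.0], "lon": [100.0, 120.0]},
)
y = xr.DataArray(
    [[20, 5], [7, 13]],
    dims=("lat", "lon"),
    coords={"lat": [35.0, 42.0], "lon": [100.0, 120.0]},
)

# The outer join takes the union of the coordinates; positions that exist
# in only one input are filled with NaN, which forces a float dtype.
a, b = xr.align(x, y, join="outer")

print(x.dtype, "->", a.dtype)
print(a)
print(b)
```

The integer inputs come back as floating-point arrays because NaN is the only available marker for the newly created gaps.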
The masked version:
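The `masked=True` option is only proposed here and does not exist in xarray, so the following is a rough emulation of what it might return, not an implementation. It assumes the existing `fill_value` argument of `xr.align` and wraps the result in a `numpy.ma` masked array; the helper arrays `x_valid` and `y_valid` are hypothetical names introduced for illustration.

```python
import numpy as np
import xarray as xr

x = xr.DataArray(
    [[25, 35], [10, 24]],
    dims=("lat", "lon"),
    coords={"lat": [35.0, 40.0], "lon": [100.0, 120.0]},
)
y = xr.DataArray(
    [[20, 5], [7, 13]],
    dims=("lat", "lon"),
    coords={"lat": [35.0, 42.0], "lon": [100.0, 120.0]},
)

# Align the data with an integer placeholder so the dtype is not promoted.
x_filled, y_filled = xr.align(x, y, join="outer", fill_value=0)

# Align boolean "this entry exists" arrays the same way; gaps become False.
x_valid, y_valid = xr.align(
    xr.ones_like(x, dtype=bool),
    xr.ones_like(y, dtype=bool),
    join="outer",
    fill_value=False,
)

# Combine data and validity into masked arrays (mask=True means missing).
x_masked = np.ma.masked_array(x_filled.values, mask=~x_valid.values)
y_masked = np.ma.masked_array(y_filled.values, mask=~y_valid.values)

print(x_masked.dtype)
print(x_masked)
print(y_masked)
```

In this emulation the data keeps its integer dtype and missingness is tracked explicitly in the mask, which is the behaviour the proposal asks xarray to provide directly.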
Related issue: #3955