Allow appending non-numerical types to zarr arrays. #3480

Closed
amatsukawa opened this issue Nov 2, 2019 · 0 comments · Fixed by #3504

amatsukawa commented Nov 2, 2019

MCVE Code Sample

Zarr itself allows appending np.datetime64 and np.bool_ arrays:

>>> import numpy as np
>>> import zarr
>>> path = 'tmp/test.zarr'
>>> z1 = zarr.open(path, mode='w', shape=(10,), chunks=(10,), dtype='M8[D]')
>>> z1[:] = '1990-01-01'
>>> z2 = zarr.open(path, mode='a')
>>> a = np.array(['1992-01-01'] * 10, dtype='datetime64[D]')
>>> z2.append(a)
(20,)
>>> z2
<zarr.core.Array (20,) datetime64[D]>

But the equivalent append in xarray raises an error:

>>> import xarray as xr
>>> ds = xr.Dataset(
...     {'y': (('x',), np.array(['1991-01-01'] * 10, dtype='datetime64[D]'))}
... )
>>> ds.to_zarr('tmp/test_xr.zarr', mode='w')
<xarray.backends.zarr.ZarrStore object at 0x31f403170>
>>> ds2 = xr.Dataset(
...     {'y': (('x',), np.array(['1992-01-01'] * 10, dtype='datetime64[D]'))}
... )
>>> ds2.to_zarr('tmp/test_xr.zarr', mode='a', append_dim='x')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/personal/opt/anaconda3/lib/python3.7/site-packages/xarray/core/dataset.py", line 1616, in to_zarr
    append_dim=append_dim,
  File "/Users/personal/opt/anaconda3/lib/python3.7/site-packages/xarray/backends/api.py", line 1304, in to_zarr
    _validate_datatypes_for_zarr_append(dataset)
  File "/Users/personal/opt/anaconda3/lib/python3.7/site-packages/xarray/backends/api.py", line 1249, in _validate_datatypes_for_zarr_append
    check_dtype(k)
  File "/Users/personal/opt/anaconda3/lib/python3.7/site-packages/xarray/backends/api.py", line 1245, in check_dtype
    "unicode string or an object".format(var)
ValueError: Invalid dtype for data variable: <xarray.DataArray 'y' (x: 10)>
array(['1992-01-01T00:00:00.000000000', '1992-01-01T00:00:00.000000000',
       '1992-01-01T00:00:00.000000000', '1992-01-01T00:00:00.000000000',
       '1992-01-01T00:00:00.000000000', '1992-01-01T00:00:00.000000000',
       '1992-01-01T00:00:00.000000000', '1992-01-01T00:00:00.000000000',
       '1992-01-01T00:00:00.000000000', '1992-01-01T00:00:00.000000000'],
      dtype='datetime64[ns]')
Dimensions without coordinates: x dtype must be a subtype of number, a fixed sized string, a fixed size unicode string or an object

Expected Output

The append should succeed, as it does when using zarr directly.

Problem Description

This function in xarray/backends/api.py is too strict about dtypes:

import numpy as np
from xarray import coding  # api.py uses xarray's own coding package


def _validate_datatypes_for_zarr_append(dataset):
    """DataArray.name and Dataset keys must be a string or None"""

    def check_dtype(var):
        # Only numeric, unicode and object dtypes pass; datetime64 and bool fail
        if (
            not np.issubdtype(var.dtype, np.number)
            and not coding.strings.is_unicode_dtype(var.dtype)
            and not var.dtype == object
        ):
            # and not re.match('^bytes[1-9]+$', var.dtype.name)):
            raise ValueError(
                "Invalid dtype for data variable: {} "
                "dtype must be a subtype of number, "
                "a fixed sized string, a fixed size "
                "unicode string or an object".format(var)
            )

    for k in dataset.data_vars.values():
        check_dtype(k)

np.datetime64 (with any unit) and np.bool_ are not subtypes of np.number:

>>> np.issubdtype(np.dtype('datetime64[D]'), np.number)
False
>>> np.issubdtype(np.dtype('bool'), np.number)
False
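The check could be relaxed by also accepting datetime64 and bool dtypes. Below is a minimal sketch of that idea, not the change merged in #3504. `is_appendable_dtype` is a hypothetical helper that takes a dtype instead of a DataArray, and it keeps the original behavior for everything else (fixed-size bytes still fail, since the bytes test in the function above is commented out).

```python
import numpy as np

# Hypothetical: the dtypes this issue asks to accept, on top of the original check.
_EXTRA_ALLOWED = (np.datetime64, np.bool_)


def is_appendable_dtype(dtype) -> bool:
    """Return True if `dtype` should be allowed when appending to a zarr store."""
    dtype = np.dtype(dtype)
    return (
        np.issubdtype(dtype, np.number)  # ints, floats, complex (timedelta64 included)
        or np.issubdtype(dtype, np.str_)  # fixed-size unicode strings
        or dtype == object  # object arrays
        or any(np.issubdtype(dtype, t) for t in _EXTRA_ALLOWED)  # datetime64, bool
    )


if __name__ == "__main__":
    for dt in ["float64", "datetime64[D]", "datetime64[ns]", "bool", "<U10", object, "S5"]:
        # Only the bytes dtype 'S5' prints False
        print(np.dtype(dt), is_appendable_dtype(dt))
```

Inside `_validate_datatypes_for_zarr_append`, `check_dtype` would then raise only when `not is_appendable_dtype(var.dtype)`.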

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.4 (default, Aug 13 2019, 15:17:50) [Clang 4.0.1 (tags/RELEASE_401/final)]
python-bits: 64
OS: Darwin
OS-release: 18.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: None

xarray: 0.14.0
pandas: 0.25.1
numpy: 1.17.2
scipy: 1.3.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: 2.9.0
Nio: None
zarr: 2.3.2
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.5.2
distributed: 2.5.2
matplotlib: 3.1.1
cartopy: None
seaborn: 0.9.0
numbagg: None
setuptools: 41.4.0
pip: 19.2.3
conda: 4.7.12
pytest: 5.2.1
IPython: 7.8.0
sphinx: 2.2.0
