Allow track_order to be passed to h5netcdf #7680

Open
abunimeh opened this issue Mar 26, 2023 · 3 comments

@abunimeh

Is your feature request related to a problem?

When using h5netcdf as a backend, writing the exact same content to two different files produces two different MD5 checksums, even though the two xarray datasets are identical.

See h5netcdf/h5netcdf#211
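
One way to check for the mismatch described above is to hash both output files and compare the digests. The following is a minimal sketch using only the standard library; the helper names `file_md5` and `same_checksum` are hypothetical and not part of xarray or h5netcdf.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_checksum(path_a, path_b):
    """Return True if both files have identical MD5 digests."""
    return file_md5(path_a) == file_md5(path_b)


# Intended usage (assumes xarray and h5netcdf are installed):
#   ds.to_netcdf("a.nc", engine="h5netcdf")
#   ds.to_netcdf("b.nc", engine="h5netcdf")
#   same_checksum("a.nc", "b.nc")  # this issue reports the digests differ
```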

Describe the solution you'd like

When saving a netCDF file, allow track_order=False to be passed as an argument.

Describe alternatives you've considered

Using the netcdf4 engine.

Additional context

No response

@jhamman
Member

jhamman commented Mar 27, 2023

@abunimeh - Thanks for opening this issue. Can you expand on the feature a bit more? What API would you like to see? ds.to_netcdf(..., track_order=False)?

I suspect this will need to be treated like invalid_netcdf as it will only apply to the h5netcdf backend:

xarray/xarray/core/dataset.py

Lines 1892 to 1895 in 86f3f21

invalid_netcdf: bool, default: False
Only valid along with ``engine="h5netcdf"``. If True, allow writing
hdf5 files which are invalid netcdf as described in
https://github.com/h5netcdf/h5netcdf.

Note: it would be nice if we had backend_kwargs on to_netcdf, since the options that scipy/netcdf4/h5netcdf support are increasingly different.
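
To illustrate the idea of engine-specific options, here is a rough sketch of how a backend_kwargs argument might be validated before being forwarded to a backend. This is not xarray's implementation: `ENGINE_ONLY_OPTIONS` and `check_backend_kwargs` are hypothetical names, and the per-engine option sets are assumptions based only on the options mentioned in this thread.

```python
# Hypothetical table of write options that only some engines understand.
# The entries are assumptions taken from this discussion, not an xarray API.
ENGINE_ONLY_OPTIONS = {
    "h5netcdf": {"invalid_netcdf", "track_order"},
    "netcdf4": set(),
    "scipy": set(),
}


def check_backend_kwargs(engine, backend_kwargs=None):
    """Return a copy of backend_kwargs, or raise if the engine cannot use them."""
    backend_kwargs = dict(backend_kwargs or {})
    allowed = ENGINE_ONLY_OPTIONS.get(engine, set())
    unsupported = sorted(set(backend_kwargs) - allowed)
    if unsupported:
        raise ValueError(
            f"options {unsupported} are not supported by engine={engine!r}"
        )
    return backend_kwargs
```

With such a check, passing track_order=False would be accepted for engine="h5netcdf" and rejected with a clear error for the other engines, which is similar to how invalid_netcdf is documented as only valid with h5netcdf.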

@kmuehlbauer
Contributor

kmuehlbauer commented Mar 27, 2023

First, I totally agree with @jhamman having backend_kwargs on to_netcdf.

For this particular use case, netcdf-c/netCDF4-python create HDF5 files (NETCDF4 format) with track order enabled, as required; see https://docs.unidata.ucar.edu/netcdf-c/current/file_format_specifications.html#creation_order.

h5netcdf has used track_order=True as the default since version 1.1.0. There have been (and still are, see HDFGroup/hdf5#1388) some corner-case issues upstream that netcdf-c can somehow circumvent but h5netcdf can't. Nevertheless, to be compliant with netcdf-c, track_order=True is the default for h5netcdf.

@abunimeh As a workaround until this is sorted out you could create the file (or subgroup) using h5py/h5netcdf with track_order=False. If a file (root-group) or sub-group in a file is created with track_order=False this will be persistent as it is set at group-define time. Then you can use to_netcdf as usual with mode="a" to append.

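import xarray as xr
import h5netcdf
from time import sleep

ds = xr.Dataset(data_vars=dict(hello=(["x"], [1., 1., 1., 1., 1.])))

track_order = False
group = "/track"

# Create the file (and the sub-group, if one is named) with track_order=False.
# The setting is fixed at group-define time and persists afterwards.
with h5netcdf.File("sample1.nc", "a", track_order=track_order) as f1:
    if group.split("/")[-1]:
        f1.create_group(group)

# Append the dataset into the pre-created group.
ds.to_netcdf("sample1.nc", mode="a", engine="h5netcdf", group=group)
# Wait so that the second file is written at a different time.
sleep(5)

# Repeat the same steps for a second file so the two can be compared.
with h5netcdf.File("sample2.nc", "a", track_order=track_order) as f2:
    if group.split("/")[-1]:
        f2.create_group(group)

ds.to_netcdf("sample2.nc", mode="a", engine="h5netcdf", group=group)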

Update: Use mode="a" everywhere.
Update 2: Caveat: You will not be able to append to this file with netcdf-c/netCDF4-python ever again.

@abunimeh
Author

Thanks @kmuehlbauer for explaining this.

@jhamman yes, I was hoping to be able to pass ds.to_netcdf(..., track_order=False) when the engine is h5netcdf.

It would be nice to enhance backend_kwargs.
