Hi, I'm trying to encode my xarray Dataset with the LZ4 compression filter while saving it to an HDF5 file with the to_netcdf function. I'm using the h5netcdf engine, and I've installed the hdf5plugin package, hoping this would expose the LZ4 filter option. So I (very naively) tried the following:

    ds.to_netcdf('my_filename.hdf5', engine='h5netcdf', encoding={'my_variable': {'compression': 'lz4'}})
    ds.to_netcdf('my_filename.hdf5', engine='h5netcdf', encoding={'my_variable': {"lz4": True}})

But this backfires:

    # With {"lz4": True}, xarray rejects the unknown encoding key
    ValueError: unexpected encoding parameters for 'h5netcdf' backend: ['lz4']. Valid encodings are: {'fletcher32', 'dtype', '_FillValue', 'compression', 'contiguous', 'shuffle', 'chunksizes', 'compression_opts', 'complevel', 'zlib'}
    # With {'compression': 'lz4'}, notice the data is passed on to h5py to save
    File "/_hl/filters.py", line 185, in fill_dcpl
        raise ValueError('Compression filter "%s" is unavailable' % compression)
    ValueError: Compression filter "lz4" is unavailable

Reading the xarray documentation, I have the impression it should be possible:
So my question:
I found an answer, in case this is helpful to someone else. Apparently hdf5plugin names the filter options differently from the standard h5py compression options. The h5py backend accepts two keywords, compression and compression_opts. Here is a snippet from the hdf5plugin documentation that uses h5py directly:

    import h5py
    import hdf5plugin
    import numpy

    # Compression
    f = h5py.File('test.h5', 'w')
    f.create_dataset('data', data=numpy.arange(100), **hdf5plugin.LZ4())
    f.close()

Notice that compression and compression_opts are not passed explicitly. Instead, the hdf5plugin.LZ4() instance is unpacked into keyword arguments (using **), so it must behave like a dictionary:

    >>> dict(hdf5plugin.LZ4())
    {'compression': 32004, 'compression_opts': (0,)}

Notice the compression "name" is not 'LZ4' but the numeric filter ID 32004. So to use the hdf5plugin filters from xarray, pass both options in the per-variable encoding dictionary of ds.to_netcdf:

    ds.to_netcdf('my_file.hdf5', engine='h5netcdf',
                 encoding={'my_variable': {'compression': hdf5plugin.LZ4()['compression'],
                                           'compression_opts': hdf5plugin.LZ4()['compression_opts']}})

Or, more concisely:

    ds.to_netcdf('my_file.hdf5', engine='h5netcdf', encoding={'my_variable': dict(hdf5plugin.LZ4())})