-
Notifications
You must be signed in to change notification settings - Fork 262
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
need better handling of case when both szip and zlib are chosen for compression #1616
Comments
Interesting. With my multi filter code, both would be applied. |
Yes, I understand. When filters are used (other than the built-in HDF5 filters), there is little we can do or know about whether multiple filters are appropriate and correct. However, in this case, we can know that, because zlib and szip are built-in HDF5 filters, and like the shuffle and fletcher32 filters, we have direct knowledge of what is happening in the libsrc4/libhdf5 code. |
It is not obvious to me that we should be making an exception that |
Hopefully we can get some outside opinions on this. |
I ran into another problem, which has to do with the special way that szip is handled. Also I had an incredibly long telecon all day and didn't get much chance to program. So I thought I would give it a good think and try again tomorrow. It's not clear whether H5Pset_szip() will work with H5Pset_deflate() the way that two filters will work together, although we might think they should. I will do some HDF5 test code to check out that interaction. |
It would not surprise me if combining any two compressors |
I have written a test in tst_h_vars.c (soon to be put up in PR) which determines what HDF5 does when both szip and zlib are applied. Unfortunately, the results are inconsistent based on order. ;-) If H5Pset_deflate() is called first, the subsequent call to H5Pset_szip() returns success but takes no effect. Only deflate filter is applied to the data. But if H5Pset_szip() is called first, then both szip and deflate filters are applied. ;-) I have submitted a Jira issue on the HDF5 support site for this. Presumably this is a bug. I believe what we should expect is that both filters are applied. And this should work fine (though it's pointless extra computation to doubly compress, and then uncompress the data). I don't think we can generally return an error when two compression filters are used, since we don't know what a filter does, in general. (We could support a list of known filters, as a partial solution.) Also I am not prepared to say that it's never appropriate to use two different compression filters together. Maybe there are some compression filters that would work well in combination - I don't know. @czender you have studied these filters - do you have any input? But we can forbid the use of zlib and szip together. Uniquely, those are built-in compression filters, so we know what they are. But should we? At the moment, zlib always wins for netcdf-c. If zlib is selected then H5Pset_deflate() is called, and the szlip settings are ignored. Options: I would go with 1, except it seems dumb in light of the fact that this will probably always be a user mistake. |
OK, I have code that causes nc_def_var_deflate() to return NC_EINVAL if nc_def_var_szip() has already been called on the var, and vice versa. If anyone can come up with a good reason to use both filters at once, and if HDF5 fixes the inconsistent behavior, I'm happy to change this code to allow both szip and zlib. But I believe no one really wants to do that. PR up shortly. |
If the user tries to turn on both zlib and szip compression, zlib silently wins.
But instead, each should check if the other has already been selected, and return an error.
So if nc_def_var_szip() has been called on a var, then a future call to nc_def_var_deflate() should return an error, and vice versa. This should also be documented in both functions.
The text was updated successfully, but these errors were encountered: