-
Notifications
You must be signed in to change notification settings - Fork 265
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Filter error: bad id or parameters or duplicate filter #1205
Comments
unfortunately you can't use compression on variable-length string variables. |
It works fine for version 1.5.8 though, so this seems like a regression given that previously working code is now broken. Can you clarify whether this is expected? |
The HDF5 library does not support compression of variable length datatypes. The lib may not have returned an error code in earlier versions, but I don't think any compression was actually done. See discusson at https://forum.hdfgroup.org/t/compression-in-variable-length-datasets-not-working/1276 |
That discussion seems to say that the filters are properly applied to variable length datatypes, but that only the pointers are compressed which doesn't result in appreciable compression gains. I would argue that is still preferable to having working code get broken by an update to the library. We have code that tries to apply zlib compression to all variables it creates, and it worked fine until the update. It's straightforward for us to work around since we know the issue, but this library has many users and having a breaking change on update that could otherwise be avoided (by updating the library code) seems undesired. Perhaps this can be changed to warning instead (or otherwise patched) so that existing code isn't broken? Is this something that I should try to address with the C library instead? |
It's a netcdf-c lib issue - at some point it started to return a non-zero error code. We could just ignore the error code, but I'm not sure the variable would be correctly initialized. |
Here is the relevant netcdf-c PR: Unidata/netcdf-c#2231. This went into version 4.9.0. |
Thanks for the additional info! Given the upstream PR, it seems like this would be a wonfix upstream. Would you be willing to entertain a PR implementing the suggestion I made above about changing the This would allow old code to continue working as it was even though upstream has made the breaking change. |
There are other types that compression will not work for besides VLTypes - seems like if this warning were added it should be more general. |
I see the following: From the above, it seemed like VLType and str were the variable length types but if other types should be included that seems reasonable to me. Which other (variable-length) types should be checked for? And to confirm, does the general idea seem sound and would you be open to a PR implementing this? |
Yes, I would be open to a PR. Have to look at the HDF5 docs to see what datatypes support compression. Certainly all the primitive datatypes, not sure about Compound or Enum. |
I'm not sure what you mean here. I think all the variable data types support compression, but the error only pops up for variable length types. The error is thrown in libdispatch/dfilter.c and the function which determines if the variable is variable length is There we see that variable length fields are either strings, VLType, or Compound types which themselves contain either a string or VLType. You can also see there that ENUM and the primitive datatypes (i.e. |
OK - I wasn't sure if Enum and Compound types supported compression but I guess they do. So you just need to check for variable-length types. |
Thanks for the feedback! There is now a discussion on this over at Unidata/netcdf-c#2554 and it would be cleaner to take this from upstream if it were possible so I'll try to follow up there and update here on resolution. |
The discussion resulted in a PR upstream that addresses this issue, although there is still some discussion on further improvements. Given the fix upstream, I think this may be safe to close, though several versions of the C library when used with this python wrapper will still generate the reported error message, so a fix here might still be independently useful for users on those versions. I'm still happy to create a PR here implementing what was proposed in #1205 (comment) if it would be helpful, but otherwise it seems like we can close this directly. |
Avoid passing filters to variable-length(vlen) datatypes. Refer: - Unidata/netcdf4-python#1205 - Unidata/netcdf-c#2554 - Unidata/netcdf-c#2231
To report a non-security related issue, please provide:
the version of the software with which you are encountering an issue
Version 1.6.1
environmental information (i.e. Operating System, compiler info, java version, python version, etc.)
Python 3.9.13 on Linux x86_64
a description of the issue with the steps needed to reproduce it
When attempting to use zlib compression on str fields, an error is generated rather than working as expected:
Test case:
Setting compression='zlib' instead of zlib=True has the same effect.
The text was updated successfully, but these errors were encountered: