Questions on compression #154
@zklaus I never used anything besides zlib, so I'm not an expert here. My guess is that maybe we should add those as hdf5 dependencies for netcdf-c to pick them up? Not sure. However, I'd prefer if we could get an error for an unavailable compression option rather than silent no-compression. This may be an upstream issue, though.
Thing is,
Yep. That is why I think they should be in hdf5, at least for the netcdf4 format; maybe you can use them to compress netcdf-classic. Again, not sure, I did not test this. Just speculating. Let's ping an expert here (@dopplershift) for help.
I'm foggy on whether this needs support from HDF5. @WardF @DennisHeimbigner, can you shed some light?
Zstandard requires libzstandard (libzstd?) to be installed on the system. There should be a bundled bz2 implementation to fall back on when one is not present on the system. Let me dig into this.
Ah, I think I understand the question better now. Give me a few to get in front of a keyboard, instead of the GitHub app on my phone.
We do provide an internal implementation of bzip2, primarily for testing purposes.
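As a quick sanity check for the libzstd requirement mentioned above, you can ask the dynamic linker whether a system zstd library is visible at all, which is what the filter ultimately needs at run time. This is a stdlib-only sketch, not part of any netCDF tooling:

```python
import ctypes.util

# Ask the dynamic linker for a system zstd library. Returns a name
# like "libzstd.so.1" on Linux when present, or None when it is not.
found = ctypes.util.find_library("zstd")
print("libzstd found:", found)
```

If this prints `None`, no zstd filter built against the system library can work, regardless of how netcdf-c was configured.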
@DennisHeimbigner I think that's possible, but don't lose focus on the actual problem here: netcdf-c was compiled with support for bz2 and zstd from system packages,
Then the issue would appear to be that our zstd detector in configure.ac is not working correctly.
@DennisHeimbigner We did, but in that instance, the issue was that the library was not being detected even though it installs. In this case, it appears that the library is detected, but the results are unexpected.
I'm poking around, but don't see the
BTW, does this problem occur when using CMake or when using Automake?
I'm sorry, it seems we are in quite unfortunate relative time zones.

Conda-forge builds

I did a bit more poking myself and came away with the impression that we need not only libnetcdf proper and the various compression libraries, but also the corresponding plugins. They are part of libnetcdf, but the conda-forge build does not install them at the moment. I have opened conda-forge/libnetcdf-feedstock#172 to change that, and with the build from there (downloaded via the artifacts there and installed as a local package) things work as expected.

If it's true that we need the plugins, then the problem is the detection in the CMake file and the somehow overall bumpiness of the workflow: the HDF5 library has a default plugin dir, but also an environment variable; the relationship of the two is not so clear. I did not figure out a super elegant way to detect the default dir, opting in the end to extract it from the

So to cut a long ramble short: Is it correct that we need the plugins? If so, let's discuss in the libnetcdf feedstock how exactly we want to install them. For the blosc ones, it would probably be good to get an error similar to the other ones instead of a silent failure to compress; if you agree with that, we should open an issue upstream.
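The plugin-dir-versus-environment-variable relationship discussed above can be sketched as follows. This is a hypothetical helper, not part of HDF5 or netCDF; it only mimics the documented lookup order, where `HDF5_PLUGIN_PATH`, when set, replaces the built-in default directory (`/usr/local/hdf5/lib/plugin` on Unix):

```python
import os

# Built-in default HDF5 filter-plugin directory on Unix systems
DEFAULT_PLUGIN_DIR = "/usr/local/hdf5/lib/plugin"

def plugin_search_dirs(env=None):
    """Return the directories HDF5 would search for filter plugins.

    HDF5_PLUGIN_PATH (a pathsep-separated list) overrides the default
    directory entirely when it is set and non-empty.
    """
    env = os.environ if env is None else env
    path = env.get("HDF5_PLUGIN_PATH", "")
    dirs = [d for d in path.split(os.pathsep) if d]
    return dirs or [DEFAULT_PLUGIN_DIR]

print(plugin_search_dirs())
```

Checking whether the zstd filter's shared object actually exists in any of these directories is a reasonable first diagnostic when compression silently falls back to none.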
Our CMakeLists.txt file uses a module in cmake/modules to locate a number of libraries, including zstd.
To reference back to Dennis' earlier comment, it may also be that libzstd-dev needs to be installed. In my (frustrating, frustrated) experience, some systems package the necessary header files in
In theory, conda-forge should have all of those. We don't usually split packages like that.
Glad to hear that; I'm splitting attention between this and some reported
@zklaus I think it was glossed over in the responses, but I'm pretty sure you're correct that the problem is that the plugins aren't being installed. There are a variety of issues on the Unidata netcdf-c repository about the plugin directory, though the problem here sounds exactly like Unidata/netcdf-c#2294.
Thanks, @dopplershift. It does sound similar, though it seems to deal more with the autotools build, and maybe the filter isn't even built there? For me, there are three issues: silent non-compression on
Hopefully, @zklaus, you got your zstd problems resolved and were able to do your benchmarks. I would be very interested in the results.

(In my testing, I try to use real data rather than generated random numbers for compression. Random numbers, unless constrained in some way, will be all over the map and not very compressible. That doesn't match real science data, where, for example, a 4D field of atmospheric pressure will generally have numbers close to their neighbors, which is much more suitable for compression.)

I've just taken another swing at the plugin install situation in the CMake and autotools builds. The fact remains that you need to specify the plugin-dir configure/cmake option at configure time, and also set HDF5_PLUGIN_PATH at run time. I have added documentation about this, which will be part of the 4.9.3 release. Let me know if you have further troubles with this after the upcoming 4.9.3 release.

One open question at the moment is whether netcdf should remember this choice and notify HDF5 where your netCDF plugins are, without the need to set HDF5_PLUGIN_PATH...
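The point above about random versus real data is easy to demonstrate with nothing but the standard library. This is a toy sketch (plain zlib on packed float32 bytes, not the netCDF pipeline): a smooth, neighbor-correlated field deflates far better than noise of the same size:

```python
import random
import struct
import zlib

random.seed(0)
n = 10_000

# 10k float32 values: pure noise vs. a smooth field whose neighbors
# are close in value, like a physical quantity on a grid
noise = struct.pack(f"<{n}f", *(random.random() for _ in range(n)))
smooth = struct.pack(f"<{n}f", *(1000.0 + 0.01 * i for i in range(n)))

def ratio(raw):
    """Compression ratio achieved by zlib at level 6."""
    return len(raw) / len(zlib.compress(raw, 6))

print(f"noise:  {ratio(noise):.2f}x")
print(f"smooth: {ratio(smooth):.2f}x")
```

The exact numbers vary, but the smooth field consistently compresses much better, which is why benchmarks on unconstrained random data underestimate real-world gains.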
Thanks for getting back to me, @edwardhartnett. I no longer work in climate science (but rather more directly with conda-forge, conda, and other packaging-related things at Quansight), so I am afraid I won't have much more to contribute here. I do agree with you that real-world data is often far from random, and I have preferred actual data in my experiments as well.
Comment:

I wanted to play around with the new compression options in netCDF. For those to whom this means anything, the background is that I would like to write suggestions/requirements for chunking, quantization, and compression into the next Data Request for CMIP7. I expected to be able to do most of that with `netcdf4` alone, but I found some surprises.

I wrote this little program to do some tests. It creates some random data, chunks it somewhat reasonably, and stores it raw, quantized, and compressed with different compression methods. Running it in an environment created with `mamba create -n nc-comp-test-2 humanfriendly netCDF4 pandas`, only `zlib` and `szip` compression are available. I was notably surprised by the absence of `zstd` and `bzip2` compression. I could make those available by installing the `ccr` package, but I was under the impression that at least `zstd` should be available with `netcdf4` alone?

I also tried the two variants `blosc_zstd` and `blosc_zlib`, which both ran with no exception but didn't produce any compression at all. Here are some results from running the script:

With `ccr`:

So overall, my questions are:

- Should `zstd` compression work without `ccr`?
- Why don't the `blosc_*` compressions compress? Do I need to install some particular package to make that work?

PS: Of course, actual performance will depend on the nature of the data, but I'd like to make sure I understand how things should work technically.
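On the quantization side of the experiment above, the effect that netCDF's quantize feature exploits can be sketched in plain Python. This is a toy bit-rounding sketch, not the library's actual BitGroom/BitRound implementation: zeroing low-order mantissa bits costs a bounded relative error but makes the byte stream far more compressible:

```python
import struct
import zlib

def bitround(x: float, keepbits: int) -> float:
    """Keep sign, exponent, and the top `keepbits` mantissa bits of a
    float32, zeroing the rest. Relative error stays below 2**-keepbits.
    (Toy sketch only; netCDF's real algorithms round rather than truncate.)
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    mask = (0xFFFFFFFF << (23 - keepbits)) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits & mask))[0]

vals = [3.14159 + 0.001 * i for i in range(1000)]
raw = struct.pack("<1000f", *vals)
rounded = struct.pack("<1000f", *(bitround(v, 7) for v in vals))

# The zeroed mantissa tails deflate much better than the raw stream
print(len(zlib.compress(raw)), len(zlib.compress(rounded)))
```

This is why quantized-then-compressed variables in the benchmark should come out noticeably smaller than lossless compression alone, at the price of discarded precision.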