Very frequent segfaults with the new netCDF4=1.6.1
#1192
Looks like you are using conda-forge's netcdf4. Maybe open an issue at https://github.com/conda-forge/netcdf4-feedstock instead. PS: could you also test the wheels just to be sure they are OK?
@ocefpaf good call, mate! Will do so, cheers 🍺
We're also having issues on yt with the Windows wheels for version 1.6.1. Namely, h5py is raising a warning at import. See yt-project/yt#4128
Same with us - see SciTools/iris#4968. Sometimes it manifests as segfaults, sometimes as crashed GHA workers (maybe a segfault underneath). @ocefpaf I've confirmed the same problems appear when installing from PyPI OR from conda-forge.
Many thanks, I was about to test the PyPI version - cheers for testing, that saves me some lunch time 😁
@trexfeathers what platforms are failing when you tested the PyPI wheels? I'm particularly interested in the Windows wheels for 1.6.1 b/c those are built in a different way now.
GHA's
@ocefpaf to clarify, on yt we're testing with PyPI wheels for all three major platforms, and we're only seeing issues on Windows.
ah, I realize I've not clarified it myself: we see segfaults from a conda-forge install on both ubuntu-latest and OSX-latest on GHA, and on ubuntu on CircleCI (off conda-forge as well); no Windows testing for us, since we've not been able to get a working install of our packages there either, snif but not snif 😁
So here's a pretty interesting case study that may lead to fixing this current issue - and a rather seldom-recurring issue @agstephens and I have noticed in the past, with older (and stable, bullet-proof) versions of netCDF4:
My colleague Ag noticed the same behaviour, way back, on very few occasions: an HDF segfault on a certain file would automagically disappear if we moved the file out of and back into its location. We blamed it on the filesystem back then, but thinking in retrospect, could it be the same issue here?
However, the other platforms are failing in other CIs, so this is quite confusing and we'll need the reports here to help us sort this out. Edit: @neutrinoceros your report upstream is about h5py and not netcdf4, right? xref: yt-project/yt#4128
We're only seeing a warning, and yes, it's triggered from h5py. I'm assuming it's the same underlying issue, but that's a wild guess.
Most likely not. Folks here are experiencing segfaults with the latest netcdf4-python. The h5py warning in your CI is just b/c one version of hdf5 was used to build but another one is used to run. In my experience that is OK in 99.99% of cases.
Should I file another issue?
Probably not. It'll be closed b/c it is a known warning that is mostly harmless.
It's not clear to me whether this is an issue with all the wheels for 1.6.1, or just the Windows wheels. It's hard to see why the linux and macosx wheels would be a problem, since they are built exactly the same way as they were for 1.6.0. The most significant code change in netcdf4-python in 1.6.1 is PR #1181, but I don't see how this could cause segfaults.
- Pin older version of netCDF4 to avoid Unidata/netcdf4-python#1192 - See ioos#268 for discussion
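For anyone applying the same workaround, a minimal sketch of such a pin (the exact constraint spelling is an assumption; adapt it to your requirements or conda environment file):

```
# requirements.txt (hypothetical) -- skip the affected release
netCDF4!=1.6.1
```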
I want to share here a workaround I've been using to deal with the netcdf4 python package issue for my projects. After installing all other dependencies, I reinstall netcdf4-python from source with the following (this has solved my issues):
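The commenter's exact command isn't shown here; a plausible shape for such a reinstall, assuming a pip-based environment, is to force a source build so the wheel's bundled binaries are bypassed:

```bash
# Rebuild netCDF4 from source against the locally installed
# HDF5/netcdf-c libraries instead of pulling the prebuilt wheel.
# This is an assumed reconstruction, not the commenter's verbatim command.
pip install --force-reinstall --no-binary netCDF4 netCDF4
```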
Echoing @jswhit, I mentioned in another issue that I don't think the problem is the code, but the wheel-building process, since installing from source works perfectly fine. In any case, this is a really mysterious problem!
@Zeitsperre are you having a problem with Windows wheels only, or also the linux and macosx wheels?
@jswhit would #1192 (comment) give you some sort of a clue as to what might trigger those intermittent but rather frequent segfaults? It's a bit black magic to me at the moment 😁
hey guys, it appears @Zeitsperre is correct and this whole segfaulting issue is happening because of some installation problem: I went the conda-forge way and did a couple of black-box tests, see below
Could it be that the conda compilers are not preserving the right flags or compilation order for you, specifically for 1.6.1? I know of (extreme) cases where people need to compile numpy themselves, since the conda-forge supplied version gives them headaches due to numerical precision deltas from version to version, but that's normal(ish). Anyway, here are my test results:
I'm only testing on Linux systems, so nothing for me to report on Windows or macOS.
Are you using wheels from PyPI or conda to install?
Folks, please - everyone that is using the package from conda-forge, post your issues and comments in conda-forge/netcdf4-feedstock#141 and not here. Let's help out with the triage so we can solve this!
cheers @ocefpaf - I'll link my comment above with the test results to the feedstock issue, good point! I am still not 100% sure whether it's just conda, or PyPI installations, or the code itself that's causing this; that's why I was primarily posting guff here, so the experts may be able to get some clues 🍺
The PyPI wheels have not been working for me, but the conda binaries have been fine for me on Linux.
@valeriupredoi reported at conda-forge/netcdf4-feedstock#141 that his segfaults were all related to the use of file caching, and if the file is read directly from disk the segfaults go away. Are others experiencing segfaults also using some sort of caching of the files?
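For concreteness, a minimal sketch contrasting a plain on-disk open with netCDF4-python's in-memory (`memory=`) read path; the file name is hypothetical, and whether a given caching layer uses this exact mechanism is an assumption:

```python
import netCDF4

# Open straight from disk -- the access pattern reported NOT to segfault.
with netCDF4.Dataset("example.nc", mode="r") as ds:
    print(list(ds.variables))

# Read the bytes first and open them in memory -- one kind of "cached"
# access. The memory= keyword is netCDF4's in-memory read API (requires
# a netcdf-c built with in-memory support); the name passed alongside
# it is only a label, not a real path.
with open("example.nc", "rb") as f:
    buf = f.read()
with netCDF4.Dataset("inmemory.nc", mode="r", memory=buf) as ds:
    print(list(ds.variables))
```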
From the discussion at conda-forge/netcdf4-feedstock#141, it looks like at least some of the segfaults are related to using netcdf4-python within threads. netcdf-c is not thread-safe, and releasing the GIL on all netcdf-c calls (introduced in 1.6.1) has increased the probability of segfaults when threads are used.
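To illustrate the failure mode being described, a minimal sketch of the kind of multi-threaded access pattern that becomes risky once the GIL no longer serializes the netcdf-c calls (file and variable names are hypothetical, and a crash is probabilistic, not guaranteed):

```python
import threading
import netCDF4

def read_var(path, varname):
    # Each thread opens and reads the file independently. With the GIL
    # released around netcdf-c calls (netCDF4 >= 1.6.1), these calls can
    # now genuinely overlap inside the non-thread-safe C library.
    with netCDF4.Dataset(path, mode="r") as ds:
        _ = ds.variables[varname][:]

threads = [
    threading.Thread(target=read_var, args=("example.nc", "temperature"))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```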
Folks using dask can try

```python
import dask
dask.config.set(scheduler="single-threaded")
```

instead of pinning to an older netCDF4.
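If the setting should only cover a specific computation, `dask.config.set` also works as a context manager; a short sketch (the array here is hypothetical - the point is that netcdf-c is never entered from multiple threads at once):

```python
import dask
import dask.array as da

x = da.ones((1000, 1000), chunks=(100, 100))

# Serialize just this compute(); outside the block, the default
# (threaded) scheduler is restored.
with dask.config.set(scheduler="single-threaded"):
    total = x.sum().compute()
```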
There is an experimental PR in netcdf-c that makes the C library threadsafe. This should fix many (all?) of the problems reported here, but won't be available in a released version for some time.
@trexfeathers speaking on behalf of netcdf-C: yes. It is unclear when a threadsafe version of netcdf-c will be released.
Is there a plan to revert the GIL-freeing changes? I think I'm getting bitten by this on my Linux systems.
See Unidata/netcdf4-python#1192 (comment). Hopefully able to fix the following weird multiple warnings in homepage notebook 3:

```
HDF5-DIAG: Error detected in HDF5 (1.12.2) thread 1:
  #000: H5A.c line 528 in H5Aopen_by_name(): can't open attribute
    major: Attribute
    minor: Can't open object
  #001: H5VLcallback.c line 1091 in H5VL_attr_open(): attribute open failed
    major: Virtual Object Layer
    minor: Can't open object
  #002: H5VLcallback.c line 1058 in H5VL__attr_open(): attribute open failed
    major: Virtual Object Layer
    minor: Can't open object
  #003: H5VLnative_attr.c line 130 in H5VL__native_attr_open(): can't open attribute
    major: Attribute
    minor: Can't open object
  #004: H5Aint.c line 545 in H5A__open_by_name(): unable to load attribute info from object header
    major: Attribute
    minor: Unable to initialize object
  #005: H5Oattribute.c line 494 in H5O__attr_open_by_name(): can't locate attribute: '_QuantizeBitGroomNumberOfSignificantDigits'
    major: Attribute
    minor: Object not found
```
Heads up guys, we are seeing some very frequent segfaults in our CI when we have the new, hours-old netCDF4=1.6.1 in our environment. It's most probably due to it, since HDF5 has been at 1.12.2 for a while now (more than a month), and with netCDF4=1.6.0 all works fine (with other packages staying at the same version and hash point). Apologies if this proves to be due to a different package, but better safe than sorry in terms of a forewarning. Cheers muchly 🍺