Check version of netcdf.h and library to ensure they match #1223
The only thing this defends against is LD_PRELOAD loading the wrong soname. Please don't do this as it's extremely inconvenient for Linux distributions and does not solve any known problem.
@DennisHeimbigner comments in #864 that the HDF5 error checking has caught version mismatches for him several times. I agree; that has also happened to me. As alluded to by @gsjaardema, HPC systems often do not give users much visibility into the LD_LIBRARY_PATH: users just load the netcdf module and the system sets something in LD_LIBRARY_PATH. Meanwhile, they use whatever netcdf.h they find, and it may or may not be correct. That is the reason for the check in HDF5. So @eli-schwartz, can I ask you to elaborate on why this makes it extremely inconvenient for Linux distributions?
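For readers unfamiliar with the HDF5 mechanism being referenced, here is a minimal sketch of the pattern: the header bakes its version into every application at compile time, and the library compares that against its own built-in version at run time. All names here (mylib_*, MYLIB_VERS_*) are hypothetical stand-ins, not the real HDF5 or netCDF API.

```c
#include <stdio.h>

/* --- constants the installed header (e.g. a netcdf.h) would provide --- */
#define MYLIB_VERS_MAJOR 4
#define MYLIB_VERS_MINOR 6

/* --- state the shared library itself would carry --- */
static const int lib_major = 4;
static const int lib_minor = 5;   /* pretend an older .so was loaded */

/* Returns 0 on a match, -1 on a mismatch.  Real HDF5 prints a fatal
 * warning and aborts here unless overridden by an environment variable. */
static int mylib_check_version(int hdr_major, int hdr_minor)
{
    if (hdr_major != lib_major || hdr_minor != lib_minor) {
        fprintf(stderr, "header (%d.%d) does not match library (%d.%d)\n",
                hdr_major, hdr_minor, lib_major, lib_minor);
        return -1;
    }
    return 0;
}
```

In this sketch, `mylib_check_version(MYLIB_VERS_MAJOR, MYLIB_VERS_MINOR)` fails because the header claims 4.6 while the loaded "library" is 4.5; that is exactly the situation the check is meant to surface on HPC systems with mismatched modules.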
For completeness, let me propose this:
This means that any time code is recompiled with the new netcdf header
Soname versioning works for every other piece of software known to mankind in all of computing history...

This is inconvenient for Linux distributions for the same reason it is inconvenient for regular users: rather than trusting the declared compatibility versioning, you hard-abort based on a completely different, unrelated versioning system, even when nothing is actually incompatible. The major.minor.patchlevel version for hdf5 is a superset of the times when actual ABI compatibility has been broken: it includes all soname changes, plus many more changes as well. However, due to the hard aborts driven by this entirely misplaced runtime check, the soname simply doesn't matter in the slightest, and matching sonames which should work fine nevertheless require completely recompiling the entire hdf5 stack, without exception.

Recompiling the world on every patchlevel release is pretty annoying; it requires involved, domain-specific knowledge to know when you cannot trust the soname versions, and it's difficult to check which software is compiled against a given patchlevel release, whereas there are trivial tools to check which soname something is compiled against. Then you end up with situations where some Linux distributions decide they don't want to play monkey games, and they patch out the check, like this: https://salsa.debian.org/debian-gis-team/hdf5/blob/master/debian/patches/relax-version-check.patch

If you don't trust sonames, why do you use them in the first place? If you do trust them but are worried about compile-time header mismatches, how does that translate into runtime checks? And either way, why not do compile-time checks against the thing which matters, that is, the ABI version also used by the soname?
I am not sure I understand.
Yes, it is possible, but that is exactly what soname versioning is for: forbidding a program from running against an incompatible library. If software is compiled against
Comment for Ward: what has been our previous experience with bumping the

Several thoughts:
If you have multiple versions of libhdf5.so being loaded by a single program, then you'll have multiple copies of the symbols, and the dynamic linker will hide some of the extra copies. IIRC it is an operating-system implementation detail whether the first-loaded or second-loaded library wins. Shortly after, your program will segfault or exhibit other signs of undefined behavior as it attempts to use symbols which it thinks are correct, in ways they were not meant to be used. This will happen regardless of your runtime checks, and in all probability before you ever try running the code containing the runtime checks -- but we're dealing with undefined behavior here, so who knows?

This is not a sane environment to be working in. It's not an allowed environment. It happens when your application is correctly linked against two dependent libraries, and each is correctly linked against libhdf5, but wrongly linked against different versions of libhdf5. Each dependent library is therefore part of an entirely different, exclusive software stack. If one is in your

I would not want to rely on a runtime check to catch this. If that is the concern, then the hdf5 version-checking code being held up as an example is wrong. It emits a fatal error claiming that the headers don't match the library and that you should recompile the application... but as far as I can tell, depending on which library was loaded first, the code which loads the data will work fine, while the code which processes it will silently load incompatible symbols and do who-knows-what to your data, without the warnings being effective at all.

... This goes far beyond mismatched headers.
As mentioned in #864 (comment), the use of symbol versioning would allow you to build part of your application against one version of hdf5 or netcdf, and part of your application against another version, and your application would still work.
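On ELF platforms, the symbol versioning mentioned here is typically implemented with a GNU ld version script. A minimal sketch, with a hypothetical node name and file name (not what the netCDF build actually ships):

```
# libnetcdf.map -- hypothetical GNU ld version script,
# passed to the linker as: -Wl,--version-script=libnetcdf.map
LIBNETCDF_4.6.1 {
  global:
    nc_open;
    nc_create;
  local:
    *;        # everything else stays private to the library
};
```

An application linked against this library binds to versioned symbols such as nc_open@LIBNETCDF_4.6.1; a later release can keep the old versioned symbol while exporting a changed one under a new node, which is what lets parts of a process built against different library generations coexist.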
@DennisHeimbigner I’m still on Thanksgiving travels so will weigh in more fully later. The soversioning is completely separate from the library versioning, and there are rigid rules for how major.minor.revision are incremented. I’ll post the link when I have the chance.
Also, we need to have a solution that works on Windows, OSX, and Cygwin.
I will catch up on this when I return to the office, but the ‘against’ case seems fairly straightforward. At a glance I’m not sure whether this proposed fix addresses a hypothetical or a demonstrated issue, but I will review and weigh in (the issue is probably explained in #864). SO versioning is robust, and anything that breaks or abandons it at this point would have to be very well justified. Enjoy your weekend and I’ll review tomorrow.
Cygwin works exactly the same way Linux works. OSX uses

Windows uses private dependencies of everything, and manifests which describe exactly where that dependency is located, so the chances of mysteriously loading multiple copies of the library (via other libraries pinned the same way but depending on other versions) and having the symbols overwrite each other are... low?
@eli-schwartz OK, I get your point. Checking versions in this way breaks one of the key features of shared libraries. We don't want to do that. |
Looks like eli gave all the arguments I would. Use the proper tool for the job. The hdf5 version check has been a pain in my side for the last 13 years that I've maintained the hdf5 package in Fedora. The thought of netcdf taking it up as well is just too much to bear.
So why did the HDF5 group do it? They must have had a specific reason for using
Maybe back in the 90s it seemed like a good idea. If you are not doing proper API/ABI management, it may seem like a reasonable thing to do. But with the availability of modern tools, it is completely the wrong thing to do.
OK, seems like the consensus is that this is not a good idea, because it makes shared libraries much less effective. I will close this ticket. |
Answer from hdfgroup: "Hi Dennis, I don't know why this was done to begin with."
OK, well I guess that drives a stake through the heart of this idea. ;-) |
I am breaking this discussion out of #864 so that it can be evaluated on its own merits (or lack of merit).
It has been proposed that we add a check at run-time to ensure that the version of the netcdf.h file matches the version of the library being used.
This checking is also done by HDF5.
The version check is proposed to be accomplished by making nc_open/nc_create into macros, which will call the real nc_open/nc_create with an added version parameter. This is explained well by @gsjaardema in #864.
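The macro mechanism described above can be sketched as follows. This is an illustration of the technique from #864, not the actual proposed patch; NC_HDR_VERSION, nc_open_ver, and the error value are hypothetical names.

```c
/* --- in the (hypothetical) public header --- */
#define NC_HDR_VERSION 413   /* would track the netcdf.h release */
#define nc_open(path, mode, ncidp) \
    nc_open_ver(NC_HDR_VERSION, (path), (mode), (ncidp))

/* --- in the (hypothetical) library --- */
static const int NC_LIB_VERSION = 413;

static int nc_open_ver(int hdr_version, const char *path, int mode, int *ncidp)
{
    (void)path; (void)mode; (void)ncidp;
    if (hdr_version != NC_LIB_VERSION)
        return -1;           /* a version-mismatch error code */
    /* ... the real open logic would go here ... */
    return 0;
}
```

Existing source that calls `nc_open("file.nc", 0, &ncid)` recompiles unchanged, but after recompilation every call site carries the header's version into the library, where it can be compared against the library's own version.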
@opoplawski objects to this feature. I invite him to list his reasons in this issue.
@gsjaardema uses a similar feature in his own exodus package. He thinks it saves hours of debugging when researchers get confused about which header file to use and what is in their LD_LIBRARY_PATH. Given the way that most supercomputers try to abstract the LD_LIBRARY_PATH away from users (instead of just explaining it), I can understand how this kind of confusion would arise.
It has been proposed that this feature is a necessary prerequisite for changing the size of the mode flag; however, that is less clear. Changing the API in any way is handled correctly by shared library builds, as long as we upgrade the so number correctly. However, if we are to have such a version check, I am happy to implement it before attempting to change the size of the mode field.