Check version of netcdf.h and library to ensure they match #1223

Closed · edhartnett opened this issue Nov 25, 2018 · 20 comments

@edhartnett (Contributor) commented Nov 25, 2018

I am breaking this discussion out of #864 so that it can be evaluated on its own merits (or lack of merit).

It has been proposed that we add a check at run-time to ensure that the version of the netcdf.h file matches the version of the library being used.

This checking is also done by HDF5.
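
For context, HDF5's mechanism bakes the header's version numbers into the client at compile time and compares them against the library at runtime; H5public.h defines, lightly simplified:

```c
/* The macro captures the version of the header the client was compiled
 * against; at runtime H5check_version() aborts if the library's own
 * version numbers disagree. */
#define H5check() H5check_version(H5_VERS_MAJOR, H5_VERS_MINOR, H5_VERS_RELEASE)
```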

The version check is proposed to be accomplished by making nc_open/nc_create into macros, which will call the real nc_open/nc_create with an added version parameter. This is explained well by @gsjaardema in #864.
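
A minimal sketch of that macro approach; the names `nc_open_ver` and `NC_INTERFACE_VERSION` are illustrative, not actual netcdf-c API:

```c
/* Hypothetical sketch, not the actual netcdf-c API: the header turns
 * nc_open into a macro that forwards to a real entry point, passing
 * along the interface version the header was shipped with. */
#define NC_INTERFACE_VERSION 1   /* bumped whenever the public ABI changes */

int nc_open_ver(const char *path, int mode, int *ncidp, int if_version);

#define nc_open(path, mode, ncidp) \
    nc_open_ver((path), (mode), (ncidp), NC_INTERFACE_VERSION)
```

At runtime, `nc_open_ver()` could then compare `if_version` against the version the library was built with and fail loudly on a mismatch.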

@opoplawski objects to this feature. I invite him to list his reasons in this issue.

@gsjaardema uses a similar feature in his own exodus package. He thinks it saves hours of debugging when researchers get confused about which header file to use and what is in their LD_LIBRARY_PATH. Given the way that most supercomputers try to abstract LD_LIBRARY_PATH away from users (instead of just explaining it), I can understand how this kind of confusion would arise.

It has been proposed that this feature is a necessary requirement for changing the size of the mode flag; however, that is less clear. Changing the API in any way is handled correctly by shared library builds, as long as we bump the .so number correctly. However, if we are to have such a version check, I am happy to implement it before attempting to change the size of the mode field.

@eli-schwartz

The only thing this defends against is LD_PRELOAD loading the wrong soname. Please don't do this, as it's extremely inconvenient for Linux distributions and does not solve any known problem.

@edhartnett (Contributor, Author)

@DennisHeimbigner comments in #864 that the HDF5 version checking has caught header/library mismatches for him several times. I agree; that has also happened to me.

As @gsjaardema alluded to, HPC systems often do not give users much visibility into LD_LIBRARY_PATH. Users just load the netcdf module and the system sets something in LD_LIBRARY_PATH. Meanwhile, they use whatever netcdf.h they find, which may or may not be correct. That is the reason for the check in HDF5.

So @eli-schwartz, can I ask you to elaborate on why this is extremely inconvenient for Linux distributions?

@DennisHeimbigner (Collaborator)

For completeness, let me propose this: the idea is to provide e.g. nc_open32 and nc_open64 (or whatever names) as well as the original nc_open. We can then do the following:

  1. #define nc_open nc_open64
  2. Build a special .c file that has #undef nc_open and provides this code:
     int nc_open(...) { return nc_open32(...); }

This means that any code recompiled with the new netcdf header containing the #define will call nc_open64, whereas any old binary that uses the new library will call a version of nc_open that forwards to nc_open32, as expected. Anyone see a flaw in this (aside from it being ugly)?
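
A compilable sketch of the above, with illustrative names and an assumed wider `long long` mode type:

```c
/* --- netcdf.h (new): recompiled code is routed to nc_open64 --- */
int nc_open32(const char *path, int mode, int *ncidp);
int nc_open64(const char *path, long long mode, int *ncidp);  /* wider mode */
#define nc_open nc_open64

/* --- nc_open_compat.c: keeps the old nc_open symbol in the library so
 *     that binaries built against the old header still link and run --- */
#undef nc_open
int nc_open(const char *path, int mode, int *ncidp)
{
    return nc_open32(path, mode, ncidp);
}
```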

@eli-schwartz commented Nov 25, 2018

Soname versioning works for every other piece of software known to mankind in all of computing history...

This is inconvenient for Linux distributions for the same reason it is inconvenient for regular users: rather than trusting the declared compatibility versioning, you hard abort based on a completely different, unrelated versioning system, even when nothing is actually incompatible.

The major.minor.patchlevel version for hdf5 changes on a superset of the occasions when ABI compatibility has actually been broken: it includes every soname change, plus many more changes besides.

However, due to the use of hard aborts based on this entirely misplaced runtime check, the soname simply doesn't matter in the slightest: matching sonames, which should work fine, nevertheless require completely recompiling the entire hdf5 stack without exception. Recompiling the world on every patchlevel release is pretty annoying; it requires involved, domain-specific knowledge to know when you cannot trust the soname versions; and it is difficult to check which software is compiled against a given patchlevel release, whereas there are trivial tools to check which soname something is compiled against.

Then you end up with situations where some Linux distributions decide they don't want to play monkey games, and they patch out the check like this: https://salsa.debian.org/debian-gis-team/hdf5/blob/master/debian/patches/relax-version-check.patch
(And they get the check wrong, but it doesn't matter because the only check that matters is the soname.)

If you don't trust sonames, why do you use them in the first place? If you do trust them but are worried about compile-time header mismatches, how does that translate to runtime checks, and either way, why not do compile-time checks against the thing which matters, that is, the ABI version also used by the soname?
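
Read as code, that last suggestion would be a purely compile-time guard keyed to the soname, assuming a hypothetical `NC_ABI_MAJOR` macro that the build keeps in sync with the library's soname major number (netcdf-c defines no such macro today, and the `13` is a placeholder):

```c
#include <netcdf.h>

/* Hypothetical compile-time check: fail the build, not the run, if the
 * header does not describe the ABI this source expects. */
#if !defined(NC_ABI_MAJOR) || NC_ABI_MAJOR != 13
#error "netcdf.h does not match the libnetcdf ABI this code was written for"
#endif
```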

@DennisHeimbigner (Collaborator)

I am not sure I understand.

> why not do compile-time checks against the thing which matters, that is, the ABI version also used by the soname?

I think the check has to be at runtime, because I am assuming that it is possible to run a client against more than one .so for a given library used by the client. Is this last assumption false?

@ArchangeGabriel (Contributor)

> I think the check has to be at runtime because I am assuming that it is possible to run a client against more than one .so for a given library used by the client. Is this last assumption false?

Yes, it is possible, but that is exactly what soname versioning is for: forbidding running against an incompatible library.

If a piece of software is compiled against libnetcdf.so.5, it will not be able to run against libnetcdf.so.6 (unless you force it to using LD_PRELOAD). But libnetcdf.so.5 could be provided by every release from 5.2.0 to 5.4.3, for instance, and then the .6 one by releases from 5.5.0 up to 6.0.0. The important point is to change it when the library actually becomes incompatible, not when you release a new patch-level release fixing 3 or even 30 bugs. ;)

@DennisHeimbigner (Collaborator)

Comment for Ward: what has been our previous experience with bumping the libnetcdf.so major version?

Several thoughts:

  1. What kind of pushback do we get if we bump the .so major version number?
  2. Has HDF5 routinely used a .so major version bump? If so, that calls into question why they do the header/library version check: why was a .so bump not sufficient?

@eli-schwartz commented Nov 25, 2018

If you have multiple versions of libhdf5.so being loaded by a single program, then you'll have multiple copies of the symbols, and the dynamic linker will hide some extra copies. IIRC it is an operating system implementation detail whether the first loaded or second loaded library wins.

Shortly after, your program will segfault or exhibit other signs of undefined behavior as it attempts to use symbols which it thinks are correct, in ways they were not meant to be used. This will happen regardless of your runtime checks, and in all probability, before you ever try running code containing runtime checks -- but we're dealing with undefined behavior here, so who knows?

This is not a sane environment to be working in. It's not an allowed environment. It happens when your application is correctly linked against two dependent libraries, and each is correctly linked against libhdf5, but wrongly linked against different versions of libhdf5. Each dependent library is therefore part of an entirely different, exclusive software stack. If one is in your LD_LIBRARY_PATH, the other should not be, and this applies regardless of where you separately store your headers. Is this something that is a serious concern to hdf5 or netcdf users? Is this something which is even allowed by the working environment of hdf5 or netcdf users? How do you end up linking your application to multiple different hdf5-using dependencies acquired from multiple incompatible locations that all still exist?

I would not want to rely on a runtime check to catch this.

If that is the concern, then the hdf5 version checking code being held up as an example is wrong. It emits a fatal error claiming that the headers don't match the library, and you should recompile the application... but as far as I can tell, depending on which library was loaded first, the code which loads the data will work fine, but the code which processes it will silently load incompatible symbols and do who-knows-what to your data without the warnings being effective at all.

...

This goes far beyond mismatched headers.

@eli-schwartz

As mentioned in #864 (comment):

The use of symbol versioning would allow you to build part of your application against one version of hdf5 or netcdf, and part of your application against another version, and your application would still work.
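
A sketch of how that could look with GNU symbol versioning (GNU toolchain only; the version node names `NETCDF_4` and `NETCDF_5` are invented for illustration, and a matching linker version script declaring them is also required):

```c
/* Two implementations of nc_open with different signatures, exported as
 * two versions of the same name inside one libnetcdf.so. */
int nc_open_v4(const char *path, int mode, int *ncidp)
{ (void)path; (void)mode; (void)ncidp; return 0; /* old ABI */ }

int nc_open_v5(const char *path, long long mode, int *ncidp)
{ (void)path; (void)mode; (void)ncidp; return 0; /* new ABI */ }

/* Binaries linked against the old library keep resolving nc_open@NETCDF_4;
 * newly linked code binds to the default (@@) version. */
__asm__(".symver nc_open_v4,nc_open@NETCDF_4");
__asm__(".symver nc_open_v5,nc_open@@NETCDF_5");
```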

@WardF (Member) commented Nov 25, 2018

@DennisHeimbigner I'm still on Thanksgiving travels, so I will weigh in more fully later. The so-versioning is completely separate from the library versioning, and there are rigid rules for how major.minor.revision are incremented. I'll post the link when I have the chance.

@DennisHeimbigner (Collaborator)

Also, we need to have a solution that works on Windows, OSX, and Cygwin.

@WardF (Member) commented Nov 25, 2018

I will catch up on this when I return to the office, but the case 'against' seems fairly straightforward. At a glance I'm not sure whether this proposed fix addresses a hypothetical or a demonstrated issue, but I will review and weigh in (the issue is probably explained in #864). SO versioning is robust, and anything that breaks or abandons it at this point would have to be very well justified.

Enjoy your weekend and I’ll review tomorrow.

@eli-schwartz

> Also, we need to have a solution that works on Windows, OSX, and Cygwin.

Cygwin works exactly the same way Linux works.

OSX uses libfoo.version.dylib whereas Linux uses libfoo.so.version, but aside from the order in which these appear, the same rules apply.

Windows uses private dependencies for everything, plus manifests which describe exactly where each dependency is located, so the chances of mysteriously loading multiple copies of the library (via other libraries pinned the same way but depending on other versions) and having the symbols overwrite each other are... low?

@edhartnett (Contributor, Author)

@eli-schwartz OK, I get your point. Checking versions in this way breaks one of the key features of shared libraries. We don't want to do that.

@opoplawski (Contributor)

Looks like eli gave all the arguments I would. Use the proper tool for the job. The hdf5 version check has been a pain in my side for the last 13 years that I've maintained the hdf5 package in Fedora. The thought of netcdf taking it up as well is just too much to bear.

@DennisHeimbigner (Collaborator)

So why did the HDF5 group do it? They must have had a specific reason for using the header version as opposed to the .so version. Has anyone ever seen an explanation from them?

@opoplawski (Contributor)

Maybe back in the 90s it seemed like a good idea. If you are not doing proper API/ABI management, it may seem like a reasonable thing to do. But with the availability of modern tools, it is completely the wrong thing to do.

@edhartnett (Contributor, Author)

OK, seems like the consensus is that this is not a good idea, because it makes shared libraries much less effective.

I will close this ticket.

@DennisHeimbigner (Collaborator)

Answer from the HDF Group:

> Hi Dennis,
>
> I don't know why this was done to begin with. I went through the bug reports for it. It looks like we intend to fix the issue, but it is not currently assigned to be fixed in a specific release. If I find out more I will let you know.
>
> -Barbara
>
> Barbara Jones, The HDF Group Helpdesk, help@hdfgroup.org
> Support Services: https://www.hdfgroup.org/solutions/
>
> On Mon, 26 Nov 2018, Dennis Heimbigner wrote:
>
>> The netcdf group at Unidata was looking at the header version checking that is done by the HDF5 library to ensure compatibility between the header used by client code and the API provided by the HDF5 library.
>>
>> The question we have is: why did you do this rather than rely on the shared library major version number mechanism? Is the rationale behind the header version checking documented anywhere?
>>
>> =Dennis Heimbigner
>> Unidata

@edhartnett (Contributor, Author)

OK, well I guess that drives a stake through the heart of this idea. ;-)
