
Increase number of mode flags for create/open #864

Closed
edhartnett opened this issue Feb 16, 2018 · 31 comments

@edhartnett
Contributor

One topic that came up in #177 was the upcoming scarcity of mode flags. I don't wish that complex conversation to get mixed up with #177, so I started this issue to discuss.

After PIO and user-defined formats are added, I have this:

#define NC_NOWRITE       0x0000 /**< Set read-only access for nc_open(). */
#define NC_WRITE         0x0001 /**< Set read-write access for nc_open(). */
#define NC_CLOBBER       0x0000 /**< Destroy existing file. Mode flag for nc_create(). */
#define NC_NOCLOBBER     0x0004 /**< Don't destroy existing file. Mode flag for nc_create(). */

#define NC_DISKLESS      0x0008  /**< Use diskless file. Mode flag for nc_open() or nc_create(). */
#define NC_MMAP          0x0010  /**< Use diskless file with mmap. Mode flag for nc_open() or nc_create(). */

#define NC_64BIT_DATA    0x0020  /**< CDF-5 format: classic model but 64 bit dimensions and sizes */
#define NC_CDF5          NC_64BIT_DATA  /**< Alias NC_CDF5 to NC_64BIT_DATA */

#define NC_CLASSIC_MODEL 0x0100 /**< Enforce classic model on netCDF-4. Mode flag for nc_create(). */
#define NC_64BIT_OFFSET  0x0200  /**< Use large (64-bit) file offsets. Mode flag for nc_create(). */

/** \deprecated The following flag currently is ignored, but use in
 * nc_open() or nc_create() may someday support use of advisory
 * locking to prevent multiple writers from clobbering a file
 */
#define NC_LOCK          0x0400

/** Share updates, limit caching.
Use this in mode flags for both nc_create() and nc_open(). */
#define NC_SHARE         0x0800

#define NC_NETCDF4       0x1000  /**< Use netCDF-4/HDF5 format. Mode flag for nc_create(). */

/** Turn on MPI I/O.
Use this in mode flags for both nc_create() and nc_open(). */
#define NC_MPIIO         0x2000
/** Turn on MPI POSIX I/O.
Use this in mode flags for both nc_create() and nc_open(). */
#define NC_MPIPOSIX      0x4000 /**< \deprecated As of libhdf5 1.8.13. */

#define NC_INMEMORY      0x8000  /**< Read from memory. Mode flag for nc_open() or nc_create(). */

#define NC_PNETCDF       (NC_MPIIO) /**< Use parallel-netcdf library; alias for NC_MPIIO. */

#define NC_PIO           0x0040  /**< Use PIO. Mode flag for nc_open() or nc_create(). */
#define NC_UF0           0x0080  /**< User-defined format. */
#define NC_UF1           0x0002  /**< User-defined format. */

Not too many open slots left! What happens when we run out? Currently we have:

EXTERNL int
nc_open(const char *path, int mode, int *ncidp);

I don't think we can change this prototype without introducing warnings in user code.

Some possibilities:

EXTERNL int
nc_open2(const char *path, long long mode, int *ncidp);

Or

EXTERNL int
nc_open2(const char *path, int mode, int mode2, int *ncidp);

Not an immediate issue, but just capturing discussion up to now.

@DennisHeimbigner
Collaborator

Another thought. I cannot see how ncdump will have a way to process files
created using user formats.

@gsjaardema
Contributor

gsjaardema commented Feb 16, 2018

I don't think you will get a warning if you change the mode argument to "long long mode" and the client continues to pass an "int" since it is a widening conversion and there is no potential for loss of data.

I made a similar change to exodus several years ago and it didn't result in any errors for clients that were still passing int instead of long long.

s1018191:build(master)> cat test.c
#include <stdio.h>

void test(long long mode) {
  fprintf(stderr, "%lld\n", mode);
}

int main() {
  int my_mode = 2;
  long long lmode = 3;
  test(my_mode);
  test(lmode);
}
s1018191:build(master)> gcc-mp-6 -Wall -pedantic test.c
s1018191:build(master)> clang-mp-6.0 -Wall -pedantic test.c

@DennisHeimbigner
Collaborator

I think you are correct. Since in the API we always pass the mode by value,
your argument seems sound.

@DennisHeimbigner
Collaborator

But only if the user code was compiled with the new API. Otherwise
there is an argument mismatch.

@gsjaardema
Contributor

Correct. It may be good to add code similar to HDF5 and exodus which can detect when a client links to a library which is a different version than the include file / API they compiled against. I always hate when I get the warning/error about version mismatch, but it always points out a problem with my build, so I am glad it is there.

@edhartnett
Contributor Author

edhartnett commented Feb 16, 2018

@DennisHeimbigner I get now what you mean with ncdump. I have added a discussion under ticket #177.

@edhartnett
Contributor Author

@gsjaardema excellent suggestion for using long long.

Also I suggest you create an issue for your idea of noticing a mismatch between netCDF header and built library. Seems like a good idea.

@edhartnett
Contributor Author

I have tried this out (converting mode flags to long long) and indeed it works well. No errors or warnings are emitted by any of the test code, which reassures me that this change would be safe for users.

@gsjaardema are you familiar with how HDF5 and exodus detect a library/header mismatch? I can add this to the netCDF build system(s) as well.

@gsjaardema
Contributor

The method that exodus uses which is similar to hdf5 is as follows:

  • The API ex_open() and ex_create() functions are macros which take the arguments from the client and then add an additional argument which is the version of the library as it appears in the include file:
#define ex_open(path, mode, comp_ws, io_ws, version)                                               \
  ex_open_int(path, mode, comp_ws, io_ws, version, EX_API_VERS_NODOT)

In the compiled library, there is the following code:

int ex_open_int(const char *path, int mode, int *comp_ws, int *io_ws, float *version,
                int run_version)
{
... deleted ...
  if (run_version != EX_API_VERS_NODOT && warning_output == 0) {
    int run_version_major = run_version / 100;
    int run_version_minor = run_version % 100;
    int lib_version_major = EX_API_VERS_NODOT / 100;
    int lib_version_minor = EX_API_VERS_NODOT % 100;
    fprintf(stderr,
            "EXODUS: Warning: This code was compiled with exodus "
            "version %d.%02d,\n          but was linked with exodus "
            "library version %d.%02d\n          This is probably an "
            "error in the build process of this code.\n",
            run_version_major, run_version_minor, lib_version_major, lib_version_minor);
    warning_output = 1;
  }

Basically, in the library, the EX_API_VERS_NODOT is expanded at library compilation time and the EX_API_VERS_NODOT in the include file ex_open() macro is expanded at application build time. If the same library source code is used for both, the versions match and everything is ok.

The HDF5 version is more complex and lets the user specify an environment variable to ignore the version check.

This method introduces a little confusion for debugging since there is no `ex_open()` function to set a breakpoint in; the function is `ex_open_int()` instead. But it has detected problems with building/linking many times and saved some confusing debugging time...

@edhartnett
Contributor Author

That seems kind of ugly to me.

What are we trying to catch? A user does a yum update netcdf-dev and gets a new library version. Then he uses a netcdf.h file that he has squirreled away somewhere from a previous version?

Why would the user do such a thing? How does it come about?

@gsjaardema
Contributor

Or, the user has multiple installations using different compilers; some parallel, some serial. During the build process, or at runtime if using shared libraries, the path to the library is specified incorrectly and the include file doesn't match the library. Typically more frequent at runtime with shared libraries when the LD_LIBRARY_PATH is incorrect and the path wasn't compiled in.

I guess we have different definitions of "ugly" vs "simple easy check that catches stupid mistakes in a few lines of code and eliminates a few hours of debugging"

@edhartnett
Contributor Author

OK, I can understand how it can happen with multiple installations and an incorrect LD_LIBRARY_PATH. I've done that myself.

@opoplawski
Contributor

Correct. It may be good to add code similar to HDF5 and exodus which can detect when a client links to a library which is a different version than the include file / API they compiled against. I always hate when I get the warning/error about version mismatch, but it always points out a problem with my build, so I am glad it is there.

Please don't do this - I've been fighting for years to get HDF5 to drop the header version check. Instead do proper ABI management via soname, symbol versioning, etc.

@DennisHeimbigner
Collaborator

Pardon if I repeat. The issue we are trying to solve is this:

  • we want to increase the size of the mode argument to nc_open and nc_create
    from the current 32 bits up to 64 bits.
    This will not work correctly if the client and the library differ in their assumption
    about the size of the mode argument.

Have you an alternate suggestion to address this problem?

@edhartnett
Contributor Author

Actually it will work just fine to increase the size of the mode parameter of nc_open/nc_create.

Existing C code will automatically promote an int to a long long, so existing code will work.

We will have to change the Fortran library to handle the new type size, so people will have to upgrade their C library when upgrading their Fortran library (which they should be doing anyway; no one should be using a new Fortran library with an old C library, as that is not guaranteed to work).

So I believe we can just increase the size of the mode parameter and start making use of the extra 32 bits for additional mode flags.

As @opoplawski alludes, using the proper ABI management (which we do) will allow client programs to know that they must recompile to use the new version of the library. Static library users will already understand that they have to re-compile to get a new version of the library in use.

@opoplawski I would be very interested in hearing what you don't like about the HDF5 header version check.

@DennisHeimbigner
Collaborator

One point to clarify. Remember that we will be dealing with compiled code
in many cases. That is, the client code will have been compiled to assume a mode
of 32 bits, but the libnetcdf.so will have been compiled to assume a 64 bit argument.
So, there will not be any complaint, but the argument list will be screwed up on the stack.

@edhartnett
Contributor Author

@DennisHeimbigner this issue is solved by increasing the so version number, which will then cause client programs to not use the library as a drop-in binary replacement. Recompile will be required to use the new version of the library. This is all part of the shared-library facility. So we don't have to worry about that, as long as we remember to correctly increment the so number.

@DennisHeimbigner
Collaborator

Not quite. That is why the header version info is required. I can, for example,
build libnetcdf specifying any version of libcurl in LDFLAGS. The only thing that
prevents failure is the ensuring that the libcurl.so that I used is consistent with
the curl.h that I used to build libnetcdf. That is, of course, what HDF5 is doing.

@edhartnett
Contributor Author

edhartnett commented Nov 25, 2018

@DennisHeimbigner I understand your point. Mismatches between header and library are possible.

My point is that changing the API, with shared library builds, will prevent binary code mismatches if they are using the correct header (as they are supposed to be doing).

I have started a new issue for just the header version checking. If there is a consensus that it is a good idea I will be happy to implement.

If it is to be implemented, I'm happy to do so, and then increase the size of the mode flag. If it is not to be implemented, I can just change the mode flags. Either way, it's fine by me. ;-)

@DennisHeimbigner
Collaborator

I do not know about you, but I have hit the HDF5 version mismatch error on a number of
occasions, so using the wrong header does happen.

@edhartnett
Contributor Author

@opoplawski this topic has come to the fore. Can you give a more detailed explanation of why you think HDF5 should drop the header check? (I will communicate this explanation to the HDF5 team as well, we were just discussing this today.)

@DennisHeimbigner
Collaborator

So basically, the verification process should involve something like this:

  1. the netcdf library being accessed should have a built-in constant representing
     its version.
  2. the client code must contain a constant representing the version of the library
     whose header files it was compiled against.
  3. At run-time, the client code must query the library with its (the client's) version
     constant, and the library must compare it with its internal version constant.
  4. If the client's version is not compatible with the library's constant, then an error
     must be signalled.

Is this correct?

@edhartnett
Contributor Author

If the lib file holds the version number in a global var and the version number is in netcdf.h, then those could be compared. Is that what you mean? The library global var is fixed when it is compiled, so if the user builds with a different netcdf.h, the library can throw an error.

But I am not sure this is necessary. Lots of trouble is taken to ensure that the netcdf.h file is backward compatible, so using a newer one will not break anything. And if using an older one that does not contain a needed prototype or constant, the user will get a compile error. So what does checking the header version really do?

@DennisHeimbigner
Collaborator

The problem case is this:

  1. client program was compiled with header files that declared the mode flag
     of e.g. nc_open as 32 bit.
  2. the netcdf.so that is dynamically loaded was built with header files
     that declared the mode flag to nc_open to be 64 bit.

So, the library code thinks it is getting a 64 bit integer on the arg stack, but the client
code has put a 32 bit integer on the arg stack.

@edhartnett
Contributor Author

You know what, let me try it out a bit today on a branch and we will see what we can make happen.

@opoplawski still eager to get your input on this, since you were the one who raised the objection...

@DennisHeimbigner
Collaborator

In any case, when/if we expand the size of the mode flag, we should
consider doing 2 things:

  1. see if there are other argument types that need to expand to 64 bits
  2. bump the primary version number to version 5.

@opoplawski
Contributor

The proper way to signal the version of the ABI is with the library's soname/soversion. When the library's ABI changes, you bump the SO version. Otherwise you leave it alone and let previously linked programs take advantage of any bug fixes in a compatible update. Conflating the library's release version with the API/ABI is a mistake - and one that I recently argued against netcdf from making itself.

See also https://autotools.io/libtool/version.html and https://www.gnu.org/software/libtool/manual/html_node/Updating-version-info.html

@WardF
Member

WardF commented Jun 17, 2019

Thanks @opoplawski ; the second link you provide is the guide I use during a release to update the SO version. I use the ABI reporting tool to generate a change report between the previous version and pending release version and, based on the information therein, update the SO version according to the gnu.org recommended guidelines.

@edhartnett
Contributor Author

OK, to summarize, there is a tricky method used by HDF5 to check version of header and compiled library.

However, it monkeys with the usual shared library system, which already allows for the fact that sometimes the compiled library may be updated, and everything is still compatible, so we should all chill.

I believe the conclusion is that we should NOT do the header version checking. I think we have a convincing case here that this is a step too clever. Though it sometimes reveals a problem, other times it causes a problem. So we should not use it and rely on other methods for the user to use the correct header file.

I will also note that with the rareness of changes to the netCDF API, the danger of using a different header file is quite minimal. This is not as true for HDF5, where they change things more frequently.

@WardF
Member

WardF commented Jun 18, 2019

I would agree with the summary @edhartnett presents above, for the reasons he succinctly states. We would be potentially trading one set of issues for a whole new set of issues that my gut tells me we would encounter far more often.

@DennisHeimbigner
Collaborator

Ed- I would appreciate some clarification. You write:

"Though it sometimes reveals a problem, other times it causes a problem."

Can you give an example of each?
