NetCDF global attributes vs data variable local attributes #3325
Comments
Ping @zklaus 👍
A non-breaking way to implement that may be to introduce two new class members […]
@zklaus I was thinking along the same lines. In summary:
So for the case where there is a common attribute shared between […] To ensure non-breaking behaviour here, I think I'm right in saying that if a user writes to the […] Hmmm... this makes sense to me. Thoughts?
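A minimal sketch of the "two new class members" idea, assuming hypothetical members named global_attrs and local_attrs (the names actually proposed above are not preserved here). The combined view reads local values first, matching current behaviour, and plain assignment writes to the local level so existing code that only uses the combined dict keeps working. This is an illustration, not Iris's API.

```python
from collections.abc import MutableMapping


class SplitAttributes(MutableMapping):
    """Hypothetical container keeping netCDF global and local (data-variable)
    attributes apart while still behaving like a single dict."""

    def __init__(self, global_attrs=None, local_attrs=None):
        self.global_attrs = dict(global_attrs or {})
        self.local_attrs = dict(local_attrs or {})

    def __getitem__(self, key):
        # Local values take precedence, as in the current combined view.
        if key in self.local_attrs:
            return self.local_attrs[key]
        return self.global_attrs[key]

    def __setitem__(self, key, value):
        # Assumption: plain assignment creates or updates a local attribute.
        self.local_attrs[key] = value

    def __delitem__(self, key):
        # Remove the name from both levels so it really disappears.
        found = False
        for level in (self.local_attrs, self.global_attrs):
            if key in level:
                del level[key]
                found = True
        if not found:
            raise KeyError(key)

    def __iter__(self):
        # Each name is yielded once, local names first.
        yield from self.local_attrs
        for key in self.global_attrs:
            if key not in self.local_attrs:
                yield key

    def __len__(self):
        return len(set(self.local_attrs) | set(self.global_attrs))
```

Because both levels are kept, a saver could write them back to their original places, while dict(attrs) still gives the familiar merged view.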
Exactly what I was thinking!
We bumped into this ticket while looking at a related project. Can I check what would happen when you save a cubelist rather than a cube? If cubes in the cubelist had different values of the same global attribute, how would this be saved? If this saved netCDF file was loaded back into Iris would we get the same cubelist and if we didn't, would this matter?
@jonseddon Good question... Clearly, if there is a […] However, it raises the question of whether a […] To be honest, I'd opt to separate concerns here. I'd see the debate about […]
I had a question about this :) Does it make sense to add global attributes and variable attributes, which I would argue are netCDF-specific concepts, to the cube, which is meant to be format-agnostic?
Re cubelists: The question is certainly a good one. Note that right now there is no guarantee that loading a single cube from a netCDF file and saving it again will give you the same file. For example, […]
Seeing as a lot (most?) data seems to me to have one variable per file (notably, of course, all CMIP and CORDEX data), I am not sure I would worry so much about consistency in cubelist storing, at least until we have better consistency in cube storing. Still, it is certainly a good idea to keep this in mind so as not to make unnecessary, outright contradictory decisions. Re format agnosticism: That is certainly a nice goal, but maybe it needs a better definition? It seems that attributes in and of themselves are unsupportable in, e.g., GRIB and derived formats. Surely we don't want to abandon them completely. So, is there a format that has attribute support, is supported by Iris, and could not be made to work with this model?
@ehogan From a purely idealistic perspective, I'd agree with you. A […] For me it's an intention at best, rather than a hard and fast rule. Consider the special way that we handle PP […] I don't know if this helps answer your question...
Actually we should have something in GRIB space too, but we don't.
Sorry for the delay; the code is more baroque than I knew! It divides possible attributes into several classes of interest:
(1) Class 1 is either "conventions" or "Conventions". These are always ignored: we never write a "local" conventions attribute; we always write a global […]
(2) Class 2 attributes have an interpreted CF/netCDF meaning and are handled specially by the loader, e.g. "_FillValue", "valid_min", "standard_name".
(3) Class 3 attributes only make sense as local attributes applying to a particular data variable, e.g. "flag_meanings".
(4) Class 4 attributes are handled like any other non-local-only (non-class 3) attribute, except that if they get written as local attributes we also raise a warning about it.
(5) Class 5 covers attributes which are not in class 2/3 and appear on all saved cubes.
(6) Class 6 covers all others.
NOTE: if there is only one cube, class (6) does not really arise, since (5) dominates, except for the specific cases in classes 1/2/3. A single cube is equivalent to all cubes having the same attribute.
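To make the class rules above concrete, here is a minimal sketch of how a saver might split attributes from several cubes into global and local sets. It works on plain dicts standing in for cube.attributes, and it assumes that "appear on all saved cubes" means the same name with equal values. The name sets are illustrative only: class 4 membership in particular is a hypothetical placeholder, and this is not the actual Iris saver code.

```python
import warnings

# Illustrative name sets for the classes listed above (assumptions; the
# real lists in Iris's netCDF saver may differ).
CONVENTIONS_NAMES = {"conventions", "Conventions"}               # class 1
SPECIAL_CF_NAMES = {"_FillValue", "valid_min", "standard_name"}  # class 2
LOCAL_ONLY_NAMES = {"flag_meanings"}                             # class 3
WARN_IF_LOCAL_NAMES = {"history"}                                # class 4 (hypothetical)

NEVER_GLOBAL_HERE = CONVENTIONS_NAMES | SPECIAL_CF_NAMES | LOCAL_ONLY_NAMES


def split_attributes(attr_dicts):
    """Split per-cube attribute dicts into (global_attrs, local_attrs_per_cube).

    Assumes attribute values are plain Python values comparable with ==.
    """
    # Names present on every cube (empty if there are no cubes).
    common = set.intersection(*(set(a) for a in attr_dicts)) if attr_dicts else set()

    # Class 5: shared by all cubes with equal values, and not class 1/2/3.
    global_attrs = {}
    for name in sorted(common - NEVER_GLOBAL_HERE):
        values = [a[name] for a in attr_dicts]
        if all(v == values[0] for v in values):
            global_attrs[name] = values[0]

    local_attrs_per_cube = []
    for attrs in attr_dicts:
        local = {}
        for name, value in attrs.items():
            if name in CONVENTIONS_NAMES or name in SPECIAL_CF_NAMES:
                # Class 1 is ignored; class 2 is left to dedicated CF handling.
                continue
            if name in global_attrs:
                continue  # already written once, globally (class 5)
            # Classes 3, 4 and 6 end up as local (data-variable) attributes.
            if name in WARN_IF_LOCAL_NAMES:
                warnings.warn(f"saving {name!r} as a local attribute")
            local[name] = value
        local_attrs_per_cube.append(local)
    return global_attrs, local_attrs_per_cube
```

With a single cube every non-class-1/2/3 attribute becomes global, which is the behaviour the NOTE describes.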
IMHO, what comes out of this ...
Following up on what @pp-mo just wrote
Some time ago I had a conversation with some CF people about what was/is actually meant by the sentence I was citing:
whether it is the actual existence of the same attribute [name] in both places, or the content of the two attributes being contradictory. While we did not come to a definitive conclusion, we agreed that the intention was not to consider only the attribute names and totally disregard their content. This would not be the only place where CF is not crystal clear about the distinction between "name" and "content", or "spelling" and "meaning". I think it is useful to think back to what the common situation was about 20 years ago (or more), when the first version of CF was produced. Back then, the common practice was for analysts to look manually at files and their content (ncdump style) before doing analyses. It was then obvious what to do and how to interpret the data, e.g. that […]
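The two readings of the CF sentence can be told apart mechanically. A minimal sketch, using plain dicts for the global and local attributes and a hypothetical helper name, that separates names merely present at both levels with the same value from names whose values actually conflict:

```python
def compare_levels(global_attrs, local_attrs):
    """Hypothetical helper: for names present at both levels, report which
    repeat the same value ("duplicates") and which disagree ("conflicts")."""
    duplicates, conflicts = [], []
    for name in sorted(set(global_attrs) & set(local_attrs)):
        if global_attrs[name] == local_attrs[name]:
            duplicates.append(name)
        else:
            conflicts.append(name)
    return duplicates, conflicts
```

Under a name-only reading both lists would be problems; under the content reading only the conflicts matter.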
Question to the users in the room: what behaviour do you want from merge / concatenate / other operations that make a new cube at the end? When deciding whether (for example) a mismatch of these attributes should block a merge, we won't know whether they're intended to be used (and if they're not, users shouldn't be blocked on that merge). We could […]
There's also the option (simpler, and therefore quicker to implement, but correspondingly less useful) of setting up a way for the user to access the full local/global attribute information at load time, hold it themselves, and then specify how it applies to the resultant cubes at save time. Would that appeal?
As one "user in the room", I venture some thoughts:
File 1 (covering time period 1):
File 2 (covering time period 2):
File 3 (covering time period 3):
That is, someone first made an overall assessment of data quality and found that "generally all this data is good". Later someone looked in more detail and found that some periods were less reliable, and furthermore that this varied between variables. While data quality should preferably be recorded as ancillary quality flags, this example illustrates that it is possible (and perhaps reasonable) to have some kind of complementary (or perhaps hierarchical) global and local attributes. Furthermore, it illustrates that an [advanced?] user might want to know the (possibly multidimensional) order in which files were concatenated/merged. I touched on this in a previous comment on another (now closed) issue. Anyway, these are some thoughts that I realise may not be so easy to implement.
Update: my view, as of 2022-09-07. I feel we had already reached a point here where we mostly agreed on principles, but then dropped the ball, in that we (Iris devs) should have produced a proposal. I've also been prompted by discussions elsewhere on Iris/xarray inter-conversion, which again turned up some of Iris's known round-trip problems.
The level of outstanding work means this isn't going to make it into Iris […]
Update Mon 3rd July: this effort is now handled in its own project; please see there for the existing task breakdown and progress.
At the moment, iris takes a rational but naive approach to dealing with the local attributes of a NetCDF variable (a variable that becomes a cube) and the global attributes of the NetCDF file that the variable comes from. That is, the resultant cube.attributes will be a combination of both the local and global attributes, where the local attributes take precedence over, and overwrite, common global attributes.
From the inception of iris, and in the absence of use cases, this seemed like a reasonable thing to do. However, such an approach prevents preservation of the local and global attribute metadata. This is a major issue for many users, who need to preserve all attribute metadata.
We need to resolve this issue now in iris, once and for all 😄
Note that if a solution to this issue were implemented, it would most likely be a breaking change, so caution is needed here.
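A minimal sketch of the combination described above, using plain dicts in place of the file's global attributes and the variable's local attributes (naive_combine is a hypothetical name, not an Iris function). It shows why the original levels cannot be recovered from the result:

```python
def naive_combine(global_attrs, local_attrs):
    # Local (data-variable) values overwrite global values with the same name.
    return {**global_attrs, **local_attrs}


# The same attribute stored globally or locally gives identical results,
# so the level it came from is lost.
global_only = naive_combine({"title": "run 1"}, {})
local_only = naive_combine({}, {"title": "run 1"})
print(global_only == local_only)  # True

# A global value shadowed by a local one disappears entirely.
shadowed = naive_combine({"source": "model"}, {"source": "obs"})
print(shadowed)  # {'source': 'obs'}
```

Any fix therefore has to keep the two levels apart at load time rather than try to split cube.attributes afterwards.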
This is somewhat tangentially related to #2352.