Clarification of requirements on the attributes of a bounds variable #265
Comments
Hello Martin, The intention, as I recall, was for agreement in meaning, rather than agreement in the exact strings, but that intent clearly didn't make it into the text. If people agree, I would suggest that the addition of some suitable wording to make this clear would make this a defect, rather than an enhancement. Thanks, |
Control attributes are fully redundant on bounds variables. The current standard already recommends against them, in two places. Why not simply deprecate them, rather than adding new requirements? |
@davidhassell : OK, I'd be happy to treat this as a @Dave-Allured : I'm not sure why the original choice was made. It would certainly be simpler to ban these attributes on bounds variables, but I doubt that it justifies changing the status quo. |
The original discussions took place on Trac ticket 140 (https://cf-trac.llnl.gov/trac/ticket/140), which was accepted for inclusion into CF-1.7. I'll mark this issue as a defect for now. Deprecating (rather than banning) these attributes is, of course, possible, but I don't feel there is a use case for it at this time. I wonder if there are use cases for bounds variables having, for example, different but equivalent units, or for bounds variables being used independently of their parent coordinate variables (in which case having units, names, etc could be useful). In any event, that perhaps is a discussion for a separate issue, if anyone wants it. |
Hi @davidhassell : thanks. There is a use case for providing attributes on bounds variables in the Trac ticket which you cited (140) : @taylor13 describes a usage where the user needs to know the @JonathanGregory , @taylor13 : can you help with the interpretation of the intention of the discussion in Trac 140 on the precise rule for
|
Evidently we didn't think of this problem. As far as I remember, I had in mind that the strings would have to be exactly the same (B), because it's simpler to check and consistent with them not being needed anyway, which is why they're deprecated in general. |
OK - are these the three choices, then (in no particular order)? i) Drop the word exactly from "must always agree exactly with the same attributes of its associated coordinate, scalar coordinate or auxiliary coordinate variable" (7.1) as rectifying a defect. ii) Keep the word exactly but clarify it to mean "exact string match". This is also rectifying (a different) defect. iii) Drop the word exactly from the sentence as an enhancement. |
Reconsidering, and after briefly reviewing Trac 140, I favor a fourth choice: iv) Martin's original wording, "must exactly match the values of", with no further changes. This rectifies a defect. To me this is simple and sufficient. @davidhassell, I understand what you are trying to clarify in (ii) above, but I think the extra wording is not needed. I retract my earlier comment about deprecating "control attributes". My true preference is to fully ban interpretive attributes on bounds variables, on the rationale that they are only an extension of the parent. This has been said before. But that is outside the scope of this ticket, so I will drop that for now. |
@davidhassell : David A. and Jonathan are both in favour of requiring an exact string match (my option (B), your option (ii)). Are you happy with this interpretation? I agree with Jonathan that this is preferable in order to preserve clarity in the file metadata. |
I don't particularly like the idea of insisting on "exact string match", because it is contrary to how the rest of CF works. For example, if output from one instrument has a data variable with units of I think that it would be useful to recommend that exact string matches are used, for clarity, but not to insist on it. |
Dear @davidhassell |
@davidhassell, please consider that bounds variables are unique in relation to ordinary data variables, thus the analogy to instruments with equivalent unit strings is not a very good comparison. A bounds variable should be nothing more than a close extension of the parent coordinate variable. Its only purpose is to provide explicit cell boundary values for each coordinate value. A better analogy would be to actual_range attributes. By their structure alone, these can not have their own independent interpretive attributes, and always inherit from the parent. I see bounds variables in the same light. IMO the practice of varying attribute string values on bounds should be discouraged or prohibited, thus I favor the original intent to prohibit non-exact string values. The message to dataset creators should be simply, "Don't do this." |
Hello @JonathanGregory and all, I don't have a use case for allowing a different string that means the same, and agree that providing exactly the same string is the nearest thing to not providing any string. Therefore, I'm happy to agree with the "exact string match" interpretation. Thanks, |
Sec. 7.1: Require exact string match for functional attributes on boundary variables. Issue cf-convention#265
There seems to be a consensus. Please review my suggested changes in the above PR. Note that for consistency, I expanded this to all functional attributes on boundary variables, not just |
Apologies, I made a github newbie mistake and failed to squash my commits. Please review changes in the next PR to follow. They will be easier to read. |
Sec. 7.1: Require exact string match for functional attributes on boundary variables. Issue cf-convention#265
To be sure, is the change in text backward compatible or will it make some datasets that were considered conformant with CF now non-conformant? (nb. It is stated in #265 (comment) that "Some CMIP6 data, for example, has been provided with time coordinates using one form and the bounds variable using the other.") |
I assume that @martinjuckes meant that some CMIP files might say |
Dear @Dave-Allured Thanks for the text. I have a couple of comments.
Jonathan |
Um, @martinjuckes did literally say there are existing files with this conflict. First paragraph under Status Quo above: "Some CMIP6 data, for example, has been provided with time coordinates using one form and the bounds variable using the other." Presumably such files are labeled as My strict wording of this requirement was intentional, based on the discussion up to this point. |
Hi @Dave-Allured , @JonathanGregory : Dave is right, the main motivation for raising this was to see if something of the form:
should be considered as valid. My interpretation of the status quo is that the convention is ambiguous, so I don't personally think that a clarification would break with backward compatibility. The strict interpretation, treating the above as an error, is in line with the current behavior of the @Dave-Allured : the last sentence in the 2nd paragraph (line 14) of your text reads "Their data types do not need to be an exact match." I think that "Their" refers to the parent variable and the bounds variable, but grammatically, as written, it appears to refer to the attributes. It might be clearer if placed at the end of line 12? |
@JonathanGregory, thanks for the feedback. Your help with the wording is appreciated. I thought the term "functional attributes" might be understood from context. I was looking for a short collective term that means "attributes that are functionally active in the CF interpretation of the variable". On review, I notice that the original sentence in CF-1.8 is kind of puzzling in its own way.
Is it appropriate here to explain the distinction between the first and second groups? What is the best way to describe the general intention here, yet try to remain concise? For the purpose of this section, I think we only care about indicating which attributes are significant in some way to CF interpretation and processing. Let me tentatively merge the two groups, because the distinction may not be locally relevant, and try this simplified wording without "functional":
Do you like this, or have another suggestion? |
@JonathanGregory said:
I added a new short paragraph for this, and expanded a little. Check out this commit: Dave-Allured@44fcdb1 I preserved the opening sentence because it was a general introduction to the concept, and also it was inherited like this from the original in CF-1.8. |
I think that it would be OK to require the same data type as well as the same value. In fact I slightly prefer identity of both type and value, because to me it seems that providing something which is exactly redundant is more like providing nothing at all, which is what we want. |
Dear all It would be good if we could conclude this issue, about the attributes of bounds variables, because it has been open for a long while. Considering the previous discussion, I propose the following wording (a revised version of earlier text), to replace the last sentence of the first paragraph of sect 7.1 ("Since a boundary variable ...") and the whole of the next paragraph ("Boundary variable attributes which ..."):
The "I" attributes are Although this began as a defect, I think that adding new "Use" values for attributes (for which some support was previously expressed) is an enhancement, so I'm changing the label. I hope that doesn't slow down the conclusion! What do you think? Best wishes Jonathan |
Jonathan, those changes look good to me. I need to retire from this particular discussion, and abandon my pull request. Also I think this issue #265 should be re-titled to something like "Clarification of attributes on bounds variables". Best wishes for a good resolution. |
Thanks, @Dave-Allured. I have generalised the title, as you suggest. |
P.S. Whoever picks this up, this should be obvious, but please feel free to copy any or all draft wording out of my old pull request. |
Hello, Thanks for the new text, Jonathan, which I like, but wonder if the Should The new text doesn't mention that the bounds variable must be of the same data type as its parent coordinate variable. Am I right in thinking that that was agreed? Thanks, |
Dear David
Actually, in my text, which I think is consistent with the discussion, there is no situation where the bounds attribute overrides the parent attribute. If it's inheritable, it should be omitted, and if included it must be exactly the same as the parent, so it's not overridden, just repeated. If it's not inheritable, and not present, it's undefined for the bounds, rather than inherited. I think two new "Use" values are needed (rather than one) because there are three situations to distinguish. As well as the two just mentioned, there is also a group of attributes which should not be used for bounds, because they are inappropriate.
As you said in your earlier comment,
Yes, it was agreed, and that's what the first paragraph says. "If such an attribute is included, its data type and its value must be exactly the same as the parent variable's attribute." Best wishes Jonathan |
Having had a very useful offline conversation with Jonathan, I now see that some of my suggestions weren't at all right, for which apologies. Specifically, my suggestion about "B" usage type in appendix A, and my comments on the data type of bounds variable. We did, however, also come up with some changes to the proposed text.
The proposal could then become (new text in bold)
The "BI" attributes are axis calendar For the compression by gathering issue, I suggest appending to the second paragraph in section 8.2 Lossless Compression by Gathering, new text in bold:
Thanks, |
Dear David Thanks for the discussion and for making the text simpler and better. Regarding section 8.2, I don't think we can say it's not a coordinate variable, because formally it is one, and the paragraph you've quoted begins by saying that it is. I'd suggest we avoid this by not giving a reason for the prohibition. Instead of your bold sentence, I'd propose a shorter one: "The list variable must not have an associated boundary variable." Best wishes Jonathan |
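The proposed sentence for section 8.2 ("The list variable must not have an associated boundary variable") could be checked mechanically along the following lines. This is only a minimal sketch, not conformance-checker code: it assumes variables are modelled as plain dictionaries of attributes rather than read from a netCDF file, that a list variable is recognised by its `compress` attribute, and that a boundary variable is attached through the `bounds` attribute. The function name `list_variables_with_bounds` and the sample variable names are hypothetical.

```python
def list_variables_with_bounds(variables):
    """Return the names of list variables that also declare a boundary variable.

    `variables` maps a variable name to a dict of its attributes (a stand-in
    for a real dataset; assumption made for illustration only).
    """
    return sorted(
        name
        for name, attrs in variables.items()
        # A list variable carries `compress`; `bounds` attaches a boundary variable.
        if "compress" in attrs and "bounds" in attrs
    )


# Hypothetical dataset description used only to exercise the check.
dataset = {
    "landpoint": {"compress": "lat lon"},
    "badpoint": {"compress": "lat lon", "bounds": "badpoint_bnds"},
    "time": {"units": "days since 2000-01-01", "bounds": "time_bnds"},
}

print(list_variables_with_bounds(dataset))
```

Here only `badpoint` would be reported, since `time` has bounds but is not a list variable.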
Thanks, Jonathan - your text for 8.2 is much better. In full we have:
|
Hello - unless there are any objections, I shall write up these agreed changes in a PR, with the aim of meeting the CF-1.11 deadline. Thanks, |
Hello @martinjuckes, @Dave-Allured, @JonathanGregory, @taylor13, and @sethmcg, As participants in this conversation, it would be great if you could look at the associated PR #467 and confirm if you are happy with it (or not). The deadline for getting changes into CF-1.11 is Monday 13th November 2023 (i.e. 3 weeks before the expected release date of 4th December), so if you are able to look at it please do so soon. We welcome, of course, a review from anyone else who is interested. Many thanks, |
I support the changes suggested. A couple questions regarding the text: I had trouble understanding the last sentence regarding inherited attributes. I would suggest reordering the information in the last two sentences. Is the following better?
I also found it odd that we even mention bound variables in conjunction with list variables. Aren't list variables invariably integer pointers to where the defined values of an array should be inserted into an array, skipping the elements that are missing? Why would indices of anything have bounds? If I understand this correctly, I'd rather we not muddy things by mentioning boundary variables here. That being said, I support the proposal no matter what the wording. |
Hi @taylor13, Thank you for the review. I like your suggested change to chapter 7 ("It is recommended not to include any of these attributes on a bounds variable, but it is technically not forbidden to include a BI attribute as long as it is also present in the parent coordinate variable"), and will put it into the PR. On list variable bounds, you are indeed right that bounds are nonsense, but currently the conformance checker has no rule on enforcing this, so we need to make it clear in the text. |
I realise that I missed a bit of Karl's suggested text ("(i.e., there must be an exact match)"). I'd like to suggest a rewording around that:
|
Dear @davidhassell, Karl @taylor13 et al. I support the intention and would like to suggest some further small changes to words. I feel that it's not clear whether the "Use" sentence in Appendix A is talking about variables or attributes:
I propose:
In Chapter 7, I propose that we remove some duplication, and the word "technical", which I don't think makes the text clearer, and that we should use the phrase "boundary variable" consistently:
Best wishes Jonathan |
Dear @JonathanGregory . I think your text is an improvement. I might further tweak the following: In Appendix A, REPLACE your suggested text WITH: Each attribute may be used in any of the ways shown in its "Use" entry. G indicates it can appear as a global attribute, and Gr as a group attribute; if use of an attribute is restricted to certain kinds of variables this is indicated as follows: C for variables containing coordinate data, D for data variables, M for geometry container variables, Do for domain variables, BO for boundary variables, BI for a variable that can be inherited by a boundary variable from a parent, and - for variables with some other purpose. (I don't think it is necessary to mention here how the BI attributes might, contrary to our recommendations, also be included explicitly with the boundary variable.) In Chapter 7, REPLACE: It is recommended not to include a BI attribute on a boundary variable, but it is not forbidden to include a BI attribute provided that it is also present in the parent variable and that it exactly matches the parent variable's attribute, i.e. the data type and value must be exactly the same. WITH: It is recommended that BI attributes not be included on a boundary variable, but this is not absolutely forbidden. If a BI attribute is included, it must also be present in the parent variable, and it must exactly match the parent attribute's data type and value. |
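The Chapter 7 rule in the comment above (a BI attribute on a boundary variable must also be present on the parent and must match its data type and value exactly) can be illustrated with a short sketch. It assumes attributes are held in plain dictionaries, and the tuple `BI_ATTRIBUTES` is an illustrative subset built from attributes named in this thread; the authoritative list is whatever Appendix A marks as BI. The helper name `check_bi_attributes` is hypothetical and is not taken from any existing checker.

```python
# Illustrative subset only (assumption); Appendix A defines the real BI list.
BI_ATTRIBUTES = ("units", "standard_name", "calendar", "axis", "positive")


def check_bi_attributes(parent_attrs, bounds_attrs, bi_names=BI_ATTRIBUTES):
    """Return messages for BI attributes on a boundary variable that break the rule.

    An empty list means nothing was found. Attributes are plain dicts here,
    standing in for attributes read from a file.
    """
    problems = []
    for name in bi_names:
        if name not in bounds_attrs:
            continue  # recommended practice: omitted, so inherited from the parent
        if name not in parent_attrs:
            problems.append(f"{name}: present on bounds but missing from parent")
            continue
        b, p = bounds_attrs[name], parent_attrs[name]
        # Both the data type and the value have to be identical.
        if type(b) is not type(p) or b != p:
            problems.append(f"{name}: {b!r} does not exactly match parent {p!r}")
    return problems


time_attrs = {"units": "days since 2000-01-01", "calendar": "noleap"}

print(check_bi_attributes(time_attrs, {}))                       # omitted: fine
print(check_bi_attributes(time_attrs, {"calendar": "noleap"}))   # repeated exactly: fine
print(check_bi_attributes(time_attrs, {"calendar": "365_day"}))  # same meaning, different string
```

Under the strict reading agreed in this thread, the last case is reported even though the two calendar names mean the same thing.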
Dear Karl @taylor13 For the sake of further simplicity:
and
If it was absolutely forbidden, we would say it MUST NOT be included. I don't think we need to say what isn't the case as well as what is the case. 😄 Moreover, the next sentence implies that it's allowed, otherwise we wouldn't bother to stipulate these conditions. Best wishes Jonathan |
Hi Jonathan,
Yes, that is better still. Thanks.
Best wishes,
Karl
On Friday, November 17, 2023 at 5:32 AM, JonathanGregory wrote:
Dear Karl @taylor13
For the sake of further simplicity:
Each attribute may be used in any of the ways shown in its "Use" entry. G indicates it can appear as a global attribute, and Gr as a group attribute; if use of an attribute is restricted to certain kinds of variables this is indicated as follows: C for variables containing coordinate data, D for data variables, M for geometry container variables, Do for domain variables, BI and BO for boundary variables (see <<cell-boundaries>> for the distinction between BI and BO), and - for variables with some other purpose.
and
It is recommended that BI attributes not be included on a boundary variable, but this is not absolutely forbidden. If a BI attribute is included, it must also be present in the parent variable, and it must exactly match the parent attribute's data type and value.
If it was absolutely forbidden, we would say it MUST NOT be included. I don't think we need to say what isn't the case as well as what is the case. 😄 Moreover, the next sentence implies that it's allowed, otherwise we wouldn't bother to stipulate these conditions.
Best wishes
Jonathan
|
Dear Jonathan and Karl - thanks for your good suggestions. They are now in the PR: a711f7d. |
Hello, 3 weeks have passed with the only comments being on presentation and clarity, which have been resolved. Could someone please merge PR #467? Many thanks to everyone who took part in this discussion. |
Thanks, @davidhassell. I have resolved a conflict and will merge it now. |
Clarification of requirements on calendar attributes of a bounds variable
Moderator
None at present
Moderator Status Review [last updated: 2020/05/07]
Just posted
Requirement Summary
Express clearly what constitutes a valid value for the calendar attribute of a bounds variable.
During the discussion, the requirement was broadened to consider all attributes of bounds variables.
Technical Proposal Summary
Clarify the text in Section 7.1 expressing the requirement for equivalence of attributes between bounds and associated coordinate variables.
Benefits
People using the CF convention to describe coordinate bounds.
Status Quo
In Section 7.1 of the current standard it is stated that the calendar attribute of a bounds variable, together with other listed attributes, "must always agree exactly with the same attributes of its associated coordinate". This appears clear, but there is some ambiguity because the noleap and 365_day values for the calendar attribute have exactly the same meaning. Some CMIP6 data, for example, has been provided with time coordinates using one form and the bounds variable using the other.
Does the reference to exact agreement in the standard mean agreement in string value, or agreement in meaning?
The other attributes referred to in the same sentence are units, standard_name and positive. The same issue could arise with the units attribute, in that it is possible, in many cases, to express the same logical unit with a range of different string values.
I do not have a strong preference here, but it appears simplest, both in terms of the structure of the standard and of keeping things simple for users, to insist on an exact match of the attribute values.
Detailed Proposal
Replace "must always agree exactly with" with "must exactly match the values of".
Pull request
#467