-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
unit -> units #168
unit -> units #168
Conversation
Automated Review URLs |
See #157 ;) |
I think I have a slight preference for "units", which is probably why I used |
Apparently a new release of bikeshed is now unhappy with the use of `<img/>` and `<img></img>` is required: ``` $ bikeshed spec "latest/index.bs" "latest/index.out.html" LINE 651:1: Tag <img> wasn't closed at end of file. ✘ Did not generate, due to fatal errors Failed ```
Is there a blocker here or can we go ahead and merge this? |
Since #157 is merged now, could you rebase/merge so that those changes don't show up in this PR as its harder to understand, thx. |
cc @joshmoore @sbesson - you guys OK with this proposal? |
At least from my side, I don't have a compelling reason to use the singular
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just want to note that this is a bit inconsistent. A single axis will just have one unit
. Conversely the other names should change to names
and types
, which doesn't make much sense.
@constantinpape The main goal here is harmonizing our metadata with the CF conventions, and they use But it's reasonable to ask "why is 'units' conventional, rather than just 'unit'". I suspect this because base units are often combined to form compound units, for which the plural form becomes natural. Thus, although the value of Alternatively, maybe "units" is used because an axis denotes an interval, and the length of that interval is measured as a number of ($unit)s? E.g., we say "10 meters" and not "10 meter"? But these theories are just speculation on my part. I searched for some justification for the plural form in the climate science specifications and couldn't find anything. Ultimately, I'm willing to tolerate the pluralized word in exchange for consistency with climate metadata conventions. |
Apparently a new release of bikeshed is now unhappy with the use of `<img/>` and `<img></img>` is required: ``` $ bikeshed spec "latest/index.bs" "latest/index.out.html" LINE 651:1: Tag <img> wasn't closed at end of file. ✘ Did not generate, due to fatal errors Failed ```
@sbesson to your point about handling breaking changes, I modified this PR to a) support the use of the field |
It looks like the change to the schema has been lost? "Units" makes sense to me because you tend to use the plural for describing units. E.g. "10 millimetres" the units are millimetres. |
@will-moore thanks for spotting that issue with the schema, it should be fixed now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. LGTM 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This evening at the Zarr community meeting, @d-v-b brought up the fact this one letter change has not been merged. I guess mea culpa, but I'm still unsure of the value of this churn (needing multiple code paths in implementations). If the code paths are going to be sufficiently rewritten by coming 0.5 work, then like @sbesson I'm not strongly tied to the singular. But unification with cfconventions doesn't feel sufficiently valuable from my point-of-view, at least not yet.
I don't really have an opinion about what the ideal name is, though I did just check and see that in the udunits library, the specific unit of "millimeters", etc. would be called a "unit", and "units" refers to the plural of that, i.e. multiple different units: https://docs.unidata.ucar.edu/udunits/current/udunits2-base.xml It isn't particularly difficult to support this minor change in any given implementation, but it adds implementation complexity and likely will lead to incompatibility in the future, as some implementations may only support one of the names. As far as I understand, this is attempting to use the same name as in the CF conventions spec, but does not actually aid compatibility with xarray or other software that follows CF conventions because xarray is actually using the name in a different place in the JSON metadata. |
Empirically, using the exact same name as xarray / the cf conventions is quite convenient compared to using a name that is the same sans one letter. Xarray coordinate variables, as an in-memory representation, do not have JSON metadata so I'm not sure how to interpret this part. But regardless of where xarray, (or any other library that approximates the CF conventions) puts the word "units", as long as we agree that ome-ngff "unit" and xarray "units" describe the same concept, we should consider harmonizing the word we use for this concept, unless there is a good reason to diverge from CF conventions / xarray. I will note this is not just a CF conventions / xarray thing. NIFTI and NRRD also use
Regarding implementation complexity, I am aware of that, which is why this PR is backwards compatible (i.e., it allows both |
As far as xarray, I'm not too familiar with it, but I assumed that it has a way to load/save this attribute from zarr metadata. Perhaps that is not the case, and it is merely an identifier used in the xarray API. From a quick look, I think "unit" is pretty much universally used as the singular term, as in SI base unit like "meter" or derived unit like "5 meters", and "units" is the plural form: https://simple.wiktionary.org/wiki/unit Since this attribute is specifying a single unit, not multiple units, the singular form would seem to make more sense. Note that in both NIFTI and NRRD, the "units" attribute is specifying more than one unit (in case of NRRD, a unit for each dimension, and for NIFTI, a unit for space and a unit for time), so the plural makes sense there. |
We discussed conventional english usage of the word "unit" and "units" earlier in the thread. English itself is inconsistent in when units of measure are pluralized. Since English already pluralizes units contextually, Is there a legitimate concern that people will be confused here? |
An additional data point: DICOM uses |
If I was a saving a velocity field from my particle image velocimetry experiment, in meters per second, then the. I think units may apply in that case. Since we may be mostly talking about pixel pitch for images, perhaps what we commonly mean is a distance measure per pixel, e.g. nanometers per pixel. Units would also be applicable here. Regarding implementations and backwards compatibility, this is a pre-1.0 specification. Under most understandings of versioning schemes, including semantic versioning, there are few if any stability guarantees. In other words, now is the time to break things. |
I happened to see that stackexchange page as well, but it seems to be related to the question of whether a specific unit name should be singular or plural, e.g. "1 meter" vs "2 meters" vs "2-meter pole". I don't see anything on that page about whether the word "unit" itself should be singular or plural. It is true that there are a lot of inconsistencies in English. But overall it seems to me that "unit" is probably at least somewhat preferred over "units" in terms of common usage for this particular case. The spec does also reference UDUNITS so there may be some value in being consistent with the terminology there. According to udunits, a unit like "meters per second" or "0.7 meters per second" would be considered a "derived unit" (singular). In any case, I'm not worried about confusion when a person is reading the metadata ---- the meaning will be obvious even without reading the specification. If this change introduced actual technical compatibility with CF conventions software, I think there would be a reasonable argument to do it. As it is, though, it is merely making the names more similar but does not improve compatibility. If some existing implementations of ome-zarr are already using Although supporting both would be simpler implementation-wise than having to condition on the version, if this change is to be made, I would actually strongly urge that both "unit" and "units" NOT be supported. As I see it, this type of metadata is the sort of thing that is likely to be generated and parsed directly by all sorts of adhoc scripts; each of these scripts can be thought of as OME-zarr implementations. Therefore as this format becomes more popular we can expect there may be a very large number of independent (partial) implementations, and I expect many of them will support only one of "unit" or "units", probably whichever was used in the dataset used as a reference when writing the code. |
I'd second that, or if the support for the two was meant to be transitional, that we discuss that timeline.
In general, I'd agree that pre-1.0 is the time to break things if necessary, but as I mentioned to @d-v-b during the Zarr call, I would really urge us to also consider the costs (i.e. just because it's sub-1.0 doesn't mean we should break things). For example, there was a mention that this was just a one letter change. I'll reference my well-worn story of making it possible to support "/" in addition to "." at the Zarr-level: that took over 6 months of work. I don't think this is nearly as complicated, but I would strongly urge anyone who is opening a PR to change the spec to engage with that overhead before asking the community to invest their time. @d-v-b: you've brought up a few different reasons in the discussion above, can I suggest you update the description of this PR and summarize the various arguments pro and con? Also, if there's not a clear consensus on the value of changing right now, identifying a point in time when there's going to be a breaking change anyway, might help to draw support. |
The goal was to make the support for both
I'm not proposing this change for fun; I think it materially improves the spec. The cost of changing the spec and the cost of enduring warts in the spec both increase with time, so it's imperative that we change things earlier rather than later... assuming we agree on the change 😅
That's a good idea. I will post a summary of pros and cons here, and also add it to the description: Arguments in favor of keeping
Arguments in favor of switching to
|
What do we need to get this closed? Should I rebase on top of the coordinate transformations PR? |
After discussion over in zulip, it's clear that this proposal does not add enough value to warrant inclusion in the spec. |
Change
unit
tounits
.units
is the name used by the CF Conventions, and thereby the broader climate science community -- the xarray ecosystem uses theunits
keyword (see https://docs.xarray.dev/en/stable/examples/multidimensional-coords.html, and https://pint-xarray.readthedocs.io/en/stable/creation.html).Unless there's a very compelling reason for the singular
unit
, we should probably aim for compatibility with the climate science community (and their nice tools) and useunits
.I also added clarification that the
name
andtype
attributes should be strings.I thought I would need to change the schema, but it looks like
units
is already in there?(Update, June 20, 2023): this issue has generated a lively discussion and @joshmoore requested that I summarize the pros and cons of the proposed change:
Arguments in favor of keeping
unit
:unit
overunits
(e.g.). But others preferunits
overunit
, e.g..unit
in its reference xml filesunit
becomes pluralized tounits
, what about the other fields ofaxis
? e.g., shouldname
becomenames
, andtype
becometypes
?Arguments in favor of switching to
units
:units
is more future-proof. If this spec ever allows axis objects to be multi-dimensional, or more generally allows describing the units of multidimensional quantities, then theunits
field could evolve to a list of udunits-compatible strings without needing to change the name of the field itself. The singularunit
fails this test (this might be why CF usesunits
, but i'm not sure).