Encoding grid information #112
Comments
@headmetal, before I forget: you might be interested in this feature, as it would allow your live stats tool to discover datasets that can be compared with the live-tracked model.
Can we use the resolution information given to us in the metadata to do this? Example metadata available from the
In some cases yes: assuming that information is supplied, you're working with a related set of experiments and the encoding is consistent, e.g. '1 degree' means the same thing across unrelated experiments. The proposal above is partly about creating some standard names we can use to identify common grids. As an example,
It is typical that the ocean/ice models use Arakawa grids, e.g. MOM5 is a B-grid model, MOM6 is C-grid. This means there are intersecting grids in the models. In some cases there are diagnostics that are output on either the tracer grid (T-grid) or the velocity grid (U-grid). There are even some diagnostics that use a hybrid of the grids, with one coordinate on the T-grid and the other coordinate on the U-grid. In the case of diagnostics with mixed coordinates, if there is a reduction along one of the horizontal spatial coordinates, e.g. If there were a way of matching those coordinates, then compatibility could be automatically discovered/determined.
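A minimal sketch of the coordinate-matching idea, assuming each diagnostic can be described by a mapping from horizontal axis to a coordinate identifier (for instance a coordinate checksum). The function names and the `"T"`/`"U"` labels are hypothetical and only illustrate the logic; they are not part of any existing tool.

```python
def reduce_along(axes, axis):
    """Return the axis description left after reducing (e.g. summing) along one axis."""
    return {name: coord for name, coord in axes.items() if name != axis}


def compatible(axes_a, axes_b):
    """Two diagnostics are directly comparable when they have the same
    remaining axes and each axis uses the same coordinate."""
    return axes_a == axes_b


# Hypothetical diagnostics: one fully on the T-grid, one with mixed coordinates.
tracer = {"x": "T", "y": "T"}
mixed = {"x": "U", "y": "T"}

# Not comparable as they stand, because the x coordinates differ.
same_grid = compatible(tracer, mixed)

# After reducing along x, only the shared T-grid y coordinate is left.
same_after_x_reduction = compatible(
    reduce_along(tracer, "x"), reduce_along(mixed, "x")
)
```

With real data the identifiers would have to come from the coordinate variables themselves, so this only shows how the comparison could work once those identifiers exist.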
I like this idea. There are a few fiddly bits that come to mind that I'm noting down while they're in my head.
cf_xarray is your friend
I would not suggest doing this without something like cf_xarray to do the inspection (even though I failed to suggest using it above).
I don't think I would recommend that approach. Instead I would say build into the grid information tool the idea of a hierarchy of grids, so you can say one grid is equivalent to another, but they have some "quality metric" to say one is superior, i.e. unmasked. Also it should be possible to map from a 1D to a 2D curvilinear grid and request the "best quality" grid with the required dimensionality.
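As a rough sketch of what such a hierarchy could look like, assuming grids are grouped into equivalence "families" and carry an integer quality score: the `GridRecord` class, its fields, and the grid names below are all hypothetical, chosen only to illustrate the lookup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GridRecord:
    grid_id: str   # identifier for this grid (hypothetical naming)
    family: str    # grids in the same family are treated as equivalent
    ndim: int      # 1 for 1D coordinates, 2 for 2D curvilinear coordinates
    quality: int   # higher is better, e.g. unmasked ranks above masked


def best_grid(records, grid_id, ndim):
    """Return the highest-quality grid equivalent to grid_id with the
    requested dimensionality, or None if no such grid is known."""
    by_id = {record.grid_id: record for record in records}
    family = by_id[grid_id].family
    candidates = [r for r in records if r.family == family and r.ndim == ndim]
    return max(candidates, key=lambda r: r.quality, default=None)


records = [
    GridRecord("tgrid_1d_masked", "ocean_t", ndim=1, quality=0),
    GridRecord("tgrid_1d", "ocean_t", ndim=1, quality=1),
    GridRecord("tgrid_2d_masked", "ocean_t", ndim=2, quality=0),
    GridRecord("tgrid_2d", "ocean_t", ndim=2, quality=1),
    GridRecord("ugrid_2d", "ocean_u", ndim=2, quality=1),
]

# Starting from a masked 1D grid, ask for the best equivalent 2D grid.
best = best_grid(records, "tgrid_1d_masked", ndim=2)
```

Assigning a grid to a family is the part that would need the manual curation; the lookup itself is simple once that mapping exists.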
See above. It would require some manual intervention at some point. When a new grid is encountered it might need some inspection to see if it is just another version of an existing well-known grid, so that a mapping could be added. Off the top of my head, I can think of some heuristics to check whether the mappings could be done semi-automatically, e.g.
The 1D coordinates don't uniquely describe the 2D coordinates. Is it safe to assume they do?
Safe enough for our purposes I think. Those connections will mostly be done in a curated way, not automatically, so we'd only be concerned about false positives, and I think they'd be fairly unlikely, and mostly harmless if they did occur.
Does specifying the "coordinates" attribute make this easier? (e.g. https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#_two_dimensional_latitude_longitude_coordinate_variables) e.g. following the cice example, aice_m in om2 has:
But if we add to the attributes: then the grid should be uniquely defined?
This issue has been mentioned on ACCESS Hive Community Forum. There might be relevant details there:
Is your feature request related to a problem? Please describe.
The resolution of a dataset is an important piece of information. It can be critical when searching for data to know the resolution, as the representation of physical processes is typically dictated by the resolution of a model. The information on the resolution of a dataset is encoded in the underlying grid coordinates.
Also, when comparing datasets, knowing which other datasets use the same grid is very useful information, as it allows comparisons to be made without any time-consuming, and often technically demanding, regridding.
Describe the feature you'd like
I want everything, but would be content with a system that extracts grid information when the netCDF files are opened and inspected during the cataloguing process, and saves that grid information in a form that can be queried.
One suggestion is to save grid information into a complementary catalog or tool that can be queried independently of the main ACCESS-NRI Intake Catalog, somewhat similar to the variable suggester tool (#26)
There are a number of increasingly convoluted thought bubbles about how to uniquely identify grids in this issue, but the gist is this:
Assuming there are `md5` checksum attributes for all coordinate variables (ideally an independent post-processing step) something like this pseudo-code:

Where `coordinates` and `grids` would be serialised to a catalog that could be queried. The `dataset` catalog is already serialised, but the `grid_id`s found in `dataset` would be added to `dataset` metadata.

So queries could be done to retrieve which datasets contained a given `grid`, and it should be possible to provide a function or the logic required to say if two datasets share a common grid.

An issue for the MOM data is that masked data (which is most of it) also has masked coordinates. In an ideal world a post-processing step would fix that, but it hasn't been done in the past so will need to be supported.
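The checksum-based scheme described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the original pseudo-code: it assumes the per-coordinate md5 digests have already been read from the files, and every function name, the layout of the lookups, and the placeholder checksum strings are hypothetical.

```python
import hashlib
from collections import defaultdict


def grid_id_from_checksums(coord_checksums):
    """Combine per-coordinate md5 digests into one grid identifier.

    coord_checksums maps coordinate variable name -> md5 hex digest.
    Sorting makes the id independent of the order coordinates are read in.
    """
    joined = "|".join(f"{name}:{md5}" for name, md5 in sorted(coord_checksums.items()))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def build_grid_catalog(datasets):
    """Build the coordinates, grids and per-dataset grid_id lookups.

    datasets maps dataset name -> list of {coordinate name: md5} dicts,
    one dict per distinct set of coordinates found in that dataset.
    """
    coordinates = {}                  # md5 -> coordinate variable name
    grids = {}                        # grid_id -> {coordinate name: md5}
    dataset_grids = defaultdict(set)  # dataset name -> set of grid_ids
    for name, coord_sets in datasets.items():
        for coord_checksums in coord_sets:
            gid = grid_id_from_checksums(coord_checksums)
            grids[gid] = dict(coord_checksums)
            for coord_name, md5 in coord_checksums.items():
                coordinates[md5] = coord_name
            dataset_grids[name].add(gid)
    return coordinates, grids, dict(dataset_grids)


def datasets_with_grid(dataset_grids, gid):
    """Names of all datasets that contain the given grid."""
    return sorted(name for name, gids in dataset_grids.items() if gid in gids)


def shared_grids(dataset_grids, a, b):
    """grid_ids common to two datasets (empty set if none)."""
    return dataset_grids[a] & dataset_grids[b]


# Placeholder checksums standing in for real md5 attributes.
t_grid = {"xt_ocean": "aaa", "yt_ocean": "bbb"}
u_grid = {"xu_ocean": "ccc", "yu_ocean": "ddd"}
coordinates, grids, dataset_grids = build_grid_catalog(
    {"expt_a": [t_grid, u_grid], "expt_b": [t_grid]}
)
common = shared_grids(dataset_grids, "expt_a", "expt_b")
```

One design choice worth flagging: this sketch includes the coordinate names in the grid id, so two files that store identical coordinates under different names would get different ids. Hashing only the sorted digests would avoid that, at the cost of losing which digest belongs to which axis.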
In the linked COSIMA Cookbook issue I suggested we could augment the grid information with a `metadata.yaml` file that could give grids useful human-readable names, but also define relationships between grids, perhaps defining the grids with missing data to be equivalent to unmasked grids.

Apologies if I've used the wrong terminology above, e.g. datasets, and so made this needlessly confusing.