This repository has been archived by the owner on Oct 11, 2019. It is now read-only.

A lightweight CF model #10

Open
ocefpaf opened this issue Sep 22, 2015 · 14 comments

Comments

@ocefpaf
Member

ocefpaf commented Sep 22, 2015

Since the first time I saw iris I fell in love with its interpretation of the CF-conventions*. It is not a simple metadata bookkeeping like column/index labels in pandas or xray, nor a "bag" dictionary holding all the metadata. It is a full-fledged CF-convention parser to create a Python object. Propagating units, checking for compliance, etc.

I am completely unfamiliar with the tool iris uses to do this (pyke) and I never looked into the details of the implementation. However, it would be extremely useful if we could take the approach used in cf_units and create a standalone module that generates a CF-python object. This object could be used by iris to create the cube. And it would also be possible to use it to create other objects, maybe even a CF xray.Dataset.

Note that there is a cf_python module out there, but I never checked whether it fits our needs. (Well... We have to define our needs first, don't we?)

I believe we do not have the manpower to do this right now, but I wanted to open the issue here to keep this idea alive and start a discussion.

Pinging @pelson and @rhattersley. Are you 👍 or 👎 ? Do you think this is possible? Do you think this is useful? Or do you think this is a wild-goose chase?

* The truth is that this is a love-and-hate relationship. The CF interpretation is so good that it brings all of CF's shortcomings to the cube 😜

@rhattersley

Thanks for the ping. 😄

I am completely unfamiliar with the tool iris uses to do this (pyke)

Pyke is not core to Iris at all. It just happens to be used to translate CF-netCDF files into Cubes, but it's the Cube which embodies CF in a Python object.

it would be extremely useful if we could take the approach used in cf_units and create a standalone module that generates a CF-python object. This object could be used by iris to create the cube.

I'm keen to explore a core + optional extras model with Iris (e.g. https://github.com/SciTools/iris-extras/issues/7 and SciTools/iris#1789). The improving package/dependency management tools make it more feasible for us to pull capabilities out of the core Iris package and into extension packages. In the logical conclusion of that model the "CF-python object" is the Cube. I'm guessing you don't see things in quite the same way though, so I'm eager to understand the difference. Speaking of which...

it would also be possible to use it to create other objects, maybe even a CF xray.Dataset.

How would a "CF xray.Dataset" differ from your "CF-python object"?

The CF interpretation is so good that it brings all of CF shortcomings to the cube 😜

I think you once said something roughly equivalent to "I use xray by default and iris when I need CF compliance". I'd love to know more about what makes you reach for Iris.

@rsignell-usgs
Member

I had a long talk with @kwilcox about this yesterday.

The CF model itself actually consists of functionality that can be separated: unit conversion, vertical coordinate calculation, standard_name manipulation, handling of different common data model featureTypes (Grid, Point, TimeSeries, TimeSeriesProfile, Profile, Trajectory, TrajectoryProfile). Grid handles only data which is colocated with coordinate values.

To handle many of the newer oceanographic, atmospheric and hydrologic models, we also need support for grids where the data is not colocated with the coordinate data (staggered grid) and data which is on non-rectangular mesh (unstructured grid). This was the motivation behind the UGRID and SGRID conventions, and the "pyugrid" and "pysgrid" packages.

We were thinking that if these packages could provide standard methods for regular grid, ugrid or sgrid objects (e.g. subsetting and regridding methods that return specific featureTypes), then those objects could be passed into functions that would do things like return a vertical transect along a specified path, regardless of the type of object. And folks who come up with some other type of model feature type (possibly a spectral representation for FEM models like the Imperial College ICOM model) could create their own package, as long as they provided the appropriate methods.
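A minimal sketch of that idea, assuming a shared method that every grid package agrees to provide. The `GridLike` interface, the `nearest_value` method, and both toy classes are hypothetical names invented for illustration; they are not part of pyugrid, pysgrid, or Iris, and the sampling here is a simple horizontal nearest-neighbour lookup rather than a real vertical transect.

```python
from dataclasses import dataclass
from typing import List, Protocol, Sequence, Tuple

Point = Tuple[float, float]  # (lon, lat)


class GridLike(Protocol):
    """Hypothetical shared interface; no existing package defines it."""

    def nearest_value(self, point: Point) -> float:
        ...


@dataclass
class RegularGrid:
    """Toy rectangular grid: values[j][i] sits at (lons[i], lats[j])."""

    lons: Sequence[float]
    lats: Sequence[float]
    values: Sequence[Sequence[float]]

    def nearest_value(self, point: Point) -> float:
        lon, lat = point
        i = min(range(len(self.lons)), key=lambda k: abs(self.lons[k] - lon))
        j = min(range(len(self.lats)), key=lambda k: abs(self.lats[k] - lat))
        return self.values[j][i]


@dataclass
class UnstructuredMesh:
    """Toy mesh: one value per node, nodes at arbitrary positions."""

    nodes: Sequence[Point]
    values: Sequence[float]

    def nearest_value(self, point: Point) -> float:
        lon, lat = point
        n = min(
            range(len(self.nodes)),
            key=lambda k: (self.nodes[k][0] - lon) ** 2
            + (self.nodes[k][1] - lat) ** 2,
        )
        return self.values[n]


def transect(grid: GridLike, path: Sequence[Point]) -> List[float]:
    """Sample any grid-like object along a path, whatever its type."""
    return [grid.nearest_value(p) for p in path]


regular = RegularGrid(lons=[0.0, 1.0], lats=[0.0, 1.0],
                      values=[[1.0, 2.0], [3.0, 4.0]])
mesh = UnstructuredMesh(nodes=[(0.0, 0.0), (1.0, 1.0)], values=[10.0, 20.0])
path = [(0.1, 0.1), (0.9, 0.9)]
regular_samples = transect(regular, path)
mesh_samples = transect(mesh, path)
```

The `transect` function never inspects the grid type, so a new package only has to supply the agreed method to take part.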

Could Iris be the package that orchestrates this functionality?

I don't see why not. The main things that keep me from using Iris more are: (1) awkward slicing on coordinate values (e.g. compared to Xray); (2) long time to open and inspect a dataset; (3) lack of a dataset concept; (4) monolithic structure.

Yet (1) is probably easily overcome, (2) may be just a question of learning how to inspect a dataset with Iris (using raw over strict), (3) may not be a real problem as long as cube lists don't actually duplicate coordinate data and (4) is being worked on.

@ocefpaf
Member Author

ocefpaf commented Sep 23, 2015

Pyke is not core to Iris at all. It just happens to be used to translate CF-netCDF files into Cubes

I did not say "core of iris." But bear in mind that 99.99% of the time our data is in the netCDF format. That means pyke, for us, is the CF parser in iris.

but it's the Cube which embodies CF in a Python object.

What we imagine is an object one step behind the cube. Maybe just a new netCDF object with some CF modifications and checked for compliance, or a dict of dicts mapping nc.variables and nc.dimensions to CF definitions. I must sound like an 8 year old wishing for a dirt bike with a rocket 🚲 + 🚀
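To make the "dict of dicts" idea concrete, here is a rough sketch under stated assumptions. The `variables` dictionary is hand-written sample data standing in for what would be read from nc.variables, and `build_cf_mapping` is a hypothetical helper, not an existing API. It relies only on the CF rule that a coordinate variable is one-dimensional and shares its name with its dimension.

```python
# Hand-written stand-in for the metadata of a netCDF file (no file is read).
variables = {
    "temp": {
        "dimensions": ("time", "depth"),
        "attrs": {"standard_name": "sea_water_temperature",
                  "units": "degree_C"},
    },
    "time": {
        "dimensions": ("time",),
        "attrs": {"standard_name": "time",
                  "units": "days since 1970-01-01", "axis": "T"},
    },
    "depth": {
        "dimensions": ("depth",),
        "attrs": {"standard_name": "depth", "units": "m",
                  "positive": "down", "axis": "Z"},
    },
}


def build_cf_mapping(variables):
    """Map each data variable to its CF metadata and coordinate variables.

    Hypothetical helper: a coordinate variable is taken to be a 1-D
    variable whose only dimension has the same name as the variable.
    """
    coords = {name: var for name, var in variables.items()
              if var["dimensions"] == (name,)}
    mapping = {}
    for name, var in variables.items():
        if name in coords:
            continue  # coordinate variables are attached to data variables
        mapping[name] = {
            "standard_name": var["attrs"].get("standard_name"),
            "units": var["attrs"].get("units"),
            "coords": {dim: coords[dim]["attrs"]
                       for dim in var["dimensions"] if dim in coords},
        }
    return mapping


cf_mapping = build_cf_mapping(variables)
```

The result is plain data, so it could feed a cube, an xray.Dataset, or a bare array without going through any heavier object first.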

I'm keen to explore a core + optional extras model with Iris (e.g. SciTools/iris-extras#7 and SciTools/iris#1789).

I guess that the grid support, like pyugrid and pysgrid, falls into the optional extra models category.

In the logical conclusion of that model the "CF-python object" is the Cube. I'm guessing you don't see things in quite the same way though, so I'm eager to understand the difference. Speaking of which...

The cube is more than the CF-object, and that is the main problem. My imaginary CF-object would be a lighter cube-like constructor behind the cube. Here are some examples of why we want something like this:

How would a "CF xray.Dataset" differ from your "CF-python object"?

There is no "CF xray.Dataset" yet, but the CF-python object would help create it. One could add a vertical coordinate to the Dataset using the information parsed by the CF-python object. If someone wants to do this in xray right now they would have to re-invent the wheel. The CF-python object would provide the wheel parts for this task, so it would no longer be re-inventing the wheel but rather "assembling the wheel."

Maybe these two examples will help:

  • 1 The iris.cube is great for interactive work, but it is heavy and clunky to use otherwise. Recently @rsignell-usgs and I had an exercise trying to write a script to convert epic data (pre-CF standard) to CF, while calculating the speed based on the u, v using iris, xray, and raw netCDF4-python. iris allowed us to write this with minimal code and took care of most of the metadata, like units, coords, etc. But it was slow and did not scale very well. xray was faster, but we had to manually take care of more metadata than we would like. As expected netCDF4-python had the best performance, but required several orders of magnitude more LOC than xray and iris.
  • 2 In the vertical coordinate module I am writing I am re-inventing the wheel when it comes to parsing the formula terms. Iris does this very well. But again, for speed and flexibility, we do not want that hard-wired into the cube. We would like to create an xray.Dataset with the vertical coords, to take advantage of the pandas-like indexing, or to have just an array to use as input for an isosurface module. Going through the cube adds some extra run time that tools like sci-wms cannot afford to waste.

If we could have an intermediate object maybe we could do this:

formula_terms = awesome_cf_object.get_formula_terms()

The formula_terms would be a mapping to the formula terms vars, dimensions, standard_name, etc. All parsed in a way similar to how iris does it and checked for compliance.
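As a rough illustration of the parsing step only, the sketch below turns a CF `formula_terms` attribute string (a sequence of `term: variable` pairs) into a dictionary. The `parse_formula_terms` function is a hypothetical name, not the method imagined above and not how iris does it; the sample string uses made-up variable names for the terms of the CF ocean_sigma_coordinate (sigma, eta, depth).

```python
def parse_formula_terms(formula_terms):
    """Parse 'term: var term: var ...' into a {term: variable_name} dict.

    Hypothetical helper. It checks only the shape of the string, not
    whether the terms are valid for a given standard_name.
    """
    tokens = formula_terms.split()
    if not tokens or len(tokens) % 2:
        raise ValueError(f"malformed formula_terms: {formula_terms!r}")
    terms = {}
    for key, variable in zip(tokens[0::2], tokens[1::2]):
        if not key.endswith(":") or len(key) == 1:
            raise ValueError(f"expected 'term:' but got {key!r}")
        terms[key[:-1]] = variable
    return terms


# Variable names (zeta, h) are made up for the example.
terms = parse_formula_terms("sigma: sigma eta: zeta depth: h")
```

A fuller version would look up each variable's dimensions and standard_name and validate the term names against the coordinate's standard_name; that part is left out here.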

I think you once said something roughly equivalent to "I use xray by default and iris when I need CF compliance". I'd love to know more about what makes you reach for Iris.

I am writing a blog post about this; can you wait for it? 😜

@rsignell-usgs
Member

@lesserwhirls and @dopplershift, I'm bringing you guys into this discussion too, because it would be great if we could all be working toward harmonization of access in python to the common data model featureType objects, and I know you are working on the Siphon API for accessing Unidata technologies.

@rhattersley

I am writing a blog post about this; can you wait for it? 😜

Depends how long I need to wait... 😜

@ocefpaf
Member Author

ocefpaf commented Apr 25, 2016

Oops. My laptop died with that post on it and I never configured the new one for the blog... Sorry.

@rhattersley

Are you planning to create a new post? Either way, I'd still love to know more about what helps/hinders your usage of Iris.

@ocefpaf
Member Author

ocefpaf commented Apr 26, 2016

Are you planning to create a new post?

Yes. As soon as I have some free time to restore my old HDD.

Either way, I'd still love to know more about what helps/hinders your usage of Iris.

In a nutshell, the post will be about how the CF model in iris helps our workflow.

PS: The hindrances are mostly the slicing (the reason why xarray is so popular is the pandas-like slicing) and the lack of support for 2D coordinates (99% of ocean models use 2D coords).

@rhattersley

Yes. As soon as I have some free time to restore my old HDD.

Super! Thank you! 😄

The hindrances are mostly the slicing...

I'm trying to get a shared plan together for that: https://github.com/SciTools/iris/wiki/IEP-1

@ocefpaf
Member Author

ocefpaf commented Apr 26, 2016

I'm trying to get a shared plan together for that: https://github.com/SciTools/iris/wiki/IEP-1

Awesome! I made a few comments here:

https://via.hypothes.is/https://github.com/SciTools/iris/wiki/IEP-1

I guess that hypothes.is needs chrome/chromium to work.

@rsignell-usgs
Member

rsignell-usgs commented Apr 26, 2016

@rhattersley Here's an example that shows the kind of thing that hinders usage of Iris. In this notebook, the user just wants to do something very simple and common: extract time series data in a specified date range and plot them up:
https://gist.github.com/rsignell-usgs/13d7ce9d95fddb4983d4cbf98be6c71d

Not only is the xarray syntax a lot simpler, but it's a lot faster. The speeds are listed in the notebook, but I'm summarizing them here:

Xarray: 1 loop, best of 3: 857 ms per loop
Iris: 1 loop, best of 3: 1min per loop

Xarray is roughly 70 times faster (about 60 s versus 0.857 s)!

@rhattersley

I guess that hypothes.is needs chrome/chromium to work.

@ocefpaf - chrome was the only browser that showed the overlay widgets, but even with chrome I couldn't see any comments.

Here's an example that shows the kind of thing that hinders usage of Iris.

@rsignell-usgs - thanks! 👍

@ocefpaf
Member Author

ocefpaf commented Apr 27, 2016

@ocefpaf - chrome was the only browser that showed the overlay widgets, but even with chrome I couldn't see any comments.

Weird, I lost the comments too. I guess it is because the wiki was modified.
Anyway, I just wanted to avoid making this thread longer... so here it goes (short version):

  • I prefer the pandas style vs the xarray style simply because it has been out there for a longer period of time. (I am not sure why xarray changed it to (i)sel.)
  • It would be nice if the time slice could take string and datetime dates like pandas instead of the specialized PartialDateTime object.
  • There was a point about date slices being inclusive or not. Inclusive is non-pythonic, but makes a lot of sense when slicing dates. That is why pandas broke from the Python standard and implemented it. So I am 👍 to inclusive.
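The last two points can be sketched with the standard library alone. The `slice_inclusive` helper below is hypothetical, not an Iris, pandas, or xarray function. It accepts either ISO-format strings or datetime objects and keeps both endpoints. It assumes the time axis is sorted in ascending order, and it treats a date-only string as midnight rather than reproducing pandas's partial-string matching.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime


def _as_datetime(value):
    """Accept a datetime or an ISO-format string such as '2016-04-26'."""
    if isinstance(value, datetime):
        return value
    return datetime.fromisoformat(value)


def slice_inclusive(times, values, start, stop):
    """Return the values whose time satisfies start <= time <= stop.

    Hypothetical helper; `times` must be sorted in ascending order.
    """
    lo = bisect_left(times, _as_datetime(start))
    hi = bisect_right(times, _as_datetime(stop))  # keeps the stop endpoint
    return values[lo:hi]


times = [datetime(2016, 4, day) for day in (25, 26, 27, 28)]
values = [1.0, 2.0, 3.0, 4.0]
selected = slice_inclusive(times, values, "2016-04-26", datetime(2016, 4, 27))
```

With Python's usual half-open convention the same request would drop the sample at the stop date, which is rarely what someone asking for "26 to 27 April" expects.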

@rhattersley

I just wanted to avoid making this thread longer

👍 We can move any further discussion to SciTools/iris#1988.

3 participants