IEP 1 #1988
@rhattersley I am copying ioos/APIRUS#10 (comment) here
|
The notebook showing 60x performance of xarray over iris for a simple time slicing operation is at |
I'm a bit disappointed that there is no support in your examples for selecting over ranges of a coordinate
(You'll note I am careful to avoid suggesting a usable syntax !) Is this an intentional omission ? |
Firstly, thanks for taking a look!
I pushed up an updated version just a few minutes before your comment which starts to address this. (I'm guessing you were reviewing the previous version in which the only hint at this was the cryptic entry "Inclusive vs. exclusive" in the TODO section.) In general, this is not a finished document hence the (non-exhaustive) "TODO" section! Contributions are very welcome! |
However, it is really essential for us to support indexing by named dimension instead of just dimension order, as the xarray docs have it :
Unless I'm out of date, and this pandas limitation has changed somehow ? |
@pp-mo I meant the name of the method. The indexing by named dimension (or extended pandas style in the document) is essential! |
Sorry, quite clear, I just hadn't read it all + properly understood it ! |
Some random thoughts on syntax .. Using dictionary keys or named arguments to identify coordinates means we cannot use the getitem-specific from:to:by syntax. E.G. Then we can write, for instance ...
or probably more usefully, by value :
I like this because it preserves a more readable form in the code, especially for items like "from:", ":to" and ":to:by" ... But here's another possibility: We can define two different helper objects to represent selection by index and value : If these produce distinct output types (i.e. not just slice objects), we can then have a single common selector function which distinguishes between index- and value- selecting inputs, and hence you wouldn't need separate methods for these. (assuming :
|
Sorry for thinking aloud... Though TBH I'm not sure that is actually preferable for readability. |
NumPy does almost exactly that with |
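The helper-object idea floated above (two helpers that capture the from:to:by syntax and produce distinct types, plus one common selector) can be sketched roughly as follows. This is only an illustration in plain Python: `by_index`, `by_value`, `IndexSelection`, `ValueSelection` and `select` are hypothetical names, not part of Iris or of the proposal, and treating value ranges as inclusive at both ends is an assumption (the document still lists "Inclusive vs. exclusive" as a TODO).

```python
class IndexSelection:
    """Hypothetical wrapper: the key selects by position."""
    def __init__(self, key):
        self.key = key


class ValueSelection:
    """Hypothetical wrapper: the key selects by coordinate value."""
    def __init__(self, key):
        self.key = key


class _Helper:
    """Captures square-bracket syntax (including from:to:by) and tags it."""
    def __init__(self, wrapper):
        self._wrapper = wrapper

    def __getitem__(self, key):
        return self._wrapper(key)


by_index = _Helper(IndexSelection)
by_value = _Helper(ValueSelection)


def select(points, selector):
    """Single selector that dispatches on the type of the helper output."""
    key = selector.key
    if isinstance(selector, IndexSelection):
        return points[key] if isinstance(key, slice) else [points[key]]
    if isinstance(selector, ValueSelection):
        if isinstance(key, slice):
            # Assumption: value ranges include both end points.
            chosen = [p for p in points
                      if (key.start is None or p >= key.start)
                      and (key.stop is None or p <= key.stop)]
            return chosen[::key.step]
        return [p for p in points if p == key]
    raise TypeError("expected a by_index[...] or by_value[...] selection")


points = [10.0, 20.0, 30.0, 40.0, 50.0]
first_pair = select(points, by_index[1:3])    # positions 1 and 2
mid_range = select(points, by_value[20:40])   # values from 20 to 40
```

Because the two helpers return different types, a single `select` can tell index-based from value-based requests without needing separate methods; whether that reads better than separate methods is the open question raised above.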
Regarding the distinction between orthogonal and vectorised indexing, to use the terminology used by xarray, here: The natural interpretation IMHO is to treat it like this: orthogonal, as already discussed
vectorised, by contrast:
N.B. this last also has a natural extension where the size M becomes multidimensional. This approach is a bit like what we currently have in Iris trajectory interpolation. |
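For readers unfamiliar with the orthogonal/vectorised distinction, here is a small sketch using plain NumPy as a stand-in. It is not the proposed Iris syntax; the array and index values are made up for illustration.

```python
import numpy as np

data = np.arange(20).reshape(4, 5)
rows = [0, 2, 3]
cols = [1, 4, 0]

# Orthogonal: each index list applies to its own dimension independently,
# so the result is the 3 x 3 outer product of the selections.
orthogonal = data[np.ix_(rows, cols)]

# Vectorised: the index lists are paired up element by element,
# so the result has one entry per (row, col) pair: shape (3,).
vectorised = data[rows, cols]

# The vectorised form extends naturally to multidimensional index arrays:
# the result takes the shape of the index arrays, here (2, 2).
rows_2d = np.array([[0, 2], [3, 1]])
cols_2d = np.array([[1, 4], [0, 2]])
vectorised_2d = data[rows_2d, cols_2d]
```

The last case corresponds to the N.B. above: the M selected points need not form a 1-D sequence.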
a|[source,python]
----
cube[dict(height=2)]
cube.iloc[dict(height=2)]
Is iloc a string which could be used in a cube.coord('iloc') call? If so, what coord names are allowed that would not be able to support this? coord.name() could return a long_name with a space in it, for example.
Oh, I see more now: loc and iloc are special functions (catching up slowly).
so, my comments are related to the string 'height'
Yes, the names are the same as for cube.coord(...).
A name that is not a valid Python identifier can be expressed with a dict literal, e.g.:
cube.iloc[{'nasty name': 12}]
This shouldn't be an issue for general purpose code, which is probably doing the equivalent of:
# Build up a dict with the desired selection criteria
criteria = {}
for thing in some_things:
    ...
    criteria[foo] = bar
result = cube.iloc[criteria]
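To make the dict-keyed form concrete, here is a minimal sketch of how an `iloc`-style property could resolve names to dimensions. `ToyCube` and `_ILoc` are hypothetical stand-ins built on NumPy, not the Iris implementation; real cubes look names up via their coordinates rather than a flat list of dimension names.

```python
import numpy as np


class _ILoc:
    """Hypothetical index-based selector keyed by name."""
    def __init__(self, cube):
        self._cube = cube

    def __getitem__(self, criteria):
        # Start with "take everything" on every dimension.
        keys = [slice(None)] * self._cube.data.ndim
        for name, key in criteria.items():
            # Assumption: each name maps to exactly one data dimension.
            keys[self._cube.dim_names.index(name)] = key
        return self._cube.data[tuple(keys)]


class ToyCube:
    """Toy stand-in for a cube: an array plus one name per dimension."""
    def __init__(self, data, dim_names):
        self.data = np.asarray(data)
        self.dim_names = list(dim_names)

    @property
    def iloc(self):
        return _ILoc(self)


cube = ToyCube(np.arange(24).reshape(2, 3, 4),
               ["time", "model level", "height"])

by_keyword = cube.iloc[dict(height=2)]        # valid identifier
by_literal = cube.iloc[{"model level": 1}]    # name containing a space
```

Both spellings produce an ordinary dict, so names that are not valid Python identifiers cost nothing extra in general-purpose code.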
An alternative proposal: Pick off model levels 25, 26, 27 (value based):
Use a function to do filtering with (value based):
Use a function to do filtering with (value based) for time:
Syntactically, there is no reason to add magic indexing properties - we are able to express this uniquely through square bracket form alone. The only reason that we may choose to have separate properties is to avoid complexity within One problem that we are trying to work around with all of our proposals is that we don't really have an object that represents a coordinate in the context of the cube that it lives on (quite rightly, coordinates themselves do not hold a (circular) reference back to the cube). There is nothing to say we can't have such a thing though...
Suppose, for a second, such a thing were returned by
In reality, this thing could be the object that actually does the mapping of coordinate to data dimension for the cube (so a whole heap of (currently ugly) logic for dimension mapping could be moved into this object).
(Pushing my luck) We might even proxy all attribute access to the coordinate...
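A rough sketch of such a cube-bound coordinate object, under several assumptions: `ToyCoord`, `ToyCube`, `BoundCoord` and the `bound_coord` method are hypothetical names invented for this illustration (the accessor named in the comment above is not reproduced here), and each coordinate is assumed to map to exactly one data dimension.

```python
import numpy as np


class ToyCoord:
    """Toy coordinate: holds no reference back to any cube."""
    def __init__(self, name, points, units=None):
        self.name = name
        self.points = np.asarray(points)
        self.units = units


class BoundCoord:
    """Hypothetical object representing a coordinate in the context of a cube."""
    def __init__(self, cube, coord, dim):
        self._cube = cube
        self._coord = coord
        self._dim = dim  # this object owns the coord-to-dimension mapping

    def __getattr__(self, attr):
        # Only reached when normal lookup fails: proxy to the coordinate.
        if attr.startswith("_"):
            raise AttributeError(attr)
        return getattr(self._coord, attr)

    def __getitem__(self, wanted):
        points = self._coord.points
        if callable(wanted):
            # Value-based filtering with a function.
            mask = np.array([bool(wanted(p)) for p in points])
        else:
            # Value-based selection of one or more explicit values.
            mask = np.isin(points, np.atleast_1d(wanted))
        keys = [slice(None)] * self._cube.data.ndim
        keys[self._dim] = mask
        return self._cube.data[tuple(keys)]


class ToyCube:
    def __init__(self, data, coords):
        self.data = np.asarray(data)
        self._coords = {c.name: (c, dim) for dim, c in enumerate(coords)}

    def bound_coord(self, name):
        coord, dim = self._coords[name]
        return BoundCoord(self, coord, dim)


cube = ToyCube(np.zeros((3, 10)),
               [ToyCoord("time", [0, 6, 12], units="hours"),
                ToyCoord("model_level_number", np.arange(20, 30))])

levels = cube.bound_coord("model_level_number")
picked = levels[[25, 26, 27]]          # pick off levels 25, 26, 27 by value
filtered = levels[lambda v: v > 27]    # filter by value with a function
```

Because the bound object is tied to a single coordinate, selections on several coordinates have to be chained one at a time, which is the trade-off noted in the replies.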
|
If I've read your suggestion right, you're proposing to only support two of the four indexing variants:
In which case, yes, you can get away without needing indexing helper objects such as |
Within getitem on the cube ( The object proposed would be very similar to the iloc object that would be necessary in the alternative proposal, except this version would be bound to a single coordinate (meaning you can only do coordinate based indexing one coordinate at a time). This has the advantage of allowing other cube+coordinate behaviour, such as being the canonical place to map dimensions. |
Not sure quite how relevant this is to the true thrust of this discussion, but ... I played around with Python access syntax with special methods Headline : this enables expressions like ...
I thought I was going to wind up passing a dictionary to One thing that emerged from this exercise is the idea of a convenient operation to select a single point by value. |
A couple of points I'd like more clarity on: (1) (2) Ideas @pelson @rhattersley ? |
Pragmatic brevity? |
I'm going to merge this PR, after it hanging around for over 3 years. I'm keen to bank the content, which is just documentation, and merging is no commitment to implementing this proposal - it's just a proposal. |
IEP => Iris enhancement proposal
A replacement for the content in the wiki so we can try pointing to bits of it and discussing them.
Preview the latest version at: https://github.com/rhattersley/iris/blob/iep1/docs/iris/src/IEP/IEP001.adoc
Ping @SciTools/iris-devs, @ocefpaf, @rsignell-usgs