Adding a utility for flattening Aux Coords #3030
Conversation
lib/iris/util.py
Outdated
if not isinstance(c, DimCoord):
    # The new (flat) AuxCoords are always the last dimension
    new_aux_coords.append((c.copy(c.points.flat,
                                  c.bounds.reshape(-1, 2)),
I got an error here because bounds is None.
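A minimal sketch of the kind of guard that would avoid this, using a made-up coord name rather than the PR's real code:

import numpy as np
from iris.coords import AuxCoord

# A 2-D AuxCoord without bounds, mimicking the failing case.
c = AuxCoord(np.arange(4.0).reshape(2, 2), long_name="wibble")

# Only reshape the bounds if the coord actually has them.
new_bounds = None if c.bounds is None else c.bounds.reshape(c.points.size, -1)
flat = c.copy(c.points.flatten(), new_bounds)
print(flat.shape, flat.has_bounds())  # (4,) False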
I'm also not sure if this is robust to the cases where AuxCoord dimensions are not necessarily in the order you expect (#2606).
I'll fix that
msg = msg.format(type(name_or_coord))
raise TypeError(msg)

coord_dims = cube.coord_dims(coord)
Is it worth adding something like
if len(coord_dims) == 1:
    return cube
Good idea
I've just twigged that the auxcoord of interest gets moved to the last dimension when flattened, so perhaps a cube.transpose would be more consistent.
It may also be worth having some kind of handling for coords which span more than 2 dims (which I am assured can exist). However, considering how rare they must be, I would be happy with a not-implemented error in that case, or maybe just a pointer to flatten_cube.
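Taken together, the two suggestions in this thread amount to something like the following sketch (the helper name and overall structure here are hypothetical, not from the PR):

def _flatten_coord_guard(cube, coord):
    # Hypothetical guard: decide early whether there is anything to flatten.
    coord_dims = cube.coord_dims(coord)
    if len(coord_dims) == 1:
        # Already one-dimensional: nothing to do.
        return cube
    if len(coord_dims) > 2:
        raise NotImplementedError(
            "Coordinates spanning more than two dimensions are not handled "
            "here; consider flatten_cube instead.")
    # Otherwise fall through to the normal two-dimensional flattening path.
    ...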
lib/iris/util.py
Outdated
new_data = cube.data.reshape(new_shape)
new_aux_coords = other_aux_coords
for c in coords_to_flatten:
    # Only flatten aux coords (we will drop DimCoords)
My preference would be for the DimCoords to be converted to AuxCoords, with their points and bounds repeated so they fit along the flattened axis.
I'd like the 1D AuxCoords on these dimensions to be treated similarly.
1d AuxCoords should survive untouched
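One way of doing what is suggested at the start of this thread, sketched with made-up names and a hypothetical pair of flattened dimensions:

import numpy as np
from iris.coords import AuxCoord, DimCoord

# A 1-D DimCoord on the first of the two dimensions being flattened.
dim_c = DimCoord(np.array([0.0, 1.0]), long_name="realization")
pair_shape = (2, 3)  # the two dimensions being collapsed into one axis of 6

# Repeat the points along the flattened axis and demote to an AuxCoord,
# so the coordinate survives (a DimCoord cannot describe the new axis).
repeated = np.repeat(dim_c.points, pair_shape[1])
new_coord = AuxCoord.from_coord(dim_c).copy(repeated)
print(new_coord.points)  # [0. 0. 0. 1. 1. 1.]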
lib/iris/util.py
Outdated
new_shape = np.hstack([shape_array[other_dims],
                       np.product(shape_array[list(coord_dims)])])

new_data = cube.data.reshape(new_shape)
Could this be done on the dask array if the cube has lazy data?
Yes, data.reshape should work on both numpy and dask arrays
I'd definitely be pleased to see something like this included in Iris, and your implementation looks a lot cleaner than mine! In case it helps motivate a merge, I'll explain my own use case: I'm analysing GloSea hindcast ensembles. Sometimes the cube looks like this (I added the …
But if a member falls over it gets replaced with a higher-numbered one, so the cube looks like this: …
If I flatten out the …
lib/iris/util.py
Outdated
    longitude                      x
"""
import numpy as np
from iris.coords import AuxCoord
F401 'iris.coords.AuxCoord' imported but unused
Thanks for the useful feedback! The latest commit should include all of your suggestions, and I've added some tests. It would be great to include your use-case as a test though - do you have some code I could use to mock up a cube with those coordinates? This still needs a what's-new entry if it's to be merged.
Note to self: it might be nice to factor out a method for flattening a cube along specific dimensions (which is what the meat of this function really does). I should also check that creating a new cube is necessary and desirable - and if so that the new coord points aren't just views on the old ones.
lib/iris/util.py
Outdated
                  if c not in coords_to_flatten
                  and c not in coords_to_expand]

new_data = cube.data.reshape(new_shape)
Think this needs cube.core_data() if it's to work on Dask arrays.
I also think the array might need transposing if the auxcoord didn't start off on the trailing dimensions. Sorry, I missed that before.
Yes, I need to get used to using core_data() now!
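A quick illustration (not from the PR) of why core_data() is the right hook: it hands back the underlying NumPy or Dask array without realising it, and reshape works on either:

import dask.array as da
import numpy as np
from iris.cube import Cube

real = Cube(np.arange(24.0).reshape(2, 3, 4))
lazy = Cube(da.arange(24.0, chunks=6).reshape(2, 3, 4))

for cube in (real, lazy):
    # core_data() returns the real or lazy array untouched.
    new_data = cube.core_data().reshape(2, 12)
    print(type(new_data), cube.has_lazy_data())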
OK, I think it will now work with Aux Coords across any combination of dimensions; I've added a couple of tests too.
lib/iris/util.py
Outdated
new_points = broadcast_to_shape(c.points, coord.shape, (0,))
new_bounds = broadcast_to_shape(c.bounds, coord.bounds.shape, (0,)) \
    if c.bounds is not None else None
coords_to_flatten.append(c.copy(new_points, new_bounds))
Test is failing here because it's trying to create a 2-D DimCoord. You can get around it with
coords_to_flatten.append(
    iris.coords.AuxCoord.from_coord(c).copy(new_points, new_bounds))
though there may be a better way.
Then the test still fails when the cube is constructed because it claims the new 'blah' is only length 3. I do not understand this at all.
Yes, good catch thanks - I've fixed that now.
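For context on the first failure in this thread: a DimCoord must stay 1-D (and monotonic), so copying broadcast 2-D points onto it fails, while demoting to an AuxCoord first does not. A tiny standalone demo of the from_coord workaround (the coord name is just the stock 'blah'):

import numpy as np
from iris.coords import AuxCoord, DimCoord

c = DimCoord(np.arange(3.0), long_name="blah")
points_2d = np.arange(6.0).reshape(2, 3)

try:
    c.copy(points_2d)  # DimCoord points must be 1-D
except ValueError as err:
    print("DimCoord copy failed:", err)

# AuxCoord.from_coord drops the DimCoord restrictions.
print(AuxCoord.from_coord(c).copy(points_2d).shape)  # (2, 3)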
lib/iris/util.py
Outdated
# Expand the coords which need expanding first, then add to flatten list
for c in coords_to_expand:
    new_points = broadcast_to_shape(c.points, coord.shape, (0,))
The (0,) here will need generalising I think.
Actually, although it looks a bit weird, the zero here is fine because c must be one dimensional. I've added a comment to clarify.
c is one dimensional, but coord is multidimensional, and the tuple indicates the dimension of coord that c is getting mapped to. So, for your docstring example, you would have broadcast_to_shape(c.points, (2, 2), (0,)) for the x coord, but broadcast_to_shape(c.points, (2, 2), (1,)) for the y coord.
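A small worked example of that mapping, calling iris.util.broadcast_to_shape directly (the points values are made up):

import numpy as np
from iris.util import broadcast_to_shape

points = np.array([10.0, 20.0])

# Map the 1-D points onto dimension 0 of a (2, 2) coord: values vary down rows.
print(broadcast_to_shape(points, (2, 2), (0,)))
# [[10. 10.]
#  [20. 20.]]

# Map them onto dimension 1 instead: values vary across columns.
print(broadcast_to_shape(points, (2, 2), (1,)))
# [[10. 20.]
#  [10. 20.]]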
Ah, yes - sorry! 😳
Is there a nice way of getting the dimension of an Aux Coord for a given DimCoord?
No specific method that I know of. I'd be tempted to combine the loop with the one at line 1614.
for dim_map, d in enumerate(coord_dims):
    for c in cube.coords(dimensions=d):
        new_points = broadcast_to_shape(c.points, coord.shape, (dim_map,))
        ...etc...
Good idea!
I don't at the moment but will have a go when I have a minute.
def test_aux_coords_leading(self):
    cube_a = stock.simple_3d_w_multidim_coords()
    # Move the aux coord dims to the front of the cube
    cube_a.transpose((1,2,0))
E231 missing whitespace after ','
def test_split_aux_coord_dims(self):
    cube_a = stock.simple_3d_w_multidim_coords()
    cube_a.transpose((1,0,2))
E231 missing whitespace after ','
Does anyone have a preference for whether this should work on the Cube in-place? It would have to reach into the Cube internals to update the coordinates, which then makes it feel more like a method on the Cube itself, but maybe it's OK?
lib/iris/util.py
Outdated
# These (1-D) coordinates are expanded before being flattened
coords_to_ignore = []
for aux_dim, coord_dim in enumerate(coord_dims):
    # Expand the coords which need expanding first, then add to flatten list
E501 line too long (80 > 79 characters)
lib/iris/util.py
Outdated
new_bounds = None if c.bounds is None else \
    broadcast_to_shape(c.bounds, coord.bounds.shape, (aux_dim,))
coords_to_flatten.append(
    iris.coords.AuxCoord.from_coord(c).copy(new_points, new_bounds))
E501 line too long (80 > 79 characters)
lib/iris/util.py
Outdated
    latitude                       x
    longitude                      x
"""
from iris.coords import AuxCoord
F401 'iris.coords.AuxCoord' imported but unused
OK, I don't think I'll be able to do this in place since the DataManager explicitly doesn't allow this kind of reshape (see …). I had come to the conclusion that since we're just reshaping it would be nice not to have to copy the data...
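For anyone following along, this is the restriction being referred to: replacing a cube's data with an array of a different shape is rejected. A tiny demo (not from the PR):

import numpy as np
from iris.cube import Cube

cube = Cube(np.zeros((2, 3, 4)))
try:
    # The data manager insists the replacement has the same shape as the original.
    cube.data = cube.data.reshape(2, 12)
except ValueError as err:
    print(err)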
@pelson any ideas?
An in-place operation would definitely need to be a Cube method, as at present the cube API is designed to guarantee that any cube derived from another one is totally independent, i.e. no data sharing. The guaranteed copying behaviour seemed like a great idea at the outset of Iris, because of the total certainty given, and at least partly because many of our internal users were used to IDL, which did make copies in all cases.
I've finally got around to testing the latest revision with my data and can report that it's working nicely now:

In [14]: print(cube)
air_temperature / (K) (realization: 4; -- : 108; latitude: 324; longitude: 432)
Dimension coordinates:
realization x - - -
latitude - - x -
longitude - - - x
Auxiliary coordinates:
combined_realization x x - -
forecast_period - x - -
forecast_reference_time - x - -
time - x - -
Scalar coordinates:
height: 1.5 m
Attributes:
STASH: m01s03i236
source: Data from Met Office Unified Model
um_version: 10.4
Cell methods:
mean: time (1 hour)
In [15]: print(iris.util.flatten_multidim_coord(cube, 'combined_realization'))
air_temperature / (K) (-- : 432; latitude: 324; longitude: 432)
Dimension coordinates:
latitude - x -
longitude - - x
Auxiliary coordinates:
combined_realization x - -
forecast_period x - -
forecast_reference_time x - -
realization x - -
time x - -
Scalar coordinates:
height: 1.5 m
Attributes:
STASH: m01s03i236
source: Data from Met Office Unified Model
um_version: 10.4
Cell methods:
mean: time (1 hour)

I did start writing some code to mock up a cube with similar metadata, but it got messy very quickly 😟.
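For what it's worth, a rough sketch of how a toy cube with that kind of coordinate layout might be mocked up for a test (all sizes, names and values below are made up and much smaller than the real GloSea data):

import numpy as np
from iris.cube import Cube
from iris.coords import AuxCoord, DimCoord

nreal, nother, nlat, nlon = 2, 3, 4, 5
cube = Cube(np.zeros((nreal, nother, nlat, nlon)),
            standard_name="air_temperature", units="K")
cube.add_dim_coord(DimCoord(np.arange(nreal), long_name="realization"), 0)
cube.add_dim_coord(DimCoord(np.arange(nlat, dtype=float),
                            standard_name="latitude", units="degrees"), 2)
cube.add_dim_coord(DimCoord(np.arange(nlon, dtype=float),
                            standard_name="longitude", units="degrees"), 3)

# The interesting bit: a 2-D aux coord spanning the first two dimensions.
combined = AuxCoord(np.arange(nreal * nother).reshape(nreal, nother),
                    long_name="combined_realization")
cube.add_aux_coord(combined, (0, 1))
print(cube)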
Thanks for all this work @duncanwp, your time and effort is appreciated.
Functionally, this looks great, but I have made some comments as I think there are some minor changes to make before we get this merged.
Most of the changes are just typos and name changes, but I do think the tests need tidying up a bit; they're a bit mixed up at the moment and all seem to be testing several things at once. I have commented on the tests individually, but you will also need to make a new file called iris.tests.unit.util.test_flatten_cube and move the appropriate tests across.
self.assertEqual(cube_b.dim_coords, tuple())
self.assertEqual(cube_b.shape, (12, ))

def test_oned_dim_coord_flattened(self):
The odd comment with an explanation of the point of the test wouldn't go amiss.
self.assertEqual(cube_b.coord('blah', dim_coords=False).shape, (12, ))
self.assertEqual(cube_b.shape, (2, 12))

def test_multiple_oned_dim_coords_flattened(self):
I don't think this test name describes the test very well. The word 'oned' seems inappropriate as one of your additional coords is actually twoed (or they are both just inserted).
I'm also not sure what this test adds compared to that above, so I would recommend removing it (considering the heaving test suite that Iris already accommodates, any new tests added should be lean and purposeful).
I've tried to rationalise these a bit now and added a comment to describe them
cube_b = flatten_cube(cube_a, (0, 2))
self.assertEqual(cube_b.dim_coords, (cube_a.coord('wibble'), ))
self.assertEqual(cube_b.shape, (12, 2))
Again, I'm not sure if this test adds any value that hasn't already been added by test_aux_coords_leading, so I would suggest removing it.
This is useful because there is some special logic for dealing with these types of coordinates
Commit messages:
…ract calls)
Adding some tests and fixing some edge cases
Removing unused import
Add comment and rename var for clarity
Add comment and rename var for clarity
Minor fix
Retain lazy data
Support aux coords on leading dimensions, and split across non-neighbouring dimensions. The routine chooses the first dimension of the AuxCoord as the new flat one.
Fix whitespace
Fix issue with multiple dim coords being flattened
PEP8 conformance
Long lines
Factor out a function for flattening arbitrary dimensions
Fix case for coordinates with bounds
Flatten all dims by default
Remove unused imports
Remove unused imports
…ions. Other minor whitespace and text changes
Thanks for the detailed review @corinnebosley. I've finally managed to get around to making your proposed changes. Hopefully it's looking a bit tidier now.
* BLD: Pinning cftime to 1.0.1 (SciTools#3247)
* Fix for numpy 1v16. (SciTools#3257)
…ools#3267)" (SciTools#3274) This reverts commit a76afdc.
@bjlittle I consider this done now, rather than work in progress. Do you think there's something missing?
Hey @duncanwp, awesome! I'll take a peek and start reviewing... Before I do, would you consider it a feature enhancement that's not a breaking change? We're teeing up …
It's just an extra feature and not breaking, so if you can roll it into …
Hi @duncanwp. The whole of a 2.2.x mergeback to master (#3256) is appearing on your commit chain, and the main set of changes with your name appears twice in two different versions! So I rebuilt by just cherry-picking the final commits, which then looks like #3379. Apart from that I think there may be a few remaining things to pick up on, but I guess we need you to fix the PR, i.e. reconfirm what is proposed, first.
Note my (failed!) attempt to expedite this: #3379. There are outstanding relevant comments there:
That's frustrating, I thought it was ready :-( I won't have a chance to look at this for a while, but I'll pick it up when I can. Thanks for trying to expedite it.
Thanks for your attention @duncanwp.
@duncanwp Sadly this PR has been stalled for almost 2 and a half years now. Rather than let it linger still longer, I'm going to close this PR, which is not something that I really want to do. Personally, I'd rather see it merged and its benefit utilised by the community. However, if you have the time and motivation, then please re-open this PR and we'll take it from there. Hopefully we can get it across the line. Thanks 👍
e.g. for later use in extract calls, and other use cases (cf #2766).
It still needs a couple of tests and some docs but I'd like some feedback on the general usefulness and approach first.
See discussion here: https://groups.google.com/forum/#!topic/scitools-iris/Kni1_QO7H1M
Ping @rcomer who also has an implementation of this!
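For readers arriving fresh, here is a rough illustration of the operation the proposed utility performs, done by hand on a toy cube (the utility itself is what this PR adds, so none of the names below come from it):

import numpy as np
from iris.cube import Cube
from iris.coords import AuxCoord

# A (2, 3) cube with a 2-D aux coord spanning both dimensions.
cube = Cube(np.arange(6.0).reshape(2, 3), long_name="toy")
cube.add_aux_coord(AuxCoord(np.arange(6).reshape(2, 3), long_name="member"), (0, 1))

# Flattening by hand: reshape the data to 1-D, flatten the aux coord's points,
# and attach the result to the single remaining dimension.
flat = Cube(cube.core_data().reshape(-1), long_name=cube.name())
member = cube.coord("member")
flat.add_aux_coord(member.copy(member.points.flatten()), 0)
print(flat.shape, flat.coord("member").points)  # (6,) [0 1 2 3 4 5]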