
Adding a utility for flattening Aux Coords #3030

Closed · wants to merge 32 commits

Conversation

duncanwp (Contributor):

E.g. for later use in extract calls, and other use cases (cf. #2766).

It still needs a couple of tests and some docs, but I'd like some feedback on the general usefulness and approach first.

See discussion here: https://groups.google.com/forum/#!topic/scitools-iris/Kni1_QO7H1M

Ping @rcomer who also has an implementation of this!

lib/iris/util.py Outdated
if not isinstance(c, DimCoord):
    # The new (flat) AuxCoords are always the last dimension
    new_aux_coords.append((c.copy(c.points.flat,
                                  c.bounds.reshape(-1, 2)),
rcomer (Member), May 25, 2018:

I got an error here because bounds is None.
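
A minimal sketch of the guard this suggests, in pure NumPy (`points` and `bounds` are hypothetical stand-ins for the coord's attributes; an unbounded coord has `bounds` set to None):

```python
import numpy as np

# Hypothetical stand-ins for a 2-D AuxCoord's attributes; an unbounded
# coord has bounds == None, which an unguarded reshape does not handle.
points = np.arange(6).reshape(2, 3)
bounds = None

flat_points = points.flatten()
# Only reshape the bounds when they actually exist.
flat_bounds = bounds.reshape(-1, 2) if bounds is not None else None
```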

Member:

I'm also not sure if this is robust to the cases where AuxCoord dimensions are not necessarily in the order you expect (#2606).

Contributor Author:

I'll fix that

msg = msg.format(type(name_or_coord))
raise TypeError(msg)

coord_dims = cube.coord_dims(coord)
rcomer (Member), May 25, 2018:

Is it worth adding something like

if len(coord_dims) == 1:
    return cube

Contributor Author:

Good idea

Member:

I've just twigged that the auxcoord of interest gets moved to the last dimension when flattened, so perhaps a cube.transpose would be more consistent.

Member:

It may also be worth having some kind of handling for coords which span more than 2 dims (which I am assured can exist). However, considering how rare they must be, I would be happy with a not-implemented error in that case, or maybe just a pointer to flatten_cube.

lib/iris/util.py Outdated
new_data = cube.data.reshape(new_shape)
new_aux_coords = other_aux_coords
for c in coords_to_flatten:
    # Only flatten aux coords (we will drop DimCoords)
Member:

My preference would be for the DimCoords to be converted to AuxCoords, with their points and bounds repeated so they fit along the flattened axis.
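
The repetition described here could be sketched like this (NumPy only, assuming two flattened dimensions of lengths 2 and 3; the variable names are illustrative, not from the PR):

```python
import numpy as np

# Points of a 1-D DimCoord lying on the first of the two flattened dims.
first_dim_points = np.array([10, 20])
# Repeat each point across the other dim so it fits the flat axis:
flat_first = np.repeat(first_dim_points, 3)   # [10 10 10 20 20 20]

# A coord on the second dim is tiled instead:
second_dim_points = np.array([1, 2, 3])
flat_second = np.tile(second_dim_points, 2)   # [1 2 3 1 2 3]
```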

Member:

I'd like the 1D AuxCoords on these dimensions to be treated similarly.

Contributor Author:

1-D AuxCoords should survive untouched.

lib/iris/util.py Outdated
new_shape = np.hstack([shape_array[other_dims],
                       np.product(shape_array[list(coord_dims)])])

new_data = cube.data.reshape(new_shape)
Member:

Could this be done on the dask array if the cube has lazy data?

Contributor Author:

Yes, data.reshape should work on both numpy and dask arrays
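
A quick illustration of that point; the Dask branch is guarded by an import check, since the only thing being demonstrated is that `.reshape` has the same call signature on both array types:

```python
import numpy as np

data = np.arange(24).reshape(2, 3, 4)
flat = data.reshape(2, 12)           # NumPy: eager reshape
assert flat.shape == (2, 12)

try:
    import dask.array as da
    lazy = da.from_array(data, chunks=(1, 3, 4))
    lazy_flat = lazy.reshape(2, 12)  # Dask: stays lazy until computed
    assert lazy_flat.shape == (2, 12)
except ImportError:
    pass  # dask not installed; the NumPy path above is the same call
```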

rcomer (Member) commented May 25, 2018:

I'd definitely be pleased to see something like this included in Iris, and your implementation looks a lot cleaner than mine!

In case it helps motivate a merge, I'll explain my own use case: I'm analysing GloSea hindcast ensembles. Sometimes the cube looks like this (I added the combined_realization coord as a function of the realization and forecast_reference_time coords):

air_temperature / (K)               (realization: 4; -- : 108; latitude: 324; longitude: 432)
     Dimension coordinates:
          realization                           x       -              -               -
          latitude                              -       -              x               -
          longitude                             -       -              -               x
     Auxiliary coordinates:
          combined_realization                  x       x              -               -
          forecast_period                       -       x              -               -
          forecast_reference_time               -       x              -               -
          time                                  -       x              -               -
     Scalar coordinates:
          height: 1.5 m
     Attributes:
          STASH: m01s03i236
          source: Data from Met Office Unified Model
          um_version: 10.4
     Cell methods:
          mean: time (1 hour)

But if a member falls over it gets replaced with a higher numbered one, so the cube looks like this:

air_temperature / (K)               (-- : 432; latitude: 324; longitude: 432)
     Dimension coordinates:
          latitude                      -              x               -
          longitude                     -              -               x
     Auxiliary coordinates:
          forecast_period               x              -               -
          forecast_reference_time       x              -               -
          realization                   x              -               -
          time                          x              -               -
     Scalar coordinates:
          height: 1.5 m
     Attributes:
          STASH: m01s03i236
          source: Data from Met Office Unified Model
          um_version: 10.4
     Cell methods:
          mean: time (1 hour)

If I flatten out the combined_realization coord, then my cubes will always have the same shape. This makes writing subsequent analysis code simpler!

lib/iris/util.py Outdated
longitude x
"""
import numpy as np
from iris.coords import AuxCoord


F401 'iris.coords.AuxCoord' imported but unused

duncanwp (Contributor Author):

Thanks for the useful feedback!

The latest commit should include all of your suggestions, and I've added some tests. It would be great to include your use case as a test though - do you have some code I could use to mock up a cube with those coordinates?

This still needs a whats-new if it's to be merged

duncanwp (Contributor Author) commented Jun 1, 2018:

Note to self: it might be nice to factor out a method for flattening a cube along specific dimensions (which is what the meat of this function really does). I should also check that creating a new cube is necessary and desirable - and if so that the new coord points aren’t just views on the old ones.

lib/iris/util.py Outdated
if c not in coords_to_flatten
and c not in coords_to_expand]

new_data = cube.data.reshape(new_shape)
Member:

Think this needs cube.core_data() if it's to work on Dask arrays.

I also think the array might need transposing if the auxcoord didn't start off on the trailing dimensions. Sorry, I missed that before.

Contributor Author:

Yes, I need to get used to using core_data() now!

Contributor Author:

OK, I think it will now work with Aux Coords across any combination of dimensions, I've added a couple of tests too.

lib/iris/util.py Outdated
new_points = broadcast_to_shape(c.points, coord.shape, (0,))
new_bounds = broadcast_to_shape(c.bounds, coord.bounds.shape, (0,)) \
    if c.bounds is not None else None
coords_to_flatten.append(c.copy(new_points, new_bounds))
Member:

Test is failing here because it's trying to create a 2-D DimCoord. You can get around it with

coords_to_flatten.append(
    iris.coords.AuxCoord.from_coord(c).copy(new_points, new_bounds))

though there may be a better way.

Then the test still fails when the cube is constructed because it claims the new 'blah' is only length 3. I do not understand this at all.

Contributor Author:

Yes, good catch thanks - I've fixed that now.

lib/iris/util.py Outdated

# Expand the coords which need expanding first, then add to flatten list
for c in coords_to_expand:
    new_points = broadcast_to_shape(c.points, coord.shape, (0,))
Member:

The (0,) here will need generalising I think.

Contributor Author:

Actually, although it looks a bit weird, the zero here is fine because c must be one-dimensional. I've added a comment to clarify.

Member:

c is one dimensional, but coord is multidimensional, and the tuple indicates the dimension of coord that c is getting mapped to. So, for your docstring example, you would have
broadcast_to_shape(c.points, (2, 2), (0,)) for the x coord, but
broadcast_to_shape(c.points, (2, 2), (1,)) for the y coord.
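
For reference, a NumPy-only sketch of what that third argument does: `dim_map` maps each dimension of the input array onto a dimension of the target shape. The helper name mirrors `iris.util.broadcast_to_shape`, but this is an illustrative re-implementation, not the iris code:

```python
import numpy as np

def broadcast_to_shape(array, shape, dim_map):
    # dim_map[i] gives the dimension of `shape` that dimension i of
    # `array` maps onto; all other target dims are broadcast.
    index = [np.newaxis] * len(shape)
    for array_dim, target_dim in enumerate(dim_map):
        index[target_dim] = slice(None)
    return np.broadcast_to(array[tuple(index)], shape)

pts = np.array([1, 2])
x = broadcast_to_shape(pts, (2, 2), (0,))  # varies down the rows
y = broadcast_to_shape(pts, (2, 2), (1,))  # varies across the columns
# x == [[1, 1], [2, 2]], y == [[1, 2], [1, 2]]
```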

Contributor Author:

Ah, yes - sorry! 😳

Is there a nice way of getting the dimension of an Aux Coord for a given DimCoord?

Member:

No specific method that I know of. I'd be tempted to combine the loop with the one at line 1614.

for dim_map, d in enumerate(coord_dims):
    for c in cube.coords(dimensions=d):
        new_points = broadcast_to_shape(c.points, coord.shape, (dim_map,))
        ...etc...

Contributor Author:

Good idea!

rcomer (Member) commented Jun 1, 2018:

do you have some code I could use to mock up a cube with those coordinate?

I don't at the moment but will have a go when I have a minute.

def test_aux_coords_leading(self):
    cube_a = stock.simple_3d_w_multidim_coords()
    # Move the aux coord dims to the front of the cube
    cube_a.transpose((1,2,0))


E231 missing whitespace after ','


def test_split_aux_coord_dims(self):
    cube_a = stock.simple_3d_w_multidim_coords()
    cube_a.transpose((1,0,2))


E231 missing whitespace after ','

duncanwp (Contributor Author) commented Jun 3, 2018:

Does anyone have a preference for whether this should work on the Cube in place? It would have to reach into the Cube internals to update the coordinates, which makes it feel more like a method on the Cube itself, but maybe it's OK?

lib/iris/util.py Outdated
# These (1-D) coordinates are expanded before being flattened
coords_to_ignore = []
for aux_dim, coord_dim in enumerate(coord_dims):
    # Expand the coords which need expanding first, then add to flatten list


E501 line too long (80 > 79 characters)

lib/iris/util.py Outdated
new_bounds = None if c.bounds is None else \
    broadcast_to_shape(c.bounds, coord.bounds.shape, (aux_dim,))
coords_to_flatten.append(
    iris.coords.AuxCoord.from_coord(c).copy(new_points, new_bounds))


E501 line too long (80 > 79 characters)

lib/iris/util.py Outdated
latitude x
longitude x
"""
from iris.coords import AuxCoord


F401 'iris.coords.AuxCoord' imported but unused

duncanwp (Contributor Author) commented Jun 5, 2018:

OK, I don't think I'll be able to do this in place, since the DataManager explicitly doesn't allow this kind of reshape (see _data_manager.py:L260). Is there a way around this?

I had come to the conclusion that since we're just reshaping it would be nice not to have to copy the data...

duncanwp (Contributor Author):

@pelson any ideas?

pp-mo (Member) commented Jun 22, 2018:

An in-place operation would definitely need to be a Cube method, as at present the cube API is designed to guarantee that any cube derived from another one is totally independent, i.e. no data sharing.
However, this is something we are intending to relax in future, though we are currently a bit blocked on that -- see #2681.

The guaranteed copying behaviour seemed like a great idea at the outset of Iris, because of the total certainty it gave, and at least partly because many of our internal users were used to IDL, which did make copies in all cases.
Going forward we can instead probably adopt the numpy attitude, which I'd summarise as
"for efficiency, results can sometimes share data with the input; you just need to be careful".

rcomer (Member) commented Oct 5, 2018:

I've finally got around to testing the latest revision with my data and can report that it's working nicely now:

In [14]: print(cube)
air_temperature / (K)               (realization: 4; -- : 108; latitude: 324; longitude: 432)
     Dimension coordinates:
          realization                           x       -              -               -
          latitude                              -       -              x               -
          longitude                             -       -              -               x
     Auxiliary coordinates:
          combined_realization                  x       x              -               -
          forecast_period                       -       x              -               -
          forecast_reference_time               -       x              -               -
          time                                  -       x              -               -
     Scalar coordinates:
          height: 1.5 m
     Attributes:
          STASH: m01s03i236
          source: Data from Met Office Unified Model
          um_version: 10.4
     Cell methods:
          mean: time (1 hour)

In [15]: print(iris.util.flatten_multidim_coord(cube, 'combined_realization'))
air_temperature / (K)               (-- : 432; latitude: 324; longitude: 432)
     Dimension coordinates:
          latitude                      -              x               -
          longitude                     -              -               x
     Auxiliary coordinates:
          combined_realization          x              -               -
          forecast_period               x              -               -
          forecast_reference_time       x              -               -
          realization                   x              -               -
          time                          x              -               -
     Scalar coordinates:
          height: 1.5 m
     Attributes:
          STASH: m01s03i236
          source: Data from Met Office Unified Model
          um_version: 10.4
     Cell methods:
          mean: time (1 hour)

I did start writing some code to mock up a cube with similar metadata, but it got messy very quickly 😟.

corinnebosley (Member) left a comment:

Thanks for all this work @duncanwp, your time and effort is appreciated.

Functionally, this looks great, but I have made some comments as I think there are some minor changes to make before we get this merged.

Most of the changes are just typos and name changes, but I do think the tests need tidying up a bit; they're a bit mixed up at the moment and all seem to be testing several things at once. I have commented on the tests individually, but you will also need to make a new file called iris.tests.unit.util.test_flatten_cube and move the appropriate tests across.


self.assertEqual(cube_b.dim_coords, tuple())
self.assertEqual(cube_b.shape, (12, ))

def test_oned_dim_coord_flattened(self):
Member:

The odd comment with an explanation of the point of the test wouldn't go amiss.

self.assertEqual(cube_b.coord('blah', dim_coords=False).shape, (12, ))
self.assertEqual(cube_b.shape, (2, 12))

def test_multiple_oned_dim_coords_flattened(self):
Member:

I don't think this test name describes the test very well. The word 'oned' seems inappropriate as one of your additional coords is actually twoed (or they are both just inserted).

I'm also not sure what this test adds compared to that above, so I would recommend removing it (considering the heaving test suite that Iris already accommodates, any new tests added should be lean and purposeful).

Contributor Author:

I've tried to rationalise these a bit now and added a comment to describe them

cube_b = flatten_cube(cube_a, (0, 2))
self.assertEqual(cube_b.dim_coords, (cube_a.coord('wibble'), ))
self.assertEqual(cube_b.shape, (12, 2))

Member:

Again, I'm not sure if this test adds any value that hasn't already been added by test_aux_coords_leading, so I would suggest removing it.

Contributor Author:

This is useful because there is some special logic for dealing with these types of coordinates

duncanwp and others added 4 commits January 20, 2019 21:31
…ract calls)

Adding some tests and fixing some edge cases

Removing unused import

Add comment and rename var for clarity

Add comment and rename var for clarity

Minor fix

Retain lazy data

Support aux coords on leading dimensions, and split across non-neighbouring dimensions. The routine chooses the first dimension of the AuxCoord as the new flat one.

Fix whitespace

Fix issue with multiple dim coords being flattened

PEP8 conformance

Long lines

Factor out a function for flattening arbitrary dimensions

Fix case for coordinates with bounds

Flatten all dims by default

Remove unused imports

Remove unused imports
…ions. Other minor whitespace and text changes
duncanwp (Contributor Author):

Thanks for the detailed review @corinnebosley. I've finally managed to get around to making your proposed changes. Hopefully it's looking a bit tidier now.

duncanwp (Contributor Author):

@bjlittle I consider this done now, rather than work in progress. Do you think there's something missing?

bjlittle (Member):

Hey @duncanwp, awesome! I'll take a peek and start reviewing...

Before I do, would you consider it a feature enhancement that's not a breaking change? We're teeing up iris minor release 2.3.0 for the end of August, and iris major release 3.0.0 for November...

duncanwp (Contributor Author):

It's just an extra feature and not breaking so if you can roll it into 2.3.0 that would be great!

pp-mo (Member) commented Aug 23, 2019:

Hi @duncanwp
We're working on iris 2.3 and I'm just considering this PR, but the history has got into a nasty mess for some reason...

The whole of a 2.2.x mergeback to master (#3256) is appearing in your commit chain, and the main set of changes with your name appears twice, in two different versions!
I think the penultimate commit, "Merge remote-tracking branch 'origin/flatten_aux_coord' into flatten_aux_coord", is probably the main culprit; it sounds like a mistake to merge from your fork back into your local branch?

So I rebuilt by just cherry-picking the final commits, which then looks like #3379.
Does that contain everything it should have, in your latest incarnation?

Apart from that, I think there may be a few remaining things to pick up on, but I guess we need you to fix the PR, i.e. reconfirm what is proposed, first.
Can you find some time to fix this?

pp-mo (Member) commented Aug 27, 2019:

Note my (failed!) attempt to expedite this: #3379

There are outstanding relevant comments there.

duncanwp (Contributor Author) commented Aug 30, 2019:

That's frustrating, I thought it was ready :-(

I won't have a chance to look at this for a while but I'll pick it up when I have a chance. Thanks for trying to expedite it.

pp-mo (Member) commented Aug 30, 2019:

Thanks for your attention @duncanwp.
We'll wait!

@bjlittle bjlittle modified the milestones: v2.3.0, v3.1.0 Nov 13, 2019
@bjlittle bjlittle added the Peloton 🚴‍♂️ Target a breakaway issue to be caught and closed by the peloton label Oct 1, 2020
bjlittle (Member):

@duncanwp Sadly this PR has been stalled for almost 2 and a half years now.

Rather than let it linger still longer, I'm going to close this PR, which is not something that I really want to do. Personally, I'd rather see it merged and its benefit utilised by the community.

However, if you have the time and motivation, then please re-open this PR and we'll take it from there. Hopefully we can get it across the line. Thanks 👍

@bjlittle bjlittle closed this Oct 20, 2020
Labels
Peloton 🚴‍♂️ Target a breakaway issue to be caught and closed by the peloton Release: Minor Status: Stalled Type: Enhancement
8 participants