Merge Strategy #2766

Closed
djkirkham opened this issue Sep 11, 2017 · 4 comments

Comments

@djkirkham
Contributor

It seems Iris' strategy when merging cubes is to create the maximum number of dimensions it can, even if there is an 'acceptable' cube with fewer dimensions (i.e., a cube with no anonymous dimensions). For example, cubes with scalar coordinates taken from the following sets of points will merge into a 2 x 3 cube, but could also merge into a 1D cube of size 6:

A=[1,1,1,2,2,2]
B=[1,2,3,1,2,3]
C=[1,2,3,4,5,6]
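
To make the two options concrete, here is a rough sketch in plain Python (no Iris calls). The helper `is_full_grid` is a hypothetical name used only for illustration and is not how Iris implements merge. It just checks that A and B together cover every combination exactly once, which is what allows the 2 x 3 layout, while C alone is distinct for every cube, which is what allows the 1D layout of size 6.

```python
from itertools import product

# Scalar coordinate values carried by the six source cubes (from the example above).
A = [1, 1, 1, 2, 2, 2]
B = [1, 2, 3, 1, 2, 3]
C = [1, 2, 3, 4, 5, 6]


def is_full_grid(*coords):
    """Return True if the coordinate tuples cover every combination exactly once.

    Hypothetical helper for illustration only; not part of Iris.
    """
    combos = list(zip(*coords))
    expected = set(product(*(sorted(set(c)) for c in coords)))
    return len(combos) == len(set(combos)) and set(combos) == expected


# A and B together form a complete grid, so two dimensions are possible.
shape_2d = (len(set(A)), len(set(B)))

# C is distinct on every cube, so a single dimension is also possible.
shape_1d = (len(set(C)),)

print(is_full_grid(A, B), shape_2d)
print(is_full_grid(C), shape_1d)
```

Both layouts hold the same six cubes without any gaps; the question is only which one merge should prefer.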

I've recently had a user report a situation where this strategy resulted in a cube which seemed wrong. A PP load operation resulted in a cube with the following signature:

Heavyside function on pressure levels / (1) (forecast_period: 7; realization: 3; forecast_reference_time: 14; latitude: 325; longitude: 432)
     Dimension coordinates:
          forecast_period                                   x               -                           -             -               -
          realization                                       -               x                           -             -               -
          forecast_reference_time                           -               -                           x             -               -
          latitude                                          -               -                           -             x               -
          longitude                                         -               -                           -             -               x
     Auxiliary coordinates:
          time                                              x               -                           x             -               -
     ...

But it seems more sensible for there to be a single time dimension, rather than separate forecast_period and forecast_reference_time dimensions.

@rcomer
Member

rcomer commented Sep 12, 2017

I can't help thinking that whether this is right or wrong is in the eye of the user, and depends on how they want to use the data. The above example looks very much like the data I use, and the way Iris has organised it seems sensible enough to me.

It's possible that the user needs time to be on a single dimension (e.g. if they want to use cube.extract or cube.aggregated_by). So finding ways to give the user more control over the cube's shape would be good. I think @niallrobinson once proposed a keyword for merge that would let the user specify the desired dim-coords, but I can't seem to find the relevant issue/PR now.

I have my own function, auxcoord_flatten, which reshapes a cube so that a specified 2D auxcoord becomes 1D. It involves lots of slicing, and I'm sure it could be done better by someone who knew what they were doing. If so, could it be a useful addition to e.g. iris.util?
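
The idea of such a reshape can be sketched in plain Python on nested lists. This is not the auxcoord_flatten function mentioned above (its code is not shown in this thread) and it uses no Iris calls; `flatten_leading_dims` and the toy values are assumptions made purely for illustration. Two leading dimensions are collapsed into one, and the 2D auxiliary coordinate becomes a 1D list ordered by value.

```python
def flatten_leading_dims(data, aux_2d):
    """Collapse the first two dimensions of nested lists into one.

    Hypothetical helper for illustration only; not part of Iris.

    data   : nested list indexed [i][j]
    aux_2d : nested list indexed [i][j], the 2D auxiliary coordinate values
    Returns (flat_data, flat_aux), each of length n_i * n_j, ordered by aux value.
    """
    pairs = [
        (aux_2d[i][j], data[i][j])
        for i in range(len(aux_2d))
        for j in range(len(aux_2d[i]))
    ]
    # Stable sort on the auxiliary value only, so payloads are never compared.
    pairs.sort(key=lambda pair: pair[0])
    flat_aux = [aux for aux, _ in pairs]
    flat_data = [payload for _, payload in pairs]
    return flat_data, flat_aux


# Toy values in hours: time = forecast_period + forecast_reference_time.
forecast_period = [0, 6]
forecast_reference_time = [0, 12, 24]
time = [[fp + frt for frt in forecast_reference_time] for fp in forecast_period]
data = [[f"fp{fp}_frt{frt}" for frt in forecast_reference_time]
        for fp in forecast_period]

flat_data, flat_time = flatten_leading_dims(data, time)
print(flat_time)
print(flat_data)
```

With real forecast data, different combinations of forecast_period and forecast_reference_time can give the same time, so the flattened coordinate may contain repeats and would then not be strictly monotonic.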

@pelson
Member

pelson commented Feb 14, 2018

First things first, merge is designed to maximise the number of dimensions without creating missing data. The principle behind this is that the more dimensions you have, the more degrees of freedom you have to do analysis on your cube. In general, it is not possible to structure the data in a way that is always what the user wants - sometimes you want a long thin dimension, sometimes you want many short dimensions.

Taking the above as given, I doubt we would change the behaviour of merge in this instance. It has made a heuristic choice that in many circumstances is a really good one. The logical next step is to provide functionality to the user that makes it fast to swap dimensions around - perhaps even producing higher dimensional cubes that do contain empty data.
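
As a rough illustration of what "higher dimensional cubes that do contain empty data" could mean, the sketch below places values on the full grid of two coordinates and pads the combinations that have no source item. It is plain Python rather than Iris, and `to_padded_grid` is a hypothetical name, not an existing or proposed API.

```python
def to_padded_grid(points, values, fill=None):
    """Arrange values on the full grid of two coordinates, padding gaps.

    Hypothetical helper for illustration only; not part of Iris.

    points : list of (a, b) coordinate pairs, one per source item
    values : list of payloads, same length as points
    fill   : placeholder used where no source item exists
    """
    a_axis = sorted({a for a, _ in points})
    b_axis = sorted({b for _, b in points})
    lookup = dict(zip(points, values))
    grid = [[lookup.get((a, b), fill) for b in b_axis] for a in a_axis]
    return a_axis, b_axis, grid


# Four items that do not cover the full 2 x 3 grid of (a, b) combinations.
points = [(1, 1), (1, 2), (2, 2), (2, 3)]
values = ["w", "x", "y", "z"]

a_axis, b_axis, grid = to_padded_grid(points, values)
for row in grid:
    print(row)
```

Here two of the six grid cells are padded with the fill value, which is the kind of missing data that merge is designed to avoid creating.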

I'm sure there are a number of good API options for this - xarray may be a good source of inspiration on the matter.

In summary: merge is unlikely to change its strategy based on this example, but I believe you are describing useful functionality that sits alongside merge/concatenate.

@github-actions
Contributor

In order to maintain a backlog of relevant issues, we automatically label them as stale after 500 days of inactivity.

If this issue is still important to you, then please comment on this issue and the stale label will be removed.

Otherwise, this issue will be automatically closed in 28 days' time.

@github-actions bot added the "Stale" label (a stale issue/pull-request) on Jan 18, 2022
@github-actions
Contributor

This stale issue has been automatically closed due to a lack of community activity.

If you still care about this issue, then please either:

  • Re-open this issue, if you have sufficient permissions, or
  • Add a comment pinging @SciTools/iris-devs who will re-open on your behalf.
