
Concatenating cubes realises auxiliary coordinates. #5115

Closed
sloosvel opened this issue Dec 20, 2022 · 5 comments · Fixed by #5142

Comments

@sloosvel (Contributor)

🐛 Bug Report

It looks like concatenating cubes realises the auxiliary coordinates of both the input cubes and the resulting concatenated cube. This can be a problem when working with high-resolution data on two-dimensional grids: if the concatenation involves many files, you can end up running out of memory just because of the realised coordinate arrays.

How To Reproduce

Steps to reproduce the behaviour:

  1. Load the cubes
>>> cube
<iris 'Cube' of sea_surface_height_above_geoid / (m) (time: 1; cell index along second dimension: 3059; cell index along first dimension: 4322)>
>>> cube2
<iris 'Cube' of sea_surface_height_above_geoid / (m) (time: 1; cell index along second dimension: 3059; cell index along first dimension: 4322)>
  2. Check that the coordinates are lazy after loading
>>> cube.coord('latitude')
<AuxCoord: latitude / (degrees)  <lazy>+bounds  shape(3059, 4322)>
>>> cube2.coord('latitude')
<AuxCoord: latitude / (degrees)  <lazy>+bounds  shape(3059, 4322)>
  3. Concatenate the cubes
>>> concat = iris.cube.CubeList([cube, cube2]).concatenate_cube()
  4. The coordinates are now realised NumPy arrays in both input cubes and the resulting concatenated one (a self-contained version of these steps is sketched after the list)
>>> cube.coord('latitude')
<AuxCoord: latitude / (degrees)  [[0., 0., ..., 0., 0.], ...]+bounds  shape(3059, 4322)>
>>> cube2.coord('latitude')
<AuxCoord: latitude / (degrees)  [[0., 0., ..., 0., 0.], ...]+bounds  shape(3059, 4322)>
>>> concat.coord('latitude')
<AuxCoord: latitude / (degrees)  [[0., 0., ..., 0., 0.], ...]+bounds  shape(3059, 4322)>
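
For reference, the whole walk-through as one script (a minimal sketch; "zos_file1.nc" and "zos_file2.nc" are placeholder file names standing in for any two compatible files that carry lazy 2-D aux coords):

import iris

# Load the same variable from two consecutive files (placeholder names).
cube = iris.load_cube("zos_file1.nc", "sea_surface_height_above_geoid")
cube2 = iris.load_cube("zos_file2.nc", "sea_surface_height_above_geoid")

# Before concatenation, the aux coord points and bounds are lazy.
print(cube.coord("latitude").has_lazy_points())   # True
print(cube.coord("latitude").has_lazy_bounds())   # True

concat = iris.cube.CubeList([cube, cube2]).concatenate_cube()

# After concatenation (on Iris 3.2.1), the coordinate has been realised
# in the input cubes as well as in the result.
print(cube.coord("latitude").has_lazy_points())    # False
print(concat.coord("latitude").has_lazy_points())  # False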

Expected behaviour

It would be nice if the coordinate arrays could stay lazy. In the ESMValTool concatenation step we could perhaps delete the input cubes once we are done concatenating, but it would also be great if this could be fixed in Iris itself. Thanks.
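
A rough sketch of that caller-side workaround (hypothetical variable names; it only frees the realised copies held by the input cubes, the concatenated cube still carries realised coordinates):

# Drop the references to the input cubes as soon as the concatenation is
# done, so their realised coordinate arrays can be garbage-collected.
cubes = iris.cube.CubeList([cube, cube2])
concat = cubes.concatenate_cube()
del cubes, cube, cube2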


Environment

  • OS & Version: [e.g., Ubuntu 20.04 LTS]
    NAME="Red Hat Enterprise Linux"
    VERSION="8.4 (Ootpa)"

  • Iris Version: [e.g., From the command line run python -c "import iris; print(iris.__version__)"]

>>> iris.__version__
'3.2.1'

@sloosvel (Contributor, Author)

Please let me know if there is anything I can do; I have some time to work on this! I took a look and managed to get a cube with lazy aux coords after concatenation just by changing coord.points to coord.lazy_points() in _CoordMetaData.

However, there are other parts of _concatenate.py that also call coord.points; would those need to be changed as well?

@trexfeathers (Contributor)

trexfeathers commented Jan 26, 2023

Thanks for the offer @sloosvel 😊. We are pretty stretched at the moment so any help is welcome.

You would need to use core_points() instead of lazy_points(), otherwise there will be unnecessary conversion from realised NumPy arrays back into Dask arrays.
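
Roughly, how the three accessors behave on a coordinate whose points are still lazy (an illustrative sketch using cube from the report above, not code from _concatenate.py):

lat = cube.coord("latitude")

lat.core_points()  # returns the stored array as-is: a dask array here, or a
                   # NumPy array if the points have already been realised
lat.lazy_points()  # always returns a dask array, wrapping already-realised
                   # NumPy points in dask if necessary
lat.points         # always returns a realised NumPy array, computing (and
                   # caching) the lazy points as a side effect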

As for where else the change is needed, I wouldn't know without actually doing the work. Stepping through code would probably help you get to the bottom of this.

@sloosvel (Contributor, Author)

> You would need to use core_points() instead of lazy_points(), otherwise there will be unnecessary conversion from realised NumPy arrays back into Dask arrays.

Great, thanks for the tip!

> As for where else the change is needed, I wouldn't know without actually doing the work. Stepping through code would probably help you get to the bottom of this.

I'll check the other parts of the code further then!

If I get something decent, should I just open a PR?

@trexfeathers (Contributor)

> If I get something decent, should I just open a PR?

Yes, we'd love that

@trexfeathers (Contributor)

Closed by #5142

github-project-automation bot moved this from 📋 Backlog to ✅ Done in ESMValTool on Nov 8, 2023