Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Remove deepcopies when slicing cubes and copying coords #37

Open
wants to merge 1 commit into
base: dask
Choose a base branch
from

Conversation

cpelley
Copy link
Owner

@cpelley cpelley commented Apr 24, 2017

Replacement to SciTools#2261, which points to the dask branch.

  • When indexing a cube when the data is reaslised (not lazy), a view of the original array is returned where possible (subject to the rules when slicing in numpy).
  • When indexing a cube when the data is not reaslised (lazy), realising the data on one will still not realise the data on the other.
  • Optimisation of coord copy when replacing the points is to shallow copy the points and bounds before replacing them to avoid unnecessary copies.
  • Existing behaviour is that slicing coordinates returns views of the original points and bounds (where possible). This was likely chosen behaviour on the basis that DimCoords at least is not writeable. This is not the same however for Auxiliary coordinates and likely raises the likely case for this being a bug (i.e. one can modify AuxCoord object points and bounds).
  • DimCoord slicing will now return views of its data like AuxCoords. DimCoords will continue to realise data unlike AuxCoords due to the validation necessary for being monotonically increasing.

@cpelley
Copy link
Owner Author

cpelley commented Apr 24, 2017

@marqh
I have pulled in all the functional changes from SciTools#2261 (such that they work with dask). This includes the new unittests. I have not bothered converting the existing iris unittests at this stage.

@cpelley
Copy link
Owner Author

cpelley commented Apr 24, 2017

The current behaviour changes the default behaviour so that indexing returns views always (i.e. switch on/off of this behaviour). That is, here is the API proposed for returning a view or a copy:

view_cube = cube[slice]                             # CUBE/COORD VIEW
copy_cube = cube[slice].copy()                      # CUBE/COORD COPY

@cpelley
Copy link
Owner Author

cpelley commented Apr 24, 2017

If we don't the ability to switch behaviours and or not change the default behaviour then here are some options:

Setting an attribute: (akin to numpy.ndarray.flags)

copy_cube = cube[slice]                             # CUBE/COORD COPY
cube.flags.share_data = True
view_cube = cube[slice]                             # CUBE/COORD VIEW

Retuning a cube that is shareable by some function/method: (perhaps returning a sub-class of Cube)

copy_cube = cube[slice]                             # CUBE/COORD COPY
new_cube = cube.as_shareable()      or     new_cube = util.as_shareable(cube)
view_cube = new_cube[slice]                         # CUBE/COORD VIEW

@cpelley
Copy link
Owner Author

cpelley commented Apr 24, 2017

My preference would be for the cleaner approach of changing the behaviour with no switch if we can agree to do this (with obvious changes seen by the user where they assume a copy on slicing).
However, if utilising a switch-able approach, the flag approach looks like the most versatile.
Returning a sub-class assumes hierarchy and one can imagine easily having to break this assumed hierarchy when considering more than one state of in future.

@cpelley
Copy link
Owner Author

cpelley commented Apr 25, 2017

ping @pp-mo

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant