Remove deepcopies when slicing cubes and copying coords #1992

rhattersley · 2016-04-28T17:57:08Z

An updated version of #939.

Not ready for merging...

...the remaining open question is whether being able to switch this change on/off with a Future toggle outweighs the implementation leakage.

This is a first attempt at removing unnecessary (and very slow) deepcopy operations when slicing or otherwise manipulating cubes and coordinates. See SciTools#914. Note: a few of the unit tests are failing because they insist on checking the memory order (Fortran or C) of numpy arrays. I think these checks should be removed, because it is a waste of computational effort to always ensure arrays are contiguous. If some code needs to interface with external code that requires contiguous arrays, it should call np.ascontiguousarray or np.asfortranarray at the immediate level of the wrapper.
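The contiguity point can be illustrated with plain numpy (no Iris involved). This is a minimal sketch: `call_external_routine` is a hypothetical wrapper standing in for whatever external code needs C-ordered input. It enforces the layout at the boundary, so slicing itself never has to.

```python
import numpy as np


def call_external_routine(arr):
    """Hypothetical wrapper around external code that requires C-contiguous input.

    np.ascontiguousarray only copies when the input is not already
    C-contiguous, so the cost is paid at the boundary, and only when needed.
    """
    return np.ascontiguousarray(arr)


data = np.arange(24, dtype=np.float64).reshape(4, 6)

# Basic slicing with a step gives a strided view: no copy, not contiguous.
sliced = data[:, ::2]
print(sliced.flags["C_CONTIGUOUS"])      # False
print(np.shares_memory(sliced, data))    # True

# The wrapper makes a contiguous copy just for the external call.
prepared = call_external_routine(sliced)
print(prepared.flags["C_CONTIGUOUS"])    # True
print(np.shares_memory(prepared, data))  # False

# An already-contiguous array is passed through without copying.
print(np.shares_memory(call_external_routine(data), data))  # True
```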

rhattersley · 2016-04-28T17:57:46Z

Ping @cpelley re. #1983.

cpelley · 2016-04-29T07:42:05Z

Great stuff, thanks for doing this @rhattersley, very much appreciated.

My only concern is that when the cube is still lazy and is sliced, the sliced cube's data, when realised, will not realise the data in the original cube. Could this not pose a possible danger for some users, returning views of their arrays (when they thought it was still lazy), or vice versa if the data has been realised?

Regarding the FUTURE toggle, it does sound like a good idea. I'm sure there must be people who have hard-coded cube slicing expecting copies rather than views of their data.

Check this out @jkettleb :)

Thanks @rhattersley

rhattersley · 2016-04-29T08:42:03Z

... when the cube is still lazy and sliced ... the sliced cube data when realised will not realise the data in the original cube.

True.

```python
>>> air_temp.has_lazy_data()
True
>>> t0 = air_temp[0]
>>> t0.has_lazy_data()
True
>>> real_numbers = t0.data
>>> t0.has_lazy_data()
False
>>> air_temp.has_lazy_data()
True
```
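The behaviour in the session above can be mimicked with a toy deferred-loading class. This is a sketch under stated assumptions: `LazyArray` and its loader callable are hypothetical and are not how Iris implements lazy data. The point is only that slicing composes the load step, so realising a slice never realises the parent.

```python
import numpy as np


class LazyArray:
    """Hypothetical stand-in for deferred-load data (not Iris's implementation)."""

    def __init__(self, loader):
        self._loader = loader  # callable that returns the real ndarray
        self._data = None      # cache, filled on first realisation

    def has_lazy_data(self):
        return self._data is None

    def __getitem__(self, keys):
        # Slicing wraps the parent's loader instead of loading anything now.
        parent_loader = self._loader
        return LazyArray(lambda: parent_loader()[keys])

    @property
    def data(self):
        if self._data is None:
            self._data = self._loader()
        return self._data


air_temp = LazyArray(lambda: np.arange(12.0).reshape(3, 4))
t0 = air_temp[0]
real_numbers = t0.data
print(t0.has_lazy_data())        # False
print(air_temp.has_lazy_data())  # True: the parent was never realised
```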

Could this not pose a possible danger for some users, returning views of their arrays (when they thought it was still lazy), or vice versa if the data has been realised?

It's certainly possible that user code might rely on the old data-copying behaviour to ensure that modifying the data in a sliced cube doesn't modify the data in the original cube. My guess is that such reliance is rare, but if any code is affected, the resulting problems might be hard to track down. (The same applies to coordinate copies.)
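The difference for code that writes to sliced data can be emulated with plain numpy arrays standing in for cube data. This is a sketch, not Iris code: `copy.deepcopy` plays the role of the old slicing behaviour and a plain numpy slice plays the role of the new one.

```python
import copy

import numpy as np

original = np.zeros((2, 3))

# Old behaviour (emulated): the slice receives a deep copy of the data,
# so writing to it leaves the original untouched.
copied_slice = copy.deepcopy(original[0])
copied_slice[:] = 1.0

# New behaviour (emulated): the slice is a view sharing memory with the
# original, so the write shows up in the original too.
shared_slice = original[0]
shared_slice[:] = 2.0

# Code that relied on copying can ask for an independent copy explicitly.
independent_slice = original[0].copy()
independent_slice[:] = 3.0

print(original)
# [[2. 2. 2.]
#  [0. 0. 0.]]
```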

Other than that, I think the only implication is a change to the memory/performance characteristics.

cpelley · 2016-05-03T16:05:54Z

Other than that, I think the only implication is a change to the memory/performance characteristics.

Do you think any degradation in performance would be significant enough to worry about?
I'm guessing such degradation would come from reading data from disk, which may or may not be contiguous after slicing?

I think this would be a really nice addition :)

rhattersley · 2016-05-04T07:55:56Z

Do you think any degradation in performance would be significant enough to worry about?

I wasn't implying performance will degrade, just that the performance will change. Hopefully mostly for the better, but perhaps sometimes for the worse. It will depend on the use case.

rhattersley · 2016-05-04T10:21:11Z

Options:

1. Switch to the new behaviour in 1.10.
2. Leave 1.10 untouched. Switch to the new behaviour in 2.0.
3. Add an iris.FUTURE toggle in 1.10, leaving the 1.9 behaviour as the default. Update the default behaviour in 2.0.

NB. Even if we use an iris.FUTURE toggle, I can't think of a sensible way to issue a deprecation warning for the deep-copying behaviour. Lots of user code will be doing cube indexing without caring about the data-copying behaviour, and there's no real way for Iris to tell. It doesn't make much sense to make almost everyone insert iris.FUTURE.share_data = True in their scripts when it only makes a difference for a tiny fraction of people. Plus, there's the knock-on impact on other parts of the Iris API that make use of cube indexing.

My current suggestion: go with option (3) but without any kind of deprecation warning. (When we make the switch in 2.0 we would still need to deprecate the iris.FUTURE.share_data toggle.)
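Option (3) boils down to a global switch consulted at indexing time. A minimal sketch, assuming a hypothetical `Future` class and a toy `ToyCube` that only wraps an array; the real iris.FUTURE object and Cube class carry far more, and this is not their implementation.

```python
import copy

import numpy as np


class Future:
    """Hypothetical global switch, loosely modelled on the iris.FUTURE idea."""

    def __init__(self):
        # Option (3): keep the 1.9 copying behaviour as the default.
        self.share_data = False


FUTURE = Future()


class ToyCube:
    """Hypothetical stand-in for a cube: it only wraps a numpy array."""

    def __init__(self, data):
        self.data = data

    def __getitem__(self, keys):
        sliced = self.data[keys]
        if not FUTURE.share_data:
            # Legacy behaviour: each slice owns an independent deep copy.
            sliced = copy.deepcopy(sliced)
        return ToyCube(sliced)


cube = ToyCube(np.arange(6).reshape(2, 3))

legacy_slice = cube[0]      # default: a copy
FUTURE.share_data = True
shared_slice = cube[0]      # opted in: a view
FUTURE.share_data = False   # restore the default

print(np.shares_memory(legacy_slice.data, cube.data))  # False
print(np.shares_memory(shared_slice.data, cube.data))  # True
```

Because the flag is global state read at call time, any other code that indexes cubes internally also changes behaviour when it is flipped, which is the knock-on impact on the wider API mentioned in the comment.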

rhattersley · 2016-05-04T10:22:36Z

I've pushed a commit which implements option (3).

cpelley · 2016-05-05T13:04:46Z

I'm happy with option (3) (no risk then, and the least controversy).
rhattersley#8 if you're interested; otherwise the PR looks ready to go :)

Thanks @rhattersley

TEST: Added test for cube.__getitem__

cpelley · 2016-05-06T07:59:36Z

Ooh, just remembered: don't we need to update the What's New for this?

pp-mo · 2016-05-06T17:34:33Z

I was initially a bit horrified at breaking so much deeply buried, subtle behaviour!
But on re-reading @shoyer's original #914, I must say he has a powerful point:
I think I now agree that it would make more sense to behave like numpy, and force explicit copies when wanted. If we are going to do that, it needs to be soon.

pp-mo · 2016-05-06T17:36:01Z

breaking so much ... behaviour

After a closer look at this, I really don't see _why_ we can't issue deprecation warnings for these changes.
Very little code is touched, we just need to warn in those places.
True, it will affect nearly everyone, but only until 2.0, and it is something everyone needs to know.

In fact, we should really add something to the user guide about this.

See my proposals at #1999 for making firmer commitments regarding deprecation warnings.
In particular, the summary-of-provisions comment.
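Warning only on the legacy path could look something like the following. Everything here is hypothetical (`Future`, `FUTURE`, `slice_data`, and the message text); it just shows one way to use Python's standard `warnings` module so that users who have opted in never see the warning.

```python
import warnings

import numpy as np


class Future:
    """Hypothetical global switch (see the iris.FUTURE discussion)."""

    def __init__(self):
        self.share_data = False


FUTURE = Future()


def slice_data(data, keys):
    """Hypothetical helper: slice an array, warning on the legacy copy path."""
    if FUTURE.share_data:
        return data[keys]  # new behaviour: a view where numpy allows it
    warnings.warn(
        "Copying data when slicing is deprecated; set FUTURE.share_data = True "
        "to receive views instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return data[keys].copy()


arr = np.arange(5)

with warnings.catch_warnings(record=True) as legacy_warnings:
    warnings.simplefilter("always")
    legacy = slice_data(arr, slice(1, 3))

FUTURE.share_data = True
with warnings.catch_warnings(record=True) as opted_in_warnings:
    warnings.simplefilter("always")
    shared = slice_data(arr, slice(1, 3))

print(len(legacy_warnings), len(opted_in_warnings))  # 1 0
```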

cpelley · 2016-05-09T09:54:57Z

...I really don't see why we can't issue deprecation warnings for these changes.
Very little code is touched, we just need to warn in those places.

Happy if a deprecation warning were to be issued. Assuming you're happy with keeping the FUTURE toggle, though?

In fact, we should really add something to the user guide about this.

+1. Perhaps an explicit 'Copies and views' section. However, unless I'm mistaken, there is a bigger hole to fill in the user guide that extends beyond this behavioural change (an explicit section that explains which operations provide views or copies of what). There is a great deal of misunderstanding amongst the Iris community. Perhaps this information is in there somewhere? For this reason, I would propose splitting the user guide work into another issue with a v1.10 milestone. What do you think?

pp-mo · 2016-05-09T10:54:33Z

Assuming you're happy with keeping the FUTURE toggle, though?

Yes, we need a control and I think it's just the kind of thing FUTURE should be used for.

My "new proposals" include some extra rules and enhanced importance for deprecations: it's of key importance that you can avoid deprecated features.

pp-mo · 2016-05-09T11:12:47Z

However, unless I'm mistaken, there is a bigger hole here ...
an explicit section that explains when/what provides views/copies of what).
There is a great deal of misunderstanding amongst the iris community.
Perhaps this information is in there somewhere?

I think any existing issues with copies and views all stem from _numpy_, as Iris itself (as it currently stands) makes strenuous efforts to avoid producing views anywhere within the cube operations.
No?

So, it should really refer to the numpy docs to explain the concept.
The problem is, numpy documentation is rather weak on fundamental concepts.
Having just reviewed it, I think the best you get on "views" is a short entry in the glossary and a couple of mentions in the "Indexing" section.
Actually, even the detailed reference docs don't routinely make clear when views as opposed to new data may be returned,
much as the stats routines mostly don't bother to explain exactly what they do with missing data elements.
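For reference, numpy's own rules can be checked directly with np.shares_memory. The cases below are standard numpy behaviour (basic slicing gives views, advanced indexing gives copies), independent of Iris.

```python
import numpy as np

a = np.arange(10)

basic = a[2:5]            # basic slicing -> view
stepped = a[::3]          # basic slicing with a step -> still a view
fancy = a[[2, 3, 4]]      # integer-array ("fancy") indexing -> copy
masked = a[a > 6]         # boolean-mask indexing -> copy
explicit = a[2:5].copy()  # explicit copy

for name, arr in [("basic", basic), ("stepped", stepped), ("fancy", fancy),
                  ("masked", masked), ("explicit", explicit)]:
    print(name, np.shares_memory(arr, a))
# basic True
# stepped True
# fancy False
# masked False
# explicit False

print(basic.base is a)    # True: a view records the array it came from
```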

For this reason, I would propose splitting the user guide work into another issue with a v1.10 milestone. What do you think?

I think we should definitely not merge this without the accompanying documentation, so I'm not sure of the benefit of treating it as separate.
As I'm in writing mode, I might try to produce something ...

cpelley · 2016-05-09T12:16:32Z

I think any existing issues with copies and views all stem from numpy, as Iris itself (as it currently stands) makes strenuous efforts to avoid producing views anywhere within the cube operations.
No?

I meant something a bit wider than this. Other examples might include metadata being dropped by most operations. I understand why, but an explicit top-level section on the subtle behaviours of working with cubes would help Iris users get to grips with how to think about cubes.

I think we should definitely not merge this without the accompanying documentation, so I'm not sure of the benefit of treating it as separate.
As I'm in writing mode, I might try to produce something ...

Sure, thanks @pp-mo.

rhattersley · 2016-05-11T14:13:01Z

I'm not sure we're ready to force this into v1.10. It hasn't had very widespread discussion.

pp-mo · 2016-05-11T16:00:51Z

I'm not sure we're ready to force this into v1.10

I took a look at how to document/explain this, and was rather taken aback by some of the existing behaviour. I had thought we consistently avoided view-like copies in Iris up to now, but, as it turns out, that is not the case for coords.
Would you believe...

```python
>>> from iris.coords import AuxCoord
>>> co1 = AuxCoord([1,2,3,4,5])
>>> co2 = co1.copy()
>>> co2.points[2:4] = 77
>>> co2
AuxCoord(array([ 1,  2, 77, 77,  5]), standard_name=None, units=Unit('1'))
>>> co1
AuxCoord(array([1, 2, 3, 4, 5]), standard_name=None, units=Unit('1'))
>>> 
>>> co3 = co1[:]
>>> co3.points[1:3] = -99
>>> co3
AuxCoord(array([  1, -99, -99,   4,   5]), standard_name=None, units=Unit('1'))
>>> co1
AuxCoord(array([  1, -99, -99,   4,   5]), standard_name=None, units=Unit('1'))
```

So [:] delivers a new coord with a view on the old one after all.

This could make it harder to explain what we are changing.

rhattersley · 2016-05-12T07:22:10Z

This could make it harder to explain what we are changing.

Thanks for clarifying @pp-mo. Coming from a numpy perspective, having coord[:] return a view but coord.copy() return a copy makes a lot of sense and provides choice & flexibility to the user. I don't see how changing the behaviour of copy (as in this PR) helps matters.

Rather than make coord.copy() return a view we should be looking at the code that's using copy and decide whether it's appropriate to use indexing instead.
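The numpy-style split described here can be sketched with a toy coordinate class. `ToyCoord` is hypothetical and far simpler than iris.coords.AuxCoord; it only reproduces the session above, with indexing returning a view and copy() returning independent points.

```python
import numpy as np


class ToyCoord:
    """Hypothetical coord-like class following numpy's view/copy conventions."""

    def __init__(self, points):
        self.points = np.asarray(points)

    def __getitem__(self, keys):
        # Indexing: the new coord's points are a view wherever numpy allows.
        return ToyCoord(self.points[keys])

    def copy(self):
        # copy(): the new coord always owns independent points.
        return ToyCoord(self.points.copy())


co1 = ToyCoord([1, 2, 3, 4, 5])

co2 = co1.copy()
co2.points[2:4] = 77   # co1 is unaffected

co3 = co1[:]
co3.points[1:3] = -99  # co1 sees the change through the shared view

print(co2.points)  # [ 1  2 77 77  5]
print(co1.points)  # [  1 -99 -99   4   5]
```

With this split, code that only reads points can use indexing and avoid the copy, while code that intends to modify them calls copy() explicitly.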

cpelley · 2016-07-26T13:54:11Z

@pp-mo are we able to reach an agreed way forward for this?
We would love to see this get into iris.

Cheers

cpelley · 2016-08-16T07:54:44Z

This PR looks good to go to me and provides significant benefit to us.
Can someone with 'merge privileges' take a look please?

Cheers

cpelley · 2016-08-23T13:59:08Z

ping

cpelley · 2016-11-29T11:46:59Z

Please let me know if there's anything I can do to help to get this in.
This would provide very useful capability for CMIP6 due to the sheer size of data involved.

Cheers

marqh · 2016-12-07T13:45:50Z

Hello @cpelley

Please accept my apologies for this getting left behind and for you having to chase it so.

I feel that the change makes sense and I am content to support it.

...the remaining open question is whether being able to switch this change on/off with a Future toggle outweighs the implementation leakage.
#939 (comment)

Is this still an open question that needs addressing?

This is part of the more general hazard of global flags that change functionality.
I agree with this in principle, but the question is out of scope for this PR.

Practically, it has been a good while since this code was written (and tested), and there are references to "deprecated in 1.10".
We have since cut 1.10 and 1.11.
@cpelley, would you be prepared to create your own PR from the commits in this one and adapt the deprecation messages to state 1.12?

I am minded to merge such a PR once I have seen it

thank you
mark

cpelley · 2016-12-08T10:02:03Z

Thanks for taking this up @marqh

Is this still an open question that needs addressing?

A subtlety I missed the first time around. I think we are opening ourselves up to some potential problems by having the option to switch between both behaviours, but whether this risk is greater than the backlash (impact) of changing the default behaviour right away... I don't know.

I never feel too uncomfortable going with @rhattersley's preferred option :).

@cpelley would you be prepared to create your own PR...

Happy to make a new PR

Thanks @marqh

pp-mo · 2017-01-04T12:18:38Z

In #2261 ...

Replaces #1992

Control copying with iris.FUTURE.share_data

944f1b1

rhattersley added this to the v1.10 milestone May 4, 2016

rhattersley and others added 3 commits May 4, 2016 12:35

Add deprecation messages and "What's new" entry.

e438f58

TEST: Added test for cube.__getitem__

425e746

TEST: Refactor of tests

72934ef

rhattersley added 2 commits May 5, 2016 14:11

Merge pull request #8 from cpelley/AVOID_DATA_COPIES

b97b2e8

TEST: Added test for cube.__getitem__

Test AuxCoord.copy() with FUTURE.share_data

4dcf4da

rhattersley removed this from the v1.10 milestone May 11, 2016

cpelley mentioned this pull request Dec 8, 2016

ENH: Remove deepcopies when slicing cubes and copying coords #2261

Closed

pp-mo closed this Jan 4, 2017

cpelley mentioned this pull request May 12, 2017

ENH: Shared data between cube slices #2549

Merged
