-
Notifications
You must be signed in to change notification settings - Fork 284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove deepcopies when slicing cubes and copying coords #1992
Conversation
This is a first attempt at removing unnecessary (and very slow) deepcopy operations with slicing or otherwise manipulating cubes and coordinates. See SciTools#914. Note: A few of the unit tests are failing, because they insist on checking the order (Fortran or C) of numpy arrays. I think these checks should be removed, because it is a waste of computational effort to always ensure arrays are contiguous. If some code needs to interface with external modules code that require continguous arrays, it should use np.ascontiguousarray or np.asfortranarray at the immediate level of the wrapper.
Great stuff, thank for doing this @rhattersley , very much appreciated. My only concern is that when the cube is still lazy and sliced, that the sliced cube data when realised will not realise the data in the original cube. Could this not pose a possible danger with some users, returning views of their arrays (when they though it was still lazy) or vice versa if the data has been actualised. Regarding the future toggle, it does sound like a good idea. I'm sure there must be people who have hardcoded slicing the cube (expecting not views of their data). Check this out @jkettleb :) Thanks @rhattersley |
True. >>> air_temp.has_lazy_data()
True
>>> t0 = air_temp[0]
>>> t0.has_lazy_data()
True
>>> real_numbers = t0.data
>>> t0.has_lazy_data()
False
>>> air_temp.has_lazy_data()
True
It's certainly possibly that user code might rely on the old data copying behaviour to ensure that modifying the data in a sliced cube doesn't modify the data in the original cube. My guess is that such reliance is rare, but if there is any affected code the impact from this change might be hard to track down. (The same sort of thing applies to coordinate copies.) Other that that I think the only implication is a change to the memory/performance characteristics. |
Is a degradation in performance you think significant enough to worry about? I think this would be a really nice addition :) |
I wasn't implying performance will degrade, just that the performance will change. Hopefully mostly for the better, but perhaps sometimes for the worse. It will depend on the use case. |
Options:
NB. Even if we use an iris.FUTURE toggle I can't think of a sensible way to issue a deprecation warning for the deep-copying behaviour. Lots of user code is going to be doing cube indexing without caring about the data-copying behaviour, and there's no real way for Iris to tell. It doesn't make much sense to have almost everyone have to insert My current suggestion: go with option (3) but without any kind of deprecation warning. (When we make the switch in 2.0 we would still need to deprecate the iris.FUTURE.share_data toggle.) |
I've pushed a commit which implements option (3). |
I'm happy with option3 (no risk then and least controversy). Thanks @rhattersley |
TEST: Added test for cube.__getitem__
ooh just remembered, don't we need to update what's new for this? |
I was initially a bit horrified at breaking so much deeply-buried subtle behaviour ! |
After a closer look at this, I really don't see _why_ we can't issue deprecation warnings for these changes.
See my proposals at #1999 for making attempts to commit to deeper promises regarding deprecation warnings. |
Happy if a deprecation warning were to be issued. Assuming your happy with keeping the FUTURE toggle though?
+1 Perhaps an explicit 'Copies and views' section. However, unless I'm mistaken, there is a bigger hole here to describe in the user guide that extends beyond this behavioural change (an explicit section that explains when/what provides views/copies of what). There is a great deal of misunderstanding amongst the iris community. Perhaps this information is in there somewhere? For this reason, I would propose splitting the userguide work to another issue with v1.10 milestone. What you think? |
Yes, we need a control and I think it's just the kind of thing FUTURE should be used for. My "new proposals" include some extra rules + enhanced importance for deprecations : It's of key importance that you can avoid deprecated features. |
I think any existing issues with copies + views are all based with _numpy_, as Iris itself (as it currently stands) makes strenuous efforts to avoid producing views anywhere within the cube operations. So, it should really refer to the numpy docs to explain the concept.
I think we should definitely not merge this without the accompanying documentation, so I'm not sure of the benefit of treating it as separate. |
I meant a bit wider than this. Other examples might include dropping metadata when performing most operations. I understand why but an explicit top-level section concerning the subtle behaviours of working with cubes would help iris users get to grips with how to better think about the concept of cubes.
Thanks @pp-mo sure. |
I'm not sure we're ready to force this into v1.10. It hasn't had very wide-spread discussion. |
I took a look into how to document/explain this, and was rather taken aback by some of the existing behaviour. I had thought we consistently avoided view-like copies in Iris up to now, but that is not the case for coords as it turns out .
So [:] delivers a new coord with a view on the old one after all. This could make it harder to explain what we are changing. |
Thanks for clarifying @pp-mo. Coming from a numpy perspective, having Rather than make |
@pp-mo are we able to reach an agreed way forward for this? Cheers |
This PR looks good to go to me and provides significant benefit to us. Cheers |
ping |
Please let me know if there's anything I can do to help to get this in. Cheers |
Hello @cpelley please accept my apologies for this getting left behind and you having to chase so. I feel that the change makes sense and I am content to support it.
Is this still an open question that needs addressing?
Practically, it is a goodly while since this code was written (and tested) and there are references to deprecated in 1.10 I am minded to merge such a PR once I have seen it thank you |
Thanks for taking this up @marqh
A subtlety I missed the first time around. I think we are opening ourselves to some potential problems by having the option to switch between both behaviours but whether this risk is greater than the backlash (impact) of changing the default behaviour right away... I don't know. I never feel too unconformable going with @rhattersley's preferred option :).
Happy to make a new PR Thanks @marqh |
An updated version of #939.
Not ready for merging...
...the remaining open question is whether being able to switch this change on/off with a
Future
toggle out-weighs the implementation leakage.