Sharedata #2691
While I understand the requirement for this sort of behaviour in Iris, the fact remains that there are some very serious problems with the implementation presented here. The proposed approach does not follow how Iris makes this sort of runtime change, which needs to be done using a new options class added to iris.config. Some of the logic proposed here is highly doubtful, and a lot more testing is needed to ensure this will behave correctly in more than just the most basic of usages.
@@ -1877,6 +1882,50 @@ def test_remove_cell_measure(self):
                         [[self.b_cell_measure, (0, 1)]])


class Test_share_data(tests.IrisTest):
    def setter_lazy_data(self):
        cube = Cube(biggus.NumpyArrayAdapter(np.arange(6).reshape(2, 3)))
Nope. Biggus is no longer a dependency of Iris.
This is also a mistake, which I missed because of the earlier mistake of the test not running.
# not unnecessarily copy the old points.

# Create a temp-coord to manage deep copying of a small array.
temp_coord = copy.copy(self)
Just do self.copy()?
# a deepcopy operation.
temp_coord._points_dm = DataManager(np.array((1,)))
new_coord = copy.deepcopy(temp_coord)
del(temp_coord)
This should not be happening: using del in module code is a good indication that something is badly wrong in the code's logic.
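One way to address this point is to move the temporary object into a small helper function, so that it simply goes out of scope on return. The sketch below is illustrative only: `ToyCoord` and `copy_with_new_points` are hypothetical names, not Iris code, and a plain list stands in for the large points array.

```python
import copy


class ToyCoord:
    """Hypothetical stand-in for a coordinate: metadata plus a large array."""

    def __init__(self, name, points):
        self.name = name
        self.attributes = {"source": "example"}
        self.points = points


def copy_with_new_points(coord, new_points):
    """Deep copy the metadata of ``coord`` without deep copying its points."""
    # Shallow copy: the shell shares every attribute with the original.
    shell = copy.copy(coord)
    # Rebind the large attribute on the shell only; ``coord.points`` is untouched.
    shell.points = None
    # Deep copy the now-cheap shell, then attach the replacement points.
    result = copy.deepcopy(shell)
    result.points = new_points
    # ``shell`` goes out of scope on return, so no explicit ``del`` is needed.
    return result
```

Because the temporary lives only inside the function, the calling code never has to clean it up.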
@@ -781,6 +781,27 @@ def __init__(self, data, standard_name=None, long_name=None,
for cell_measure, dims in cell_measures_and_dims:
    self.add_cell_measure(cell_measure, dims)

# When True indexing may result in a view onto the original data array,
# to avoid unnecessary copying.
self._share_data = False
Why has this been placed at the end of the constructor method? There are specific locations in the flow of the constructor code for "properties" such as this one, so why not follow the pattern?
self._share_data = False

@property
def share_data(self):
What? Why is this even a cube method? We need to be reducing the size of the cube API, not adding random stuff to it.
This should in no way be a cube method. If we need to change the behaviour of Iris at runtime, we have a module for that: iris.config. In there we can add option classes for controlling precisely this kind of behaviour. See #2467 for a good example of doing just that.
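For readers unfamiliar with the pattern, an options class of this kind might look roughly like the sketch below. This is an assumption about the general shape only: `ShareDataOption`, its `context` method and the `SHARE_DATA` instance are hypothetical names, and nothing here is taken from iris.config or from #2467.

```python
import contextlib
import threading


class ShareDataOption(threading.local):
    """Hypothetical run-time switch; not an existing Iris class."""

    def __init__(self):
        # Default to the safe behaviour: indexing copies the data.
        self.share_data = False

    @contextlib.contextmanager
    def context(self, share_data):
        """Temporarily override the option inside a ``with`` block."""
        previous = self.share_data
        self.share_data = bool(share_data)
        try:
            yield self
        finally:
            # Restore the earlier value even if the block raises.
            self.share_data = previous


# A configuration module would typically expose one shared instance.
SHARE_DATA = ShareDataOption()
```

Usage would then be `with SHARE_DATA.context(True): ...`, which keeps the switch out of the cube API and limits its effect to a clearly delimited block of code.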
If we were to even consider sharing data, I would have thought that using iris.FUTURE might also have been an appropriate way to go. It's less intrusive.
Is there a direct need to have this as a property to fit the use case?
@@ -66,6 +67,10 @@ def test_matrix(self):
    self.assertEqual(type(cube.data), np.ndarray)
    self.assertArrayEqual(cube.data, data)

def test_default_share_data(self):
    cube = Cube(np.arange(1))
    self.assertFalse(cube.share_data)
share_data should not be a property of a cube.
cube.share_data is part of the Iris public API:
https://github.com/SciTools/iris/blob/v1.13.0/lib/iris/cube.py#L780
It was proposed by a contributor and reviewed and merged by me; it was included in the 1.13 release.
This followed a long and detailed set of feature requirements, which was drastically pared down to try to minimise impact for version 2.
I do not think it is appropriate to remove this part of the API in the next version of Iris. There has been no deprecation warning, nor any thought on alternative API options to retain the required functionality.
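For context, a deprecation warning on a property is usually a small change. The following is a minimal sketch using only the standard `warnings` module; `ToyCube` is a hypothetical stand-in and the warning text is invented for illustration, so this is not how Iris itself handles deprecations.

```python
import warnings


class ToyCube:
    """Hypothetical stand-in for a cube; not the Iris implementation."""

    def __init__(self):
        self._share_data = False

    @property
    def share_data(self):
        # Keep the old attribute working, but tell callers it is going away.
        warnings.warn(
            "ToyCube.share_data is deprecated.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self._share_data

    @share_data.setter
    def share_data(self, value):
        warnings.warn(
            "ToyCube.share_data is deprecated.",
            DeprecationWarning,
            stacklevel=2,
        )
        self._share_data = bool(value)
```

Existing user code keeps running for a release cycle, and the warning gives users time to migrate before the attribute is removed.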
"thought on alternative API options"
There's one just above in my review comments that can be implemented.
"it was proposed by a contributor and reviewed and merged by me"
@marqh then it seems that this functionality was not properly thought through before being added to v1.13, despite numerous requests that it be. This puts you in a difficult situation because this PR is not getting merged in its current form. This behaviour will not be retained in its current form, so your proposed API will have to change, as per the recommendations elsewhere in the review of this PR.
@@ -1877,6 +1882,50 @@ def test_remove_cell_measure(self):
                         [[self.b_cell_measure, (0, 1)]])


class Test_share_data(tests.IrisTest):
    def setter_lazy_data(self):
I don't believe this will be run: test methods need to be prefixed with test_ for unittest to pick them up.
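The discovery rule can be checked directly with the standard library. In this small sketch the class name mirrors the one in the diff, but the method bodies are placeholders rather than the PR's tests, and a plain `unittest.TestCase` is used instead of `tests.IrisTest`.

```python
import unittest


class Test_share_data(unittest.TestCase):
    def setter_lazy_data(self):
        # Never collected: the name lacks the ``test`` prefix.
        self.fail("this body is never executed by the loader")

    def test_setter_lazy_data(self):
        # Collected: the name starts with ``test``.
        self.assertTrue(True)


# Ask the default loader which method names it treats as tests.
collected = unittest.TestLoader().getTestCaseNames(Test_share_data)
```

Only `test_setter_lazy_data` appears in `collected`, so a failing assertion inside `setter_lazy_data` would go unnoticed.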
this is a mistake, these should be def test...
    self.assertFalse(cube.has_lazy_data())
    self.assertTrue(cube._share_data)

def setter_realised_data(self):
I don't believe this will be run: test methods need to be prefixed with test_ for unittest to pick them up.
    new_coord.bounds = bounds
else:
    new_coord = copy.deepcopy(self)
These changes to coords.py are not tested.
# We don't want a view of the data, so take a copy of it, unless
# self.share_data is True.
if not self.share_data:
    data = deepcopy(data)
I'm concerned that for such a large change to cube behaviour the testing is very light. We have no idea at all what will happen in general use cases. At the very least this change requires some integration testing, and more robust unit testing would also be good.
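As a starting point for such tests, the copy-versus-view distinction can be asserted with `np.shares_memory`. The helper below is a hypothetical reduction of the copy-unless-shared logic in the diff to a bare NumPy array; it is not the cube's indexing code.

```python
import copy

import numpy as np


def index_data(data, keys, share_data=False):
    """Hypothetical helper mirroring the copy-unless-shared logic."""
    # Basic indexing on a NumPy array returns a view onto the original.
    result = data[keys]
    if not share_data:
        # Default behaviour: break the link to the original array.
        result = copy.deepcopy(result)
    return result


data = np.arange(6).reshape(2, 3)
copied = index_data(data, (0, slice(None)))
shared = index_data(data, (0, slice(None)), share_data=True)
```

A test can then check that `np.shares_memory(data, shared)` is true, that `np.shares_memory(data, copied)` is false, and that writing to `shared` changes `data` while `copied` is unaffected.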
@marqh there are still many outstanding issues with this PR that need to be addressed.
There are many levels of comment in this, and it is difficult to follow a coherent train of thought. Rather than reply to each, I will pick out one small part that has implications for all the rest.
I do not think this is a fair reflection on attempts to carefully implement this functionality.
I don't think this level of response is within your remit. According to http://scitools.org.uk/iris/docs/latest/developers_guide/deprecations.html Whether you agree or disagree with the implementation, this functionality and public API has been included in a release, and it is in active use within our user community. As a developer community, we can look to migrate this functionality to a different API over time, but we have signed up to not simply removing API functionality without deprecation warnings. We have also signed up to as @lbdreyer points out in #2681
There are views shared on that issue about this change and alterations to change management processes, but applying these retrospectively to this case is a fairly dubious tactic, I feel.
It is far from clear to me that this is a useful and suitable approach for this particular case. This is about the behaviour of an individual cube, and in particular about giving experienced developers the space to optimise their code for specific cubes without Iris trampling all over their careful implementation. I am happy to engage in detailed discussions about how to implement this API feature for master and Iris 2.0.
Of course it is, and it's in your remit, and it's in the remit of all Iris devs. If something isn't working we all have the remit to fix it, otherwise Iris will never improve. The change management whitepaper is (a) a part of Iris and changeable, and (b) not the final word on how change will occur – it works for us, not we for it. If it is not working for us then it is within our remit to fix it, which has already been identified as being necessary. This v2 release is the first real test of the change management whitepaper, and it has been shown to just be too strict in its requirements, of which this proposal is a good example. Our change management is based on SemVer. This states that "MAJOR version[s are] when you make incompatible API changes" (maintaining original emphasis), which must happen in this case.
Problem: you have proposed this unprecedented behaviour change to help a limited subset of Iris users who are also experienced developers. This excludes the majority of Iris users, who frequently are inexperienced developers. They may not expect the behaviour that you propose to introduce, and there is no evidence that they need it either. The proposed solution for the experienced developers is too prominent within Iris (as a cube-level API) so introduces a risk that inexperienced developers will encounter unprecedented behaviour (a cube's data unexpectedly being loaded) through mis-application of this API.
There is no movement: there is general team intention that this PR will not go into Iris in its current implementation.
There will be a team discussion to determine how to retain the behaviour, which, while limited in requirement, is undoubtedly useful, and to determine the most appropriate implementation of the required behaviour.
The shared data feature is perhaps the single most divisive/controversial Iris feature in recent memory. I'm putting together the Iris 2.0.0 release candidate, and whilst it is clear that we all seem to support the idea of shared data, we certainly don't agree on the implementation. There are some serious and legitimate API concerns with the changes here (and released in For that reason, I'm going to keep the PR open with a
I will not be putting any further work into this pull request.
Ultimately, I still think this is needed. IMHO we should aim to be "like numpy".
I couldn't find a |
Thanks + agreed that is probably better ...
regarding #2681 I think that 'share_data' is part of the public API in 1.13 and should be preserved
this PR reimplements #2549 with respect to master, a fairly simple exercise
I think this should be included in iris 2.0