
Sharedata #2691

Closed
wants to merge 4 commits into from

Conversation

marqh
Member

@marqh marqh commented Jul 19, 2017

Regarding #2681, I think that 'share_data' is part of the public API in 1.13 and should be preserved.

This PR reimplements #2549 with respect to master, a fairly simple exercise.

I think this should be included in Iris 2.0.

@marqh marqh added this to the v2.0 milestone Jul 19, 2017
Member

@DPeterK DPeterK left a comment

While I understand the requirement for this sort of behaviour in Iris, the fact remains that there are some very serious problems with the implementation presented here. The approach proposed simply does not adhere to how Iris makes this sort of runtime change, which needs to be done using a new options class added to iris.config. Some of the logic proposed here is highly doubtful, and a lot more testing is needed to ensure this will behave correctly in more than just the most basic usages.
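As an illustration of the kind of runtime-options class the reviewer describes, a minimal sketch might look like the following. All names here are hypothetical and illustrative only, not the actual iris.config API; see #2467 for the real pattern being referenced.

```python
import contextlib

# Hypothetical sketch of a runtime-options class of the sort suggested
# for iris.config; the class and attribute names are illustrative only.
class ShareDataOptions:
    """Holds a single boolean option controlling data-sharing behaviour."""

    def __init__(self, share_data=False):
        self._share_data = False
        self.share_data = share_data  # route through the validating setter

    @property
    def share_data(self):
        return self._share_data

    @share_data.setter
    def share_data(self, value):
        if not isinstance(value, bool):
            raise ValueError("share_data must be True or False")
        self._share_data = value

    @contextlib.contextmanager
    def context(self, **kwargs):
        # Temporarily override options, restoring the old value on exit.
        saved = self.share_data
        try:
            for name, value in kwargs.items():
                setattr(self, name, value)
            yield self
        finally:
            self.share_data = saved


options = ShareDataOptions()
with options.context(share_data=True):
    pass  # code in this block sees options.share_data == True
```

The attraction of this pattern is that the behaviour switch lives in one validated, library-level object with scoped overrides, rather than as an attribute on every cube.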

@@ -1877,6 +1882,50 @@ def test_remove_cell_measure(self):
[[self.b_cell_measure, (0, 1)]])


class Test_share_data(tests.IrisTest):
def setter_lazy_data(self):
cube = Cube(biggus.NumpyArrayAdapter(np.arange(6).reshape(2, 3)))
Member

Nope. Biggus is no longer a dependency of Iris.

Member Author

This is also a mistake, which I missed because of the earlier mistake of the test not running.

# not unnecessarily copy the old points.

# Create a temp-coord to manage deep copying of a small array.
temp_coord = copy.copy(self)
Member

Just do self.copy()?

# a deepcopy operation.
temp_coord._points_dm = DataManager(np.array((1,)))
new_coord = copy.deepcopy(temp_coord)
del(temp_coord)
Member

This should not be happening: using del in module code is a good indication that something is badly wrong in the code's logic.

@@ -781,6 +781,27 @@ def __init__(self, data, standard_name=None, long_name=None,
for cell_measure, dims in cell_measures_and_dims:
self.add_cell_measure(cell_measure, dims)

# When True indexing may result in a view onto the original data array,
# to avoid unnecessary copying.
self._share_data = False
Member

Why has this been placed at the end of the constructor method? There are specific locations in the flow of the constructor code for "properties" such as this one, so why not follow the pattern?

self._share_data = False

@property
def share_data(self):
Member

What? Why is this even a cube method? We need to be reducing the size of the cube API, not adding random stuff to it.

This should in no way be a cube method. If we need to change the behaviour of Iris at runtime, we have a module for that: iris.config. In there we can add option classes for controlling precisely this kind of behaviour. See #2467 for a good example of doing just that.

Member

If we were to even consider sharing data, I would have thought that using iris.FUTURE might also have been an appropriate way to go. It's less intrusive.

Is there a direct need to have this as a property to fit the use case?

@@ -66,6 +67,10 @@ def test_matrix(self):
self.assertEqual(type(cube.data), np.ndarray)
self.assertArrayEqual(cube.data, data)

def test_default_share_data(self):
cube = Cube(np.arange(1))
self.assertFalse(cube.share_data)
Member

share_data should not be a property of a cube.

Member Author

cube.share_data is part of the Iris public API
https://github.com/SciTools/iris/blob/v1.13.0/lib/iris/cube.py#L780

it was proposed by a contributor and reviewed and merged by me; it was included in the 1.13 release.
This followed a long and detailed set of feature requirements, which was drastically pared down to try to minimise the impact for version 2.

I do not think it is appropriate to remove this part of the API in the next version of Iris. There has been no deprecation warning, nor any consideration of alternative API options to retain the required functionality.

Member

thought on alternative API options

There's one just above in my review comments that can be implemented.

Member

it was proposed by a contributor and reviewed and merged by me

@marqh then it seems that this functionality was not properly thought through before being added to v1.13, despite numerous requests that it was. This puts you in a difficult situation because this PR is not getting merged in its current form. This behaviour will not be retained in its current form, so your proposed API will have to change, as per the recommendations elsewhere in the review of this PR.

@@ -1877,6 +1882,50 @@ def test_remove_cell_measure(self):
[[self.b_cell_measure, (0, 1)]])


class Test_share_data(tests.IrisTest):
def setter_lazy_data(self):
Member

I don't believe this will be run: test methods need to be prefixed with test_ for unittest to pick them up.
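The discovery rule the reviewer refers to is easy to demonstrate: unittest's default loader only collects methods whose names start with the test prefix, so a method named setter_... is silently skipped.

```python
import unittest

class Example(unittest.TestCase):
    def setter_lazy_data(self):
        # Never collected: the name does not start with 'test'.
        raise AssertionError("this method is silently skipped")

    def test_lazy_data(self):
        # Collected and run by unittest's default loader.
        self.assertTrue(True)

# The default TestLoader matches the 'test' method-name prefix only.
names = unittest.TestLoader().getTestCaseNames(Example)
print(names)  # only the test_-prefixed method appears
```

This is exactly why a mis-named test can pass review while never actually running.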

Member Author

This is a mistake; these should be def test...

self.assertFalse(cube.has_lazy_data())
self.assertTrue(cube._share_data)

def setter_realised_data(self):
Member

I don't believe this will be run: test methods need to be prefixed with test_ for unittest to pick them up.

new_coord.bounds = bounds
else:
new_coord = copy.deepcopy(self)
Member

These changes to coords.py are not tested.

# We don't want a view of the data, so take a copy of it, unless
# self.share_data is True.
if not self.share_data:
data = deepcopy(data)
Member

I'm concerned that for such a large change to cube behaviour the testing is very light. We have no idea at all what will happen in general use cases – at the very least this change requires some integration testing, but some more robust unit testing as well would also be good.

@DPeterK
Member

DPeterK commented Jul 24, 2017

@marqh there are still many outstanding issues with this PR that need to be addressed.

@marqh
Member Author

marqh commented Jul 24, 2017

There are many levels of comment in this; it is difficult to follow a coherent train of thought.

Rather than post on all, I will pick out a small part, which has implications for all the rest.

then it seems that this functionality was not properly thought through before being added to v1.13, despite numerous requests that it was.

I do not think this is a fair reflection on attempts to carefully implement this functionality.

This puts you in a difficult situation because this PR is not getting merged in its current form. This behaviour will not be retained in its current form, so your proposed API will have to change, as per the recommendations elsewhere in the review of this PR.

I don't think this level of response is within your remit. According to http://scitools.org.uk/iris/docs/latest/developers_guide/deprecations.html
If you need to make a backwards-incompatible change to a public API [1] that has been included in a release (e.g. deleting a method), then you must first deprecate the old behaviour in at least one release, before removing/updating it in the next major release.

Whether you agree or disagree with the implementation, this functionality and public API has been included in a release, and it is in active use within our user community.

As a developer community, we can look to migrate this functionality to a different API over time, but we have signed up to not simply removing API functionality without deprecation warnings. We have also signed up to
Where possible, your deprecation warning should include advice on how to avoid using the deprecated API.
So a replacement methodology should be provided as part of deprecation.

as @lbdreyer points out in #2681

share_data is a property of a public object (iris.cube.Cube) and therefore surely needs to be deprecated - it can't just vanish.

There are views shared on that issue about this change and alterations to change management processes, but applying these retrospectively to this case is a fairly dubious tactic, I feel.

The approach proposed simply does not adhere to how Iris makes these sort of runtime changes, which needs to be done using a new options class added to iris.config

It is far from clear to me that this is a useful and suitable approach for this particular case: this is about the behaviour of an individual cube, and in particular about giving experienced developers the space to optimise their code for specific cubes without Iris trampling all over their careful implementation.
Perhaps there is an opportunity to investigate this in the future, but I am wary of making a jump to this approach when all I am trying to do is preserve the limited and useful behaviour from v1.13.

I am happy to engage with detailed discussions about how to implement this API feature for master and Iris 2.0.
The strong language (in bold typeface) about API decisions and data loading is not helping this, in my view.
@dkillick are you prepared to engage in a discussion with me about how to implement this API feature as per our API maintenance statements, and to focus review efforts on the fine details of coding and appropriate levels of testing, or are you firmly attached to your positions on API, loading data, implementing shared lazy data and so on?

@DPeterK
Member

DPeterK commented Jul 25, 2017

I don't think this level of response is within your remit

Of course it is, and it's in your remit, and it's in the remit of all Iris devs. If something isn't working we all have the remit to fix it, otherwise Iris will never improve. The change management whitepaper is (a) a part of Iris and changeable, and (b) not the final word on how change will occur – it works for us, not we for it. If it is not working for us then it is within our remit to fix it, which has already been identified as being necessary. This v2 release is the first real test of the change management whitepaper, and it has been shown to just be too strict in its requirements, of which this proposal is a good example.

Our change management is based on SemVer. This states that "MAJOR version[s are] when you make incompatible API changes" (maintaining original emphasis), which must happen in this case.

@DPeterK
Member

DPeterK commented Jul 25, 2017

in particular giving experienced developers the space to optimise their code for specific cubes without Iris trampling all over their careful implementation

limited and useful behaviour

Problem: you have proposed this unprecedented behaviour change to help a limited subset of Iris users who are also experienced developers. This excludes the majority of Iris users, who are frequently inexperienced developers. They may not expect the behaviour that you propose to introduce, and there is no evidence that they need it either. The proposed solution for the experienced developers is too prominent within Iris (as a cube-level API), so it introduces a risk that inexperienced developers will encounter surprising behaviour (a cube's data unexpectedly being loaded) through mis-application of this API.

@DPeterK
Member

DPeterK commented Jul 25, 2017

are you firmly attached to your positions

There is no movement: there is a general team intention that this PR will not go into Iris in its current implementation.

are you prepared to engage in a discussion

There will be a team discussion to determine how to retain the behaviour, which while limited in requirement is undoubtedly useful, and will determine the most appropriate implementation of the required behaviour.

@pelson
Member

pelson commented Oct 26, 2017

The shared data feature is perhaps the single most divisive/controversial iris feature in recent memory. I'm putting together the iris 2.0.0 release candidate, and whilst it is clear that we all seem to support the idea of shared data, we certainly don't agree on the implementation.

There are some serious and legitimate API concerns with the changes here (and released in iris 1.13) that make me reluctant to accept this change as part of the v2.0.0 release preparation. Fortunately, as with any new feature, we do have the capacity to introduce this change slowly through keyword arguments / the FUTURE mechanism during any minor release.

For that reason, I'm going to keep the PR open with a Status: Decision needed label, but move it out of the v2.0.0 milestone. In practice, we will therefore need to document the breaking change that not re-adding this feature introduces. My apologies to those waiting on this feature - we all want to see the performance boosts that not always copying the data brings, but it is clear that we don't yet agree on the maintenance/API costs we pay for gaining them.

@pelson pelson removed this from the v2.0.0 milestone Oct 26, 2017
@marqh
Member Author

marqh commented Jun 1, 2018

I will not be putting any further work into this pull request.

@marqh marqh closed this Jun 1, 2018
@pp-mo
Member

pp-mo commented Sep 13, 2018

Ultimately, I still think this is needed.

IMHO we should aim to be "like numpy".
In this context, that means in the worst cases (e.g. indexing) :

  • "The result is usually a view, but in some cases a copy."
  • "It's too complicated to explain exactly when."
  • "It might change in future releases."
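The numpy behaviour described above can be seen directly: basic slicing returns a view onto the original array's memory, while advanced (fancy) indexing returns a fresh copy, and writes through a view mutate the original.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

view = a[0:1]    # basic slicing: a view sharing a's memory
fancy = a[[0]]   # advanced indexing: always a fresh copy

print(np.shares_memory(a, view))   # the slice shares memory with a
print(np.shares_memory(a, fancy))  # the fancy-indexed result does not

view[0, 0] = 99  # writing through the view mutates the original array
print(a[0, 0])
```

This is the "usually a view, sometimes a copy" trade-off that the share_data proposal was trying to expose for cube indexing.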

@pp-mo pp-mo reopened this Sep 13, 2018

@DPeterK
Member

DPeterK commented Sep 17, 2018

@pp-mo is quite right that this topic needs to be on our radar, but re-opening a rejected PR is not the best way to ensure this. I've raised #3172 to make sure the fact there's a problem is recorded so that it can be fully addressed.

@DPeterK DPeterK closed this Sep 17, 2018
@pp-mo
Member

pp-mo commented Sep 17, 2018

I've raised #3172 to make sure the fact there's a problem is recorded

Thanks, and agreed that is probably better...
