
Sharedata #2691

Closed
wants to merge 4 commits into from

Conversation

marqh
Member

@marqh marqh commented Jul 19, 2017

Regarding #2681, I think that 'share_data' is part of the public API in 1.13 and should be preserved.

This PR reimplements #2549 with respect to master, a fairly simple exercise.

I think this should be included in Iris 2.0.

@marqh marqh added this to the v2.0 milestone Jul 19, 2017
Member

@DPeterK DPeterK left a comment

While I understand the requirement for this sort of behaviour in Iris, the fact remains that there are some very serious problems with the implementation presented here. The approach proposed simply does not adhere to how Iris makes this sort of runtime change, which needs to be done using a new options class added to iris.config. Some of the logic proposed here is highly doubtful, and a lot more testing is needed to ensure this will behave correctly in more than just the most basic usages.
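As an illustration of the kind of runtime-options class the reviewer describes, a minimal sketch might look like the following. All names here are hypothetical and illustrative only, not the actual iris.config API; see #2467 for the real pattern being referenced.

```python
import contextlib

# Hypothetical sketch of a runtime-options class of the sort suggested
# for iris.config; the class and attribute names are illustrative only.
class ShareDataOptions:
    """Holds a single boolean option controlling data-sharing behaviour."""

    def __init__(self, share_data=False):
        self._share_data = False
        self.share_data = share_data  # route through the validating setter

    @property
    def share_data(self):
        return self._share_data

    @share_data.setter
    def share_data(self, value):
        if not isinstance(value, bool):
            raise ValueError("share_data must be True or False")
        self._share_data = value

    @contextlib.contextmanager
    def context(self, **kwargs):
        # Temporarily override options, restoring the old value on exit.
        saved = self.share_data
        try:
            for name, value in kwargs.items():
                setattr(self, name, value)
            yield self
        finally:
            self.share_data = saved


options = ShareDataOptions()
with options.context(share_data=True):
    pass  # code in this block sees options.share_data == True
```

The attraction of this pattern is that the behaviour switch lives in one validated, library-level object with scoped overrides, rather than as an attribute on every cube.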

@@ -1877,6 +1882,50 @@ def test_remove_cell_measure(self):
[[self.b_cell_measure, (0, 1)]])


class Test_share_data(tests.IrisTest):
def setter_lazy_data(self):
cube = Cube(biggus.NumpyArrayAdapter(np.arange(6).reshape(2, 3)))
Member

Nope. Biggus is no longer a dependency of Iris.

Member Author

This is also a mistake, which I missed because of the earlier mistake of the test not running.

# not unnecessarily copy the old points.

# Create a temp-coord to manage deep copying of a small array.
temp_coord = copy.copy(self)
Member

Just do self.copy()?

# a deepcopy operation.
temp_coord._points_dm = DataManager(np.array((1,)))
new_coord = copy.deepcopy(temp_coord)
del(temp_coord)
Member

This should not be happening: using del in module code is a good indication that something is badly wrong in the code's logic.

@@ -781,6 +781,27 @@ def __init__(self, data, standard_name=None, long_name=None,
for cell_measure, dims in cell_measures_and_dims:
self.add_cell_measure(cell_measure, dims)

# When True indexing may result in a view onto the original data array,
# to avoid unnecessary copying.
self._share_data = False
Member

Why has this been placed at the end of the constructor method? There are specific locations in the flow of the constructor code for "properties" such as this one, so why not follow the pattern?

self._share_data = False

@property
def share_data(self):
Member

What? Why is this even a cube method? We need to be reducing the size of the cube API, not adding random stuff to it.

This should in no way be a cube method. If we need to change the behaviour of Iris at runtime, we have a module for that: iris.config. In there we can add option classes for controlling precisely this kind of behaviour. See #2467 for a good example of doing just that.

Member

If we were to even consider sharing data, I would have thought that using iris.FUTURE might also have been an appropriate way to go. It's less intrusive.

Is there a direct need to have this as a property to fit the use case?

@@ -66,6 +67,10 @@ def test_matrix(self):
self.assertEqual(type(cube.data), np.ndarray)
self.assertArrayEqual(cube.data, data)

def test_default_share_data(self):
cube = Cube(np.arange(1))
self.assertFalse(cube.share_data)
Member

share_data should not be a property of a cube.

Member Author

cube.share_data is part of the Iris public API
https://github.com/SciTools/iris/blob/v1.13.0/lib/iris/cube.py#L780

it was proposed by a contributor and reviewed and merged by me; it was included in the 1.13 release.
This followed a long and detailed set of feature requirements, which was drastically pared down to try to minimise the impact for version 2.

I do not think it is appropriate to remove this part of the API in the next version of Iris. There has been no deprecation warning, nor any consideration of alternative API options to retain the required functionality.

Member

thought on alternative API options

There's one just above in my review comments that can be implemented.

Member

it was proposed by a contributor and reviewed and merged by me

@marqh then it seems that this functionality was not properly thought through before being added to v1.13, despite numerous requests that it was. This puts you in a difficult situation because this PR is not getting merged in its current form. This behaviour will not be retained in its current form, so your proposed API will have to change, as per the recommendations elsewhere in the review of this PR.

@@ -1877,6 +1882,50 @@ def test_remove_cell_measure(self):
[[self.b_cell_measure, (0, 1)]])


class Test_share_data(tests.IrisTest):
def setter_lazy_data(self):
Member

I don't believe this will be run: test methods need to be prefixed with test_ for unittest to pick them up.
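The discovery rule the reviewer refers to is easy to demonstrate: unittest's default loader only collects methods whose names start with the test prefix, so a method named setter_... is silently skipped.

```python
import unittest

class Example(unittest.TestCase):
    def setter_lazy_data(self):
        # Never collected: the name does not start with 'test'.
        raise AssertionError("this method is silently skipped")

    def test_lazy_data(self):
        # Collected and run by unittest's default loader.
        self.assertTrue(True)

# The default TestLoader matches the 'test' method-name prefix only.
names = unittest.TestLoader().getTestCaseNames(Example)
print(names)  # only the test_-prefixed method appears
```

This is exactly why a mis-named test can pass review while never actually running.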

Member Author

This is a mistake; these should be def test...

self.assertFalse(cube.has_lazy_data())
self.assertTrue(cube._share_data)

def setter_realised_data(self):
Member

I don't believe this will be run: test methods need to be prefixed with test_ for unittest to pick them up.

new_coord.bounds = bounds
else:
new_coord = copy.deepcopy(self)
Member

These changes to coords.py are not tested.

# We don't want a view of the data, so take a copy of it, unless
# self.share_data is True.
if not self.share_data:
data = deepcopy(data)
Member

I'm concerned that for such a large change to cube behaviour the testing is very light. We have no idea at all what will happen in general use cases – at the very least this change requires some integration testing, but some more robust unit testing as well would also be good.

@DPeterK
Member

DPeterK commented Jul 24, 2017

@marqh there are still many outstanding issues with this PR that need to be addressed.

@marqh
Member Author

marqh commented Jul 24, 2017

There are many levels of comment in this; it is difficult to follow a coherent train of thought.

Rather than post on all, I will pick out a small part, which has implications for all the rest.

then it seems that this functionality was not properly thought through before being added to v1.13, despite numerous requests that it was.

I do not think this is a fair reflection on attempts to carefully implement this functionality.

This puts you in a difficult situation because this PR is not getting merged in its current form. This behaviour will not be retained in its current form, so your proposed API will have to change, as per the recommendations elsewhere in the review of this PR.

I don't think this level of response is within your remit. According to http://scitools.org.uk/iris/docs/latest/developers_guide/deprecations.html
If you need to make a backwards-incompatible change to a public API [1] that has been included in a release (e.g. deleting a method), then you must first deprecate the old behaviour in at least one release, before removing/updating it in the next major release.

Whether you agree or disagree with the implementation, this functionality and public API has been included in a release, and it is in active use within our user community.

As a developer community, we can look to migrate this functionality to a different API over time, but we have signed up to not simply removing API functionality without deprecation warnings. We have also signed up to
Where possible, your deprecation warning should include advice on how to avoid using the deprecated API.
So a replacement methodology should be provided as part of deprecation.

as @lbdreyer points out in #2681

share_data is a property of a public object (iris.cube.Cube) and therefore surely needs to be deprecated - it can't just vanish.

There are views shared on that issue about this change and alterations to change management processes, but applying these retrospectively to this case is a fairly dubious tactic, I feel.

The approach proposed simply does not adhere to how Iris makes these sort of runtime changes, which needs to be done using a new options class added to iris.config

It is far from clear to me that this is a useful and suitable approach for this particular case: this is about the behaviour of an individual cube, and in particular about giving experienced developers the space to optimise their code for specific cubes without Iris trampling all over their careful implementation.
Perhaps there is an opportunity to investigate this in the future, but I am wary of making a jump to this approach when all I am trying to do is preserve the limited and useful behaviour from v1.13.

I am happy to engage with detailed discussions about how to implement this API feature for master and Iris 2.0.
The strong language (in bold typeface) about API decisions and data loading is not helping this, in my view.
@dkillick are you prepared to engage in a discussion with me about how to implement this API feature as per our API maintenance statements, and to focus review efforts on the fine details of coding and appropriate levels of testing, or are you firmly attached to your positions on API, loading data, implementing shared lazy data and so on?

@DPeterK
Member

DPeterK commented Jul 25, 2017

I don't think this level of response is within your remit

Of course it is, and it's in your remit, and it's in the remit of all Iris devs. If something isn't working we all have the remit to fix it, otherwise Iris will never improve. The change management whitepaper is (a) a part of Iris and changeable, and (b) not the final word on how change will occur – it works for us, not we for it. If it is not working for us then it is within our remit to fix it, which has already been identified as being necessary. This v2 release is the first real test of the change management whitepaper, and it has been shown to just be too strict in its requirements, of which this proposal is a good example.

Our change management is based on SemVer. This states that "MAJOR version[s are] when you make incompatible API changes" (maintaining original emphasis), which must happen in this case.

@DPeterK
Member

DPeterK commented Jul 25, 2017

in particular giving experienced developers the space to optimise their code for specific cubes without Iris trampling all over their careful implementation

limited and useful behaviour

Problem: you have proposed this unprecedented behaviour change to help a limited subset of Iris users who are also experienced developers. This excludes the majority of Iris users, who are frequently inexperienced developers. They may not expect the behaviour that you propose to introduce, and there is no evidence that they need it either. The proposed solution for the experienced developers is too prominent within Iris (as a cube-level API), so it introduces a risk that inexperienced developers will encounter surprising behaviour (a cube's data unexpectedly being loaded) through mis-application of this API.

@DPeterK
Member

DPeterK commented Jul 25, 2017

are you firmly attached to your positions

There is no movement: there is a general team intention that this PR will not go into Iris in its current implementation.

are you prepared to engage in a discussion

There will be a team discussion to determine how to retain the behaviour, which while limited in requirement is undoubtedly useful, and will determine the most appropriate implementation of the required behaviour.

@pelson
Member

pelson commented Oct 26, 2017

The shared data feature is perhaps the single most divisive/controversial iris feature in recent memory. I'm putting together the iris 2.0.0 release candidate, and whilst it is clear that we all seem to support the idea of shared data, we certainly don't agree on the implementation.

There are some serious and legitimate API concerns with the changes here (and released in iris 1.13) that make me reluctant to accept this change as part of the v2.0.0 release preparation. Fortunately, as with any new feature, we do have the capacity to introduce this change slowly through keyword arguments / the FUTURE mechanism during any minor release.

For that reason, I'm going to keep the PR open with a Status: Decision needed label, but move it out of the v2.0.0 milestone. In practice, we will therefore need to document the breaking change that not re-adding this feature introduces. My apologies to those waiting on this feature - we all want to see the performance boosts that not always copying the data brings, but it is clear that we don't yet agree on the maintenance/API costs we pay for gaining them.

@pelson pelson removed this from the v2.0.0 milestone Oct 26, 2017
@marqh
Member Author

marqh commented Jun 1, 2018

I will not be putting any further work into this pull request.

@marqh marqh closed this Jun 1, 2018
@pp-mo
Member

pp-mo commented Sep 13, 2018

Ultimately, I still think this is needed.

IMHO we should aim to be "like numpy".
In this context, that means in the worst cases (e.g. indexing) :

  • "The result is usually a view, but in some cases a copy."
  • "It's too complicated to explain exactly when."
  • "It might change in future releases."
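The numpy behaviour described above can be seen directly: basic slicing returns a view onto the original array's memory, while advanced (fancy) indexing returns a fresh copy, and writes through a view mutate the original.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

view = a[0:1]    # basic slicing: a view sharing a's memory
fancy = a[[0]]   # advanced indexing: always a fresh copy

print(np.shares_memory(a, view))   # the slice shares memory with a
print(np.shares_memory(a, fancy))  # the fancy-indexed result does not

view[0, 0] = 99  # writing through the view mutates the original array
print(a[0, 0])
```

This is the "usually a view, sometimes a copy" trade-off that the share_data proposal was trying to expose for cube indexing.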

@pp-mo pp-mo reopened this Sep 13, 2018

@DPeterK
Member

DPeterK commented Sep 17, 2018

@pp-mo is quite right that this topic needs to be on our radar, but re-opening a rejected PR is not the best way to ensure this. I've raised #3172 to make sure the fact there's a problem is recorded so that it can be fully addressed.

@DPeterK DPeterK closed this Sep 17, 2018
@pp-mo
Member

pp-mo commented Sep 17, 2018

I've raised #3172 to make sure the fact there's a problem is recorded

Thanks, and agreed that is probably better...
