Sharedata #2691
Changes from 3 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -521,14 +521,24 @@ def copy(self, points=None, bounds=None): | |
raise ValueError('If bounds are specified, points must also be ' | ||
'specified') | ||
|
||
new_coord = copy.deepcopy(self) | ||
if points is not None: | ||
# We do not perform a deepcopy when we supply new points so as to | ||
# not unncessarily copy the old points. | ||
|
||
# Create a temp-coord to manage deep copying of a small array. | ||
temp_coord = copy.copy(self) | ||
temp_coord.bounds = None | ||
# note: DataManager cannot be None or DataManager(None) for | ||
# a deepcopy operation. | ||
temp_coord._points_dm = DataManager(np.array((1,))) | ||
new_coord = copy.deepcopy(temp_coord) | ||
del(temp_coord) | ||
Review comment: This should not be happening: using |
||
new_coord._points_dm = None | ||
new_coord.points = points | ||
# Regardless of whether bounds are provided as an argument, new | ||
# points will result in new bounds, discarding those copied from | ||
# self. | ||
# new points will result in new bounds. | ||
new_coord.bounds = bounds | ||
else: | ||
new_coord = copy.deepcopy(self) | ||
Review comment: These changes to |
||
|
||
return new_coord | ||
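
The copy-with-new-points path in the hunk above can be pictured with a minimal, library-free sketch. The `Coord` class and its `copy_with_points` method here are hypothetical stand-ins, not the Iris coordinate API; the sketch only illustrates the idea of swapping a large payload for a tiny placeholder before deep-copying, so that the old payload is never duplicated.

```python
import copy


class Coord:
    """Hypothetical stand-in: a large payload plus some metadata."""

    def __init__(self, points, attributes):
        self.points = points
        self.attributes = attributes

    def copy_with_points(self, new_points):
        # Shallow copy first: no payload is duplicated at this stage.
        temp = copy.copy(self)
        # Swap the large payload for a tiny placeholder so that the
        # deep copy below only has to walk the metadata.
        temp.points = [0]
        new = copy.deepcopy(temp)
        new.points = new_points
        return new


original_points = list(range(1000))
coord = Coord(original_points, {"units": "m"})
new_coord = coord.copy_with_points([1, 2, 3])
```

In this sketch the metadata of `new_coord` is an independent deep copy, while the original object still holds the very same `points` list it started with.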
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -781,6 +781,27 @@ def __init__(self, data, standard_name=None, long_name=None, | |
for cell_measure, dims in cell_measures_and_dims: | ||
self.add_cell_measure(cell_measure, dims) | ||
|
||
# When True indexing may result in a view onto the original data array, | ||
# to avoid unnecessary copying. | ||
self._share_data = False | ||
Review comment: Why has this been placed at the end of the constructor method? There are specific locations in the flow of the constructor code for "properties" such as this one, so why not follow the pattern? |
||
|
||
@property | ||
def share_data(self): | ||
Review comment: What? Why is this even a cube method? We need to be reducing the size of the cube API, not adding random stuff to it. This should in no way be a cube method. If we need to change the behaviour of Iris at runtime, we have a module for that:
Review comment: If we were to even consider sharing data, I would have thought that using
Is there a direct need to have this as a property to fit the use case? |
||
""" | ||
Share cube data when slicing/indexing cube if True. | ||
Setting this flag to True will realise the data payload, | ||
if it is lazy, as lazy data cannot currently be shared across cubes. | ||
""" | ||
return self._share_data | ||
|
||
@share_data.setter | ||
def share_data(self, value): | ||
# If value is True: realise the data (if is hasn't already been) as | ||
# sharing lazy data is not possible. | ||
Review comment: I do not believe this comment.
Review comment: OK; most things are possible, at some level. This is not possible within the current implementation. Would you just prefer
Review comment: I'd prefer lazy data to be shared, thanks. |
||
if value and self.has_lazy_data(): | ||
_ = self.data | ||
Review comment: This line makes no sense:
Review comment: It states in the docstring that this will happen. I'm not minded to add a runtime warning saying
Review comment: This was covered in detail in the long efforts to get the functionality merged onto master in 1.12 (failed) and then 1.13 (succeeded). Within current implementations, sharing of lazy data is not helpful or useful; it is only functional on realised data. This could change in the future, but not in this minimal implementation. Thus, if one wants to share data across cubes, sub-cube slices or similar, the data must be realised first.
This is a pattern that has been used elsewhere to avoid any risk that the data will be streamed to standard out; I use it quite often myself. The code in this form is a way of saying: I am realising the data, but I'm not using it here.
Review comment: That doesn't defend the unprecedented behaviour. This method should not load data; only
Review comment: I think it could be helpful and useful; I'm sure there are use-cases beyond this very tight use-case that could make use of such behaviour.
Review comment: Can I have an example of that from existing Iris code, please? Of course, it would be much the better if this method were not loading data at all. |
||
self._share_data = bool(value) | ||
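
The disputed pattern in the setter above, where a property is read only for its side effect, can be illustrated without Iris. This is a rough sketch under stated assumptions: `LazyHolder`, its `loader` argument and the `load_payload` helper are hypothetical names, and the "lazy" payload is simply a callable that has not yet been invoked.

```python
class LazyHolder:
    """Hypothetical container whose payload is computed on first access."""

    def __init__(self, loader):
        self._loader = loader  # callable that produces the payload
        self._data = None
        self._share_data = False

    def has_lazy_data(self):
        return self._data is None

    @property
    def data(self):
        # First access realises the payload; later accesses reuse it.
        if self._data is None:
            self._data = self._loader()
        return self._data

    @property
    def share_data(self):
        return self._share_data

    @share_data.setter
    def share_data(self, value):
        # Reading the property realises the payload as a side effect;
        # the underscore signals that the value itself is not used here.
        if value and self.has_lazy_data():
            _ = self.data
        self._share_data = bool(value)


calls = []


def load_payload():
    # Record each load so the side effect of the setter is observable.
    calls.append("loaded")
    return [0, 1, 2]


holder = LazyHolder(load_payload)
was_lazy = holder.has_lazy_data()
holder.share_data = True
```

After the assignment the payload has been loaded exactly once, even though the caller only set a flag. That hidden load is the behaviour the review thread is arguing about.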
|
||
@property | ||
def metadata(self): | ||
""" | ||
|
@@ -2173,8 +2194,10 @@ def new_cell_measure_dims(cm_): | |
dimension_mapping, data = iris.util._slice_data_with_keys( | ||
cube_data, keys) | ||
|
||
# We don't want a view of the data, so take a copy of it. | ||
data = deepcopy(data) | ||
# We don't want a view of the data, so take a copy of it, unless | ||
# self.share_data is True. | ||
if not self.share_data: | ||
data = deepcopy(data) | ||
Review comment: I'm concerned that for such a large change to cube behaviour the testing is very light. We have no idea at all what will happen in general use cases; at the very least this change requires some integration testing, but some more robust unit testing as well would also be good. |
||
|
||
# We can turn a masked array into a normal array if it's full. | ||
if ma.isMaskedArray(data): | ||
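
What skipping the `deepcopy` means for callers can be shown with plain NumPy arrays, leaving cubes aside. The variable names below are illustrative only: a basic slice is a view onto the parent's buffer, so writes through it are visible in the parent, whereas a deep copy is independent.

```python
from copy import deepcopy

import numpy as np

parent = np.arange(6).reshape(2, 3)

view = parent[1:]             # basic slicing returns a view
independent = deepcopy(view)  # deepcopy allocates a fresh buffer

view[0, 0] = 99               # written through to the parent
independent[0, 1] = -1        # leaves the parent untouched

shares_view = np.shares_memory(parent, view)
shares_copy = np.shares_memory(parent, independent)
```

With sharing enabled, a sliced cube would presumably behave like `view` here, so in-place edits to the slice would also alter the original data. This is one of the general-use consequences that broader tests would need to cover.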
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -25,6 +25,7 @@ | |
|
||
from itertools import permutations | ||
|
||
import dask.array as da | ||
import numpy as np | ||
import numpy.ma as ma | ||
|
||
|
@@ -66,6 +67,10 @@ def test_matrix(self): | |
self.assertEqual(type(cube.data), np.ndarray) | ||
self.assertArrayEqual(cube.data, data) | ||
|
||
def test_default_share_data(self): | ||
cube = Cube(np.arange(1)) | ||
self.assertFalse(cube.share_data) | ||
Review comment: It was proposed by a contributor and reviewed and merged by me; it was included in the 1.13 release. I do not think it is appropriate to remove the part of the API in the next version of Iris. There has been no deprecation warning or any thought on alternative API options to retain the required functionality.
Review comment: There's one just above in my review comments that can be implemented.
Review comment: @marqh then it seems that this functionality was not properly thought through before being added to v1.13, despite numerous requests that it was. This puts you in a difficult situation because this PR is not getting merged in its current form. This behaviour will not be retained in its current form, so your proposed API will have to change, as per the recommendations elsewhere in the review of this PR. |
||
|
||
|
||
class Test_extract(tests.IrisTest): | ||
def test_scalar_cube_exists(self): | ||
|
@@ -1877,6 +1882,50 @@ def test_remove_cell_measure(self): | |
[[self.b_cell_measure, (0, 1)]]) | ||
|
||
|
||
class Test_share_data(tests.IrisTest): | ||
def setter_lazy_data(self): | ||
Review comment: I don't believe this will be run: test methods need to be prefixed with
Review comment: This is a mistake, these should be |
||
cube = Cube(biggus.NumpyArrayAdapter(np.arange(6).reshape(2, 3))) | ||
Review comment: Nope. Biggus is no longer a dependency of Iris.
Review comment: This is also a mistake, which I missed because of the earlier mistake of the test not running. |
||
cube.share_data = True | ||
self.assertFalse(cube.has_lazy_data()) | ||
self.assertTrue(cube._share_data) | ||
|
||
def setter_realised_data(self): | ||
Review comment: I don't believe this will be run: test methods need to be prefixed with |
||
cube = Cube(np.arange(6).reshape(2, 3)) | ||
cube.share_data = True | ||
self.assertFalse(cube.has_lazy_data()) | ||
self.assertTrue(cube._share_data) | ||
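
The reviewer's point about method naming can be checked with the standard library alone. The sketch below assumes plain `unittest.TestCase` rather than `tests.IrisTest`, and the method bodies are placeholders. It asks the default `unittest.TestLoader` which methods it would collect: only names starting with `test` qualify.

```python
import unittest


class Test_share_data(unittest.TestCase):
    def setter_lazy_data(self):
        # Never collected: the name lacks the "test" prefix, so this
        # failing assertion would go unnoticed.
        self.fail("not reached")

    def test_setter_realised_data(self):
        self.assertTrue(True)


loader = unittest.TestLoader()
collected = list(loader.getTestCaseNames(Test_share_data))
```

Because `setter_lazy_data` is silently skipped, any error inside it stays hidden until the method is renamed.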
|
||
|
||
class Test___getitem__no_share_data(tests.IrisTest): | ||
def test_lazy_array(self): | ||
cube = Cube(da.from_array(np.arange(6).reshape(2, 3), chunks=6)) | ||
cube2 = cube[1:] | ||
self.assertTrue(cube2.has_lazy_data()) | ||
cube.data | ||
self.assertTrue(cube2.has_lazy_data()) | ||
|
||
def test_ndarray(self): | ||
cube = Cube(np.arange(6).reshape(2, 3)) | ||
cube2 = cube[1:] | ||
self.assertIsNot(cube.data.base, cube2.data.base) | ||
|
||
|
||
class Test___getitem__share_data(tests.IrisTest): | ||
def test_lazy_array(self): | ||
cube = Cube(da.from_array(np.arange(6).reshape(2, 3), chunks=6)) | ||
cube.share_data = True | ||
cube2 = cube[1:] | ||
self.assertFalse(cube.has_lazy_data()) | ||
self.assertFalse(cube2.has_lazy_data()) | ||
self.assertIs(cube.data.base, cube2.data.base) | ||
|
||
def test_ndarray(self): | ||
cube = Cube(np.arange(6).reshape(2, 3)) | ||
cube.share_data = True | ||
cube2 = cube[1:] | ||
self.assertIs(cube.data.base, cube2.data.base) | ||
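
The assertions above compare `.base` attributes. That works for `np.arange(6).reshape(2, 3)` because the reshaped array is itself a view, so it has a non-`None` base. As a hedged illustration with plain NumPy (not Iris), the same identity checks can mislead for an array that owns its buffer, whereas `np.shares_memory` inspects the buffers directly.

```python
import numpy as np

owner = np.arange(6)       # owns its buffer, so owner.base is None
shared = owner[1:]         # a view: shared.base is owner
copied = owner[1:].copy()  # owns a new buffer, so copied.base is None

# Identity checks on .base give misleading answers here.
base_says_shared = owner.base is shared.base      # memory is in fact shared
base_says_copied = owner.base is not copied.base  # memory is in fact separate

# np.shares_memory compares the underlying buffers instead.
really_shared = np.shares_memory(owner, shared)
really_copied = not np.shares_memory(owner, copied)
```

A more robust unit test might therefore assert on `np.shares_memory` (or on write-through behaviour) rather than on `.base` identity.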
|
||
|
||
class Test__getitem_CellMeasure(tests.IrisTest): | ||
def setUp(self): | ||
cube = Cube(np.arange(6).reshape(2, 3)) | ||
|
Review comment: Just do self.copy()?