
Data first #2392

Merged: 41 commits merged into SciTools:dask on Feb 23, 2017

Conversation

@marqh (Member) commented Feb 20, 2017

for discussion regarding how to implement this interface

@QuLogic added this to the dask milestone on Feb 20, 2017
data = iris._lazy_data.as_lazy_data(proxy)
cube = iris.cube.Cube(data)
data = da.from_array(proxy, chunks=100)
cube = iris.cube.Cube(data, fill_value=fill_value)
@marqh (Member, Author):

should this pass the data type in as well?

cube = iris.cube.Cube(data,
attributes=metadata.attributes,
cell_methods=metadata.cell_methods,
dim_coords_and_dims=metadata.dim_coords_and_dims,
aux_coords_and_dims=metadata.aux_coords_and_dims)
aux_coords_and_dims=metadata.aux_coords_and_dims,
fill_value=field.bmdi)
@marqh (Member, Author):

should this also set the data type?
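
Both dtype questions above amount to forwarding a dtype into the Cube constructor alongside the fill_value that this branch adds. A minimal sketch of what that might look like, assuming a hypothetical dtype keyword (not confirmed by this diff) and a stand-in proxy array:

import dask.array as da
import numpy as np
import iris.cube

proxy = np.arange(12.0).reshape(3, 4)   # stand-in for a file data proxy
data = da.from_array(proxy, chunks=100)
# 'fill_value' is added by this branch; 'dtype' is the hypothetical extra
# keyword these review comments are asking about.
cube = iris.cube.Cube(data, fill_value=-999.0, dtype=proxy.dtype)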

@@ -901,15 +902,15 @@ def _make_cube(field, converter):
metadata = converter(field)

@marqh (Member, Author):

please re-implement this, don't try/except

lib/iris/cube.py Outdated
@@ -1631,9 +1636,10 @@ def lazy_data(self, array=None):
A lazy array, representing the Cube data array.

"""
result = None
@marqh (Member, Author):

don't allow this; if there is no _numpy_array and no _dask_array, raise an exception
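
A compact, self-contained sketch of the guard being requested, using the _numpy_array / _dask_array attribute names from this branch (illustrative only, not the branch's actual lazy_data implementation):

class _DataGuardSketch:
    """Illustrative stand-in for the Cube data-holding behaviour."""

    def __init__(self, numpy_array=None, dask_array=None):
        self._numpy_array = numpy_array
        self._dask_array = dask_array

    def lazy_data(self):
        # Requested guard: a cube must always hold either realised (numpy)
        # or lazy (dask) data, so neither being present is an error.
        if self._numpy_array is None and self._dask_array is None:
            raise ValueError('Cube has neither realised nor lazy data.')
        # How a realised array would be re-wrapped lazily is a
        # branch-specific detail, elided here.
        if self._dask_array is not None:
            return self._dask_array
        return self._numpy_array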

@bjlittle self-assigned this on Feb 21, 2017
self.create_lbpack(lbpack),
None,
mask.shape, np.dtype('>f4'),
-999, mask=mask)
return np.ma.masked_array(data, np.isnan(data), fill_value=-999)
Member:

np.ma -> ma
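
This suggestion, repeated in the comments that follow, refers to using a module-level numpy.ma alias in these test modules. A small runnable sketch of the intended form:

import numpy.ma as ma

# With the alias, masked-array helpers read as ma.* rather than np.ma.*.
masked_array = ma.masked_array([[1.0, 2.0], [3.0, 4.0]],
                               mask=[[0, 1], [0, 0]])
print(isinstance(masked_array, ma.MaskedArray))        # True
print(isinstance(masked_array.data, ma.MaskedArray))   # False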

@@ -34,7 +34,7 @@ def test_masked(self):
masked_array = np.ma.masked_array([[1.0, 2.0], [3.0, 4.0]],
Member:

np.ma -> ma

@@ -34,7 +34,7 @@ def test_masked(self):
masked_array = np.ma.masked_array([[1.0, 2.0], [3.0, 4.0]],
mask=[[0, 1], [0, 0]])

result = array_masked_to_nans(masked_array)
result = array_masked_to_nans(masked_array).data

self.assertIsInstance(result, np.ndarray)
self.assertFalse(isinstance(result, np.ma.MaskedArray))
Member:

np.ma -> ma

@@ -48,7 +48,7 @@ def test_masked(self):
def test_empty_mask(self):
masked_array = np.ma.masked_array([1.0, 2.0], mask=[0, 0])
Member:

np.ma -> ma

@@ -48,7 +48,7 @@ def test_masked(self):
def test_empty_mask(self):
masked_array = np.ma.masked_array([1.0, 2.0], mask=[0, 0])

result = array_masked_to_nans(masked_array)
result = array_masked_to_nans(masked_array).data

self.assertIsInstance(result, np.ndarray)
self.assertFalse(isinstance(result, np.ma.MaskedArray))
Member:

np.ma -> ma

import iris.util
from iris._lazy_data import is_lazy_data, array_masked_to_nans
Member:

@marqh Minor point, but from iris._lazy ... sorts before the other iris imports alphabetically and should be the first iris import.

lib/iris/cube.py Outdated
import iris._merge
import iris.exceptions
import iris.util
from iris._lazy_data import is_lazy_data, array_masked_to_nans
Member:

@marqh the import order is wrong
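
A sketch of one ordering consistent with these two comments, with the iris._lazy_data import placed ahead of the other iris imports:

from iris._lazy_data import is_lazy_data, array_masked_to_nans
import iris._merge
import iris.exceptions
import iris.util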


def has_lazy_data(self):
return is_lazy_data(self._my_data)
return True if self._numpy_array is None else False
@bjlittle (Member) commented Feb 22, 2017:

@marqh I'd prefer

return self._numpy_array is None

Also, isn't it the case that both cube._numpy_array and cube._dask_array could be populated? If so, are we saying that if that's the case, the cube isn't lazy?

Is there an explicit statement to define whether it's valid for both of these attributes to be populated? Surely, it's totally invalid for both not to be populated ... do we have checks and balances for that to maintain that integrity within the cube?

@marqh (Member, Author):

Also, isn't it the case that both cube._numpy_array and cube._dask_array could be populated?

Yes, at present this is the case.

If so, are we saying that if that's the case, the cube isn't lazy?

Yes, it is not lazy: if cube._numpy_array exists it is always used in preference.

Is there an explicit statement to define whether it's valid for both of these attributes to be populated?

No, there is no explicit statement; it is an open question. We should answer it; the question here is whether we answer it on this PR or raise an issue and answer it there. The utility of having both is that cube._numpy_array may be set to None, reverting the cube back to its initially loaded state, with a file proxy. I don't know how valuable this may be.

Surely, it's totally invalid for both not to be populated ...

Yes, that is invalid and exceptions are thrown.

do we have checks and balances for that to maintain that integrity within the cube?

There are some; are there enough?
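
A self-contained sketch of the state rules described in this exchange (a hypothetical helper, not code from the branch): realised data wins when both are present, laziness means no realised copy is held, and holding neither is invalid:

def resolve_cube_data(numpy_array, dask_array):
    """Illustrate the data-resolution rules discussed above (hypothetical)."""
    if numpy_array is None and dask_array is None:
        # Invalid state: exceptions are thrown for this case.
        raise ValueError('Cube has neither realised nor lazy data.')
    if numpy_array is not None:
        # Realised data is always used in preference, so a cube holding
        # both is not considered lazy.
        return numpy_array
    return dask_array


def has_lazy_data(numpy_array):
    # Matches the preferred form from the review: lazy means no realised copy.
    return numpy_array is None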

@marqh (Member, Author) commented Feb 23, 2017

@bjlittle

OK, further review actions pushed. Hopefully I've maintained test success; we shall see.

Please squash and merge at will once Travis is content.

@bjlittle (Member):
@marqh Agreed.

If the tests pass, then let's bank this PR. It's a feature branch and we need to move forwards 👍

@bjlittle (Member) commented Feb 23, 2017

Ping @marqh you still have a couple of tests referencing the now nuked _data_cache in PP land ...

@marqh (Member, Author) commented Feb 23, 2017

Ping @marqh you still have a couple of tests referencing the now nuked _data_cache ...

doh
trying again

@marqh (Member, Author) commented Feb 23, 2017

@bjlittle
it has gone green
...

@bjlittle merged commit b71bd27 into SciTools:dask on Feb 23, 2017
@DPeterK (Member) commented Feb 23, 2017

Woop! 🎉 🍰 🍷

marqh added a commit to marqh/iris that referenced this pull request Feb 23, 2017
Core refactor for dask usage
@@ -47,6 +47,8 @@ def _test_coord(self, cube, point, bounds=None, **kwargs):
if bounds is not None:
self.assertArrayEqual(coords[0].bounds, [bounds])

# hits a segfault, very odd
@pp-mo (Member) commented Feb 23, 2017:

hits a segfault

I think these may be fixed by #2395.
Short version: since we changed our test for lazy data, a simple Mock tests as lazy instead of real.
Then np.ma.asarray() can hit an endless loop (whereas np.asarray was ok).
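
A minimal illustration of the failure mode described: if the laziness check is duck-typed on an attribute such as compute (an assumption here about its form), a unittest Mock will auto-create that attribute and so be treated as lazy data:

from unittest import mock

import numpy as np


def looks_lazy(data):
    # Assumed shape of an attribute-based laziness test.
    return hasattr(data, 'compute')


real = np.arange(4.0)
fake = mock.Mock()           # Mocks auto-create any attribute, 'compute' included.

print(looks_lazy(real))      # False: a plain ndarray
print(looks_lazy(fake))      # True: the Mock silently passes the duck test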

marqh added a commit to marqh/iris that referenced this pull request Feb 24, 2017
Core refactor for dask usage
# The proxy supplies nan filled arrays and caches data.
data = self._data[...]
if data.dtype.kind == 'i' and self.bmdi == -1e30:
self.bmdi = -9999
Member:

Magic number?
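
One way of answering the magic-number comment is to name both sentinels; a small, self-contained sketch (the constant names are hypothetical):

# Hypothetical names for the bare numbers in the diff above.
DEFAULT_FLOAT_BMDI = -1e30
INTEGER_MDI = -9999


def adjusted_bmdi(data_kind, bmdi):
    """Return a missing-data indicator suited to the data kind (sketch)."""
    if data_kind == 'i' and bmdi == DEFAULT_FLOAT_BMDI:
        return INTEGER_MDI
    return bmdi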

bjlittle pushed a commit to bjlittle/iris that referenced this pull request May 31, 2017
Core refactor for dask usage
@QuLogic modified the milestones: dask, v2.0 on Aug 2, 2017