Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Populate cube.fill_value thru cube.data setter. #2524

Merged

Conversation

bjlittle
Copy link
Member

@bjlittle bjlittle commented May 4, 2017

This PR is a follow-on action from #2492 and detailed in #2516 and allows the cube.fill_value to be set from the fill_value of concrete masked data being passed to cube.data setter.

All of the CML changes are due to either tests directly setting cube.data with a masked array, or tests that use testing stock cubes that set cube.data directly with a masked array ... thus populating the cube.fill_value and thus bleeding thru to the CML. This is the behaviour that we want to see.

If data passed to the cube.data setter is not masked data, then the cube.fill_value will be clear, as before.

This PR also neatly completes the circle with fill_value and masked arrays wrt the cube, in that the cube.data getter will set the fill_value of any masked array returned to the user with the cube.fill_value, and now this PR allows the fill_value of a masked array to set the cube.fill_value through the cube.data setter.

@bjlittle bjlittle added the dask label May 4, 2017
@bjlittle bjlittle added this to the dask milestone May 4, 2017
@bjlittle
Copy link
Member Author

bjlittle commented May 4, 2017

Ping @dkillick and @lbdreyer ...

@lbdreyer lbdreyer self-assigned this May 4, 2017
@lbdreyer
Copy link
Member

lbdreyer commented May 4, 2017

If you create a new masked array with a specific fill value, make it lazy, then make it a cube's data
cube.data = lazy_array_that_was_originally_masked_with_a_fill_value
the fill value will get lost, but I guess there's nothing we can do about that and probably not something we need to account for

@lbdreyer
Copy link
Member

lbdreyer commented May 4, 2017

This looks good to me 👍 So unless there are any objections I'm going to merge this in!

cube = Cube(np.array(0))
self.assertIsNone(cube.fill_value)
cube.data = ma.masked_array(1, fill_value=fill_value)
self.assertEqual(cube.fill_value, fill_value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't check the data type of the fill value is correct but presumably the testing of that will be in the cube.fill_value setter

Copy link
Member Author

@bjlittle bjlittle May 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My assumption is that unit testing elsewhere covers that, and it does 😉

@lbdreyer lbdreyer merged commit fe99864 into SciTools:dask May 4, 2017
@bjlittle
Copy link
Member Author

bjlittle commented May 4, 2017

@lbdreyer Awesome, thanks!

@bjlittle bjlittle deleted the dask-assign-fill-value-thru-cube-data branch May 4, 2017 12:34
@lbdreyer
Copy link
Member

lbdreyer commented May 4, 2017

I'm having second thoughts on this PR...

  1. It has introduced a disparity between cube.fill_value behaviour depending on whether you instantiate a cube with a masked array or add it to a cube already there, i.e.:
>>> data = np.array([1, 2, 3])
>>> masked_data = ma.MaskedArray([1, 2, 3], mask=[True, False, False])
>>> 
>>> cube = Cube(masked_data)
>>> print cube.fill_value
None
>>> 
>>> cube = Cube(data)
>>> print cube.fill_value
None
>>> cube.data = masked_data
>>> print cube.fill_value
999999
  1. I also noticed that it introduce some strange behaviour with one of our test files which has a dtype of unint8
>>> cube = iris.load_cube('iris-test-data/test_data/abfAVHRRBUVI01.1985apra.abf')
>>> print type(cube.data)
<class 'numpy.ma.core.MaskedArray'>
>>> print cube.fill_value
None
>>> cube.data = cube.data.copy(order='C')
>>> print cube.fill_value
63

This is because, regardless of what type of integer the masked array's dtype is, the fill_value of an integer masked array will always be 999999 (:warning:)
The cube's data in the above example is a masked array with a dtype of uint8 so the cube.fill_value setter is changing the default numpy fill value to be 63:

>>> np.array([999999], dtype=np.uint8)
array([63], dtype=uint8)

@bjlittle
Copy link
Member Author

bjlittle commented May 5, 2017

@lbdreyer Nice detective work! Let's discuss a way forward off-line and commit to a follow-on PR 👍

@QuLogic QuLogic modified the milestones: dask, v2.0 Aug 2, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants