Dask merge concat fill value #2520

bjlittle · 2017-05-03T12:27:30Z

This PR refines the approach taken with handling the cube.fill_value by merge and concatenate.

This PR extends numpy's behaviour for managing the fill_value when combining masked arrays and/or ndarrays, and applies it to cubes (it is valid for a cube to have a fill_value even though it has an ndarray payload).

In summary, when combining cubes via merge or concatenate, the resultant cube will have:

- a fill_value that is non-None iff all cubes have the same non-None fill_value
- a fill_value that is None, if:
  - at least one cube has a fill_value of None, or
  - the candidate cube and the proto-cube have a different fill_value
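The rule above can be sketched as a small pure-Python function. This is only an illustration of the stated behaviour, not the code in `_merge.py` or `_concatenate.py`; the name `combined_fill_value` is hypothetical.

```python
def combined_fill_value(fill_values):
    """Return the fill value shared by every input, otherwise None.

    Hypothetical helper: it mirrors the rule described in this PR,
    not the actual iris implementation.
    """
    fill_values = list(fill_values)
    if not fill_values:
        return None
    first = fill_values[0]
    # Non-None only if the first value is non-None and all others equal it.
    if first is not None and all(fv == first for fv in fill_values[1:]):
        return first
    return None


print(combined_fill_value([1234, 1234]))   # all the same, non-None
print(combined_fill_value([1234, 4321]))   # differing values
print(combined_fill_value([None, 1234]))   # at least one None
```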

This simply means that:

- if the cube.fill_value is set to a non-None value, and the cube has a masked payload, then the masked array returned by cube.data will have a fill_value that matches the cube.fill_value.
- if the cube.fill_value is set to None, and the cube has a masked payload, then the masked array returned by cube.data will have the default fill_value generated by numpy for that specific dtype.
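A minimal numpy-only sketch of these two cases follows. The helper name `effective_fill_value` is hypothetical and does not exist in iris; the numpy default is obtained the same way the tests in this PR obtain it, via `ma.masked_array(0, dtype=...).fill_value`.

```python
import numpy as np
import numpy.ma as ma


def effective_fill_value(cube_fill_value, dtype):
    """Fill value expected on the masked array returned by cube.data.

    Hypothetical helper illustrating the behaviour described above.
    """
    if cube_fill_value is not None:
        # A non-None cube.fill_value is carried onto the masked array.
        return cube_fill_value
    # Otherwise fall back to numpy's default fill value for this dtype.
    return ma.masked_array(0, dtype=dtype).fill_value


print(effective_fill_value(1234, np.float32))
print(effective_fill_value(None, np.float64))
print(effective_fill_value(None, np.int32))
```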

bjlittle · 2017-05-03T13:08:21Z

Ping @djkirkham and @lbdreyer ... 😄

lbdreyer · 2017-05-03T15:02:42Z

lib/iris/_merge.py

-            # Allow fill-value promotion if either fill-value is None.
-            if self.fill_value is not None and other.fill_value is not None:
-                msg = 'cube fill value differs: {} != {}'
-                msgs.append(msg.format(self.fill_value, other.fill_value))


Why did you choose to allow for merging and concatenating cubes with different fill values?
I realise this is not a simple question so I can call you to discuss this if that's easier?

@lbdreyer This is more in alignment with the behaviour of upstream/master, which doesn't discriminate based on fill_value. So no breaking or surprising behaviour for the end user.

It also aligns with the behaviour of numpy, which will merge/concatenate arrays with different fill_values, but the resultant array has a fill_value that is replaced with the numpy nominated default for that dtype.

With regards to cubes, we're doing the same now in this PR. If we have two cubes that differ only in fill_value, say one is 1234 and the other is 4321, then they will merge/concatenate into one cube, but the resultant cube will have a cube.fill_value of None. When the user accesses cube.data to get the data, the resulting masked array (assuming the payload realizes as masked) will have a fill_value that is the numpy nominated default. Note that the cube.fill_value remains equal to None in this case, until the user changes it via cube.fill_value = fill_value, or cube.replace(..., fill_value=fill_value).
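The numpy behaviour referred to here can be checked directly with plain masked arrays. This sketch uses only numpy, with the 1234 and 4321 values from the comment above, and assumes the behaviour of `numpy.ma.concatenate` as I understand it for the numpy releases I am aware of.

```python
import numpy.ma as ma

# Two masked arrays that differ only in their fill_value.
a = ma.masked_array([0.0, 1.0], mask=[False, True], fill_value=1234)
b = ma.masked_array([2.0, 3.0], mask=[True, False], fill_value=4321)

result = ma.concatenate([a, b])

# numpy does not keep either input fill_value; the result reports the
# default fill value for its dtype.
default = ma.default_fill_value(result.dtype)
print(a.fill_value, b.fill_value)
print(result.fill_value == default)
```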

lbdreyer · 2017-05-03T16:12:45Z

lib/iris/tests/test_concatenate.py

+            np_fill_value = ma.masked_array(0, dtype=result.dtype).fill_value
+            self.assertEqual(result.data.fill_value, np_fill_value)
+
+    def test_fill_value_invariant_to_order__different_non_None(self):


Other than testing the different order, is this much different to test_concat_masked_2y2d__default_fill_value_from_diff?

I don't understand what you gain from doing all the permutations?

Okay I understand that this concatenates cubes with x's like

concat [0, 1] [2, 3] [4, 5] => [0, 1, 2, 3, 4, 5]
concat [0, 1] [4, 5] [2, 3] => [0, 1, 2, 3, 4, 5]
concat [4, 5] [0, 1] [2, 3] => [0, 1, 2, 3, 4, 5]

and checking the fill value is correct after that
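The value of running every permutation can be seen with a small stand-in for the pairwise "proto-cube versus candidate" comparison. This is a sketch only: `demote` and `fold_fill_values` are hypothetical names, and the real tests concatenate cubes rather than folding bare fill values.

```python
import itertools
from functools import reduce


def demote(current, candidate):
    """Keep the running fill value only if the candidate matches it."""
    return current if candidate == current else None


def fold_fill_values(fill_values):
    """Fold the inputs pairwise, as a proto-cube absorbing candidates would."""
    return reduce(demote, fill_values)


def results_over_all_orders(fill_values):
    """Set of outcomes across every ordering of the inputs."""
    return {fold_fill_values(order)
            for order in itertools.permutations(fill_values)}


# A single-element set means the outcome does not depend on input order.
print(results_over_all_orders([1234, 1234, 1234]))
print(results_over_all_orders([1234, 4321, 1234]))
print(results_over_all_orders([1234, None, 1234]))
```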

lbdreyer · 2017-05-03T17:12:37Z

lib/iris/tests/test_concatenate.py

@@ -748,6 +730,49 @@ def test_concat_2x2d_aux_xy_bounds(self):
        self.assertEqual(len(result), 1)
        self.assertEqual(result[0].shape, (2, 4))

+    def test_fill_value_invariant_to_order__same_non_None(self):


This test naming was a bit confusing to me. I only understood what __same_non_None meant when I started writing a comment asking you what it is.
Naming tests is verrry difficult so I am happy with it as is, but I am going to try to think of a different name

Yeah, test naming ... ugh ... you are right, I'll don my thinking cap 🤠

lbdreyer · 2017-05-03T17:29:38Z

lib/iris/tests/test_concatenate.py

-        emsg = 'Fill values differ'
-        with self.assertRaisesRegexp(iris.exceptions.ConcatenateError, emsg):
-            cubes.concatenate_cube()
-

 class Test2D(tests.IrisTest):


You never test the fill value when all inputs that get concatenated are unmasked. You could always add a quick check to the test on L535

lbdreyer · 2017-05-03T17:36:31Z

lib/iris/tests/test_merge.py

+        return result
+
+    def _check_fill_value(self, result, fill0, fill1):
+        fill_value = self._expected_fill(fill0, fill1)


Rename to be expected_fill_value or expected_cube_fill_value?
May make it easier to follow

lbdreyer · 2017-05-03T17:38:39Z

lib/iris/tests/test_merge.py

+                                        [False, True])
+        fill_combos = itertools.product([None, fill_value],
+                                        [fill_value, None])
+        self.combos = itertools.product(lazy_combos, fill_combos)


Was it too complicated to have another itertools.product for ndarray and masked?

I think I'm a wee bit wary of folding that in, as it seems like we'll end up with a single test that does absolutely everything and will be a tad impenetrable ... is that okay?

Sure! makes sense
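For reference, the `fill_combos` product in the diff above expands to four pairs. The sketch below enumerates them together with the fill value the merged cube would be expected to carry under the rule in the PR description. The value 1234 is a hypothetical stand-in for `fill_value`, and the `lazy_combos` product is left out because its first argument is not visible in the diff.

```python
import itertools

fill_value = 1234  # hypothetical stand-in for the test's fill_value

# itertools.product returns a one-shot iterator, so materialise it
# if it needs to be walked more than once.
fill_combos = list(itertools.product([None, fill_value],
                                     [fill_value, None]))

for fill0, fill1 in fill_combos:
    # Expected cube fill value: kept only when both match and are non-None.
    expected = fill0 if fill0 is not None and fill0 == fill1 else None
    print(fill0, fill1, '->', expected)
```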

lbdreyer · 2017-05-03T17:40:28Z

This looks very close!

The testing of concatenate is less complete than the testing of merge, but there are endless combinations (masked/unmasked, fill_value_same/fill_value_diff/fill_value_None, lazy/real etc.) that we could test for. I want to go through some possibly important combinations that we are missing,
but otherwise I'm happy with the changes.

lbdreyer · 2017-05-04T10:03:53Z

I've looked into testing of concatenate a little more and ignoring the endless combinations we could test I think there are two missing combinations that we should at least consider.

There doesn't appear to be a test for concatenating cubes that are masked and unmasked where both have a fill value of None.
Also, there isn't any testing of the fill value when both cubes are unmasked.

bjlittle · 2017-05-04T11:00:21Z

@lbdreyer Okay, so I've addressed your review actions in the simplest way possible. Adding a concatenate equivalent to the merge TestDataMergeCombos isn't really practical and would involve quite a bit of effort, so I've added the missing test coverage through new tests and adding to existing tests.

djkirkham · 2017-05-04T11:02:33Z

lib/iris/_concatenate.py

+                if cube_signature.fill_value is None or \
+                        cube_signature.fill_value != fill_value:
+                    # Demote the fill value to the default.
+                    self._cube_signature.fill_value = None


This can be simplified to

    if cube_signature.fill_value != self._cube_signature.fill_value:
        self._cube_signature.fill_value = None

Yup, also applies to _merge.py 👍
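That the two conditions agree can be checked exhaustively over a few representative values. This sketch assumes the `fill_value` name in the original diff refers to the proto-cube's current fill value; the function names are hypothetical.

```python
def demote_original(current, candidate):
    """Condition as written in the diff under review."""
    if candidate is None or candidate != current:
        return None
    return current


def demote_simplified(current, candidate):
    """Condition as proposed in the review comment."""
    if candidate != current:
        return None
    return current


values = [None, 1234, 4321]
mismatches = [(current, candidate)
              for current in values
              for candidate in values
              if demote_original(current, candidate)
              != demote_simplified(current, candidate)]
print(mismatches)
```

The only case where the extra `is None` test fires on its own is when both values are already None, and there the result is None either way.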

djkirkham · 2017-05-04T11:05:09Z

lib/iris/_merge.py

+                if other.fill_value is None or \
+                        cube_signature.fill_value != other.fill_value:
+                    # Demote the fill value to the default.
+                    signature = self._build_signature(self._source,


Why do we need a call to _build_signature here, but in the concatenate case you can just set fill_value to None?

One is a class instance (writable) the other is a specialization of a namedtuple (read-only)
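The distinction can be illustrated with the standard library alone. Both class names below are hypothetical stand-ins, not the iris signature classes, and `_replace` is shown only as one stdlib way to derive a modified namedtuple; the PR itself rebuilds the signature via `_build_signature`.

```python
from collections import namedtuple


class WritableSignature:
    """Stand-in for a plain class instance: attributes can be reassigned."""

    def __init__(self, fill_value):
        self.fill_value = fill_value


class ReadOnlySignature(namedtuple('ReadOnlySignature', ['fill_value'])):
    """Stand-in for a namedtuple specialisation: fields are read-only."""

    __slots__ = ()


writable = WritableSignature(1234)
writable.fill_value = None  # fine, the instance is mutable

read_only = ReadOnlySignature(1234)
try:
    read_only.fill_value = None
    assignment_failed = False
except AttributeError:
    # namedtuple fields cannot be assigned, so a new object must be built.
    assignment_failed = True

demoted = read_only._replace(fill_value=None)
print(writable.fill_value, assignment_failed, demoted.fill_value)
```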

djkirkham · 2017-05-04T11:55:41Z

lib/iris/tests/test_concatenate.py

-        self.assertEqual(result[0].data.fill_value, fill_value)
+        self.assertIsNone(result[0].fill_value)
+        np_fill_value = ma.masked_array(0, dtype=result[0].dtype).fill_value
+        self.assertEqual(result[0].data.fill_value, np_fill_value)


I'm wondering if we need this check. I think perhaps it should be part of the contract that if the input fill values don't match then the fill value you get out on the array shouldn't be relied upon to have any particular value. That said, we are making an effort to make sure the correct fill value is placed on the data array when it is valid, so I'm not sure.

For me it's just a fill_value, something, anything ... but a consistent, reliable fill_value, and what numpy offers seems reasonable (and predictable) given the case where cube.fill_value is None

The follow-on point for me is whether a warning should be issued to notify the user that the fill_value in the masked array returned by the cube.data getter is a default numpy fill_value ...

I'd be happy either way on both points to be honest

djkirkham · 2017-05-04T12:13:44Z

lib/iris/tests/test_concatenate.py

+            self.assertIsNone(result.fill_value)
+            np_fill_value = ma.masked_array(0, dtype=result.dtype).fill_value
+            self.assertEqual(result.data.fill_value, np_fill_value)
+


Nice test coverage!

I try 😉

At least we're trying to join the dots on what's happening here ... so if it changes, a whole bunch of tests are now gonna complain .... which is a good thing!

lbdreyer · 2017-05-04T13:39:46Z

lib/iris/tests/test_merge.py

+
+    def _check_fill_value(self, result, fill0, fill1):
+        fill_value = self._expected_fill_value(fill0, fill1)
+        if fill_value is None:


When I said I would prefer a rename I meant the variable, not the method,
i.e. expected_fill_value = self._expected_fill(fill0, fill1)
and then the self.assertEqual checks would be:

self.assertEqual(data.fill_value, expected_fill_value)

lbdreyer · 2017-05-04T13:41:20Z

@bjlittle if you address my last comment I will happily merge this in

bjlittle · 2017-05-04T14:04:05Z

@lbdreyer Thanks 👍

* Fix concatenate fill value.
* Fix merge fill value.
* Added merge invariant tests.
* Added concatenate invariant tests.
* Review actions.
* Simplify merge/concatenate match condition.
* expected_fill_value

bjlittle added 4 commits May 3, 2017 10:50

Fix concatenate fill value.

74e8290

Fix merge fill value.

703a49a

Added merge invariant tests.

f6a46aa

Added concatenate invariant tests.

fb0a3a5

bjlittle added the dask label May 3, 2017

bjlittle added this to the dask milestone May 3, 2017

bjlittle added the Status: Work in Progress label May 3, 2017

bjlittle requested a review from djkirkham May 3, 2017 12:28

bjlittle mentioned this pull request May 3, 2017

Outstanding - Integrate DataManager with cube #2516

Closed

5 tasks

lbdreyer reviewed May 3, 2017

Review actions.

e222205

djkirkham reviewed May 4, 2017

Simplify merge/concatenate match condition.

d44458d

lbdreyer reviewed May 4, 2017

expected_fill_value

414c362

lbdreyer merged commit 25a7615 into SciTools:dask May 4, 2017

QuLogic removed the Status: Work in Progress label May 4, 2017

bjlittle deleted the dask-merge-concat-fill-value branch May 4, 2017 14:05

bjlittle mentioned this pull request May 4, 2017

Handle MaskedConstant in cube maths #2526

Merged

QuLogic modified the milestones: dask, v2.0 Aug 2, 2017
