-
Notifications
You must be signed in to change notification settings - Fork 284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dask merge concat fill value #2520
Conversation
Ping @djkirkham and @lbdreyer ... 😄 |
# Allow fill-value promotion if either fill-value is None. | ||
if self.fill_value is not None and other.fill_value is not None: | ||
msg = 'cube fill value differs: {} != {}' | ||
msgs.append(msg.format(self.fill_value, other.fill_value)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you choose to allow for merging and concatenating cubes with different fill values?
I realise this is not a simple question so I can call you to discuss this if that's easier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lbdreyer This is more in alignment with the behaviour of upstream/master
, which doesn't discriminate based on fill_value
. So no breaking or surprising behaviour for the end user.
It also aligns with the behaviour of numpy
, which will merge/concatenate arrays with different fill_values
, but the resultant array has a fill_value
that is replaced with the numpy
nominated default for that dtype
.
With regards to cubes, we're doing the same now in this PR - if we have two cubes that only differ on fill_value
, say one is 1234
and the other is 4321
, then they will merge/concatenate into one cube, but the resultant cube will have a cube.fill_value
of None
. When the user does a cube.data
to get the data, the resulting masked array (assuming the payload realizes as masked) will have a fill_value
that is the numpy
nominated default - note that, the cube.fill_value
remains equal to None
in this case ... until the user changes it via cube.fill_value = fill_value
, or cube.replace(..., fill_value=fill_value)
np_fill_value = ma.masked_array(0, dtype=result.dtype).fill_value | ||
self.assertEqual(result.data.fill_value, np_fill_value) | ||
|
||
def test_fill_value_invariant_to_order__different_non_None(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than testing the different order, is this much different to test_concat_masked_2y2d__default_fill_value_from_diff?
I don't understand what you gain from doing all the permutations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay I understand that this concatenates cubes with x's like
concat [0, 1] [2, 3] [4, 5] => [0, 1, 2, 3, 4, 5]
concat [0, 1] [4, 5] [2, 3] => [0, 1, 2, 3, 4, 5]
concat [4, 5] [0, 1] 2, 3] => [0, 1, 2, 3, 4, 5]
and checking the fill value is correct after that
@@ -748,6 +730,49 @@ def test_concat_2x2d_aux_xy_bounds(self): | |||
self.assertEqual(len(result), 1) | |||
self.assertEqual(result[0].shape, (2, 4)) | |||
|
|||
def test_fill_value_invariant_to_order__same_non_None(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test naming was a bit confusing to me. I only understood what __same_non_None
meant when I started writing a comment asking you what it is.
Naming tests is verrry difficult so I am happy with it as is, but I am going to try to think of a different name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, test naming ... ugh ... you are right, I'll don my thinking cap 🤠
lib/iris/tests/test_concatenate.py
Outdated
emsg = 'Fill values differ' | ||
with self.assertRaisesRegexp(iris.exceptions.ConcatenateError, emsg): | ||
cubes.concatenate_cube() | ||
|
||
|
||
class Test2D(tests.IrisTest): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You never test the fill value when all inputs that get concatenated are unmasked. You could always add a quick check to the test on L535
lib/iris/tests/test_merge.py
Outdated
return result | ||
|
||
def _check_fill_value(self, result, fill0, fill1): | ||
fill_value = self._expected_fill(fill0, fill1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename to be expected_fill_value
or expected_cube_fill_value
?
May make it easier to follow
[False, True]) | ||
fill_combos = itertools.product([None, fill_value], | ||
[fill_value, None]) | ||
self.combos = itertools.product(lazy_combos, fill_combos) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was it too complicated to have another itertools.product
for ndarray and masked?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm a wee bit wary of folding that in, as it seems like we'll end up with a single test that does absolutely everything and will be a tad impenetrable ... is that okay?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! makes sense
This looks very close! The testing of concatenate isn't less complete than the testing of merge, but there are endless combinations (masked/unmasked, fill_value_same/fill_value_diff/fill_value_None, lazy/real etc) that we can test for... I want to go through some possible important combinations that we are missing. |
I've looked into testing of concatenate a little more and ignoring the endless combinations we could test I think there are two missing combinations that we should at least consider.
|
@lbdreyer Okay, so I've addressed your review actions in the simplest way possible. Adding a concatenate equivalent to the merge |
lib/iris/_concatenate.py
Outdated
if cube_signature.fill_value is None or \ | ||
cube_signature.fill_value != fill_value: | ||
# Demote the fill value to the default. | ||
self._cube_signature.fill_value = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be simplified to
if cube_signature.fill_value != self._cube_signature.fill_value:
self._cube_signature.fill_value = None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, also applies to _merge.py
👍
lib/iris/_merge.py
Outdated
if other.fill_value is None or \ | ||
cube_signature.fill_value != other.fill_value: | ||
# Demote the fill value to the default. | ||
signature = self._build_signature(self._source, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need a call to _build_signature
here, but in the concatenate case you can just set fill_value
to None?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One is a class instance (writable) the other is a specialization of a namedtuple
(read-only)
self.assertEqual(result[0].data.fill_value, fill_value) | ||
self.assertIsNone(result[0].fill_value) | ||
np_fill_value = ma.masked_array(0, dtype=result[0].dtype).fill_value | ||
self.assertEqual(result[0].data.fill_value, np_fill_value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we need this check. I think perhaps it should be part of the contract that if the input fill values don't match then the fill value you get out on the array shouldn't be relied upon to have any particular value. That said, we are making an effort to make sure the correct fill value is placed on the data array when it is valid, so I'm not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me it's just a fill_value
, something, anything ... but a consistent, reliable fill_value
, and what numpy
offers seems reasonable (and predictable) given the case where cube.fill_value
is None
The follow on point for me is whether a warning should be issues to notify the user that the fill_value
in the masked array returned by the cube.data
getter is a default numpy
fill_value
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be happy either way on both points to be honest
self.assertIsNone(result.fill_value) | ||
np_fill_value = ma.masked_array(0, dtype=result.dtype).fill_value | ||
self.assertEqual(result.data.fill_value, np_fill_value) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice test coverage!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I try 😉
At least we're trying to join the dots on what's happening here ... so if it changes, a whole bunch of tests are now gonna complain .... which is a good thing!
lib/iris/tests/test_merge.py
Outdated
|
||
def _check_fill_value(self, result, fill0, fill1): | ||
fill_value = self._expected_fill_value(fill0, fill1) | ||
if fill_value is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I said I would prefer a rename I meant the variable, not the method,
i.e. expected_fill_value = self._expected_fill(fill0, fill1)
and then the self.assertEqual checks would be:
self.assertEqual(data.fill_value, expected_fill_value)
@bjlittle if you address my last comment I will happily merge this in |
@lbdreyer Thanks 👍 |
* Fix concatenate fill value. * Fix merge fill value. * Added merge invariant tests. * Added concatenate invariant tests. * Review actions. * Simplify merge/concatenate match condition. * expected_fill_value
This PR refines the approach taken with handling the
cube.fill_value
by merge and concatenate.This PR extends the expected behaviour of
numpy
with regards to how it manages thefill_value
when combining masked arrays and/or ndarrays to cubes (it is valid for a cube to have afill_value
even though it has a ndarray payload).In summary, when combining cubes via merge or concatenate, the resultant cube will have:
fill_value
that is non-None
iff all cubes have the same non-None
fill_value
fill_value
that isNone
, if:fill_value
ofNone
, orfill_value
This simply means that,
cube.fill_value
is set to a non-None
value, and the cube has a masked payload, then the masked array returned bycube.data
will have afill_value
that matches thecube.fill_value
.cube.fill_value
is set toNone
, and the cube has a masked payload, then the masked array returned bycube.data
will have the defaultfill_value
generated bynumpy
for that specificdtype
.