Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle missing values in clamp #31066

Merged
merged 1 commit into from
Jun 11, 2019
Merged

Handle missing values in clamp #31066

merged 1 commit into from
Jun 11, 2019

Conversation

ararslan
Copy link
Member

This adds support for clamping missing values. The existing clamp code errors on missing as it uses ifelse on comparisons that can return missing. This came up in some private code.

@ararslan ararslan added maths Mathematical functions missing data Base.missing and related functionality labels Feb 14, 2019
@ararslan ararslan requested a review from nalimilan February 14, 2019 01:13
test/math.jl Outdated Show resolved Hide resolved
@oxinabox
Copy link
Contributor

I feel we should correctly handle Missings in all positions,
thus I propose the following testset

@test ismissing(clamp(missing, 1, 100))
@test ismissing(clamp(51, missing, 100))
@test ismissing(clamp(51, 10, missing))
@test ismissing(clamp(51, missing, missing))
@test ismissing(clamp(missing, missing, missing))
@test clamp(1, 52, missing) == 52
@test clamp(1000, missing, 101) == 101

In 1.0.3 they all error

@omus
Copy link
Member

omus commented Feb 14, 2019

@KristofferC are your 👎 reactions due to this being another case of extending a function just to support missing values?

@KristofferC
Copy link
Member

KristofferC commented Feb 14, 2019

Yes, especially seeing the suggestion for all the permutations of the arguments.

@StefanKarpinski
Copy link
Member

Man, we really need a generic way to handle this kind of thing instead of people wanting to add all the methods for missing.

@JeffBezanson
Copy link
Member

This is an interesting example of why it's hard to do generically though. One missing argument doesn't necessarily imply the result has to be missing.

@ararslan
Copy link
Member Author

I feel we should correctly handle Missings in all positions

I purposefully left that out due to the ambiguity. If the value you're trying to clamp is missing, the result should unambiguously be missing. But does it make sense to try to clamp a non-missing value between something you know and something you don't know? You could go by how missing sorts (as greater than all values) and treat it like an infinite upper bound, or you could say "I don't know what this is" and return missing. The lack of a clear answer there to me means that it should be a case where the user needs to ensure their bounds are non-missing, even if the value to be clamped is not.

@nalimilan
Copy link
Member

nalimilan commented Feb 14, 2019

You could go by how missing sorts (as greater than all values) and treat it like an infinite upper bound, or you could say "I don't know what this is" and return missing.

I don't think that would be consistent with the way we treat missing everywhere else. We've been pretty clear that missing should propagate, and not be treated as an infinite upper bound. sort is quite specific since we have to put missing values somewhere (they "propagate" in some sense since they aren't lost).

But it sounds fine to only define methods for the first argument. I just wanted to point that it would make sense to always return missing in general if one of the arguments is missing (unless a method has been defined to override this default).

@nickrobinson251
Copy link
Contributor

Bump. This seems useful behaviour (the simple "clamping a missing is a missing" case the PR adds) :)

@oxinabox
Copy link
Contributor

oxinabox commented Jun 5, 2019

Yes, we should do this, then open an issue to discuss the other cases I rose

@ararslan ararslan force-pushed the aa/clamp-missing branch from 0513b19 to 50ae2d8 Compare June 6, 2019 19:24
@ararslan
Copy link
Member Author

ararslan commented Jun 6, 2019

Rebased.

@bkamins
Copy link
Member

bkamins commented Jun 9, 2019

Two small points:

  1. what with the case lo==hi (except that then it would not be type stable if we said we know the value, but it would be a small union)
  2. for a general case passmissing from Missings.jl can be used

@ararslan
Copy link
Member Author

ararslan commented Jun 9, 2019

  1. If lo == hi but the value to clamp is still missing, I would think the result should unambiguously be missing. If you're attempting to clamp a missing value to one particular value, you can't know what the missing would have represented were it not missing, so you can't know whether the provided bounds are even within the correct domain.

  2. Yep, good call. One downside of passmissing is that some functions should only return a missing value if one particular argument is missing, not if any are missing. An example that comes to mind is norm, for which norm(missing) should be missing, but norm(x, missing) should be an error. That's an aside though.

@nalimilan
Copy link
Member

I agree clamp should propagate missing even when lo==hi. In particular, clamp(NaN, 0.0, 0.0) returns NaN, not 0.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maths Mathematical functions missing data Base.missing and related functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants