Fix overflows in quantile
#145
The `a + γ*(b-a)` formula introduced by JuliaLang/julia#16572 has the advantage that it increases with `γ` even when `a` and `b` are very close, but it is not robust to overflow, since `b-a` can overflow. This is likely to happen in practice with small integer and floating point types. Conversely, the `(1-γ)*a + γ*b` formula, which is currently used only for non-finite quantities, is robust to overflow but may not always increase with `γ` when `a` and `b` are very close or (more frequently) equal, since precision loss can give a slightly smaller value for a larger `γ`. This is problematic as it breaks an expected invariant. So keep using the `a + γ*(b-a)` formula when `a ≈ b`, in which case it's almost like returning either `a` or `b` but less arbitrary, and use `(1-γ)*a + γ*b` otherwise.
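To make the trade-off concrete, here is a minimal sketch of the two formulas and of the hybrid rule described above. The helper names `lerp_diff`, `lerp_weighted` and `lerp_hybrid` are hypothetical and chosen only for illustration; this is not the actual Statistics.jl code.

```julia
# Hypothetical helpers for illustration, not the Statistics.jl implementation.
lerp_diff(a, b, γ) = a + γ * (b - a)          # increases with γ, but `b - a` can overflow
lerp_weighted(a, b, γ) = (1 - γ) * a + γ * b  # robust to overflow, may not increase with γ

# Hybrid rule from the description: difference form only when `a ≈ b`.
lerp_hybrid(a, b, γ) = isapprox(a, b) ? lerp_diff(a, b, γ) : lerp_weighted(a, b, γ)

# Floating point: `b - a` overflows to Inf, so the midpoint is lost.
fa, fb = -floatmax(Float64), floatmax(Float64)
float_diff = lerp_diff(fa, fb, 0.5)        # Inf
float_hybrid = lerp_hybrid(fa, fb, 0.5)    # 0.0

# Small integers: `Int8(100) - Int8(-100)` wraps around to -56.
ia, ib = Int8(-100), Int8(100)
int_diff = lerp_diff(ia, ib, 0.5)          # -128.0, far from the midpoint
int_weighted = lerp_weighted(ia, ib, 0.5)  # 0.0

# With equal endpoints the difference form returns exactly `a` for every γ.
γs = range(0, 1, length=100)
constant_when_equal = all(==(0.1), [lerp_diff(0.1, 0.1, γ) for γ in γs])
```

The integer case shows why this matters in practice: the wrapped difference produces a value outside the range spanned by `a` and `b`, without any error being raised.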
Codecov Report

| | master | #145 | +/- |
|---|---|---|---|
| Coverage | 96.98% | 96.99% | +0.01% |
| Files | 1 | 1 | |
| Lines | 431 | 433 | +2 |
| Hits | 418 | 420 | +2 |
| Misses | 13 | 13 | |
This also required checking that the result stays non-decreasing when we increase `b` and the formula switches, but I tested it and it holds.
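A check of this kind could look like the sketch below. The helper `interp` is a hypothetical stand-in for the hybrid rule discussed in this PR, and the chosen values of `a`, `γ` and the step size are assumptions made for illustration, not the reviewer's actual test.

```julia
# Hypothetical stand-in for the hybrid rule, not the Statistics.jl implementation.
interp(a, b, γ) = isapprox(a, b) ? a + γ * (b - a) : (1 - γ) * a + γ * b

a, γ = 1.0, 0.3

# Increase `b` from inside the `a ≈ b` region to outside it. With the default
# relative tolerance (sqrt(eps()), roughly 1.5e-8), the formula switches
# somewhere along this range.
bs = a .+ (0:1e-9:1e-7)
vals = [interp(a, b, γ) for b in bs]

crosses = isapprox(a, first(bs)) && !isapprox(a, last(bs))
nondecreasing = issorted(vals)
println("switch crossed: ", crosses, ", non-decreasing in b: ", nondecreasing)
```

This only samples one `a` and one `γ` on a coarse grid, so it illustrates the property rather than proving it.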
It's already covered by the test added a long time ago by JuliaLang/julia#16572. That's how I realized the problem. ;-) EDIT: You mean
In general I mean that it should be monotonic in
OK. So you mean two tests like this are needed?
@test issorted(quantile([1.0, 1.0+eps(), 1.0+2eps(), 1.0+3eps()], range(0, 1, length=100)))
@test issorted(quantile([1.0, 1.0+2eps(), 1.0+4eps(), 1.0+6eps()], range(0, 1, length=100)))
Yes, something like this (this is not strictly needed 😄, but I ran such tests and they were OK).
Before #145, `Date` and `DateTime` were supported by `quantile` as long as the cut point fell between two equal values. Restore this behavior, as some code may rely on it given that this is the most common situation with large datasets.
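One way such behavior could be supported is to short-circuit when the two neighboring values are equal, so that no arithmetic on the values is needed. The sketch below assumes this approach; `interp_or_same` is a hypothetical name and the actual patch may be implemented differently.

```julia
using Dates

# Hypothetical helper, assuming a short-circuit on equal values;
# not necessarily how Statistics.jl restores the behavior.
function interp_or_same(a, b, γ)
    # Equal neighbors: return the value itself, with no multiplication by γ.
    a == b && return a
    # Otherwise fall back to ordinary interpolation (numeric types only here).
    return (1 - γ) * a + γ * b
end

d = Date(2020, 1, 1)
same_date = interp_or_same(d, d, 0.5)       # the cut point falls between two equal dates
midpoint = interp_or_same(1.0, 3.0, 0.5)    # ordinary numeric interpolation
```

The equal-values branch works for any type that supports `==`, which is why it covers `Date` and `DateTime` without requiring them to support scaling by a fractional `γ`.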
Fixes #144.