Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quantile Remapping Fix #304

Merged
merged 3 commits into from
May 30, 2022
Merged

Quantile Remapping Fix #304

merged 3 commits into from
May 30, 2022

Commits on May 27, 2022

  1. Upgrade DDSketch to 0.1.3

    DDSketch 0.1.3 resolves an issue with the rank calculation that could
    cause the sketch to return the value of neighboring ranks, instead of
    the one requested.
    Dan Wilbanks committed May 27, 2022
    Configuration menu
    Copy the full SHA
    9cfd1f4 View commit details
    Browse the repository at this point in the history
  2. Return self.min()/max() for the 0.0/1.0 quantiles

    The 0.0 and 1.0 quantiles correspond to the minimum and maximum values
    of the distribution. This change returns these values when the 0.0 and
    1.0 quantiles are requested.
    
    One benefit of doing so is that the minimum and maximum are the exact
    values for these quantiles, rather than estimates. Another benefit is
    that the 0.0 quantile can be considered a special case, since for any
    quantile, we return the maximum value below that quantile. Since there
    is no value below the 0.0 quantile, we instead return the minimum.
    Dan Wilbanks committed May 27, 2022
    Configuration menu
    Copy the full SHA
    d215b30 View commit details
    Browse the repository at this point in the history
  3. Fix rank calculation in Summary::quantile()

    - Changes the rank calculation in Summary::quantile() to 'q * total'
    from 'floor(q * (total - 1))'. The 'floor' in the previous calculation
    was implicit due to the cast to u64.
    
    - Changes the condition for mapping to the 'negative', 'positive', or
    'zeroes' sketch to be '<=' instead of '<'.
    
    - Defers rounding the quantile to the nearest rank to the underlying
    sketch. This change is not necessary for correctness, but could be
    desireable. For example, deferring this operation could allow the
    underlying sketch to perform interpolation.
    
    - Fixes incorrect assertions in test_basics(), which pass with the
    changes to the rank calculation and quantile mapping.
    
    The previous method could incorrectly map a request for a value at a
    particular rank to the adjacent rank in some cases. Consider a
    distribution with the values {0.0, 1.0}. In this case, any request for a
    quantile < 1.0 would map to the 'zeroes' sketch. for example, the rank
    for q = 0.9 is computed as 'floor(0.9 * (2 - 1)) = floor(0.9) = 0'.
    This would map to the 'zeroes' sketch while it should instead map to a
    request for q = 0.8 in the positive sketch. The new method would give us
    a rank of '0.9 * 2 = 1.8' which correctly maps to 0.8 in the positive
    sketch.
    
    The condition for mapping to the appropriate sketch now uses the '<='
    operator instead of the '<' operator. Consider a request for q = 0.5,
    with the distribution {-1.0, 1.0}. The '<=' operator ensures that this
    request is mapped to q = 0.0 in the negative sketch (would be 1.0, but
    we flip the distribution) rather than q = 0.0 in the positive sketch.
    
    The mapping for q = 0.0 was handled as a special case in a prior commit
    to allow for the updated mapping, as q = 0.0 would otherwise always map
    to the negative sketch, whether or not the negative sketch had any
    values.
    
    Both of the cases described above are now covered (with a different
    distribution) in test_basics().
    Dan Wilbanks committed May 27, 2022
    Configuration menu
    Copy the full SHA
    5274708 View commit details
    Browse the repository at this point in the history