Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[OTEL-2348] Improve DDSketch to Sketch conversion #468

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

jade-guiton-dd
Copy link

@jade-guiton-dd jade-guiton-dd commented Jan 17, 2025

What does this PR do?

This PR modifies the algorithm used to round bin counts to integers as part of the conversion from DDSketch to Sketch (pkg/quantile/ddsketch.go).

Motivation

Some test cases in TestConvertDDSketchIntoSketch were failing on main, but only when running locally on an ARM device. This test computes the percentiles of a distribution before and after the DDSketch to Sketch conversion, and checks that the relative error introduced is below some bound. The failures turned out to be because the rounding algorithm used as part of the conversion is somewhat numerically unstable.

The algorithm rounds bin counts down to an integer, but keeps track of the accumulated rounding error, and adds 1 extra unit to the current bin once it reaches 1.0. Unfortunately, because of floating point imprecision, this accumulated error can end up being 0.99999 in cases where it should be 1.0, delaying the addition of the extra unit until the next bin with a non-integer count, which can be far away from the logical range the extra unit belongs to. Depending on the distribution, this transfer of a single unit between distant bins can end up significantly changing the quantiles.

The test already skipped the comparison for the P99 of certain distributions because of this issue. However, ARM CPUs have slight differences in floating-point behavior from x86, this issue was triggered in cases that were not already skipped.

To solve this issue at the source, I modified the rounding algorithm so that it inserts the extra units determined from accumulated rounding error when said error reaches 0.5, instead of 1.0. This is done differently from the original algorithm, by tracking the sum of the input float counts and the output integer counts, and using math.Round to compute the difference. This adds the extra unit somewhere in the middle of its logical range of bins, instead of at or past the end, and should not present the same numerical instability, at least in the common case where the input DDSketch has integer counts.

Changes to tests

Switching to this algorithm fixes the tests that were originally failing on ARM, and also allows us to remove the exceptions that were previously made in the test.

Because this PR changes the exact output of the algorithm in a lot of cases, I updated the test files for TestExponentialHistogramTranslatorOptions, which expects the output bin counts to have very precise values.

I also increased the fineness of the distribution generated in TestKnownDistributionsQuantile to let the test pass, and corrected the name of a few test cases.

I also replaced the formula for the expected error bound in TestConvertDDSketchIntoSketch. The current one was copy-pasted from TestCreateDDSketchWithSketchMapping, even though they compute very different things. I added a comment explaining my reasoning above the new calculation. The old bound was ~0.0233 and the new one is ~0.0315, so it is a bit more permissive, but stays within the 5% relative error expected from the input OTel histograms, as tested by the TestKnownDistributionsQuantile end-to-end test.

@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.26%. Comparing base (3aaff86) to head (2c5fcb4).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #468      +/-   ##
==========================================
- Coverage   81.33%   81.26%   -0.07%     
==========================================
  Files          48       48              
  Lines        3686     3673      -13     
==========================================
- Hits         2998     2985      -13     
  Misses        606      606              
  Partials       82       82              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jade-guiton-dd jade-guiton-dd marked this pull request as ready for review January 17, 2025 16:49
@jade-guiton-dd jade-guiton-dd requested a review from a team as a code owner January 17, 2025 16:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants