[OTEL-2348] Improve DDSketch to Sketch conversion #468

jade-guiton-dd · 2025-01-17T15:14:02Z

What does this PR do?

This PR modifies the algorithm used to round bin counts to integers as part of the conversion from DDSketch to Sketch (pkg/quantile/ddsketch.go).

Motivation

Some test cases in TestConvertDDSketchIntoSketch were failing on main, but only when running locally on an ARM device. This test computes the percentiles of a distribution before and after the DDSketch to Sketch conversion, and checks that the relative error introduced is below some bound. The failures turned out to be because the rounding algorithm used as part of the conversion is somewhat numerically unstable.

The algorithm rounds bin counts down to an integer, but keeps track of the accumulated rounding error, and adds 1 extra unit to the current bin once it reaches 1.0. Unfortunately, because of floating point imprecision, this accumulated error can end up being 0.99999 in cases where it should be 1.0, delaying the addition of the extra unit until the next bin with a non-integer count, which can be far away from the logical range the extra unit belongs to. Depending on the distribution, this transfer of a single unit between distant bins can end up significantly changing the quantiles.

The test already skipped the comparison for the P99 of certain distributions because of this issue. However, ARM CPUs have slight differences in floating-point behavior from x86, this issue was triggered in cases that were not already skipped.

To solve this issue at the source, I modified the rounding algorithm so that it inserts the extra units determined from accumulated rounding error when said error reaches 0.5, instead of 1.0. This is done differently from the original algorithm, by tracking the sum of the input float counts and the output integer counts, and using math.Round to compute the difference. This adds the extra unit somewhere in the middle of its logical range of bins, instead of at or past the end, and should not present the same numerical instability, at least in the common case where the input DDSketch has integer counts.

Changes to tests

Switching to this algorithm fixes the tests that were originally failing on ARM, and also allows us to remove the exceptions that were previously made in the test.

Because this PR changes the exact output of the algorithm in a lot of cases, I updated the test files for TestExponentialHistogramTranslatorOptions, which expects the output bin counts to have very precise values.

I also increased the fineness of the distribution generated in TestKnownDistributionsQuantile to let the test pass, and corrected the name of a few test cases.

I also replaced the formula for the expected error bound in TestConvertDDSketchIntoSketch. The current one was copy-pasted from TestCreateDDSketchWithSketchMapping, even though they compute very different things. I added a comment explaining my reasoning above the new calculation. The old bound was ~0.0233 and the new one is ~0.0315, so it is a bit more permissive, but stays within the 5% relative error expected from the input OTel histograms, as tested by the TestKnownDistributionsQuantile end-to-end test.

codecov-commenter · 2025-01-17T16:19:35Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.26%. Comparing base (3aaff86) to head (2c5fcb4).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #468      +/-   ##
==========================================
- Coverage   81.33%   81.26%   -0.07%     
==========================================
  Files          48       48              
  Lines        3686     3673      -13     
==========================================
- Hits         2998     2985      -13     
  Misses        606      606              
  Partials       82       82

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

jade-guiton-dd added 2 commits January 17, 2025 16:00

Improve stability of DDSketch to Sketch conversion

0f9de49

Add changelog entry

2c5fcb4

jade-guiton-dd marked this pull request as ready for review January 17, 2025 16:49

jade-guiton-dd requested a review from a team as a code owner January 17, 2025 16:49

jade-guiton-dd requested a review from IbraheemA January 17, 2025 16:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[OTEL-2348] Improve DDSketch to Sketch conversion #468

[OTEL-2348] Improve DDSketch to Sketch conversion #468

jade-guiton-dd commented Jan 17, 2025 •

edited

Loading

codecov-commenter commented Jan 17, 2025

[OTEL-2348] Improve DDSketch to Sketch conversion #468

Are you sure you want to change the base?

[OTEL-2348] Improve DDSketch to Sketch conversion #468

Conversation

jade-guiton-dd commented Jan 17, 2025 • edited Loading

What does this PR do?

Motivation

Changes to tests

codecov-commenter commented Jan 17, 2025

Codecov Report

jade-guiton-dd commented Jan 17, 2025 •

edited

Loading