
[sdk-metrics] Turn exemplars on by default in prerelease builds #5545

@CodeBlanch (Member) commented Apr 17, 2024

Changes

  • Set the default ExemplarFilterType to TraceBased in prerelease builds to match spec.
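For reviewers less familiar with the filters, here is a minimal Python sketch of the decision each `ExemplarFilterType` makes before a measurement reaches a reservoir. It follows the OpenTelemetry spec's AlwaysOff / AlwaysOn / TraceBased semantics and is not the .NET SDK code. `should_offer_exemplar` and its `active_span_recorded` argument are hypothetical; the argument stands in for `Activity.Current.Recorded`. The TraceBased branch is the extra hot-path check the benchmarks below measure.

```python
from enum import Enum


class ExemplarFilterType(Enum):
    # Mirrors the three filter names discussed in this PR; values are arbitrary here.
    ALWAYS_OFF = "AlwaysOff"
    ALWAYS_ON = "AlwaysOn"
    TRACE_BASED = "TraceBased"


def should_offer_exemplar(filter_type, active_span_recorded):
    """Decide whether a measurement is offered to the exemplar reservoir.

    active_span_recorded is None when there is no active span, otherwise the
    span's recorded flag (a hypothetical stand-in for Activity.Current.Recorded).
    """
    if filter_type is ExemplarFilterType.ALWAYS_OFF:
        return False
    if filter_type is ExemplarFilterType.ALWAYS_ON:
        return True
    # TraceBased: only measurements made inside a recorded (sampled) span qualify.
    return bool(active_span_recorded)
```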

Benchmarks

Counters (SimpleFixedSizeExemplarReservoir)

Using TraceBased WITHOUT an active trace has some cost (we need to check Activity.Current.Recorded in the hot path):

| Method | AggregationTemporality | ExemplarFilterType | Mean | Cost Increased By |
|---|---|---|---|---|
| CounterHotPath | Cumulative | AlwaysOff | 10.51 ns | |
| CounterWith1LabelsHotPath | Cumulative | AlwaysOff | 36.37 ns | |
| CounterWith2LabelsHotPath | Cumulative | AlwaysOff | 44.98 ns | |
| CounterWith3LabelsHotPath | Cumulative | AlwaysOff | 61.44 ns | |
| CounterHotPath | Cumulative | TraceBased | 11.29 ns | 7.4% |
| CounterWith1LabelsHotPath | Cumulative | TraceBased | 37.48 ns | 3.1% |
| CounterWith2LabelsHotPath | Cumulative | TraceBased | 46.68 ns | 3.8% |
| CounterWith3LabelsHotPath | Cumulative | TraceBased | 64.76 ns | 5.4% |
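The Cost Increased By column is the relative change in Mean from AlwaysOff to TraceBased for the same method. A small Python check of that arithmetic, using the means from the table above (`cost_increase_pct` is just an illustrative helper):

```python
def cost_increase_pct(baseline_ns, candidate_ns):
    """Percentage increase of candidate over baseline, rounded to 0.1%."""
    return round((candidate_ns - baseline_ns) / baseline_ns * 100, 1)


# (AlwaysOff mean, TraceBased mean) pairs copied from the WITHOUT-trace counter table.
without_trace = {
    "CounterHotPath": (10.51, 11.29),
    "CounterWith1LabelsHotPath": (36.37, 37.48),
    "CounterWith2LabelsHotPath": (44.98, 46.68),
    "CounterWith3LabelsHotPath": (61.44, 64.76),
}

for method, (always_off, trace_based) in without_trace.items():
    print(f"{method}: +{cost_increase_pct(always_off, trace_based)}%")
```

Applied to the WITH-trace counter table, the same formula matches the listed percentages to within 0.1 percentage point (rounding of the displayed means), with one exception: for CounterWith2LabelsHotPath, 45.82 ns → 53.52 ns works out to about 16.8%, not the 14.6% listed, so either that mean or that percentage may be a typo.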

Using TraceBased WITH an active trace has more cost (we need to check Activity.Current.Recorded and do a random-based sample in the hot path):

| Method | AggregationTemporality | ExemplarFilterType | Mean | Cost Increased By |
|---|---|---|---|---|
| CounterHotPath | Cumulative | AlwaysOff | 10.44 ns | |
| CounterWith1LabelsHotPath | Cumulative | AlwaysOff | 36.86 ns | |
| CounterWith2LabelsHotPath | Cumulative | AlwaysOff | 45.82 ns | |
| CounterWith3LabelsHotPath | Cumulative | AlwaysOff | 61.02 ns | |
| CounterHotPath | Cumulative | TraceBased | 18.32 ns | 75.4% |
| CounterWith1LabelsHotPath | Cumulative | TraceBased | 46.46 ns | 26.0% |
| CounterWith2LabelsHotPath | Cumulative | TraceBased | 53.52 ns | 14.6% |
| CounterWith3LabelsHotPath | Cumulative | TraceBased | 70.00 ns | 14.7% |
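The "random-based sample" behind the WITH-trace rows comes from SimpleFixedSizeExemplarReservoir, which the OpenTelemetry spec describes as a naive reservoir sampling scheme. Below is a minimal Python sketch of that scheme (Algorithm R), assuming state is reset on every collection. Class and method names are hypothetical and this is not the .NET implementation; it only illustrates the per-measurement random draw.

```python
import random


class SimpleFixedSizeReservoir:
    """Hypothetical sketch of naive reservoir sampling (Algorithm R)."""

    def __init__(self, size, rng=None):
        self.size = size
        self.slots = [None] * size
        self.seen = 0  # measurements offered since the last collect
        self.rng = rng or random.Random()

    def offer(self, value, trace_id=None):
        if self.seen < self.size:
            # Fill the reservoir with the first `size` measurements.
            index = self.seen
        else:
            # Uniform index over all measurements seen so far (inclusive bounds);
            # the measurement is kept only if the index lands inside the reservoir.
            index = self.rng.randint(0, self.seen)
        self.seen += 1
        if index < self.size:
            self.slots[index] = (value, trace_id)

    def collect(self):
        exemplars = [slot for slot in self.slots if slot is not None]
        # Reset sampling state for the next collection cycle.
        self.slots = [None] * self.size
        self.seen = 0
        return exemplars
```

Every offered measurement past the first `size` costs one `randint` call even when the draw is discarded, which lines up with the explanation above for the larger WITH-trace numbers.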

Histograms (AlignedHistogramBucketExemplarReservoir)

Using TraceBased WITHOUT an active trace is interesting. Sometimes when I run it, things show up faster, sometimes slower, and sometimes mixed. I take this as no statistically significant difference. The cost of the Activity.Current.Recorded check is dwarfed by the other work to find the bucket and do all the updating:

| Method | BoundCount | ExemplarFilterType | Mean | Cost Increased By |
|---|---|---|---|---|
| HistogramHotPath | 10 | AlwaysOff | 37.37 ns | |
| HistogramWith1LabelHotPath | 10 | AlwaysOff | 65.99 ns | |
| HistogramWith3LabelsHotPath | 10 | AlwaysOff | 110.73 ns | |
| HistogramHotPath | 10 | TraceBased | 36.99 ns | less than 3% |
| HistogramWith1LabelHotPath | 10 | TraceBased | 66.86 ns | less than 3% |
| HistogramWith3LabelsHotPath | 10 | TraceBased | 113.68 ns | less than 3% |

Using TraceBased WITH an active trace has a lot of cost (we need to check Activity.Current.Recorded, and we always update the exemplar for every measurement in the hot path):

| Method | BoundCount | ExemplarFilterType | Mean | Cost Increased By |
|---|---|---|---|---|
| HistogramHotPath | 10 | AlwaysOff | 39.93 ns | |
| HistogramWith1LabelHotPath | 10 | AlwaysOff | 71.05 ns | |
| HistogramWith3LabelsHotPath | 10 | AlwaysOff | 109.03 ns | |
| HistogramHotPath | 10 | TraceBased | 68.96 ns | 72.7% |
| HistogramWith1LabelHotPath | 10 | TraceBased | 96.48 ns | 35.8% |
| HistogramWith3LabelsHotPath | 10 | TraceBased | 153.84 ns | 41.1% |
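A minimal Python sketch of the "keep the last exemplar per bucket" behavior of AlignedHistogramBucketExemplarReservoir, assuming explicit, upper-inclusive bucket bounds as in OpenTelemetry explicit-bucket histograms. The class name is hypothetical and this is not the SDK code; the point is that every sampled measurement pays for a bucket lookup plus an unconditional overwrite.

```python
import bisect


class AlignedHistogramBucketReservoir:
    """Hypothetical sketch: keeps the last measurement seen per histogram bucket."""

    def __init__(self, bounds):
        self.bounds = list(bounds)  # sorted explicit bucket upper bounds
        # One slot per bucket: the explicit bounds plus the overflow bucket.
        self.slots = [None] * (len(self.bounds) + 1)

    def offer(self, value, trace_id=None):
        # Bucket i covers (bounds[i-1], bounds[i]]; bisect_left returns that i.
        index = bisect.bisect_left(self.bounds, value)
        # Unconditional overwrite: each sampled measurement replaces the previous one.
        self.slots[index] = (value, trace_id)

    def collect(self):
        exemplars = [slot for slot in self.slots if slot is not None]
        self.slots = [None] * len(self.slots)
        return exemplars
```

With the 10-bound histograms benchmarked here, that is 11 slots, and the slot for whichever bucket a measurement lands in is rewritten every time.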

This is an interesting area @cijothomas and I have discussed. The spec says AlignedHistogramBucketExemplarReservoir should always keep the last exemplar seen for a bucket, so there's a lot of overwriting as a result (wasted cycles). A simple alternative would be to keep only the first exemplar for a given export. Or we could do something more like SimpleFixedSizeExemplarReservoir, where we always keep the first one and then randomly decide whether to keep subsequent exemplars 🤔
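The two alternatives floated above could look roughly like this in Python. Both classes are hypothetical and take a bucket index the caller has already computed. The first turns repeat measurements into a cheap no-op; the second is a size-1 reservoir sample per bucket, so each measurement in a bucket is equally likely to be the one kept, at the cost of one random draw per measurement.

```python
import random


class FirstWinsBucketReservoir:
    """Hypothetical: keep only the first exemplar per bucket until the next collect."""

    def __init__(self, bucket_count):
        self.slots = [None] * bucket_count

    def offer(self, index, value):
        # Once a bucket is filled, later measurements skip the write entirely.
        if self.slots[index] is None:
            self.slots[index] = value

    def collect(self):
        kept, self.slots = self.slots, [None] * len(self.slots)
        return kept


class SampledBucketReservoir:
    """Hypothetical: keep the first, then keep the n-th with probability 1/n."""

    def __init__(self, bucket_count, rng=None):
        self.slots = [None] * bucket_count
        self.seen = [0] * bucket_count
        self.rng = rng or random.Random()

    def offer(self, index, value):
        self.seen[index] += 1
        # Size-1 reservoir sampling: randrange(1) is always 0, so the first is kept.
        if self.rng.randrange(self.seen[index]) == 0:
            self.slots[index] = value

    def collect(self):
        kept, self.slots = self.slots, [None] * len(self.slots)
        self.seen = [0] * len(self.seen)
        return kept
```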

Merge requirement checklist

  • CONTRIBUTING guidelines followed (license requirements, nullable enabled, static analysis, etc.)
  • Unit tests added/updated
  • Appropriate CHANGELOG.md files updated for non-trivial changes
  • Changes in public API reviewed (if applicable)

@CodeBlanch added the pkg:OpenTelemetry (Issues related to OpenTelemetry NuGet package) and metrics (Metrics signal related) labels Apr 17, 2024
@CodeBlanch requested a review from a team April 17, 2024 23:04
@CodeBlanch mentioned this pull request Apr 17, 2024

codecov bot commented Apr 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.61%. Comparing base (6250307) to head (4fc1510).
Report is 187 commits behind head on main.

@@            Coverage Diff             @@
##             main    #5545      +/-   ##
==========================================
+ Coverage   83.38%   85.61%   +2.23%     
==========================================
  Files         297      289       -8     
  Lines       12531    12493      -38     
==========================================
+ Hits        10449    10696     +247     
+ Misses       2082     1797     -285     
| Flag | Coverage Δ |
|---|---|
| unittests | ? |
| unittests-Solution-Experimental | 85.57% <ø> (?) |
| unittests-Solution-Stable | 85.26% <ø> (?) |

Flags with carried forward coverage won't be shown.

| Files | Coverage Δ |
|---|---|
| src/OpenTelemetry/Metrics/AggregatorStore.cs | 86.96% <ø> (+6.58%) ⬆️ |
| .../Metrics/Builder/MeterProviderBuilderExtensions.cs | 98.42% <ø> (-1.58%) ⬇️ |

... and 76 files with indirect coverage changes

@vishweshbankwar (Member) left a comment

LGTM - left a suggestion for changelog.

@cijothomas (Member) left a comment

I am okay with this change, but I want to see if we can make a NoopExemplarReservoir, and make it as the default for non-histograms (spec is flexible to allow that) in the 1st stable release.
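A Python sketch of what the suggested NoopExemplarReservoir could look like, assuming a reservoir exposes offer/collect operations: a stateless class whose single shared instance can serve every non-histogram MetricPoint, so the default adds no per-point allocation. Only the `NoopExemplarReservoir` name comes from the comment; `default_reservoir` and its arguments are hypothetical.

```python
class NoopExemplarReservoir:
    """Hypothetical sketch: accepts offers and never stores anything."""

    __slots__ = ()  # no per-instance state, so one shared instance is enough

    def offer(self, value, trace_id=None):
        pass  # intentionally discard the measurement

    def collect(self):
        return []


NOOP_RESERVOIR = NoopExemplarReservoir()


def default_reservoir(instrument_kind, make_histogram_reservoir):
    """Hypothetical default: a real reservoir for histograms, shared no-op otherwise."""
    if instrument_kind == "histogram":
        return make_histogram_reservoir()
    return NOOP_RESERVOIR
```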

@@ -12,6 +12,11 @@
function when configuring a view (applies to individual metrics).
([#5542](https://github.com/open-telemetry/opentelemetry-dotnet/pull/5542))

* **Experimental (pre-release builds only):** The default `ExemplarFilterType`
on `MeterProvider` is now `ExemplarFilterType.TraceBased` which will enable
Member

What are the perf implications that users should be aware of?

Member Author

Let's wait on this. My guess is most users will skip over anything prefixed with **Experimental (pre-release builds only):** in the CHANGELOG. What I think would be more useful: on the final entry, where we make everything public for stable builds, we can add a link to something in the docs. I'm thinking of something like "Understanding performance implications when sampling Exemplars".

Member

I'm fine to address the changelog/doc later.

I think we still need to know the perf implications of this PR, as they serve as critical input for making decisions.

Member Author

Updated the description with some benchmarks.

@utpilla (Contributor) Apr 18, 2024

Could you also provide the increase in memory consumption per metric, since we would now allocate an ExemplarReservoir instance for each MetricPoint? It could be an issue for histogram users with high bucket counts.
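One way to frame this question is to count exemplar slots per MetricPoint and multiply by measured sizes. A rough Python sketch, assuming one slot per bucket for AlignedHistogramBucketExemplarReservoir (the explicit bounds plus the overflow bucket) and a fixed `reservoir_size` for SimpleFixedSizeExemplarReservoir (1 by default here, as an assumption). The function names are hypothetical, and the byte sizes are inputs to be measured in the .NET SDK, not known figures.

```python
def exemplar_slots_per_point(instrument_kind, bound_count=0, reservoir_size=1):
    """Exemplar slots one MetricPoint would hold under the default reservoirs."""
    if instrument_kind == "histogram":
        # One slot per bucket: the explicit bounds plus the overflow (+Inf) bucket.
        return bound_count + 1
    # Fixed-size reservoir for everything else; size 1 is an assumed default.
    return reservoir_size


def extra_bytes(metric_points, slots_per_point, bytes_per_exemplar, reservoir_overhead_bytes):
    """Rough added memory; both byte sizes are inputs to measure, not known constants."""
    per_point = reservoir_overhead_bytes + slots_per_point * bytes_per_exemplar
    return metric_points * per_point
```

For the 10-bound histograms in the benchmarks above this gives 11 slots per MetricPoint versus 1 for a counter with a single-slot reservoir, which is why high bucket counts dominate the estimate.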

@cijothomas self-requested a review April 18, 2024 22:27

@cijothomas (Member)

> I am okay with this change, but I want to see if we can make a NoopExemplarReservoir, and make it as the default for non-histograms (spec is flexible to allow that) in the 1st stable release.

After looking at the perf numbers, the overhead is non-trivial, so I recommend keeping it off by default for every metric. Users can opt in for each metric (using views). (Not many backends/vendors are known to support exemplars.)
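The recommendation amounts to "off unless a view turns it on." A minimal Python sketch of that resolution order; `MeterConfig`, `view_overrides`, and the instrument names are hypothetical, and in the .NET SDK the per-metric opt-in would go through its views API rather than anything shown here.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MeterConfig:
    # Hypothetical global default: exemplars stay off, per the recommendation above.
    default_exemplars: bool = False
    # Hypothetical per-instrument overrides, standing in for what a view would set.
    view_overrides: Dict[str, bool] = field(default_factory=dict)

    def exemplars_enabled(self, instrument_name: str) -> bool:
        # A matching view wins; otherwise fall back to the global default.
        return self.view_overrides.get(instrument_name, self.default_exemplars)
```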

@CodeBlanch (Member Author)

Going to keep exemplars off by default for now based on performance analysis.
