Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add opt-in support for metric overflow attribute #4737

Conversation

utpilla
Copy link
Contributor

@utpilla utpilla commented Aug 2, 2023

Add opt-in support for metric overflow attribute: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#overflow-attribute

For example:

If we have an SDK configuration where the max MetricPoint limit is four and the application emits the following measurements using a Counter:

  • red, apple: 6
  • yellow, lemon: 7
  • green, apple: 2
  • red, cherries: 4
  • green, kiwi: 6

As of today, the SDK would:

  • Reserve one MetricPoint for measurements with zero tags
  • Use the other three MetricPoints for every other measurement with a unique combination of tags

Therefore, we would only pick the first three unique measurements and drop (red, cherries) and (green, kiwi). Note that there is no way to know the total number of fruits sold as we have dropped the last two MetricPoints.

With the changes in this PR, the SDK would:

  • Reserve one MetricPoint for measurements with zero tags
  • Reserve one MetricPoint for the overflow attribute
  • Use the other two MetricPoints for every other measurement with a unique combination of tags

In this case, we would only pick the first two unique measurements, however we would not entirely drop the other three measurements. (green,apple), (red, cherries) and (green, kiwi) would all be aggregated using the new MetricPoint with the overflow attribute.

There are two advantages of doing this:

  1. It provides an easy way to check if the SDK dropped any measurements. The presence of otel.metric.overflow: true when querying a metric indicates that the SDK could not aggregate some number of measurements under their own MetricPoint. As of today, one would have to check the SDK internal logs for this information.
  2. It preserves partial information about the measurements that would have otherwise been dropped. In this particular example, if a user queries for the metric sum across all tag combinations, they will have the true count of the total number of fruits sold: 6 + 7+ 12 (sum of overflow measurements) = 25. In case of Histogram, they would also know the count of measurements that could not be aggregated under their own MetricPoint.

Changes

  • Use metric.overflow.attribute to record metric points that are emitted after the max MetricPoints limit is reached. One MetricPoint in the MetricPoint array would be reserved for this.
  • Users would enable this behavior by setting the environment variable OTEL_DOTNET_EXPERIMENTAL_METRICS_EMIT_OVERFLOW_ATTRIBUTE to true
  • Updated MetricAPITests, MetricSnapshotTests, and AggregatorTests to also run with this behavior enabled

Please provide a brief description of the changes here.

Merge requirement checklist

  • Unit tests added/updated
  • Appropriate CHANGELOG.md files updated for non-trivial changes

@utpilla utpilla requested a review from a team August 2, 2023 23:10
@codecov
Copy link

codecov bot commented Aug 2, 2023

Codecov Report

Merging #4737 (b659be7) into main (b0ab8d5) will increase coverage by 0.03%.
Report is 2 commits behind head on main.
The diff coverage is 60.34%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #4737      +/-   ##
==========================================
+ Coverage   82.05%   82.08%   +0.03%     
==========================================
  Files         313      313              
  Lines       12742    12784      +42     
==========================================
+ Hits        10455    10494      +39     
- Misses       2287     2290       +3     
Files Changed Coverage Δ
src/OpenTelemetry/Metrics/AggregatorStore.cs 81.81% <48.78%> (-3.60%) ⬇️
src/Shared/Options/ConfigurationExtensions.cs 96.00% <71.42%> (-4.00%) ⬇️
src/OpenTelemetry/Metrics/MeterProviderSdk.cs 91.66% <100.00%> (+0.04%) ⬆️
src/OpenTelemetry/Metrics/Metric.cs 95.78% <100.00%> (+0.04%) ⬆️
src/OpenTelemetry/Metrics/MetricReaderExt.cs 87.17% <100.00%> (+0.45%) ⬆️

... and 5 files with indirect coverage changes

@vishweshbankwar
Copy link
Member

Need changelog and readme update (can be a separate PR)

Copy link
Member

@alanwest alanwest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@utpilla I think this is a really interesting proposal! Are you considering proposing this in the specification? I think it might be a good "official" addition and enable us to remove the feature flag in the future.

Given this PR is also proposing a new way for us to feature flag experimental stuff... I have some thoughts about this AppContext mechanic.

src/OpenTelemetry/Metrics/AggregatorStore.cs Outdated Show resolved Hide resolved
Co-authored-by: Cijo Thomas <cithomas@microsoft.com>
@utpilla utpilla merged commit 10a8989 into open-telemetry:main Aug 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants