-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update cumulative to delta processor. #4444
Update cumulative to delta processor. #4444
Conversation
597f202
to
982ef89
Compare
@hossain-rayhan @bogdandrutu This PR should be ready for review! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@a-feld this is a very large change 👍🏽 would be good to split it.
@bogdandrutu splitting things up is a great suggestion! Would it be OK to split this into multiple commits within this PR or would you prefer to have multiple PRs? Additionally, if you have a preference for how to split this PR up, I would also be happy to follow any guidance 😄 |
982ef89
to
9c4f07a
Compare
Hi @bogdandrutu and @hossain-rayhan - I spent some time working to minimize any unnecessary changes to the processor. I have split the logic out into an ordered set of commits which should make review easier. Please let me know if there are any questions! |
9c4f07a
to
acea837
Compare
acea837
to
d9e83fd
Compare
MaxStale time.Duration `mapstructure:"max_stale"` | ||
|
||
// Set to false in order to convert non monotonic metrics | ||
MonotonicOnly bool `mapstructure:"monotonic_only"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would be the default behavior here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default behavior is to convert only monotonic metrics. (defaults to true)
My understanding is that non-monotonic sum metrics are treated as gauges and therefore should not be converted by default 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. Delta-temporality systems generally treat cumulative non-monotonic sums as gauges. Having an option to convert non-monotonic sums is fine too, but I wouldn't enable it by default.
Can we make this a field named NonMonotonic
with the default false? A user would set non_monotonic: true
to get the non-default behavior.
Metrics []string `mapstructure:"metrics"` | ||
|
||
// The total time a state entry will live past the time it was last seen. Set to 0 to retain state indefinitely. | ||
MaxStale time.Duration `mapstructure:"max_stale"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about the use case here. Wondering, how important is it to expose as a config option? Less configuarability gives us simplicity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use case is for the prometheus receiver. The scrape interval is configurable and so we wouldn't want metrics to go stale prior to the scraper completing. Allowing this value to be configurable means that users can configure the staleness with their scrape interval + scrape duration + fudge factor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I suggest MaxStaleness
.
953d826
to
20b0f52
Compare
Kindly pinging reviewers @hossain-rayhan @dashpole @Aneurysm9 @rakyll @bogdandrutu @alolita. We have folks waiting on this PR - your review would be appreciated! 😄 |
20b0f52
to
cc9e0b1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks.
@open-telemetry/collector-contrib-approvers is there any way somebody can take a look at this PR? It has been open for ~21 days now and we have folks waiting to use this processor. |
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
cc9e0b1
to
33a783c
Compare
This needs to be merged before 9/24. Suggest that @dmitryax review. |
|
||
- `metrics`: The processor uses metric names to identify a set of cumulative sum metrics and converts them to cumulative delta. Defaults to converting all metric names. | ||
- `max_stale`: The total time a state entry will live past the time it was last seen. Set to 0 to retain state indefinitely. Default: 0 | ||
- `monotonic_only`: Specify whether only monotonic metrics are converted from cumulative to delta. Default: `true`. Set to `false` to convert metrics regardless of monotonic setting. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I there any real use case that you want to cover where the support of "gauges to deltas" needed? Otherwise I would recommend to remove this option and always skip non monotonic metrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, this is a good point. I think I'll go ahead and remove this option. After that update, this library will only attempt to convert monotonic metrics.
|
||
The following settings can be optionally configured: | ||
|
||
- `metrics`: The processor uses metric names to identify a set of cumulative sum metrics and converts them to cumulative delta. Defaults to converting all metric names. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar question here. Do you have a use case where all metrics need to be translated? I think it would be the best if we will add regex support in future instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New Relic requires all metrics to be translated from cumulative -> delta temporality 😄
The New Relic platform does not currently support cumulative metric temporalities.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good. Given the circumstances of @a-feld's departure from the project, I advise merging this with a number of TODOs and associated tickets to continue ongoing development.
The only substantial point I've raised applies to the collector in general. There ought to be a standard way to create a map of metric identity, because more than one processor is going to use such a thing. The OTel-Go SDK has such a thing, but it's not perfectly simple to adapt. The tracking entry here could be replaced by a struct that is usable as a map key w/o the Write()
mechanism, but I wouldn't hold up this PR because of it. The collector should support something like the OTel-Go attribute.Distinct
type that applies to model/pdata
types so that code like this can easily create a map key based on resource, instrumentation library, name, unit and attributes. I would assign the related issues to @hossain-rayhan and volunteer to help review.
MaxStale time.Duration `mapstructure:"max_stale"` | ||
|
||
// Set to false in order to convert non monotonic metrics | ||
MonotonicOnly bool `mapstructure:"monotonic_only"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. Delta-temporality systems generally treat cumulative non-monotonic sums as gauges. Having an option to convert non-monotonic sums is fine too, but I wouldn't enable it by default.
Can we make this a field named NonMonotonic
with the default false? A user would set non_monotonic: true
to get the non-default behavior.
Metrics []string `mapstructure:"metrics"` | ||
|
||
// The total time a state entry will live past the time it was last seen. Set to 0 to retain state indefinitely. | ||
MaxStale time.Duration `mapstructure:"max_stale"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I suggest MaxStaleness
.
fromDataPoint.SetStartTimestamp(pdata.NewTimestampFromTime(result.(delta).prevTimestamp)) | ||
} | ||
metric.Sum().SetAggregationTemporality(pdata.AggregationTemporalityDelta) | ||
ilms.RemoveIf(func(ilm pdata.InstrumentationLibraryMetrics) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer the new style here, thanks. 😀
// the first data point is omitted since the initial | ||
// reference is not assumed to be zero | ||
if !valid { | ||
return true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. This behavior is described in https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/datamodel.md#resets-and-gaps
MetricName string | ||
MetricUnit string | ||
StartTimestamp pdata.Timestamp | ||
Attributes pdata.AttributeMap |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and the Write
implementation make me nervous because Go has built-in support for using structs as map keys. The OTel-Go SDK exports an attribute.Set
type used for metric identity inside the SDK, see https://github.com/open-telemetry/opentelemetry-go/blob/92551d3933c9c7ef5eaf4f93f876a5487d0024b9/attribute/set.go#L198
With one of the attribute.Distinct
types for the resource and one for the attributes, the only other point type here is InstrumentationLibrary
. I'd be happy with a TODO to revisit this.
// NaN is used to signal "stale" metrics. | ||
// These are ignored for now. | ||
// https://github.com/open-telemetry/opentelemetry-collector/pull/3423 | ||
if metricID.IsFloatVal() && math.IsNaN(metricPoint.FloatValue) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// * Stale removal wins, updating goroutine will still see | ||
// the removed state but the state after the update will | ||
// not be persisted. The next update will load an entirely | ||
// new state. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds valid to me.
This could be split into one part for staleness and one part for treating non-monotonic sums, but overall this looks like an improvement where benefits outweigh risks. |
The configuration has been updated so that: * An empty list of metrics defaults to converting all cumulative metrics to delta. * The state TTL (MaxStale) is now considered configurable * There is a configuration option to enable/disable conversion of non-monotonic cumulative metrics. Non-monotonic cumulative metrics are normally considered gauges and aggregation temporality should not be converted. The default is that only monotonic values are converted.
* Uses the internal tracker library instead of AWS metrics library. This enables separation of timeseries by resource and instrumentation library. * Metric data points which are invalid (the first in a series of non-monontic cumulative values) are now removed from the dataset.
By default, the cumulative to delta processor now converts all metrics. Previously, the processor did not convert any metrics unless listed in the configuration.
Co-authored-by: Joshua MacDonald <jmacd@users.noreply.github.com>
0a6b4ff
to
007837d
Compare
I hope someone at NR will pick this up! |
@jmacd Yep! We'll work to bridge the gap soon and reopen the PR. Thank you for your and others' extensive review to date. |
#5772) * Add metric tracking library to replace awsmetrics dependency. * Update configuration of cumulativetodelta processor. The configuration has been updated so that: * An empty list of metrics defaults to converting all cumulative metrics to delta. * The state TTL (MaxStale) is now considered configurable * There is a configuration option to enable/disable conversion of non-monotonic cumulative metrics. Non-monotonic cumulative metrics are normally considered gauges and aggregation temporality should not be converted. The default is that only monotonic values are converted. * Update cumulativetodelta processor logic. * Uses the internal tracker library instead of AWS metrics library. This enables separation of timeseries by resource and instrumentation library. * Metric data points which are invalid (the first in a series of non-monontic cumulative values) are now removed from the dataset. * Update processor default test case. By default, the cumulative to delta processor now converts all metrics. Previously, the processor did not convert any metrics unless listed in the configuration. * Remove extra validate function call. * Add processor benchmark. * Remove aws metrics library. * Update readme. * Convert to using attributes rather than labels map. * Fix usage of deprecated api. * Retain NaN values. * Exported field comments should start with field name. Co-authored-by: Joshua MacDonald <jmacd@users.noreply.github.com> * Remove monotonic configuration option. * Fixes after merge * Rename MaxStale to MaxStaleness * Update README to reflect removal of monotonic_only setting * Change processor to only convert metrics explicitly specified in the config * Reintroduce test for invalid config with no metric names * Rename max_stale to max_staleness * Fix README * List of metric names can no longer be nil or empty Co-authored-by: Allan Feldman <afeldman@newrelic.com> Co-authored-by: Allan Feldman <6374032+a-feld@users.noreply.github.com> Co-authored-by: Joshua MacDonald <jmacd@users.noreply.github.com>
Description:
This PR improves on the cumulative to delta processor. Implemented in this PR:
Benchmarks:
Before this PR
After this PR