Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update cumulative to delta processor. #4444

Closed

Conversation

a-feld
Copy link
Member

@a-feld a-feld commented Aug 4, 2021

Description:

This PR improves on the cumulative to delta processor. Implemented in this PR:

  • Resource and instrumentation library attributes are used as part of the metric identity (i.e. for two metrics with distinct resource attributes and the same metric name, the metrics will now aggregate separately)
  • Detection of monotonic counter reset via start timestamp
  • Detection of monotonic counter reset via non monotonic value
  • Default to converting all monotonic cumulative counters
  • Removes AWS metric library dependency

Benchmarks:

Before this PR

/usr/local/bin/go test -benchmem -run=^$ -bench ^(BenchmarkConsumeMetrics)$ github.com/open-telemetry/opentelemetry-collector-contrib/processor/cumulativetodeltaprocessor

goos: darwin
goarch: amd64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/processor/cumulativetodeltaprocessor
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkConsumeMetrics-12    	 1441018	       831.5 ns/op	     320 B/op	      11 allocs/op

After this PR

/usr/local/bin/go test -benchmem -run=^$ -bench ^(BenchmarkConsumeMetrics)$ github.com/open-telemetry/opentelemetry-collector-contrib/processor/cumulativetodeltaprocessor

goos: darwin
goarch: amd64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/processor/cumulativetodeltaprocessor
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkConsumeMetrics-12    	 2320220	       494.8 ns/op	     128 B/op	       5 allocs/op
PASS
ok  	github.com/open-telemetry/opentelemetry-collector-contrib/processor/cumulativetodeltaprocessor	1.903s

@a-feld a-feld force-pushed the update-cumulative-to-delta branch 2 times, most recently from 597f202 to 982ef89 Compare August 5, 2021 16:00
@a-feld a-feld marked this pull request as ready for review August 5, 2021 16:00
@a-feld a-feld requested review from a team and jrcamp August 5, 2021 16:00
@a-feld
Copy link
Member Author

a-feld commented Aug 5, 2021

@hossain-rayhan @bogdandrutu This PR should be ready for review!
This is a follow up to #4216 (comment)

Copy link
Member

@bogdandrutu bogdandrutu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@a-feld this is a very large change 👍🏽 would be good to split it.

@a-feld
Copy link
Member Author

a-feld commented Aug 6, 2021

@bogdandrutu splitting things up is a great suggestion! Would it be OK to split this into multiple commits within this PR or would you prefer to have multiple PRs?

Additionally, if you have a preference for how to split this PR up, I would also be happy to follow any guidance 😄
If not, I can work on splitting this out into logical smaller commits within the PR around the bullet points specified in the summary.

@a-feld
Copy link
Member Author

a-feld commented Aug 6, 2021

Hi @bogdandrutu and @hossain-rayhan - I spent some time working to minimize any unnecessary changes to the processor. I have split the logic out into an ordered set of commits which should make review easier. Please let me know if there are any questions!

@a-feld a-feld force-pushed the update-cumulative-to-delta branch from 9c4f07a to acea837 Compare August 6, 2021 21:14
@a-feld a-feld requested a review from bogdandrutu August 6, 2021 21:14
@a-feld a-feld force-pushed the update-cumulative-to-delta branch from acea837 to d9e83fd Compare August 6, 2021 21:28
MaxStale time.Duration `mapstructure:"max_stale"`

// Set to false in order to convert non monotonic metrics
MonotonicOnly bool `mapstructure:"monotonic_only"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be the default behavior here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default behavior is to convert only monotonic metrics. (defaults to true)
My understanding is that non-monotonic sum metrics are treated as gauges and therefore should not be converted by default 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. Delta-temporality systems generally treat cumulative non-monotonic sums as gauges. Having an option to convert non-monotonic sums is fine too, but I wouldn't enable it by default.

Can we make this a field named NonMonotonic with the default false? A user would set non_monotonic: true to get the non-default behavior.

Metrics []string `mapstructure:"metrics"`

// The total time a state entry will live past the time it was last seen. Set to 0 to retain state indefinitely.
MaxStale time.Duration `mapstructure:"max_stale"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about the use case here. Wondering, how important is it to expose as a config option? Less configuarability gives us simplicity.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use case is for the prometheus receiver. The scrape interval is configurable and so we wouldn't want metrics to go stale prior to the scraper completing. Allowing this value to be configurable means that users can configure the staleness with their scrape interval + scrape duration + fudge factor.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I suggest MaxStaleness.

@a-feld a-feld force-pushed the update-cumulative-to-delta branch 3 times, most recently from 953d826 to 20b0f52 Compare August 18, 2021 15:03
@a-feld
Copy link
Member Author

a-feld commented Aug 18, 2021

Kindly pinging reviewers @hossain-rayhan @dashpole @Aneurysm9 @rakyll @bogdandrutu @alolita. We have folks waiting on this PR - your review would be appreciated! 😄

@a-feld a-feld force-pushed the update-cumulative-to-delta branch from 20b0f52 to cc9e0b1 Compare August 18, 2021 16:25
Copy link
Contributor

@hossain-rayhan hossain-rayhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks.

@a-feld
Copy link
Member Author

a-feld commented Aug 25, 2021

@open-telemetry/collector-contrib-approvers is there any way somebody can take a look at this PR? It has been open for ~21 days now and we have folks waiting to use this processor.

@github-actions
Copy link
Contributor

github-actions bot commented Sep 2, 2021

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Sep 2, 2021
@bogdandrutu bogdandrutu removed the Stale label Sep 3, 2021
@github-actions
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@a-feld a-feld force-pushed the update-cumulative-to-delta branch from cc9e0b1 to 33a783c Compare September 17, 2021 18:49
@mtwo
Copy link
Member

mtwo commented Sep 22, 2021

This needs to be merged before 9/24. Suggest that @dmitryax review.

@jmacd jmacd self-requested a review September 22, 2021 16:30
@dmitryax dmitryax self-assigned this Sep 23, 2021

- `metrics`: The processor uses metric names to identify a set of cumulative sum metrics and converts them to cumulative delta. Defaults to converting all metric names.
- `max_stale`: The total time a state entry will live past the time it was last seen. Set to 0 to retain state indefinitely. Default: 0
- `monotonic_only`: Specify whether only monotonic metrics are converted from cumulative to delta. Default: `true`. Set to `false` to convert metrics regardless of monotonic setting.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I there any real use case that you want to cover where the support of "gauges to deltas" needed? Otherwise I would recommend to remove this option and always skip non monotonic metrics.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, this is a good point. I think I'll go ahead and remove this option. After that update, this library will only attempt to convert monotonic metrics.


The following settings can be optionally configured:

- `metrics`: The processor uses metric names to identify a set of cumulative sum metrics and converts them to cumulative delta. Defaults to converting all metric names.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar question here. Do you have a use case where all metrics need to be translated? I think it would be the best if we will add regex support in future instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New Relic requires all metrics to be translated from cumulative -> delta temporality 😄
The New Relic platform does not currently support cumulative metric temporalities.

Copy link
Contributor

@jmacd jmacd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. Given the circumstances of @a-feld's departure from the project, I advise merging this with a number of TODOs and associated tickets to continue ongoing development.

The only substantial point I've raised applies to the collector in general. There ought to be a standard way to create a map of metric identity, because more than one processor is going to use such a thing. The OTel-Go SDK has such a thing, but it's not perfectly simple to adapt. The tracking entry here could be replaced by a struct that is usable as a map key w/o the Write() mechanism, but I wouldn't hold up this PR because of it. The collector should support something like the OTel-Go attribute.Distinct type that applies to model/pdata types so that code like this can easily create a map key based on resource, instrumentation library, name, unit and attributes. I would assign the related issues to @hossain-rayhan and volunteer to help review.

MaxStale time.Duration `mapstructure:"max_stale"`

// Set to false in order to convert non monotonic metrics
MonotonicOnly bool `mapstructure:"monotonic_only"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. Delta-temporality systems generally treat cumulative non-monotonic sums as gauges. Having an option to convert non-monotonic sums is fine too, but I wouldn't enable it by default.

Can we make this a field named NonMonotonic with the default false? A user would set non_monotonic: true to get the non-default behavior.

processor/cumulativetodeltaprocessor/config.go Outdated Show resolved Hide resolved
Metrics []string `mapstructure:"metrics"`

// The total time a state entry will live past the time it was last seen. Set to 0 to retain state indefinitely.
MaxStale time.Duration `mapstructure:"max_stale"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I suggest MaxStaleness.

fromDataPoint.SetStartTimestamp(pdata.NewTimestampFromTime(result.(delta).prevTimestamp))
}
metric.Sum().SetAggregationTemporality(pdata.AggregationTemporalityDelta)
ilms.RemoveIf(func(ilm pdata.InstrumentationLibraryMetrics) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the new style here, thanks. 😀

// the first data point is omitted since the initial
// reference is not assumed to be zero
if !valid {
return true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MetricName string
MetricUnit string
StartTimestamp pdata.Timestamp
Attributes pdata.AttributeMap
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and the Write implementation make me nervous because Go has built-in support for using structs as map keys. The OTel-Go SDK exports an attribute.Set type used for metric identity inside the SDK, see https://github.com/open-telemetry/opentelemetry-go/blob/92551d3933c9c7ef5eaf4f93f876a5487d0024b9/attribute/set.go#L198

With one of the attribute.Distinct types for the resource and one for the attributes, the only other point type here is InstrumentationLibrary. I'd be happy with a TODO to revisit this.

// NaN is used to signal "stale" metrics.
// These are ignored for now.
// https://github.com/open-telemetry/opentelemetry-collector/pull/3423
if metricID.IsFloatVal() && math.IsNaN(metricPoint.FloatValue) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// * Stale removal wins, updating goroutine will still see
// the removed state but the state after the update will
// not be persisted. The next update will load an entirely
// new state.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds valid to me.

@jmacd
Copy link
Contributor

jmacd commented Sep 23, 2021

This could be split into one part for staleness and one part for treating non-monotonic sums, but overall this looks like an improvement where benefits outweigh risks.

a-feld and others added 13 commits September 24, 2021 12:21
The configuration has been updated so that:
* An empty list of metrics defaults to converting all cumulative metrics
  to delta.
* The state TTL (MaxStale) is now considered configurable
* There is a configuration option to enable/disable conversion of
  non-monotonic cumulative metrics. Non-monotonic cumulative metrics are
  normally considered gauges and aggregation temporality should not be
  converted. The default is that only monotonic values are converted.
* Uses the internal tracker library instead of AWS metrics library. This
  enables separation of timeseries by resource and instrumentation
  library.
* Metric data points which are invalid (the first in a series of
  non-monontic cumulative values) are now removed from the dataset.
By default, the cumulative to delta processor now converts all metrics.
Previously, the processor did not convert any metrics unless listed in
the configuration.
Co-authored-by: Joshua MacDonald <jmacd@users.noreply.github.com>
@a-feld a-feld force-pushed the update-cumulative-to-delta branch from 0a6b4ff to 007837d Compare September 24, 2021 16:22
@a-feld a-feld closed this Sep 24, 2021
@jmacd
Copy link
Contributor

jmacd commented Sep 24, 2021

I hope someone at NR will pick this up!

@alanwest
Copy link
Member

alanwest commented Sep 25, 2021

I hope someone at NR will pick this up!

@jmacd Yep! We'll work to bridge the gap soon and reopen the PR. Thank you for your and others' extensive review to date.

bogdandrutu pushed a commit that referenced this pull request Oct 19, 2021
#5772)

* Add metric tracking library to replace awsmetrics dependency.

* Update configuration of cumulativetodelta processor.

The configuration has been updated so that:
* An empty list of metrics defaults to converting all cumulative metrics
  to delta.
* The state TTL (MaxStale) is now considered configurable
* There is a configuration option to enable/disable conversion of
  non-monotonic cumulative metrics. Non-monotonic cumulative metrics are
  normally considered gauges and aggregation temporality should not be
  converted. The default is that only monotonic values are converted.

* Update cumulativetodelta processor logic.

* Uses the internal tracker library instead of AWS metrics library. This
  enables separation of timeseries by resource and instrumentation
  library.
* Metric data points which are invalid (the first in a series of
  non-monontic cumulative values) are now removed from the dataset.

* Update processor default test case.

By default, the cumulative to delta processor now converts all metrics.
Previously, the processor did not convert any metrics unless listed in
the configuration.

* Remove extra validate function call.

* Add processor benchmark.

* Remove aws metrics library.

* Update readme.

* Convert to using attributes rather than labels map.

* Fix usage of deprecated api.

* Retain NaN values.

* Exported field comments should start with field name.

Co-authored-by: Joshua MacDonald <jmacd@users.noreply.github.com>

* Remove monotonic configuration option.

* Fixes after merge

* Rename MaxStale to MaxStaleness

* Update README to reflect removal of monotonic_only setting

* Change processor to only convert metrics explicitly specified in the config

* Reintroduce test for invalid config with no metric names

* Rename max_stale to max_staleness

* Fix README

* List of metric names can no longer be nil or empty

Co-authored-by: Allan Feldman <afeldman@newrelic.com>
Co-authored-by: Allan Feldman <6374032+a-feld@users.noreply.github.com>
Co-authored-by: Joshua MacDonald <jmacd@users.noreply.github.com>
hex1848 pushed a commit to hex1848/opentelemetry-collector-contrib that referenced this pull request Jun 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants