
EMF: Store the initial value for cumulative metrics #3425

Merged: 1 commit into open-telemetry:main on Jun 8, 2021

Conversation

@bjrara (Contributor) commented May 18, 2021

Why do we need it?
This is a follow-up PR of open-telemetry/opentelemetry-collector#3047.

In this PR, the EMF exporter changes its strategy for handling cumulative metric values such as counters and Prometheus summaries: it stores the first scraped record of each series in a cache as the initial value and skips sending it to the backend. As a result, the delta calculated from the second scrape becomes the first value that the EMF exporter exports.

For example, if counter metrics are received in the following order:

1st batch: request_total {code=400, value=100}
2nd batch: request_total {code=400, value=101}
3rd batch: request_total {code=400, value=102}

The metrics will be sent by EMF as:

1st batch: SKIP (stored as the initial value)
2nd batch: request_total {code=400, value=1}
3rd batch: request_total {code=400, value=1}
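
For illustration, here is a minimal sketch of this store-and-skip delta calculation. The types and names below are hypothetical and not the exporter's actual implementation; a real version would key on the full label set.

```go
package main

import "fmt"

// metricKey identifies one time series; a real implementation would key on
// the full label set, not a single label.
type metricKey struct {
	name string
	code string
}

// deltaCalculator caches the last cumulative value seen for each series.
type deltaCalculator struct {
	prev map[metricKey]float64
}

func newDeltaCalculator() *deltaCalculator {
	return &deltaCalculator{prev: map[metricKey]float64{}}
}

// next stores the current value and returns the delta against the previous
// one. For the very first observation there is no baseline, so ok is false
// and the caller is expected to skip exporting it.
func (d *deltaCalculator) next(k metricKey, value float64) (delta float64, ok bool) {
	last, seen := d.prev[k]
	d.prev[k] = value
	if !seen {
		return 0, false
	}
	return value - last, true
}

func main() {
	calc := newDeltaCalculator()
	k := metricKey{name: "request_total", code: "400"}
	for _, v := range []float64{100, 101, 102} {
		if delta, ok := calc.next(k, v); ok {
			fmt.Printf("send %s{code=%s} value=%v\n", k.name, k.code, delta)
		} else {
			fmt.Println("skip: initial value cached")
		}
	}
	// Output:
	// skip: initial value cached
	// send request_total{code=400} value=1
	// send request_total{code=400} value=1
}
```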

@bjrara bjrara requested a review from anuraaga as a code owner May 18, 2021 18:28
@bjrara bjrara requested a review from a team May 18, 2021 18:28
@Aneurysm9 (Member) commented:

I'm having trouble understanding why this is necessary. If the exporter needs to translate cumulative metrics to deltas, shouldn't the initial sample still be present? I'd expect the delta output stream to look something like this:

1st batch: request_total {code=400, value=100}
2nd batch: request_total {code=400, value=1}
3rd batch: request_total {code=400, value=2}

The alternative means that I can never know the actual magnitude of the tracked value since the baseline is missing.

Can some more prose be added to the README explaining what this adjustment process is doing and why?

@bjrara (Contributor, Author) commented May 18, 2021

> I'm having trouble understanding why this is necessary. If the exporter needs to translate cumulative metrics to deltas, shouldn't the initial sample still be present? I'd expect the delta output stream to look something like this:
>
> 1st batch: request_total {code=400, value=100}
> 2nd batch: request_total {code=400, value=1}
> 3rd batch: request_total {code=400, value=2}
>
> The alternative means that I can never know the actual magnitude of the tracked value since the baseline is missing.
>
> Can some more prose be added to the README explaining what this adjustment process is doing and why?

Just realized the example in the description was wrong; I've corrected the adjusted values in the EMF part. Sorry about the confusion.

Not sure if the issue becomes clearer now that the example is corrected. When handling cumulative metrics, the EMF exporter sends deltas (between the current value and its previous value) to the CloudWatch backend.
If we scrape metrics every minute, each delta is the amount the value increased during that minute.

Take the case above as an example: since request_total increases by 1 every minute, the deltas in the 2nd and 3rd minutes should each be 1.

@bjrara (Contributor, Author) commented May 18, 2021

> I'm having trouble understanding why this is necessary. If the exporter needs to translate cumulative metrics to deltas, shouldn't the initial sample still be present? I'd expect the delta output stream to look something like this:
>
> 1st batch: request_total {code=400, value=100}
> 2nd batch: request_total {code=400, value=1}
> 3rd batch: request_total {code=400, value=2}
>
> The alternative means that I can never know the actual magnitude of the tracked value since the baseline is missing.
>
> Can some more prose be added to the README explaining what this adjustment process is doing and why?

IMHO, since we export deltas computed from the two most recent records, we don't need the baseline; we only need to know the previous metric value. The delta concept is similar to rate() in Prometheus, but instead of the user defining [5m] in the expression, the time window is determined by the scrape interval.
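
To illustrate how the per-scrape delta relates to a Prometheus-style rate, here is a small sketch; the 60-second scrape interval is an assumption for the example, not something this PR configures.

```go
package main

import "fmt"

func main() {
	// Two consecutive scrapes of a cumulative counter. The 60-second
	// scrape interval is an assumption for this illustration.
	previous, current := 101.0, 102.0
	intervalSeconds := 60.0

	delta := current - previous // what the EMF exporter sends for this interval
	ratePerSecond := delta / intervalSeconds

	fmt.Println("delta:", delta)                   // 1
	fmt.Println("per-second rate:", ratePerSecond) // ~0.0167, roughly what rate(request_total[1m]) would report
}
```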

Previously, the Prometheus receiver sent the delta between the current metric value and the initial one, so the EMF exporter could take it for granted that the first record it received was already delta-adjusted (against its previous value, which was also the initial value). For instance:

Metrics received from the Prometheus receiver were:

1st batch: SKIP (EMF wouldn't receive it)
2nd batch: request_total {code=400, value=1}
3rd batch: request_total {code=400, value=2}

And EMF exporter would send them as:

1st batch: NOT RECEIVED
2nd batch: request_total {code=400, value=1}
3rd batch: request_total {code=400, value=1}

Now that the logic in the Prometheus receiver has changed, caching and skipping the initial values needs to be handled in the EMF exporter instead.

Metrics received from the Prometheus receiver are:

1st batch: request_total {code=400, value=100}
2nd batch: request_total {code=400, value=101}
3rd batch: request_total {code=400, value=102}

And EMF exporter will send them as:

1st batch: SAVE as initial and SKIP
2nd batch: request_total {code=400, value=1}
3rd batch: request_total {code=400, value=1}
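
A rough sketch of where this cache-and-skip step could sit in the exporter's datapoint translation, continuing the hypothetical deltaCalculator and metricKey types from the earlier sketch; the actual code in this PR may be organized differently.

```go
// dataPoint is a hypothetical, simplified cumulative datapoint.
type dataPoint struct {
	key   metricKey
	value float64
}

// adjustCumulative rewrites cumulative datapoints to deltas in place and
// drops the ones for which no baseline exists yet (the initial values).
func adjustCumulative(calc *deltaCalculator, points []dataPoint) []dataPoint {
	kept := points[:0]
	for _, p := range points {
		delta, ok := calc.next(p.key, p.value)
		if !ok {
			// First observation of this series: it was cached as the
			// initial value, so skip exporting it.
			continue
		}
		p.value = delta
		kept = append(kept, p)
	}
	return kept
}
```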

@bjrara force-pushed the delta branch 5 times, most recently from 93be710 to 4ff1746 on May 19, 2021 20:12
@mxiamxia (Member) left a comment:

LGTM

@bjrara force-pushed the delta branch 6 times, most recently from f49fa9e to df6ad6b on May 28, 2021 22:15
@bogdandrutu (Member) commented:

Please rebase to fix conflicts

@bjrara (Contributor, Author) commented Jun 3, 2021

> Please rebase to fix conflicts

Done. Thx!

@bogdandrutu bogdandrutu merged commit e6e5817 into open-telemetry:main Jun 8, 2021
alexperez52 referenced this pull request in open-o11y/opentelemetry-collector-contrib Aug 18, 2021
@mircohacker (Contributor) commented:

Hey @bjrara @Aneurysm9 @mxiamxia, this behaviour does not work in my use case. Is there a way to disable it, or a workaround?
