
[Design] New Processors to calculate rate from Cumulative Sum metrics #3751

Closed

hossain-rayhan opened this issue Jun 10, 2021 · 5 comments

@hossain-rayhan
Contributor
Design doc link.

Use case
I am looking for a way to calculate rate metrics from incoming cumulative data points. For example, to calculate the cpu usage rate metrics, we need to get the usage difference of current data point and previous data point, and then divide the usage difference by time-difference of these two data points.

time_difference = (current_datapoint_time - previous_datapoint_time)
cpu_usage_rate = (current_datapoint_cpu_usage - previous_datapoint_cpu_usage) / time_difference
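The arithmetic above can be sketched in a few lines of Python (an illustrative helper, not part of any processor; the data point is modeled as a simple `(value, time)` tuple with times in seconds):

```python
def cpu_usage_rate(prev, curr):
    """Rate between two cumulative data points.

    Each point is a (cpu_usage, time) tuple; times are in seconds.
    """
    time_difference = curr[1] - prev[1]
    return (curr[0] - prev[0]) / time_difference

# 300 units at t=7s, up from 100 units at t=5s -> 100 units/second
print(cpu_usage_rate((100, 5), (300, 7)))  # 100.0
```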

Can any of the existing processors do that?
No. We have the metrics_transform_processor and the experimental_metrics_generation_processor, which transform and generate new metrics, but neither of them can calculate a rate from incoming cumulative sum data points.

Solution
We want to keep the components loosely coupled, with each focused on a specific piece of functionality. For calculating rate from cumulative sum metrics, the following two processors seem like a good path forward.

  1. cumulative_to_delta_processor: A processor which converts cumulative sum data points to delta. This needs to know the previous datapoint.
  2. delta_to_rate_processor: A processor which calculates the rates. This does not depend on the previous datapoint.

1. Cumulative to Delta Processor

For sum data points whose aggregation temporality is CUMULATIVE, a metric aggregator reports changes since a fixed start time. The value of a DELTA metric, by contrast, is based only on the time interval of a single measurement cycle; there is no dependency on previous measurements, as there is for CUMULATIVE metrics.

Calculation:
For converting cumulative metrics to delta, we need to store the previous data point. The value of the converted delta data point is the difference between the current and previous values. We also update the StartTimeUnixNano of the current data point to the TimeUnixNano of the previous data point.

Consider the following example: three data points of an incoming CUMULATIVE Sum metric, all sharing the same start time.

Val = 100, StartTimeUnixNano = 0, TimeUnixNano = 5
Val = 300, StartTimeUnixNano = 0, TimeUnixNano = 7
Val = 700, StartTimeUnixNano = 0, TimeUnixNano = 9

The conversion would be like the following,

  1. For the first data point, we don’t have any previous data. So, everything would remain the same. Val = 100, StartTimeUnixNano = 0, TimeUnixNano = 5
  2. For the second, Val = (300 - 100) = 200, StartTimeUnixNano = 5, TimeUnixNano = 7
  3. For the third, Val = (700 - 300) = 400, StartTimeUnixNano = 7, TimeUnixNano = 9
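The steps above can be sketched in Python (a hypothetical model of the conversion, not the processor's actual Go implementation; data points are modeled as plain dicts):

```python
def cumulative_to_delta(points):
    """Convert CUMULATIVE sum data points to DELTA data points.

    points: list of dicts with keys "val", "start_time", "time".
    The first data point has no predecessor, so it passes through unchanged.
    """
    out = []
    prev = None
    for p in points:
        if prev is None:
            out.append(dict(p))
        else:
            out.append({
                "val": p["val"] - prev["val"],
                # StartTimeUnixNano becomes the previous point's TimeUnixNano
                "start_time": prev["time"],
                "time": p["time"],
            })
        prev = p
    return out

cumulative = [
    {"val": 100, "start_time": 0, "time": 5},
    {"val": 300, "start_time": 0, "time": 7},
    {"val": 700, "start_time": 0, "time": 9},
]
for d in cumulative_to_delta(cumulative):
    print(d)
```

Running this reproduces the three converted points listed above.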

Configuration:
For converting a CUMULATIVE metric to DELTA, it’s important to know the previous data point, and to support this we need to uniquely identify a metric data point. Users should be able to specify the metric names as well as the set of resource attributes/metric labels that uniquely identify a metric. Matching on those attributes/labels, we maintain a map that stores the previous data point and convert the current one to DELTA.

processors:
  cumulativetodelta:
    metrics:
      - name: metric1
        resource_attribute_keys: [attr1, attr2]
        metric_label_keys: [label1]
      - name: metric2
        resource_attribute_keys: [attr2]
        metric_label_keys: [label2]
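To sketch how that identity matching could work (a hypothetical Python model; the key shape and function name are illustrative, not the processor's actual code), the previous data point would be stored in a map keyed by the metric name plus the configured attribute/label values:

```python
def identity_key(metric_name, resource_attrs, labels,
                 resource_attribute_keys, metric_label_keys):
    """Build a hashable key that uniquely identifies a metric stream."""
    attrs = tuple(resource_attrs.get(k) for k in resource_attribute_keys)
    lbls = tuple(labels.get(k) for k in metric_label_keys)
    return (metric_name, attrs, lbls)

# Two data points with the same attr1/label1 values map to the same key,
# so the second can be converted against the stored first one.
k1 = identity_key("metric1", {"attr1": "host-a"}, {"label1": "cpu0"},
                  ["attr1"], ["label1"])
k2 = identity_key("metric1", {"attr1": "host-a"}, {"label1": "cpu0"},
                  ["attr1"], ["label1"])
print(k1 == k2)  # True
```

A data point with a different attribute value would produce a different key and therefore start its own stream.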

2. Delta to Rate Processor

A processor which calculates the rate from a DELTA metric; the resulting rate is a Gauge metric. To calculate the rate from a DELTA data point, we don’t need to know the previous data point. The following equation gives the rate.

rate_metric_value = value / (TimeUnixNano - StartTimeUnixNano)

Configuration:
We need a way for users to specify which metrics the calculation applies to. Users should also be able to update the metric name and unit from within the same processor.

processors:
  delta_to_rate:
    metrics:
      - name: container_cpu_usage_total_seconds
        new_name: container_cpu_usage_total_rate
        unit: millicores/second
      - name: container_cpu_usage_system_seconds
        new_name: container_cpu_usage_system_rate
        unit: millicores/second
@jrcamp
Contributor

jrcamp commented Jun 14, 2021

Can it be a new action in metrics_transform_processor or its successor instead of separate processor?

@mxiamxia
Member

mxiamxia commented Jun 25, 2021

Hi @hossain-rayhan, this seems to be a common use case at AWS. We have implemented a common cumulative-metrics rate/delta calculation util under the aws/internal folder. It provides a default metrics delta-value calculation implementation, and other components can also define their own custom delta-calculation func and use it as needed.

Q: do we really need a new processor to do the delta calculation? I think the current implementation should cover this use case (you would just need to pull it out of the aws/internal folder).

https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/internal/aws/metrics/metric_calculator.go#L33

@hossain-rayhan
Contributor Author

Hi @jrcamp and @mxiamxia, this is a pretty common requirement raised by multiple vendors. I discussed it in our weekly meetings a couple of times and reviewed the design. The suggestion actually came from Bogdan and Josh McDoland. They felt two separate processors would be ideal here:

  1. Cumulative to Delta Processor
  2. Delta to Rate Processor

Also, @mxiamxia, where do you suggest putting the code after taking it out of the aws/internal folder?

@hossain-rayhan
Contributor Author

All PRs got merged for two new processors. Closing this one.

@Efrat19

Efrat19 commented Jul 10, 2023

@hossain-rayhan Can the deltatorate processor also be used for non-monotonic SUMs? https://opentelemetry.io/docs/specs/otel/metrics/data-model/#sums
