
[processor/metricstransform] Allow scaling exponential histograms #29803

Closed
TylerHelmuth opened this issue Dec 12, 2023 · 15 comments
Labels: closed as inactive, enhancement (New feature or request), priority:p2 (Medium), processor/metricstransform (Metrics Transform processor), Stale

Comments

@TylerHelmuth
Member

TylerHelmuth commented Dec 12, 2023

Component(s)

metricstransformprocessor

Is your feature request related to a problem? Please describe.

To help normalize telemetry between the old and the stable HTTP semantic conventions, I need to be able to scale the http.client.duration histogram from ms to seconds. The Metrics Transform processor can do this for histograms, but not for exponential histograms.

Describe the solution you'd like

Update the processor so it can scale exponential histograms.

Describe alternatives you've considered

No response

Additional context

open-telemetry/semantic-conventions#534

@TylerHelmuth TylerHelmuth added the enhancement (New feature or request), priority:p2 (Medium), and processor/metricstransform (Metrics Transform processor) labels and removed the needs triage (New item requiring triage) label Dec 12, 2023
github-actions bot

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions bot

Pinging code owners for processor/metricstransform: @dmitryax. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@TylerHelmuth
Member Author

pinging @jmacd @kentquirk @dashpole

@TylerHelmuth
Member Author

pinging @jpkrohling

@jpkrohling
Member

I pinged a few folks here at Grafana Labs who might be interested in the topic.

@aknuds1
Contributor

aknuds1 commented Jan 25, 2024

I will look into this when I'm back from PTO, i.e. Tuesday 6th of February.

github-actions bot

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Mar 26, 2024
@aknuds1
Contributor

aknuds1 commented Mar 26, 2024

Note to self: Look into this.

@github-actions github-actions bot removed the Stale label Mar 27, 2024
@jpkrohling
Member

@sh0rez, this might be something you could help with as well.

@sh0rez
Member

sh0rez commented Apr 30, 2024

For clarification: we are talking about scaling the time values, not changing the exponential histogram's scale property (the bucket widths), right?

scale and rescale have very specific meanings for exponential histograms, so let's maybe reword this issue a little.

@danelson
Contributor

In older versions of the semantic conventions, auto instrumentation produced a metric named http.server.duration with a unit of ms. In newer versions it produces a metric named http.server.request.duration with a unit of s.

Here is a contrived example in .NET.

using System.Diagnostics.Metrics;
using OpenTelemetry;
using OpenTelemetry.Metrics;
using OpenTelemetry.Resources;

namespace Example
{
    public class Program
    {
        public static void Main(string[] args)
        {
            using var meter = new Meter("TestMeter");

            using var meterProviderBuilder = Sdk.CreateMeterProviderBuilder()
                .ConfigureResource(resource => resource
                    .AddService("example-service"))
                .AddMeter(meter.Name)
                .AddView((instrument) =>
                {
                    // Guard with IsGenericType: GetGenericTypeDefinition() throws
                    // InvalidOperationException for non-generic instrument types.
                    if (instrument.GetType().IsGenericType &&
                        instrument.GetType().GetGenericTypeDefinition() == typeof(Histogram<>))
                    {
                        return new Base2ExponentialBucketHistogramConfiguration();
                    }

                    return null;
                })
                .AddOtlpExporter()
                .Build();

            // Older application records like this
            var oldConvention = meter.CreateHistogram<double>("http.server.duration", "ms");
            oldConvention.Record(500);

            // Newer application records like this
            var newConvention = meter.CreateHistogram<double>("http.server.request.duration", "s");
            newConvention.Record(0.5);
        }
    }
}

This produces the following result for me

{
    "name": "http.server.duration",
    "unit": "ms",
    "exponentialHistogram":
    {
        "dataPoints":
        [
            {
                "startTimeUnixNano": "1714499479632475900",
                "timeUnixNano": "1714499479644263300",
                "count": "1",
                "sum": 500,
                "scale": 20,
                "positive":
                {
                    "offset": 9401306,
                    "bucketCounts":
                    [
                        "1"
                    ]
                },
                "negative":
                {},
                "min": 500,
                "max": 500
            }
        ],
        "aggregationTemporality": 2
    }
}

vs

{
    "name": "http.server.request.duration",
    "unit": "s",
    "exponentialHistogram":
    {
        "dataPoints":
        [
            {
                "startTimeUnixNano": "1714499479636591800",
                "timeUnixNano": "1714499479644296800",
                "count": "1",
                "sum": 0.5,
                "scale": 20,
                "positive":
                {
                    "offset": -1048577,
                    "bucketCounts":
                    [
                        "1"
                    ]
                },
                "negative":
                {},
                "min": 0.5,
                "max": 0.5
            }
        ],
        "aggregationTemporality": 2
    }
}

If I have a collector deployment that receives data from applications following both the old and the new conventions, it becomes difficult to query the data on the backend: I need to be aware that the mismatch exists, hope my query language supports a conversion, and so on.

It would be great if the collector could convert the old-convention payload into the new-convention payload.

You are correct that we are not talking about the scale field but rather a complete conversion of the payload, in this case from ms to s.
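For plain (fixed-bucket) histograms, this conversion can already be expressed in the collector config today. Here is a sketch based on the metricstransform processor's documented experimental_scale_value operation (the exact keys may differ between collector versions; treat this as an illustration, not a verified config):

```yaml
processors:
  metricstransform:
    transforms:
      - include: http.server.duration
        action: update
        new_name: http.server.request.duration
        operations:
          # Convert ms to s. As of this issue, scaling applies to plain
          # histograms only; exponential histograms are not yet supported.
          - action: experimental_scale_value
            experimental_scale: 0.001
```

This issue is about making the same operation work when the incoming metric is an exponential histogram.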

github-actions bot commented Jul 1, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity.

@wildum
Contributor

wildum commented Jul 9, 2024

Hey, I can try to tackle this one :) Because the ratio between adjacent bucket boundaries is constant (base**(i) / base**(i+1) == base**(i+1) / base**(i+2)), we don't need to adjust the scale field. I think we can just update the offset without touching the buckets.
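The offset arithmetic can be sketched as follows. This is a hypothetical Python helper, not the processor's actual Go code; it follows the midpoint approach later described in the linked PR: take the midpoint of the first bucket, scale it, and map it back to a bucket index.

```python
import math

def scaled_offset(offset: int, scale: int, factor: float) -> int:
    """Return the new offset after multiplying all histogram values by factor.

    Exponential histogram bucket i covers (base**i, base**(i+1)],
    with base = 2 ** (2 ** -scale). Scaling every value by a constant
    factor shifts bucket indices uniformly, so only the offset changes;
    the bucket counts themselves stay untouched.
    """
    base = 2.0 ** (2.0 ** -scale)
    # Midpoint of the first bucket (the one at `offset`), scaled.
    mid = (base ** offset + base ** (offset + 1)) / 2.0 * factor
    # Index i such that base**i < mid <= base**(i+1):
    return math.ceil(math.log(mid, base)) - 1

# At scale 0 (base 2), bucket 0 covers (1, 2]. Doubling values moves
# them into (2, 4], which is bucket 1, so the offset shifts by one.
print(scaled_offset(0, 0, 2.0))
```

Note that values landing exactly on a bucket boundary after scaling (as 500 ms → 0.5 s does at scale 20) can end up one bucket off compared to re-recording the raw value, because boundaries are upper-inclusive; that error is within one bucket width.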

jpkrohling pushed a commit that referenced this issue Aug 13, 2024
…rt (#34039)

**Description:** This PR adds support for the exponential histograms for
the `experimental_scale_value` in the metricstransformprocessor.

The scaling works by scaling the middle value of the first bucket (the
one that corresponds to the offset) and finding the offset corresponding
to this new value (the method used is described here:
https://opentelemetry.io/docs/specs/otel/metrics/data-model/#all-scales-use-the-logarithm-function).

The buckets are actually unchanged because they are "shifted" by the new
offset. I initially remapped all the values but I ended up always having
the same buckets so I left the buckets untouched to make the code
simpler and save on performance.

**Link to tracking Issue:**
#29803

**Testing:** unit test + e2e local test
github-actions bot commented Sep 9, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity.

@github-actions github-actions bot added the Stale label Sep 9, 2024
f7o pushed a commit to f7o/opentelemetry-collector-contrib that referenced this issue Sep 12, 2024

github-actions bot commented Nov 8, 2024

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Nov 8, 2024