Reconsider the approach of applying the full normalization #72

dmitryax · 2023-06-21T07:18:38Z

We recently attempted to activate the pkg.translator.prometheus.NormalizeName feature gate in the collector. This feature gate applies full metrics normalization by default in both the Prometheus receiver and exporter. However, the process did not go smoothly and was seen as an unnecessary disruptive change. Two issues were reported regarding this matter:

After discussing the topic in the Collector SIG on June 6th, a decision was made to revert the feature gate back to its alpha state until both issues are resolved.

To initiate the discussion towards resolving these problems, I have two questions:

1. Why do we need to change the original Prometheus metrics names in the Prometheus receiver?

While it is beneficial to populate as many fields as possible in an OTel metric object from the available information in a Prometheus endpoint, including the metric unit, changing the original metric name seems unnecessary. Unless we have predefined mappings, we cannot convert a Prometheus metric name into an OTel metric with a name compliant with the OTel spec. By simply removing the unit and _total suffixes, we end up with metric names that may be unfamiliar to users. For example, a commonly used Prometheus metric, process_cpu_seconds_total, would become process_cpu after the full normalization, which lacks references in both OTel and Prometheus contexts. However, if we retain the name process_cpu_seconds_total in the OTel metric name (with s as the unit), it is evident where the metric originates.

If we decide against changing the original name in the Prometheus receiver:

The Prom -> OTel -> Prom scenario remains unaffected since the full normalization in the Prometheus exporter will not add additional prefixes if they already exist in the metric name.
In an OTel -> Prom -> OTel scenario (assuming full normalization exists in the Prometheus exporter), different results would be obtained. For instance, system.cpu.time would become an OTel metric named system_cpu_time_seconds_total instead of system_cpu_time. I believe both system_cpu_time_seconds_total and system_cpu_time metric names should be acceptable since it is not possible to revert back to system.cpu.time.

2. Do we want to provide an option for the end user to disable/enable the full normalization in the Prometheus exporter? If so, what would be the default setting?

Unlike the Prometheus receiver, the full normalization in the Prometheus exporter is important as it prevents the loss of the unit field value during conversion. However, there might be situations where users prefer to avoid additional suffixes and keep the metric name as close to the original OTel name as possible. Considering the feedback from open-telemetry/opentelemetry-collector-contrib#21743, it seems reasonable to provide this option. If we do include it, what should be the default behavior? It is plausible that the default setting should be the enabled full normalization, but alternative viewpoints are welcome.

The text was updated successfully, but these errors were encountered:

dmitryax · 2023-06-22T06:56:36Z

Discussed this issue in the Prometheus WG meeting.

Q 1: @gouthamve mentioned that Prometheus is going to support dots in metric names going forward, so, potentially, we can restore OTel metric names in the OTel -> Prom -> OTel scenario by applying the full normalization. I'm still not sure if it justifies changing all the Prometheus metrics in the Prometheus receiver because of the reasons mentioned above (process_cpu_seconds_total -> process_cpu). Maybe we can update the Prometheus exporter normalization and mark all the metrics coming from it (e.g., by adding a special note to the HELP section) in a way that the Prometheus receiver from the other side can read the hint and restore the original OTel metric names. If the hint contains the original OTel metric name like otel:system.cpu.time, we don't even need to depend on the dots support. @open-telemetry/wg-prometheus WDYT?

Q 2: Look like there are no objections to making the normalization on the Prometheus exporter side configurable and having it enabled by default.

dashpole · 2023-06-22T13:35:39Z

Sorry I missed the discussion yesterday; I'm just getting back from leave and am catching up now. The response to enabling the feature gate makes me think our approach might have to change. If we tried to make the change at a future point, we would likely run into the same pushback, and i'm not sure the value we are getting for the change is worth the pain it would cause users right now.

My suggestion is that we enable normalization on prometheus exporters by default, but disable it on the prometheus receiver by default. It would essentially make only additive changes by default. Prom -> OTel -> Prom would still round trip successfully in that case, but OTel -> Prom -> OTel would end up with extra suffixes in addition to underscores. To round-trip OTel -> Prom -> OTel, a user would need to:

Disable sanitization (. -> _) in their OTel SDK prometheus exporter. This knob would be introduced after prometheus allows dots.
Enable suffix trimming in their prometheus receiver.

Since we want OTel SDK prometheus exporters to work with older prometheus servers, sanitization (. -> _) will have to stay the default behavior. That means regardless of whether we trim suffixes in the prometheus receiver by default, a user would have to change default behavior to be able to round-trip metric names.

I'm not a fan of encoding/decoding information from/to the HELP text. It seems likely to break, and pollutes the help text with information users are unlikely to care about.

gouthamve · 2023-06-22T16:53:34Z

My suggestion is that we enable normalization on prometheus exporters by default, but disable it on the prometheus receiver by default.

I like it. Echoes what I said here: open-telemetry/opentelemetry-collector-contrib#21743 (comment)

dmitryax · 2023-06-23T18:46:42Z

My suggestion is that we enable normalization on prometheus exporters by default, but disable it on the prometheus receiver by default.

@dashpole I agree with this in general. But WDYT about the suggestion I mentioned in my previous comment about adding an OTel metric hint to the HELP section by the Prometheus exporter. For example, if any metric has otel:system.cpu.time in the HELP section, the OTel Prometheus receiver would use that name instead. It'll make OTel -> Prom -> OTel work without dots support and any configuration option on the Prometheus receiver. Is that realistic?

dashpole · 2023-06-23T18:47:50Z

I'm not a fan of encoding/decoding information from/to the HELP text. It seems likely to break, and pollutes the help text with information users are unlikely to care about.

dmitryax · 2023-06-23T18:53:09Z

OK, let's proceed with the configuration options for the full normalization, which will be disabled on the receiver and enabled on the exporter. We probably don't need a feature gate for that, and the existing one can be deprecated.

dashpole · 2023-09-20T20:17:29Z

This has been completed in the collector

This was referenced Jun 23, 2023

.count metric naming convention only applies to UpDownCounters open-telemetry/semantic-conventions#107

Merged

Revert #5308? open-telemetry/opentelemetry-java#5437

Closed

dashpole mentioned this issue Jun 30, 2023

[Prometheus compatibility] Do not require trimming suffixes by default open-telemetry/opentelemetry-specification#3580

Merged

dashpole mentioned this issue Sep 6, 2023

Prometheus exporters: graduate pkg.translator.prometheus.NormalizeName feature gate open-telemetry/opentelemetry-collector-contrib#26488

Open

dashpole closed this as completed Sep 20, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reconsider the approach of applying the full normalization #72

Reconsider the approach of applying the full normalization #72

dmitryax commented Jun 21, 2023 •

edited

Loading

dmitryax commented Jun 22, 2023

dashpole commented Jun 22, 2023

gouthamve commented Jun 22, 2023

dmitryax commented Jun 23, 2023

dashpole commented Jun 23, 2023

dmitryax commented Jun 23, 2023

dashpole commented Sep 20, 2023

Reconsider the approach of applying the full normalization #72

Reconsider the approach of applying the full normalization #72

Comments

dmitryax commented Jun 21, 2023 • edited Loading

1. Why do we need to change the original Prometheus metrics names in the Prometheus receiver?

2. Do we want to provide an option for the end user to disable/enable the full normalization in the Prometheus exporter? If so, what would be the default setting?

dmitryax commented Jun 22, 2023

dashpole commented Jun 22, 2023

gouthamve commented Jun 22, 2023

dmitryax commented Jun 23, 2023

dashpole commented Jun 23, 2023

dmitryax commented Jun 23, 2023

dashpole commented Sep 20, 2023

dmitryax commented Jun 21, 2023 •

edited

Loading