Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[prometheusremotewrite] add otel_scope_info metric to prometheus remote write exporter #24248

Closed
wants to merge 21 commits into from

Conversation

blakeroberts-wk
Copy link
Contributor

Description:

These changes add scope name and version to a metric data point as labels. Additionally, an otel_scope_info metric may be emitted if the scope contains additional attributes. Similar to the target_info metric, the otel_scope_info metric can be joined with a metric on job, instance, otel_scope_name, and otel_scope_version to retrieve instrumentation scope attributes.

These changes align the prometheus remote write exporter with the following specification update: open-telemetry/opentelemetry-specification#2703

Link to tracking Issue: #21091

Testing:

I pulled these changes into a collector which accepts OTLP metrics from many java agents and exports data to a prometheus compatible backend.

// series 1
http_server_active_requests{
http_method="GET",
http_scheme="http",
instance="REDACTED_1",
job="REDACTED_2",
net_host_name="REDACTED_3",
net_host_port="8080",
otel_scope_name="io.opentelemetry.servlet-5.0",
otel_scope_version="1.26.0-alpha"
}
// series 2
http_server_active_requests{
http_method="GET",
http_scheme="http",
instance="REDACTED_1",
job="REDACTED_2",
net_host_name="REDACTED_3",
net_host_port="8080",
otel_scope_name="io.opentelemetry.tomcat-10.0",
otel_scope_version="1.26.0-alpha"
}

After deployment, errors from the backend around samples with duplicate timestamps ceased.

Documentation:

The prometheus remote write exporter's README was updated with the new configuration setting to enable/disable the creation of the otel_scope_info metric. Like the target_info metric, I've enabled this feature by default which may or may not be appropriate.

@blakeroberts-wk
Copy link
Contributor Author

@evan-bradley would you review my changes, please? Thank you.

@evan-bradley
Copy link
Contributor

@blakeroberts-wk Sorry for the delay. Overall what you have looks good, but I don't know that I can answer the question of how we should handle emitting an Instrumentation Scope metric or name/version attributes if we don't know the name of the scope.

Please add a changelog entry for this change and fix the branch conflict.

@blakeroberts-wk
Copy link
Contributor Author

@evan-bradley I have updated the branch and added a change log entry. are there any more concerns around these changes?

pkg/translator/prometheusremotewrite/helper.go Outdated Show resolved Hide resolved
pkg/translator/prometheusremotewrite/helper.go Outdated Show resolved Hide resolved
.chloggen/O11Y-3152.yaml Outdated Show resolved Hide resolved
pkg/translator/prometheusremotewrite/helper.go Outdated Show resolved Hide resolved
@evan-bradley
Copy link
Contributor

@evan-bradley I have updated the branch and added a change log entry. are there any more concerns around these changes?

I don't have any concerns, but before we can merge a functional change like this we need approval from one of the code owners.

@Aneurysm9 @rapphil Please take a look when you are able.

return
}
if scope.Attributes().Len() == 0 {
// If the scope doesn't have additional attributes, then otel_scope_info isn't useful.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will contain the name and version of the instrumentation scope. Isn't this information per se useful?

It will contain the name and version of the instrumentation scope. Isn't this information per se useful? It means that there is at least a single metric being emitted in that scope.

Having scope_info in the configuration enabled and not generating the otel_scope_info metric is misleading and will confuse users.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking into the compliance matrix, it seems that only PHP and rust will be able to set the scope using the Otel API?

get_meter accepts attributes.

So one more point to for users to have the least surprise by always emitting this metric?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this information per se useful? It means that there is at least a single metric being emitted in that scope.

If one was looking for metrics under a specific scope, they'd probably do something like a topk query on

__name__{otel_scope_name="w/e"}

In general, actual usage of otel_scope_info is most likely after knowing the __name__ you're after then

<query> / on (otel_scope_name, otel_scope_version) group_left otel_scope_info

Though the query would return no data if otel_scope_info doesn't exist. So if the feature is enabled, it might make sense to always report it so queries that use the metric always return data regardless of whether or not the joined labels add novel information

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a commit that emitted the metric regardless.

But now I'm second guessing it. This has the potential to add a lot of active time series that won't be valuable the majority of the time. The collector could receive metrics from many languages and only PHP and rust even have the potential to make this metric add novel information. And the query with the join tends to be a rather advanced use case (so advanced that documentation doesn't exist around how to properly use otel_scope_info let alone target_info) Additionally, not emitting the metric if it doesn't have attributes would align with the implementation of target_info: https://github.com/blakeroberts-wk/opentelemetry-collector-contrib/blob/3f146d90bd9c622c33359c71fcae89162a263f2c/pkg/translator/prometheusremotewrite/helper.go#L589-L591

}
default:
errs = multierr.Append(errs, errors.New("unsupported metric type"))
}
}
addScopeTargetInfo(scope, resource, settings, mostRecentTimestamp, tsMap)
Copy link
Contributor

@rapphil rapphil Aug 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, should make the signature follow the hierarchy: addScopeTargetInfo(resource, scope, settings, mostRecentTimestamp, tsMap)?

gouthamve added a commit to gouthamve/agent that referenced this pull request Aug 10, 2023
The collector doesn't implement this yet and the metrics are not very useful.

See: open-telemetry/opentelemetry-collector-contrib#24248

Signed-off-by: Goutham <gouthamve@gmail.com>
rfratto pushed a commit to grafana/agent that referenced this pull request Aug 15, 2023
* Disable scope_info by default

The collector doesn't implement this yet and the metrics are not very useful.

See: open-telemetry/opentelemetry-collector-contrib#24248

Signed-off-by: Goutham <gouthamve@gmail.com>

* Add changelog entry

Signed-off-by: Goutham <gouthamve@gmail.com>

---------

Signed-off-by: Goutham <gouthamve@gmail.com>
@github-actions
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Aug 29, 2023
@jpkrohling
Copy link
Member

@rapphil, is there anything blocking this PR from being merged?

@evan-bradley
Copy link
Contributor

@blakeroberts-wk Looks like some of the tests are failing, can you try rebasing/merging in main?

@Aneurysm9 @rapphil could you take a look as the code owners?

@blakeroberts-wk
Copy link
Contributor Author

I have merged main.

The unit test TestDeleteAfterRead_SkipPartials within github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/fileconsumer passes when ran locally.

Can the CI check be re-ran or skipped? @evan-bradley

@blakeroberts-wk
Copy link
Contributor Author

@evan-bradley are we still feeling confident in these changes?

@evan-bradley
Copy link
Contributor

I don't see any issues with these changes, but we need an approval from the code owners of this component.

@Aneurysm9 @rapphil could you please take a look?

Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Dec 26, 2023
@github-actions github-actions bot removed the Stale label Dec 28, 2023
@evan-bradley
Copy link
Contributor

@Aneurysm9 @rapphil Please take a look.

@Aneurysm9
Copy link
Member

I think this is missing allowance for the use case where metrics are ingested with the prometheus receiver and expected to be exported without modification. This is similar to the discussions previously regarding the normalization of metric names that do not necessarily make sense when receiving and exporting prometheus data. Configuration allowing the operator to control this behavior should be added.

@blakeroberts-wk
Copy link
Contributor Author

I think this is missing allowance for the use case where metrics are ingested with the prometheus receiver and expected to be exported without modification.

This implementation will not add the labels if the scope name/version are empty. However, if the scope name/version are empty, the prometheus receiver adds a default name and version:

ils.Scope().SetName(receiverName)
ils.Scope().SetVersion(t.buildInfo.Version)
.

I think it makes sense for the receiver to modify this data. But it does make it awkward to meet the expectation that prometheus data received by the collector isn't modified when exported in prometheus format. To be honest, I find this expectation to be silly. If I wanted to proxy prometheus data without modification, I'd use a prometheus server. If I wanted to proxy prometheus data with modification (which I do in practice), I'd use the otel collector. So to use the otel collector with the intent to modify prometheus data, I'd argue it'd be unreasonable for me to expect that data won't be modified. To be clear, modifications to the metric name seem intrusive. But adding labels used to appropriately identify a unique time series seems like reasonable, if not expected, modification.

Furthermore, the design docs of the prometheus receiver make the aforementioned expectation impossible:

It doesn't need to provide one-to-one mapping

https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/prometheusreceiver/DESIGN.md#map-prometheus-metrics-to-the-corresponding-opentelemetry-metrics-properly

Because the prometheus receiver does not guarantee data is mapped one-to-one, the prometheus remote write exporter cannot guarantee its output matches whatever input was given to the receiver.

Configuration allowing the operator to control this behavior should be added.

This seems reasonable, but low value. Would it be acceptable to open an issue about adding this configuration?

Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Feb 15, 2024
@dashpole dashpole removed the Stale label Feb 15, 2024
Copy link
Contributor

github-actions bot commented Mar 1, 2024

This PR was marked stale due to lack of activity. It will be closed in 14 days.

Copy link
Contributor

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Mar 15, 2024
@gespi1
Copy link

gespi1 commented Jul 19, 2024

@evan-bradley @blakeroberts-wk what is the status of these changes, are they being superseded by another PR elsewhere or traction has stopped and still waiting on @Aneurysm9 ?

@evan-bradley
Copy link
Contributor

I'm sorry, I lost track of this. This change was still dependent on the code owners' approval. I'm not aware of any other PRs that have superseded it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants