Improve Prometheus receiver observability #4973

Closed
dashpole opened this issue Nov 17, 2020 · 23 comments
Labels: comp:prometheus, receiver/prometheus, Stale

Comments

@dashpole (Contributor) commented Nov 17, 2020

Is your feature request related to a problem? Please describe.
The current set of otelcol receiver observability metrics seems to be limited to otelcol_receiver_accepted_metric_points and otelcol_receiver_refused_metric_points. Compared with running a prometheus server, this is missing many metrics that are useful for debugging. For example, the prometheus_sd_discovered_targets metric can tell me which targets my config has discovered, and the prometheus_target_metadata_cache_bytes metric tells me how large the receiver's metadata cache is.

Describe the solution you'd like
Allow ingesting metrics from the prometheus.DefaultGatherer into the metrics pipeline. To accomplish this, implement a "bridge" from the prometheus gatherer to the opentelemetry APIs. Add configuration to the prometheus receiver to enable the collection of these metrics. Disable collection by default.
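
For illustration, a minimal sketch of what such a bridge could look like, assuming we only forward gauge families and drop labels; forwardGauges and the scope name are placeholders, not an existing API:

package selfobs

import (
    "context"

    "github.com/prometheus/client_golang/prometheus"
    dto "github.com/prometheus/client_model/go"
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/metric"
)

// forwardGauges re-reports one gauge family from a prometheus.Gatherer
// through an OTel observable gauge on every collection cycle. Labels and
// other metric types are ignored to keep the sketch short; a real bridge
// would translate those as well.
func forwardGauges(g prometheus.Gatherer, family string) error {
    meter := otel.Meter("otelcol/prometheusreceiver/internal") // placeholder scope name
    gauge, err := meter.Float64ObservableGauge(family)
    if err != nil {
        return err
    }
    _, err = meter.RegisterCallback(func(_ context.Context, o metric.Observer) error {
        mfs, err := g.Gather()
        if err != nil {
            return err
        }
        for _, mf := range mfs {
            if mf.GetName() != family || mf.GetType() != dto.MetricType_GAUGE {
                continue
            }
            for _, m := range mf.GetMetric() {
                o.ObserveFloat64(gauge, m.GetGauge().GetValue())
            }
        }
        return nil
    }, gauge)
    return err
}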

Describe alternatives you've considered

  1. Allow exposing the prometheus receiver metrics on a prometheus endpoint (working PoC; see the sketch after this list)
    a. pros: Simple
    b. cons: Self-scraping requires additional configuration and is inefficient compared to in-process alternatives.
  2. Status Quo
    a. cons: Difficult to debug prometheus service discovery, target health, caching, and other problems.
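
A minimal sketch of what alternative 1 could look like; the port and endpoint are arbitrary placeholders, and the receiver's metrics are assumed to land on prometheus.DefaultGatherer:

package main

import (
    "log"
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
    // Serve whatever the Prometheus server libraries register on the
    // default registry at a dedicated debug endpoint.
    http.Handle("/metrics", promhttp.HandlerFor(prometheus.DefaultGatherer, promhttp.HandlerOpts{}))
    log.Fatal(http.ListenAndServe(":9091", nil))
}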

Additional context
While it is not in the stated design goals of the collector, it would be useful to be able to insert a collector into an existing prometheus pipeline:
Before:
prometheus application -> prometheus server
After:
prometheus application -> opentelemetry collector -> prometheus server

This would allow users to make use of opentelemetry features (additional receivers/processors), or help facilitate migrations to/from the prometheus server.

These operational metrics are one of the things a user would currently "lose" if they inserted an opentelemetry collector into their prometheus pipeline.

@bogdandrutu bogdandrutu transferred this issue from open-telemetry/opentelemetry-collector Aug 30, 2021
@alolita alolita added the comp:prometheus Prometheus related issues label Sep 2, 2021
@gouthamve (Member):

Hi, the collector already exposes internal metrics on :8888; why not add these metrics there?

@dashpole (Contributor Author):

That may be possible. The promhttp.HandlerFor() method provides a single handler for a single gatherer, but we might be able to implement a composite handler to add both to the same endpoint.
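
For illustration, a sketch of that composite handler, assuming the collector's own registry is available as a prometheus.Gatherer (otelRegistry below is a stand-in for whatever backs :8888):

package selfobs

import (
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// combinedHandler merges the collector's own registry with the Prometheus
// server libraries' default registry and serves both from one endpoint.
// prometheus.Gatherers merges metric families and reports an error on
// inconsistent duplicates.
func combinedHandler(otelRegistry prometheus.Gatherer) http.Handler {
    combined := prometheus.Gatherers{otelRegistry, prometheus.DefaultGatherer}
    return promhttp.HandlerFor(combined, promhttp.HandlerOpts{})
}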

I think we would want to only add the prometheus server metrics if a prometheus receiver is being used. Otherwise, it's a lot of extra metrics without benefit.

@jpkrohling (Member):

We had a few discussions in the past about handling our "own" telemetry reporting. I have a task to document what component owners should do, but I'm blocked by the current state of the otel-go SDK. The main idea is that we want component owners to use the otel-api for instrumentation, and allow operators to specify a special telemetry pipeline for the collector's own telemetry.

open-telemetry/opentelemetry-collector#4198

@dashpole (Contributor Author) commented Apr 12, 2022

We probably wouldn't get what we are looking for by directly using the otel-go API, since we want to know what is going on inside the prometheus server (library), which uses the prometheus client. The original proposal above of writing a bridge from prometheus.Gatherer to the OTel-go API seems likely to be the best option.

@gouthamve (Member):

writing a bridge from prometheus.Gatherer to the OTel-go api

I believe we can already do that: https://github.com/open-telemetry/opentelemetry-go/blob/main/exporters/prometheus/prometheus.go#L89-L123

I believe that by doing the following:

import "go.opentelemetry.io/otel/exporters/prometheus"

cfg := prometheus.Config{
    Gatherer: gatherer,
}

exp, err := prometheus.New(cfg, controller)

return exp.MeterProvider()

we will get a MeterProvider which we can use in the OTel pipeline. I'll need to double-check how to use this meter provider, though.

@dashpole (Contributor Author):

I think that lets you go OTel -> prometheus, not the other way around.

@gouthamve (Member):

🤦 You are right! I always thought that by passing prometheus.DefaultGatherer in the config, the metrics in there would be exposed.

I'll have to see how to get a MeterProvider out of a prometheus.Registry. Hrm.

@Duncan-dsf:

What's the status of this issue? I have a need for this as well.

@Duncan-dsf:

Can we build an extension to expose the metrics in prometheus.DefaultGatherer?

@dashpole (Contributor Author) commented Aug 2, 2022

That's not a bad idea short-term, but long-term we probably want to unify the self-observability of components.

@mwear (Member) commented Aug 24, 2022

I would like to know if a metric is scraped by the prometheus receiver, and from what scrape target. Most receivers set the instrumentation scope name to the receiver name (e.g. otelcol/postgresqlreceiver). It doesn't appear that the prometheus receiver sets the instrumentation scope at all. I have noticed that it creates a resource to represent the scrape target, which is enough to answer the scrape target part of the question, but there isn't a way to know the metric came from the prometheus receiver (afaict). Should the prometheus receiver set the instrumentation scope to identify metrics it has scraped?

I'm not sure if this is worthy of its own issue and discussion, or if it should be part of this one.
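
For illustration, a sketch of what setting the scope could look like with the collector's pdata API; the scope name shown is a placeholder, not something the receiver does today:

package selfobs

import "go.opentelemetry.io/collector/pdata/pmetric"

// tagScope stamps an instrumentation scope name on every ScopeMetrics in a
// batch so downstream consumers can tell the data came from the Prometheus
// receiver. The resource created per scrape target is left untouched.
func tagScope(md pmetric.Metrics) {
    rms := md.ResourceMetrics()
    for i := 0; i < rms.Len(); i++ {
        sms := rms.At(i).ScopeMetrics()
        for j := 0; j < sms.Len(); j++ {
            sms.At(j).Scope().SetName("otelcol/prometheusreceiver") // placeholder name
        }
    }
}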

@dashpole (Contributor Author):

I would recommend opening a separate issue. I think that would be a good idea. It is also related to open-telemetry/opentelemetry-specification#2703

@github-actions (bot):

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions github-actions bot added the Stale label Nov 10, 2022
@dashpole dashpole added receiver/prometheus Prometheus receiver and removed Stale labels Nov 10, 2022

@fatsheep9146 (Contributor):

Hi, any progress on this? @dashpole

Could we use the method from open-telemetry/opentelemetry-collector#6297?

@dashpole (Contributor Author):

Could we use the method from open-telemetry/opentelemetry-collector#6297?

Yes. I think it might actually be really easy to add these metrics to our current prometheus endpoint. The harder part would be to make those metrics work if we aren't using a Prometheus exporter for the metrics. That would require a Prometheus bridge of some sort.

@github-actions github-actions bot added the Stale label Jul 31, 2023
@dashpole dashpole removed the Stale label Jul 31, 2023
@dashpole (Contributor Author):

Prometheus bridge: open-telemetry/opentelemetry-go#4351
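
For context, the bridge is meant to be plugged into the SDK as an external producer on a reader. A rough usage sketch follows; the package path and option names are assumptions to verify against the released bridge module, and stdoutmetric stands in for whatever exporter the telemetry pipeline actually uses:

package main

import (
    "log"

    prombridge "go.opentelemetry.io/contrib/bridges/prometheus"
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/exporters/stdout/stdoutmetric"
    sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

func main() {
    // Feed everything registered on prometheus.DefaultGatherer into the
    // OTel SDK as an external producer, so the Prometheus server
    // libraries' own metrics flow through the normal metrics pipeline.
    exp, err := stdoutmetric.New()
    if err != nil {
        log.Fatal(err)
    }
    reader := sdkmetric.NewPeriodicReader(exp,
        sdkmetric.WithProducer(prombridge.NewMetricProducer()))
    otel.SetMeterProvider(sdkmetric.NewMeterProvider(sdkmetric.WithReader(reader)))
}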

@github-actions github-actions bot added the Stale label Feb 5, 2024
@atoulme closed this as not planned on Apr 4, 2024