
Prometheus Job and Instance labels #575

Closed · liamawhite opened this issue Feb 27, 2020 · 8 comments · Fixed by #2979

Labels: help wanted (Good issue for contributors to OpenTelemetry Service to pick up)

@liamawhite
Contributor

liamawhite commented Feb 27, 2020

When using the Prometheus receiver and exporter I no longer get the job or instance labels that Prometheus adds by default. Is this expected behaviour?

When Prometheus scrapes the exporter I lose these fields even with honor_labels: true. I have manually checked the metrics and they are missing these labels, so it's not an issue with Prometheus. Is there a way to manually add them back in with some relabel_configs magic? I have tried, but it seems to get ignored whenever I use the job or instance labels.

I eventually want two exporters here, one of which can be scraped locally by Prometheus while the other forwards on to another collector. Our default use case is without Prometheus, so scraping Prometheus with the collector isn't an option.

Version: v0.2.3

    service:
      pipelines:
        metrics:
          receivers: [prometheus]
          exporters: [prometheus]
    exporters:
      prometheus:
        endpoint: "0.0.0.0:9090"  
    receivers:
      prometheus:
        config:
          global:
            scrape_interval: {{ .Values.scrapeInterval }}
          scrape_configs:
          - job_name: 'pilot'
            kubernetes_sd_configs:
            - role: endpoints
              namespaces:
                names:
                - {{ .Release.Namespace }}
            relabel_configs:
            - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
              action: keep
              regex: istio-pilot;http-monitoring
          - job_name: 'oc-agent'
            scrape_interval: 5s
            honor_labels: true
            kubernetes_sd_configs:
            - role: endpoints
              namespaces:
                names:
                - {{ .Release.Namespace }}
            relabel_configs:
            - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
              action: keep
              regex: otel-collector;prometheus
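
For reference, a minimal sketch of the two-exporter layout described above, with one prometheus exporter scraped locally and a second exporter forwarding to another collector; the opencensus exporter and its endpoint here are placeholders chosen for illustration, not part of the original config:

    service:
      pipelines:
        metrics:
          receivers: [prometheus]
          exporters: [prometheus, opencensus]
    exporters:
      prometheus:
        endpoint: "0.0.0.0:9090"
      opencensus:
        # placeholder endpoint of the downstream collector
        endpoint: "other-collector.example:55678"
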
@bogdandrutu bogdandrutu added this to the GA 1.0 milestone Aug 4, 2020
@arpitjindal97

This is such a crucial bug, and it has been open for more than 6 months.

@tigrannajaryan tigrannajaryan added the help wanted Good issue for contributors to OpenTelemetry Service to pick up label Oct 2, 2020
@tigrannajaryan tigrannajaryan modified the milestones: GA 1.0, Backlog Oct 2, 2020
@hairyhenderson
Contributor

I can confirm this is still the current behaviour. I tried setting the job label through a relabel_config, but started getting an error when they didn't match.

The relabel_config I'm using is:

                  - source_labels: [__meta_kubernetes_pod_label_k8s_app]
                    target_label: job

And I'm seeing this error:

{"level":"warn","ts":1602094121.4748123,"caller":"scrape/scrape.go:1058","msg":"Appending scrape report failed","component_kind":"receiver","component_type":"prometheus","component_name":"prometheus","scrape_pool":"etcd-pods","target":"https://10.240.53.137:4001/metrics","err":"unable to find a target group with job=etcd-server"}

When I set the job_name to be the same as what would get relabeled, the error goes away, but I still see neither job nor instance.

In my case, I'll be using Prometheus to scrape the otelcol instance, so I'll just use labels other than job and instance to set them appropriately and then have another relabel_config in Prometheus to set them to what they should be.
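
As a rough sketch of that Prometheus-side mapping (the custom label names k8s_job and k8s_instance below are placeholders for illustration, not from the original setup), the Prometheus server that scrapes the collector could rewrite them back with metric_relabel_configs:

    scrape_configs:
    - job_name: 'otelcol'
      static_configs:
      - targets: ['otelcol:9090']
      metric_relabel_configs:
      # copy the stand-in labels back onto the reserved ones
      - source_labels: [k8s_job]
        target_label: job
      - source_labels: [k8s_instance]
        target_label: instance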

I'd like to help fix this, though I'm not sure where to start.

@pingleig

The error comes from

return nil, errors.New("unable to find a target group with job=" + job)

The problem is that TargetsAll in the prometheus scrape manager gets its keys from the static prometheus scrape config. It has no way to know the job label within each scraped target, which can be set in the following ways:

  • exported metrics can contain a job label (https://prometheus.io/docs/concepts/jobs_instances/)
  • service discovery can set the job label directly
  • relabel config based on meta labels from service discovery
  • the static scrape config's job_name (lowest priority, only used when there is no job label from the other places)

However, otel's prometheus receiver can't distinguish the source of the job label:

job, instance := ls.Get(model.JobLabel), ls.Get(model.InstanceLabel)

One way to work around this as a user is, instead of using the job label, to rename it to job1 so that it passes through the prometheus receiver. Then, further down the pipeline, use a processor like the Metrics Transform Processor to rename it back to job so exporters get the right job label ... (I haven't tested it though ...)
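
A minimal sketch of that rename, assuming the contrib Metrics Transform Processor and the job1 placeholder label (untested, as noted above):

    processors:
      metricstransform:
        transforms:
        - include: .*
          match_type: regexp
          action: update
          operations:
          # rename the stand-in label back to job after the receiver
          - action: update_label
            label: job1
            new_label: job
    service:
      pipelines:
        metrics:
          receivers: [prometheus]
          processors: [metricstransform]
          exporters: [prometheus]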

Another way to work around this is to have the receiver use a scrape_job label that won't get overridden and always matches the job_name in scrape_configs. Inside the internal package, check this hidden label and remove it after use so it won't be passed down the pipeline.

@bjrara
Contributor

bjrara commented Apr 7, 2021

The Prometheus receiver needs the job and instance labels to find the corresponding scrape target and the metric metadata from the cache. The metadata is used to build otel Metrics which are then passed through the pipeline.

How to get the target?

Currently, there's no good way to get the correct target when receiving metrics from Prometheus. The Appender interface doesn't support target information propagation.

Yet the target is known to the appender when it is first created. Ref:

return newScrapeLoop(
	...
	func(l labels.Labels) labels.Labels { return mutateReportSampleLabels(l, opts.target) },
	func(ctx context.Context) storage.Appender { return appender(app.Appender(ctx), opts.limit) },

It's possible to add target metadata to the Appender creation process.

Also, there's another open PR (prometheus/prometheus#7771) in the Prometheus repo that suggests passing target metadata into Appender in Add. However, that PR has been stale for 3 months, and I don't know if there will be any follow-up actions.

type Appender interface {
	Add(l labels.Labels, m Metadata, t int64, v float64) (uint64, error)
}

Re/label restriction in Prometheus receiver

Today, the prometheus receiver can only use job and instance to find the scrape target, which puts an extra, undocumented restriction on labeling and relabeling.

func (s *metadataService) Get(job, instance string) (MetadataCache, error) {
	targetGroup, ok := s.sm.TargetsAll()[job]
	if !ok {
		return nil, errors.New("unable to find a target group with job=" + job)
	}
	// from the same targetGroup, instance is not going to be duplicated
	for _, target := range targetGroup {
		if target.Labels().Get(model.InstanceLabel) == instance {
			return &mCache{target}, nil
The unable to find a target group error can be triggered by

  1. Labeling original metrics with job or instance;
  2. Relabeling job or instance in the scrape config to a different value.

Although the Prometheus receiver should be able to identify job and instance in case 1 when honor_labels is set to false (label conflicts are then resolved with the exported_ prefix), it's hard to distinguish whether the label was auto-generated by Prometheus or added by users.

Before the issue is fixed, I would suggest people avoid changing these two labels.

@rakyll
Contributor

rakyll commented Apr 7, 2021

cc @odeke-em

@rakyll
Contributor

rakyll commented Apr 7, 2021

We will automatically add job and instance labels, this is a known bug. Related to open-telemetry/prometheus-interoperability-spec#7.

@bjrara
Contributor

bjrara commented Apr 8, 2021

> We will automatically add job and instance labels, this is a known bug. Related to open-telemetry/wg-prometheus#7.

I've submitted a fix for auto labeling. #2897. Please let me know if it helps.

tigrannajaryan pushed a commit that referenced this issue Apr 16, 2021
…2897)

In Prometheus, `job` and `instance` are the two auto-generated labels; however, they are both dropped by the prometheus receiver. Although this information is still available in `service.name` and `host`:`port`, it breaks the data contract for most Prometheus users (who use `job` and `instance` to consume metrics in their own systems).
This PR adds `job` and `instance` as well-known labels in prometheus receiver to fix the issue.

**Link to tracking Issue:** 
#575
#2499
#2363
open-telemetry/prometheus-interoperability-spec#7
@rakyll
Contributor

rakyll commented Apr 20, 2021

bogdandrutu pushed a commit that referenced this issue Apr 22, 2021
…orter (#2979)

This is a follow up to #2897.

Fixes #575
Fixes #2499
Fixes #2363
Fixes open-telemetry/prometheus-interoperability-spec#37
Fixes open-telemetry/prometheus-interoperability-spec#39
Fixes open-telemetry/prometheus-interoperability-spec#44

Passing compliance tests:

$ go test --tags=compliance -run "TestRemoteWrite/otelcollector/Job.+" -v ./
=== RUN TestRemoteWrite
=== RUN TestRemoteWrite/otelcollector
=== RUN TestRemoteWrite/otelcollector/JobLabel
=== PAUSE TestRemoteWrite/otelcollector/JobLabel
=== CONT TestRemoteWrite/otelcollector/JobLabel
--- PASS: TestRemoteWrite (10.02s)
--- PASS: TestRemoteWrite/otelcollector (0.00s)
--- PASS: TestRemoteWrite/otelcollector/JobLabel (10.02s)
PASS
ok github.com/prometheus/compliance/remote_write 10.382s
$ go test --tags=compliance -run "TestRemoteWrite/otelcollector/Instance.+" -v ./
=== RUN TestRemoteWrite
=== RUN TestRemoteWrite/otelcollector
=== RUN TestRemoteWrite/otelcollector/InstanceLabel
=== PAUSE TestRemoteWrite/otelcollector/InstanceLabel
=== CONT TestRemoteWrite/otelcollector/InstanceLabel
--- PASS: TestRemoteWrite (10.01s)
--- PASS: TestRemoteWrite/otelcollector (0.00s)
--- PASS: TestRemoteWrite/otelcollector/InstanceLabel (10.01s)
PASS
ok github.com/prometheus/compliance/remote_write 10.291s
$ go test --tags=compliance -run "TestRemoteWrite/otelcollector/RepeatedLabels.+" -v ./
=== RUN TestRemoteWrite
=== RUN TestRemoteWrite/otelcollector
--- PASS: TestRemoteWrite (0.00s)
--- PASS: TestRemoteWrite/otelcollector (0.00s)
testing: warning: no tests to run
PASS
pingleig added a commit to pingleig/opentelemetry-collector-contrib that referenced this issue Jun 28, 2021
The prom receiver expects a metric's job name to be the same as the one in the config.
In order to keep behaviour similar to the cloudwatch agent's discovery impl,
we support getting the job name from a docker label, but it will break the metric
type.

For long term solution, see open-telemetry/opentelemetry-collector#575 (comment)
bogdandrutu pushed a commit to open-telemetry/opentelemetry-collector-contrib that referenced this issue Jun 30, 2021
#3785)

* ext: ecsobserver Add filter and export yaml

* ext: ecsobserver Add unit test for overall discovery

- Update README

* ext: ecsobserver Rename exported structs

* ext: ecsobserver Merge duplicated create test fetcher

- Stop the collector process from the extension using `host.ReportFatalError`,
  otherwise a failure of the extension is only logged.

* ext: ecsobserver Explain the rename job label logic

The prom receiver expects a metric's job name to be the same as the one in the config.
In order to keep behaviour similar to the cloudwatch agent's discovery impl,
we support getting the job name from a docker label, but it will break the metric
type.

For long term solution, see open-telemetry/opentelemetry-collector#575 (comment)

* ext: ecsobserver Inject test fetcher using config

Was using context.

* ext: ecsobserver Test fatal error on component.Host

* ext: ecsobserver Move fetcher injection to its own func

* ext: ecsobserver Add comment to example config in README
MovieStoreGuy pushed a commit to atlassian-forks/opentelemetry-collector that referenced this issue Nov 11, 2021
* Remove GetDescriptor

* Add Must var hotfix
hughesjj pushed a commit to hughesjj/opentelemetry-collector that referenced this issue Apr 27, 2023
swiatekm pushed a commit to swiatekm/opentelemetry-collector that referenced this issue Oct 9, 2024