
Prometheus Job and Instance labels #575

Closed · liamawhite opened this issue Feb 27, 2020 · 8 comments · Fixed by #2979

Labels: help wanted (Good issue for contributors to OpenTelemetry Service to pick up)

@liamawhite
Contributor

liamawhite commented Feb 27, 2020

When using the Prometheus receiver and exporter I no longer get the job or instance labels that Prometheus adds by default. Is this expected behaviour?

When Prometheus scrapes the exporter I lose these fields even with honor_labels: true. I have manually checked the metrics and they are missing these labels, so it's not an issue with Prometheus. Is there a way to manually add them back in with some relabel_configs magic? I have tried, but it seems to get ignored whenever I use the job or instance labels.

I eventually want two exporters here, one of which can be scraped locally by Prometheus while the other forwards on to another collector. Our default use case is without Prometheus, so scraping Prometheus with the collector isn't an option.

Version: v0.2.3

    service:
      pipelines:
        metrics:
          receivers: [prometheus]
          exporters: [prometheus]
    exporters:
      prometheus:
        endpoint: "0.0.0.0:9090"  
    receivers:
      prometheus:
        config:
          global:
            scrape_interval: {{ .Values.scrapeInterval }}
          scrape_configs:
          - job_name: 'pilot'
            kubernetes_sd_configs:
            - role: endpoints
              namespaces:
                names:
                - {{ .Release.Namespace }}
            relabel_configs:
            - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
              action: keep
              regex: istio-pilot;http-monitoring
          - job_name: 'oc-agent'
            scrape_interval: 5s
            honor_labels: true
            kubernetes_sd_configs:
            - role: endpoints
              namespaces:
                names:
                - {{ .Release.Namespace }}
            relabel_configs:
            - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
              action: keep
              regex: otel-collector;prometheus
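
For reference, a minimal sketch of the two-exporter layout described above, with one prometheus exporter scraped locally and a second exporter forwarding to another collector; the opencensus exporter and its endpoint here are placeholders chosen for illustration, not part of the original config:

    service:
      pipelines:
        metrics:
          receivers: [prometheus]
          exporters: [prometheus, opencensus]
    exporters:
      prometheus:
        endpoint: "0.0.0.0:9090"
      opencensus:
        # placeholder endpoint of the downstream collector
        endpoint: "other-collector.example:55678"
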
@bogdandrutu bogdandrutu added this to the GA 1.0 milestone Aug 4, 2020
@arpitjindal97

This is such a crucial bug, and it has been open for more than 6 months.

@tigrannajaryan tigrannajaryan added the help wanted Good issue for contributors to OpenTelemetry Service to pick up label Oct 2, 2020
@tigrannajaryan tigrannajaryan modified the milestones: GA 1.0, Backlog Oct 2, 2020
@hairyhenderson
Contributor

I can confirm this is still the current behaviour. I tried setting the job label through a relabel_config, but started getting an error when they didn't match.

The relabel_config I'm using is:

                  - source_labels: [__meta_kubernetes_pod_label_k8s_app]
                    target_label: job

And I'm seeing this error:

{"level":"warn","ts":1602094121.4748123,"caller":"scrape/scrape.go:1058","msg":"Appending scrape report failed","component_kind":"receiver","component_type":"prometheus","component_name":"prometheus","scrape_pool":"etcd-pods","target":"https://10.240.53.137:4001/metrics","err":"unable to find a target group with job=etcd-server"}

When I set the job_name to be the same as what would get relabeled, the error goes away, but I still see neither job nor instance.

In my case, I'll be using Prometheus to scrape the otelcol instance, so I'll just use labels other than job and instance to set them appropriately and then have another relabel_config in Prometheus to set them to what they should be.
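
As a rough sketch of that Prometheus-side mapping (the custom label names k8s_job and k8s_instance below are placeholders for illustration, not from the original setup), the Prometheus server that scrapes the collector could rewrite them back with metric_relabel_configs:

    scrape_configs:
    - job_name: 'otelcol'
      static_configs:
      - targets: ['otelcol:9090']
      metric_relabel_configs:
      # copy the stand-in labels back onto the reserved ones
      - source_labels: [k8s_job]
        target_label: job
      - source_labels: [k8s_instance]
        target_label: instance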

I'd like to help fix this, though I'm not sure where to start.

@pingleig

The error comes from

return nil, errors.New("unable to find a target group with job=" + job)

The problem is that TargetsAll in the prometheus scrape manager gets its keys from the static prometheus scrape config. It has no way to know the job label within each scraped target, which can be set in the following ways:

  • exported metrics can contain a job label (https://prometheus.io/docs/concepts/jobs_instances/)
  • service discovery can set the job label directly
  • relabel config based on meta labels from service discovery
  • the static scrape config's job_name (lowest priority, only used when there is no job label from the other places)

However, otel's prometheus receiver can't distinguish the source of the job label:

job, instance := ls.Get(model.JobLabel), ls.Get(model.InstanceLabel)

One way to work around this as a user is, instead of using the job label, to rename it to job1 so that it passes through the prometheus receiver. Then, further down the pipeline, use a processor like the Metrics Transform Processor to rename it back to job so exporters get the right job label ... (I haven't tested it though ...)
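
A minimal sketch of that rename, assuming the contrib Metrics Transform Processor and the job1 placeholder label (untested, as noted above):

    processors:
      metricstransform:
        transforms:
        - include: .*
          match_type: regexp
          action: update
          operations:
          # rename the stand-in label back to job after the receiver
          - action: update_label
            label: job1
            new_label: job
    service:
      pipelines:
        metrics:
          receivers: [prometheus]
          processors: [metricstransform]
          exporters: [prometheus]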

Another way to work around this is to have the receiver use a scrape_job label that won't get overridden and always matches the job_name in scrape_configs. Inside the internal package, check this hidden label and remove it after use so it won't be passed down the pipeline.

@bjrara
Contributor

bjrara commented Apr 7, 2021

The Prometheus receiver needs the job and instance labels to find the corresponding scrape target and the metric metadata from the cache. The metadata is used to build otel Metrics which are then passed through the pipeline.

How to get the target?

Currently, there's no good way to get the correct target when receiving metrics from Prometheus. The Appender interface doesn't support target information propagation.

Yet the target is known to the appender when it is first created. Ref:

return newScrapeLoop(
	...
	func(l labels.Labels) labels.Labels { return mutateReportSampleLabels(l, opts.target) },
	func(ctx context.Context) storage.Appender { return appender(app.Appender(ctx), opts.limit) },

It's possible to add target metadata to the Appender creation process.

Also, there's another open PR (prometheus/prometheus#7771) in the Prometheus repo that suggests passing target metadata into Appender in Add. However, that PR has been stale for 3 months, and I don't know if there will be any follow-up actions.

type Appender interface {
	Add(l labels.Labels, m Metadata, t int64, v float64) (uint64, error)
}

Re/label restriction in Prometheus receiver

Today, the prometheus receiver can only use job and instance to find the scrape target, which puts an extra, undocumented restriction on labeling and relabeling.

func (s *metadataService) Get(job, instance string) (MetadataCache, error) {
	targetGroup, ok := s.sm.TargetsAll()[job]
	if !ok {
		return nil, errors.New("unable to find a target group with job=" + job)
	}
	// from the same targetGroup, instance is not going to be duplicated
	for _, target := range targetGroup {
		if target.Labels().Get(model.InstanceLabel) == instance {
			return &mCache{target}, nil
The unable to find a target group error can be triggered by

  1. Labeling original metrics with job or instance;
  2. Relabeling job or instance in the scrape config to a different value.

Although the Prometheus receiver should be able to identify job and instance in case 1 when honor_labels is set to false (label conflicts are then resolved with the exported_ prefix), it's hard to distinguish whether the label was auto-generated by Prometheus or added by users.

Before the issue is fixed, I would suggest people avoid changing these two labels.

@rakyll
Contributor

rakyll commented Apr 7, 2021

cc @odeke-em

@rakyll
Contributor

rakyll commented Apr 7, 2021

We will automatically add job and instance labels, this is a known bug. Related to open-telemetry/prometheus-interoperability-spec#7.

@bjrara
Contributor

bjrara commented Apr 8, 2021

> We will automatically add job and instance labels, this is a known bug. Related to open-telemetry/wg-prometheus#7.

I've submitted a fix for auto labeling. #2897. Please let me know if it helps.

tigrannajaryan pushed a commit that referenced this issue Apr 16, 2021
…2897)

In Prometheus, `job` and `instance` are the two auto-generated labels; however, they are both dropped by the prometheus receiver. Although this information is still available in `service.name` and `host`:`port`, it breaks the data contract for most Prometheus users (who use `job` and `instance` to consume metrics in their own systems).
This PR adds `job` and `instance` as well-known labels in prometheus receiver to fix the issue.

**Link to tracking Issue:** 
#575
#2499
#2363
open-telemetry/prometheus-interoperability-spec#7
@rakyll
Contributor

rakyll commented Apr 20, 2021

bogdandrutu pushed a commit that referenced this issue Apr 22, 2021
…orter (#2979)

This is a follow up to #2897.

Fixes #575
Fixes #2499
Fixes #2363
Fixes open-telemetry/prometheus-interoperability-spec#37
Fixes open-telemetry/prometheus-interoperability-spec#39
Fixes open-telemetry/prometheus-interoperability-spec#44

Passing compliance tests:

$ go test --tags=compliance -run "TestRemoteWrite/otelcollector/Job.+" -v ./
=== RUN TestRemoteWrite
=== RUN TestRemoteWrite/otelcollector
=== RUN TestRemoteWrite/otelcollector/JobLabel
=== PAUSE TestRemoteWrite/otelcollector/JobLabel
=== CONT TestRemoteWrite/otelcollector/JobLabel
--- PASS: TestRemoteWrite (10.02s)
--- PASS: TestRemoteWrite/otelcollector (0.00s)
--- PASS: TestRemoteWrite/otelcollector/JobLabel (10.02s)
PASS
ok github.com/prometheus/compliance/remote_write 10.382s
$ go test --tags=compliance -run "TestRemoteWrite/otelcollector/Instance.+" -v ./
=== RUN TestRemoteWrite
=== RUN TestRemoteWrite/otelcollector
=== RUN TestRemoteWrite/otelcollector/InstanceLabel
=== PAUSE TestRemoteWrite/otelcollector/InstanceLabel
=== CONT TestRemoteWrite/otelcollector/InstanceLabel
--- PASS: TestRemoteWrite (10.01s)
--- PASS: TestRemoteWrite/otelcollector (0.00s)
--- PASS: TestRemoteWrite/otelcollector/InstanceLabel (10.01s)
PASS
ok github.com/prometheus/compliance/remote_write 10.291s
$ go test --tags=compliance -run "TestRemoteWrite/otelcollector/RepeatedLabels.+" -v ./
=== RUN TestRemoteWrite
=== RUN TestRemoteWrite/otelcollector
--- PASS: TestRemoteWrite (0.00s)
--- PASS: TestRemoteWrite/otelcollector (0.00s)
testing: warning: no tests to run
PASS
pingleig added a commit to pingleig/opentelemetry-collector-contrib that referenced this issue Jun 28, 2021
The prom receiver expects a metric's job name to be the same as the one in the config.
In order to keep behaviour similar to the cloudwatch agent's discovery impl,
we support getting the job name from a docker label, but it will break the metric
type.

For long term solution, see open-telemetry/opentelemetry-collector#575 (comment)
bogdandrutu pushed a commit to open-telemetry/opentelemetry-collector-contrib that referenced this issue Jun 30, 2021
#3785)

* ext: ecsobserver Add filter and export yaml

* ext: ecsobserver Add unit test for overall discovery

- Update README

* ext: ecsobserver Rename exported structs

* ext: ecsobserver Merge duplicated create test fetcher

- Stop the collector process from the extension using `host.ReportFatalError`,
  otherwise a failure of the extension is only logged.

* ext: ecsobserver Explain the rename job label logic

The prom receiver expects a metric's job name to be the same as the one in the config.
In order to keep behaviour similar to the cloudwatch agent's discovery impl,
we support getting the job name from a docker label, but it will break the metric
type.

For long term solution, see open-telemetry/opentelemetry-collector#575 (comment)

* ext: ecsobserver Inject test fetcher using config

Was using context.

* ext: ecsobserver Test fatal error on component.Host

* ext: ecsobserver Move fetcher injection to its own func

* ext: ecsobserver Add comment to example config in README
MovieStoreGuy pushed a commit to atlassian-forks/opentelemetry-collector that referenced this issue Nov 11, 2021
* Remove GetDescriptor

* Add Must var hotfix
hughesjj pushed a commit to hughesjj/opentelemetry-collector that referenced this issue Apr 27, 2023
swiatekm pushed a commit to swiatekm/opentelemetry-collector that referenced this issue Oct 9, 2024