
Keep scrape config in line with the new Prometheus scrape config #2091

Merged (2 commits) on May 27, 2020

Conversation

@beorn7 (Contributor) commented May 18, 2020

This is triggered by grafana/jsonnet-libs#261 .

The above PR changes the instance label to be actually unique within
a scrape config. It also adds a pod and a container target label
so that metrics can easily be joined with metrics from cAdvisor, KSM,
and the Kubelet.

This commit adds the same to the Loki scrape config. It also removes
the container_name label. It is the same as the container label
and was already added to Loki previously. However, the
container_name label is deprecated and has disappeared in K8s 1.16,
so that it will soon become useless for direct joining.
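The Prometheus-side change described above can be sketched as a `relabel_configs` fragment. This is an illustrative sketch, not the actual PR diff: the `pod`, `container`, and `instance` label semantics come from the discussion, while the `__meta_kubernetes_*` source labels are the standard Prometheus Kubernetes service-discovery metadata assumed here.

```yaml
# Sketch (assumed, not the exact PR diff) of the Prometheus-side relabeling.
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Add pod and container target labels so metrics can be joined
      # with metrics from cAdvisor, KSM, and the Kubelet.
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
      - source_labels: [__meta_kubernetes_pod_container_name]
        target_label: container
      # Make instance unique within the scrape config: pod:container:port.
      # All three components are needed because one container can expose
      # metrics on several ports.
      - source_labels:
          - __meta_kubernetes_pod_name
          - __meta_kubernetes_pod_container_name
          - __meta_kubernetes_pod_container_port_name
        separator: ':'
        target_label: instance
```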

@owen-d (Member) left a comment

This looks good to me, just a few comments.

  1. I think we'll need to update our mixin dashboards that use the labels which have changed (instance and container_name).
  2. We may want to update our Helm templates to keep these in sync.

},

// Include container_name label
// Rename instances to the concatenation of pod:container:port.
// All three components are needed to guarantee a unique instance label.
Member:

Can you elaborate on this? I'd expect pod/container would be enough.

beorn7 (Contributor, Author):

Hmm… So it is definitely not enough for Prometheus because a single container can expose different metrics ports. I'm not sure if Loki can do something comparable (i.e. multiple "log streams" from a single container). I guess we really want to keep the instance labels in sync between Prometheus and Loki, that's kind of the point, but how does Loki deal with a setup where a single container exposes two separate metrics endpoints but only one log stream?

Contributor:

I don't think instance is necessarily important for consistency, only namespace, pod, and container are. Logs only produce 1 stream (stdout) while a service can expose metrics on multiple ports. These simply don't have a 1:1 relationship of any sort.

beorn7 (Contributor, Author):

So how about just not having an instance label in Loki? It would make sense semantically. Logs are per container, so namespace/pod/container is sufficient for Loki. Prometheus needs the port in addition, but instead of adding a port label, we use the instance label, as is the tradition in Prometheus land, and we even keep it unique on its own to avoid surprises.

To connect Prometheus metrics with Loki logs, you'd go via namespace/pod/container.
Does that make sense? (Also looking at master @tomwilkie .)
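A minimal sketch of what the promtail side would look like under this proposal: no instance label at all, only the three labels that uniquely identify a container. The `__meta_kubernetes_*` source labels are assumed standard Kubernetes SD metadata, not the exact PR diff.

```yaml
# Sketch (assumed) of the Loki/promtail-side relabeling: no instance label,
# only the labels that uniquely identify a container.
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
      - source_labels: [__meta_kubernetes_pod_container_name]
        target_label: container
```

Joining Prometheus metrics with Loki logs then happens on (namespace, pod, container); instance stays Prometheus-only because it additionally encodes the metrics port, which has no counterpart in a single log stream.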

Contributor:

So how about just not having an instance label in Loki?

Makes sense, as long as namespace/pod/container is consistent.

@beorn7 (Contributor, Author) commented May 19, 2020

I'll look into Helm and dashboards ASAP.

@chancez (Contributor) commented May 20, 2020

Related: #2070

@beorn7 (Contributor, Author) commented May 21, 2020

Thanks for the cross-reference. This is just a coincidence, but a nice one. I'm coming at this from the Grafana-internal K8s side, trying to get the right labels in use there. So this is definitely the perfect opportunity to improve both sides and get them in sync.

This is triggered by grafana/jsonnet-libs#261 .

The above PR removes the `instance` label. As it has turned out (see
PR linked above), a sane `instance` label in Prometheus has to be
unique, and that includes the case where a single container exposes
metrics on two different endpoints. However, that scenario would still
only result in one log stream for Loki to scrape.

Therefore, Loki and Prometheus need to sync via target labels that uniquely identify a container (rather than a metrics endpoint). Those labels are namespace, pod, and container, which are also added here.

This commit removes the `container_name` label. It is the same as the
`container` label and was already added to Loki previously. However,
the `container_name` label is deprecated and has disappeared in K8s
1.16, so that it will soon become useless for direct joining.

- container_name and pod_name have disappeared in K8s 1.16.

- We are changing the instance name in Prometheus (and removing it in
  Loki altogether). The instance name used to be the pod name, so
  switch usage of the instance label to the pod label.

The pod label will generally only appear as a target label once the recent target-label changes have been rolled out. This will unfortunately break dashboards when looking at older data.
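The removal of the deprecated labels can be illustrated with a hypothetical labeldrop rule. This is an alternative illustration only: the actual change removes the label mappings from the scrape config rather than dropping the labels afterwards.

```yaml
# Hypothetical illustration (not the actual change): drop the labels
# deprecated and removed in K8s 1.16, keeping container and pod instead.
relabel_configs:
  - action: labeldrop
    regex: (container_name|pod_name)
```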
@codecov-commenter commented May 22, 2020

Codecov Report

Merging #2091 into master will increase coverage by 0.00%.
The diff coverage is n/a.


@@           Coverage Diff           @@
##           master    #2091   +/-   ##
=======================================
  Coverage   61.21%   61.22%           
=======================================
  Files         146      146           
  Lines       11187    11187           
=======================================
+ Hits         6848     6849    +1     
+ Misses       3792     3789    -3     
- Partials      547      549    +2     
Impacted Files Coverage Δ
pkg/promtail/targets/tailer.go 73.86% <0.00%> (-4.55%) ⬇️
pkg/promtail/targets/filetarget.go 68.67% <0.00%> (-1.81%) ⬇️
pkg/logql/evaluator.go 90.69% <0.00%> (+0.58%) ⬆️
pkg/logql/vector.go 87.50% <0.00%> (+18.75%) ⬆️

@beorn7 (Contributor, Author) commented May 22, 2020

Soooo, now the following has happened:

  • Helm chart brought in sync WRT target label changes.
  • Essentially removed the instance label altogether.
  • Consistently adding a container and a pod label (rather than container_name or pod_name or none).
  • Updated all the dashboards to use container and pod as labels rather than container_name, pod_name, or instance. (Prometheus metrics will continue to have an instance label; it will just look more complex. Where we used the instance label so far, we actually wanted the pod label, which didn't exist, but will be added as a target label everywhere with the changes being rolled out together with the changes in this PR.)

@beorn7 (Contributor, Author) commented May 22, 2020

All your feedback will be appreciated, in particular @tomwilkie's, with whom I agreed that not having the instance label in Loki at all is the way to go.

@beorn7 (Contributor, Author) commented May 25, 2020

Could I have 👀 on this? And approval, if possible. (This will only make it to production after a vendoring update, for which we'll have another round of reviews.)

@owen-d (Member) left a comment

LGTM - thanks.

@beorn7 (Contributor, Author) commented May 27, 2020

I'm merging this now because we have to get this vendoring update going. Will solicit feedback in the vendoring PR.

@beorn7 beorn7 merged commit a5aa341 into master May 27, 2020
@beorn7 beorn7 deleted the beorn7/scrape-config branch May 27, 2020 11:06
cyriltovena pushed a commit to cyriltovena/loki that referenced this pull request Jun 11, 2021
…ets to better display this. Also updated the dynamodb and cassandra clients to have the same buckets for consistency. (grafana#2091)

Signed-off-by: Edward Welch <edward.welch@grafana.com>