
Keep scrape config in line with the new Prometheus scrape config #2091

Merged (2 commits) on May 27, 2020

Conversation

@beorn7 (Contributor) commented May 18, 2020

This is triggered by grafana/jsonnet-libs#261 .

The above PR changes the instance label to be actually unique within
a scrape config. It also adds a pod and a container target label
so that metrics can easily be joined with metrics from cAdvisor, KSM,
and the Kubelet.

This commit adds the same to the Loki scrape config. It also removes
the container_name label. It is the same as the container label
and was already added to Loki previously. However, the
container_name label is deprecated and has disappeared in K8s 1.16,
so that it will soon become useless for direct joining.
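The Prometheus-side change described above can be sketched as a `relabel_configs` fragment. This is an illustrative sketch, not the actual PR diff: the `pod`, `container`, and `instance` label semantics come from the discussion, while the `__meta_kubernetes_*` source labels are the standard Prometheus Kubernetes service-discovery metadata assumed here.

```yaml
# Sketch (assumed, not the exact PR diff) of the Prometheus-side relabeling.
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Add pod and container target labels so metrics can be joined
      # with metrics from cAdvisor, KSM, and the Kubelet.
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
      - source_labels: [__meta_kubernetes_pod_container_name]
        target_label: container
      # Make instance unique within the scrape config: pod:container:port.
      # All three components are needed because one container can expose
      # metrics on several ports.
      - source_labels:
          - __meta_kubernetes_pod_name
          - __meta_kubernetes_pod_container_name
          - __meta_kubernetes_pod_container_port_name
        separator: ':'
        target_label: instance
```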

@owen-d (Member) left a comment

This looks good to me, just a few comments.

  1. I think we'll need to update our mixin dashboards that use the labels which have changed (instance and container_name).
  2. We may want to update our Helm templates to keep these in sync.

},

// Include container_name label
// Rename instances to the concatenation of pod:container:port.
// All three components are needed to guarantee a unique instance label.
Member:

Can you elaborate on this? I'd expect pod/container would be enough.

beorn7 (Contributor, Author):

Hmm… So it is definitely not enough for Prometheus because a single container can expose different metrics ports. I'm not sure if Loki can do something comparable (i.e. multiple "log streams" from a single container). I guess we really want to keep the instance labels in sync between Prometheus and Loki, that's kind of the point, but how does Loki deal with a setup where a single container exposes two separate metrics endpoints but only one log stream?

Contributor:

I don't think instance is necessarily important for consistency, only namespace, pod, and container are. Logs only produce 1 stream (stdout) while a service can expose metrics on multiple ports. These simply don't have a 1:1 relationship of any sort.

beorn7 (Contributor, Author):

So how about just not having an instance label in Loki? It would make sense semantically. Logs are per container, so namespace/pod/container is sufficient for Loki. Prometheus needs the port in addition, but instead of adding a port label, we use the instance label, as is the tradition in Prometheus land, and we even keep it unique on its own to avoid surprises.

To connect Prometheus metrics with Loki logs, you'd go via namespace/pod/container.
Does that make sense? (Also looking at master @tomwilkie .)
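A minimal sketch of what the promtail side would look like under this proposal: no instance label at all, only the three labels that uniquely identify a container. The `__meta_kubernetes_*` source labels are assumed standard Kubernetes SD metadata, not the exact PR diff.

```yaml
# Sketch (assumed) of the Loki/promtail-side relabeling: no instance label,
# only the labels that uniquely identify a container.
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
      - source_labels: [__meta_kubernetes_pod_container_name]
        target_label: container
```

Joining Prometheus metrics with Loki logs then happens on (namespace, pod, container); instance stays Prometheus-only because it additionally encodes the metrics port, which has no counterpart in a single log stream.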

Contributor:

So how about just not having an instance label in Loki?

Makes sense, as long as namespace/pod/container is consistent.

@beorn7 (Contributor, Author) commented May 19, 2020

I'll look into Helm and dashboards ASAP.

@chancez (Contributor) commented May 20, 2020

Related: #2070

@beorn7 (Contributor, Author) commented May 21, 2020

Thanks for the cross-reference. This is just a coincidence, but a nice one. I'm coming at this from the Grafana-internal K8s side, trying to get the right labels in use there. So this is definitely the perfect opportunity to improve both sides and get them in sync.

This is triggered by grafana/jsonnet-libs#261 .

The above PR removes the `instance` label. As it has turned out (see
PR linked above), a sane `instance` label in Prometheus has to be
unique, and that includes the case where a single container exposes
metrics on two different endpoints. However, that scenario would still
only result in one log stream for Loki to scrape.

Therefore, Loki and Prometheus need to sync via target labels that uniquely identify a container (rather than a metrics endpoint). Those labels are namespace, pod, and container, which are also added here.

This commit removes the `container_name` label. It is the same as the
`container` label and was already added to Loki previously. However,
the `container_name` label is deprecated and has disappeared in K8s
1.16, so that it will soon become useless for direct joining.

- container_name and pod_name have disappeared in K8s 1.16.

- We are changing the instance name in Prometheus (and removing it in
  Loki altogether). The instance name used to be the pod name, so
  switch usage of the instance label to the pod label.

The pod label will generally only appear as a target label once the recent target-label changes have been rolled out. This will unfortunately break dashboards when looking at older data.
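The removal of the deprecated labels can be illustrated with a hypothetical labeldrop rule. This is an alternative illustration only: the actual change removes the label mappings from the scrape config rather than dropping the labels afterwards.

```yaml
# Hypothetical illustration (not the actual change): drop the labels
# deprecated and removed in K8s 1.16, keeping container and pod instead.
relabel_configs:
  - action: labeldrop
    regex: (container_name|pod_name)
```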
@codecov-commenter commented May 22, 2020

Codecov Report

Merging #2091 into master will increase coverage by 0.00%.
The diff coverage is n/a.


@@           Coverage Diff           @@
##           master    #2091   +/-   ##
=======================================
  Coverage   61.21%   61.22%           
=======================================
  Files         146      146           
  Lines       11187    11187           
=======================================
+ Hits         6848     6849    +1     
+ Misses       3792     3789    -3     
- Partials      547      549    +2     
Impacted Files Coverage Δ
pkg/promtail/targets/tailer.go 73.86% <0.00%> (-4.55%) ⬇️
pkg/promtail/targets/filetarget.go 68.67% <0.00%> (-1.81%) ⬇️
pkg/logql/evaluator.go 90.69% <0.00%> (+0.58%) ⬆️
pkg/logql/vector.go 87.50% <0.00%> (+18.75%) ⬆️

@beorn7 (Contributor, Author) commented May 22, 2020

Soooo, now the following has happened:

  • Helm chart brought in sync WRT target label changes.
  • Essentially removed the instance label altogether.
  • Consistently adding a container and a pod label (rather than container_name or pod_name or none).
  • Updated all the dashboards to use container and pod as labels rather than container_name, pod_name, or instance. (Prometheus metrics will continue to have an instance label; it will just look more complex. Where we used the instance label so far, we actually wanted the pod label, which didn't exist, but will be added as a target label everywhere with the changes being rolled out together with the changes in this PR.)

@beorn7 (Contributor, Author) commented May 22, 2020

All your feedback will be appreciated, in particular @tomwilkie's, with whom I agreed that not having the instance label in Loki at all is the way to go.

@beorn7 (Contributor, Author) commented May 25, 2020

Could I have 👀 on this? And approval, if possible. (This will only make it to production after a vendoring update, for which we'll have another round of reviews.)

@owen-d (Member) left a comment

LGTM - thanks.

@beorn7 (Contributor, Author) commented May 27, 2020

I'm merging this now because we have to get this vendoring update going. Will solicit feedback in the vendoring PR.

@beorn7 beorn7 merged commit a5aa341 into master May 27, 2020
@beorn7 beorn7 deleted the beorn7/scrape-config branch May 27, 2020 11:06
cyriltovena pushed a commit to cyriltovena/loki that referenced this pull request Jun 11, 2021
…ets to better display this. Also updated the dynamodb and cassandra clients to have the same buckets for consistency. (grafana#2091)

Signed-off-by: Edward Welch <edward.welch@grafana.com>