
loki-mixin: Complex configuration needed compared to other mixins #4838

Closed
DaveOHenry opened this issue Nov 29, 2021 · 20 comments · Fixed by #6189
Comments

@DaveOHenry
Contributor

DaveOHenry commented Nov 29, 2021

We are using Helm and there are currently quite a few things that have to be tweaked to make the dashboards work:

  • set showMultiCluster:: false for loki-logs.json and loki-operational.json
  • override clusterLabel:: and matchers:: for every dashboard that uses these fields
  • override the dashboard variables (especially cluster and namespace)
  • override jobMatcher(job):: and namespaceMatcher():: functions (currently used by loki-deletion.json, loki-reads-resources.json, loki-retention.json and loki-writes-resources.json)

To summarize my findings:

  1. The main issues are the hardcoded "cluster" label and job name "($namespace)/$job"
  2. I haven't found a way to make loki-reads-resources.json and loki-writes-resources.json work other than overriding the complete dashboard definition, but this would kind of defeat the purpose of a mixin, wouldn't it?
  3. There are some issues with the CPU and Memory panels, but I think this has to do with the deployment mode we are using (loki-distributed Helm chart): it looks like all of our containers are simply named "loki". See also helm-charts#933 ([loki-distributed] Rename containers to fix some dashboard panels from loki-mixin).
  4. I don't know what this cortex-gw is. I had never heard of it before and have ignored it for now.

I'm not using the loki-deletes.json and loki-mixin-recording-rules.json dashboards currently, but the others are mostly working and are a great starting point to explore various aspects of Loki.
My jsonnet experience is pretty basic, but I still hope that these config.libsonnet overrides help some people get started more quickly and maybe save some headaches:

local lokimixin = import './vendor/github.com/grafana/loki/production/loki-mixin/mixin.libsonnet';
local utils = import 'mixin-utils/utils.libsonnet';

lokimixin
{
  // override functions from dashboard-utils.libsonnet
  jobMatcher(job)::
    'monitoring_stack=~"$cluster", job=~"%s"' % job,
  namespaceMatcher()::
    'monitoring_stack=~"$cluster", namespace=~"$namespace"',
  _config+:: {
    // Tags for dashboards.
    tags: ['loki'],

    // The label used to differentiate between different application instances (i.e. 'pod' in a kubernetes install).
    per_instance_label: 'pod',

    // The label used to differentiate between different nodes (i.e. servers).
    per_node_label: 'instance',
  },
  grafanaDashboards+: {
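    // Each dashboard below gets the same treatment: hidden fields such as
    // clusterLabel/matchers are overridden, and the template variable queries
    // are rewritten to use the custom cluster label instead of "cluster".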
    'loki-chunks.json'+:{
      clusterLabel:: 'monitoring_stack',
      matchers:: {
        ingester: [utils.selector.re('job', 'loki-ingester')],
      },
      templating+: {
        list:
          std.map(
            function(item)
              if item.name == 'cluster' then
                item+ {
                  query: 'label_values(loki_build_info, monitoring_stack)'
                }
              else
              if item.name == 'namespace' then
                item+ {
                  query: 'label_values(loki_build_info{monitoring_stack=~"$cluster"}, namespace)'
                }
              else
                item,
            super.list
          ),
      },
    },
    'loki-deletion.json'+:{
      clusterLabel:: 'monitoring_stack',
      matchers:: {
        ingester: [utils.selector.re('job', 'loki-ingester')],
      },
      templating+: {
        list:
          std.map(
            function(item)
              if item.name == 'cluster' then
                item+ {
                  query: 'label_values(loki_build_info, monitoring_stack)'
                }
              else
              if item.name == 'namespace' then
                item+ {
                  query: 'label_values(loki_build_info{monitoring_stack=~"$cluster"}, namespace)'
                }
              else
                item,
            super.list
          ),
      },
    },
    'loki-logs.json'+:{
      showMultiCluster:: false,
      clusterLabel:: 'monitoring_stack',
      templating+: {
        list:
          std.map(
            function(item)
              if item.name == 'cluster' then
                item+ {
                  query: 'label_values(loki_build_info, monitoring_stack)'
                }
              else
              if item.name == 'namespace' then
                item+ {
                  query: 'label_values(loki_build_info{monitoring_stack=~"$cluster"}, namespace)'
                }
              else
              if item.name == 'deployment' then
                item+ {
                  query: 'label_values(kube_deployment_created{monitoring_stack="$cluster", namespace="$namespace"}, deployment)'
                }
              else
              if item.name == 'pod' then
                item+ {
                  query: 'label_values(kube_pod_container_info{monitoring_stack="$cluster", namespace="$namespace", pod=~"$deployment.*"}, pod)'
                }
              else
              if item.name == 'container' then
                item+ {
                  query: 'label_values(kube_pod_container_info{monitoring_stack="$cluster", namespace="$namespace", pod=~"$pod", pod=~"$deployment.*"}, container)'
                }
              else
              // preselect the correct data source
              if item.name == 'logs' then
                item+ {
                  regex: '.*loki-prd'
                }
              else
                item,
            super.list
          ),
      },
    },
    'loki-operational.json'+:{
      clusterLabel:: 'monitoring_stack',
      showMultiCluster:: false,
      matchers:: {
        cortexgateway: [utils.selector.re('job', 'cortex-gw')],
        distributor: [utils.selector.re('job', 'loki-distributor')],
        ingester: [utils.selector.re('job', 'loki-ingester')],
        querier: [utils.selector.re('job', 'loki-querier')],
      },
      templating+: {
        list:
          std.map(
            function(item)
              if item.name == 'cluster' then
                item+ {
                  query: 'label_values(loki_build_info, monitoring_stack)'
                }
              else
              if item.name == 'namespace' then
                item+ {
                  query: 'label_values(loki_build_info{monitoring_stack=~"$cluster"}, namespace)'
                }
              else
                item,
            super.list
          ),
      },
    },
    'loki-reads-resources.json'+:{
      templating+: {
        list:
          std.map(
            function(item)
              if item.name == 'cluster' then
                item+ {
                  query: 'label_values(loki_build_info, monitoring_stack)'
                }
              else
              if item.name == 'namespace' then
                item+ {
                  query: 'label_values(loki_build_info{monitoring_stack=~"$cluster"}, namespace)'
                }
              else
                item,
            super.list
          ),
      },
    },
    'loki-reads.json'+:{
      clusterLabel:: 'monitoring_stack',
      matchers:: {
        cortexgateway: [utils.selector.re('job', 'cortex-gw')],
        queryFrontend: [utils.selector.re('job', 'loki-query-frontend')],
        querier: [utils.selector.re('job', 'loki-querier')],
        ingester: [utils.selector.re('job', 'loki-ingester')],
        querierOrIndexGateway: [utils.selector.re('job', 'loki-(querier|index-gateway)')],
      },
      templating+: {
        list:
          std.map(
            function(item)
              if item.name == 'cluster' then
                item+ {
                  query: 'label_values(loki_build_info, monitoring_stack)'
                }
              else
              if item.name == 'namespace' then
                item+ {
                  query: 'label_values(loki_build_info{monitoring_stack=~"$cluster"}, namespace)'
                }
              else
                item,
            super.list
          ),
      },
    },
    'loki-retention.json'+:{
      templating+: {
        list:
          std.map(
            function(item)
              if item.name == 'cluster' then
                item+ {
                  query: 'label_values(loki_build_info, monitoring_stack)'
                }
              else
              if item.name == 'namespace' then
                item+ {
                  query: 'label_values(loki_build_info{monitoring_stack=~"$cluster"}, namespace)'
                }
              else
                item,
            super.list
          ),
      },
    },
    'loki-writes-resources.json'+:{
      templating+: {
        list:
          std.map(
            function(item)
              if item.name == 'cluster' then
                item+ {
                  query: 'label_values(loki_build_info, monitoring_stack)'
                }
              else
              if item.name == 'namespace' then
                item+ {
                  query: 'label_values(loki_build_info{monitoring_stack=~"$cluster"}, namespace)'
                }
              else
                item,
            super.list
          ),
      },
    },
    'loki-writes.json'+:{
      clusterLabel:: 'monitoring_stack',
      matchers:: {
        cortexgateway: [utils.selector.re('job', 'cortex-gw')],
        distributor: [utils.selector.re('job', 'loki-distributor')],
        ingester: [utils.selector.re('job', 'loki-ingester')],
      },
      templating+: {
        list:
          std.map(
            function(item)
              if item.name == 'cluster' then
                item+ {
                  query: 'label_values(loki_build_info, monitoring_stack)'
                }
              else
              if item.name == 'namespace' then
                item+ {
                  query: 'label_values(loki_build_info{monitoring_stack=~"$cluster"}, namespace)'
                }
              else
                item,
            super.list
          ),
      },
    },
  }
}
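
For reference, one way to render the overridden dashboards to individual JSON files from the snippet above (an untested sketch; it assumes the snippet is saved as config.libsonnet and writes one file per dashboard key):

mkdir -p dashboards_out
jsonnet -J vendor -m dashboards_out -e '(import "config.libsonnet").grafanaDashboards'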

Originally posted by @DaveOHenry in #1978 (comment)

@Ottovsky

Hey, thanks a lot for this. I have hit the same problem with the loki-mixin not being suitable for loki-distributed out of the box.

@stale

stale bot commented Mar 2, 2022

Hi! This issue has been automatically marked as stale because it has not had any
activity in the past 30 days.

We use a stalebot among other tools to help manage the state of issues in this project.
A stalebot can be very useful in closing issues in a number of cases; the most common
is closing issues or PRs where the original reporter has not responded.

Stalebots are also emotionless and cruel and can close issues which are still very relevant.

If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry.

We regularly sort for closed issues which have a stale label sorted by thumbs up.

We may also:

  • Mark issues as revivable if we think it's a valid issue but isn't something we are likely
    to prioritize in the future (the issue will still remain closed).
  • Add a keepalive label to silence the stalebot if the issue is very common/popular/important.

We are doing our best to respond, organize, and prioritize all issues but it can be a challenging task,
our sincere apologies if you find yourself at the mercy of the stalebot.

@DaveOHenry
Contributor Author

DaveOHenry commented Mar 2, 2022 via email

@mordp1

mordp1 commented Mar 17, 2022

I had to override almost the complete dashboard definitions to make them work here. I needed to review and adapt some dashboards; from my perspective this loki-mixin is not very clear.
Let me share a few things I found along the way:

  1. "($namespace)/$job" = i used e.g. job=~".+-ingester.+"
  2. Me too
  3. Yeap, all containers="loki", i changed dashboards to looking for pod (e.g. pod=~".+-ingester.+")
  4. cortex-gw I suppose it's ingress. I use here nginx to do this, so i just delete the lines where i see job=~"($namespace)/cortex-gw", some dashboards looking for "loki_request_duration_seconds_count".
    the dashboard "Memory (go heap in use)" for gateway, I changed to collect nginx_ingress_controller_requests so the name now it is "Ingress Request"

Extra:

  • In some dashboards I saw [$__rate_interval], which I had to change to [1m] or [5m] to get them working
  • Almost all dashboards use job_route:loki_request_duration_seconds_bucket:sum_rate. I do not know why, but removing the prefix (job_route:) and the suffix (:sum_rate) works here (e.g. just use loki_request_duration_seconds_bucket); see the sketch after this list
  • I used Prometheus to understand the metrics and what other labels we have, and adjusted the queries accordingly
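
For example, roughly (a sketch with a placeholder job selector, not the exact dashboard query):

# recorded metric used by the dashboards (only exists if the recording rules are loaded)
histogram_quantile(0.99, sum by (le) (job_route:loki_request_duration_seconds_bucket:sum_rate{job=~".+-ingester.+"}))

# approximate raw-metric equivalent without the recording rules
histogram_quantile(0.99, sum by (le) (rate(loki_request_duration_seconds_bucket{job=~".+-ingester.+"}[5m])))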

@pgassmann
Contributor

The job_route:loki_request_duration_seconds_bucket:sum_rate metrics are generated by the recording rules. These rules can also be generated from this mixin:

jsonnet -J vendor -S -e 'std.manifestYamlDoc((import "loki-mixin/recording_rules.libsonnet").prometheusRules)' > recording_rules.yml
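
The generated file should contain rules roughly of this shape (an abbreviated sketch, not the exact output; group name and rate window may differ):

groups:
  - name: loki_rules
    rules:
      - record: job_route:loki_request_duration_seconds_bucket:sum_rate
        expr: sum(rate(loki_request_duration_seconds_bucket[1m])) by (le, job, route)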

@stale

stale bot commented Apr 27, 2022

Hi! This issue has been automatically marked as stale because it has not had any activity in the past 30 days.

@DaveOHenry
Contributor Author

Dear stale bot. Please keep this issue open.

@irizzant
Contributor

@DaveOHenry I'm hitting the same issue with the mixins.

Why not file a PR with the suggested jsonnet updates?

@DaveOHenry
Contributor Author

DaveOHenry commented May 17, 2022

I guess it's not as easy as it sounds, because there are many different functions and techniques used in the various dashboards.
My overrides are just another layer of configuration that works (kind of) in my environment, so it is not possible to just pack them into a PR.
I mainly posted the jsonnet code to
a) hopefully save people some headaches,
b) raise awareness of the current state of the loki-mixin, and
c) show the main problems that many people will certainly encounter.

I literally spent hours trying to figure out how the mixin is supposed to work, but I still don't know jsonnet well enough to implement the needed changes.
Additionally, there is no consensus about how the loki-mixin should handle the configuration (or whether it should be configurable at all ...).

@irizzant
Contributor

It took me a lot of effort to make it work in jsonnet too, but a PR is a good place to discuss the changes and get these overrides to a point where they could be useful for everyone.

@DaveOHenry
Contributor Author

DaveOHenry commented May 18, 2022 via email

@irizzant
Contributor

Please have a look at #6189; it's the same setup I'm currently using, and the dashboards are working fine.
Feel free to add anything that could be missing there.

@DaveOHenry
Contributor Author

Looks like the linked PR #6189 does not fix anything related to this issue. There is still a lot of tweaking needed to make the mixin work, and there is no hint for the user on how to configure these kinds of things.
#6266 basically describes the same problem and can be considered a duplicate of this issue.

However, some improvements were made in other PRs, namely:
#6383
#5535
#5536

... but the job name in many queries is still hardcoded to something like "($namespace)/$job" or "$namespace/compactor" and I also saw a hardcoded "cluster" label somewhere. I'll try to take a closer look if time permits.

@irizzant
Contributor

irizzant commented Jun 29, 2022

The linked PR #6189 implies that:

  • you don't need any of the jsonnet overrides you reported for the selectors, just add the create_service_monitor parameter to the configuration and the generated ServiceMonitor will do the job
  • Prometheus Rules / Alerts and Grafana dashboards can be automatically generated by the snippet reported in the PR itself.

so it actually does address the problems reported in this issue

@DaveOHenry
Contributor Author

How can a ServiceMonitor object possibly configure the selectors in the loki-mixin for me? I also don't use jsonnet to deploy Loki. So it actually does not fix anything in my case 😢

@irizzant
Contributor

The ServiceMonitor object can very well make the existing dashboards' selectors work out of the box without adding any of the jsonnet you reported above, and indeed it's already doing this job in our current setup.

Of course it involves using Jsonnet (which is, by the way, the recommended installation method), but the very same principle can be applied to a ServiceMonitor generated by a Helm chart.
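
The gist is a relabeling in the scrape configuration that rebuilds the job label as <namespace>/<component>, which is the shape the mixin queries expect. Roughly like this (a sketch with placeholder names and port, not the exact manifest from the PR):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: loki
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: loki   # placeholder selector
  endpoints:
    - port: http-metrics             # placeholder port name
      relabelings:
        # rewrite job to "<namespace>/<container>" so dashboard selectors
        # like job=~"($namespace)/ingester" match
        - sourceLabels: [__meta_kubernetes_namespace, __meta_kubernetes_pod_container_name]
          separator: /
          targetLabel: job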

@DaveOHenry
Contributor Author

DaveOHenry commented Jun 29, 2022 via email

@irizzant
Contributor

irizzant commented Jun 30, 2022

> Just take a look at the thanos or kubernetes mixins. They are easily configurable. It takes a few minutes to set the correct selectors and everything is just working. These mixins do not care if you use jsonnet, Helm or any other method for deploying your infrastructure.

Actually all mixins (thanos included) make use of jsonnet only and they do not provide any valid PrometheusRule or ConfigMap directly applicable to a k8s cluster.
This is why they require tools to transform their content into valid k8s resources, like the compiled Loki manifests reported here.
Depending on what deployment method you use (Helm, Jsonnet) you have to take the appropriate actions to get the k8s resources you need.

Given the compiled manifests, it shouldn't be a problem to include them in Helm charts and also add a ServiceMonitor which computes the right labeling to make Grafana dashboards work as expected.
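
For example (just a sketch; names, file paths, and labels are placeholders), the rendered dashboard JSON files could be shipped in a chart as a ConfigMap that a Grafana dashboard sidecar picks up:

apiVersion: v1
kind: ConfigMap
metadata:
  name: loki-mixin-dashboards
  labels:
    grafana_dashboard: "1"   # label watched by the Grafana sidecar, if you use one
data:
  loki-chunks.json: |-
{{ .Files.Get "dashboards/loki-chunks.json" | indent 4 }}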

I'll see if I have some time to work on this too and file a PR.

@DaveOHenry
Contributor Author

DaveOHenry commented Jun 30, 2022

Most people do not care how these mixins are written. They just need a config file, install a tool and execute 1-2 commands to generate the manifests. See also https://monitoring.mixins.dev

I agree that the ServiceMonitor route works for some cases, but most people have a defined set of labels for all of their applications and they certainly do not want to change that just for this single mixin.
Also keep in mind that there are also users that use a vanilla Prometheus without any operator.

@irizzant
Contributor

> Most people do not care how these mixins are written. They just need a config file, install a tool and execute 1-2 commands to generate the manifests.

I agree, but this depends on the way you use the mixins: if you use jsonnet you can import and use it directly; if you use Helm you first have to generate the manifests.

> I agree that the ServiceMonitor route works for some cases, but most people have a defined set of labels for all of their applications and they certainly do not want to change that just for this single mixin.

I can't see how the ServiceMonitor labeling done in the PR could impact the existing set of labels for their applications at all.
In any case, the labeling is customizable.

> Also keep in mind that there are also users that use a vanilla Prometheus without any operator.

Then they will need custom tools to inject rules and alerts into the Prometheus YAML configuration anyway.
Either way, as you can see, the way you use mixins actually depends on the way you deployed Prometheus and Loki itself.

I totally agree that mixins should be easy to use and, compared to the way it was before, they now are for jsonnet users.
As I wrote, whenever I find time, I'll try to file a PR to help Helm users.
