Loki queries not split across queriers #9195

Closed
gtorre opened this issue Apr 19, 2023 · 13 comments · Fixed by #9471 or #9477

Comments


gtorre commented Apr 19, 2023

TL;DR: How do I split queries across all of the available loki-read pods (queriers)? Do I need to set up the query-frontend as a separate deployment per the docs, or is there something I'm missing in our config?


Hello all! I've recently set up Loki in our environment; we ship logs from 200K+ assets to Loki every 30 minutes or so via the Loki push API. We have two labels, and Loki seems pretty happy right now with regard to writes (our backend is S3). The problem is querying logs. When I search a few hours back, queries are pretty snappy. However, we need to look back as far as 30 days, and this is where we're seeing issues. I believe queries are not being split up and load balanced across the available loki-read pods (queriers).

We're running Loki in simple scalable mode with loki-read and loki-write pods after reading this blog post. There are quite a few Helm charts, but it seems the developers are saying to use the official grafana/loki chart. What I'm confused about is that even though the recommendation seems to be the grafana/loki chart, which creates two StatefulSets and a Deployment for the gateway, I still see the query-frontend mentioned in the official docs, which makes it seem like we should be setting up a separate instance of the query-frontend. However, when I hit the /services endpoint on the loki-read pods, I can see that the query-frontend service is already running: 🤔

ring => Running
query-scheduler => Running
index-gateway-ring => Running
store => Running
querier => Running
index-gateway => Running
compactor => Running
cache-generation-loader => Running
server => Running
query-frontend-tripperware => Running
usage-report => Running
ingester-querier => Running
query-frontend => Running                                <==== 👋 
ruler => Running
memberlist-kv => Running

Here is our config currently:

auth_enabled: false
common:
  path_prefix: /var/loki
  replication_factor: 3
  storage:
    s3: 
      bucketnames: <bucket>
      region: us-east-2
limits_config:
  unordered_writes: true
  max_streams_per_user: 0
  max_global_streams_per_user: 0
  enforce_metric_name: false
  max_cache_freshness_per_query: 10m
  reject_old_samples: false
  reject_old_samples_max_age: 168h
  split_queries_by_interval: 15m
  max_chunks_per_query: 20000000
  max_query_length: 0
  max_query_series: 1000
  cardinality_limit: 1000000
  max_streams_matchers_per_query: 10000
  max_concurrent_tail_requests: 100
  max_entries_limit_per_query: 10000
ingester:
  wal:
    replay_memory_ceiling: 8GB
memberlist:
  join_members:
  - loki-memberlist
schema_config:
  configs:
  - from: "2022-01-11"
    index:
      period: 24h
      prefix: loki_index_
    object_store: s3
    schema: v12
    store: boltdb-shipper
server:
  grpc_listen_port: 9095
  http_listen_port: 3100
  http_server_read_timeout: 300s
  http_server_write_timeout: 60s
  http_server_idle_timeout: 240s
  grpc_server_max_recv_msg_size: 8388608 # default is 4194304 bytes
  grpc_server_max_send_msg_size: 8388608 # default is 4194304 bytes
frontend:
  log_queries_longer_than: 5s
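
For context on what the read path should do with this config: the query-frontend splits a range query into subqueries of length split_queries_by_interval and hands them to the query-scheduler, so the 15m value above should turn a 2-day query into roughly 192 subqueries (48h / 15m) for the loki-read pods to pull. A minimal, hedged sketch of the splitting-related limits (max_query_parallelism is not in the config above; the value shown is only illustrative):

limits_config:
  # Each range query is split into 15m subqueries by the query-frontend
  # before scheduling; a 48h query therefore yields ~192 subqueries.
  split_queries_by_interval: 15m
  # Not set in the config above; caps how many of those subqueries are
  # scheduled in parallel for a single query (illustrative value).
  max_query_parallelism: 32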

When we query for two days of logs, we get this error:

Get "http://loki-gateway.loki.svc:80/loki/api/v1/query_range?direction=backward&end=1681912422566000000&limit=10000000&query=%7Blog%3D%22system%22%7D+%7C%3D+%22asset_tag%3Da1e2f8%22&start=1681739622566000000&step=60000ms": net/http: request canceled (Client.Timeout exceeded while awaiting headers)

During this query ☝️ some of the queriers weren't doing anything at all:

[screenshot omitted]

In addition, when I look through the logs, I would expect each querier to perform a small query different from the other queriers, i.e. I should see different start_delta and end_delta values in each query, but instead I see the exact same queries being performed by the queriers.

Finally, here are our loki-read pod resources:

- op: replace
  path: /spec/replicas
  value: 9
- op: replace
  path: /spec/template/spec/containers/0/resources
  value:
    requests:
      cpu: 3000m
      memory: 6Gi
    limits:
      cpu: 3000m
      memory: 8Gi
@liguozhong (Contributor)

querier:
  max_concurrent: 5

This configuration should help you get a more balanced load; this is a scheduling problem. Once you look at this setting, I believe you will understand what is going on.
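
For readers following along: this knob sits under the top-level querier block of the Loki config (or the equivalent Helm values). A hedged sketch with an illustrative value:

querier:
  # Maximum number of subqueries one loki-read pod executes at a time.
  # A lower value makes each pod fill up sooner, so queued subqueries
  # spill over to the other read pods instead of piling onto a few.
  max_concurrent: 4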


gtorre commented Apr 24, 2023

@liguozhong thanks for your feedback here. While waiting for a response from the community, I tried a few more things including what you suggested:

      query_range:
        align_queries_with_step: true
        parallelise_shardable_queries: true # false increases RAM usage!!
        max_retries: 5
        cache_results: true
        results_cache:
          cache:
            # We're going to use the in-process "FIFO" cache
            enable_fifocache: true
            fifocache:
              size: 1024
              validity: 24h
      frontend:
        log_queries_longer_than: 5s
        compress_responses: true
        max_outstanding_per_tenant: 10000000
        grpc_client_config:
          grpc_compression: snappy
          max_recv_msg_size: 1048576000
          max_send_msg_size: 1048576000
      querier:
        engine:
          timeout: 5m
        max_concurrent: 9
      frontend_worker:
        grpc_client_config:
          grpc_compression: snappy
          max_recv_msg_size: 1048576000
          max_send_msg_size: 1048576000
        parallelism: 24
      query_scheduler:
        max_outstanding_requests_per_tenant: 2160

Unfortunately querier.max_concurrent didn't help at all.

@liguozhong (Contributor)

You set max_concurrent: 9, which is a relatively large value; unless your total number of LogQL queries exceeds 9 * the number of querier pod replicas, it is difficult to see any load balancing.
You can set max_concurrent: 1 or max_concurrent: 2 to get a more balanced result. 5 works well for me; of course, if machine resources are sufficient, 2 is best.
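
To make that sizing argument concrete (the numbers only restate the reasoning above, they are not measurements): with 9 read pods, max_concurrent: 9 allows up to 81 subqueries in flight and lets a single pod pull 9 at a time, so a modest queue can be drained by a few pods before the others see any work; with max_concurrent: 2 each pod caps out at 2, so the scheduler's queue is forced to spread across all pods. A sketch of the lower setting:

querier:
  # 9 read pods x max_concurrent 2 = 18 subqueries in flight cluster-wide,
  # shared across all pods rather than absorbed by a few of them.
  max_concurrent: 2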

@luddskunk

Any updates @gtorre on solving your issue? I'm struggling with the same thing and any insights you can share would be valuable! :)


gtorre commented May 4, 2023

@luddskunk no progress, unfortunately; we're now looking at alternatives.

@sandeepsukhani (Contributor)

@gtorre sorry for the delay in responding. I looked into the issue and found the root cause. I am working on a temporary fix with config changes until the permanent fix is out. I will keep posting progress here.

@luddskunk

@sandeepsukhani Thanks for letting us know and for getting back to it. Do you have any details on the root cause to share?


sandeepsukhani commented May 15, 2023

The issue is in the code that initializes queriers on read nodes, which tries to access the ring client instance from the query_scheduler instance. However, with SSD, the query_scheduler runs on the backend nodes, not the read nodes. This causes us to hit this condition and only process queries on a single read node. If you are hit by the same issue, you will see this log line in the initial logs of the read nodes.

@luddskunk

@sandeepsukhani
You're very much correct; I can confirm that I am seeing these warnings with the Simple Scalable Deployment in my Loki logs! Okay, so something happened during the backend/read split re-design. That would explain it! Nice find, and hopefully we can continue to use this mode.

It would be interesting if we could explicitly set this somewhere in the Helm chart values file or similar.

@gtorre-cs

Thanks @sandeepsukhani! Please keep us posted 🤞

sandeepsukhani added a commit that referenced this issue May 22, 2023
…-scheduler replicas (#9477)

**What this PR does / why we need it**:
Currently, we have a bug in our code when running Loki in SSD mode and
using the ring for query-scheduler discovery. It causes queries to not
be distributed to all the available read pods. I have explained the
issue in detail in [the PR which fixes the
code](#9471).

Since this bug causes a major query performance impact and a code release
might take time, in this PR we are doing a new Helm release which fixes
the issue by using the k8s Service for discovering `query-scheduler`
replicas.

**Which issue(s) this PR fixes**:
Fixes #9195
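
For anyone applying the same interim approach by hand rather than through the chart, a hedged sketch of DNS-based query-scheduler discovery in an SSD config. The Service name below is a made-up placeholder (not necessarily what the Helm chart creates), and the port matches the grpc_listen_port from the config earlier in this issue:

frontend:
  # Query-frontend connects to the query-scheduler at a fixed address
  # (a k8s Service) instead of discovering it through the ring.
  scheduler_address: loki-query-scheduler-discovery.loki.svc.cluster.local:9095
frontend_worker:
  # Querier workers pull work from the same scheduler endpoint.
  scheduler_address: loki-query-scheduler-discovery.loki.svc.cluster.local:9095
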
@sandeepsukhani (Contributor)

Hey folks, we have released a new Helm chart version, 5.5.3, which should fix the issue. Please reopen this issue if you still face any problems with query distribution to all the available read pods. You can verify that the fix is working by looking at the cortex_query_scheduler_connected_frontend_clients and cortex_query_scheduler_connected_querier_clients metrics from the backend pods.

@luddskunk

Just wanted to let you know I tried it out and I see a dramatic increase in query performance. Thanks a lot for your efforts here @sandeepsukhani!

sandeepsukhani added a commit that referenced this issue Jun 6, 2023
**What this PR does / why we need it**:
When we run the `query-scheduler` in `ring` mode, `queriers` and
`query-frontend` discover the available `query-scheduler` instances
using the ring. However, we have a problem when `query-schedulers` are
not running in the same process as queriers and query-frontend since [we
try to get the ring client interface from the scheduler
instance](https://github.com/grafana/loki/blob/abd6131bba18db7f3575241c5e6dc4eed879fbc0/pkg/loki/modules.go#L358).

This causes queries not to be spread across all the available queriers
when running in SSD mode because [we point querier workers to query
frontend when there is no ring client and scheduler address
configured](https://github.com/grafana/loki/blob/b05f4fced305800b32641ae84e3bed5f1794fa7d/pkg/querier/worker_service.go#L115).

I have fixed this issue by adding a new hidden target to initialize the
ring client in `reader`/`member` mode based on which service is
initializing it. `reader` mode will be used by `queriers` and
`query-frontend` for discovering `query-scheduler` instances from the
ring. `member` mode will be used by `query-schedulers` for registering
themselves in the ring.

I have also made a couple of changes not directly related to the issue
but they fix some problems:
* [reset metric registry for each integration
test](18c4fe5)
- Previously we were reusing the same registry for all the tests and
just [ignored the attempts to register the same
metrics](https://github.com/grafana/loki/blob/01f0ded7fcb57e3a7b26ffc1e8e3abf04a403825/integration/cluster/cluster.go#L113).
This caused the registry to have metrics registered only from the first
test, so any updates from subsequent tests would not be reflected in the
metrics. Metrics were the only reliable way for me to verify that
`query-schedulers` were connected to `queriers` and `query-frontend`
when running in ring mode in the integration test that I added to cover
my changes. This should also help with other tests where it was
previously hard to reliably check the metrics.
* [load config from cli as well before applying dynamic
config](f9e2448)
- Previously we were applying dynamic config based only on the config
from the config file. This resulted in unexpected config changes; for
example, [this config
change](https://github.com/grafana/loki/blob/4148dd2c51cb827ec3889298508b95ec7731e7fd/integration/loki_micro_services_test.go#L66)
was getting ignored and [dynamic config tuning was unexpectedly turning
on ring
mode](https://github.com/grafana/loki/blob/52cd0a39b8266564352c61ab9b845ab597008770/pkg/loki/config_wrapper.go#L94)
in the config. It is better to do any config tuning based on both the
config file and CLI args.

**Which issue(s) this PR fixes**:
Fixes #9195
@emadolsky

Hi! As a result of 79b876b, the configuration for SSD mode is gone...
The problem is relevant again.
