Loki queries not split across queriers #9195

Closed
gtorre opened this issue Apr 19, 2023 · 13 comments · Fixed by #9471 or #9477

Comments


gtorre commented Apr 19, 2023

TL;DR: How do I split queries across all of the available loki-read pods (queriers)? Do I need to set up the query-frontend as a separate deployment per the docs, or is there something I'm missing in our config?


Hello all! I've recently set up Loki in our environment; we ship logs from 200K+ assets to Loki every 30 minutes or so via the Loki push API. We have two labels, and Loki seems pretty happy right now with regard to writes (our backend is S3). The problem is querying logs. When I search a few hours back, queries are pretty snappy. However, we need to look back as far as 30 days, and this is where we're seeing issues. I believe queries are not being split up and load balanced across the available loki-read pods (queriers).

We're running Loki in simple scalable mode with loki-read and loki-write pods after reading this blog post. There are quite a few Helm charts, but it seems the developers are saying to use the official grafana/loki chart. What I'm confused about is that even though the recommendation seems to be the grafana/loki chart, which creates two StatefulSets and a Deployment for the gateway, I still see the query-frontend mentioned in the official docs, which makes it seem like we should be setting up a separate instance of the query-frontend. However, when I hit the /services endpoint on the loki-read pods, I can see that the query-frontend service is already running: 🤔

ring => Running
query-scheduler => Running
index-gateway-ring => Running
store => Running
querier => Running
index-gateway => Running
compactor => Running
cache-generation-loader => Running
server => Running
query-frontend-tripperware => Running
usage-report => Running
ingester-querier => Running
query-frontend => Running                                <==== 👋 
ruler => Running
memberlist-kv => Running

Here is our config currently:

auth_enabled: false
common:
  path_prefix: /var/loki
  replication_factor: 3
  storage:
    s3: 
      bucketnames: <bucket>
      region: us-east-2
limits_config:
  unordered_writes: true
  max_streams_per_user: 0
  max_global_streams_per_user: 0
  enforce_metric_name: false
  max_cache_freshness_per_query: 10m
  reject_old_samples: false
  reject_old_samples_max_age: 168h
  split_queries_by_interval: 15m
  max_chunks_per_query: 20000000
  max_query_length: 0
  max_query_series: 1000
  cardinality_limit: 1000000
  max_streams_matchers_per_query: 10000
  max_concurrent_tail_requests: 100
  max_entries_limit_per_query: 10000
ingester:
  wal:
    replay_memory_ceiling: 8GB
memberlist:
  join_members:
  - loki-memberlist
schema_config:
  configs:
  - from: "2022-01-11"
    index:
      period: 24h
      prefix: loki_index_
    object_store: s3
    schema: v12
    store: boltdb-shipper
server:
  grpc_listen_port: 9095
  http_listen_port: 3100
  http_server_read_timeout: 300s
  http_server_write_timeout: 60s
  http_server_idle_timeout: 240s
  grpc_server_max_recv_msg_size: 8388608 # default is 4194304 bytes
  grpc_server_max_send_msg_size: 8388608 # default is 4194304 bytes
frontend:
  log_queries_longer_than: 5s
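
For context on what the read path should do with this config: the query-frontend splits a range query into subqueries of length split_queries_by_interval and hands them to the query-scheduler, so the 15m value above should turn a 2-day query into roughly 192 subqueries (48h / 15m) for the loki-read pods to pull. A minimal, hedged sketch of the splitting-related limits (max_query_parallelism is not in the config above; the value shown is only illustrative):

limits_config:
  # Each range query is split into 15m subqueries by the query-frontend
  # before scheduling; a 48h query therefore yields ~192 subqueries.
  split_queries_by_interval: 15m
  # Not set in the config above; caps how many of those subqueries are
  # scheduled in parallel for a single query (illustrative value).
  max_query_parallelism: 32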

When we query for two days of logs, we get this error:

Get "http://loki-gateway.loki.svc:80/loki/api/v1/query_range?direction=backward&end=1681912422566000000&limit=10000000&query=%7Blog%3D%22system%22%7D+%7C%3D+%22asset_tag%3Da1e2f8%22&start=1681739622566000000&step=60000ms": net/http: request canceled (Client.Timeout exceeded while awaiting headers)

During this query ☝️ some of the queriers weren't doing anything at all:

[screenshot omitted]

In addition, when I look through the logs, I would expect each querier to perform a small query different from the other queriers, i.e. I should see different start_delta and end_delta values in each query, but instead I see the exact same queries being performed by the queriers.

Finally, here are our loki-read pod resources:

- op: replace
  path: /spec/replicas
  value: 9
- op: replace
  path: /spec/template/spec/containers/0/resources
  value:
    requests:
      cpu: 3000m
      memory: 6Gi
    limits:
      cpu: 3000m
      memory: 8Gi
@liguozhong (Contributor)

querier:
  max_concurrent: 5

This configuration should help you get a more balanced load; this is a scheduling problem. Once you look at this setting, I believe you will understand what is going on.
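
For readers following along: this knob sits under the top-level querier block of the Loki config (or the equivalent Helm values). A hedged sketch with an illustrative value:

querier:
  # Maximum number of subqueries one loki-read pod executes at a time.
  # A lower value makes each pod fill up sooner, so queued subqueries
  # spill over to the other read pods instead of piling onto a few.
  max_concurrent: 4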


gtorre commented Apr 24, 2023

@liguozhong thanks for your feedback here. While waiting for a response from the community, I tried a few more things including what you suggested:

      query_range:
        align_queries_with_step: true
        parallelise_shardable_queries: true # false increases RAM usage!!
        max_retries: 5
        cache_results: true
        results_cache:
          cache:
            # We're going to use the in-process "FIFO" cache
            enable_fifocache: true
            fifocache:
              size: 1024
              validity: 24h
      frontend:
        log_queries_longer_than: 5s
        compress_responses: true
        max_outstanding_per_tenant: 10000000
        grpc_client_config:
          grpc_compression: snappy
          max_recv_msg_size: 1048576000
          max_send_msg_size: 1048576000
      querier:
        engine:
          timeout: 5m
        max_concurrent: 9
      frontend_worker:
        grpc_client_config:
          grpc_compression: snappy
          max_recv_msg_size: 1048576000
          max_send_msg_size: 1048576000
        parallelism: 24
      query_scheduler:
        max_outstanding_requests_per_tenant: 2160

Unfortunately querier.max_concurrent didn't help at all.

@liguozhong (Contributor)

You set max_concurrent: 9, which is a relatively large value; unless your total number of LogQL queries exceeds 9 * the number of querier pod replicas, it is difficult to see any load balancing.
You can set max_concurrent: 1 or max_concurrent: 2 to get a more balanced result. 5 works well for me; of course, if machine resources are sufficient, 2 is best.
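
To make that sizing argument concrete (the numbers only restate the reasoning above, they are not measurements): with 9 read pods, max_concurrent: 9 allows up to 81 subqueries in flight and lets a single pod pull 9 at a time, so a modest queue can be drained by a few pods before the others see any work; with max_concurrent: 2 each pod caps out at 2, so the scheduler's queue is forced to spread across all pods. A sketch of the lower setting:

querier:
  # 9 read pods x max_concurrent 2 = 18 subqueries in flight cluster-wide,
  # shared across all pods rather than absorbed by a few of them.
  max_concurrent: 2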

@luddskunk

Any updates @gtorre on solving your issue? I'm struggling with the same thing and any insights you can share would be valuable! :)


gtorre commented May 4, 2023

@luddskunk no progress, unfortunately; we're now looking at alternatives.

@sandeepsukhani (Contributor)

@gtorre sorry for the delay in responding. I looked into the issue and found the root cause. I am working on a temporary fix with config changes until the permanent fix is out. I will keep posting progress here.

@luddskunk

@sandeepsukhani Thanks for letting us know and for getting back to it. Do you have any details on the root cause to share?


sandeepsukhani commented May 15, 2023

The issue is in the code that initializes queriers on read nodes, which tries to access the ring client instance from the query_scheduler instance. However, with SSD, the query_scheduler runs on the backend nodes, not the read nodes. This causes us to hit this condition and only process queries on a single read node. If you are hit by the same issue, you will see this log line in the initial logs of the read nodes.

@luddskunk

@sandeepsukhani
You're very much correct; I can confirm that I am seeing these warnings with the Simple Scalable Deployment in my Loki logs! Okay, so something happened during the backend/read split re-design. That would explain it! Nice find, and hopefully we can continue to use this mode.

It would be interesting if we could explicitly set this somewhere in the Helm chart values file or similar.

@gtorre-cs

Thanks @sandeepsukhani! Please keep us posted 🤞

sandeepsukhani added a commit that referenced this issue May 22, 2023
…-scheduler replicas (#9477)

**What this PR does / why we need it**:
Currently, we have a bug in our code when running Loki in SSD mode and
using the ring for query-scheduler discovery. It causes queries to not
be distributed to all the available read pods. I have explained the
issue in detail in [the PR which fixes the
code](#9471).

Since this bug causes a major query performance impact and a code release
might take time, in this PR we are doing a new Helm release which fixes
the issue by using the k8s Service for discovering `query-scheduler`
replicas.

**Which issue(s) this PR fixes**:
Fixes #9195
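
For anyone applying the same interim approach by hand rather than through the chart, a hedged sketch of DNS-based query-scheduler discovery in an SSD config. The Service name below is a made-up placeholder (not necessarily what the Helm chart creates), and the port matches the grpc_listen_port from the config earlier in this issue:

frontend:
  # Query-frontend connects to the query-scheduler at a fixed address
  # (a k8s Service) instead of discovering it through the ring.
  scheduler_address: loki-query-scheduler-discovery.loki.svc.cluster.local:9095
frontend_worker:
  # Querier workers pull work from the same scheduler endpoint.
  scheduler_address: loki-query-scheduler-discovery.loki.svc.cluster.local:9095
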
@sandeepsukhani (Contributor)

Hey folks, we have released a new Helm chart version, 5.5.3, which should fix the issue. Please reopen this issue if you still face any problems with query distribution to all the available read pods. You can verify that the fix is working by looking at the cortex_query_scheduler_connected_frontend_clients and cortex_query_scheduler_connected_querier_clients metrics from the backend pods.

@luddskunk

Just wanted to let you know I tried it out and I see a dramatic increase in query performance. Thanks a lot for your efforts here @sandeepsukhani!

sandeepsukhani added a commit that referenced this issue Jun 6, 2023
**What this PR does / why we need it**:
When we run the `query-scheduler` in `ring` mode, `queriers` and
`query-frontend` discover the available `query-scheduler` instances
using the ring. However, we have a problem when `query-schedulers` are
not running in the same process as queriers and query-frontend since [we
try to get the ring client interface from the scheduler
instance](https://github.com/grafana/loki/blob/abd6131bba18db7f3575241c5e6dc4eed879fbc0/pkg/loki/modules.go#L358).

This causes queries not to be spread across all the available queriers
when running in SSD mode because [we point querier workers to query
frontend when there is no ring client and scheduler address
configured](https://github.com/grafana/loki/blob/b05f4fced305800b32641ae84e3bed5f1794fa7d/pkg/querier/worker_service.go#L115).

I have fixed this issue by adding a new hidden target to initialize the
ring client in `reader`/`member` mode based on which service is
initializing it. `reader` mode will be used by `queriers` and
`query-frontend` for discovering `query-scheduler` instances from the
ring. `member` mode will be used by `query-schedulers` for registering
themselves in the ring.

I have also made a couple of changes not directly related to the issue
but they fix some problems:
* [reset metric registry for each integration
test](18c4fe5)
- Previously we were reusing the same registry for all the tests and
just [ignored the attempts to register the same
metrics](https://github.com/grafana/loki/blob/01f0ded7fcb57e3a7b26ffc1e8e3abf04a403825/integration/cluster/cluster.go#L113).
This caused the registry to have metrics registered only from the first
test, so any updates from subsequent tests would not be reflected in the
metrics. Metrics were the only reliable way for me to verify that
`query-schedulers` were connected to `queriers` and `query-frontend`
when running in ring mode in the integration test that I added to cover
my changes. This should also help with other tests where it was
previously hard to reliably check the metrics.
* [load config from cli as well before applying dynamic
config](f9e2448)
- Previously we were applying dynamic config based only on the config
from the config file. This resulted in unexpected config changes; for
example, [this config
change](https://github.com/grafana/loki/blob/4148dd2c51cb827ec3889298508b95ec7731e7fd/integration/loki_micro_services_test.go#L66)
was getting ignored and [dynamic config tuning was unexpectedly turning
on ring
mode](https://github.com/grafana/loki/blob/52cd0a39b8266564352c61ab9b845ab597008770/pkg/loki/config_wrapper.go#L94)
in the config. It is better to do any config tuning based on both the
config file and CLI args.

**Which issue(s) this PR fixes**:
Fixes #9195
@emadolsky

Hi! As a result of 79b876b, the configuration for SSD mode is gone...
The problem is relevant again.
