doc: Add shuffle sharding doc (grafana#6173)
* doc: Add shuffle sharding doc

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

* Add query-scheduler config as well

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>

* Add line explaining what "query-of-death" exactly means

* PR remarks from Karen

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

* Fix image URL

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Karen Miller <84039272+KMiller-Grafana@users.noreply.github.com>

Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>
Co-authored-by: Karen Miller <84039272+KMiller-Grafana@users.noreply.github.com>
3 people authored and lxwzy committed Nov 7, 2022
1 parent 148a204 commit 5e86d66
Showing 2 changed files with 109 additions and 0 deletions.
109 changes: 109 additions & 0 deletions docs/sources/operations/shuffle-sharding.md
@@ -0,0 +1,109 @@
---
title: Shuffle sharding
menuTitle: Shuffle sharding
description: Shuffle sharding can isolate a tenant workload from other tenant workloads, providing a better sharing of resources.
weight: 100
---

# Shuffle sharding

Shuffle sharding is a resource-management technique used to isolate tenant workloads from other tenant workloads, to give each tenant more of a single-tenant experience when running in a shared cluster.
This technique is explained by AWS in their article [Workload isolation using shuffle-sharding](https://aws.amazon.com/builders-library/workload-isolation-using-shuffle-sharding/).
A reference implementation has been shown in the [Route53 Infima library](https://github.com/awslabs/route53-infima/blob/master/src/main/java/com/amazonaws/services/route53/infima/SimpleSignatureShuffleSharder.java).

## The issues that shuffle sharding mitigates

Shuffle sharding can be configured for the query path.

The query path is sharded by default, and the default does not use shuffle sharding.
Each tenant’s query is sharded across all queriers, so the workload uses all querier instances.

In a multi-tenant cluster, sharding across all instances of a component may exhibit these issues:

- Any outage of a component instance affects all tenants
- A misbehaving tenant affects all other tenants

An individual query may create issues for all tenants.
A single tenant or a group of tenants may issue an expensive query:
one that causes a querier component to hit an out-of-memory error,
or one that causes a querier component to crash.
Once the error occurs,
the tenant or tenants issuing the error-causing query will be reassigned
to other running queriers,
up to the limit imposed by the `max_queriers_per_tenant` configuration.
This, in turn, may affect the queriers to which those tenants have been reassigned.

## How shuffle sharding works

The idea of shuffle sharding is to assign each tenant to a shard composed of a subset of the Loki queriers, aiming to minimize the overlap of instances between distinct tenants.

A misbehaving tenant will affect only its shard's queriers. Due to the low overlap of queriers among tenants, only a small subset of tenants will be affected by the misbehaving tenant.
Shuffle sharding requires no more resources than the default sharding strategy.
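
To make the tenant-to-shard assignment concrete, here is a minimal, hypothetical sketch of the technique in Go: each tenant's shard is chosen by shuffling the querier list with a seed derived from the tenant ID, so the same tenant always maps to the same small subset. This is only an illustration of the idea, not Loki's actual implementation.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
)

// pickShard returns shardSize queriers for a tenant, chosen by shuffling the
// querier list with a seed derived from the tenant ID. The same tenant always
// gets the same shard, and distinct tenants rarely get heavily overlapping shards.
func pickShard(tenantID string, queriers []string, shardSize int) []string {
	h := fnv.New64a()
	h.Write([]byte(tenantID))
	r := rand.New(rand.NewSource(int64(h.Sum64())))

	shuffled := append([]string(nil), queriers...)
	r.Shuffle(len(shuffled), func(i, j int) {
		shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
	})
	if shardSize > len(shuffled) {
		shardSize = len(shuffled)
	}
	return shuffled[:shardSize]
}

func main() {
	queriers := make([]string, 50)
	for i := range queriers {
		queriers[i] = fmt.Sprintf("querier-%d", i)
	}
	// With 50 queriers and a shard size of 4, each tenant sees only 4 queriers.
	fmt.Println(pickShard("tenant-a", queriers, 4))
	fmt.Println(pickShard("tenant-b", queriers, 4))
}
```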

Shuffle sharding does not fix all issues.
If a tenant repeatedly sends a problematic query, the crashed querier
will be disconnected from the query-frontend, and a new querier
will be immediately assigned to the tenant’s shard.
This invalidates the positive effects of shuffle sharding.
In this case,
the situation can be improved by configuring a delay between when a querier disconnects because of a crash
and when the crashed querier is actually removed from the tenant’s shard
and replaced by another healthy querier.
A delay of 1 minute may be a reasonable value; set it in
the query-frontend with the configuration parameter
`-query-frontend.querier-forget-delay=1m`, and in the query-scheduler with the configuration parameter
`-query-scheduler.querier-forget-delay=1m`.
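
For example, both delays can be set to the 1-minute value mentioned above (shown here as command-line flags; adjust the value to your cluster):

```
# on the query-frontend
-query-frontend.querier-forget-delay=1m

# on the query-scheduler
-query-scheduler.querier-forget-delay=1m
```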

### Low probability of overlapping instances

If an example Loki cluster runs 50 queriers and assigns each tenant 4 out of the 50 queriers, shuffling the assigned instances for each tenant, there are 230,300 possible combinations (50 choose 4).

Statistically, randomly picking two distinct tenants, there is:

- a 71% chance that they will not share any instance
- a 26% chance that they will share only 1 instance
- a 2.7% chance that they will share 2 instances
- a 0.08% chance that they will share 3 instances
- only a 0.0004% chance that their instances will fully overlap

![overlapping instances probability](../shuffle-sharding-probability.png)
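
These percentages follow from a hypergeometric count: given one tenant's shard, the number of possible shards for a second tenant that share exactly k queriers with it is C(4, k) * C(46, 4 - k), out of C(50, 4) = 230,300 shards in total. The following small, hypothetical Go snippet reproduces the figures:

```go
package main

import (
	"fmt"
	"math/big"
)

// binom returns the binomial coefficient C(n, k) as a big.Float.
func binom(n, k int64) *big.Float {
	return new(big.Float).SetInt(new(big.Int).Binomial(n, k))
}

func main() {
	total := binom(50, 4) // 230,300 possible 4-querier shards out of 50 queriers

	// P(two independently chosen shards share exactly k queriers)
	//   = C(4, k) * C(46, 4-k) / C(50, 4)
	for k := int64(0); k <= 4; k++ {
		ways := new(big.Float).Mul(binom(4, k), binom(46, 4-k))
		p, _ := new(big.Float).Quo(ways, total).Float64()
		fmt.Printf("share %d querier(s): %.4f%%\n", k, p*100)
	}
}
```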

## Configuration

Enable shuffle sharding by setting `-frontend.max-queriers-per-tenant` to a value higher than 0 and lower than the number of available queriers.
The value of the per-tenant configuration
`max_queriers_per_tenant` sets the quantity of allocated queriers.
This option is only available when using the query-frontend, with or without a scheduler.
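
For example, with the 50-querier cluster described above, shuffle sharding could be enabled with a default shard size of 4 queriers per tenant (the value is illustrative):

```
-frontend.max-queriers-per-tenant=4
```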

The per-tenant configuration parameter
`max_query_parallelism` describes how many subqueries, after query splitting and query sharding, can be scheduled to run at the same time for each request of any tenant.

Configuration parameter
`querier.concurrency` controls the quantity of worker threads (goroutines) per single querier.

The maximum number of queriers can be overridden on a per-tenant basis in the limits overrides configuration by `max_queriers_per_tenant`.
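
As a sketch, a per-tenant override might look like the following in the runtime overrides file; the `overrides` block is the standard per-tenant limits mechanism, while the tenant name and values here are purely illustrative (verify the exact keys against the limits configuration reference for your Loki version):

```yaml
overrides:
  "tenant-a":
    max_queriers_per_tenant: 4
    max_query_parallelism: 32
```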

## Shuffle sharding metrics

These metrics reveal information relevant to shuffle sharding:

- the overall query-scheduler queue duration, `cortex_query_scheduler_queue_duration_seconds_*`

- the query-scheduler queue length per tenant, `cortex_query_scheduler_queue_length`

- the query-scheduler queue duration per tenant can be found with this query:
```
max_over_time({cluster="$cluster",container="query-frontend", namespace="$namespace"} |= "metrics.go" | logfmt | unwrap duration(queue_time) | __error__="" [5m]) by (org_id)
```

Too many spikes in any of these metrics may imply:

- A particular tenant is trying to use more query resources than they were allocated.
- That tenant may need an increase in the value of `max_queriers_per_tenant`.
- Loki instances may be under-provisioned.

A useful query checks how many queriers are being used by each tenant:

```
count by (org_id) (sum by (org_id, pod) (count_over_time({job="$namespace/querier", cluster="$cluster"} |= "metrics.go" | logfmt [$__interval])))
```
