Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

doc: Add shuffle sharding doc #6173

Merged
merged 15 commits into from
Jul 21, 2022
Merged

doc: Add shuffle sharding doc #6173

merged 15 commits into from
Jul 21, 2022

Conversation

kavirajk
Copy link
Contributor

@kavirajk kavirajk commented May 17, 2022

Signed-off-by: Kaviraj kavirajkanagaraj@gmail.com

What this PR does / why we need it:
Add shuffle sharding doc for Loki.

  1. What is it?
  2. Why we need it on Loki?
  3. Key Configs
  4. Key metrics

And some operational advice.

Which issue(s) this PR fixes:
Fixes #NA

Special notes for your reviewer:
Markdown preview: https://github.com/grafana/loki/blob/0dcc6034ea927b16ff9a79eb71100b98a736da78/docs/sources/operations/shuffle-sharding.md

Checklist

  • Documentation added
  • Is this an important fix or new feature? Add an entry in the CHANGELOG.md.

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
@grafanabot
Copy link
Collaborator

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0.3%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
@grafanabot
Copy link
Collaborator

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

Copy link
Contributor

@sandeepsukhani sandeepsukhani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall it looks great on technical explanation. There are some suggestions by Dan and I will leave it to Karen for suggesting any corrections in writing.

The maximum number of queriers can be overridden on a per-tenant basis in the limits overrides configuration (`max_queriers_per_tenant`).

### The impact of “query of death”
In the event a tenant is repeatedly sending a “query of death” which leads the querier to crash or getting killed because of out-of-memory, the crashed querier will get disconnected from the query-frontend and a new querier will be immediately assigned to the tenant’s shard. This practically invalidates the assumption that shuffle-sharding can be used to contain the blast radius in case of a query of death.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can add a line explaining a bit more why that is a problem by mentioning that the tenant would get assigned to other running queries to match the max_queriers_per_tenant and might eventually affect other tenant shards which beats the purpose of this feature.

kavirajk and others added 2 commits May 26, 2022 15:33
Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>
Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>
kavirajk and others added 4 commits May 26, 2022 15:33
Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>
Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>
Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>
Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>
@grafanabot
Copy link
Collaborator

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
-               loki	-0.6%

@grafanabot
Copy link
Collaborator

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

@grafanabot
Copy link
Collaborator

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

-           ingester	-0.1%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

@grafanabot
Copy link
Collaborator

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
-        distributor	-0.3%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>
Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>
@grafanabot
Copy link
Collaborator

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

1 similar comment
@grafanabot
Copy link
Collaborator

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

@grafanabot
Copy link
Collaborator

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

Copy link
Contributor

@sandeepsukhani sandeepsukhani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kavirajk
Copy link
Contributor Author

kavirajk commented Jun 2, 2022

@KMiller-Grafana can you take a look at the doc when you get a chance?

@KMiller-Grafana KMiller-Grafana added the type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories label Jul 20, 2022
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
@pull-request-size pull-request-size bot added size/L and removed size/M labels Jul 21, 2022
Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
Copy link
Contributor

@jeschkies jeschkies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the write up. It help me understand shuffle sharding better 🙂

Copy link
Contributor

@KMiller-Grafana KMiller-Grafana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small change please!

docs/sources/operations/shuffle-sharding.md Outdated Show resolved Hide resolved
Co-authored-by: Karen Miller <84039272+KMiller-Grafana@users.noreply.github.com>
@grafanabot
Copy link
Collaborator

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

Copy link
Contributor

@KMiller-Grafana KMiller-Grafana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much.

@KMiller-Grafana KMiller-Grafana added the backport release-2.6.x Tag a PR with this label to create a PR which cherry pics it into the release-2.6.x branch label Jul 21, 2022
@grafanabot
Copy link
Collaborator

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

@kavirajk kavirajk merged commit a1e75da into main Jul 21, 2022
@kavirajk kavirajk deleted the kavirajk/doc-shuffle-sharding branch July 21, 2022 21:05
grafanabot pushed a commit that referenced this pull request Jul 21, 2022
* doc: Add shuffle sharding doc

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

* Add query-scheduler config as well

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>

* Add line explaining what is "query-of-death" exactly means

* PR remarsk from Karen

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

* Fix image URL

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Karen Miller <84039272+KMiller-Grafana@users.noreply.github.com>

Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>
Co-authored-by: Karen Miller <84039272+KMiller-Grafana@users.noreply.github.com>
(cherry picked from commit a1e75da)
vlad-diachenko pushed a commit that referenced this pull request Jul 22, 2022
* doc: Add shuffle sharding doc

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

* Add query-scheduler config as well

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>

* Add line explaining what is "query-of-death" exactly means

* PR remarsk from Karen

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

* Fix image URL

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Karen Miller <84039272+KMiller-Grafana@users.noreply.github.com>

Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>
Co-authored-by: Karen Miller <84039272+KMiller-Grafana@users.noreply.github.com>
(cherry picked from commit a1e75da)

Co-authored-by: Kaviraj Kanagaraj <kavirajkanagaraj@gmail.com>
lxwzy pushed a commit to lxwzy/loki that referenced this pull request Nov 7, 2022
* doc: Add shuffle sharding doc

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

* Add query-scheduler config as well

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>

* Add line explaining what is "query-of-death" exactly means

* PR remarsk from Karen

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

* Fix image URL

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

* Update docs/sources/operations/shuffle-sharding.md

Co-authored-by: Karen Miller <84039272+KMiller-Grafana@users.noreply.github.com>

Co-authored-by: Dan Mihai Dinu <33793329+danmdinu@users.noreply.github.com>
Co-authored-by: Karen Miller <84039272+KMiller-Grafana@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport release-2.6.x Tag a PR with this label to create a PR which cherry pics it into the release-2.6.x branch size/L type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants