
Improve auto-sharding documentation #1559

Merged · 1 commit into kubernetes:master on Sep 21, 2021

Conversation

fpetkovski
Contributor

fpetkovski commented Aug 27, 2021

What this PR does / why we need it:
This PR improves the documentation on auto-sharding. It explains the pros and cons of using auto-sharding vs manual sharding.

How does this change affect the cardinality of KSM: (increases, decreases or does not change cardinality)
Does not affect it

Which issue(s) this PR fixes:
Fixes #1546

k8s-ci-robot added the labels cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.) and size/S (Denotes a PR that changes 10-29 lines, ignoring generated files.) on Aug 27, 2021
@@ -43,7 +43,7 @@ are deleted they are no longer visible on the `/metrics` endpoint.
- [kube-state-metrics vs. metrics-server](#kube-state-metrics-vs-metrics-server)
- [Scaling kube-state-metrics](#scaling-kube-state-metrics)
- [Resource recommendation](#resource-recommendation)
- [Horizontal scaling (sharding)](#horizontal-scaling-sharding)
Contributor Author

I think horizontal scaling can be a bit confusing since it might indicate that all KSM replicas are the same.

README.md Outdated

There is also an experimental feature, that allows kube-state-metrics to auto discover its nominal position if it is deployed in a StatefulSet, in order to automatically configure sharding. This is an experimental feature and may be broken or removed without notice.
KSM supports a feature which allows each shard to discover its nominal position when deployed in a StatefulSet, which is useful for automatically configuring sharding. This is an experimental feature and may be broken or removed without notice.
Contributor Author

@brancz do you have any background information on why this feature was marked as experimental? Is it simply because it never got promoted to stable?

Member

@kubernetes/kube-state-metrics-maintainers should we start marking this as non-experimental with the next release to get more users test it? Or do we have plans to remove it at some point?
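To make the feature discussed in this thread concrete: when kube-state-metrics runs in a StatefulSet, every pod name ends in an ordinal, and that ordinal can serve as the shard index. The Go sketch below only illustrates that ordinal-parsing idea; the function name and the hard-coded pod name are hypothetical, and this is not the project's actual auto-sharding code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// shardFromPodName extracts a nominal shard index from a StatefulSet pod name
// such as "kube-state-metrics-2". StatefulSet pods carry their ordinal as a
// dash-separated suffix, so the last token is taken as the shard index.
// Illustrative sketch only; not the kube-state-metrics implementation.
func shardFromPodName(podName string) (int, error) {
	i := strings.LastIndex(podName, "-")
	if i < 0 || i == len(podName)-1 {
		return 0, fmt.Errorf("pod name %q has no ordinal suffix", podName)
	}
	return strconv.Atoi(podName[i+1:])
}

func main() {
	// In a real deployment the pod name would be injected (e.g. via the
	// downward API) rather than hard-coded as it is here.
	shard, err := shardFromPodName("kube-state-metrics-2")
	if err != nil {
		panic(err)
	}
	// The total shard count would come from the StatefulSet's replica count.
	fmt.Printf("this replica acts as shard %d\n", shard)
}
```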

@fpetkovski
Contributor Author

/assign @mrueg

@mrueg
Member

mrueg commented Sep 13, 2021

Thanks for working on this @fpetkovski! I added some comments

@SuperQ as you asked about more information, is this doc change helpful for you?

README.md Outdated

There is also an experimental feature, that allows kube-state-metrics to auto discover its nominal position if it is deployed in a StatefulSet, in order to automatically configure sharding. This is an experimental feature and may be broken or removed without notice.
KSM supports a feature which allows each shard to discover its nominal position when deployed in a StatefulSet, which is useful for automatically configuring sharding. This is an experimental feature and may be broken or removed without notice.
Member

Do we want to introduce KSM as an abbreviation for the project or stick with "kube-state-metrics"?

Contributor Author

Good point, I used it as second nature 😄. I added the abbreviation next to the first reference of kube-state-metrics at the beginning of the readme. I think it is commonly used in the community, so IMO it makes sense to have it in the docs. Wdyt?

Contributor

If you change the sentence structure a bit, you could write it like this.

Automatic sharding allows each shard to discover its nominal position

This avoids needing to be overly self-referential about the fact that this is a kube-state-metrics feature.

Contributor Author

Thanks for the suggestion, applied

README.md Outdated

* `--shard` (zero indexed)
* `--total-shards`

Sharding is done by taking an md5 sum of the Kubernetes Object's UID and performing a modulo operation on it, with the total number of shards. The configured shard decides whether the object is handled by the respective instance of kube-state-metrics or not. Note that this means all instances of kube-state-metrics even if sharded will have the network traffic and the resource consumption for unmarshaling objects for all objects, not just the ones it is responsible for. To optimize this further, the Kubernetes API would need to support sharded list/watch capabilities. Overall memory consumption should be 1/n th of each shard compared to an unsharded setup. Typically, kube-state-metrics needs to be memory and latency optimized in order for it to return its metrics rather quickly to Prometheus.
Sharding is done by taking an md5 sum of the Kubernetes Object's UID and performing a modulo operation on it with the total number of shards. Each shard decides whether the object is handled by the respective instance of kube-state-metrics or not. Note that this means all instances of kube-state-metrics, even if sharded, will have the network traffic and the resource consumption for unmarshaling objects for all objects, not just the ones they are responsible for. To optimize this further, the Kubernetes API would need to support sharded list/watch capabilities. In the optimal case, memory consumption for each shard will be 1/n compared to an unsharded setup. Typically, kube-state-metrics needs to be memory and latency optimized in order for it to return its metrics rather quickly to Prometheus.
Member

Should we mention --use-apiserver-cache as a way to reduce load on etcd here?

Contributor Author

Good idea, added the suggestion
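For readers, here is a minimal Go sketch of the md5-modulo scheme quoted in the hunk above: hash the object's UID, reduce it modulo the total shard count, and keep the object only when the result matches the index passed via --shard (with --total-shards supplying the modulus). How the digest is reduced to an integer below is an assumption made for illustration, and the helper is hypothetical rather than the project's actual code.

```go
package main

import (
	"crypto/md5"
	"encoding/binary"
	"fmt"
)

// objectBelongsToShard reports whether an object, identified by its UID,
// falls on the shard configured via --shard (zero indexed) out of
// --total-shards. Reducing the digest to its first 8 bytes is an assumption
// for illustration; the PR text only specifies an md5 sum of the UID taken
// modulo the total number of shards.
func objectBelongsToShard(uid string, shard, totalShards uint64) bool {
	sum := md5.Sum([]byte(uid))
	h := binary.BigEndian.Uint64(sum[:8]) // first 8 bytes of the digest
	return h%totalShards == shard
}

func main() {
	uid := "9b3f7a2c-1d64-4e0a-8f1e-0c2d4e5f6a7b" // hypothetical object UID
	for shard := uint64(0); shard < 3; shard++ {
		fmt.Printf("--shard=%d --total-shards=3 handles %s: %v\n",
			shard, uid, objectBelongsToShard(uid, shard, 3))
	}
}
```

Exactly one shard evaluates to true for any given UID, which is why every replica still has to receive and unmarshal all objects before it can decide which ones to keep, as the quoted paragraph notes.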

fpetkovski force-pushed the document-autosharding branch 2 times, most recently from c7184e0 to 99dcf27, on September 13, 2021 09:38
Signed-off-by: fpetkovski <filip.petkovsky@gmail.com>
Contributor

SuperQ left a comment

LGTM

@mrueg
Member

mrueg commented Sep 15, 2021

/approve

k8s-ci-robot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) on Sep 15, 2021
@mrueg
Member

mrueg commented Sep 21, 2021

/lgtm

Thanks for improving the doc, @fpetkovski!

k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged.) on Sep 21, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fpetkovski, mrueg, SuperQ

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot merged commit ef61220 into kubernetes:master on Sep 21, 2021

Successfully merging this pull request may close these issues.

Documentation on Sharding Features / Status of experimental AutoSharding Feature