
Sharding metrics per node via fieldSelector #1864

Merged (8 commits), Nov 7, 2022


CatherineF-dev (Contributor) commented Oct 20, 2022

What this PR does / why we need it:
Increases scalability for large Kubernetes clusters by sharding metrics per node. This only works for resources that support the spec.nodeName field selector, for example pods (pod metrics).
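As a rough illustration of the restriction above, the sketch below builds the per-node field selector and rejects resources that cannot be filtered by node. It is a minimal sketch, not the PR's code: `nodeFieldSelector` and `nodeSelectableResources` are hypothetical names, and treating only `pods` as node-selectable is an assumption taken from the description.

```go
package main

import "fmt"

// nodeSelectableResources lists the resources this sketch treats as
// supporting the spec.nodeName field selector. Only pods are assumed here
// (hypothetical set, based on the PR description).
var nodeSelectableResources = map[string]bool{"pods": true}

// nodeFieldSelector (hypothetical helper) returns the field selector that
// restricts a list/watch of resource to objects scheduled on nodeName.
// An empty nodeName means "no per-node sharding", so no selector is returned.
func nodeFieldSelector(resource, nodeName string) (string, error) {
	if nodeName == "" {
		return "", nil
	}
	if !nodeSelectableResources[resource] {
		return "", fmt.Errorf("resource %q does not support the spec.nodeName field selector", resource)
	}
	return "spec.nodeName=" + nodeName, nil
}

func main() {
	for _, r := range []string{"pods", "deployments"} {
		sel, err := nodeFieldSelector(r, "node-a")
		fmt.Printf("%s: selector=%q err=%v\n", r, sel, err)
	}
}
```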

How does this change affect the cardinality of KSM? (increases, decreases or does not change cardinality): Does not change.

Which issue(s) this PR fixes:
Fixes #1863

Tested in a k8s cluster

k8s-ci-robot added the cncf-cla: yes label (Indicates the PR's author has signed the CNCF CLA.) Oct 20, 2022
k8s-ci-robot added the size/L label (Denotes a PR that changes 100-499 lines, ignoring generated files.) Oct 20, 2022
CatherineF-dev force-pushed the sharding-per-node branch 7 times, most recently from 41219ce to c10b6dd, October 20, 2022 15:21
dgrisonnet (Member)

/assign

CatherineF-dev (Contributor Author)

Ping~

CatherineF-dev (Contributor Author)

cc @rexagod, could you help review as well? Thanks!

fpetkovski (Contributor) left a review comment:

Thanks, overall this looks good. I would suggest using NodeName instead of Nodename, since it's more readable and "node name" is two words, not one.

internal/store/builder.go (outdated, resolved)
pkg/app/server.go (outdated, resolved)
internal/store/builder.go (outdated, resolved)
mrueg (Member) commented Oct 26, 2022

I'd be curious about the scaling model here. Could you provide some details on how this can be used, as a reference for users?

CatherineF-dev (Contributor Author)

Reply @mrueg

  • Pod metrics: DaemonSet
  • Other metrics: Deployment

When deployed as a DaemonSet, set --nodename to the name of the node the pod is running on (NODE_NAME).

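```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-state-metrics
spec:
  template:
    spec:
      containers:
      - name: kube-state-metrics
        args:
        - --resources=pods
        - --nodename=$(NODE_NAME)
        env:
        # Inject the name of the node this pod runs on (downward API),
        # so that $(NODE_NAME) in args is expanded.
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
```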
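To make the scaling model concrete, here is a minimal in-memory sketch of what each DaemonSet replica ends up seeing. It assumes a simplified `pod` type and a hypothetical `agentView` helper; in a real deployment the filtering is done by the API server through the `spec.nodeName` field selector, not inside kube-state-metrics.

```go
package main

import "fmt"

// pod is a minimal stand-in for a Kubernetes Pod with only the fields
// this sketch needs (hypothetical type).
type pod struct {
	Name     string
	NodeName string // corresponds to spec.nodeName
}

// agentView returns the pods one DaemonSet replica would see if its
// list/watch were filtered with spec.nodeName=<node>. Here the filtering
// is done in memory purely for illustration.
func agentView(all []pod, node string) []pod {
	var out []pod
	for _, p := range all {
		if p.NodeName == node {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	pods := []pod{{"web-1", "node-a"}, {"web-2", "node-b"}, {"db-0", "node-a"}}
	for _, node := range []string{"node-a", "node-b"} {
		fmt.Println(node, len(agentView(pods, node)))
	}
}
```

Each replica's work grows with the number of pods on its own node rather than in the whole cluster. One consequence visible in this model: pods that are not yet scheduled have an empty spec.nodeName, so no per-node agent reports them.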

CatherineF-dev force-pushed the sharding-per-node branch 4 times, most recently from 13da89e to e1c9864, October 26, 2022 19:48
rexagod (Member) commented Oct 27, 2022

@CatherineF-dev Correct me if I'm wrong, but shouldn't the above example target StatefulSets?

CatherineF-dev (Contributor Author)

Reply @rexagod

It's a DaemonSet: each kube-state-metrics agent only collects metrics for the pods on its own node.

  • Pod metrics: DaemonSet
  • Other metrics: Deployment/StatefulSet
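The split described above can be sketched as a small helper that derives the `--resources` value for each workload. `splitResources` is a hypothetical name, and hard-coding `pods` as the only per-node resource is an assumption based on this thread.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// splitResources (hypothetical helper) partitions the resources to expose
// into two kube-state-metrics workloads:
//   - a DaemonSet serving resources sharded per node (only pods, here),
//   - a Deployment/StatefulSet serving everything else cluster-wide.
// It returns the --resources argument value for each workload.
func splitResources(all []string) (daemonSet, clusterWide string) {
	var perNode, rest []string
	for _, r := range all {
		if r == "pods" {
			perNode = append(perNode, r)
		} else {
			rest = append(rest, r)
		}
	}
	sort.Strings(perNode)
	sort.Strings(rest)
	return strings.Join(perNode, ","), strings.Join(rest, ",")
}

func main() {
	ds, deploy := splitResources([]string{"pods", "nodes", "deployments"})
	fmt.Println("DaemonSet:  --resources=" + ds)
	fmt.Println("Deployment: --resources=" + deploy)
}
```

Keeping the two lists disjoint means no resource is collected twice across the two workloads.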

rexagod (Member) left a review comment:

Thank you for adding this change! 🎉

I'm leaving this review as a comment rather than blocking this PR, since these are all optional suggestions.

docs/cli-arguments.md (outdated, resolved)
internal/store/builder.go (resolved)
internal/store/builder.go (outdated, resolved)
main.go (outdated, resolved)
pkg/options/options.go (outdated, resolved)
pkg/options/types.go (resolved)
CatherineF-dev force-pushed the sharding-per-node branch 4 times, most recently from 92f30ae to 8f9dbea, November 4, 2022 03:40
fpetkovski (Contributor) left a review comment:

Overall this looks good, but I think the tests need a bit more work.

tests/compare_benchmarks.sh (outdated, resolved)
tests/e2e/main_test.go (outdated, resolved)
tests/e2e/main_test.go (outdated, resolved)
CatherineF-dev (Contributor Author) commented Nov 4, 2022

Cleaned up, @fpetkovski.

I added some WIP code after mrueg's review.
I will add E2E tests together with PR #1873, so that we can use kubectl apply on the manifests instead of using client-go to test a DaemonSet kube-state-metrics. Converting a Deployment into a DaemonSet at runtime with client-go is more complex and needs more maintenance.

CatherineF-dev (Contributor Author)

This PR is ready for review. cc @dgrisonnet

dgrisonnet (Member) left a review comment:

A few comments, but the implementation looks good to me. Great job, @CatherineF-dev!

README.md (resolved)
docs/cli-arguments.md (outdated, resolved)
pkg/options/options.go (outdated, resolved)
pkg/options/types.go (outdated, resolved)
Co-authored-by: Manuel Rüger <manuel@rueg.eu>
CatherineF-dev force-pushed the sharding-per-node branch 2 times, most recently from 417a876 to 9a2f5e0, November 4, 2022 17:49
CatherineF-dev added a commit to CatherineF-dev/kube-state-metrics that referenced this pull request Nov 5, 2022
Validate options

Sharding per node

Clean

Move merging of field selectors into app/server.go and replace namespaceFilter with fieldSelectorFilter

Refactoring

Provide scaling example

Update README.md

Co-authored-by: Manuel Rüger <manuel@rueg.eu>

Refactoring
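The commit message above mentions merging field selectors and replacing the namespace filter with a field selector filter. Below is a minimal stdlib-only sketch of such a merge, assuming selectors are passed around as plain strings: comma-separated terms in a Kubernetes field selector are ANDed, so merging amounts to joining the non-empty terms. `mergeFieldSelectors` is a hypothetical name; code built on client-go would more likely use `fields.AndSelectors` and `fields.OneTermEqualSelector` from `k8s.io/apimachinery/pkg/fields`.

```go
package main

import (
	"fmt"
	"strings"
)

// mergeFieldSelectors (hypothetical helper) ANDs several field selectors.
// In field-selector syntax, comma-separated terms are combined with AND,
// so merging is a join of the non-empty inputs. Empty inputs (for example
// "no namespace restriction") are skipped.
func mergeFieldSelectors(selectors ...string) string {
	var terms []string
	for _, s := range selectors {
		if s = strings.TrimSpace(s); s != "" {
			terms = append(terms, s)
		}
	}
	return strings.Join(terms, ",")
}

func main() {
	ns := "metadata.namespace=monitoring"
	node := "spec.nodeName=node-a"
	fmt.Println(mergeFieldSelectors(ns, node))
	fmt.Println(mergeFieldSelectors("", node))
}
```

Skipping empty inputs lets a single code path serve every combination of namespace and node restrictions, including neither.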
dgrisonnet (Member)

Awesome work!

/lgtm
/unhold

k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged.) and removed the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command.) Nov 7, 2022
k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CatherineF-dev, dgrisonnet, mrueg


Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
  • lgtm: "Looks good to me", indicates that a PR is ready to be merged.
  • size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues:
  • Support sharding by node_name for pod metrics

6 participants