eks-prow-build: introduce stable node group #6257

Merged 2 commits on Jan 9, 2024

xmudrii (Member) commented Jan 9, 2024

Some components do not handle frequent node rotations well:

  • Flux: many alerts land in #k8s-infra-alerts whenever Flux pods are evicted or rescheduled
  • KubeCost: it continuously watches pods and their resource usage, and it bundles its own Prometheus
  • Our monitoring stack: same as KubeCost

To mitigate this, I created a stable node group with three nodes that is not autoscaled. The node group is tainted, so it will only run these components (and potentially other components that might benefit from it).

This PR is the first part; a follow-up PR will change the manifests for these components. The Terraform changes are already applied and the node group has been created successfully.
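
To illustrate why a tainted node group only runs components that opt in, here is a minimal Python sketch of Kubernetes-style taint and toleration matching. The taint key `dedicated`, the value `stable`, and the toleration shown are hypothetical; the actual taint lives in the Terraform changes and is not reproduced here. The sketch models the matching rules only, not the real scheduler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule", or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""  # an empty effect matches every effect


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """Return True if the toleration matches the taint."""
    if toleration.effect and toleration.effect != taint.effect:
        return False
    if toleration.operator == "Exists":
        # An empty key combined with Exists tolerates every taint.
        return toleration.key == "" or toleration.key == taint.key
    return toleration.key == taint.key and toleration.value == taint.value


def can_schedule(tolerations, taints) -> bool:
    """A pod fits if every hard taint is matched by at least one toleration."""
    hard = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in hard)


# Hypothetical taint for the stable node group (assumption, not from the PR).
STABLE_TAINT = Taint(key="dedicated", value="stable", effect="NoSchedule")

# Hypothetical toleration the Flux, KubeCost, and monitoring pods would carry.
STABLE_TOLERATION = Toleration(
    key="dedicated", operator="Equal", value="stable", effect="NoSchedule"
)

if __name__ == "__main__":
    print(can_schedule([STABLE_TOLERATION], [STABLE_TAINT]))  # True
    print(can_schedule([], [STABLE_TAINT]))  # False
```

Note that a toleration only permits a pod to land on the tainted nodes; pinning the components there would also need a node selector or node affinity in their manifests.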

Signed-off-by: Marko Mudrinić <mudrinic.mare@gmail.com>
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. area/infra Infrastructure management, infrastructure design, code in infra/ labels Jan 9, 2024
@k8s-ci-robot k8s-ci-robot added area/infra/aws Issues or PRs related to Kubernetes AWS infrastructure approved Indicates a PR has been approved by an approver from all required OWNERS files. sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra. labels Jan 9, 2024
Signed-off-by: Marko Mudrinić <mudrinic.mare@gmail.com>
upodroid (Member) left a comment

LGTM

alert fatigue :D

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 9, 2024
k8s-ci-robot (Contributor) commented

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: upodroid, xmudrii


@k8s-ci-robot k8s-ci-robot merged commit 93b0864 into kubernetes:main Jan 9, 2024
3 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.30 milestone Jan 9, 2024
sftim (Contributor) commented Jan 9, 2024

Cool. Does this lay the ground for using spot launch more for everything else?

@xmudrii xmudrii deleted the eks-prow-stable branch January 9, 2024 12:41
xmudrii (Member, Author) commented Jan 9, 2024

> Does this lay the ground for using spot launch more for everything else?

In a way, but we likely need a more sophisticated cluster-autoscaling solution (e.g. Karpenter)
