Disruption budgets not working as expected/documented #1438

Open
komapa opened this issue Jul 17, 2024 · 3 comments
Labels
kind/bug: Categorizes issue or PR as related to a bug.
needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

komapa commented Jul 17, 2024

Description

Observed Behavior:

Hello, we have the following disruption budget on a NodePool:

budgets:
    - nodes: "1"
    - duration: 60m
      nodes: 1%
      schedule: '@hourly'
    - duration: 17h
      nodes: "0"
      schedule: 0 9 * * *

Expected Behavior:

We expect a NodePool with around 200 nodes to disrupt at most 2 nodes per hour outside of the 17h window where we define no disruptions (nodes: 0). What we see, though, is disruptions happening at a much higher rate. See the attached graph of the karpenter.karpenter_interruption_actions_performed.count metric.

[Image: Screenshot 2024-07-18 at 1 03 48 AM]
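For context on how the three budgets above interact, here is a minimal Python sketch. It is not Karpenter's actual code. It assumes, based on my reading of the Karpenter disruption docs, that every active budget is evaluated and the most restrictive one wins, that percentages round up, and that schedules are evaluated in UTC. The helper names (budget_limit, daily_window_active, allowed_disruptions) are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def budget_limit(nodes: str, total_nodes: int) -> int:
    """Convert a budget's `nodes` value ("1", "0", "1%") into a node count."""
    if nodes.endswith("%"):
        pct = int(nodes[:-1])
        # Assumption: percentages round up (integer ceiling division).
        return (total_nodes * pct + 99) // 100
    return int(nodes)


def daily_window_active(now: datetime, start_hour: int, duration: timedelta) -> bool:
    """True if `now` falls inside a window that opens daily at start_hour (UTC)."""
    today_start = now.replace(hour=start_hour, minute=0, second=0, microsecond=0)
    # A 17h window opened yesterday can still be running today, so check both.
    for days_back in (0, 1):
        start = today_start - timedelta(days=days_back)
        if start <= now < start + duration:
            return True
    return False


def allowed_disruptions(now: datetime, total_nodes: int) -> int:
    """Most restrictive limit among the budgets active at `now`."""
    limits = [
        budget_limit("1", total_nodes),   # no schedule: always active
        budget_limit("1%", total_nodes),  # '@hourly' for 60m: effectively always active
    ]
    if daily_window_active(now, 9, timedelta(hours=17)):  # schedule: 0 9 * * *
        limits.append(budget_limit("0", total_nodes))
    return min(limits)


if __name__ == "__main__":
    noon = datetime(2024, 7, 18, 12, 0, tzinfo=timezone.utc)
    early = datetime(2024, 7, 18, 3, 0, tzinfo=timezone.utc)
    print(allowed_disruptions(noon, 200), allowed_disruptions(early, 200))
```

Under these assumptions, the always-active nodes: "1" budget is the tighter of the two unscheduled limits, since min(1, 2) = 1, and nothing is allowed between 09:00 and 02:00 UTC. As I understand the docs, a budget also caps how many nodes may be disrupting at the same time rather than setting a per-hour rate.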

Reproduction Steps (Please include YAML):

budgets:
    - nodes: "1"
    - duration: 60m
      nodes: 1%
      schedule: '@hourly'
    - duration: 17h
      nodes: "0"
      schedule: 0 9 * * *

Versions:

  • Chart Version: 0.35.2
  • Kubernetes Version (kubectl version): 1.28.x
  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
komapa added the kind/bug label Jul 17, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If Karpenter contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot added the needs-triage label Jul 17, 2024
@miadabrin

might also be related to this #1421

@engedaam
Contributor

@komapa Are you using Spot instances? karpenter_interruption_actions_performed is a metric produced by karpenter-provider-aws. It is emitted in response to Spot interruption events, which are not a form of voluntary disruption. The metric is therefore not expected to respect disruption budgets, since Spot interruption events can occur within any given disruption window. See https://karpenter.sh/docs/concepts/disruption/#interruption.
