Add a metric that tracks the number of preemptions issued by a ClusterQueue #2538

Open
wants to merge 2 commits into base: main

Conversation

vladikkuzn
Contributor

@vladikkuzn vladikkuzn commented Jul 5, 2024

What type of PR is this?

/kind feature

What this PR does / why we need it:

Adds a metric that tracks the number of preemptions issued by a ClusterQueue

Which issue(s) this PR fixes:

Fixes #2491

Special notes for your reviewer:

NONE

Does this PR introduce a user-facing change?

Add preempted_workloads_total metric that tracks the number of preemptions issued by a ClusterQueue
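
To make the shape of such a metric concrete, here is a minimal, stdlib-only sketch of a counter keyed by the preempting ClusterQueue and the preemption reason. It is not the code in this PR: `preemptionCounter`, `Inc`, and `Get` are hypothetical names, and the map is only a stand-in for whatever labeled counter the metrics package actually registers.

```go
package main

import (
	"fmt"
	"sync"
)

// preemptionKey identifies one counter series: the ClusterQueue that issued
// the preemption and the reason it was issued.
type preemptionKey struct {
	preemptingClusterQueue string
	reason                 string
}

// preemptionCounter is a hypothetical in-memory stand-in for a labeled,
// monotonically increasing counter such as preempted_workloads_total.
type preemptionCounter struct {
	mu     sync.Mutex
	counts map[preemptionKey]int
}

func newPreemptionCounter() *preemptionCounter {
	return &preemptionCounter{counts: make(map[preemptionKey]int)}
}

// Inc records one preemption issued by cq for the given reason.
func (c *preemptionCounter) Inc(cq, reason string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[preemptionKey{cq, reason}]++
}

// Get returns how many preemptions cq has issued for the given reason.
func (c *preemptionCounter) Get(cq, reason string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[preemptionKey{cq, reason}]
}

func main() {
	c := newPreemptionCounter()
	c.Inc("cq-a", "InClusterQueue")
	c.Inc("cq-a", "InClusterQueue")
	fmt.Println(c.Get("cq-a", "InClusterQueue"))
}
```

Keying on both the queue and the reason means each (ClusterQueue, reason) pair is counted independently, which is what lets a test assert a value per reason.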

@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jul 5, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: vladikkuzn
Once this PR has been reviewed and has the lgtm label, please assign mimowo for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jul 5, 2024

netlify bot commented Jul 5, 2024

Deploy Preview for kubernetes-sigs-kueue canceled.

Name Link
🔨 Latest commit aaf0922
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/66968c4041e4190008816f66

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jul 5, 2024
@vladikkuzn
Contributor Author

/assign

@vladikkuzn
Contributor Author

/test all

@vladikkuzn
Contributor Author

/test all

@vladikkuzn vladikkuzn marked this pull request as ready for review July 9, 2024 10:04
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 9, 2024
@vladikkuzn
Contributor Author

@alculquicondor
Contributor

/cc

Contributor

@gabesaba gabesaba left a comment

a few nits, looks good overall

test/util/util.go (outdated, resolved)
pkg/metrics/metrics_test.go (outdated, resolved)
pkg/metrics/metrics.go (outdated, resolved)
@@ -137,6 +137,7 @@ var _ = ginkgo.Describe("Preemption", func() {

util.FinishEvictionForWorkloads(ctx, k8sClient, lowWl1, lowWl2)
util.ExpectEvictedWorkloadsTotalMetric(cq.Name, kueue.WorkloadEvictedByPreemption, 2)
util.ExpectPreemptedWorkloadsTotalMetric(cq.Name, "InClusterQueue", 2)
Contributor

Are there integration tests covering the other 3 cases? Can we add assertions there too?

Contributor Author

Can you please clarify which cases you mean?

Contributor

The other 3 preemption reasons: InCohortReclamation, etc.

Contributor

Please look into this
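
The request in this thread amounts to asserting the counter once per preemption reason instead of only for InClusterQueue. A minimal sketch of that per-reason comparison follows. The helper `diffReasonCounts` and the variable `expectedByReason` are hypothetical, not part of test/util; only `InClusterQueue` and `InCohortReclamation` are named in this thread, and the other two reason strings are assumed for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// expectedByReason lists the count a test expects for each preemption reason.
// Only the first two reason strings appear in this thread; the last two are
// assumptions made for the sake of the example.
var expectedByReason = map[string]int{
	"InClusterQueue":                2,
	"InCohortReclamation":           1,
	"InCohortFairSharing":           1, // assumed reason name
	"InCohortReclaimWhileBorrowing": 1, // assumed reason name
}

// diffReasonCounts compares observed per-reason counts with the expected ones
// and returns one message per mismatch, sorted by reason name. A reason that
// is missing from got is treated as a count of 0.
func diffReasonCounts(got, want map[string]int) []string {
	reasons := make([]string, 0, len(want))
	for r := range want {
		reasons = append(reasons, r)
	}
	sort.Strings(reasons)

	var mismatches []string
	for _, r := range reasons {
		if got[r] != want[r] {
			mismatches = append(mismatches,
				fmt.Sprintf("reason %s: got %d, want %d", r, got[r], want[r]))
		}
	}
	return mismatches
}

func main() {
	// Observed values would come from the metrics registry in a real test.
	observed := map[string]int{"InClusterQueue": 2}
	for _, m := range diffReasonCounts(observed, expectedByReason) {
		fmt.Println(m)
	}
}
```

Iterating over the expected map rather than the observed one ensures that a reason with no recorded preemptions still shows up as a mismatch instead of being skipped silently.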

pkg/metrics/metrics_test.go (outdated, resolved)
pkg/scheduler/preemption/preemption.go (outdated, resolved)
pkg/metrics/metrics.go (outdated, resolved)
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 10, 2024
@alculquicondor
Contributor

/release-note-edit

Add preempted_workloads_total metric that tracks the number of preemptions issued by a ClusterQueue

Use the name of the user-facing metric, as opposed to the variable name.

@vladikkuzn
Contributor Author

/retest

@alculquicondor
Contributor

Please rebase

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 12, 2024
…rQueue

* Merge metric for preempted workloads into ReportPreemption
* test helper
* Expect all preemption reasons
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 16, 2024
Contributor

@alculquicondor alculquicondor left a comment

These changes look good, but looking at the test updates, I realized that there is a bug in #2411 (comment)

@alculquicondor alculquicondor mentioned this pull request Jul 16, 2024
26 tasks
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
5 participants