Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide support for ProvisioningRequest's CapacityRevoked condition #2196

Merged
merged 12 commits into from
Jun 7, 2024

Conversation

PBundyra
Copy link
Contributor

@PBundyra PBundyra commented May 14, 2024

What type of PR is this?

/kind feature

What this PR does / why we need it:

Provide support for the new ProvisioningRequest's condition CapacityRevoked. When user uses a job that allows retries and sets .spec.backOffLimit > 0 and the job gets evicted because of MaxRunDuration an according AdmissionCheck gets rejected.

After a Workload is finished we no longer delete corresponding ProvisioningRequest. This is because Workload might be finished due to MaxRunDuration timeout, before ClusterAutoscaler updates ProvisioningRequest's status to CapacityRevoked and the Kueue's controller has no opportunity to react to that.

Additionally I did a small cleanup.

Which issue(s) this PR fixes:

Part of #2041

Special notes for your reviewer:

I've tested it e2e manually

Does this PR introduce a user-facing change?

ProvisioningRequets: Support for ProvisioningRequest's condition `CapacityRevoked`. ProvisioningRequests objects persist until the corresponding Job or the Workload is deleted

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. labels May 14, 2024
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 14, 2024
Copy link

netlify bot commented May 14, 2024

Deploy Preview for kubernetes-sigs-kueue canceled.

Name Link
🔨 Latest commit bbcf7c5
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/6662c8dfedea9f0008ea2d70

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 14, 2024
@PBundyra
Copy link
Contributor Author

/assign @yaroslava-serdiuk
/assign @alculquicondor

@yaroslava-serdiuk
Copy link
Contributor

please, fix the tests

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 14, 2024
@PBundyra PBundyra force-pushed the support-provreq branch 4 times, most recently from 5726dba to 8fe6b4f Compare May 14, 2024 10:50
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 14, 2024
Copy link
Contributor

@yaroslava-serdiuk yaroslava-serdiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment

@yaroslava-serdiuk
Copy link
Contributor

Please, rebase the PR to resolve merge conflict

@PBundyra PBundyra force-pushed the support-provreq branch 2 times, most recently from cff252f to cf6330a Compare May 15, 2024 11:26
@yaroslava-serdiuk
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 15, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 22bd86626a7137e9e4aebc737e06cc5dcebd29d5

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 16, 2024
@PBundyra
Copy link
Contributor Author

/assign @tenzen-y

@alculquicondor
Copy link
Contributor

Please also add to the release note that ProvisioningRequests objects persist until the Job or the Workload is deleted, IIUC.

pkg/controller/admissionchecks/provisioning/controller.go Outdated Show resolved Hide resolved
pkg/controller/admissionchecks/provisioning/controller.go Outdated Show resolved Hide resolved
pkg/controller/admissionchecks/provisioning/controller.go Outdated Show resolved Hide resolved
pkg/controller/admissionchecks/provisioning/controller.go Outdated Show resolved Hide resolved
pkg/controller/admissionchecks/provisioning/controller.go Outdated Show resolved Hide resolved
pkg/workload/workload.go Outdated Show resolved Hide resolved
Copy link
Contributor

@mimowo mimowo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more comment / question regarding sending workload updates per AC

pkg/controller/admissionchecks/provisioning/controller.go Outdated Show resolved Hide resolved
@mimowo
Copy link
Contributor

mimowo commented Jun 6, 2024

/lgtm
@tenzen-y anything to add? The deactivation will happen in the follow up PR.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 6, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 4d47a984051746c6a582c663b021d272b2b094fc

@tenzen-y
Copy link
Member

tenzen-y commented Jun 6, 2024

/lgtm @tenzen-y anything to add? The deactivation will happen in the follow up PR.

Let me see this.

@alculquicondor
Copy link
Contributor

Please update description and release note.

Copy link
Contributor

@alculquicondor alculquicondor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve
/hold
for nit and description updates.

pkg/controller/admissionchecks/provisioning/controller.go Outdated Show resolved Hide resolved
@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jun 6, 2024
Co-authored-by: Aldo Culquicondor <1299064+alculquicondor@users.noreply.github.com>
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 7, 2024
Copy link
Member

@tenzen-y tenzen-y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, lgtm

pkg/workload/workload.go Show resolved Hide resolved
Copy link
Member

@tenzen-y tenzen-y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!
/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 7, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 8c067e1612f2fd8f0620a2824677ee2fa7bf2e0f

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, PBundyra, tenzen-y

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [alculquicondor,tenzen-y]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tenzen-y
Copy link
Member

tenzen-y commented Jun 7, 2024

/hold

@tenzen-y
Copy link
Member

tenzen-y commented Jun 7, 2024

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 7, 2024
@k8s-ci-robot k8s-ci-robot merged commit d728cb8 into kubernetes-sigs:main Jun 7, 2024
16 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.8 milestone Jun 7, 2024
PBundyra added a commit to PBundyra/kueue that referenced this pull request Jun 7, 2024
…ubernetes-sigs#2196)

* Provide support for ProvisioningRequest's CapacityRevoked condition

* Clean-up

* Improve naming

* Add an unit test case, cleanup logs, slight logic changes to provisioning controller

* Clean up, improve integration tests for provisioning request, set Evicted condition for Workload if the capacity for a ProvisioningRequest has been revoked

* Add controllers to ProvisioningRequest integration tests, check evicted metric in integration tests, do not set evicted condition in the provisioning controller

* Fix integration tests

* Delete unit test that checks if that Evicted conditions stays the same

* Discard deactivating workload in the provisioning controller

* Batch workload status updates

* Update pkg/controller/admissionchecks/provisioning/controller.go

Co-authored-by: Aldo Culquicondor <1299064+alculquicondor@users.noreply.github.com>

* Use IsActive function, clean up provisioning integration tests

---------
Fiona-Waters pushed a commit to Fiona-Waters/kueue that referenced this pull request Jun 25, 2024
…ubernetes-sigs#2196)

* Provide support for ProvisioningRequest's CapacityRevoked condition

* Clean-up

* Improve naming

* Add an unit test case, cleanup logs, slight logic changes to provisioning controller

* Clean up, improve integration tests for provisioning request, set Evicted condition for Workload if the capacity for a ProvisioningRequest has been revoked

* Add controllers to ProvisioningRequest integration tests, check evicted metric in integration tests, do not set evicted condition in the provisioning controller

* Fix integration tests

* Delete unit test that checks if that Evicted conditions stays the same

* Discard deactivating workload in the provisioning controller

* Batch workload status updates

* Update pkg/controller/admissionchecks/provisioning/controller.go

Co-authored-by: Aldo Culquicondor <1299064+alculquicondor@users.noreply.github.com>

* Use IsActive function, clean up provisioning integration tests

---------

Co-authored-by: Aldo Culquicondor <1299064+alculquicondor@users.noreply.github.com>
@alculquicondor
Copy link
Contributor

/release-note-edit

ProvisioningRequets: Support for ProvisioningRequest's condition `CapacityRevoked`. ProvisioningRequests objects persist until the corresponding Job or the Workload is deleted

@PBundyra PBundyra deleted the support-provreq branch July 31, 2024 11:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants