Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WaitForPodsReady] Make requeue base delay confiurable #2040

Merged
merged 3 commits into from
Apr 24, 2024

Conversation

mimowo
Copy link
Contributor

@mimowo mimowo commented Apr 23, 2024

What type of PR is this?

/kind feature

What this PR does / why we need it:

Which issue(s) this PR fixes:

Part of #2009

Special notes for your reviewer:

I'm yet wondering about making the exponent and jitter configurable, wdyt?

Does this PR introduce a user-facing change?

Make the PodsReady base delay for requeuing configurable

@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. labels Apr 23, 2024
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 23, 2024
Copy link

netlify bot commented Apr 23, 2024

Deploy Preview for kubernetes-sigs-kueue canceled.

Name Link
🔨 Latest commit 86461d9
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/6628b7277f782e00082026df

@mimowo
Copy link
Contributor Author

mimowo commented Apr 23, 2024

/test all

@mimowo mimowo marked this pull request as ready for review April 23, 2024 12:09
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 23, 2024
@mimowo
Copy link
Contributor Author

mimowo commented Apr 23, 2024

/assign @tenzen-y

//
// Defaults to 10.
// +optional
BaseDelaySeconds *int32 `json:"baseDelaySeconds,omitempty"`
Copy link
Contributor Author

@mimowo mimowo Apr 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm yet wondering about making the exponent and jitter configurable, wdyt @alculquicondor @tenzen-y?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to keep this simple, but @mimowo and/or @alculquicondor have requests from your customers or you have concrete use cases, I'm ok with implementing them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely not the jitter. But I'm also ok leaving the exponent out for now.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 23, 2024
@mimowo mimowo changed the title [WaitForPodsReady] Make requeue base delay confiurable WIP: [WaitForPodsReady] Make requeue base delay confiurable Apr 23, 2024
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 23, 2024
Copy link
Member

@tenzen-y tenzen-y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, lgtm
Also could you implement validations here:

func validateWaitForPodsReady(c *configapi.Configuration) field.ErrorList {

@mimowo mimowo changed the title WIP: [WaitForPodsReady] Make requeue base delay confiurable [WaitForPodsReady] Make requeue base delay confiurable Apr 23, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 23, 2024
@mimowo
Copy link
Contributor Author

mimowo commented Apr 23, 2024

Basically, lgtm
Also could you implement validations here:

Added, thanks
/assign @alculquicondor

Comment on lines -50 to +37
func SetupControllers(mgr ctrl.Manager, qManager *queue.Manager, cc *cache.Cache, cfg *configapi.Configuration, controllerOpts ...ControllerOption) (string, error) {
func SetupControllers(mgr ctrl.Manager, qManager *queue.Manager, cc *cache.Cache, cfg *configapi.Configuration) (string, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally like the Options pattern better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but the new parameter is not needed any longer since the configuration is not passed in cfg *configapi.Configuration. I guess we could replace cfg *configapi.Configuration with Options, but it would be a lot of changes, so I wouldn't like to mix this into this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for now changing WithControllerRequeuingBaseDelaySeconds to something like WithControllerRequeuingBackoffBaseSeconds can work and we can do a clean-up to drop the cfg usage.

Copy link
Contributor Author

@mimowo mimowo Apr 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that, but the interim state of passing the params which are part of the cfg both ways would look weird. I'm happy to open a dedicated issue to do the refactoring at once.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we agree we should go towards Options , why start with a step back?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not needed as a result of this PR implementation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, going the path of keeping it requires more changes. I don't think refactoring of this function is a priority right now. So we should leave the code in a state that makes sense for the mid-term.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

going the path of keeping it requires more changes

I'm not convinced, we used to have for now changing WithControllerRequeuingBaseDelaySeconds we can just replace it with WithControllerRequeuingBackoffBaseSeconds .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then we have the same information flowing through cfg

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me put the context of the reason why we added the controllerOpts.

We introduced the controllerOpts in #2025 instead of Config API to cherry-pick to the release v0.6 as an interim approach. Then, we try to remove the interim approach in this PR.

I agree with refactoring this function, but as @alculquicondor and @mimowo, I also think that the refactoring is beyond this PR responsibility.
So, I agree with leaving the refactoring in another PR.

}
for _, opt := range controllerOpts {
opt(&ctrlOpts)
}

if err := NewWorkloadReconciler(mgr.GetClient(), qManager, cc,
mgr.GetEventRecorderFor(constants.WorkloadControllerName),
WithWorkloadUpdateWatchers(qRec, cqRec),
WithPodsReadyTimeout(podsReadyTimeout(cfg)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would be worth something like:

WithPodsReadyTimeout(cfg.WaitForPodsReady)

So we don't need to keep adding options for every single field.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, I just pushed a commit along the lines, but I use:
WithWaitForPodsReady(waitForPodsReady(cfg.WaitForPodsReady)),
because the controller needs the entire config. Also I use the waitForPodsReady function to convert the underlying config into a structure easier to consume. In particular, I don't need the enable field.

@tenzen-y
Copy link
Member

@mimowo Also, could you open another PR to update the document?
https://kueue.sigs.k8s.io/docs/tasks/manage/setup_sequential_admission/#enabling-waitforpodsready

@mimowo
Copy link
Contributor Author

mimowo commented Apr 24, 2024

@mimowo Also, could you open another PR to update the document?
https://kueue.sigs.k8s.io/docs/tasks/manage/setup_sequential_admission/#enabling-waitforpodsready

Sure, I will follow up on this.

@mimowo
Copy link
Contributor Author

mimowo commented Apr 24, 2024

/test pull-kueue-test-e2e-main-1-29
/test pull-kueue-test-scheduling-perf-main
/test pull-kueue-test-unit-main
/test pull-kueue-verify-main
not sure why the tests are failed, they don't even seem started
image

Copy link
Member

@tenzen-y tenzen-y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 24, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 27145d6920748a7c15ef5d1b55a9fd434e10e386

@tenzen-y
Copy link
Member

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mimowo, tenzen-y

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 24, 2024
@tenzen-y
Copy link
Member

/test pull-kueue-test-e2e-main-1-29 /test pull-kueue-test-scheduling-perf-main /test pull-kueue-test-unit-main /test pull-kueue-verify-main not sure why the tests are failed, they don't even seem started image

/test pull-kueue-verify-main

@mimowo
Copy link
Contributor Author

mimowo commented Apr 24, 2024

/test pull-kueue-build-image-main

@k8s-ci-robot k8s-ci-robot merged commit 9b25d99 into kubernetes-sigs:main Apr 24, 2024
15 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.7 milestone Apr 24, 2024
@mimowo mimowo deleted the make-requeue-configurable branch May 29, 2024 14:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants