Research :: Readiness Scheduling Gates #1223

Closed
thisthat opened this issue Apr 13, 2023 · 3 comments

@thisthat
Member

Goal

Prototype a KLT implementation that uses Pod Scheduling Readiness instead of the Scheduling Plugin.

Technical Details

KLT currently uses a custom scheduler plugin to prevent Pods from being bound to a node. With the new scheduling gates introduced by Pod Scheduling Readiness, we could eliminate the extra binary and use the new API. This has some consequences for our mutating webhook: it should add a gate when a manifest is first applied (it won't on update). Once the pre-checks complete successfully, the gate shall be removed.
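To illustrate the gate lifecycle described above, here is a minimal sketch in Go. It assumes stand-in types that only mirror the shape of `spec.schedulingGates`; real code would use the `k8s.io/api/core/v1` types. The gate name `keptn-prechecks-gate` and the helper names are hypothetical and do not come from the KLT code base.

```go
package main

import "fmt"

// Stand-in types that mirror the shape of spec.schedulingGates.
// Assumption: real code would use the types from k8s.io/api/core/v1.
type PodSchedulingGate struct {
	Name string
}

type PodSpec struct {
	SchedulingGates []PodSchedulingGate
}

type Pod struct {
	Name string
	Spec PodSpec
}

// KeptnGate is a hypothetical gate name chosen for this sketch.
const KeptnGate = "keptn-prechecks-gate"

// HasGate reports whether the pod carries the named gate.
func HasGate(pod *Pod, name string) bool {
	for _, g := range pod.Spec.SchedulingGates {
		if g.Name == name {
			return true
		}
	}
	return false
}

// AddGate appends the named gate unless it is already present.
// In the proposal this would happen in the webhook at creation time.
func AddGate(pod *Pod, name string) {
	if HasGate(pod, name) {
		return
	}
	pod.Spec.SchedulingGates = append(pod.Spec.SchedulingGates, PodSchedulingGate{Name: name})
}

// RemoveGate drops the named gate and keeps every other gate.
// In the proposal this would happen once the pre-checks succeed.
func RemoveGate(pod *Pod, name string) {
	var kept []PodSchedulingGate
	for _, g := range pod.Spec.SchedulingGates {
		if g.Name != name {
			kept = append(kept, g)
		}
	}
	pod.Spec.SchedulingGates = kept
}

func main() {
	pod := &Pod{Name: "demo"}
	AddGate(pod, KeptnGate)
	fmt.Println("gated after creation:", HasGate(pod, KeptnGate))
	RemoveGate(pod, KeptnGate)
	fmt.Println("gated after pre-checks:", HasGate(pod, KeptnGate))
}
```

A Pod is only considered for scheduling once its list of gates is empty, so removing only our own gate leaves gates set by other tools in place.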

Resources

@thisthat thisthat added the enhancement, question, and scheduler labels Apr 13, 2023
@thisthat thisthat added this to the 0.8 milestone Apr 13, 2023
@thisthat
Member Author

thisthat commented Apr 13, 2023

Old PoC that used a workaround with extra labels: #632

@RealAnna RealAnna self-assigned this Apr 13, 2023
@thisthat thisthat modified the milestones: 0.7.1, 0.8 May 3, 2023
@thisthat
Member Author

thisthat commented May 4, 2023

PR kubernetes-sigs/controller-runtime#2189 has been merged and unlocks this research

@RealAnna
Contributor

RealAnna commented May 22, 2023

The new controller-runtime v0.15.0 will introduce multiple breaking changes that will require code refactoring in all our operators:

  • the server struct for the webhooks is now an interface, so we need to change all references to it, since we use it to override configs when setting up certificates
  • the decoder injector has been removed, which means we need to explicitly pass a decoder to the webhooks, as described in Remove Decoder Injector interface from webhook #1445
  • a few ways to set up the controller manager are deprecated: to set the port we should use WebhookServer.Port instead, and for ClientDisableCacheFor we should use Client.Cache.DisableCacheFor.
  • multiple controller unit tests still fail in the PoC with the new library version, so they need refactoring. I may have missed other breaking changes 😿

After these changes and the import of the controller-runtime and k8s 1.27.1 libraries, we can replace the scheduler with gates. In the PoC this is done by adding an extra webhook that reacts to the creation of new pods:

  • extract common functions from the mutating webhook
  • introduce a gating webhook that simply adds a keptn gate at creation
  • remove the gate in the webhook instance controller after all pre-checks have passed
  • modify operator/config/default/webhooknamespaces_patch.yaml and operator/config/webhook/manifests.yaml to include the new webhook
  • add the extra webhook in the list passed to builder.run in operator/main.go
  • create new tests and adapt existing ones; this is partially done in the PoC: poc: scheduling gates beta verison #1249
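As a rough illustration of the gating webhook step listed above: a mutating admission webhook answers with a JSON Patch, so adding a gate at Pod creation boils down to one `add` operation. The sketch below uses only the Go standard library. The function name `GatePatch` and the gate name are hypothetical, and this is not necessarily how the PoC builds its response.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// patchOp is a single JSON Patch (RFC 6902) operation.
type patchOp struct {
	Op    string      `json:"op"`
	Path  string      `json:"path"`
	Value interface{} `json:"value"`
}

// gate mirrors one entry of spec.schedulingGates.
type gate struct {
	Name string `json:"name"`
}

// GatePatch builds a JSON Patch that adds gateName to a Pod.
// existingGates is the number of gates already on the incoming Pod.
// Both the function and the gate name used in main are hypothetical.
func GatePatch(existingGates int, gateName string) ([]byte, error) {
	var op patchOp
	if existingGates == 0 {
		// The list may be absent, so create it with our gate as its only entry.
		op = patchOp{Op: "add", Path: "/spec/schedulingGates", Value: []gate{{Name: gateName}}}
	} else {
		// A trailing "-" appends to an existing array.
		op = patchOp{Op: "add", Path: "/spec/schedulingGates/-", Value: gate{Name: gateName}}
	}
	return json.Marshal([]patchOp{op})
}

func main() {
	patch, err := GatePatch(0, "keptn-prechecks-gate")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(patch))
}
```

The webhook would only need to register for CREATE operations on Pods, which matches the note in the issue description that the gate is not added on update.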

It is also left to decide what to do with the scheduler. Shall we make a breaking change and delete it, or add some logic to enable/disable gates or the scheduler for at least one version?
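If we go for the second option, one possible shape for the enable/disable logic is a single switch read at operator start-up. This is only a sketch: the environment variable name `SCHEDULING_GATES_ENABLED`, the `Mode` type, and the choice to default to the scheduler are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Mode names the mechanism that keeps Pods unscheduled until pre-checks pass.
type Mode string

const (
	ModeScheduler Mode = "scheduler" // existing custom scheduler plugin
	ModeGates     Mode = "gates"     // new scheduling gates
)

// gatesEnvVar is a hypothetical environment variable name.
const gatesEnvVar = "SCHEDULING_GATES_ENABLED"

// ModeFromValue maps the raw setting to a mode. Anything that does not
// parse as a true boolean keeps the existing scheduler, so the current
// behavior stays the default during the transition.
func ModeFromValue(raw string) Mode {
	enabled, err := strconv.ParseBool(raw)
	if err != nil || !enabled {
		return ModeScheduler
	}
	return ModeGates
}

func main() {
	mode := ModeFromValue(os.Getenv(gatesEnvVar))
	fmt.Println("blocking mechanism:", mode)
}
```

With a switch like this, the gating webhook would be registered only in `gates` mode, and the scheduler binary could be dropped in a later release.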
