✨ Add support for controller-manager webhook shutdown delay #2601
base: main
```
@@ -101,6 +101,10 @@ type Options struct {

	// WebhookMux is the multiplexer that handles different webhooks.
	WebhookMux *http.ServeMux

	// ShutdownDelay delays server shutdown to wait for clients to stop opening
	// new connections before closing server listeners. Defaults to 0.
	ShutdownDelay time.Duration
}

// NewServer constructs a new webhook.Server from the provided options.
```
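For context, opting into the delay from a consumer would look roughly like this; a minimal sketch assuming this PR's ShutdownDelay field, with the port and the 15-second value purely illustrative:

```go
package main

import (
	"time"

	"sigs.k8s.io/controller-runtime/pkg/webhook"
)

func main() {
	// Keep accepting connections for 15s (illustrative value) after
	// shutdown begins, so routing can move off this instance before
	// the listeners close. ShutdownDelay is the field added by this PR.
	srv := webhook.NewServer(webhook.Options{
		Port:          9443,
		ShutdownDelay: 15 * time.Second,
	})
	_ = srv
}
```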
```
@@ -247,9 +251,19 @@ func (s *DefaultServer) Start(ctx context.Context) error {
	go func() {
		<-ctx.Done()
		log.Info("Shutting down webhook server with timeout of 1 minute")

		ctx, cancel := context.WithTimeout(context.Background(), 1*time.Minute)
		defer cancel()
		// Disable HTTP keep-alives to close persistent connections after the
		// next HTTP request. Clients may reconnect until routes are updated
		// and the server listeners are closed; however, this should start
		// gradually migrating clients to server instances that are not about
		// to shut down.
		srv.SetKeepAlivesEnabled(false)
		// Wait for the specified shutdown delay or until the shutdown context
		// expires, whichever happens first.
		select {
		case <-time.After(s.Options.ShutdownDelay):
		case <-ctx.Done():
		}
```

Review comment (on case <-ctx.Done():):
This construction basically means that ShutdownDelay should not be longer than 1 minute, right?

Reply:
Hmm, yeah, I don't think it's great being capped at 1 minute. Does it look better now? I've changed it to always wait for the full shutdown delay.

```
		if err := srv.Shutdown(ctx); err != nil {
			// Error from closing listeners, or context timeout
			log.Error(err, "error shutting down the HTTP server")
```
Open question:
Would it make sense to set a default greater than 0, e.g. 15s, which is less than the default 30-second graceful shutdown period?
Reply:
It's a tricky one for sure, but because it's so specific to the Kubernetes cluster you're running on, it doesn't feel right to align with any particular cluster or cloud provider (e.g. I'm not sure what values EKS and AKS use).

IMO we should keep it at 0 for backwards compatibility and to ensure that connection draining 'doesn't work' consistently across clusters by default. I think this is also more likely to encourage applications that care a lot about this (e.g. Gatekeeper) to expose a flag for it, making it easier to reconfigure for the cluster you're running on (i.e. push the decision onto users of controller-runtime, since an optimal default isn't clear).

No problem with changing to 15 seconds by default, though, if you were leaning towards that (it would be better for us anyway, since we're using GKE 😄). We could alternatively align with the kube-proxy default of 1 second, but on GKE at least that won't help much.
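To illustrate the "expose a flag" suggestion above: a consuming application could wire the delay to a command-line flag roughly like this. This is a hypothetical sketch, not part of the PR; the --webhook-shutdown-delay flag name is invented, and only the ShutdownDelay field comes from this change:

```go
package main

import (
	"flag"
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/webhook"
)

func main() {
	// Hypothetical flag; each application picks a default that suits the
	// clusters it targets (0 preserves today's behaviour).
	shutdownDelay := flag.Duration("webhook-shutdown-delay", 0,
		"how long the webhook server keeps accepting new connections after shutdown begins")
	flag.Parse()

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		WebhookServer: webhook.NewServer(webhook.Options{
			ShutdownDelay: *shutdownDelay, // field added by this PR
		}),
	})
	if err != nil {
		panic(err)
	}
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```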