-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inject preStop hook into the proxy sidecar container to stop it last #3798
Conversation
@grampelberg ptal |
0b34764
to
8e208c8
Compare
How to test the PR:
|
8e208c8
to
ad04947
Compare
Rebased from master |
2c5d8fc
to
e2e2c2c
Compare
Refreshed the new |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @KIVagant , this is looking promising 👍
I've got a few comments below after a first pass. Also, in order for the flag to be persisted into linkerd-config
for it to be applied not only when manually injecting but also when injecting through the webhook, the following changes are required:
diff --git a/cli/cmd/install.go b/cli/cmd/install.go
index 41ac4e62..1d8104bd 100644
--- a/cli/cmd/install.go
+++ b/cli/cmd/install.go
@@ -685,7 +685,8 @@ func (options *installOptions) buildValuesWithoutIdentity(configs *pb.All) (*l5d
Request: options.proxyMemoryRequest,
},
},
- UID: options.proxyUID,
+ UID: options.proxyUID,
+ WaitBeforeExitSeconds: int32(options.waitBeforeExitSeconds),
}
inboundPortStrs := []string{}
@@ -831,6 +832,7 @@ func (options *installOptions) proxyConfig() *pb.Proxy {
DisableExternalProfiles: !options.enableExternalProfiles,
ProxyVersion: options.proxyVersion,
ProxyInitImageVersion: options.initImageVersion,
+ WaitBeforeExitSeconds: int32(options.waitBeforeExitSeconds),
}
}
Edit: This change is no longer required, as per my comment below
@KIVagant After talking further over this with @grampelberg we came to the conclusion we only want this to be exposed as an option to diff --git a/controller/gen/config/config.pb.go b/controller/gen/config/config.pb.go
index 4c7f7997..42472eaa 100644
--- a/controller/gen/config/config.pb.go
+++ b/controller/gen/config/config.pb.go
@@ -182,7 +182,6 @@ type Proxy struct {
DisableExternalProfiles bool `protobuf:"varint,12,opt,name=disable_external_profiles,json=disableExternalProfiles,proto3" json:"disable_external_profiles,omitempty"`
ProxyVersion string `protobuf:"bytes,13,opt,name=proxy_version,json=proxyVersion,proto3" json:"proxy_version,omitempty"`
ProxyInitImageVersion string `protobuf:"bytes,14,opt,name=proxy_init_image_version,json=proxyInitImageVersion,proto3" json:"proxy_init_image_version,omitempty"`
- WaitBeforeExitSeconds int32 `protobuf:"varint,15,opt,name=wait_before_exit_seconds,json=waitBeforeExitSeconds,proto3" json:"wait_before_exit_seconds,omitempty"`
XXX_NoUnkeyedLiteral struct{} `json:"-"`
XXX_unrecognized []byte `json:"-"`
XXX_sizecache int32 `json:"-"`
@@ -269,13 +268,6 @@ func (m *Proxy) GetOutboundPort() *Port {
return nil
}
-func (m *Proxy) GetWaitBeforeExitSeconds() int32 {
- if m != nil {
- return m.WaitBeforeExitSeconds
- }
- return 0
-}
-
func (m *Proxy) GetResource() *ResourceRequirements {
if m != nil {
return m.Resource
diff --git a/pkg/inject/inject.go b/pkg/inject/inject.go
index 9252532e..237015a9 100644
--- a/pkg/inject/inject.go
+++ b/pkg/inject/inject.go
@@ -749,7 +749,7 @@ func (conf *ResourceConfig) proxyWaitBeforeExitSeconds() int32 {
}
}
- return conf.configs.GetProxy().GetWaitBeforeExitSeconds()
+ return 0
}
func (conf *ResourceConfig) proxyResourceRequirements() *l5dcharts.Resources {
diff --git a/pkg/inject/inject_test.go b/pkg/inject/inject_test.go
index 18ffb2a1..fbd7ae00 100644
--- a/pkg/inject/inject_test.go
+++ b/pkg/inject/inject_test.go
@@ -66,7 +66,6 @@ func TestConfigAccessors(t *testing.T) {
LogLevel: &config.LogLevel{Level: "info,linkerd2_proxy=debug"},
DisableExternalProfiles: false,
ProxyVersion: proxyVersion,
- WaitBeforeExitSeconds: 0,
}
globalConfig := &config.Global{ |
@alpeb I cannot remember what leaded me to adding the protobuf configuration but without that, I guess, annotation won't work and also I had some problems with the default value evaluation when made a mistake in the line, so I really feel this is important to have the config line. |
My last patch above should remove all the references to the config variable. If you still have problems you can also shoot me a message in slack. |
2b5208a
to
71fdbf7
Compare
Today's update:
Tested with the following examples:
|
71fdbf7
to
4521ed3
Compare
The next step would be to find a way how to avoid using bash in the hook and integrate the sleep into the proxy, but this is controversial. |
05641c6
to
8292d52
Compare
Rebased from master |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I just have one comment
cli/cmd/doc.go
Outdated
@@ -207,5 +207,9 @@ func generateAnnotationsDocs() []annotationDoc { | |||
Name: k8s.ProxyTraceCollectorSvcAccountAnnotation, | |||
Description: "The trace collector's service account name. E.g., `tracing-service-account`. If not provided, it will be defaulted to `default`.", | |||
}, | |||
{ | |||
Name: k8s.ProxyWaitBeforeExitAnnotation, | |||
Description: "The proxy sidecar will stay alive for at least the given period before receiving SIGTERM signal from Kubernetes but no longer than pod's `terminationGracePeriodSeconds`. Accepts values in two formats: time.Duration or as an unsigned integer (number of seconds). If not provided, it will be defaulted to `0`.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accepts values in two formats
I think we should support only time.Duration
. It will simplify the proxyWaitBeforeExit()
function in pkg/inject/inject.go
. Any reasons why we need to support unsigned integer too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's one reason. When I was testing the time.Duration
I found that without "s" at the end the value will be ignored which is confusing for end-user. When one writes annotation linkerd/wait: 300
he probably will intuitively expect that the value will be converted to sleep 300
but it won't. In command line the error will be clear and obvious, but the annotation won't produce any messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean; the existing code surrounding conf.proxyWaitBeforeExit()
doesn't offer the opportunity to propagate any errors back to the caller. How about if we support only int32
and convert everything into seconds (which I think is consistent with the current variable name ,WaitBeforeExitSeconds
)? Technically, sleep
only works with s
, m
, h
and d
, which is a subset of what time.Duration
supports. If the description says time.Duration
is supported, we will need to account for suffices like ns
, us
and ms
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am totally okay with using int32
only but now it conflicts with CLI arguments.
Anyway, someone will get into the trap with one or another version. ¯_(ツ)_/¯ I don't believe that we can transparently pass time.Duration
format to bash sleep
just as a string without accurate converting. So please choose what should be left, it is not a principal question for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally we'd have both the CLI flag and the annotation in the same format.
Is it possible for the injector to do validation of the annotation and fail to create the pod if it has this annotation in the wrong format?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe that we can transparently pass time.Duration format to bash sleep
Correct, hence, I suggested we express it in seconds, similar to the K8s terminationGracePeriodSeconds
. More specifically,
- Rename the new annotation and CLI argument to be consistent with your Helm variable (
WaitBeforeExitSeconds
):--wait-before-exit-seconds
as int32 (or even string if it's easier, just nottime.Duration
)config.alpha.linkerd.io/wait-before-exit-seconds
as string (I don't think annotation supports non-string type)
Anyway, someone will get into the trap with one or another version
I am hoping with the seconds
suffix added, this will be hard to miss.
Is it possible for the injector to do validation of the annotation and fail to create the pod if it has this annotation in the wrong format?
It's doable, but it will require changes to the the return values of an existing function. @KIVagant See comment below regarding skipping the error returned by strconv.ParseUint(override, 10, 32)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so I'm returning the code to the initial state with the -seconds
suffix which makes everything obvious for end user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Can you update the new option's description in this file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@KIVagant gentle ping about the description update 😉
pkg/charts/linkerd2/values.go
Outdated
@@ -91,6 +91,7 @@ type ( | |||
Resources *Resources | |||
Trace *Trace | |||
UID int64 | |||
WaitBeforeExitSeconds int32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WDYT of using string type here? My thought is that we only support the time.Duration
format in both linkerd inject
and helm [install|upgrade]
, to provide a consistent usage experience. It also removes the needs to type cast between time.Duration
and int32
.
E.g.
linkerd inject --wait-before-exit=30s
helm install --set proxy.waitBeforeExit=30s
So something like:
diff --git a/cli/cmd/install.go b/cli/cmd/install.go
index ca549398..b58dd632 100644
--- a/cli/cmd/install.go
+++ b/cli/cmd/install.go
@@ -164,6 +164,11 @@ func newInstallOptionsWithDefaults() (*installOptions, error) {
return nil, err
}
+ waitBeforeExit, err := time.ParseDuration(defaults.Proxy.WaitBeforeExitSeconds)
+ if err != nil {
+ return nil, err
+ }
+
return &installOptions{
clusterDomain: defaults.ClusterDomain,
controlPlaneVersion: version.Version,
@@ -198,7 +203,7 @@ func newInstallOptionsWithDefaults() (*installOptions, error) {
proxyCPULimit: defaults.Proxy.Resources.CPU.Limit,
proxyMemoryLimit: defaults.Proxy.Resources.Memory.Limit,
enableExternalProfiles: defaults.Proxy.EnableExternalProfiles,
- waitBeforeExit: time.Duration(defaults.Proxy.WaitBeforeExitSeconds) * time.Second,
+ waitBeforeExit: waitBeforeExit,
},
identityOptions: &installIdentityOptions{
trustDomain: defaults.Identity.TrustDomain,
diff --git a/pkg/charts/linkerd2/values.go b/pkg/charts/linkerd2/values.go
index 00db144d..064df76c 100644
--- a/pkg/charts/linkerd2/values.go
+++ b/pkg/charts/linkerd2/values.go
@@ -95,7 +95,7 @@ type (
Resources *Resources `json:"resources"`
Trace *Trace `json:"trace"`
UID int64 `json:"uid"`
- WaitBeforeExitSeconds int32 `json:"waitBeforeExitSeconds"`
+ WaitBeforeExitSeconds string `json:"waitBeforeExitSeconds"`
}
// ProxyInit contains the fields to set the proxy-init container
diff --git a/pkg/inject/inject.go b/pkg/inject/inject.go
index 01321a74..8aa50285 100644
--- a/pkg/inject/inject.go
+++ b/pkg/inject/inject.go
@@ -470,7 +470,7 @@ func (conf *ResourceConfig) injectPodSpec(values *patch) {
},
UID: conf.proxyUID(),
Resources: conf.proxyResourceRequirements(),
- WaitBeforeExitSeconds: int32(conf.proxyWaitBeforeExit() / time.Second),
+ WaitBeforeExitSeconds: conf.proxyWaitBeforeExit().String(),
}
if v := conf.pod.meta.Annotations[k8s.ProxyEnableDebugAnnotation]; v != "" {
8292d52
to
fce26c0
Compare
cli/cmd/doc.go
Outdated
@@ -207,5 +207,9 @@ func generateAnnotationsDocs() []annotationDoc { | |||
Name: k8s.ProxyTraceCollectorSvcAccountAnnotation, | |||
Description: "The trace collector's service account name. E.g., `tracing-service-account`. If not provided, it will be defaulted to `default`.", | |||
}, | |||
{ | |||
Name: k8s.ProxyWaitBeforeExitAnnotation, | |||
Description: "The proxy sidecar will stay alive for at least the given period before receiving SIGTERM signal from Kubernetes but no longer than pod's `terminationGracePeriodSeconds`. Accepts values in two formats: time.Duration or as an unsigned integer (number of seconds). If not provided, it will be defaulted to `0`.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe that we can transparently pass time.Duration format to bash sleep
Correct, hence, I suggested we express it in seconds, similar to the K8s terminationGracePeriodSeconds
. More specifically,
- Rename the new annotation and CLI argument to be consistent with your Helm variable (
WaitBeforeExitSeconds
):--wait-before-exit-seconds
as int32 (or even string if it's easier, just nottime.Duration
)config.alpha.linkerd.io/wait-before-exit-seconds
as string (I don't think annotation supports non-string type)
Anyway, someone will get into the trap with one or another version
I am hoping with the seconds
suffix added, this will be hard to miss.
Is it possible for the injector to do validation of the annotation and fail to create the pod if it has this annotation in the wrong format?
It's doable, but it will require changes to the the return values of an existing function. @KIVagant See comment below regarding skipping the error returned by strconv.ParseUint(override, 10, 32)
.
pkg/inject/inject.go
Outdated
|
||
// Try to parse plain integer as value in seconds if it is not formatted as Duration | ||
if err != nil { | ||
waitInSeconds, _ := strconv.ParseUint(override, 10, 32) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should still return this error to conf.injectPodSpec()
. Then in that function, we can either:
- Log the error as a warning like what it's already doing to other returned errors, so that at the minimum, the logs can provide a hint as to why the prestop hook isn't added, or
- Update the existing
injectPodSpec()
function to propagate the errors back to the original caller
I think option 2 is the right way to do it, but just not sure of the change scope it entails. I am not against option 1, as I don't think it's hard for the user to fix a typo (or whatever) in the override
value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll revert the code state to the variant with -seconds
suffix everywhere which accepts uint32. Even without logging the error, if a sleep command hasn't been enabled, anyone should quickly find the problem just from the annotation and arguments naming.
Up: applying the opt 1.
9aeddb3
to
637ccd4
Compare
04812a9
to
d14d429
Compare
Today's update:
|
d14d429
to
76f2e21
Compare
76f2e21
to
87b6db5
Compare
Opppss... I hit |
87b6db5
to
a6a0110
Compare
This commit adds support for a Graceful Shutdown technique that is used by some Kubernetes administrators while the more perspective configuration is being discussed in kubernetes/kubernetes#65502 The problem is that RollingUpdate strategy does not guarantee that all traffic will be sent to a new pod _before_ the previous pod is removed. Kubernetes inside is an event-driven system and when a pod is being terminating, several processes can receive the event simultaneously. And if an Ingress Controller gets the event too late or processes it slower than Kubernetes removes the pod from its Service, users requests will continue flowing into the black whole. According [to the documentation](https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods) > 1. If one of the Pod’s containers has defined a `preStop` hook, > it is invoked inside of the container. If the `preStop` hook is still > running after the grace period expires, step 2 is then invoked with > a small (2 second) extended grace period. > > 2. The container is sent the `TERM` signal. Note that not all > containers in the Pod will receive the `TERM` signal at the same time > and may each require a preStop hook if the order in which > they shut down matters. This commit adds support for the `preStop` hook that can be configured in three forms: 1. As command line argument `--wait-before-exit-seconds` for `linkerd inject` command. 2. As `linkerd2` Helm chart value `Proxy.WaitBeforeExitSeconds`. 2. As `config.alpha.linkerd.io/wait-before-exit-seconds` annotation. If configured, it will add the following preHook to the proxy container definition: ```yaml lifecycle: preStop: exec: command: - /bin/bash - -c - sleep {{.Values.Proxy.WaitBeforeExitSeconds}} ``` To achieve max benefit from the option, the main container should have its own `preStop` hook with the `sleep` command inside which has a smaller period than is set for the proxy sidecar. And none of them must be bigger than `terminationGracePeriodSeconds` configured for the entire pod. An example of a rendered Kubernetes resource where `.Values.Proxy.WaitBeforeExitSeconds` is equal to `40`: ```yaml # application container lifecycle: preStop: exec: command: - /bin/bash - -c - sleep 20 # linkerd-proxy container lifecycle: preStop: exec: command: - /bin/bash - -c - sleep 40 terminationGracePeriodSeconds: 160 # for entire pod ``` Fixes linkerd#3747 Signed-off-by: Eugene Glotov <kivagant@gmail.com>
Signed-off-by: Eugene Glotov <kivagant@gmail.com>
Signed-off-by: Eugene Glotov <kivagant@gmail.com>
a6a0110
to
aea1936
Compare
Today's update:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the description in cli/cmd/doc.go
still needs to be updated, but other than that, this looks good to me.
As a follow up task, we should add some documentation to the website explaining how to use this flag to deal with slow draining ALBs.
Signed-off-by: Eugene Glotov <kivagant@gmail.com>
aea1936
to
410b9a8
Compare
Fixed the annotation description and renamed the variable from |
pkg/inject/inject.go
Outdated
k8s.ProxyWaitBeforeExitSecondsAnnotation, override) | ||
} | ||
|
||
return waitBeforeExitSeconds, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error is being swallowed on the calling side. You could keep on logging the error here, and then do like in the other annotation parsing methods where if there's an error a default value is used (0), and the function would always return a uint64
only.
Also, can I ask you to use additional commits instead of force-pushing? It would make reviews a lot easier. It doesn't matter if there are lots of commits because we squash them together into a single commit prior to merging with master 😉
Tested
|
Signed-off-by: Eugene Glotov <kivagant@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again @KIVagant for the hard work!
Just to confirm we're on the same page, the last set of tests you posted is what users should be expecting, right?
The set represents the actual behavior of this implementation. As you may see, big numbers don't work well but if the value stays below |
This commit adds support for a Graceful Shutdown technique that is used
by some Kubernetes administrators while the more perspective
configuration is being discussed in
kubernetes/kubernetes#65502
The problem is that RollingUpdate strategy does not guarantee that all
traffic will be sent to a new pod before the previous pod is removed.
Kubernetes inside is an event-driven system and when a pod is being
terminating, several processes can receive the event simultaneously.
And if an Ingress Controller gets the event too late or processes it
slower than Kubernetes removes the pod from its Service, users requests
will continue flowing into the black whole.
According to the documentation
This commit adds support for the
preStop
hook that can be configuredin three forms:
As command line argument
--wait-before-exit-seconds
forlinkerd inject
command.As
linkerd2
Helm chart valueProxy.WaitBeforeExitSeconds
.As
config.alpha.linkerd.io/wait-before-exit-seconds
annotation.If configured, it will add the following preHook to the proxy container
definition:
To achieve max benefit from the option, the main container should have
its own
preStop
hook with thesleep
command inside which hasa smaller period than is set for the proxy sidecar. And none of them
must be bigger than
terminationGracePeriodSeconds
configured for theentire pod.
An example of a rendered Kubernetes resource where
.Values.Proxy.WaitBeforeExitSeconds
is equal to40
:Fixes #3747