Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inject preStop hook into the proxy sidecar container to stop it last #3798

Merged
merged 6 commits into from
Dec 18, 2019

Conversation

KIVagant
Copy link
Contributor

@KIVagant KIVagant commented Dec 9, 2019

This commit adds support for a Graceful Shutdown technique that is used
by some Kubernetes administrators while the more perspective
configuration is being discussed in
kubernetes/kubernetes#65502

The problem is that RollingUpdate strategy does not guarantee that all
traffic will be sent to a new pod before the previous pod is removed.
Kubernetes inside is an event-driven system and when a pod is being
terminating, several processes can receive the event simultaneously.
And if an Ingress Controller gets the event too late or processes it
slower than Kubernetes removes the pod from its Service, users requests
will continue flowing into the black whole.

According to the documentation

  1. If one of the Pod’s containers has defined a preStop hook,
    it is invoked inside of the container. If the preStop hook is still
    running after the grace period expires, step 2 is then invoked with
    a small (2 second) extended grace period.

  2. The container is sent the TERM signal. Note that not all
    containers in the Pod will receive the TERM signal at the same time
    and may each require a preStop hook if the order in which
    they shut down matters.

This commit adds support for the preStop hook that can be configured
in three forms:

  1. As command line argument --wait-before-exit-seconds for
    linkerd inject command.

  2. As linkerd2 Helm chart value Proxy.WaitBeforeExitSeconds.

  3. As config.alpha.linkerd.io/wait-before-exit-seconds annotation.

If configured, it will add the following preHook to the proxy container
definition:

lifecycle:
  preStop:
    exec:
      command:
        - /bin/bash
        - -c
        - sleep {{.Values.Proxy.WaitBeforeExitSeconds}}

To achieve max benefit from the option, the main container should have
its own preStop hook with the sleep command inside which has
a smaller period than is set for the proxy sidecar. And none of them
must be bigger than terminationGracePeriodSeconds configured for the
entire pod.

An example of a rendered Kubernetes resource where
.Values.Proxy.WaitBeforeExitSeconds is equal to 40:

       # application container
        lifecycle:
          preStop:
            exec:
              command:
                - /bin/bash
                - -c
                - sleep 20

        # linkerd-proxy container 
        lifecycle:
          preStop:
            exec:
              command:
                - /bin/bash
                - -c
                - sleep 40
    terminationGracePeriodSeconds: 160 # for entire pod

Fixes #3747

@KIVagant
Copy link
Contributor Author

KIVagant commented Dec 9, 2019

@grampelberg ptal

@KIVagant KIVagant force-pushed the kivagant/inject_prestop_hook branch 2 times, most recently from 0b34764 to 8e208c8 Compare December 9, 2019 16:42
@KIVagant
Copy link
Contributor Author

KIVagant commented Dec 9, 2019

How to test the PR:

  1. Build and deploy all images from the branch.

  2. Download https://run.linkerd.io/emojivoto.yml

  3. Inject with the --manual flag and --wait-before-exit-seconds 111. Check the result.

  4. Modify manually in an editor the original file and the config.alpha.linkerd.io/wait-before-exit-seconds annotation with random values and leave at least one Deployment without the annotation. Inject with the --manual flag but without the --wait-before-exit-seconds flag. Check the result.

  template:
    metadata:
      annotations:
        config.alpha.linkerd.io/wait-before-exit-seconds: "64"
  1. Install Linkerd from the Helm chart with some WaitBeforeExitSeconds value. Check if injected pods have the configured sleeping period.

@KIVagant
Copy link
Contributor Author

Rebased from master

@KIVagant KIVagant force-pushed the kivagant/inject_prestop_hook branch 2 times, most recently from 2c5d8fc to e2e2c2c Compare December 10, 2019 15:15
@KIVagant
Copy link
Contributor Author

KIVagant commented Dec 10, 2019

Refreshed the new upgrade_overwrite_issuer.golden and upgrade_overwrite_trust_anchors.golden to include default value of waitBeforeExitSeconds

Copy link
Member

@alpeb alpeb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @KIVagant , this is looking promising 👍

I've got a few comments below after a first pass. Also, in order for the flag to be persisted into linkerd-config for it to be applied not only when manually injecting but also when injecting through the webhook, the following changes are required:

diff --git a/cli/cmd/install.go b/cli/cmd/install.go
index 41ac4e62..1d8104bd 100644
--- a/cli/cmd/install.go
+++ b/cli/cmd/install.go
@@ -685,7 +685,8 @@ func (options *installOptions) buildValuesWithoutIdentity(configs *pb.All) (*l5d
                                Request: options.proxyMemoryRequest,
                        },
                },
-               UID: options.proxyUID,
+               UID:                   options.proxyUID,
+               WaitBeforeExitSeconds: int32(options.waitBeforeExitSeconds),
        }
 
        inboundPortStrs := []string{}
@@ -831,6 +832,7 @@ func (options *installOptions) proxyConfig() *pb.Proxy {
                DisableExternalProfiles: !options.enableExternalProfiles,
                ProxyVersion:            options.proxyVersion,
                ProxyInitImageVersion:   options.initImageVersion,
+               WaitBeforeExitSeconds:   int32(options.waitBeforeExitSeconds),
        }
 }

Edit: This change is no longer required, as per my comment below

charts/linkerd2/values.yaml Outdated Show resolved Hide resolved
cli/cmd/inject.go Outdated Show resolved Hide resolved
@alpeb
Copy link
Member

alpeb commented Dec 10, 2019

@KIVagant After talking further over this with @grampelberg we came to the conclusion we only want this to be exposed as an option to linkerd inject and its corresponding annotation, and not to linkerd install nor as config to be stored in linkerd-config. So some of my comments above are no longer relevant (I'll flag them as so). And it means we don't need changes to the protobuf ProxyStruct. These are the changes that your last push would require:

diff --git a/controller/gen/config/config.pb.go b/controller/gen/config/config.pb.go
index 4c7f7997..42472eaa 100644
--- a/controller/gen/config/config.pb.go
+++ b/controller/gen/config/config.pb.go
@@ -182,7 +182,6 @@ type Proxy struct {
        DisableExternalProfiles bool                  `protobuf:"varint,12,opt,name=disable_external_profiles,json=disableExternalProfiles,proto3" json:"disable_external_profiles,omitempty"`
        ProxyVersion            string                `protobuf:"bytes,13,opt,name=proxy_version,json=proxyVersion,proto3" json:"proxy_version,omitempty"`
        ProxyInitImageVersion   string                `protobuf:"bytes,14,opt,name=proxy_init_image_version,json=proxyInitImageVersion,proto3" json:"proxy_init_image_version,omitempty"`
-       WaitBeforeExitSeconds   int32                 `protobuf:"varint,15,opt,name=wait_before_exit_seconds,json=waitBeforeExitSeconds,proto3" json:"wait_before_exit_seconds,omitempty"`
        XXX_NoUnkeyedLiteral    struct{}              `json:"-"`
        XXX_unrecognized        []byte                `json:"-"`
        XXX_sizecache           int32                 `json:"-"`
@@ -269,13 +268,6 @@ func (m *Proxy) GetOutboundPort() *Port {
        return nil
 }
 
-func (m *Proxy) GetWaitBeforeExitSeconds() int32 {
-       if m != nil {
-               return m.WaitBeforeExitSeconds
-       }
-       return 0
-}
-
 func (m *Proxy) GetResource() *ResourceRequirements {
        if m != nil {
                return m.Resource
diff --git a/pkg/inject/inject.go b/pkg/inject/inject.go
index 9252532e..237015a9 100644
--- a/pkg/inject/inject.go
+++ b/pkg/inject/inject.go
@@ -749,7 +749,7 @@ func (conf *ResourceConfig) proxyWaitBeforeExitSeconds() int32 {
                }
        }
 
-       return conf.configs.GetProxy().GetWaitBeforeExitSeconds()
+       return 0
 }
 
 func (conf *ResourceConfig) proxyResourceRequirements() *l5dcharts.Resources {
diff --git a/pkg/inject/inject_test.go b/pkg/inject/inject_test.go
index 18ffb2a1..fbd7ae00 100644
--- a/pkg/inject/inject_test.go
+++ b/pkg/inject/inject_test.go
@@ -66,7 +66,6 @@ func TestConfigAccessors(t *testing.T) {
                LogLevel:                &config.LogLevel{Level: "info,linkerd2_proxy=debug"},
                DisableExternalProfiles: false,
                ProxyVersion:            proxyVersion,
-               WaitBeforeExitSeconds:   0,
        }
 
        globalConfig := &config.Global{

@KIVagant
Copy link
Contributor Author

@alpeb I cannot remember what leaded me to adding the protobuf configuration but without that, I guess, annotation won't work and also I had some problems with the default value evaluation when made a mistake in the line, so I really feel this is important to have the config line.

@alpeb
Copy link
Member

alpeb commented Dec 10, 2019

My last patch above should remove all the references to the config variable. If you still have problems you can also shoot me a message in slack.

@KIVagant KIVagant force-pushed the kivagant/inject_prestop_hook branch 2 times, most recently from 2b5208a to 71fdbf7 Compare December 11, 2019 14:48
@KIVagant
Copy link
Contributor Author

KIVagant commented Dec 11, 2019

Today's update:

  • removed seconds suffix everywhere except Helm chart values where time.Duration is not supported;
  • updated gcr.io/linkerd-io/go-deps (I am not sure how to do this correctly, please double check)
  • added k8s.ProxyWaitBeforeExitAnnotation to cli/cmd/doc.go
  • --wait-before-exit flag for linkerd inject now accepts time.Duration only, uint is not supported
  • config.alpha.linkerd.io/proxy-wait-before-exit annotation by default expects time.Duration type but if a plain integer is set it will try to interpret the value as "time in seconds". Added an additional positive test case to cover this scenario.
  • removed protobuf configuration.

Tested with the following examples:

  • config.alpha.linkerd.io/proxy-wait-before-exit: 2m results in /bin/bash -c sleep 120
  • config.alpha.linkerd.io/proxy-wait-before-exit: 2m3s results in /bin/bash -c sleep 123
  • config.alpha.linkerd.io/proxy-wait-before-exit: "64" results in /bin/bash -c sleep 64
  • config.alpha.linkerd.io/proxy-wait-before-exit: "111" results in /bin/bash -c sleep 111
  • config.alpha.linkerd.io/proxy-wait-before-exit: "-1" results in missing preStop hook
  • missed annotation results in missing preStop hook
  • linkerd inject --wait-before-exit 120s results in /bin/bash -c sleep 120
  • linkerd inject --wait-before-exit 2m20s results in /bin/bash -c sleep 140
  • linkerd inject --wait-before-exit 0 results in missing preStop hook same as when the
    parameter is omitted
  • linkerd inject --wait-before-exit -1 results in an error

@KIVagant KIVagant force-pushed the kivagant/inject_prestop_hook branch from 71fdbf7 to 4521ed3 Compare December 11, 2019 15:03
@KIVagant
Copy link
Contributor Author

The next step would be to find a way how to avoid using bash in the hook and integrate the sleep into the proxy, but this is controversial.

@KIVagant KIVagant force-pushed the kivagant/inject_prestop_hook branch 2 times, most recently from 05641c6 to 8292d52 Compare December 12, 2019 07:13
@KIVagant
Copy link
Contributor Author

Rebased from master

Copy link
Member

@alpeb alpeb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I just have one comment

cli/cmd/inject.go Outdated Show resolved Hide resolved
charts/partials/templates/_proxy.tpl Outdated Show resolved Hide resolved
cli/cmd/doc.go Outdated
@@ -207,5 +207,9 @@ func generateAnnotationsDocs() []annotationDoc {
Name: k8s.ProxyTraceCollectorSvcAccountAnnotation,
Description: "The trace collector's service account name. E.g., `tracing-service-account`. If not provided, it will be defaulted to `default`.",
},
{
Name: k8s.ProxyWaitBeforeExitAnnotation,
Description: "The proxy sidecar will stay alive for at least the given period before receiving SIGTERM signal from Kubernetes but no longer than pod's `terminationGracePeriodSeconds`. Accepts values in two formats: time.Duration or as an unsigned integer (number of seconds). If not provided, it will be defaulted to `0`.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accepts values in two formats

I think we should support only time.Duration. It will simplify the proxyWaitBeforeExit() function in pkg/inject/inject.go. Any reasons why we need to support unsigned integer too?

Copy link
Contributor Author

@KIVagant KIVagant Dec 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's one reason. When I was testing the time.Duration I found that without "s" at the end the value will be ignored which is confusing for end-user. When one writes annotation linkerd/wait: 300 he probably will intuitively expect that the value will be converted to sleep 300 but it won't. In command line the error will be clear and obvious, but the annotation won't produce any messages.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean; the existing code surrounding conf.proxyWaitBeforeExit() doesn't offer the opportunity to propagate any errors back to the caller. How about if we support only int32 and convert everything into seconds (which I think is consistent with the current variable name ,WaitBeforeExitSeconds)? Technically, sleep only works with s, m, h and d, which is a subset of what time.Duration supports. If the description says time.Duration is supported, we will need to account for suffices like ns, us and ms.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am totally okay with using int32 only but now it conflicts with CLI arguments.
Anyway, someone will get into the trap with one or another version. ¯_(ツ)_/¯ I don't believe that we can transparently pass time.Duration format to bash sleep just as a string without accurate converting. So please choose what should be left, it is not a principal question for me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we'd have both the CLI flag and the annotation in the same format.

Is it possible for the injector to do validation of the annotation and fail to create the pod if it has this annotation in the wrong format?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe that we can transparently pass time.Duration format to bash sleep

Correct, hence, I suggested we express it in seconds, similar to the K8s terminationGracePeriodSeconds. More specifically,

  • Rename the new annotation and CLI argument to be consistent with your Helm variable (WaitBeforeExitSeconds):
    • --wait-before-exit-seconds as int32 (or even string if it's easier, just not time.Duration)
    • config.alpha.linkerd.io/wait-before-exit-seconds as string (I don't think annotation supports non-string type)

Anyway, someone will get into the trap with one or another version

I am hoping with the seconds suffix added, this will be hard to miss.

Is it possible for the injector to do validation of the annotation and fail to create the pod if it has this annotation in the wrong format?

It's doable, but it will require changes to the the return values of an existing function. @KIVagant See comment below regarding skipping the error returned by strconv.ParseUint(override, 10, 32).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so I'm returning the code to the initial state with the -seconds suffix which makes everything obvious for end user.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Can you update the new option's description in this file?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@KIVagant gentle ping about the description update 😉

charts/linkerd2/values.yaml Outdated Show resolved Hide resolved
@@ -91,6 +91,7 @@ type (
Resources *Resources
Trace *Trace
UID int64
WaitBeforeExitSeconds int32
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT of using string type here? My thought is that we only support the time.Duration format in both linkerd inject and helm [install|upgrade], to provide a consistent usage experience. It also removes the needs to type cast between time.Duration and int32.

E.g.

linkerd inject --wait-before-exit=30s

helm install --set proxy.waitBeforeExit=30s

So something like:

diff --git a/cli/cmd/install.go b/cli/cmd/install.go
index ca549398..b58dd632 100644
--- a/cli/cmd/install.go
+++ b/cli/cmd/install.go
@@ -164,6 +164,11 @@ func newInstallOptionsWithDefaults() (*installOptions, error) {
 		return nil, err
 	}
 
+	waitBeforeExit, err := time.ParseDuration(defaults.Proxy.WaitBeforeExitSeconds)
+	if err != nil {
+		return nil, err
+	}
+
 	return &installOptions{
 		clusterDomain:               defaults.ClusterDomain,
 		controlPlaneVersion:         version.Version,
@@ -198,7 +203,7 @@ func newInstallOptionsWithDefaults() (*installOptions, error) {
 			proxyCPULimit:          defaults.Proxy.Resources.CPU.Limit,
 			proxyMemoryLimit:       defaults.Proxy.Resources.Memory.Limit,
 			enableExternalProfiles: defaults.Proxy.EnableExternalProfiles,
-			waitBeforeExit:         time.Duration(defaults.Proxy.WaitBeforeExitSeconds) * time.Second,
+			waitBeforeExit:         waitBeforeExit,
 		},
 		identityOptions: &installIdentityOptions{
 			trustDomain:            defaults.Identity.TrustDomain,
diff --git a/pkg/charts/linkerd2/values.go b/pkg/charts/linkerd2/values.go
index 00db144d..064df76c 100644
--- a/pkg/charts/linkerd2/values.go
+++ b/pkg/charts/linkerd2/values.go
@@ -95,7 +95,7 @@ type (
 		Resources              *Resources    `json:"resources"`
 		Trace                  *Trace        `json:"trace"`
 		UID                    int64         `json:"uid"`
-		WaitBeforeExitSeconds  int32         `json:"waitBeforeExitSeconds"`
+		WaitBeforeExitSeconds  string        `json:"waitBeforeExitSeconds"`
 	}
 
 	// ProxyInit contains the fields to set the proxy-init container
diff --git a/pkg/inject/inject.go b/pkg/inject/inject.go
index 01321a74..8aa50285 100644
--- a/pkg/inject/inject.go
+++ b/pkg/inject/inject.go
@@ -470,7 +470,7 @@ func (conf *ResourceConfig) injectPodSpec(values *patch) {
 		},
 		UID:                   conf.proxyUID(),
 		Resources:             conf.proxyResourceRequirements(),
-		WaitBeforeExitSeconds: int32(conf.proxyWaitBeforeExit() / time.Second),
+		WaitBeforeExitSeconds: conf.proxyWaitBeforeExit().String(),
 	}
 
 	if v := conf.pod.meta.Annotations[k8s.ProxyEnableDebugAnnotation]; v != "" {

charts/linkerd2/README.md Outdated Show resolved Hide resolved
@KIVagant KIVagant force-pushed the kivagant/inject_prestop_hook branch from 8292d52 to fce26c0 Compare December 16, 2019 15:22
cli/cmd/doc.go Outdated
@@ -207,5 +207,9 @@ func generateAnnotationsDocs() []annotationDoc {
Name: k8s.ProxyTraceCollectorSvcAccountAnnotation,
Description: "The trace collector's service account name. E.g., `tracing-service-account`. If not provided, it will be defaulted to `default`.",
},
{
Name: k8s.ProxyWaitBeforeExitAnnotation,
Description: "The proxy sidecar will stay alive for at least the given period before receiving SIGTERM signal from Kubernetes but no longer than pod's `terminationGracePeriodSeconds`. Accepts values in two formats: time.Duration or as an unsigned integer (number of seconds). If not provided, it will be defaulted to `0`.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe that we can transparently pass time.Duration format to bash sleep

Correct, hence, I suggested we express it in seconds, similar to the K8s terminationGracePeriodSeconds. More specifically,

  • Rename the new annotation and CLI argument to be consistent with your Helm variable (WaitBeforeExitSeconds):
    • --wait-before-exit-seconds as int32 (or even string if it's easier, just not time.Duration)
    • config.alpha.linkerd.io/wait-before-exit-seconds as string (I don't think annotation supports non-string type)

Anyway, someone will get into the trap with one or another version

I am hoping with the seconds suffix added, this will be hard to miss.

Is it possible for the injector to do validation of the annotation and fail to create the pod if it has this annotation in the wrong format?

It's doable, but it will require changes to the the return values of an existing function. @KIVagant See comment below regarding skipping the error returned by strconv.ParseUint(override, 10, 32).


// Try to parse plain integer as value in seconds if it is not formatted as Duration
if err != nil {
waitInSeconds, _ := strconv.ParseUint(override, 10, 32)
Copy link
Contributor

@ihcsim ihcsim Dec 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should still return this error to conf.injectPodSpec(). Then in that function, we can either:

  1. Log the error as a warning like what it's already doing to other returned errors, so that at the minimum, the logs can provide a hint as to why the prestop hook isn't added, or
  2. Update the existing injectPodSpec() function to propagate the errors back to the original caller

I think option 2 is the right way to do it, but just not sure of the change scope it entails. I am not against option 1, as I don't think it's hard for the user to fix a typo (or whatever) in the override value.

Copy link
Contributor Author

@KIVagant KIVagant Dec 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll revert the code state to the variant with -seconds suffix everywhere which accepts uint32. Even without logging the error, if a sleep command hasn't been enabled, anyone should quickly find the problem just from the annotation and arguments naming.
Up: applying the opt 1.

@KIVagant KIVagant force-pushed the kivagant/inject_prestop_hook branch 2 times, most recently from 9aeddb3 to 637ccd4 Compare December 17, 2019 10:17
@KIVagant KIVagant force-pushed the kivagant/inject_prestop_hook branch 2 times, most recently from 04812a9 to d14d429 Compare December 17, 2019 10:47
@KIVagant
Copy link
Contributor Author

KIVagant commented Dec 17, 2019

Today's update:

  • returned -seconds suffix everywhere to make it clear from any points of view.
  • returned uint (previously it was int32, then Duration) as the format for both, the cli argument and the annotation.
  • added a negative test case to test the resulting value is zero if the annotation is in a wrong format (one scenario).
  • added a log warning message if the annotation is in a wrong format.

@KIVagant KIVagant force-pushed the kivagant/inject_prestop_hook branch from d14d429 to 76f2e21 Compare December 17, 2019 16:17
Dockerfile-proxy Outdated Show resolved Hide resolved
go.mod Outdated Show resolved Hide resolved
@KIVagant KIVagant force-pushed the kivagant/inject_prestop_hook branch from 76f2e21 to 87b6db5 Compare December 17, 2019 16:27
@ihcsim
Copy link
Contributor

ihcsim commented Dec 17, 2019

Opppss... I hit approve by mistake yesterday; but it looks like it's very close 😉.

cli/cmd/root.go Outdated Show resolved Hide resolved
pkg/inject/inject.go Outdated Show resolved Hide resolved
@KIVagant KIVagant force-pushed the kivagant/inject_prestop_hook branch from 87b6db5 to a6a0110 Compare December 17, 2019 19:57
Signed-off-by: Eugene Glotov <kivagant@gmail.com>
This commit adds support for a Graceful Shutdown technique that is used
by some Kubernetes administrators while the more perspective
configuration is being discussed in
kubernetes/kubernetes#65502

The problem is that RollingUpdate strategy does not guarantee that all
traffic will be sent to a new pod _before_ the previous pod is removed.
Kubernetes inside is an event-driven system and when a pod is being
terminating, several processes can receive the event simultaneously.
And if an Ingress Controller gets the event too late or processes it
slower than Kubernetes removes the pod from its Service, users requests
will continue flowing into the black whole.

According [to the documentation](https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods)

> 1. If one of the Pod’s containers has defined a `preStop` hook,
> it is invoked inside of the container. If the `preStop` hook is still
> running after the grace period expires, step 2 is then invoked with
> a small (2 second) extended grace period.
>
> 2. The container is sent the `TERM` signal. Note that not all
> containers in the Pod will receive the `TERM` signal at the same time
> and may each require a preStop hook if the order in which
> they shut down matters.

This commit adds support for the `preStop` hook that can be configured
in three forms:

1. As command line argument `--wait-before-exit-seconds` for
  `linkerd inject` command.

2. As `linkerd2` Helm chart value `Proxy.WaitBeforeExitSeconds`.

2. As `config.alpha.linkerd.io/wait-before-exit-seconds` annotation.

If configured, it will add the following preHook to the proxy container
definition:

```yaml
lifecycle:
  preStop:
    exec:
      command:
        - /bin/bash
        - -c
        - sleep {{.Values.Proxy.WaitBeforeExitSeconds}}
```

To achieve max benefit from the option, the main container should have
its own `preStop` hook with the `sleep` command inside which has
a smaller period than is set for the proxy sidecar. And none of them
must be bigger than `terminationGracePeriodSeconds` configured for the
entire pod.

An example of a rendered Kubernetes resource where
`.Values.Proxy.WaitBeforeExitSeconds` is equal to `40`:

```yaml
       # application container
        lifecycle:
          preStop:
            exec:
              command:
                - /bin/bash
                - -c
                - sleep 20

        # linkerd-proxy container
        lifecycle:
          preStop:
            exec:
              command:
                - /bin/bash
                - -c
                - sleep 40
    terminationGracePeriodSeconds: 160 # for entire pod
```

Fixes linkerd#3747

Signed-off-by: Eugene Glotov <kivagant@gmail.com>
Signed-off-by: Eugene Glotov <kivagant@gmail.com>
Signed-off-by: Eugene Glotov <kivagant@gmail.com>
@KIVagant KIVagant force-pushed the kivagant/inject_prestop_hook branch from a6a0110 to aea1936 Compare December 18, 2019 10:01
@KIVagant
Copy link
Contributor Author

Today's update:

  • uint type replaced with uint64

Copy link
Member

@adleong adleong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the description in cli/cmd/doc.go still needs to be updated, but other than that, this looks good to me.

As a follow up task, we should add some documentation to the website explaining how to use this flag to deal with slow draining ALBs.

Signed-off-by: Eugene Glotov <kivagant@gmail.com>
@KIVagant KIVagant force-pushed the kivagant/inject_prestop_hook branch from aea1936 to 410b9a8 Compare December 18, 2019 20:36
@KIVagant
Copy link
Contributor Author

Fixed the annotation description and renamed the variable from ProxyWaitBeforeExitAnnotation to ProxyWaitBeforeExitSecondsAnnotation

k8s.ProxyWaitBeforeExitSecondsAnnotation, override)
}

return waitBeforeExitSeconds, err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error is being swallowed on the calling side. You could keep on logging the error here, and then do like in the other annotation parsing methods where if there's an error a default value is used (0), and the function would always return a uint64 only.
Also, can I ask you to use additional commits instead of force-pushing? It would make reviews a lot easier. It doesn't matter if there are lots of commits because we squash them together into a single commit prior to merging with master 😉

@KIVagant
Copy link
Contributor Author

Tested linkerd inject --manual for the following set of options:

  • config.alpha.linkerd.io/proxy-wait-before-exit-seconds: "100" → sleep 100
  • config.alpha.linkerd.io/proxy-wait-before-exit-seconds: 100 → sleep 100 # but kubectl apply won't work with the integer value
  • config.alpha.linkerd.io/proxy-wait-before-exit-seconds: "100s" → no hook
  • config.alpha.linkerd.io/proxy-wait-before-exit-seconds: "-100" → no hook
  • config.alpha.linkerd.io/proxy-wait-before-exit-seconds: "100.100" → no hook
  • config.alpha.linkerd.io/proxy-wait-before-exit-seconds: "999999999999999999999999" → sleep 1.8446744073709552e+19
  • config.alpha.linkerd.io/proxy-wait-before-exit-seconds: "18446744073709551615" → sleep 1.8446744073709552e+19 #uint64 maximum
  • config.alpha.linkerd.io/proxy-wait-before-exit-seconds: "4294967295" → sleep 4.294967295e+09 #uint32 maximum
  • config.alpha.linkerd.io/proxy-wait-before-exit-seconds: "65535" → sleep 65535 #uint16 maximum

Signed-off-by: Eugene Glotov <kivagant@gmail.com>
Copy link
Member

@alpeb alpeb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again @KIVagant for the hard work!

Just to confirm we're on the same page, the last set of tests you posted is what users should be expecting, right?

@KIVagant
Copy link
Contributor Author

Just to confirm we're on the same page, the last set of tests you posted is what users should be expecting, right?

The set represents the actual behavior of this implementation. As you may see, big numbers don't work well but if the value stays below 65535 (I'm not sure about the available maximum), which is quite enough for the goals of this solution, it works as expected.

@alpeb alpeb merged commit 748da80 into linkerd:master Dec 18, 2019
@KIVagant
Copy link
Contributor Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Configure linkerd-proxy to ignore SIGTERM on a per-workload basis
5 participants