Support events.preStop #1168

AObuchow · 2023-08-23T15:58:54Z

What does this PR do?

Adds support for events.preStop in devworkspaces. Only exec type commands are supported.

Multiple preStop commands may be executed, following a similar format to postStart events, the difference being that the output of the commands are redirected to /proc/1/fd/1 (for stdout) and /proc/1/fd/2 for stderr. This allows reading the output of preStop commands in the kubernetes logs after pod termination.

The webhook warning against the usage of events.preStop was removed, as well as the mention of events.preStop in the unsupported devfile API doc.

What issues does this PR fix or reference?

Fix #1139

Is it tested? How?

Some automated tests (based on the postStart test cases) were added.

To manually test:

Install DWO using the changes from this PR. Make sure the webhook is also built from this PR, otherwise you'll get the old warning about using events.preStop
Apply a DevWorkspace that uses preStop events. I tested with the following:

kind: DevWorkspace
apiVersion: workspace.devfile.io/v1alpha2
metadata:
  name: pre-stop-test
spec:
  started: true
  routingClass: 'basic'
  template:
    projects:
    - name: web-nodejs-sample
      git:
        remotes:
          origin: "https://github.com/che-samples/web-nodejs-sample.git"
    components:
      - name: web-terminal
        container:
          image: quay.io/wto/web-terminal-tooling:next
          memoryLimit: 512Mi
          mountSources: true
          command:
           - "tail"
           - "-f"
           - "/dev/null"
    commands:
    - id: failing-command
      exec:
        component: web-terminal
        commandLine: ./not-a-script.sh
        workingDir: /tmp/
    - id: list-in-project
      exec:
        component: web-terminal
        commandLine: ls -al
        workingDir: ${PROJECT_SOURCE}
    - id: long-command
      exec:
        component: web-terminal
        commandLine: |
          date
          pwd
          sleep 5s
          echo "done sleeping"
        workingDir: $HOME
    - id: output-to-file
      exec:
        component: web-terminal
        commandLine: echo "writing to a file" > output-to-file.log
        workingDir: ${PROJECT_SOURCE}
    - id: multi-line-command
      exec:
        component: web-terminal
        commandLine: |
          echo "date: $(date)" > multi-line-command.log
          pwd >> multi-line-command.log
          sleep 5s
          echo "done sleeping: $(date)" >> multi-line-command.log
        workingDir: $HOME
    - id: background-longrunning
      exec:
        component: web-terminal
        commandLine: |
          nohup sh -c '{
            for i in {1..500}; do
              echo "background: $i"
              sleep 1
            done
          }' &
    events:
      preStop:
        - long-command
        - background-longrunning
        - output-to-file
        - failing-command
        - multi-line-command
        - list-in-project

After the Devworkspace starts up, find the workspace pod's name. In my case, workspace623a8f20aa6442f3-6dccd76799-mqnt5 :

$ kubectl get pods -n $NAMESPACE
NAME                                           READY   STATUS    RESTARTS   AGE
devworkspace-webhook-server-7ff4b8d7bf-szb8f   2/2     Running   0          89s
workspace623a8f20aa6442f3-6dccd76799-mqnt5     1/1     Running   0          63s

Open another terminal and monitor the logs of the workspace pod: kubectl logs -f workspace623a8f20aa6442f3-6dccd76799-mqnt5 -n $NAMESPACE
In another terminal, delete the DevWorkspace: kubectl delete dw pre-stop-test -n $NAMESPACE
In the terminal that you're monitoring the logs, you should see the output of all preStop event commands:

$ kubectl logs -f workspace623a8f20aa6442f3-6dccd76799-mqnt5 -n $NAMESPACE
Defaulted container "web-terminal" out of: web-terminal, project-clone (init)
Wed Aug 23 15:56:13 UTC 2023
/home/user
done sleeping
/bin/sh: line 17: ./not-a-script.sh: No such file or directory
background: 1
background: 2
background: 3
background: 4
background: 5
total 56
drwxr-xr-x. 1 1234 root   240 Aug 23 15:56 .
drwxrwxrwx. 1 root root    34 Aug 23 15:52 ..
drwxr-xr-x. 1 1234 root   158 Aug 23 15:52 .git
-rw-r--r--. 1 1234 root    25 Aug 23 15:52 .gitattributes
drwxr-xr-x. 1 1234 root    20 Aug 23 15:52 .github
-rw-r--r--. 1 1234 root    14 Aug 23 15:52 .gitignore
drwxr-xr-x. 1 1234 root    52 Aug 23 15:52 .vscode
-rw-r--r--. 1 1234 root 14197 Aug 23 15:52 LICENSE
-rw-r--r--. 1 1234 root   449 Aug 23 15:52 README.md
drwxr-xr-x. 1 1234 root    12 Aug 23 15:52 app
-rw-r--r--. 1 1234 root  1384 Aug 23 15:52 devfile.yaml
-rw-r--r--. 1 1234 root    18 Aug 23 15:56 output-to-file.log
-rw-r--r--. 1 1234 root 14289 Aug 23 15:52 package-lock.json
-rw-r--r--. 1 1234 root   265 Aug 23 15:52 package.json
background: 6
background: 7
background: 8
background: 9
background: 10
background: 11
background: 12
background: 13
background: 14
background: 15
$ # Pod has fully terminated and preStop events have finished

PR Checklist

E2E tests pass (when PR is ready, comment /test v8-devworkspace-operator-e2e, v8-che-happy-path to trigger)
- v8-devworkspace-operator-e2e: DevWorkspace e2e test
- v8-che-happy-path: Happy path for verification integration with Che

AObuchow · 2023-08-23T16:04:57Z

I also verified that user-provided preStop events can't interfere with the async-storage-sidecar preStop event:

If a user creates a devworkspace using the async storage strategy and assigns a preStop event to a component named async-storage-sidecar without having a component named async-storage-sidecar, their devworkspace will fail during validation.
If a user uses the async storage strategy, creates a component named async-storage-sidecar and assigns a preStop event to it, their devworkspace will fail validation due to a duplicate container name (the internal async-storage-sidecar container and the user-defined async-storage-sidecar container component).

codecov · 2023-08-23T17:26:41Z

Codecov Report

Patch coverage: 90.76% and project coverage change: +0.38% 🎉

Comparison is base (65bdb9e) 52.53% compared to head (a81c3d7) 52.91%.
Report is 8 commits behind head on main.

❗ Current head a81c3d7 differs from pull request most recent head 52719a0. Consider uploading reports for the commit 52719a0 to get more accurate results

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1168      +/-   ##
==========================================
+ Coverage   52.53%   52.91%   +0.38%     
==========================================
  Files          82       84       +2     
  Lines        7460     7544      +84     
==========================================
+ Hits         3919     3992      +73     
- Misses       3260     3269       +9     
- Partials      281      283       +2

Files Changed	Coverage Δ
pkg/library/container/container.go	`46.66% <0.00%> (-3.34%)`	⬇️
pkg/library/lifecycle/poststart.go	`94.54% <ø> (-0.54%)`	⬇️
pkg/library/lifecycle/prestop.go	`94.54% <94.54%> (ø)`
pkg/library/lifecycle/util.go	`100.00% <100.00%> (ø)`
webhook/workspace/handler/warning.go	`100.00% <100.00%> (ø)`

... and 4 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

amisevsk

Generally looks good, I'll test it soon

amisevsk · 2023-08-24T15:05:35Z

pkg/library/lifecycle/prestop.go

+const redirectPreStopOutputFmt = `{
+%s
+} 1>/proc/1/fd/1 2>/proc/1/fd/2
+`


Could you add a doc comment on the intention behind this?

Also, I'm concerned whether this format will suppress any messages on failure -- if a prestop command fails, normally the output is stored in a FailedPreStopHook event. If output is redirected to stdout, it might actually make it harder to figure out what went wrong in case of failure (if the event no longer contains this information)

I'm thinking about this because normally with workspaces, pods are removed and not stopped, so pod logs will disappear (this output would only be visible if you were tailing the logs as the workspace stopped)

Could you add a doc comment on the intention behind this?

Will add some documentation on the intention of the redirection (though it seems this is more of a docker feature than kubernetes standard).

Also, I'm concerned whether this format will suppress any messages on failure -- if a prestop command fails, normally the output is stored in a FailedPreStopHook event. If output is redirected to stdout, it might actually make it harder to figure out what went wrong in case of failure (if the event no longer contains this information)

I'm having trouble figuring out how to actually get terminated pod events to see the FailedPreStopHook event. From my understanding, once a pod terminates, there's no way to actually get information or events from it (without some type of logging system installed on the cluster).

For example, the following pod will create a FailedPreStopHook event, but you can't really catch it:

apiVersion: v1 kind: Pod metadata: name: lifecycle-demo spec: containers: - name: lifecycle-demo-container image: nginx lifecycle: postStart: exec: command: ["/bin/sh","-c","ls"] preStop: exec: command: ["/bin/sh","-c","badcommand"]

In my testing, I could only observe the FailedPreStopHook if I made the postStart command fail: the postStart event would fail, generating a FailedPostStartHook event, then the preStop hook would run, generating a FailedPreStopHook event, and then Kubernetes would restart the pod (eventually reaching a CrashLoopBackoff state).

I'm thinking about this because normally with workspaces, pods are removed and not stopped, so pod logs will disappear (this output would only be visible if you were tailing the logs as the workspace stopped)

This is true.

I think I need to experiment with a cluster logging/monitoring tool to see how the output redirection behaves, because it seems that under normal circumstances, it's impossible to see pod logs or events after pod termination

if a prestop command fails, normally the output is stored in a FailedPreStopHook event.

Oddly enough, I don't see failing postStart (or preStop) command output in the event, though the Kubernetes documentation mentions that it should appear.

I can confirm however that redirecting the command output will still result in a FailedPreStopHook warning event (though this wasn't your actual concern):

Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 13s default-scheduler Successfully assigned devworkspace-controller/lifecycle-demo to minikube Normal Pulled 12s kubelet Successfully pulled image "nginx" in 422.082173ms (422.088026ms including waiting) Normal Pulling 10s (x2 over 12s) kubelet Pulling image "nginx" Normal Created 10s (x2 over 11s) kubelet Created container lifecycle-demo-container Normal Started 10s (x2 over 11s) kubelet Started container lifecycle-demo-container Warning FailedPostStartHook 10s (x2 over 11s) kubelet PostStartHook failed Normal Killing 10s (x2 over 11s) kubelet FailedPostStartHook Warning FailedPreStopHook 10s (x2 over 11s) kubelet PreStopHook failed Normal Pulled 10s kubelet Successfully pulled image "nginx" in 464.96827ms (464.973595ms including waiting) Warning BackOff 8s (x2 over 9s) kubelet Back-off restarting failed container lifecycle-demo-container in pod lifecycle-demo_devworkspace-controller(2067b4a6-9f64-4148-a2a5-d6483a315e4b

The above events came from the following pod spec:

apiVersion: v1 kind: Pod metadata: name: lifecycle-demo spec: containers: - name: lifecycle-demo-container image: nginx lifecycle: postStart: exec: command: ["/bin/sh","-c","{ badcommand } 1>/proc/1/fd/1 2>/proc/1/fd/2"] preStop: exec: command: ["/bin/sh","-c","{ badcommand } 1>/proc/1/fd/1 2>/proc/1/fd/2"]

Yeah, the concern is that redirecting output means any logs from a failed hook don't get added to the event.

Sorry I forgot to mention: even if the output isn't redirected, I don't see it in the event when doing kubectl describe.

The following pod spec, which has a failing, non-redirected postStart command:

apiVersion: v1 kind: Pod metadata: name: lifecycle-demo spec: containers: - name: lifecycle-demo-container image: nginx lifecycle: postStart: exec: command: ["/bin/sh","-c","badcommand"] preStop: exec: command: ["/bin/sh","-c","{ badcommand } 1>/proc/1/fd/1 2>/proc/1/fd/2"]

Does not get its command output in the FailedPostStartHook event:

Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 5s default-scheduler Successfully assigned devworkspace-controller/lifecycle-demo to minikube Normal Pulled 4s kubelet Successfully pulled image "nginx" in 589.651197ms (589.662446ms including waiting) Normal Pulling 3s (x2 over 5s) kubelet Pulling image "nginx" Normal Created 3s (x2 over 4s) kubelet Created container lifecycle-demo-container Normal Started 3s (x2 over 4s) kubelet Started container lifecycle-demo-container Warning FailedPostStartHook 3s (x2 over 4s) kubelet PostStartHook failed Normal Killing 3s (x2 over 4s) kubelet FailedPostStartHook Warning FailedPreStopHook 3s (x2 over 4s) kubelet PreStopHook failed Normal Pulled 3s kubelet Successfully pulled image "nginx" in 460.949193ms (461.095788ms including waiting) Warning BackOff 1s (x2 over 2s) kubelet Back-off restarting failed container lifecycle-demo-container in pod lifecycle-demo_devworkspace-controller(3ba486ed-64aa-48e8-a803-65c77c10b347)

This is on Minikube v1.30.1 using Kubernetes v1.26.3 on Docker 23.0.2.

The output is contained in the message field of the event:

Without the redirect:
message: |- Exec lifecycle hook ([/bin/sh -c { echo "hello" ls -al exit 1 } ]) for Container "web-terminal" in Pod "workspace8f91680d98264907-54c9d766c4-pbzcw_dw(b52066eb-3bbf-4562-bc27-396267720503)" failed - error: command '/bin/sh -c { echo "hello" ls -al exit 1 } ' exited with 1: , message: "hello\ntotal 16\ndrwxrwxr-x 1 root root 4096 May 23 18:40 .\ndrwxr-xr-x 1 root root 4096 May 23 18:40 ..\ndrwxrwxr-x 1 root root 4096 May 23 18:40 .config\n-rw-rw---- 1 root root 533 May 23 18:40 .viminfo\n"

With the redirect:
message: |- Exec lifecycle hook ([/bin/sh -c { echo "hello" ls -al exit 1 } 1>/proc/1/fd/1 2>/proc/1/fd/2 ]) for Container "web-terminal" in Pod "workspace8f91680d98264907-c8c5f7c46-nczvc_dw(60ac330d-edcd-4c08-a566-3700173f847b)" failed - error: command '/bin/sh -c { echo "hello" ls -al exit 1 } 1>/proc/1/fd/1 2>/proc/1/fd/2 ' exited with 1: , message: ""

without the redirect, the message ends with

hello total 16 drwxrwxr-x 1 root root 4096 May 23 18:40 . drwxr-xr-x 1 root root 4096 May 23 18:40 .. drwxrwxr-x 1 root root 4096 May 23 18:40 .config -rw-rw---- 1 root root 533 May 23 18:40 .viminfo

You can view this via

kubectl get ev --field-selector reason==FailedPreStopHook -o yaml

Thank you for the example and clarification @amisevsk. To me, this is enough of a reason to not use the 1>/proc/1/fd/1 2>/proc/1/fd/2 redirect.

pkg/library/lifecycle/prestop.go

amisevsk

LGTM

openshift-ci · 2023-08-30T20:00:04Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: amisevsk, AObuchow

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [amisevsk]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Fix devfile#1139 Signed-off-by: Andrew Obuchowicz <aobuchow@redhat.com>

Signed-off-by: Andrew Obuchowicz <aobuchow@redhat.com>

openshift-ci · 2023-08-30T20:55:10Z

New changes are detected. LGTM label has been removed.

AObuchow · 2023-08-30T20:57:55Z

/retest

AObuchow · 2023-08-30T23:39:23Z

/retest

musienko-maxim · 2023-08-31T13:55:52Z

/test v8-images

musienko-maxim · 2023-08-31T14:16:42Z

/test v8-devworkspace-operator-e2e

musienko-maxim · 2023-08-31T20:17:22Z

/test v8-devworkspace-operator-e2e

AObuchow · 2023-09-11T18:57:22Z

/retest

AObuchow · 2023-09-26T20:07:12Z

/retest

AObuchow · 2023-10-03T20:09:56Z

/retest

AObuchow · 2023-10-11T19:35:42Z

/retest

AObuchow requested review from amisevsk and l0rd as code owners August 23, 2023 15:58

AObuchow force-pushed the preStop-events branch from 93baf21 to 77575bf Compare August 23, 2023 16:07

amisevsk reviewed Aug 24, 2023

View reviewed changes

amisevsk approved these changes Aug 30, 2023

View reviewed changes

openshift-ci bot assigned amisevsk Aug 30, 2023

openshift-ci bot added the lgtm label Aug 30, 2023

openshift-ci bot added the approved label Aug 30, 2023

AObuchow added 4 commits August 30, 2023 16:54

feat: Support events.preStop

041c893

Fix devfile#1139 Signed-off-by: Andrew Obuchowicz <aobuchow@redhat.com>

chore: add tests for events.preStop

f104434

Signed-off-by: Andrew Obuchowicz <aobuchow@redhat.com>

feat: remove webhook warning for events.preStop

49fc786

Signed-off-by: Andrew Obuchowicz <aobuchow@redhat.com>

chore: remove events.preStop from unsupported Devfile API doc

52719a0

Signed-off-by: Andrew Obuchowicz <aobuchow@redhat.com>

AObuchow force-pushed the preStop-events branch from a81c3d7 to 52719a0 Compare August 30, 2023 20:55

openshift-ci bot removed the lgtm label Aug 30, 2023

AObuchow merged commit 50b8100 into devfile:main Oct 12, 2023

AObuchow deleted the preStop-events branch October 12, 2023 19:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support events.preStop #1168

Support events.preStop #1168

AObuchow commented Aug 23, 2023

AObuchow commented Aug 23, 2023

codecov bot commented Aug 23, 2023 •

edited

Loading

amisevsk left a comment

amisevsk Aug 24, 2023

AObuchow Aug 25, 2023

AObuchow Aug 25, 2023

amisevsk Aug 25, 2023

AObuchow Aug 25, 2023

amisevsk Aug 29, 2023

AObuchow Aug 29, 2023

amisevsk left a comment

openshift-ci bot commented Aug 30, 2023

openshift-ci bot commented Aug 30, 2023

AObuchow commented Aug 30, 2023

AObuchow commented Aug 30, 2023

musienko-maxim commented Aug 31, 2023

musienko-maxim commented Aug 31, 2023

musienko-maxim commented Aug 31, 2023

AObuchow commented Sep 11, 2023

AObuchow commented Sep 26, 2023

AObuchow commented Oct 3, 2023

AObuchow commented Oct 11, 2023

Support events.preStop #1168

Support events.preStop #1168

Conversation

AObuchow commented Aug 23, 2023

What does this PR do?

What issues does this PR fix or reference?

Is it tested? How?

PR Checklist

AObuchow commented Aug 23, 2023

codecov bot commented Aug 23, 2023 • edited Loading

Codecov Report

amisevsk left a comment

Choose a reason for hiding this comment

amisevsk Aug 24, 2023

Choose a reason for hiding this comment

AObuchow Aug 25, 2023

Choose a reason for hiding this comment

AObuchow Aug 25, 2023

Choose a reason for hiding this comment

amisevsk Aug 25, 2023

Choose a reason for hiding this comment

AObuchow Aug 25, 2023

Choose a reason for hiding this comment

amisevsk Aug 29, 2023

Choose a reason for hiding this comment

AObuchow Aug 29, 2023

Choose a reason for hiding this comment

amisevsk left a comment

Choose a reason for hiding this comment

openshift-ci bot commented Aug 30, 2023

openshift-ci bot commented Aug 30, 2023

AObuchow commented Aug 30, 2023

AObuchow commented Aug 30, 2023

musienko-maxim commented Aug 31, 2023

musienko-maxim commented Aug 31, 2023

musienko-maxim commented Aug 31, 2023

AObuchow commented Sep 11, 2023

AObuchow commented Sep 26, 2023

AObuchow commented Oct 3, 2023

AObuchow commented Oct 11, 2023

codecov bot commented Aug 23, 2023 •

edited

Loading