Seccomp profile installation race condition #2356

Closed
mlimsfdc opened this issue Jul 12, 2024 · 2 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@mlimsfdc

What happened:

We are seeing a race condition in the operator: the spod daemon reports as healthy before the seccomp profile has been installed on the node. As a result, a pod occasionally fails on start because it expects a seccomp profile that has not been installed yet. When this happens, the pod's events show a message that the seccomp profile is not at the expected path.

What you expected to happen:

The seccomp profile should always exist on the node before a pod with the security context set is applied or scheduled for creation. We are hoping to resolve this and have thought of two possibilities, but would like to run them by the maintainers first.

Option 1:

  • Put a taint on the node that spod is scheduled on, and then remove the taint when the seccomp profile is installed on the node and proceed with pod installation.

Option 2: (outside of the upstream operator)

  • Our pods will have an init container that checks for the existence of the seccomp profile on the localhost path before proceeding.
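As a rough illustration of Option 2, the check an init container runs could look like the sketch below. It is only a sketch under stated assumptions: the function name `wait_for_profile`, the timeout values, and the idea of also requiring the file to parse as JSON (to avoid accepting a half-written file) are not from the operator. The init container would also need the node's seccomp directory mounted so that it can see the file.

```python
import json
import time
from pathlib import Path


def wait_for_profile(path, timeout=60.0, interval=1.0):
    """Poll until `path` exists and parses as JSON, or `timeout` seconds pass.

    Returns True if the profile became readable in time, False otherwise.
    `wait_for_profile` is a hypothetical helper, not part of the operator.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Parsing guards against a file that exists but is still being written.
            json.loads(Path(path).read_text())
            return True
        except (OSError, ValueError):
            # Missing, unreadable, or incomplete file: treat as "not installed yet".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

An init container entrypoint would call `wait_for_profile` with the mounted profile path and exit non-zero when it returns `False`, so that the main container only starts once the profile is readable.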

How to reproduce it (as minimally and precisely as possible):

Because it is a race condition, it is difficult to reproduce reliably.

Anything else we need to know?:

Environment:

  • Cloud provider or hardware configuration: AWS EKS
  • OS (e.g: cat /etc/os-release): Mac OS AMD64 / ARM64
  • Kernel (e.g. uname -a): Darwin Kernel Version 23.5.0

@mgates

@mlimsfdc mlimsfdc added the kind/bug Categorizes issue or PR as related to a bug. label Jul 12, 2024
@ccojocar
Contributor

ccojocar commented Jul 13, 2024

What does the lifecycle of the deployment look like? Are you deploying the seccomp profile manifest at the same time as the pod manifest?

It is possible to check the status in the seccompprofile resource, and also in the node status, and to proceed with the rest of the deployment that depends on that seccomp resource only once it is ready.
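A minimal sketch of such a gate is shown below. It assumes that the seccompprofile resource exposes a `status.status` field whose value is `Installed` once the profile is ready; that field name and value should be verified against the CRD of the operator version in use. The helper names `is_installed` and `fetch_profile` are hypothetical, and the sketch only looks at the profile resource, not at the per-node status.

```python
import json
import subprocess


def is_installed(resource):
    """Return True if a parsed seccompprofile object reports it is installed.

    Assumption: readiness is exposed as status.status == "Installed".
    """
    status = resource.get("status") or {}
    return status.get("status") == "Installed"


def fetch_profile(name, namespace):
    """Fetch the seccompprofile as a dict (requires kubectl and cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "seccompprofile", name, "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

A deployment script could poll `is_installed(fetch_profile(name, namespace))` and apply the pod manifest only after it returns `True`.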

> Put a taint on the node that spod is scheduled on, and then remove the taint when the seccomp profile is installed on the node and proceed with pod installation.

Typically spod runs as a daemonset on all nodes of the cluster. You can also constrain its scheduling.

@ccojocar
Contributor

I am closing this issue since there aren't any action items on our side. Thanks
