
recording webhook: Remove webhook state #1112

Merged
merged 7 commits into kubernetes-sigs:main on Sep 7, 2022

Conversation

@jhrozek (Contributor) commented Aug 23, 2022

What type of PR is this?

/kind bug

What this PR does / why we need it:

This PR contains a self-contained change from the branch where I'm working on
merging recorded policies. Since the changes are useful (IMO) on their own
and the branch has been getting quite big, I decided to submit them separately.

More information about the individual commits is below and in the
commit messages.

  • e2e: use readinessProbe to make sure we record containers when they're actually ready
    If no readiness probe is used, then we might end up recording the
    container before it is truly ready and the policy might not contain
    all the calls we need.
  • webhooks: Fix setting finalizers
    A single call was used to set both the finalizers and the status, which
    doesn't work: finalizers live in the object's metadata, while the status
    is a separate subresource. Let's use one call to set the status and a
    separate call to set the finalizer.
  • webhooks: Make setting finalizers and status more resilient with retries
    Especially with multiple replicas, we've seen the webhooks error out due
    to a conflict. This is problematic because the webhooks have a hard-fail
    policy. Let's retry multiple times instead (see the sketch after this
    list).
  • e2e: test that recording finalizer prevents deleting the resource
    Adds a test for the finalizer.
  • recording webhook: don't use the replica name to track pods being recorded, use labels instead
    The recording webhook used to track container replicas in a sync.Map,
    but its usage was problematic because:
    • all tracked replicas were deleted whenever any replica with the same
      generatedName was removed. This meant the replica numbers used for
      recorded policies were reused and not all containers would end up
      with a recorded policy
    • if a replica was removed, the profilerecording would lose its
      status and finalizers
    • tracking replicas in the sync.Map means the webhook is maintaining
      its own state, which it shouldn't. While this patch doesn't remove
      the state completely, it reduces its usage.

Instead of the above, this patch maintains the finalizer and the status by examining both the pod being admitted and the currently existing pods, which are listed during admission. The replica number then always increases by one, e.g. across scale-ups or scale-downs of replicated pods. Finally, because the webhook watches all namespaces but the sync.Map was only storing plain names, let's also prefix the sync.Map keys with the namespace. This allows emptying the sync.Map once no pods for a given recording exist, keeping memory usage low over time.

  • recording: Remove "hook" from the list of the supported recording types
    Removing the "hook" option enables us to remove the state from the
    webhook and simplifies the supported matrix of recording types.

  • recording: Remove state from the webhook, use a random nonce instead
    Instead of keeping track of the replicas in the webhook, let's use a
    random nonce to distinguish between different container replicas. When
    the profile is recorded for a replicated container, we already know the
    pod name, so let's append a hash of the pod name to the container name.
    This also makes it easier to cross-reference the pods and the
    containers. To make the recording annotations easy to parse, we also
    switch from dashes to underscores in the annotations: underscores can't
    be used in k8s object names but are allowed in annotations, so they
    make an unambiguous separator to split on.
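
The finalizer and retry commits above boil down to a pattern like the following minimal sketch, using a controller-runtime client and client-go's retry helper. The API import path and the ProfileRecording handling are assumptions for illustration, not necessarily the PR's exact code:

```go
// Minimal sketch, assuming a controller-runtime client: set the
// finalizer and the status with two separate, conflict-retried calls.
package webhooks

import (
	"context"

	"k8s.io/client-go/util/retry"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

	// assumed import path for the ProfileRecording API type
	recordingapi "sigs.k8s.io/security-profiles-operator/api/profilerecording/v1alpha1"
)

func setFinalizerAndStatus(ctx context.Context, c client.Client, key client.ObjectKey, finalizer string) error {
	// Finalizers live in the object's metadata, so they need a plain Update...
	if err := retry.RetryOnConflict(retry.DefaultRetry, func() error {
		rec := &recordingapi.ProfileRecording{}
		if err := c.Get(ctx, key, rec); err != nil {
			return err
		}
		controllerutil.AddFinalizer(rec, finalizer)
		return c.Update(ctx, rec)
	}); err != nil {
		return err
	}

	// ...while the status is a subresource and needs Status().Update().
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		rec := &recordingapi.ProfileRecording{}
		if err := c.Get(ctx, key, rec); err != nil {
			return err
		}
		// mutate rec.Status here as needed before updating it
		return c.Status().Update(ctx, rec)
	})
}
```

Re-fetching inside each retry attempt is what makes the retry useful: on a conflict, the next attempt starts from the latest resourceVersion.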

Which issue(s) this PR fixes:

Fixes: #744

Does this PR have tests?

yes

Special notes for your reviewer:

I'll point them out inline

Does this PR introduce a user-facing change?

In order to increase the stability and scalability of the profile recording webhooks, the internal state of the webhooks has been removed.
The user-visible effect is that container recordings no longer include a trailing replica number in their name (they used to be named e.g. `myrecording-nginx-1`, `myrecording-nginx-2`); instead the name carries a hash derived from the pod's generated name.

In addition, support for hook-based recording has been deprecated. The only supported modes of profile recording going forward are logs and bpf.
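
To illustrate the new naming scheme and the underscore-separated annotations, here is a hypothetical sketch; the operator's actual hash function and annotation layout may differ:

```go
// Hypothetical sketch of the naming scheme described in the release note.
package main

import (
	"fmt"
	"hash/fnv"
	"strings"
)

// profileNameForContainer is an illustrative helper, not the operator's
// actual function. The pod name already carries the generateName suffix,
// so hashing it yields a distinct value per replica.
func profileNameForContainer(recording, container, podName string) string {
	h := fnv.New32a()
	h.Write([]byte(podName))
	return fmt.Sprintf("%s-%s-%x", recording, container, h.Sum32())
}

func main() {
	fmt.Println(profileNameForContainer("myrecording", "nginx", "nginx-7c79dd4f9-abcde"))

	// Underscores cannot appear in k8s object names, so an annotation
	// value shaped like "<profile-name>_<container>" splits unambiguously:
	parts := strings.SplitN("myrecording-nginx-5a1b2c3d_nginx", "_", 2)
	fmt.Printf("profile=%q container=%q\n", parts[0], parts[1])
}
```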

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Aug 23, 2022
@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Aug 23, 2022
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label Aug 23, 2022

codecov-commenter commented Aug 23, 2022

Codecov Report

Merging #1112 (b3928c9) into main (c0fd9a4) will decrease coverage by 0.46%.
The diff coverage is 50.83%.

@@            Coverage Diff             @@
##             main    #1112      +/-   ##
==========================================
- Coverage   50.68%   50.21%   -0.47%     
==========================================
  Files          42       42              
  Lines        4735     4809      +74     
==========================================
+ Hits         2400     2415      +15     
- Misses       2259     2314      +55     
- Partials       76       80       +4     

@@ -37,7 +37,7 @@ import (
 )
 
 var (
-	replicas int32 = 3
+	replicas int32 = 1
Contributor Author (@jhrozek):

This brings up a related question -- since we only run a single replica, but the webhook listens to all namespaces, wouldn't it become a bottleneck?

I think the most systematic solution would be to remove the replica tracking from the webhook entirely and make it truly stateless. The replica tracking is used for two things (AFAICS):

  1. when recording with the hook, as part of the filename the profile is recorded into
  2. to distinguish between different container replicas

For both, I guess we could use a random number instead, but I was wondering whether it would be acceptable to deprecate support for hook-based recording and make merging of policies the default once we have it? Frankly, I don't see much reason to have separate policies for individual container instances -- wouldn't you want a single policy instead? The hook-based recording, if we didn't want to deprecate it, could even use random numbers instead.

Alternatively, we could have a single gRPC endpoint just to track the replicas, but that brings even more complexity.

Contributor:

I'd be up for removing support for the hook. It adds extra complexity and we have other alternatives to offer. We really shouldn't keep anything stateful in the webhook... ideally, if we need to keep track of something, we should have a separate system for the stateful bits, e.g. we might want to start leveraging Redis and probing it from the webhook. Having a performant distributed cache we could query would help.
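
This PR doesn't go that way, but the Redis idea could look roughly like the sketch below, with a per-recording counter bumped atomically by every webhook replica; the key layout and address are made up:

```go
// Sketch only, not part of this PR: replace the local sync.Map with a
// shared atomic counter in Redis so replica numbers stay unique across
// webhook pods.
package main

import (
	"context"
	"fmt"

	"github.com/go-redis/redis/v8"
)

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "redis:6379"}) // illustrative address

	// INCR is atomic server-side, so concurrent admissions handled by
	// different webhook replicas still get distinct, increasing numbers.
	n, err := rdb.Incr(ctx, "recording/default/myrecording").Result()
	if err != nil {
		panic(err)
	}
	fmt.Println("next replica number:", n)
}
```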

Contributor Author (@jhrozek):

Yes, I was considering moving the sync.Map to the operator (controller) and fronting it with a gRPC API that hands out the next replica number, but the simpler we can make the webhook, the better.

Contributor:

That's an option, but I'd highly suggest using a dedicated service that's meant for caching, e.g. Redis. It's less code we'd have to maintain.

Member:

+1 for removing the hook

Contributor Author (@jhrozek):

Great. I'll work on removing the hook and then see if we can adjust the code to get away with using e.g. random numbers for the individually recorded container replicas. Afterwards, we'll be able to go back to multiple webhook pods.

Please let me know if I should make that work part of this PR or a follow-up.

Member:

I think it can be part of this PR :)

Contributor Author (@jhrozek):

done

@@ -352,6 +352,13 @@ spec:
      containers:
      - name: nginx
        image: quay.io/security-profiles-operator/test-nginx-unprivileged:1.21
        ports:
        - containerPort: 8080
        readinessProbe:
Contributor Author (@jhrozek):

This might be something to recommend in our docs
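
If we do document it, a probe along these lines could be shown; a minimal sketch with illustrative values, written in Go the way test code might construct it (the actual e2e probe may differ):

```go
// Minimal readiness probe sketch; values are illustrative.
package example

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

var readiness = &corev1.Probe{
	ProbeHandler: corev1.ProbeHandler{
		HTTPGet: &corev1.HTTPGetAction{
			Path: "/",
			Port: intstr.FromInt(8080), // matches the containerPort above
		},
	},
	InitialDelaySeconds: 3, // give nginx a moment to start serving
	PeriodSeconds:       3,
}
```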

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Sep 6, 2022
@jhrozek jhrozek changed the title recording webhook: Run only one replica in the recording deployment, make container replica tracking more predictable recording webhook: Remove webhook state Sep 6, 2022
@jhrozek (Contributor, Author) commented Sep 6, 2022

oops, unit tests

@jhrozek (Contributor, Author) commented Sep 6, 2022

welp:

cgo: C compiler "gcc" not found: exec: "gcc": executable file not found in $PATH

@jhrozek jhrozek added the kind/deprecation Categorizes issue or PR as related to a feature/enhancement marked for deprecation. label Sep 6, 2022
@jhrozek (Contributor, Author) commented Sep 6, 2022

welp:

cgo: C compiler "gcc" not found: exec: "gcc": executable file not found in $PATH

ah! Previously we were installing oci-seccomp-bpf-hook, which was dragging in gcc through the kernel-devel dependency. So at least it is not a regression.

@jhrozek (Contributor, Author) commented Sep 6, 2022

 Unable to connect to the server: net/http: TLS handshake timeout 

/test pull-security-profiles-operator-test-e2e

@jhrozek (Contributor, Author) commented Sep 7, 2022

rebased to pick up the recent fixes for the flaky tests -- most of the recording tests are marked as flaky and I want to make sure they pass as well (they passed locally, but I was typically running only a subset)

@saschagrunert (Member) left a comment:

Awesome work, thank you so much!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 7, 2022
@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jhrozek, saschagrunert

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [jhrozek,saschagrunert]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@saschagrunert saschagrunert mentioned this pull request Sep 7, 2022
@jhrozek (Contributor, Author) commented Sep 7, 2022

ubuntu flaky e2e tests passed:
https://github.com/kubernetes-sigs/security-profiles-operator/runs/8223349760?check_suite_focus=true#step:9:6819
/hold until the others pass as well

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 7, 2022
@jhrozek (Contributor, Author) commented Sep 7, 2022

hmm, in the flatcar tests I still see:

    e2e_flaky_test.go:23: Skipping flaky tests

going to look into it. But I don't think flatcar tests anything that fedora and ubuntu wouldn't test.

@jhrozek (Contributor, Author) commented Sep 7, 2022

/hold cancel

@jhrozek jhrozek removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 7, 2022
@k8s-ci-robot k8s-ci-robot merged commit 1e4dfea into kubernetes-sigs:main Sep 7, 2022
Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
  • kind/bug: Categorizes issue or PR as related to a bug.
  • kind/deprecation: Categorizes issue or PR as related to a feature/enhancement marked for deprecation.
  • lgtm: "Looks good to me", indicates that a PR is ready to be merged.
  • release-note: Denotes a PR that will be considered when it comes time to generate release notes.
  • size/XXL: Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The recording webhook's resource updating is racy
5 participants