Explain why and how to isolate the cert-manager workloads #1331

wallrj · 2023-10-20T16:27:22Z

Preview: https://deploy-preview-1331--cert-manager-website.netlify.app/docs/installation/best-practice/#isolate-cert-manager-on-dedicated-node-pools

Followup to #1330
Fixes: cert-manager/cert-manager#5211

In this PR I want to give an example of how to use the affinity and toleration Helm values
and I propose running the cert-manager Pods on dedicated "platform" nodes,
for security reasons, but there may be other good use cases.

@erikgb Please take a look

netlify · 2023-10-20T16:30:17Z

✅ Deploy Preview for cert-manager-website ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | `7b22cba` |
| 🔍 Latest deploy log | https://app.netlify.com/sites/cert-manager-website/deploys/6538dd7f64c9fc00080f21e5 |
| 😎 Deploy Preview | https://deploy-preview-1331--cert-manager-website.netlify.app |

To edit notification comments on pull requests, go to your Netlify site configuration.

content/docs/installation/best-practice.md

erikgb

Thanks for working on this, @wallrj!

We simply use https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector to schedule cert-manager workloads to dedicated platform nodes. I think that should be included as the simplest way of achieving the desired goal. The examples you have put up are simpler to express with nodeselector, I think?

wallrj · 2023-10-23T11:58:00Z

> We simply use https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector to schedule cert-manager workloads to dedicated platform nodes. I think that should be included as the simplest way of achieving the desired goal. The examples you have put up are simpler to express with nodeselector, I think?

Done. I agree that nodeSelector is much simpler and works the same as nodeAffinity so I've changed it.
I had to explain that there is a default OS nodeSelector which you must explicitly add to your values.
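
A minimal sketch of the kind of values this comment describes, written in Python so it can be checked mechanically. It makes several assumptions: that the chart's default OS selector is `kubernetes.io/os: linux`, that the components take `nodeSelector` at the top level (controller) and under `webhook`, `cainjector`, and `startupapicheck`, and that the platform nodes carry a hypothetical `node-restriction.kubernetes.io/reserved-for: platform` label. Following the comment above, the default OS selector is repeated explicitly alongside the platform label.

```python
import json

# Assumed chart default, repeated explicitly as the comment above describes.
DEFAULT_OS_SELECTOR = {"kubernetes.io/os": "linux"}

# Hypothetical label identifying the dedicated platform nodes.
PLATFORM_SELECTOR = {"node-restriction.kubernetes.io/reserved-for": "platform"}

# Assumed value sections; the controller's values sit at the top level.
SUBCOMPONENTS = ("webhook", "cainjector", "startupapicheck")


def build_values(extra_selector):
    """Return Helm-style values applying one nodeSelector to every component."""
    selector = {**DEFAULT_OS_SELECTOR, **extra_selector}
    values = {"nodeSelector": dict(selector)}
    for name in SUBCOMPONENTS:
        values[name] = {"nodeSelector": dict(selector)}
    return values


values = build_values(PLATFORM_SELECTOR)
print(json.dumps(values, indent=2))
```

The printed JSON is also valid YAML, so it can be pasted into a values file for inspection.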

content/docs/installation/best-practice.md

erikgb

I am not an expert in this area, but why do we need tolerations in addition to the nodeSelector? Looks like a duplication to me, and according to https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector, just the nodeSelector should be sufficient. And this setup works well for us.

wallrj · 2023-10-24T08:51:07Z

> I am not an expert in this area, but why do we need tolerations in addition to the nodeSelector? Looks like a duplication to me, and according to https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector, just the nodeSelector should be sufficient. And this setup works well for us.

I'm not an expert either, but I guess one reason is this:

in a multi-tenant cluster,
non-platform Pods might not have nodeSelector or affinity.nodeAffinity settings,
so they could be scheduled to any Node, including those that you reserved for your platform's Pods.
You can prevent this by adding a taint to the platform Nodes,
and the platform Pods then need a matching toleration to be scheduled there.

How do you prevent this happening in your cluster?
Do you have an admission webhook that overrides nodeSelector of the Pods of your tenants?

E.g. PodNodeSelector: https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#podnodeselector

I will make it clearer that there are various solutions to this problem and that this is only one suggestion.
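
The interaction between the two settings can be sketched with a toy scheduling model. This is a deliberate simplification and not the Kubernetes scheduler: every taint is treated as `NoSchedule`, tolerations match only on an exact key and value, and the `reserved-for` key is a hypothetical placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: dict
    # (key, value) pairs; every taint is treated as NoSchedule in this model.
    taints: list = field(default_factory=list)


@dataclass
class Pod:
    node_selector: dict = field(default_factory=dict)
    # (key, value) pairs; only exact matches are modelled.
    tolerations: list = field(default_factory=list)


def can_schedule(pod: Pod, node: Node) -> bool:
    """A Pod fits if the node has every selected label and the Pod tolerates every taint."""
    selector_ok = all(node.labels.get(k) == v for k, v in pod.node_selector.items())
    taints_ok = all(taint in pod.tolerations for taint in node.taints)
    return selector_ok and taints_ok


KEY = "reserved-for"  # hypothetical label and taint key
platform_node = Node(labels={KEY: "platform"}, taints=[(KEY, "platform")])
untainted_platform_node = Node(labels={KEY: "platform"})
worker_node = Node(labels={})

tenant_pod = Pod()  # no nodeSelector, no tolerations
cert_manager_pod = Pod(node_selector={KEY: "platform"}, tolerations=[(KEY, "platform")])
toleration_only_pod = Pod(tolerations=[(KEY, "platform")])

# Without a taint, a tenant Pod can land on the reserved node.
print(can_schedule(tenant_pod, untainted_platform_node))  # True
# The taint keeps tenant Pods off the reserved node.
print(can_schedule(tenant_pod, platform_node))  # False
# The toleration lets cert-manager in; the nodeSelector keeps it off worker nodes.
print(can_schedule(cert_manager_pod, platform_node))  # True
print(can_schedule(cert_manager_pod, worker_node))  # False
# A toleration alone does not pin the Pod to the platform nodes.
print(can_schedule(toleration_only_pod, worker_node))  # True
```

In this model the taint repels Pods that lack the toleration, while the nodeSelector attracts the platform Pods, which is why the two settings are not duplicates of each other.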

hawksight

A really useful addition. I had a quick read and it looks good to me; it seems better to have this live now and correct anything later if needed.

gintautassulskus · 2023-10-24T20:00:53Z

Great contribution, @wallrj! The documentation sounds correct. A minor remark, I do not recognise cainjection and startupapicheck used in the example, but it's been a long time since I used this package.

erikgb

/lgtm

Very nice addition to the docs, @wallrj

jetstack-bot · 2023-10-25T06:04:49Z

@erikgb: adding LGTM is restricted to approvers and reviewers in OWNERS files.

In response to this:

> /lgtm
>
> Very nice addition to the docs, @wallrj

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

erikgb · 2023-10-25T06:07:07Z

> How do you prevent this happening in your cluster? Do you have an admission webhook that overrides nodeSelector of the Pods of your tenants?

No idea, but I can check. 😉 We run on OpenShift, and it's usually pretty "secure by default".

erikgb · 2023-10-25T06:29:37Z

> How do you prevent this happening in your cluster? Do you have an admission webhook that overrides nodeSelector of the Pods of your tenants?
>
> No idea, but I can check. 😉 We run on OpenShift, and it's usually pretty "secure by default".

I think this is handled by some OpenShift "magic" described here. When a normal user, without write access to namespace resources, schedules a workload, a "worker" label is always added to the Pod's nodeSelector. Since none of our nodes matches worker plus something else, all end-user workloads are scheduled on worker nodes, or not scheduled at all if a user tries to set their own nodeSelector. Our platform team can override this default behavior by annotating system namespaces, such as cert-manager.
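
A toy model of the behaviour described in this comment, for illustration only. It is not OpenShift's implementation: the label names are hypothetical, and the namespace annotation is represented simply by the `namespace_selector` argument.

```python
# Hypothetical label names; only the behaviour described above is modelled.
WORKER = {"node-role/worker": ""}
PLATFORM = {"node-role/platform": ""}


def effective_selector(pod_selector, namespace_selector=None):
    """Add the namespace's selector if one is set, otherwise the worker default."""
    forced = WORKER if namespace_selector is None else namespace_selector
    return {**pod_selector, **forced}


def matching_nodes(selector, nodes):
    """Return names of the nodes whose labels satisfy every entry in the selector."""
    return [
        name
        for name, labels in nodes.items()
        if all(labels.get(k) == v for k, v in selector.items())
    ]


nodes = {"worker-1": dict(WORKER), "platform-1": dict(PLATFORM)}

# An ordinary tenant Pod is forced onto worker nodes.
tenant = matching_nodes(effective_selector({}), nodes)
# A tenant Pod that asks for platform nodes matches nothing and stays unscheduled.
sneaky = matching_nodes(effective_selector(PLATFORM), nodes)
# A system namespace with an override is placed on platform nodes.
cert_manager = matching_nodes(effective_selector({}, namespace_selector=PLATFORM), nodes)

print(tenant, sneaky, cert_manager)
```

Under these assumptions no taint is needed to keep tenants off the platform nodes, because the forced worker selector already excludes them.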

wallrj · 2023-10-25T08:43:57Z

> I think this is handled by some Openshift "magic" described here.

@erikgb Thanks so much for digging into that! I've added a link to that doc.

content/docs/installation/best-practice.md

SgtCoDFish

/lgtm
/approve
/hold

A reasonable-looking comment has just been added, so I've added a hold in case you want to incorporate that suggestion. Ping me for a re-review if needed!

wallrj · 2023-10-25T12:38:24Z

/hold cancel

SgtCoDFish

/lgtm
/approve

jetstack-bot · 2023-10-25T12:57:29Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: erikgb, hawksight, SgtCoDFish

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [SgtCoDFish]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Thanks for all your documentation reviews Peter, for example:

* cert-manager#1344
* cert-manager#1338
* cert-manager#1331

We'd like you to be able to `lgtm` future PRs, so we're adding you to the reviewers list.

Signed-off-by: Richard Wall <richard.wall@venafi.com>

jetstack-bot added the `dco-signoff: yes` label Oct 20, 2023

wallrj requested a review from hawksight October 20, 2023 16:27

jetstack-bot added the `size/L` label Oct 20, 2023

wallrj requested a review from erikgb October 20, 2023 16:27

Explain why and how to isolate the cert-manager workloads

faeaf0e

Signed-off-by: Richard Wall <richard.wall@venafi.com>

wallrj force-pushed the node-placement-recommendations branch from 1f0abd9 to faeaf0e Compare October 20, 2023 16:31

jsoref reviewed Oct 20, 2023

erikgb reviewed Oct 20, 2023

wallrj added 2 commits October 23, 2023 12:42

Use nodeSelector instead of affinity, for simplicity

ac18af6

Signed-off-by: Richard Wall <richard.wall@venafi.com>

Link to all the general documentation sites in addition to specific p…

6fd4d3f

…ages Signed-off-by: Richard Wall <richard.wall@venafi.com>

jetstack-bot added the `size/M` label and removed the `size/L` label Oct 23, 2023

wallrj requested review from erikgb and jsoref October 23, 2023 11:58

jsoref reviewed Oct 23, 2023

content/docs/installation/best-practice.md

Update content/docs/installation/best-practice.md

4195d60

Co-authored-by: Josh Soref <2119212+jsoref@users.noreply.github.com> Signed-off-by: Richard Wall <wallrj@users.noreply.github.com>

This was referenced Oct 23, 2023

Question about tolerations cert-manager/cert-manager#5211

Closed

added affinity and tolerations cert-manager/cert-manager#869

Merged

erikgb reviewed Oct 23, 2023

Make it clear that the use of taints and toleration is only an example

0a539e5

Signed-off-by: Richard Wall <richard.wall@venafi.com>

wallrj requested a review from erikgb October 24, 2023 08:51

jetstack-bot added the `size/L` label and removed the `size/M` label Oct 24, 2023

hawksight approved these changes Oct 24, 2023

erikgb approved these changes Oct 25, 2023

wallrj added 2 commits October 25, 2023 08:31

Add link to RedHat OpenShift pod placement documentation

5afffde

Signed-off-by: Richard Wall <richard.wall@venafi.com>

Move links to pod placement docs nearer to where concepts are introduced

7b22cba

Thanks @schelv for the suggestion. Signed-off-by: Richard Wall <richard.wall@venafi.com>

wallrj requested a review from SgtCoDFish October 25, 2023 08:45

schelv reviewed Oct 25, 2023

content/docs/installation/best-practice.md

SgtCoDFish approved these changes Oct 25, 2023

wallrj requested a review from SgtCoDFish October 25, 2023 09:19

jetstack-bot removed the `do-not-merge/hold` label Oct 25, 2023

SgtCoDFish approved these changes Oct 25, 2023

jetstack-bot added the `lgtm` label Oct 25, 2023

SgtCoDFish merged commit 8bc7e89 into cert-manager:master Oct 25, 2023
4 checks passed

wallrj deleted the node-placement-recommendations branch October 25, 2023 13:14

wallrj mentioned this pull request Nov 15, 2023

Make Peter Fiddes (@hawksight) a reviewer #1346

Merged

jetstack-bot mentioned this pull request Dec 7, 2023

[release-next] Fast-forward master into release-next #1357

Merged
