Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

disable AKS control plane create webhook when paused #3341

Conversation

jackfrancis
Copy link
Contributor

@jackfrancis jackfrancis commented Mar 21, 2023

What type of PR is this?

/kind bug

What this PR does / why we need it:

This PR disables the create webhook for AKS control plane data when the cluster is paused, which is the cluster state that a "clusterctl move" operation yields.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Contributes to #3303

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

disable AKS control plane create webhook when paused

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Mar 21, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from jackfrancis. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@nawazkh
Copy link
Member

nawazkh commented Mar 21, 2023

This is for my own understanding;

  • Can someone "pause" a cluster while it is spinning up to disable creation of webhooks ? I am guessing that the non-existent cluster cannot be paused, but would like to confirm..

  • Can disabling creation of webhooks, as done in this PR, be a loophole where a malign user feed the cluster with random values to try break it ?

@jackfrancis jackfrancis added this to the v1.9 milestone Mar 23, 2023
@jackfrancis
Copy link
Contributor Author

This is for my own understanding;

  • Can someone "pause" a cluster while it is spinning up to disable creation of webhooks ? I am guessing that the non-existent cluster cannot be paused, but would like to confirm..
  • Can disabling creation of webhooks, as done in this PR, be a loophole where a malign user feed the cluster with random values to try break it ?

I believe something like what you outline is possible. I'm not sure that a user creating and then breaking its own CAPI cluster in this way would qualify as an attack vector. The existing create webhook is meant as a user convenience of last resort, to remind the user if this data is pre-populated at cluster creation time, that that data configuration is invalid (the AKS service will be authoritative for populating the apiserver endpoint during its own creation lifecycle, not the user) and not allowed by a user.

@CecileRobertMichon
Copy link
Contributor

/cherry-pick release-1.8
/cherry-pick release-1.7

@k8s-infra-cherrypick-robot

@CecileRobertMichon: once the present PR merges, I will cherry-pick it on top of release-1.8 in a new PR and assign it to you.

In response to this:

/cherry-pick release-1.8
/cherry-pick release-1.7

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jackfrancis
Copy link
Contributor Author

/retest

Comment on lines +121 to +122
Namespace: m.Namespace,
Name: m.Name,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this making a similar assumption to the one that's being addressed in #3322? The owner references would probably be the easiest place to check for the Cluster's namespace/name, but that isn't set until after the AzureManagedControlPlane has been created. Otherwise it seems like we'd have to list all Clusters in all namespaces and see if any match this AzureManagedControlPlane and are paused.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's definitely true that the name of an AzureManagedControlPlane can be different from the name of the referencing cluster.

I think we can probably introspect each Cluster for Spec.ControlPlaneRef.Name == the name of the AzureManagedControlPlane; is that what you're thinking?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was thinking of exactly that. Not sure there's any other way to correlate those right when the object is created.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One other thought I had was to list with a field selector on the spec.controlPlaneRef.* fields but it looks like those aren't supported.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And do we know for sure that clusterctl move will always create the Cluster before the AzureManagedCluster? The CAPI webhooks appear to allow either to be created first. If we have to account for when no matching Cluster exists yet here, I'm thinking we'd probably want to skip the controlPlaneEndpoint checks in that case to make sure we're not blocking a valid clusterctl move?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed maybe the most practical solution here is to remove these Create checks entirely. They were implemented defensively to protect the user from accidentally doing something wrong, but the outcome of "wrong" in this case is worst-case scenario a never-actually-manageable cluster, but best-case scenario the AKS controllers overwrite those user-provided values with the correct ones from the AKS API anyways, and so it literally has not effect. (We'd need to look into this to know for sure which outcome is the actual one. But either of those is probably not sufficiently catastrophic or likely to happen to justify all this additional webhook complexity.)

Thoughts?

cc @CecileRobertMichon @mtougeron

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed maybe the most practical solution here is to remove these Create checks entirely.

I tend to agree

the AKS controllers overwrite those user-provided values with the correct ones from the AKS API anyways

are we sure this is still the case? I seem to remember we changed it to only set them when empty

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed UT tells the tale:

https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/v1.8.1/api/v1beta1/azuremanagedcontrolplane_webhook_test.go#L1228-L1278

I'll submit a PR that removes create webhooks for this and AzureCluster and we can discuss there.

@CecileRobertMichon
Copy link
Contributor

/close

@k8s-ci-robot
Copy link
Contributor

@CecileRobertMichon: Closed this PR.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jackfrancis jackfrancis deleted the aks-host-port-webhook-paused branch March 24, 2023 21:26
@CecileRobertMichon
Copy link
Contributor

@jackfrancis kubernetes-sigs/cluster-api#8322 should help us do something like this in the future

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants