Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trigger machine pool instance refresh also on user data change (unless it's only the bootstrap token) #4245

Conversation

AndiDog
Copy link
Contributor

@AndiDog AndiDog commented May 4, 2023

What type of PR is this?

/kind feature

What this PR does / why we need it:

Partial fix for #4071

If a user changes instance bootstrapping-related fields such as KubeadmConfig.spec.{files,preKubeadmCommands,...}, new instances should get rolled out. This change ensures triggering an ASG instance refresh if more than only the bootstrap token has changed inside user data (= value of the bootstrap data secret generated by CAPI). This assumes CAPI's cloud-init template as user data format. If that format cannot be parsed, we take the safe route and do not trigger an instance refresh.

Once CAPA reconciles, the change does what it should. The problem is that on a change to the KubeadmConfig object, for example, reconciliation won't be triggered immediately, but the time until next reconciliation still depends on the bootstrap token refresh interval/TTL. I'm working on fixing that in separate CAPI/CAPA PRs. Therefore, this PR only partially fixes #4071.

Checklist:

  • squashed commits
  • includes documentation
  • adds unit tests
  • adds or updates e2e tests

Release note:

Trigger machine pool instance refresh also on user data change (unless it's only the bootstrap token)

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-priority labels May 4, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign justinsb for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 4, 2023
@k8s-ci-robot
Copy link
Contributor

Hi @AndiDog. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@AndiDog
Copy link
Contributor Author

AndiDog commented May 15, 2023

If CAPI introduced a sort of checksum into the bootstrap secret, we could get rid of the complicated diffing code and thereby support any format (cloud-init, ignition). Since I'm anyway working on CAPI so that it reconciles the secret immediately, I will consider that. For now, let's focus this PR review on the idea of triggering an instance refresh in case of relevant changes.

@AndiDog
Copy link
Contributor Author

AndiDog commented May 15, 2023

As discussed in CAPA office hours, this is probably similar to #4195: the parent config (for me: KubeadmConfig) can change and should trigger node rollout. We agreed to clarify how machine pools should actually behave, such as "should they reference a KubeadmConfigTemplate, or why don't they already?".

@cnmcavoy
Copy link
Contributor

I did some followup of my own from the CAPA office hours...

As discussed in CAPA office hours, this is probably similar to #4195: the parent config (for me: KubeadmConfig) can change and should trigger node rollout. We agreed to clarify how machine pools should actually behave, such as "should they reference a KubeadmConfigTemplate, or why don't they already?".

I lack historical context, but I did a tiny bit of digging as we're also interested in making MachinePools better, and it looks like the bootstrap data being mutable and shared were understood in the original proposal: https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20190919-machinepool-api.md

The interaction between MachinePool <-> CABPK will be identical to Machine <-> CABPK except in the following cases:
A KubeadmConfig will be shared by all instances in a MachinePool versus a KubeadmConfig per Machine

kubernetes-sigs/cluster-api#5294 looks like it was tracking work at improving this code area back in 2020-2021, but failed to move forward.

Based on all that, it seems like using the KubeadmConfig is intended.

WDYT?

@AndiDog
Copy link
Contributor Author

AndiDog commented Jun 12, 2023

I'll try and discuss this more generically in CAPI office hours. Let's pause any review of this PR for now.

@AndiDog
Copy link
Contributor Author

AndiDog commented Aug 21, 2023

This is way too hacky. CAPI should provide a contract so providers can check if there's a change. Let's continue solving this generically in kubernetes-sigs/cluster-api#8858.

@AndiDog AndiDog closed this Aug 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-priority release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Updating MachinePool, AWSMachinePool, and KubeadmConfig resources does not trigger an ASG instanceRefresh
3 participants