
Control plane init generates userdata larger than max user data size in AWS #510

Open
ttreptow opened this issue Dec 2, 2024 · 3 comments
Labels
kind/bug Something isn't working priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@ttreptow
Contributor

ttreptow commented Dec 2, 2024

What happened:

When creating a new cluster with RKE2 provider v0.8.0, the AWSMachine fails to come up with the error:

"failed to create AWSMachine instance: failed to run instance: InvalidParameterValue: Encoded User data is limited to 25600 bytes"

It mostly seems to affect the first control plane node, since the userdata for that node (control plane init) is larger.

Note that the limit is 25600 bytes after base64 encoding, or roughly 16 KB unencoded. This is easily exceeded because the userdata includes the certificates as well as the RKE2 config and any startup scripts.
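Because base64 turns every 3 raw bytes into 4 encoded bytes, the encoded size can be checked before an instance is launched. The sketch below is only an illustration, not CAPA or CAPRKE2 code. It assumes standard padded base64 and takes the 25600-byte figure from the error message above. The function names are hypothetical.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// awsEncodedUserDataLimit is the limit quoted in the EC2 error message:
// 25600 bytes after base64 encoding.
const awsEncodedUserDataLimit = 25600

// encodedUserDataSize returns the size of the userdata after standard
// (padded) base64 encoding, i.e. 4 * ceil(n / 3).
func encodedUserDataSize(userdata []byte) int {
	return base64.StdEncoding.EncodedLen(len(userdata))
}

// exceedsAWSUserDataLimit reports whether the encoded userdata would be
// larger than the limit from the error message.
func exceedsAWSUserDataLimit(userdata []byte) bool {
	return encodedUserDataSize(userdata) > awsEncodedUserDataLimit
}

func main() {
	for _, n := range []int{16384, 19200, 19201} {
		raw := make([]byte, n)
		fmt.Printf("raw=%d encoded=%d rejected=%v\n",
			n, encodedUserDataSize(raw), exceedsAWSUserDataLimit(raw))
	}
	// Output:
	// raw=16384 encoded=21848 rejected=false
	// raw=19200 encoded=25600 rejected=false
	// raw=19201 encoded=25604 rejected=true
}
```

The encoded limit therefore corresponds to at most 19,200 raw bytes. The "around 16 KB" figure works as a conservative rule of thumb.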

What did you expect to happen:

First node comes up without error

How to reproduce it:

Create an RKE2 cluster using RKE2 provider v0.8.0 and Cluster API Provider AWS (CAPA).

Anything else you would like to add:
cloud-init supports gzipped userdata. If I manually replace the generated data with base64-encoded gzipped data, the node comes up fine. I think the fix would be to always gzip cloud-init userdata.

Environment:

  • RKE2 provider version: v0.8.0
  • OS (e.g. from /etc/os-release): Ubuntu 22.04 LTS
@ttreptow ttreptow added kind/bug Something isn't working needs-priority Indicates an issue or PR needs a priority assigning to it needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 2, 2024
@ttreptow ttreptow changed the title Control plane init generates userdata larger than max suer data size in AWS Control plane init generates userdata larger than max user data size in AWS Dec 2, 2024
@ttreptow ttreptow mentioned this issue Dec 11, 2024
@alexander-demicev alexander-demicev added this to the v0.11.0 milestone Jan 3, 2025
@alexander-demicev alexander-demicev added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-priority Indicates an issue or PR needs a priority assigning to it needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 3, 2025
@ttreptow
Contributor Author

ttreptow commented Jan 6, 2025

I have a branch with a fix for this (ignore the closed PR; it was incomplete).

I have a design question first:

My fix preemptively gzips all cloud-init userdata. Alternatively, we could compress only when the size exceeds a threshold and leave smaller userdata untouched, but then we would need a threshold value that works across the various infrastructure providers. Always compressing is much simpler, but it is a noticeable change for people (and tools?) that expect uncompressed data.

I've been running this change in my clusters without issue.

@Danil-Grigorev Danil-Grigorev self-assigned this Jan 10, 2025
@Danil-Grigorev
Contributor

@ttreptow Have you tried gzip+base64 encoding the certificate files individually in the cloud-init config and specifying that in each file's Encoding field? This is supported by the current bootstrapping implementation, and it does not suffer from the downside you mentioned.

@alexander-demicev alexander-demicev removed this from the v0.11.0 milestone Jan 14, 2025
@ttreptow
Copy link
Contributor Author

Just to clarify, the cert files in question are generated by cluster-api-provider-rke2 for the etcd cluster and API server. Do you mean gzipping the certs individually here, for example?
