Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shorter machineset names #8585

Closed
nickperry opened this issue May 2, 2023 · 10 comments · Fixed by #9298
Closed

Shorter machineset names #8585

nickperry opened this issue May 2, 2023 · 10 comments · Fixed by #9298
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@nickperry
Copy link

nickperry commented May 2, 2023

What would you like to be added (User Story)?

Since 1.4.0, machineset names have been longer due to the inclusion of an additional random string after the template hash.

This ultimately gets included in the cluster's node names, which can present a problem. Node names can only be up to 63 characters at this time.

Particularly for users who have multiple machineDeployments with descriptive names, or are using FQDNs for their node names, every character counts. The longer machineset names can make the difference between viable or unviable node names and cause users to have to rename their machineDeployments to less descriptive names.

Detailed Description

https://github.com/kubernetes-sigs/cluster-api/blame/main/internal/controllers/machinedeployment/machinedeployment_sync.go#L240

		// Note: In previous Cluster API versions (< v1.4.0), the label value was the hash of the full machine
		// template. With the introduction of in-place mutation the machine template of the MachineSet can change.
		// Because of that it is impossible that the label's value to always be the hash of the full machine template.
		// (Because the hash changes when the machine template changes).
		// As a result, we use the hash of the machine template while ignoring all in-place mutable fields, i.e. the
		// machine template with only fields that could trigger a rollout for the machine-template-hash, making it
		// independent of the changes to any in-place mutable fields.

...

                templateHash, err := hash.Compute(mdutil.MachineTemplateDeepCopyRolloutFields(&deployment.Spec.Template))

...
		// Append a random string at the end of template hash. This is required to distinguish MachineSets that
		// could be created with the same spec as a result of rolloutAfter. If not, computeDesiredMachineSet
		// will end up updating the existing MachineSet instead of creating a new one.
		uniqueIdentifierLabelValue = fmt.Sprintf("%d-%s", templateHash, apirand.String(5))

https://kubernetes.slack.com/archives/C8TSNPY4T/p1682698175472899

Anything else you would like to add?

Might it be viable to use just the random string for the uniqueIdentifierLabelValue and drop the hash portion?

Label(s) to be applied

/kind feature
One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels.

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 2, 2023
@sbueringer
Copy link
Member

sbueringer commented May 2, 2023

I think in uniqueIdentifierLabelValue we want to have both, for context see: #8110 (comment)

But I think this doesn't necessarily mean that we need both in the MachineSet name. As far as I know the MachineSet name is now independent of the uniqueIdentifierLabelValue.

So I wonder if we can do this:

name = computeNewMachineSetName(deployment.Name+"-", apirand.String(5))

instead of

name = computeNewMachineSetName(deployment.Name+"-", apirand.SafeEncodeString(uniqueIdentifierLabelValue))

I don't see the benefit of including the uniqueIdentifierLabelValue in its encoded form in the MachineSet name (or I just forgot why it was important)

@sbueringer
Copy link
Member

sbueringer commented May 2, 2023

Looking into this a bit closer. I think our goal was to limit MachineSet names to max 63 characters. But I think we don't achieve this with our current code

func computeNewMachineSetName(base, suffix string) string {
if len(base) > maxGeneratedNameLength {
base = base[:maxGeneratedNameLength]
}
return fmt.Sprintf("%s%s", base, suffix)
}

This ensures base (which is deployment.Name+"-") is reduced to at most 58 characters and then we add something which can be longer than 5 (suffix which is apirand.SafeEncodeString(uniqueIdentifierLabelValue)).

(This would be also solved by passing in apirand.String(5) as suffix)

@sbueringer
Copy link
Member

cc @ykakarap @fabriziopandini

@sbueringer
Copy link
Member

cc @killianmuldoon

@nickperry
Copy link
Author

nickperry commented May 2, 2023

An example (obfustacted but representative of length) machine name in our environment...

foo-wl-01-zookeeper-zk9v9-7c45bf7958xqccjd-b2w96

with our environment DNS domain appended, the node name would be

foo-wl-01-zookeeper-zk9v9-7c45bf7958xqccjd-b2w96.foo.bar.blah.io

This is one character too long and it will not be able to join the cluster.

@enxebre
Copy link
Member

enxebre commented May 2, 2023

I don't see the benefit of including the uniqueIdentifierLabelValue in its encoded form in the MachineSet name (or I just forgot why it was important)

We created the computeNewMachineSetName abstraction here https://github.com/kubernetes-sigs/cluster-api/pull/8110/files#diff-d9a80cabbd6ae2ad7412b4e57a7f3d474b4816a021e4b84987c0895c6b1189c6R246 preserving existing behaviour.

Then we included the additional apirand(5) here to avoid name clashing during rolloutAfter operations
https://github.com/kubernetes-sigs/cluster-api/pull/8216/files#diff-d9a80cabbd6ae2ad7412b4e57a7f3d474b4816a021e4b84987c0895c6b1189c6R243
breaking the length assumption.

Should we let computeNewMachineSetName account for the suffix length and substract that from the base?
In any case I think we should include a unit to validate that computeNewMachineSetName can not return a string longer than the allowed length.

@enxebre
Copy link
Member

enxebre commented May 2, 2023

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 2, 2023
@sbueringer
Copy link
Member

sbueringer commented May 2, 2023

I think we should just use deployment.Name+"-"+apirand.String(5) as MachineSet name + implement it in a way that we ensure we don't hit the max length.

I don't see any reason to include uniqueIdentifierLabelValue in the MachineSet name (after appending the random string to the template hash + encoding it's unrecognizable anyway)

Example: https://storage.googleapis.com/kubernetes-jenkins/logs/periodic-cluster-api-e2e-main/1653315829829210112/artifacts/clusters/bootstrap/resources/clusterclass-rollout-pdvhn2/MachineSet/clusterclass-rollout-i2j56w-md-0-fv4xl-66b8656df6x8g89s.yaml
MS name: clusterclass-rollout-i2j56w-md-0-fv4xl-66b8656df6x8g89s
machine-template-hash: 2264212892-4p4kz (uniqueIdentifierLabelValue)

@ykakarap
Copy link
Contributor

I don't see the benefit of including the uniqueIdentifierLabelValue in its encoded form in the MachineSet name (or I just forgot why it was important)

IIRC, we originally included the template hash (encoded) in the name to match what was done before in-place propagation. Historical reasons for doing this was to handle cache miss and deal with cases where the controller mistakenly tries to re-create a duplicate MS. But now that we have polling and the random suffix in the names all of this no longer applies.

It should be safe to drop the hash from the name as long as we keep the random suffix. This, as @sbueringer mentioned, should also fix bug that MS names should be <= 63 characters in the current code.

Also, the computeNewMachineSetName assumes that suffix is not more than 5 characters long. Probably best to have that check in that function.

@chrischdi
Copy link
Member

Let me tackle this :-)

/assign

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
6 participants