
Make it possible to change infra machine naming scheme #10463

Closed
lentzi90 opened this issue Apr 18, 2024 · 40 comments
Labels
kind/feature: Categorizes issue or PR as related to a new feature.
needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
priority/backlog: Higher priority than priority/awaiting-more-evidence.

Comments

@lentzi90
Contributor

lentzi90 commented Apr 18, 2024

What would you like to be added (User Story)?

  • As an operator with existing tooling around the infra resources, I would like a way to influence the naming of the infra machines to reduce the impact of the breaking change that was made to how they are named.
  • As an operator I would like to change the naming scheme of the nodes during the lifecycle of the cluster since the end user may want a different scheme than what was used by the operators when creating the KCP or MD.
  • As an operator I would like to name the nodes in different ways than the Machines since the end user may benefit from a different naming scheme than what is used by the operators.

Detailed Description

There is a breaking change in v1.7.0 (#9833) that changes the way objects get their names. Previously infra machines would get their name from the infra machine template, and this would then usually (?) propagate to the hostname and ultimately the node.
The change is that the infra machines now get their name from the machine. This is nice for matching the different objects without looking through provider IDs and references. However, the name also propagates to the node by default, and that is not always ideal. It also means old clusters now get their node names from something that did not influence them in the past, causing confusion for users.

The name also propagates to many infra resources created by the infra provider (in many cases). This means that on upgrade to v1.7.0 we would suddenly get a completely different naming scheme for the cloud resources. We have unfortunately grown used to the old naming scheme and we even have tooling for tracking the cloud resources based on it. With the breaking change, all of this will stop working with no way to get the old behavior back.

We have also gotten used to, and currently rely on, setting the node name prefix through the infra machine template. Since the infra machine template can easily be replaced, it has allowed us to change the naming scheme of the nodes whenever we want through a rollout. The new system is unfortunately less flexible.

The control plane name can never change, so it is impossible* to change the naming scheme of control plane nodes during the lifecycle of the cluster. MachineDeployment names cannot change either, but the MachineDeployments can at least be replaced, which is still less convenient than a simple rollout.

For long-lived existing clusters, the breaking change means that the node names, hostnames, infra machine names and infrastructure resource names change permanently without a way to opt out. The KCP names did not influence the naming previously, which means they may in some cases be "improper names" that cannot be fixed at this point without a rebuild of the cluster.

I think it is quite reasonable to want a different naming scheme for the nodes than for the CAPI objects or at least some way of changing the naming scheme. The end user that interacts with the workload cluster may not always even be aware of how that cluster is managed. It is then nice to be able to set meaningful node names for this user and change the scheme during the lifecycle of the cluster.

*) It may be possible to change the configuration and in that way impact the node names. With a KubeadmControlPlane for example, it may be possible to set the spec.kubeadmConfigSpec.joinConfiguration.nodeRegistration.name. It remains unclear if it is possible to achieve something like a prefix + random suffix in this way though.
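
For illustration, a minimal sketch of what that footnote describes, assuming a KubeadmControlPlane (the prefix value is made up, `{{ local_hostname }}` is substituted by cloud-init as in the KubeadmConfigTemplate example later in this thread, and whether this gives enough control is exactly the open question above):

apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: kcp
spec:
  kubeadmConfigSpec:
    initConfiguration:
      nodeRegistration:
        # hypothetical prefix for the first control plane node
        name: 'my-prefix-{{ local_hostname }}'
    joinConfiguration:
      nodeRegistration:
        # same hypothetical prefix for control plane nodes joining later
        name: 'my-prefix-{{ local_hostname }}'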

Anything else you would like to add?

I was tempted to label it a bug first since it limits functionality that was there before, but I do understand that this is intended behavior and was correctly marked as a breaking change.

One possible way to allow changing the naming scheme, would be to add generateName so that we could set this in the template and get generated names based on it.
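
As a purely hypothetical illustration of that suggestion (generateName is standard Kubernetes metadata, but CAPI does not currently honor it from the template for machine naming; the kind and names below are made up):

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: OpenStackMachineTemplate
metadata:
  name: cluster-a-control-plane
spec:
  template:
    metadata:
      # hypothetical: if this were honored, infra machines would be created
      # as cluster-a-<random suffix> instead of inheriting the Machine name
      generateName: cluster-a-
    spec: {}  # provider-specific fields omitted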

Label(s) to be applied

/kind feature
One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels.

@k8s-ci-robot added the kind/feature label on Apr 18, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

CAPI contributors will take a look as soon as possible, apply one of the triage/* labels and provide further guidance.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the needs-triage label on Apr 18, 2024
@lentzi90
Contributor Author

Some further investigation shows that it is possible to change the node names through the kubeadm config, as mentioned above.
Here is an example:

apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: lentzi90-test
spec:
  template:
    spec:
      joinConfiguration:
        nodeRegistration:
          name: 'my-prefix-{{ local_hostname }}'

This allows changing the names for existing KCPs and MDs, but it is rather limited compared to the old behavior.
The node names need to be unique, so some metadata that is unique to the node has to be included; in many cases there will not be much available other than the hostname.
This method can at least be used to set a prefix or suffix for the nodes, but the hostname will then still be part of the name.

@fabriziopandini
Member

fabriziopandini commented Apr 18, 2024

I'm sorry the PR caused issues for you.
There are a couple of issues to be addressed here; perhaps the most important is that there is no "contract" about how node names are generated. This has never been implemented because it depends on the cloud provider, think e.g. AWS. Also, some providers decided to name infrastructure objects starting from the machine, others starting from the infrastructure machine, others by concatenating with the cluster name. We can definitely do better.

Going back to the use case, frankly speaking, having different naming conventions for MD and for the underlying nodes seems confusing to me.

However, we can think a little bit more about this and eventually implement something like https://cluster-api.sigs.k8s.io/tasks/experimental-features/cluster-class/write-clusterclass#clusterclass-with-custom-naming-strategies and introduce a sort of custom naming strategies for machines under KCP, MD (MP?).
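
For reference, the linked ClusterClass mechanism looks roughly like this (field names and template variables per the linked docs; the class name and concrete templates are only examples, and required references are omitted for brevity); the idea here would be to offer something similar for the Machines created by KCP / MD:

apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: example-clusterclass
spec:
  controlPlane:
    # controls the name of the generated control plane object (e.g. KubeadmControlPlane)
    namingStrategy:
      template: "{{ .cluster.name }}-cp-{{ .random }}"
  workers:
    machineDeployments:
    - class: default-worker
      # controls the name of the MachineDeployments generated for this class
      namingStrategy:
        template: "{{ .cluster.name }}-md-{{ .machineDeployment.topologyName }}-{{ .random }}"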

Another option is to solve this at the infrastructure provider level, by providing an option for naming VMs created on the cloud provider (I kind of prefer this option)

Another take-away is to improve communication about these kinds of changes; we currently always discuss them at office hours, and we recently started surfacing them in release notes starting from the alpha and beta releases, but this is probably not enough

/priority backlog

@k8s-ci-robot added the priority/backlog label on Apr 18, 2024
@lentzi90
Contributor Author

I have gotten more information, and it is even worse than I thought.
Downstream tooling and customers rely on the names of the underlying resources (OpenStack VMs, ports, volumes, etc.). An upgrade to v1.7.0 breaks all of it and there is no way around it. I guess we will be stuck on v1.6 for the time being. 🙁

One example of how this impacts downstream is that we have scripts in CI that ensure cleanup of all resources. These scripts look for specifically named resources and delete them when the tests are done. Because of this change they do not work anymore.

@lentzi90 changed the title from "Make it possible to change node naming scheme" to "Make it possible to change infra machine naming scheme" on Apr 22, 2024
@neolit123
Member

feature gates are supposed to be ephemeral until a feature graduates, is it expected from users and customers to eventually migrate to the new way? if no, then this has to be resolved in a different way.

@neolit123
Member

Going back to the use case, frankly speaking, having different naming conventions for MD and for the underlying nodes seems confusing to me.

agreed.

However, we can think a little bit more about this and eventually implement something like https://cluster-api.sigs.k8s.io/tasks/experimental-features/cluster-class/write-clusterclass#clusterclass-with-custom-naming-strategies and introduce a sort of custom naming strategies for machines under KCP, MD (MP?).

+1 to this approach.

Another take-away is to improve communication about these kinds of changes; we currently always discuss them at office hours, and we recently started surfacing them in release notes starting from the alpha and beta releases, but this is probably not enough

release notes should be the source of truth. people miss zoom calls and don't watch the recording.

@neolit123
Member

I have gotten more information, and it is even worse than I thought.
Downstream tooling and customers rely on the names of the underlying resources (OpenStack VMs, ports, volumes, etc.). An upgrade to v1.7.0 breaks all of it and there is no way around it. I guess we will be stuck on v1.6 for the time being. 🙁

One example of how this impacts downstream is that we have scripts in CI that ensure cleanup of all resources. These scripts look for specifically named resources and delete them when the tests are done. Because of this change they do not work anymore.

is that all the breakage or is there something more serious? this seems like something that can be worked around and said scripts can be more flexible. naming is not an API and there is no contract as Fabrizio pointed out.

@lentzi90
Contributor Author

feature gates are supposed to be ephemeral until a feature graduates, is it expected from users and customers to eventually migrate to the new way? if no, then this has to be resolved in a different way.

Yes, this is my attempt at a "nicer way to break the naming". I.e. give people some time to adapt while still having the old behavior available. We can completely remove it in the next minor release. I'm hoping we can take this in as a patch so that affected people are still able to upgrade to v1.7 without first adapting to the new naming scheme.

@lentzi90
Contributor Author

is that all the breakage or is there something more serious? this seems like something that can be worked around and said scripts can be more flexible. naming is not an API and there is no contract as Fabrizio pointed out.

I would say it is quite serious since it affects existing, long-lived clusters. Imagine something like this:

You have 2 clusters named Cluster-A and Cluster-B. Both have control-planes named KCP. They may have been around for years already and the control-plane name is obviously not possible to change. The InfraMachineTemplates are named with prefixes cluster-a- and cluster-b-, which allows for easy tracking of what cluster all InfraMachines belong to.
"Coincidentally", all the related infrastructure providers resources belonging to Cluster-A got names with prefix cluster-a- and were very easy to track.

Upgrading to v1.7.0 means control-plane InfraMachines for both clusters are suddenly named kcp-* with no way of changing it except creating a new cluster. All the related infrastructure providers resources are also now named kcp-* and very hard to track.

I understand that the naming of the nodes and these resources are not part of the API, but please understand that people have gotten used to the way it was and may need time to adapt.

With a sufficient number of users of an API,
it does not matter what you promise in the contract:
all observable behaviors of your system
will be depended on by somebody.

https://www.hyrumslaw.com/

@fabriziopandini
Member

fabriziopandini commented Apr 24, 2024

I understand that this is causing stress and issues. We stated in our manifests that we would try to evolve responsibly, and I'm happy to chat about options for how to get around what happened, if possible.

However, as a project we have the right and the need to change, and we put a lot of effort into making clear what guarantees we offer, and we work as hard as possible to continuously improve them in a transparent way.

This issue is another reminder of how important it is for the long-term success of the project that users collaborate with the project to continuously improve our definitions of contracts & guarantees.

WRT this specific issue, we opened #7030 some time ago to define guidelines about naming for infrastructure objects, but unfortunately it never got traction, and so here we are.

And it is proof that assuming something is a contract just because it has been there for some time is simply wrong, fragile, and not a sustainable assumption for any project.

@lentzi90
Contributor Author

I understand and agree with you @fabriziopandini . I just wish we could for now soften the impact of the breaking change with something like #10505. That would allow us time to adapt without being stuck on v1.6. That PR is not meant to provide a permanent way to get back the old behavior. It is just to let us have it for one more minor release, hidden behind a deprecated and by default disabled feature flag. Is that too much to ask?

@sbueringer
Member

sbueringer commented Apr 25, 2024

We should probably split the discussion in two:

  1. Should we provide a temporary way to keep the 1.6 behavior, and for how long? I think only implementing it on release-1.7 would ensure it's actually temporary and we're not forced to keep it forever.
  2. The need to establish contracts for naming of servers and nodes based on the objects core CAPI creates (e.g. Machine and/or InfraMachine). I think this would require work to achieve consensus across infra providers, probably breaking changes across providers to adhere to this contract, and it might not be possible for all providers.

That being said, I'm personally fine with 1. Especially as we won't be able to achieve 2 quickly for 1.7. I would prefer a very clearly named flag over a feature flag to not send the wrong message. But ultimately that's an implementation detail for me.

(Discussed this topic with Fabrizio and Christian yesterday, don't want to put words in anyone's mouth, so let's give them some time to respond (Fabrizio is back on Monday))

@lentzi90
Contributor Author

lentzi90 commented Apr 25, 2024

Thank you @sbueringer ! I agree and would like to focus on a temporary solution first since I know the second point will take more time.
Flag or feature flag does not matter to me. I did it as a feature flag in the PR just because it seemed to provide a clear deprecation warning 🙂

Edit: Good point about implementing it only on release-1.7! I can change the PR to target that branch

@chrischdi
Member

Thanks @sbueringer , that sums it up pretty nicely.

Regarding 1.:

I'm fine with that too, adding a fallback mode to v1.7.x to enable folks running into issues to migrate over.

The next patch release for v1.7 is scheduled for "Tuesday 14th May 2024" [0].

Regarding 2.:

I think this is complex, sometimes it may be required to have hostname == nodename (e.g. maybe on baremetal?), sometimes it's ok for them to be different, maybe that also depends on the provider and/or CPI implementation used or other factors.
Definitely nothing achievable in short-term.

@guettli
Contributor

guettli commented Apr 25, 2024

I think this is complex, sometimes it may be required to have hostname == nodename (e.g. maybe on baremetal?)

Yes, this would be nice, but this would be a different PR.

@lentzi90
Contributor Author

I replaced the PR with one for release-1.7 now. That way the flag would never even be part of v1.8. It is still a feature flag but I'm very open to change that if there is consensus that it would be better to have something different.

@lentzi90
Contributor Author

Updated the description to clarify better what I'm after here. The main goal I have on the CAPI side is to get a temporary way to get the old behavior back. This is to allow more time to adapt and specifically targeted at old long lived clusters. These clusters would have to be rebuilt in order to get proper node names, hostnames, infra machine names and infrastructure resource names. They currently have identical KCP names because that never influenced anything before.

In the long term, it looks like we will need a feature in CAPO to address the naming of infrastructure resources. It is not relevant to this issue so I won't discuss it anymore here.

@mdbooth
Contributor

mdbooth commented Apr 29, 2024

We should probably split the discussion in two:

  1. Should we provide a temporary way to keep the 1.6 behavior, and for how long? I think only implementing it on release-1.7 would ensure it's actually temporary and we're not forced to keep it forever.

It sounds like we have agreement that we should do this. Ideally this would land well in advance of the next point release. #10511 looks reasonable to me. Given that it's a temporary solution, can we go forward with this? I worry that it will stall if we wait for consensus.

  2. The need to establish contracts for naming of servers and nodes based on the objects core CAPI creates (e.g. Machine and/or InfraMachine). I think this would require work to achieve consensus across infra providers, probably breaking changes across providers to adhere to this contract, and it might not be possible for all providers.

I don't think this is necessary. There was previous discussion around writing some guidelines around this in #7030 which also highlights the variety of trip hazards in cloud resource naming. I think some guidelines for implementors would be welcome, but it is unrealistic to expect that any single naming scheme that CAPI could use will work for all clouds in all circumstances.

However, this is not at the root of this issue.

The de-facto situation is that providers each have their own scheme for naming cloud resources. All the schemes I am personally aware of are based on the name of the InfraMachine. I think that is as far as a CAPI contract needs to go:

Infrastructure resources created by a provider SHOULD be named based on the corresponding CAPI infrastructure resource, modified by a predictable provider-specific scheme which ensures names are valid for the target environment.

I suspect this also captures the current state of providers. This is not the problem. The problem is that given that this is already how we name infrastructure resources, the name of the CAPI infrastructure resource changed, which resulted in a user-visible change.

I am in favour of change #9833. It 'fixes' an inconsistency of naming which is currently a trip hazard: Machine and InfraMachine (and therefore infrastructure resources) have different names. Having all of these things aligned is simpler for both operators and developers.

The problem, therefore, is that the names of infrastructure resources were, in practice, user-visible, and we changed them. @lentzi90's users have been affected by this. His users are not the only users who have organizational expectations around resource naming. Here is another example I am aware of in OpenShift: https://issues.redhat.com/browse/RFE-4901. This refers to CPMS, which is an OpenShift controller, but the problem is analogous and we will want to bring this use case into CAPI shortly in any case. FWIW this user is on Azure. Note these words from the RFE:

Due to the organization compliance needs, it’s required to give specific nomenclature to the nodes.

'Organization compliance needs' cover a wide range of crimes and are often notoriously inflexible. A great many Rube Goldberg machines have been created in order to satisfy 'organization compliance needs'. @lentzi90 is the first reporter, but this is going to come up again across multiple providers.

My immediate concern is that @lentzi90 has proposed kubernetes-sigs/cluster-api-provider-openstack#2035. If we accept this, it will again break the naming alignment which we got from #9833. It will also complicate both the CAPO API and implementation in a way that is entirely custom to OpenStack and will not help Azure users. I am reluctant to add this to our stable API.

I will assert that we want the naming of the following 3 things to be aligned:

  • Machine
  • InfrastructureMachine
  • Underlying infrastructure resources

Given that the Machine is created first, I think the only possible solution which results in these things being aligned is to control the name of Machine at the time it is created. @sbueringer already highlighted the places we need to look for this in #9833: KCP, MachineSets and MachinePools.

I don't have any strong opinion on how this should be achieved. I had an idea that MachineTemplateSpec seems to be a common structure here. Would it be worth adding something to that struct so that whatever method we choose to allow users to affect the naming of resources in their clusters is common to all things which create machines from a template?

Lennart's proposed solution to add a NamePrefix sounds reasonable to me. Could we, for example, add a NamePrefix to MachineTemplateSpec? If not, is there some better way we can allow users to control the naming of Machine objects in their clusters?
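
A purely hypothetical sketch of that idea (namePrefix does not exist in the CAPI API today; its name and placement on MachineTemplateSpec are assumptions, shown on a MachineDeployment only for illustration):

apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: cluster-a-md-0
spec:
  clusterName: cluster-a
  template:
    # hypothetical addition to MachineTemplateSpec: Machines, and by extension
    # InfraMachines and cloud resources, would get names like
    # cluster-a-worker-<random suffix>
    namePrefix: cluster-a-worker-
    spec:
      clusterName: cluster-a
      # bootstrap and infrastructure references omitted for brevity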

@sbueringer
Member

sbueringer commented Apr 29, 2024

All the schemes I am personally aware of are based on the name of the InfraMachine.

CAPV is using the Machine name for some reason (I don't know why, I'm also not aware how other providers are handling this).

One thing we might want to keep in mind. Most of our users are probably used to how Deployment > ReplicaSet > Pod & MachineDeployment > MachineSet > Machine names line up with each other. This makes it today very easy to figure out which Machine belongs to which MS / MD / KCP object. Once we start allowing folks to have Machine names that are entirely independent of the owning object we will break this easy correlation.

@mdbooth
Contributor

mdbooth commented Apr 29, 2024

All the schemes I am personally aware of are based on the name of the InfraMachine.

CAPV is using the Machine name for some reason (I don't know why, I'm also not aware how other providers are handling this).

That's fine, because with #9833 this is now also consistent with the Infrastructure Machine.

One thing we might want to keep in mind. Most of our users are probably used to how Deployment > ReplicaSet > Pod & MachineDeployment > MachineSet > Machine names line up with each other. This makes it today very easy to figure out which Machine belongs to which MS / MD / KCP object. Once we start allowing folks to have Machine names that are entirely independent of the owning object we will break this easy correlation.

Sure. However, as @lentzi90's issue points out the KCP in particular is immutable for the lifetime of the cluster. If there is ever any reason to change it there is currently no way to achieve that, and #9833 changed it with no way to retain the old behaviour.

Breaking that correlation is also why I'm not keen to accept kubernetes-sigs/cluster-api-provider-openstack#2035, but what other option do affected users have?

Also, a new API should not affect existing users, which is the issue with #9833 that we're trying to retrospectively address. If users are happy with the new naming scheme they can continue to use it. If they have an operational need to specify some other naming scheme they would have the option to specify one. This would be behaviour that they have opted in to, so presumably not a surprise.

I wonder... does this actually need to be generalised beyond KCP? Replacing a MD is inconvenient but unlike a KCP it can be done. Could we add some machine naming scheme parameter to KCP alone?

@mdbooth
Contributor

mdbooth commented Apr 29, 2024

Just in case you didn't have time to read my rather lengthy comment, I'll highlight https://issues.redhat.com/browse/RFE-4901 again. We're going to be asked for more flexibility on this.

@sbueringer
Member

sbueringer commented Apr 29, 2024

Breaking that correlation is also why I'm not keen to accept kubernetes-sigs/cluster-api-provider-openstack#2035, but what other option do affected users have?

Sorry I wasn't trying to say we can't do it, I just wanted to at least mention that there might be a downside.

I wonder... does this actually need to be generalised beyond KCP? Replacing a MD is inconvenient but unlike a KCP it can be done. Could we add some machine naming scheme parameter to KCP alone?

If I got it correctly, the use case @lentzi90 has is that his users can influence the OpenStackMachineTemplate name (via rotation) but not re-create MachineDeployments after they have been initially created.

@mdbooth
Contributor

mdbooth commented Apr 29, 2024

If I got it correctly, the use case @lentzi90 has is that his users can influence the OpenStackMachineTemplate name (via rotation) but not re-create MachineDeployments after they have been initially created.

It was my understanding that the KCP specifically cannot be renamed, but the InfraMachineTemplate used by the KCP can, and they were relying on this for the naming of their resources. The problem they now have is the KCP cannot be renamed without creating a new cluster, and they have many long-lived clusters.

It was my guess that this may not apply to MDs, which presumably can be renamed(replaced), although that may be operationally inconvenient (but not impossible).

@mdbooth
Contributor

mdbooth commented Apr 29, 2024

If we can agree a design for this I can make time to implement it, btw.

@lentzi90
Contributor Author

If I got it correctly, the use case @lentzi90 has is that his users can influence the OpenStackMachineTemplate name (via rotation) but not re-create MachineDeployments after they have been initially created.

It was my understanding that the KCP specifically cannot be renamed, but the InfraMachineTemplate used by the KCP can, and they were relying on this for the naming of their resources. The problem they now have is the KCP cannot be renamed without creating a new cluster, and they have many long-lived clusters.

It was my guess that this may not apply to MDs, which presumably can be renamed(replaced), although that may be operationally inconvenient (but not impossible).

Precisely on point @mdbooth! I think the confusion here comes from my initial description. The first reports we got were indeed about how users can no longer rename nodes, but the core of the issue, the reason why they need that, is exactly what @mdbooth described: they have compliance requirements on the hostnames in the network, on the node names and on the names of infrastructure resources. After the breaking change, everything changed and they were no longer able to change it back, so that is what they reported.

The main use-case is not really to be able to change these names on a whim, it is to have some kind of control over them to begin with. Unfortunately, for long lived clusters that practically means that we need a way to opt out of the breaking change or a way to override the naming of the Machines or infrastructure resources.

KCPs are definitely the biggest issue here as they can never change. MDs can be replaced, although it is inconvenient compared to a rollout.

@fabriziopandini
Member

fabriziopandini commented Apr 30, 2024

Thanks all, this is a valuable discussion.
It seems we all agree that CAPI should have by default "sane" naming conventions, and that we probably need to find ways to allow flexibility if and when required (and we also already did something similar in ClusterClasses).

However, what comes to my mind when reading this thread or when I try to focus on this problem is that we probably have different problems to untangle:

  1. How to generate the name for the infrastructure machine object for machines controlled by KCP or MD (this is what #9833, "⚠️ Objects generated by KCP, MachineSets and MachinePools will now consistently use machine name", changed)

  2. Provide guidelines to infrastructure providers about how infrastructure objects (VMs etc) should be named when provisioning a Machine with a corresponding infrastructure machine object.

  3. Which options exist for influencing how the node hosted on a machine is named, and eventually extending those options.

The reality is that WRT 2 - how infrastructure objects (VMs etc.) are named - providers are fragmented:

  • 2.a Some providers use the machine name (eventually replacing invalid characters according to the infrastructure naming rule)
  • 2.b Some providers use the infrastructure machine name (eventually replacing invalid characters); these providers were affected by the change in 1.
  • 2.c Some providers combine one of the previous two options with the cluster name - and eventually the namespace name - to avoid name clash at the infrastructure level
  • 2.d In theory infrastructure providers can use any other naming schema or even allow custom naming schema definitions on their API, but I'm not sure we should worry about this (it seems like a sort of break glass option).

Unfortunately, we also have fragmentation for 3 (naming nodes), and depending on the approach you rely on, this point might or might not be affected by the naming decisions in 1 and 2; e.g. just focusing on folks relying on kubeadm:

  • 3.a Some providers leave it to kubeadm to pick the node name, which in most providers is derived from the VM name (notable exception: AWS).
  • 3.b Some other providers try to influence the node name by using a combination of kubeadm's nodeRegistration.name, cloud-init metadata + jinja templating (cloud-init metadata varies from provider to provider).
  • 3.c In theory folks can influence kubeadm's nodeRegistration.name in other ways: cluster class patches, webhooks etc. Also in this case, this option seems like a sort of break glass option.

Trying to find a way forward: on one side, I'm tempted to break down all of those three points into distinct discussions.

On the other side, I'm tempted to consider the first one the most urgent, because it could provide "some kind of control over them to begin with", but this leaves me with a concern that we are sacrificing the consistency of the UX inside the management cluster just to avoid dealing with the fragmentation in layers 2 and 3.

Opinions?

@mdbooth
Contributor

mdbooth commented Apr 30, 2024

Thanks, I think that's a good summary.

My immediate concern is that our provider was affected by the change, and one of our users is impacted by the change in a manner which currently prevents them from upgrading to CAPI 1.7. Does it make it more palatable if we confine any solution to just KCP, as that's the only controller we're aware of with no workaround? I'll reiterate my offer to work on this if we can agree a design.

@fabriziopandini
Member

I think that we have an agreement on implementing a stop gap for 1.7; my comment is mostly about looking for a long-term solution to implement, hopefully for 1.8. No strong objection to starting from KCP, but whatever we do, we should ensure consistency down the path.

@mdbooth
Contributor

mdbooth commented Apr 30, 2024

@lentzi90 What are your thoughts?

@lentzi90
Contributor Author

lentzi90 commented May 2, 2024

Sounds good to me also! If we can get a stop gap for 1.7 I would be very happy. I'm also prepared to work on a more long term solution as soon as we can agree on what it would look like 🙂

@lentzi90
Contributor Author

lentzi90 commented May 2, 2024

Do we need anything more to move forward with #10511 ?

@mdbooth
Contributor

mdbooth commented May 2, 2024

@lentzi90 I brought #10511 up in the office hours call last night.

My worry is that the clock is now ticking on CAPI v1.8, and we don't even have an agreement in principle on how to fix this beyond 1.7 yet. I appreciate that adding the flag for 1.7 only was to limit its impact, but I fear we're just setting ourselves up for a panic later if we don't get it merged in time. My preference would be to merge that into main and cherry-pick to 1.7, with a clear agreement to remove it as soon as we have a long-term solution. If we can do that before 1.8 is released then that would be ideal, but I don't think it's a good plan to remove a workaround before we have a fix.

@sbueringer Can you help us get unstuck on this?

@sbueringer
Member

sbueringer commented May 2, 2024

My preference would be to merge that into main and cherry-pick to 1.7, with a clear agreement to remove it as soon as we have a long-term solution.

I have no doubts that we're all highly motivated to get this resolved, but nobody is able to guarantee that we'll reach consensus. If I see correctly we don't even have consensus yet on how much of the problem we have to solve as a next iteration after the flag (just how we generate the name of Machine/InfraMachine/BootstrapConfig or the whole contract thing top to bottom).

My concern is mostly that we'll add a flag and we'll never be able to get rid of it again until we reach consensus on a new solution which might never happen. So if we add a flag, how do we ensure it doesn't stick around forever?

@sbueringer
Member

sbueringer commented May 2, 2024

My problem is that I have seen so many issues / PRs stale out over the years (and for years) in CAPI. And the result is that we're accumulating more and more tech debt all over the place. Also because it's a bunch of work to deprecate and actually remove something; as a random example, we still "support" Clusters without ControlPlanes and all other sorts of strange things.

I know comparatively this is a relatively minor issue here.

One of the following things could help:

  1. Strictly limit the flag to 1.7 & 1.8. This gives us until December to find a solution; I'm pretty sure that if it doesn't happen by then, it will never happen.
  2. Have a consensus in principle on what the long-term solution would look like (so that we only depend on figuring out ~ implementation details, which we should very likely be able to do).

About 2, if there is consensus that we only introduce a field somewhere in KCP, MD & MS to control a name prefix, that's very clearly easy and straightforward to implement. If we go for naming contracts across the hierarchy, this is almost in the category of the load balancer provider proposal (for context: that has been around for years and it's probably never going to happen)

@mdbooth
Contributor

mdbooth commented May 2, 2024

I'd be less worried if we gave ourselves a two-release-cycle runway. I think we can get that done, and if we don't, I guess we'll have to relitigate it.

As for the proposed solution, IIUC the problem controller is KCP, not MD/MS. MD/MS can be replaced without replacing the cluster, but KCP cannot.

@lentzi90 Can you confirm that a solution which was confined to KCP would allow your users to upgrade?

@lentzi90
Contributor Author

lentzi90 commented May 3, 2024

Theoretically, yes KCP is enough. It is quite trivial to include MDs at least for the temporary solution though so I'm not sure I see the benefit of leaving it out. Also, having similar API fields and flags for all controllers seems quite sensible to me, long term or short term.
What would be the goal of leaving out MD/MS?

@mdbooth
Contributor

mdbooth commented May 3, 2024

Theoretically, yes KCP is enough. It is quite trivial to include MDs at least for the temporary solution though so I'm not sure I see the benefit of leaving it out. Also, having similar API fields and flags for all controllers seems quite sensible to me, long term or short term. What would be the goal of leaving out MD/MS?

Nothing specific other than reducing scope. I also agree that in practice it may well make sense to include both.

@lentzi90
Contributor Author

With #10576 merged as a stop-gap solution and #10577 opened for discussing the long-term solution, I guess we can close this, right?

@lentzi90
Contributor Author

/close

@k8s-ci-robot
Contributor

@lentzi90: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
