Can not create Cluster from Cluster Class without repeating KCP's infrastructureRef #341

Closed
anmazzotti opened this issue May 31, 2024 · 3 comments · Fixed by #542
Labels
area/clusterclass Issues or PRs related to clusterclass kind/bug Something isn't working triage/accepted Indicates an issue or PR is ready to be actively worked on.

@anmazzotti
Contributor

anmazzotti commented May 31, 2024

What happened:
When defining my RKE2 cluster class I configured the RKE2ControlPlaneTemplate as follows:

apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: RKE2ControlPlaneTemplate
metadata:
  name: rke2-control-plane
  namespace: default
spec:
  template:
    spec:
      nodeDrainTimeout: 2m
      registrationMethod: "control-plane-endpoint"
      rolloutStrategy:
        rollingUpdate:
          maxSurge: 1
        type: RollingUpdate
      serverConfig:
        disableComponents:
          kubernetesComponents:
            - cloudController

In the ClusterClass (CC) definition, spec.controlPlane.machineInfrastructure.ref is correctly set:

apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: rke2
  namespace: default
spec:
  controlPlane:
    machineInfrastructure:
      ref:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: ElementalMachineTemplate
        name: rke2-control-plane
    ref:
      apiVersion: controlplane.cluster.x-k8s.io/v1beta1
      kind: RKE2ControlPlaneTemplate
      name: rke2-control-plane

However, the RKE2 provider fails to initialize the control plane machines:

I0531 08:47:34.318733       1 rke2controlplane_controller.go:430] "Reconcile RKE2 Control Plane" controller="rke2controlplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="RKE2ControlPlane" RKE2ControlPlane="default/rke2-clusterclass-dfkb5" namespace="default" name="rke2-clusterclass-dfkb5" reconcileID="ceadb0a6-1099-43d6-a09e-ea2f21304cb9"
I0531 08:47:34.329520       1 rke2controlplane_controller.go:549] "Initializing control plane" controller="rke2controlplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="RKE2ControlPlane" RKE2ControlPlane="default/rke2-clusterclass-dfkb5" namespace="default" name="rke2-clusterclass-dfkb5" reconcileID="ceadb0a6-1099-43d6-a09e-ea2f21304cb9" Desired=1 Existing=0
E0531 08:47:34.329689       1 scale.go:74] "Failed to create initial control plane Machine" err="failed to clone infrastructure template: failed to retrieve  external object \"default\"/\"\": Object 'Kind' is missing in 'unstructured object has no kind'" namespace="default" name="rke2-clusterclass-dfkb5" cluster-name="rke2-clusterclass"

The workaround is to repeat the infrastructureRef in the RKE2ControlPlaneTemplate, like this:

apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: RKE2ControlPlaneTemplate
metadata:
  name: rke2-control-plane
  namespace: default
spec:
  template:
    spec:
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: ElementalMachineTemplate
        name: rke2-control-plane
      nodeDrainTimeout: 2m
      registrationMethod: "control-plane-endpoint"
      rolloutStrategy:
        rollingUpdate:
          maxSurge: 1
        type: RollingUpdate
      serverConfig:
        disableComponents:
          kubernetesComponents:
            - cloudController

I think this should not be necessary, since the CC definition already provides the reference.

What did you expect to happen:

The RKE2ControlPlaneTemplate should respect what was defined in the Cluster Class.

How to reproduce it:

See the config above. It can also be reproduced with the quickstart sample by removing the RKE2ControlPlaneTemplate.spec.template.spec.infrastructureRef object.

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • rke provider version: main branch
  • OS (e.g. from /etc/os-release):
@anmazzotti anmazzotti added kind/bug Something isn't working needs-priority Indicates an issue or PR needs a priority assigning to it needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 31, 2024
@alexander-demicev alexander-demicev added this to the v0.4.0 milestone May 31, 2024
@alexander-demicev alexander-demicev added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-priority Indicates an issue or PR needs a priority assigning to it needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 31, 2024
@furkatgofurov7
Contributor

furkatgofurov7 commented Jun 14, 2024

After briefly looking into our API and comparing it to CAPI's kubeadm, I observed the following divergences in our API:

  1. We have an RKE2ControlPlaneSpec.InfrastructureRef field even though an infrastructureRef is already part of RKE2ControlPlaneSpec.MachineTemplate. That is not the case with kubeadm.
  2. RKE2ControlPlaneTemplateResource.Spec refers to RKE2ControlPlaneSpec, where it should refer to a dedicated RKE2ControlPlaneTemplateResourceSpec instead, similar to what kubeadm exposes. We simply don't have that API struct for some reason.
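
To make the second point concrete, here is a rough Go sketch of what a dedicated template spec could look like. All type and function names below are simplified, hypothetical stand-ins (not the provider's actual API), and the field set is reduced to two fields from the manifests in this issue. The idea is that the template-level struct has no infrastructureRef at all, so the reference can only come from the ClusterClass.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ObjectReference is a simplified stand-in for a Kubernetes object reference.
type ObjectReference struct {
	APIVersion string `json:"apiVersion,omitempty"`
	Kind       string `json:"kind,omitempty"`
	Name       string `json:"name,omitempty"`
}

// MachineTemplate holds the infrastructure reference of the concrete control plane.
type MachineTemplate struct {
	InfrastructureRef ObjectReference `json:"infrastructureRef"`
}

// ControlPlaneSpec is a hypothetical, reduced spec of the concrete control plane object.
type ControlPlaneSpec struct {
	MachineTemplate    MachineTemplate `json:"machineTemplate"`
	NodeDrainTimeout   string          `json:"nodeDrainTimeout,omitempty"`
	RegistrationMethod string          `json:"registrationMethod,omitempty"`
}

// TemplateResourceSpec is a hypothetical template-level spec.
// It deliberately has no infrastructureRef field.
type TemplateResourceSpec struct {
	NodeDrainTimeout   string `json:"nodeDrainTimeout,omitempty"`
	RegistrationMethod string `json:"registrationMethod,omitempty"`
}

// toControlPlaneSpec combines the template with the reference taken from the
// ClusterClass (spec.controlPlane.machineInfrastructure.ref).
func toControlPlaneSpec(t TemplateResourceSpec, ref ObjectReference) ControlPlaneSpec {
	return ControlPlaneSpec{
		MachineTemplate:    MachineTemplate{InfrastructureRef: ref},
		NodeDrainTimeout:   t.NodeDrainTimeout,
		RegistrationMethod: t.RegistrationMethod,
	}
}

func main() {
	tmpl := TemplateResourceSpec{
		NodeDrainTimeout:   "2m",
		RegistrationMethod: "control-plane-endpoint",
	}
	ref := ObjectReference{
		APIVersion: "infrastructure.cluster.x-k8s.io/v1beta1",
		Kind:       "ElementalMachineTemplate",
		Name:       "rke2-control-plane",
	}
	out, err := json.MarshalIndent(toControlPlaneSpec(tmpl, ref), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

With this split, a template author cannot set (or forget to set) the reference in the template, which is the situation described in the original report.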


This issue is stale because it has been open 90 days with no activity.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 13, 2024
@kkaempf kkaempf added the area/clusterclass Issues or PRs related to clusterclass label Jan 3, 2025
@kkaempf kkaempf modified the milestones: v0.4.0, v0.11.0 Jan 3, 2025
@github-actions github-actions bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 4, 2025
@anmazzotti
Contributor Author

@furkatgofurov7 From what I understand this requires an API version bump, right?

In particular, I think the RKE2ControlPlaneSpec.infrastructureRef field should be deprecated, and RKE2ControlPlaneSpec.machineTemplate.infrastructureRef should be used instead. The machineTemplate field currently seems to be ignored, which looks like a violation of the control plane contract.
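
A minimal sketch of how such a deprecation could be handled without breaking existing manifests, assuming (per the Cluster API control plane contract) that ClusterClass-generated control planes carry the reference under spec.machineTemplate.infrastructureRef. The types and the resolveInfrastructureRef helper are hypothetical and simplified, not the provider's actual code: the machineTemplate reference is preferred, and the top-level field is only a fallback.

```go
package main

import (
	"errors"
	"fmt"
)

// ObjectReference is a simplified stand-in for a Kubernetes object reference.
type ObjectReference struct {
	APIVersion string
	Kind       string
	Name       string
}

// isEmpty reports whether no field of the reference is set.
func (r ObjectReference) isEmpty() bool {
	return r == (ObjectReference{})
}

// MachineTemplate holds the contract-defined infrastructure reference.
type MachineTemplate struct {
	InfrastructureRef ObjectReference
}

// ControlPlaneSpec is a hypothetical, reduced control plane spec with both fields.
type ControlPlaneSpec struct {
	InfrastructureRef ObjectReference // top-level field, candidate for deprecation
	MachineTemplate   MachineTemplate
}

var errNoInfraRef = errors.New("no infrastructure reference set on control plane")

// resolveInfrastructureRef prefers spec.machineTemplate.infrastructureRef and
// falls back to the top-level spec.infrastructureRef. It returns an error
// instead of handing an empty reference to the clone step.
func resolveInfrastructureRef(spec ControlPlaneSpec) (ObjectReference, error) {
	if !spec.MachineTemplate.InfrastructureRef.isEmpty() {
		return spec.MachineTemplate.InfrastructureRef, nil
	}
	if !spec.InfrastructureRef.isEmpty() {
		return spec.InfrastructureRef, nil
	}
	return ObjectReference{}, errNoInfraRef
}

func main() {
	// Only machineTemplate.infrastructureRef is set, as in the ClusterClass case.
	spec := ControlPlaneSpec{
		MachineTemplate: MachineTemplate{
			InfrastructureRef: ObjectReference{
				APIVersion: "infrastructure.cluster.x-k8s.io/v1beta1",
				Kind:       "ElementalMachineTemplate",
				Name:       "rke2-control-plane",
			},
		},
	}
	ref, err := resolveInfrastructureRef(spec)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("cloning %s %q\n", ref.Kind, ref.Name)
}
```

Returning an explicit error for the empty case would also give a clearer message than the "Object 'Kind' is missing" failure from the logs above.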


4 participants