RKE2ControlPlane have no IP Address available - stuck in WaitingForRKE2Server #156

Closed
localleon opened this issue Jul 20, 2023 · 1 comment
Labels
kind/bug Something isn't working needs-priority Indicates an issue or PR needs a priority assigning to it needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@localleon
Contributor

What happened:

I'm trying to combine the Syself Hetzner CAPI Provider with the RKE2 Bootstrap / Control-Plane Provider. My RKE2ControlPlane never becomes ready because of a missing ExternalIP in the object.

My machines are created successfully, but the RKE2ControlPlane never becomes ready. I'm getting the following error from the rke2-control-plane-controller-manager pod on my clusterapi-mgt-cluster:

E0720 14:15:42.233283       1 controller.go:326]  "msg"="Reconciler error" "error"="some Control Plane machines exist and are ready but they have no IP Address available" "RKE2ControlPlane"={"name":"hetzner-capi-rke2-demo-control-plane","namespace":"default"} "controller"="rke2controlplane" "controllerGroup"="controlplane.cluster.x-k8s.io" "controllerKind"="RKE2ControlPlane" "name"="hetzner-capi-rke2-demo-control-plane" "namespace"="default" "reconcileID"="61571849-1712-43d1-af9f-2b7f2860948f"

The cluster is therefore stuck in the "WaitingForRKE2Server" state:

~/kube/hetzner-clusterapi-rke2 on  main ⌚ 16:20:16
$ clusterctl describe cluster hetzner-capi-rke2-demo
NAME                                                                                    READY  SEVERITY  REASON                SINCE  MESSAGE
Cluster/hetzner-capi-rke2-demo                                                          False  Info      WaitingForRKE2Server  16m
├─ClusterInfrastructure - HetznerCluster/hetzner-capi-rke2-demo
└─ControlPlane - RKE2ControlPlane/hetzner-capi-rke2-demo-control-plane                  False  Info      WaitingForRKE2Server  16m
  └─Machine/hetzner-capi-rke2-demo-control-plane-zs76h                                  True                                   18m
    └─MachineInfrastructure - HCloudMachine/hetzner-capi-rke2-demo-control-plane-wvhml

On the node itself, I can see that the Node is ready:

root@hetzner-capi-rke2-demo-control-plane-wvhml:~# /var/lib/rancher/rke2/bin/kubectl  get nodes
NAME                                         STATUS   ROLES                       AGE    VERSION
hetzner-capi-rke2-demo-control-plane-wvhml   Ready    control-plane,etcd,master   6m8s   v1.24.6+rke2r1

It seems that the ExternalIP in the Machine object is populated correctly:

$ k get Machine hetzner-capi-rke2-demo-control-plane-zs76h -o yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Machine
metadata:
  annotations:
    controlplane.cluster.x-k8s.io/rke2-server-configuration: '{"disableComponents":{},"cni":"calico","etcd":{"backupConfig":{}}}'
  creationTimestamp: "2023-07-20T14:01:03Z"
  finalizers:
  - machine.cluster.x-k8s.io
  generation: 3
  labels:
    cluster.x-k8s.io/cluster-name: hetzner-capi-rke2-demo
    cluster.x-k8s.io/control-plane: ""
  name: hetzner-capi-rke2-demo-control-plane-zs76h
  namespace: default
  ownerReferences:
  - apiVersion: controlplane.cluster.x-k8s.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: RKE2ControlPlane
    name: hetzner-capi-rke2-demo-control-plane
    uid: 3a252df3-1dc7-492c-b6fc-dbdfa82154ec
  resourceVersion: "15555"
  uid: cbb181b8-ab14-4c69-b50c-ad4a7601d3ce
spec:
  bootstrap:
    configRef:
      apiVersion: bootstrap.cluster.x-k8s.io/v1alpha1
      kind: RKE2Config
      name: hetzner-capi-rke2-demo-control-plane-cp9bn
      namespace: default
      uid: d1a75f0c-9b1d-4c43-a003-1398d361b6d1
    dataSecretName: hetzner-capi-rke2-demo-control-plane-cp9bn
  clusterName: hetzner-capi-rke2-demo
  failureDomain: nbg1
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: HCloudMachine
    name: hetzner-capi-rke2-demo-control-plane-wvhml
    namespace: default
    uid: 74f3be66-56cb-457e-af5a-319e38e7ff09
  nodeDeletionTimeout: 10s
  nodeDrainTimeout: 2m0s
  providerID: hcloud://35073299
  version: v1.24.6
status:
  addresses:
  - address: 167.235.55.195
    type: ExternalIP
  - address: 2a01:4f8:c2c:f2c6::1
    type: ExternalIP
  bootstrapReady: true
  conditions:
  - lastTransitionTime: "2023-07-20T14:01:28Z"
    status: "True"
    type: Ready
  - lastTransitionTime: "2023-07-20T14:01:03Z"
    status: "True"
    type: BootstrapReady
  - lastTransitionTime: "2023-07-20T14:01:28Z"
    status: "True"
    type: InfrastructureReady
  - lastTransitionTime: "2023-07-20T14:02:37Z"
    reason: NodeProvisioning
    severity: Warning
    status: "False"
    type: NodeHealthy
  infrastructureReady: true
  lastUpdated: "2023-07-20T14:01:39Z"
  observedGeneration: 3
  phase: Provisioned

However, it is then missing from the RKE2ControlPlane object:

$ k get RKE2ControlPlane hetzner-capi-rke2-demo-control-plane -o yaml
apiVersion: controlplane.cluster.x-k8s.io/v1alpha1
kind: RKE2ControlPlane
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"controlplane.cluster.x-k8s.io/v1alpha1","kind":"RKE2ControlPlane","metadata":{"annotations":{},"name":"hetzner-capi-rke2-demo-control-plane","namespace":"default"},"spec":{"agentConfig":{"version":"v1.24.6+rke2r1"},"infrastructureRef":{"apiVersion":"infrastructure.cluster.x-k8s.io/v1beta1","kind":"HCloudMachineTemplate","name":"hetzner-capi-rke2-demo-control-plane"},"nodeDrainTimeout":"2m","replicas":1,"serverConfig":{"cni":"calico"}}}
  creationTimestamp: "2023-07-20T14:01:00Z"
  finalizers:
  - rke2.controleplane.cluster.x-k8s.io
  generation: 1
  labels:
    cluster.x-k8s.io/cluster-name: hetzner-capi-rke2-demo
  name: hetzner-capi-rke2-demo-control-plane
  namespace: default
  ownerReferences:
  - apiVersion: cluster.x-k8s.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: Cluster
    name: hetzner-capi-rke2-demo
    uid: 2dac05ba-0c65-46fa-9b5d-0a695513d2f2
  resourceVersion: "15860"
  uid: 3a252df3-1dc7-492c-b6fc-dbdfa82154ec
spec:
  agentConfig:
    version: v1.24.6+rke2r1
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: HCloudMachineTemplate
    name: hetzner-capi-rke2-demo-control-plane
  manifestsConfigMapReference: {}
  nodeDrainTimeout: 2m0s
  privateRegistriesConfig: {}
  replicas: 1
  serverConfig:
    cni: calico
    disableComponents: {}
    etcd:
      backupConfig: {}
status:
  conditions:
  - lastTransitionTime: "2023-07-20T14:03:50Z"
    reason: WaitingForRKE2Server
    severity: Info
    status: "False"
    type: Ready
  - lastTransitionTime: "2023-07-20T14:01:03Z"
    reason: WaitingForRKE2Server
    severity: Info
    status: "False"
    type: Available
  - lastTransitionTime: "2023-07-20T14:01:02Z"
    status: "True"
    type: CertificatesAvailable
  - lastTransitionTime: "2023-07-20T14:03:50Z"
    status: "True"
    type: MachinesReady
  - lastTransitionTime: "2023-07-20T14:03:50Z"
    status: "True"
    type: Resized
  initialized: true
  observedGeneration: 1
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1

Thanks in advance!

What did you expect to happen:

The RKE2ControlPlane should switch to the ready state.

How to reproduce it:

Clone the repository https://github.com/localleon/hetzner-clusterapi-rke2 and apply hetzner-capi-rke2-demo.yaml to your cluster.

Anything else you would like to add:

I would think this issue lies in this function of

func getIPAddress(machine clusterv1.Machine) (ip string, err error) {
	for _, address := range machine.Status.Addresses {
		switch address.Type {
		case clusterv1.MachineInternalIP:
			if address.Address != "" {
				return address.Address, nil
			}
		case clusterv1.MachineExternalIP:
			if address.Address != "" {
				ip = address.Address
			}
		}
	}

	if ip == "" {
		err = fmt.Errorf("no IP Address found for machine: %s", machine.Name)
	}

	return
}
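
To make the behavior of this logic concrete, here is a minimal, self-contained sketch that runs the same selection rules against the two ExternalIP entries from the Machine status shown earlier. The types `MachineAddress` and `MachineAddressType` and the function name `pickIPAddress` are local stand-ins introduced only for illustration (assumptions, not the provider's actual code or the real `clusterv1` types), so the example runs without any Cluster API dependency.

```go
package main

import "fmt"

// MachineAddressType and MachineAddress are hypothetical local stand-ins
// for the Cluster API address types, so this sketch has no dependencies.
type MachineAddressType string

const (
	MachineInternalIP MachineAddressType = "InternalIP"
	MachineExternalIP MachineAddressType = "ExternalIP"
)

type MachineAddress struct {
	Type    MachineAddressType
	Address string
}

// pickIPAddress mirrors the selection rules of the quoted function:
// the first non-empty InternalIP is returned immediately; otherwise the
// last non-empty ExternalIP seen is used; an error is returned only if
// neither kind of address is present.
func pickIPAddress(machineName string, addresses []MachineAddress) (ip string, err error) {
	for _, address := range addresses {
		switch address.Type {
		case MachineInternalIP:
			if address.Address != "" {
				return address.Address, nil
			}
		case MachineExternalIP:
			if address.Address != "" {
				ip = address.Address
			}
		}
	}

	if ip == "" {
		err = fmt.Errorf("no IP Address found for machine: %s", machineName)
	}

	return
}

func main() {
	// Addresses copied from the Machine status in this issue:
	// two ExternalIP entries and no InternalIP.
	addresses := []MachineAddress{
		{Type: MachineExternalIP, Address: "167.235.55.195"},
		{Type: MachineExternalIP, Address: "2a01:4f8:c2c:f2c6::1"},
	}

	ip, err := pickIPAddress("hetzner-capi-rke2-demo-control-plane-zs76h", addresses)
	fmt.Println(ip, err) // prints "2a01:4f8:c2c:f2c6::1 <nil>"
}
```

With only ExternalIP entries present, these rules return the last one in the list (here the IPv6 address) rather than an error, so the "no IP Address available" message may have come from different logic in the older release I had installed.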

Environment:

providers:
  - name: "rke2"
    url: "https://github.com/rancher-sandbox/cluster-api-provider-rke2/releases/v0.1.0-alpha.1/bootstrap-components.yaml"
    type: "BootstrapProvider"
  - name: "rke2"
    url: "https://github.com/rancher-sandbox/cluster-api-provider-rke2/releases/v0.1.0-alpha.1/control-plane-components.yaml"
    type: "ControlPlaneProvider"
  - name: "docker"
    url: "https://github.com/belgaied2/cluster-api/releases/v1.3.3-cabpr-fix/infrastructure-components.yaml"
    type: "InfrastructureProvider"
  • OS (e.g. from /etc/os-release): Hetzner Cloud Ubuntu 22.04
@localleon localleon added kind/bug Something isn't working needs-priority Indicates an issue or PR needs a priority assigning to it needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 20, 2023
@localleon
Contributor Author

This seems to be fixed in the newest version; I was just confused about why an old version of the CAPI provider was listed in the README.
