[kubeadm control plane] upgrade: etcd CA was regenerated #2455

Closed
sethp-nr opened this issue Feb 26, 2020 · 6 comments
Labels
area/control-plane Issues or PRs related to control-plane lifecycle management kind/bug Categorizes issue or PR as related to a bug.

@sethp-nr
Contributor

sethp-nr commented Feb 26, 2020

What steps did you take and what happened:

After the first of three control plane machines was upgraded from v1.15.9 to v1.16.6, I started getting etcd health check failures (see #2454 and #2451). After a while, it became clear that the cert & private key stored in the management cluster's Secret had diverged from what was on disk on the control plane nodes.

I'm not sure what caused the secret to be re-generated, but it seemed worth noting.
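One way to confirm the kind of divergence described above is to compare certificate fingerprints. The following is a minimal sketch, not anything taken from cluster-api: `certFingerprint` and `newSelfSignedCA` are hypothetical helpers, and the two throwaway CAs generated in `main` stand in for the PEM data you would actually read from the Secret and from the node's disk (on a kubeadm node the etcd CA is typically at `/etc/kubernetes/pki/etcd/ca.crt`, which is an assumption about the environment here).

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/hex"
	"encoding/pem"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// certFingerprint returns the hex SHA-256 digest of the first certificate
// found in a PEM bundle. (Hypothetical helper for this sketch.)
func certFingerprint(pemBytes []byte) (string, error) {
	block, _ := pem.Decode(pemBytes)
	if block == nil || block.Type != "CERTIFICATE" {
		return "", errors.New("no PEM certificate found")
	}
	// Parse only to reject data that is not a valid certificate.
	if _, err := x509.ParseCertificate(block.Bytes); err != nil {
		return "", err
	}
	sum := sha256.Sum256(block.Bytes)
	return hex.EncodeToString(sum[:]), nil
}

// newSelfSignedCA generates a throwaway CA certificate so the sketch can run
// without a cluster. (Hypothetical helper; stands in for real cert data.)
func newSelfSignedCA(commonName string) ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: commonName},
		NotBefore:             time.Now().Add(-time.Minute),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), nil
}

func main() {
	// Stand-ins: "onDisk" for the node's CA file, "inSecret" for the Secret's cert.
	onDisk, err := newSelfSignedCA("etcd-ca")
	if err != nil {
		panic(err)
	}
	inSecret, err := newSelfSignedCA("etcd-ca")
	if err != nil {
		panic(err)
	}
	diskFP, err := certFingerprint(onDisk)
	if err != nil {
		panic(err)
	}
	secretFP, err := certFingerprint(inSecret)
	if err != nil {
		panic(err)
	}
	fmt.Println("on disk:  ", diskFP)
	fmt.Println("in secret:", secretFP)
	fmt.Println("match:    ", diskFP == secretFP)
}
```

Two CAs generated separately share a subject name but not a key, so their fingerprints differ. A regenerated CA would show up the same way in this comparison.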

Anything else you would like to add:

I was running my management cluster with tilt up against a local kind cluster, which on my machine has a side effect of... let's call it "timing issue detection." Everything slows way down in my userland and inside the controllers, and there are not-infrequent crashes in the controller. I recall that the kubeadm control plane controller specifically was restarted at about the time the etcd certs changed.

Environment:

  • Cluster-api version: master
  • Minikube/KIND version: kind v0.7.0 go1.13.6 darwin/amd64
  • Kubernetes version: (use kubectl version): mixed
  • OS (e.g. from /etc/os-release): ubuntu

/kind bug

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Feb 26, 2020
@sethp-nr sethp-nr changed the title KubeadmControlPlane upgrade: etcd CA was regenerated [kubeadm control plane] upgrade: etcd CA was regenerated Feb 26, 2020
@detiber
Member

detiber commented Feb 26, 2020

I wonder if there is an odd race condition that could be taking place in the way we are doing LookupOrGenerate for the secrets, but at a quick glance it appears that it should not overwrite them if they are already created...
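To make the lookup-or-generate idea concrete, here is a minimal sketch of the pattern using an in-memory map. It is not the cluster-api `LookupOrGenerate` code: `secretStore`, `lookupOrGenerate`, `remove`, and `randomKey` are all hypothetical names invented for illustration. It shows that an existing value is never overwritten, but also that a value which has been deleted by something else gets silently regenerated on the next call.

```go
package main

import (
	"bytes"
	"crypto/rand"
	"fmt"
	"sync"
)

// secretStore is a hypothetical in-memory stand-in for the management
// cluster's Secret storage.
type secretStore struct {
	mu      sync.Mutex
	secrets map[string][]byte
}

func newSecretStore() *secretStore {
	return &secretStore{secrets: make(map[string][]byte)}
}

// lookupOrGenerate returns the stored value for name if one exists and only
// calls generate when nothing is stored. The second return value reports
// whether a new value was created. Holding the lock across the check and the
// write keeps the check-then-create step atomic within this process.
func (s *secretStore) lookupOrGenerate(name string, generate func() ([]byte, error)) ([]byte, bool, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if v, ok := s.secrets[name]; ok {
		return v, false, nil
	}
	v, err := generate()
	if err != nil {
		return nil, false, err
	}
	s.secrets[name] = v
	return v, true, nil
}

// remove deletes a stored value, simulating an external deletion.
func (s *secretStore) remove(name string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.secrets, name)
}

// randomKey produces 16 random bytes as placeholder "key material".
func randomKey() ([]byte, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return nil, err
	}
	return b, nil
}

func main() {
	store := newSecretStore()

	first, created, _ := store.lookupOrGenerate("test-etcd", randomKey)
	fmt.Println("first call created:", created)

	second, created, _ := store.lookupOrGenerate("test-etcd", randomKey)
	fmt.Println("second call created:", created, "same value:", bytes.Equal(first, second))

	// If something else deletes the entry, the next call generates a new one.
	store.remove("test-etcd")
	third, created, _ := store.lookupOrGenerate("test-etcd", randomKey)
	fmt.Println("after delete created:", created, "same value:", bytes.Equal(first, third))
}
```

In this sketch the lookup path is safe by itself; the regenerated value only appears once the stored entry has gone missing.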

@vincepri
Member

/milestone v0.3.0

@k8s-ci-robot k8s-ci-robot added this to the v0.3.0 milestone Feb 26, 2020
@sethp-nr
Contributor Author

Yeah, that's what was so weird about it – the whole control plane bootstrapped with 3 nodes, everything was great, and then as soon as controlplane-0 finishes deleting, the etcd certs change.

The only other guess I have is that controlplane-0's kubeadm config was different than the others (it was init, not join) and since it was up first there might have been some weird owner references – I haven't gotten a chance to investigate yet, though.

@detiber
Member

detiber commented Feb 26, 2020

One thing that probably deserves a check... What resource owns the secrets? If it is a Machine, that would explain the bug.

@sethp-nr
Contributor Author

Ah ha, sure enough:

  name: test-etcd
  ...
  ownerReferences:
  - apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
    blockOwnerDeletion: true
    controller: true
    kind: KubeadmConfig
    name: controlplane-0

But when I create a control plane using a KCP:

    ownerReferences:
    - apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
      blockOwnerDeletion: true
      controller: true
      kind: KubeadmControlPlane
      name: sp2a-controlplane
      ...

Which means this is an omission from the adoption bits – closing in favor of #2214.
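The difference between the two owner references above can be checked mechanically. This is a rough sketch under stated assumptions: `ownerRef` is a simplified stand-in that mirrors only the YAML fields shown (it is not the `metav1.OwnerReference` type), and `perMachineKinds` and `atRiskOnRollout` are hypothetical names. The premise, taken from the discussion in this thread, is that a Secret controlled by a per-machine object can be garbage collected when that machine's objects are deleted during an upgrade.

```go
package main

import "fmt"

// ownerRef mirrors only the fields shown in the YAML snippets; it is a
// simplified stand-in for illustration.
type ownerRef struct {
	APIVersion string
	Kind       string
	Name       string
	Controller bool
}

// perMachineKinds lists owner kinds that go away when a single control plane
// machine is replaced. The set is an assumption made for this sketch.
var perMachineKinds = map[string]bool{
	"KubeadmConfig": true,
	"Machine":       true,
}

// controllerKind returns the kind of the controlling owner, if there is one.
func controllerKind(refs []ownerRef) (string, bool) {
	for _, r := range refs {
		if r.Controller {
			return r.Kind, true
		}
	}
	return "", false
}

// atRiskOnRollout reports whether the controlling owner is a per-machine
// object, meaning the owned Secret may disappear along with that machine.
func atRiskOnRollout(refs []ownerRef) bool {
	kind, ok := controllerKind(refs)
	return ok && perMachineKinds[kind]
}

func main() {
	// Owner reference found on the test-etcd Secret in this issue.
	adopted := []ownerRef{{
		APIVersion: "bootstrap.cluster.x-k8s.io/v1alpha3",
		Kind:       "KubeadmConfig",
		Name:       "controlplane-0",
		Controller: true,
	}}
	// Owner reference seen when the control plane is created by a KCP.
	viaKCP := []ownerRef{{
		APIVersion: "controlplane.cluster.x-k8s.io/v1alpha3",
		Kind:       "KubeadmControlPlane",
		Name:       "sp2a-controlplane",
		Controller: true,
	}}

	fmt.Println("owned by KubeadmConfig, at risk:", atRiskOnRollout(adopted))
	fmt.Println("owned by KubeadmControlPlane, at risk:", atRiskOnRollout(viaKCP))
}
```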

/close

@k8s-ci-robot
Contributor

@sethp-nr: Closing this issue.

@vincepri vincepri added the area/control-plane Issues or PRs related to control-plane lifecycle management label Mar 23, 2020
4 participants