BUG: sealos cert generate wrong DNS altnames for HA etcd clusters #3887

Closed
dinoallo opened this issue Sep 11, 2023 · 1 comment · Fixed by #3891
Labels
kind/bug Something isn't working

dinoallo commented Sep 11, 2023

Sealos Version

v4.3.3 / main

How to reproduce the bug?

Steps to reproduce:

  1. Create a cluster with multiple master nodes.
  2. Run sealos cert on one master node.
  3. Log in to any other master node.
  4. Run openssl x509 -text -in /etc/kubernetes/pki/etcd/peer.crt -noout | less. The alternative names in the certificate, specifically the hostname and the IP, are those of the node where sealos cert was run, not those of the node you are logged in to.
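
The check in step 4 can also be scripted. The following is a minimal sketch, not part of sealos: it assumes the usual layout of `openssl x509 -text` output, where the entries follow the "X509v3 Subject Alternative Name" line as a comma-separated list of `DNS:` and `IP Address:` items. The function names, the sample text, and the hostnames and IPs in it are made up for illustration.

```python
# Hypothetical helper: checks whether a certificate's alternative names
# belong to the node it is installed on. Input is the text printed by
# `openssl x509 -text -noout`.

def parse_sans(openssl_text):
    """Return the DNS and IP alternative names found in the openssl text dump."""
    sans = {"dns": [], "ip": []}
    lines = openssl_text.splitlines()
    for i, line in enumerate(lines):
        # The entries sit on the line right after the extension header.
        if "X509v3 Subject Alternative Name" in line and i + 1 < len(lines):
            for entry in lines[i + 1].split(","):
                entry = entry.strip()
                if entry.startswith("DNS:"):
                    sans["dns"].append(entry[len("DNS:"):])
                elif entry.startswith("IP Address:"):
                    sans["ip"].append(entry[len("IP Address:"):])
            break
    return sans


def cert_matches_node(openssl_text, hostname, ip):
    """True if both the node's hostname and IP appear in the certificate."""
    sans = parse_sans(openssl_text)
    return hostname in sans["dns"] and ip in sans["ip"]


# Illustrative sample only; real output contains many more fields.
SAMPLE = """\
        X509v3 extensions:
            X509v3 Subject Alternative Name:
                DNS:master-1, DNS:localhost, IP Address:192.168.0.11, IP Address:127.0.0.1
"""
```

With this sample, a node named master-2 at 192.168.0.12 would fail the check, which is the symptom described in this report: the certificate carries the names of the node that ran the command.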

What is the expected behavior?

The etcd certificates on each node are generated with alternative names that contain that node's own hostname and IP.

What do you see instead?

On every node, the alternative names in the etcd certificates, specifically the hostname and the IP, are those of the node that ran the command.

Operating environment

- Sealos version: v4.3.3 / main
- Docker version:
- Kubernetes version:
- Operating system:
- Runtime environment:
- Cluster size:
- Additional information:

Additional information

When this happens, the first etcd service complains that the other etcd services have wrong DNS altnames in their certificates, and as a result this etcd service restarts frequently. The cluster can be recovered by re-issuing the wrong etcd certificates (peer.crt, server.crt) on each node.

Steps to recover the cluster (do the following on each affected node):

  1. Back up the old certificates.
  2. Remove the wrong etcd certificates, i.e. peer.crt and server.crt.
  3. Create a ClusterConfiguration like this:

     # config.yaml
     apiVersion: "kubeadm.k8s.io/v1beta3"
     kind: ClusterConfiguration
     etcd:
         local:
             serverCertSANs:
             - "<your-node-ip>"
             - "<your-node-hostname>"
             peerCertSANs:
             - "<your-node-ip>"
             - "<your-node-hostname>"

  4. Run kubeadm init phase certs etcd-peer --config config.yaml and kubeadm init phase certs etcd-server --config config.yaml to generate new certificates.
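
Since the recovery has to be repeated on every affected node, the per-node config.yaml can be generated instead of edited by hand. This is a rough sketch under assumptions: `render_config` and `recovery_commands` are hypothetical helper names, and the template simply mirrors the ClusterConfiguration shown above. It only builds the file content and the two kubeadm command lines from this issue; it does not back up or remove any certificates and does not run anything.

```python
import ipaddress

# Mirrors the config.yaml from the recovery steps; only the node IP and
# hostname vary from node to node.
CONFIG_TEMPLATE = """\
apiVersion: "kubeadm.k8s.io/v1beta3"
kind: ClusterConfiguration
etcd:
    local:
        serverCertSANs:
        - "{ip}"
        - "{hostname}"
        peerCertSANs:
        - "{ip}"
        - "{hostname}"
"""


def render_config(ip, hostname):
    """Return config.yaml content for one node; raises ValueError on a bad IP."""
    ipaddress.ip_address(ip)  # validation only, the result is not used
    return CONFIG_TEMPLATE.format(ip=ip, hostname=hostname)


def recovery_commands(config_path="config.yaml"):
    """Return the two kubeadm invocations from the recovery steps as argv lists."""
    return [
        ["kubeadm", "init", "phase", "certs", "etcd-peer", "--config", config_path],
        ["kubeadm", "init", "phase", "certs", "etcd-server", "--config", config_path],
    ]
```

For example, `render_config("192.168.0.12", "master-2")` (made-up values) yields the file for that node, and each list returned by `recovery_commands()` could be passed to `subprocess.run` on the node itself once the old certificates have been backed up and removed.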
@dinoallo dinoallo added the kind/bug Something isn't working label Sep 11, 2023
@dinoallo

cross reference: #3708

@cuisongliu cuisongliu linked a pull request Sep 11, 2023 that will close this issue
cuisongliu pushed a commit to cuisongliu/sealos that referenced this issue Sep 29, 2023
Signed-off-by: cuisongliu <cuisongliu@qq.com>

labring#3708 labring#3887
cuisongliu added a commit that referenced this issue Sep 29, 2023
* fix: dnsDomain does not take effect in kubelet (#3834) (#3835)

Signed-off-by: yangxg <yangxggo@163.com>
Co-authored-by: yangxg <yangxggo@163.com>
(cherry picked from commit c60b2fd)

* fix: ignore http server close error (#3854) (#3857)

(cherry picked from commit 2d4d78b)

* fix: skip same path (#3898) (#3899)

Co-authored-by: 榴莲榴莲 <78798447@qq.com>
(cherry picked from commit a256283)

* fix: disable scp checksum by default (#3913) (#3919)

Co-authored-by: fengxsong <fengxsong@outlook.com>
(cherry picked from commit 96cb79d)

* feat: support timeout setting for lvscare http prober (#3901) (#3905)

Co-authored-by: fengxsong <fengxsong@outlook.com>
(cherry picked from commit 6bd5c0a)

* feature: kubefile CMD support ENV variable format (#3921) (#3942)

Co-authored-by: Zihan Li <eden.zh.li@outlook.com>
(cherry picked from commit 4b5f3fe)

* delete cr build for buildah (#3953) (#3954)

Co-authored-by: yy <56745951+lingdie@users.noreply.github.com>
(cherry picked from commit 865803c)

* delete: controller part and useless service. (#3950)

* delete controllers and useless service.

* delete buildah image cr part.

* delete ci.

* roll back

(cherry picked from commit 076c7c7)
Signed-off-by: cuisongliu <cuisongliu@qq.com>

* fix: using extra valid status codes when response status code greater than 400 (#3986) (#3988)

Co-authored-by: fengxsong <fengxsong@outlook.com>
(cherry picked from commit 7be765f)

* feature(main): add lvscare gomod (#3995)

Signed-off-by: cuisongliu <cuisongliu@qq.com>
(cherry picked from commit 050d70b)

* fix(main): sync cert for cert cmd

Signed-off-by: cuisongliu <cuisongliu@qq.com>

#3708 #3887

---------

Co-authored-by: sealos-ci-robot <109538726+sealos-ci-robot@users.noreply.github.com>
Co-authored-by: yy <56745951+lingdie@users.noreply.github.com>