kubeadm stacked etcd #69486

fabriziopandini · 2018-10-06T06:43:02Z

What this PR does / why we need it:
kubeadm now automatically creates a new stacked etcd member when joining a new control plane node (does not apply to external etcd).

Which issue(s) this PR fixes:
Fixes kubernetes/kubeadm#1123

Special notes for your reviewer:
IMO two points deserve more attention:

- The local/stacked etcd client now uses the IP defined by the API server advertise address
- How the joining node discovers the list of endpoints for the existing etcd members

I will keep this in WIP until there is agreement on the above points

Release note:

kubeadm now automatically creates a new stacked etcd member when joining a new control plane node (does not apply to external etcd).

/sig cluster-lifecycle
/kind feature

/cc @timothysc
/cc @chuckha
/cc @detiber
@kubernetes/sig-cluster-lifecycle-pr-reviews

neolit123

thanks for the PR @fabriziopandini . added a couple of comments.

neolit123 · 2018-10-07T02:52:01Z

cmd/kubeadm/app/phases/certs/pkiutil/pki_helpers.go

+	// advertise address
+	advertiseAddress := net.ParseIP(cfg.APIEndpoint.AdvertiseAddress)
+	if advertiseAddress == nil {
+		return nil, fmt.Errorf("error parsing APIEndpoint AdvertiseAddress %v: is not a valid textual representation of an IP address", cfg.APIEndpoint.AdvertiseAddress)


probably better to change %v: to %q?
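To illustrate the suggestion, here is a minimal sketch of the same parse check with `%q` in the error message. The helper name `parseAdvertiseAddress` is hypothetical and not part of the PR; only `net.ParseIP` and `fmt.Errorf` are real APIs. The point is that `%q` quotes the value, so an empty or whitespace-only address stays visible in the message.

```go
package main

import (
	"fmt"
	"net"
)

// parseAdvertiseAddress is a hypothetical helper mirroring the check in the diff.
// It quotes the offending input with %q so empty values are still visible.
func parseAdvertiseAddress(addr string) (net.IP, error) {
	ip := net.ParseIP(addr)
	if ip == nil {
		return nil, fmt.Errorf("error parsing APIEndpoint AdvertiseAddress %q: is not a valid textual representation of an IP address", addr)
	}
	return ip, nil
}

func main() {
	// With %v an empty address would simply vanish from the message;
	// with %q it is rendered as "".
	_, err := parseAdvertiseAddress("")
	fmt.Println(err)
}
```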

neolit123 · 2018-10-07T02:54:28Z

cmd/kubeadm/app/phases/etcd/local.go

+	}
+
+	// notifies the other members of the etcd cluster about the joining member
+	etcdPeerAddress := fmt.Sprintf("https://%s:2380", cfg.APIEndpoint.AdvertiseAddress)


hm, did we investigate if this approach is safe?
what happens if the port is already taken on that endpoint?

That's the default peer port and "shouldn't collide" but we should definitely add a check here.

added check

neolit123 · 2018-10-07T02:55:39Z

cmd/kubeadm/app/phases/etcd/local.go

+	// notifies the other members of the etcd cluster about the joining member
+	etcdPeerAddress := fmt.Sprintf("https://%s:2380", cfg.APIEndpoint.AdvertiseAddress)
+
+	glog.V(1).Infof("Adding etcd Member %s", etcdPeerAddress)


Member -> member:

neolit123 · 2018-10-07T02:56:21Z

cmd/kubeadm/app/phases/etcd/local.go

+	}
+	glog.V(1).Infof("Updated etcd member list %v", initialCluster)
+
+	glog.V(1).Infoln("creating local etcd static pod manifest file")


possibly "creating" should be capitalized for consistency?

There was a recent issue about golang 1.11 as well needing to use Infof vs. Infoln.

using Infoln

neolit123 · 2018-10-07T02:57:15Z

cmd/kubeadm/app/phases/etcd/local.go

+	}
+
+	if len(initialCluster) == 0 {
+		defaultArguments["initial-cluster"] = fmt.Sprintf("%s=https://%s:2380", cfg.GetNodeName(), cfg.APIEndpoint.AdvertiseAddress)


should we make such ports into constants?

Yes we should add them to the default consts file.

done in a separate PR to keep the scope of this PR as small as possible
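As a sketch of what moving the ports into constants might look like: the constant and function names below are illustrative assumptions, not the identifiers used in the follow-up PR. The values 2379 and 2380 are etcd's default client and peer ports.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

const (
	// Hypothetical constant names; the values are etcd's default ports.
	etcdListenClientPort = 2379
	etcdListenPeerPort   = 2380
)

// peerURL builds the peer URL for a host. net.JoinHostPort is used so that
// IPv6 literals get the required square brackets.
func peerURL(host string) string {
	return "https://" + net.JoinHostPort(host, strconv.Itoa(etcdListenPeerPort))
}

// initialClusterEntry renders a single "name=peerURL" entry for the
// initial-cluster argument.
func initialClusterEntry(nodeName, host string) string {
	return fmt.Sprintf("%s=%s", nodeName, peerURL(host))
}

func main() {
	fmt.Println(initialClusterEntry("cp-1", "10.0.0.1"))
	fmt.Println("client port:", etcdListenClientPort)
}
```

Using `net.JoinHostPort` instead of `fmt.Sprintf("https://%s:2380", ...)` is a design choice of this sketch, not something the PR does.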

neolit123 · 2018-10-07T03:10:26Z

cmd/kubeadm/app/util/etcd/etcd.go

+
+	ret := map[string]string{}
+	for _, m := range resp.Members {
+		// fixes the entry for of the joining member (that doesn't have a name set in the initialCluster returned by etcd)


"for of" -> "for" or "of" only.

neolit123 · 2018-10-07T03:14:56Z

cmd/kubeadm/app/util/etcd/etcd.go

+func (c Client) AddMember(name string, peerAddrs string) (map[string]string, error) {
+	cli, err := clientv3.New(clientv3.Config{
+		Endpoints:   c.Endpoints,
+		DialTimeout: 5 * time.Second,


i wonder if the connection times increase with the number of etcd pods.

I would up the timeout for new connection. Default client inside the api-server is 20 seconds.

increased timeout to 20 seconds

chuckha

Is the expectation that we've already copied over the etcd ca key/cert and generated the correct certificates?

chuckha · 2018-10-08T15:33:37Z

cmd/kubeadm/app/phases/etcd/local.go

@@ -36,10 +39,47 @@ const (
 )

 // CreateLocalEtcdStaticPodManifestFile will write local etcd static pod manifest file.
+// This function is used by init (when there the etcd cluster is empty) or by kubeadm


"(when there the etcd cluster is empty)" => "(when the etcd cluster is empty)"

chuckha · 2018-10-08T15:51:08Z

cmd/kubeadm/app/util/etcd/etcd.go

+}
+
+// AddMember notifies an existing etcd cluster that a new member is joining
+func (c Client) AddMember(name string, peerAddrs string) (map[string]string, error) {


I almost want this map[string]string to be a struct, something like etcdClusterConfiguration or similar. Do you think that would add anything here?

I don't feel strongly here b/c we're re-wrapping an etcd member struct. I would change it to [string]IP at a minimum though.

Also testability of this function will require using some of the etcd utils I built eons ago under apiserver/storage

done using a struct
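A minimal sketch of the struct-based shape discussed here, under stated assumptions: the `Member` type and both helper functions are hypothetical and do not claim to match what was merged. It also shows the name fix-up mentioned earlier in the review (the joining member has no name in the list returned by etcd) and how the list can be rendered into etcd's `name=url,name=url` initial-cluster format.

```go
package main

import (
	"fmt"
	"strings"
)

// Member is a hypothetical stand-in for the struct suggested in the review.
type Member struct {
	Name    string
	PeerURL string
}

// nameJoiningMember returns a copy of members in which the unnamed entry
// matching peerURL is given the supplied name.
func nameJoiningMember(members []Member, name, peerURL string) []Member {
	out := make([]Member, len(members))
	copy(out, members)
	for i := range out {
		if out[i].Name == "" && out[i].PeerURL == peerURL {
			out[i].Name = name
		}
	}
	return out
}

// initialCluster renders members as "name=url,name=url".
func initialCluster(members []Member) string {
	parts := make([]string, 0, len(members))
	for _, m := range members {
		parts = append(parts, fmt.Sprintf("%s=%s", m.Name, m.PeerURL))
	}
	return strings.Join(parts, ",")
}

func main() {
	members := []Member{
		{Name: "cp1", PeerURL: "https://10.0.0.1:2380"},
		{Name: "", PeerURL: "https://10.0.0.2:2380"}, // the joining member
	}
	members = nameJoiningMember(members, "cp2", "https://10.0.0.2:2380")
	fmt.Println(initialCluster(members))
}
```

Compared with `map[string]string`, a slice of structs keeps the member order stable and makes it explicit which string is the name and which is the peer URL.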

chuckha · 2018-10-08T16:04:47Z

cmd/kubeadm/app/util/etcd/etcd.go

+		return nil, err
+	}
+
+	// Note for reviewers: I'm not sure this is the best method for getting the endpoint


I think this is a good approach with one minor concern: there is a race condition that the PodIP may change after the PodIP lookup and before our client access, but I'm ok living with that possibility.

I wonder if there is a clean way to solve this by managing etcd as a statefulset. The other thought I had was adding a Service object to the static manifest and managing that by hand (by kubeadm). The goal of these two ideas is to provide a stable DNS name instead of a PodIP lookup. Either way, those types of changes would be way out of scope for this PR.

So the ClusterStatus should have the endpoints.
Or as discussed on the call put the meta-data in an annotation for the pod to make it easier to extract.

+1 to using ClusterStatus or a pre-set annotation to use as a starting point.

After we have the initial endpoint list, we should use that to query the etcd cluster itself for the list of endpoints.

Potentially, we should also raise an error if the queried endpoints do not match the expected endpoints, or if the cluster is not in a healthy state to start.

done reading from cluster status + calling sync to get the real list of endpoints from etcd
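The validation idea raised in this thread (raise an error if the queried endpoints do not match the expected ones) can be sketched as a plain set comparison. This is only an illustration: `endpointMismatch` is a hypothetical helper, and the two lists stand in for "endpoints read from the cluster status" and "endpoints reported by etcd after a sync".

```go
package main

import (
	"fmt"
	"sort"
)

// endpointMismatch compares the expected endpoints with the ones actually
// reported and returns what is missing and what is unexpected, both sorted.
func endpointMismatch(expected, actual []string) (missing, unexpected []string) {
	exp := make(map[string]bool, len(expected))
	for _, e := range expected {
		exp[e] = true
	}
	act := make(map[string]bool, len(actual))
	for _, a := range actual {
		act[a] = true
	}
	for e := range exp {
		if !act[e] {
			missing = append(missing, e)
		}
	}
	for a := range act {
		if !exp[a] {
			unexpected = append(unexpected, a)
		}
	}
	sort.Strings(missing)
	sort.Strings(unexpected)
	return missing, unexpected
}

func main() {
	fromClusterStatus := []string{"https://10.0.0.1:2379", "https://10.0.0.2:2379"}
	fromEtcd := []string{"https://10.0.0.1:2379", "https://10.0.0.3:2379"}

	missing, unexpected := endpointMismatch(fromClusterStatus, fromEtcd)
	if len(missing) > 0 || len(unexpected) > 0 {
		fmt.Printf("endpoint mismatch: missing=%v unexpected=%v\n", missing, unexpected)
	}
}
```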

timothysc

Bunch of comments, plus we should highlight that folks will need to do this on odd numbers for stacked deploys.
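The odd-number recommendation follows from majority quorum arithmetic: a cluster of n members needs n/2 + 1 (integer division) members to agree, so going from 3 to 4 members raises the quorum without tolerating any additional failure. A small sketch, with hypothetical function names:

```go
package main

import "fmt"

// quorum returns the majority needed for a cluster of the given size.
func quorum(members int) int {
	return members/2 + 1
}

// faultTolerance returns how many members can fail while a quorum remains.
func faultTolerance(members int) int {
	return members - quorum(members)
}

func main() {
	for n := 1; n <= 5; n++ {
		fmt.Printf("members=%d quorum=%d tolerated failures=%d\n", n, quorum(n), faultTolerance(n))
	}
}
```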

timothysc · 2018-10-10T19:10:35Z

cmd/kubeadm/app/phases/certs/pkiutil/pki_helpers.go

@@ -334,8 +338,8 @@ func GetEtcdPeerAltNames(cfg *kubeadmapi.InitConfiguration) (*certutil.AltNames,

 	// create AltNames with defaults DNSNames/IPs
 	altNames := &certutil.AltNames{
-		DNSNames: []string{cfg.NodeRegistration.Name, "localhost"},
-		IPs:      []net.IP{advertiseAddress, net.IPv4(127, 0, 0, 1), net.IPv6loopback},
+		DNSNames: []string{cfg.NodeRegistration.Name},


Why are you removing loopback from the SAN?

I'd have to dig through history but I remember us adding it on purpose.

sounds strange, but restored

timothysc · 2018-10-10T19:15:22Z

cmd/kubeadm/app/phases/etcd/local.go

+	}
+
+	// notifies the other members of the etcd cluster about the joining member
+	etcdPeerAddress := fmt.Sprintf("https://%s:2380", cfg.APIEndpoint.AdvertiseAddress)


That's the default peer port and "shouldn't collide" but we should definitely add a check here.

timothysc · 2018-10-10T19:16:38Z

cmd/kubeadm/app/phases/etcd/local.go

+	if err != nil {
+		return err
+	}
+	glog.V(1).Infof("Updated etcd member list %v", initialCluster)


We should do a local client member check, to verify it's correct, and to list the current members in the log line.

timothysc · 2018-10-10T19:17:11Z

cmd/kubeadm/app/phases/etcd/local.go

+	}
+	glog.V(1).Infof("Updated etcd member list %v", initialCluster)
+
+	glog.V(1).Infoln("creating local etcd static pod manifest file")


There was a recent issue about golang 1.11 as well needing to use Infof vs. Infoln.

timothysc · 2018-10-10T19:19:00Z

cmd/kubeadm/app/phases/etcd/local.go

+	}
+
+	if len(initialCluster) == 0 {
+		defaultArguments["initial-cluster"] = fmt.Sprintf("%s=https://%s:2380", cfg.GetNodeName(), cfg.APIEndpoint.AdvertiseAddress)


Yes we should add them to the default consts file.

timothysc · 2018-10-10T19:22:46Z

cmd/kubeadm/app/util/etcd/etcd.go

+		return nil, err
+	}
+
+	// Note for reviewers: I'm not sure this is the best method for getting the endpoint


So the ClusterStatus should have the endpoints.
Or as discussed on the call put the meta-data in an annotation for the pod to make it easier to extract.

timothysc · 2018-10-10T19:25:03Z

cmd/kubeadm/app/util/etcd/etcd.go

+func (c Client) AddMember(name string, peerAddrs string) (map[string]string, error) {
+	cli, err := clientv3.New(clientv3.Config{
+		Endpoints:   c.Endpoints,
+		DialTimeout: 5 * time.Second,


I would up the timeout for new connection. Default client inside the api-server is 20 seconds.

timothysc · 2018-10-10T19:35:58Z

cmd/kubeadm/app/util/etcd/etcd.go

+}
+
+// AddMember notifies an existing etcd cluster that a new member is joining
+func (c Client) AddMember(name string, peerAddrs string) (map[string]string, error) {


I don't feel strongly here b/c we're re-wrapping an etcd member struct. I would change it to [string]IP at a minimum though.

timothysc · 2018-10-10T19:37:09Z

cmd/kubeadm/app/util/etcd/etcd.go

+}
+
+// AddMember notifies an existing etcd cluster that a new member is joining
+func (c Client) AddMember(name string, peerAddrs string) (map[string]string, error) {


Also testability of this function will require using some of the etcd utils I built eons ago under apiserver/storage

timothysc · 2018-10-10T19:40:13Z

/assign @timothysc @detiber

detiber · 2018-10-11T13:56:31Z

cmd/kubeadm/app/phases/etcd/local.go

+	etcdPeerAddress := fmt.Sprintf("https://%s:2380", cfg.APIEndpoint.AdvertiseAddress)
+
+	glog.V(1).Infof("Adding etcd Member %s", etcdPeerAddress)
+	initialCluster, err := etcdClient.AddMember(cfg.NodeRegistration.Name, etcdPeerAddress)


Is this method used during upgrade? If so, it seems odd to call 'AddMember()' for an instance that is already a member of the etcd cluster.

No, this is used only when adding a new control plane instance

detiber · 2018-10-11T14:10:42Z

cmd/kubeadm/app/util/etcd/etcd.go

+		return nil, err
+	}
+
+	// Note for reviewers: I'm not sure this is the best method for getting the endpoint


+1 to using ClusterStatus or a pre-set annotation to use as a starting point.

After we have the initial endpoint list, we should use that to query the etcd cluster itself for the list of endpoints.

detiber · 2018-10-11T14:11:25Z

cmd/kubeadm/app/util/etcd/etcd.go

+		return nil, err
+	}
+
+	// Note for reviewers: I'm not sure this is the best method for getting the endpoint


Potentially, we should also raise an error if the queried endpoints do not match the expected endpoints, or if the cluster is not in a healthy state to start.

fabriziopandini · 2018-10-21T09:59:45Z

@neolit123 @chuckha @timothysc @detiber Thanks for the valuable feedback!

Now:

- The list of etcd members is initially discovered from the cluster status, and then I query etcd to get the real list of members.
- Upgrades now also work with stacked etcd.

The last open point to be addressed before removing WIP is the use of the API server advertise address, which can lead to problems if the user chooses an advertise address that doesn't correspond to any IP address on the machine.

timothysc · 2018-10-24T01:06:48Z

@fabriziopandini Is this ready to go? If so can you remove the WIPs from this and other PRs.

fabriziopandini · 2018-10-24T15:19:22Z

@timothysc the last pending point is the discussion about using the API server advertise address for etcd

timothysc · 2018-10-24T18:37:46Z

@fabriziopandini ^ want to update now?

/lgtm
/approve

k8s-ci-robot · 2018-10-24T18:37:55Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fabriziopandini, timothysc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~cmd/kubeadm/OWNERS~~ [fabriziopandini,timothysc]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

fabriziopandini · 2018-10-26T15:00:15Z

@timothysc this is ready to go for me now
If we can get some help testing this before the end of the cycle, that would be great...

fabriziopandini · 2018-10-26T15:58:43Z

/test pull-kubernetes-e2e-kops-aws

fejta-bot · 2018-10-26T19:19:07Z

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

fabriziopandini · 2018-10-26T19:44:50Z

/lgtm cancel

neolit123 · 2018-10-27T16:27:31Z

/lgtm

k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Oct 6, 2018

k8s-ci-robot requested review from chuckha, detiber and timothysc October 6, 2018 06:43

neolit123 reviewed Oct 7, 2018

chuckha reviewed Oct 8, 2018

timothysc suggested changes Oct 10, 2018

k8s-ci-robot assigned detiber and timothysc Oct 10, 2018

detiber reviewed Oct 11, 2018

fabriziopandini mentioned this pull request Oct 12, 2018

Kubeadm - Add etcd ports constant #69720

Merged

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 16, 2018

fabriziopandini force-pushed the kubeadm-stacked-etcd branch from bfba065 to 6e16274 Compare October 21, 2018 09:53

k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Oct 21, 2018

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 24, 2018

fabriziopandini force-pushed the kubeadm-stacked-etcd branch from 6e16274 to e5886e5 Compare October 26, 2018 14:51

k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 26, 2018

fabriziopandini changed the title ~~[WIP] - kubeadm stacked etcd~~ kubeadm stacked etcd Oct 26, 2018

k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 26, 2018

timothysc added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 26, 2018

timothysc approved these changes Oct 26, 2018

k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 26, 2018

fabriziopandini added 2 commits October 27, 2018 18:04

kubeadm graduate kubelet-start phase

d30492e

autogenerated

fbd6d2d

fabriziopandini force-pushed the kubeadm-stacked-etcd branch from e5886e5 to fbd6d2d Compare October 27, 2018 16:06

k8s-ci-robot assigned neolit123 Oct 27, 2018

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 27, 2018

k8s-ci-robot merged commit 481fa19 into kubernetes:master Oct 27, 2018

mariantalla mentioned this pull request Oct 29, 2018

Failing kubeadm test: kubeadm-gce-stable-on-master/4929 #70375

Closed

leblancd mentioned this pull request Nov 3, 2018

kubeadm init failing for IPv6-only configuration #70617

Closed

fabriziopandini mentioned this pull request Nov 9, 2018

fix kubeadm upgrade regression #70893

Merged

rosti mentioned this pull request Nov 22, 2018

kubeadm: Remove forgotten debug Println #71357

Merged

fabriziopandini deleted the kubeadm-stacked-etcd branch December 15, 2018 08:33

neolit123 mentioned this pull request Mar 28, 2019

kubeadm upgrade plan not working for v1.13.5 to v1.14.0 kubernetes/kubeadm#1469

Closed
