Add KEP for etcdadm #2835

justinsb · 2018-10-22T20:22:29Z

No description provided.

justinsb · 2018-10-22T20:27:03Z

cc @roberthbailey @timothysc

justinsb · 2018-10-22T20:27:27Z

/assign @roberthbailey
/assign @timothysc

(when github catches up :-) )

neolit123

LGTM thanks!

timothysc

/approve
/hold

Wait for feedback and vote, but generally LGTM

timothysc · 2018-10-22T21:00:44Z

keps/sig-cluster-lifecycle/0031-20181022-etcdadm.md

+
+etcdadm gives us easy to use CLI commands, which will form the base layer of
+operation.  Automation should ideally describe what it is doing in terms of
+etcdadm commands, though we will also expose etcdadm as a go-library for easier


/cc @detiber @chuckha @fabriziopandini
fyi.

timothysc · 2018-10-22T21:03:39Z

keps/sig-cluster-lifecycle/0031-20181022-etcdadm.md

+etcd-manager works by coordinating via a shared filesystem-like store (e.g. S3
+or GCS) and/or via cloud APIs (e.g. EC2 or GCE).  In doing so it is able to
+automate the manual commands, which is very handy for running in a cloud
+environment like AWS or GCE.


I still think using the bootstrap token to encrypt and store the certs as secrets also makes a lot of sense. It keeps everything local to the cluster without adding dependencies, and the token expires after 24 hours by default, which eliminates several of the security concerns.
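Here is a minimal sketch of the bootstrap-token idea from the comment above. It assumes the standard bootstrap token format (`<6-character ID>.<16-character secret>`, lowercase letters and digits) and the 24-hour default lifetime. The key-derivation scheme and all function names are hypothetical: PBKDF2-HMAC-SHA256, with the token secret as password and the token ID as salt. Python's standard library has no authenticated cipher, so the derived 32-byte key would be handed to something like AES-GCM to encrypt the certificates before storing them as Secrets.

```python
import hashlib
import re
from datetime import datetime, timedelta, timezone

# Bootstrap tokens look like "abcdef.0123456789abcdef": a 6-character
# token ID and a 16-character secret, both drawn from [a-z0-9].
TOKEN_RE = re.compile(r"^([a-z0-9]{6})\.([a-z0-9]{16})$")

# Default bootstrap token lifetime mentioned in the comment above.
DEFAULT_TTL = timedelta(hours=24)


def parse_token(token: str) -> tuple[str, str]:
    """Split a bootstrap token into (token_id, token_secret)."""
    m = TOKEN_RE.match(token)
    if m is None:
        raise ValueError("not a valid bootstrap token")
    return m.group(1), m.group(2)


def derive_cert_key(token: str, iterations: int = 100_000) -> bytes:
    """Hypothetical: derive a 32-byte key for encrypting cert Secrets.

    The token ID is used as the salt and the token secret as the password.
    """
    token_id, secret = parse_token(token)
    return hashlib.pbkdf2_hmac(
        "sha256", secret.encode(), token_id.encode(), iterations, dklen=32
    )


def is_expired(created: datetime, now: datetime, ttl: timedelta = DEFAULT_TTL) -> bool:
    """True once the token is past its TTL; a sketch would stop serving certs then."""
    return now >= created + ttl
```

Only someone holding the token can derive the key. Once the token expires and is removed, the encrypted Secrets are no longer useful to new joiners.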

k8s-ci-robot · 2018-10-22T21:13:06Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: timothysc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~keps/OWNERS~~ [timothysc]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

jpbetz · 2018-10-22T21:37:56Z

cc @wenjiaswe @jingyih

gyuho · 2018-10-22T21:47:49Z

keps/sig-cluster-lifecycle/0031-20181022-etcdadm.md

+* Cluster backups
+* Disaster recovery or restore from backup
+* Cluster upgrades
+* Cluster downgrades


Currently downgrade is not supported by etcd, but being added to v3.4 etcd-io/etcd#7308. Please ping us if you need any help. Thanks!

It will be great if etcd can support it natively! Currently kopeio/etcd-manager implements it via a backup/restore, with a key-by-key copy. We can obviously be smarter about that, but it should work anywhere (it seems to even work for etcd3 -> etcd2)

It will be great if etcd can support it natively!

Yes, @wenjiaswe is working on it :)

@justinsb yes, I am working on this, and here is the "etcd downgrade design" documentation. Here are the items planned. I haven't tried kopeio/etcd-manager, but I will try it out. I think it's good that it works the way it does, and etcdadm could use it until etcd supports downgrade natively. Meanwhile, shall we sync on integrating etcd's native downgrade with etcdadm?

@wenjiaswe thanks & absolutely. I don't think there's any question that when etcd supports downgrade natively we should prefer that option :-) (For expediency in kopeio/etcd-manager all upgrades involve a key/value copy today, but I'll fix that for the upgrades that etcd does support - it's easier to have one code path, but it is very sub-optimal).
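The key-by-key copy described above can be sketched roughly as follows. This is not etcd-manager's code. `KVStore` is a hypothetical in-memory stand-in for a keyspace, and `range_keys` only loosely mirrors a paginated range read (keys in sorted order from a start key, up to a limit). No real etcd client API is used. Because the loop only reads and writes plain keys and values, it does not care which version is on either side. That is why this approach works across versions but is slower than a native upgrade or downgrade.

```python
from bisect import bisect_left


class KVStore:
    """Hypothetical in-memory stand-in for an etcd keyspace."""

    def __init__(self, data=None):
        self.data = dict(data or {})

    def range_keys(self, start: str, limit: int):
        """Return up to `limit` (key, value) pairs with key >= start, in key order."""
        keys = sorted(self.data)
        i = bisect_left(keys, start)
        return [(k, self.data[k]) for k in keys[i:i + limit]]

    def put(self, key: str, value: bytes):
        self.data[key] = value


def copy_keyspace(src: KVStore, dst: KVStore, page_size: int = 2) -> int:
    """Copy every key from src to dst one page at a time; return the count copied."""
    copied = 0
    start = ""  # the empty string sorts before every key
    while True:
        page = src.range_keys(start, page_size)
        if not page:
            break
        for key, value in page:
            dst.put(key, value)
            copied += 1
        # Resume just after the last key seen: appending "\x00" gives the
        # smallest string that sorts strictly after it.
        start = page[-1][0] + "\x00"
    return copied
```

A real copy would also read the source at a single revision, or with writes blocked, so that the result is a consistent snapshot. This stand-in has no concurrent writers, so it skips that step.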

But we should definitely sync - for example, today we put etcd into "read-only" mode by switching ports. That lets an HA cluster stay up, but means we know that apiserver won't be writing to it. But ... it's not the cleanest solution, and this is another thing that it would be wonderful to have native support for. But again: not a real blocker.

The real wishlist is for non-voting cluster members - that would make automatic management much safer. But I understand that is coming to etcd as well 🎉
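etcd's majority quorum shows why non-voting members make automatic management safer. A cluster of n voting members needs floor(n/2)+1 of them to agree, so it tolerates n - (floor(n/2)+1) failures. The worked illustration below uses function names of our own, not names from etcd. It shows that adding a voter to a one-member cluster raises the quorum to 2 without adding any fault tolerance, so if the new member never comes up, the cluster stalls. A non-voting member does not count toward n until it is promoted, which closes that window.

```python
def quorum(voters: int) -> int:
    """Number of voting members that must agree for etcd to make progress."""
    return voters // 2 + 1


def fault_tolerance(voters: int) -> int:
    """How many voting members can fail while the cluster keeps quorum."""
    return voters - quorum(voters)


# Growing a cluster one voter at a time: even sizes add no tolerance.
for n in range(1, 6):
    print(n, quorum(n), fault_tolerance(n))
# voters quorum tolerance
# 1 1 0
# 2 2 0
# 3 2 1
# 4 3 1
# 5 3 2
```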

danbeaulieu · 2018-10-22T22:02:29Z

keps/sig-cluster-lifecycle/0031-20181022-etcdadm.md

+
+This results in a multi-node ("HA") etcd cluster.
+
+#### Automatic Cluster Creation


I'm interested in hearing where the line is drawn between this KEP and the current functionality provided by kops. For instance kops has a very specific architecture (1 ASG per control plane node AZ with a min=max=desired=1, EBS volumes per control plane node AZ tagged in a specific way, etc). This architecture is optimized for etcd fault tolerance and DR. Does this KEP offer the option to have similar fault tolerance and DR functionality?

So kopeio/etcd-manager was a reimplementation of the kops etcd management functionality. The intention is that this is a clean implementation that any installation tool can use, not just kops.

I'll clarify, though, that we're assuming an external installation tool sets up the infrastructure itself if we're using EBS volumes. That is, I don't think etcdadm should set up the volumes, or the AWS ASGs or GCE MIGs that will likely provide the machines on which this runs. (Or that should be a separate KEP, if so!) It will make it very easy to set up those ASGs, though, as they can all run the same command. I'll clarify this in the KEP, as I don't think I covered it sufficiently. IMO etcdadm should (optionally) support auto-mounting of volumes, but setting them up is best done externally.

We could bring this into scope, but I think it's better just to clearly document the requirements, as opinions vary so widely here! (e.g. "if you're using volumes, pass the tags using this flag, you probably want to put them in separate AZs, and you probably want to run in separate ASGs to guarantee equal zonal coverage")
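This split of responsibilities has a simple consequence: the installation tool creates the volumes and ASGs, and each machine then only has to find and mount its own volume. Below is a minimal sketch of that selection step. It assumes volumes are described by plain dictionaries, with a hypothetical tag key `etcd-cluster`. That key is not kops' real tag and not an etcdadm flag. EBS volumes can only be attached within their own AZ. Combined with one ASG per AZ at min=max=desired=1, restricting the choice to the machine's zone keeps exactly one member per zone.

```python
from typing import Optional


def pick_volume(volumes: list[dict], cluster: str, zone: str) -> Optional[dict]:
    """Pick an unattached volume tagged for this etcd cluster in this machine's zone.

    The "etcd-cluster" tag key is a hypothetical placeholder for whatever tags the
    installation tool applies. Returns the volume to mount, or None if none fits.
    """
    candidates = [
        v for v in volumes
        if v["tags"].get("etcd-cluster") == cluster
        and v["zone"] == zone
        and v["attached_to"] is None
    ]
    # Deterministic order makes retries predictable; a real implementation must
    # still handle a failed attach (e.g. another machine won the race).
    candidates.sort(key=lambda v: v["id"])
    return candidates[0] if candidates else None
```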

danbeaulieu · 2018-10-22T22:05:18Z

keps/sig-cluster-lifecycle/0031-20181022-etcdadm.md

+  <endpoint>` command
+* On each other master machine, copy the CA certificate and key from one of the
+  other masters, then run the `etcdadm join <endpoint>` command.
+* Run kubeadm following the [external etcd procedure](https://kubernetes.io/docs/setup/independent/high-availability/#external-etcd)


Is it possible for this tool to integrate with kubeadm in such a way that it produces "local" etcd clusters? There is something nice about etcd running as pods in the cluster, as it allows reuse of k8s-based tooling for monitoring, logging, metrics, etc.

Yes, I'd imagine kubeadm could easily replace its built-in etcd management with a call-out to etcdadm.

And yes, I agree that pods in the cluster is the only configuration that we test today and so it's the one I personally feel most comfortable with. Hopefully we can add more e2e configurations going forward though!
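The manual HA procedure quoted above is also what automation is meant to express: per the KEP, tooling should describe its actions in terms of etcdadm commands. Below is a minimal sketch that turns a list of master hostnames into that per-machine plan. `etcdadm join <endpoint>` comes from the quoted text. The first machine's command is cut off in the excerpt, so `etcdadm init` is assumed here. Also assumed are the `plan_ha_etcd` name, a client URL on port 2379 (etcd's default client port), and the `copy-ca` step label.

```python
def plan_ha_etcd(masters: list[str], client_port: int = 2379) -> list[tuple[str, str]]:
    """Return (host, action) steps for the manual HA procedure.

    Assumptions: the first master is bootstrapped with `etcdadm init`, and the
    others join via the first master's client URL. The "copy-ca" action stands
    for copying the CA certificate and key from an existing master.
    """
    if not masters:
        raise ValueError("need at least one master")
    first, rest = masters[0], masters[1:]
    endpoint = f"https://{first}:{client_port}"
    steps = [(first, "etcdadm init")]
    for host in rest:
        # The CA material must be in place before the join command runs.
        steps.append((host, f"copy-ca from {first}"))
        steps.append((host, f"etcdadm join {endpoint}"))
    return steps
```

Keeping the plan as data lets a human read exactly what automation will run, which is the point of making the CLI the base layer.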

justinsb · 2018-10-23T15:45:46Z

I added a commit to try to more clearly express why cloud interaction (volume mounting) should be in scope, but more importantly to bound that most cloud interaction will be out of scope :-)

fabriziopandini

Looking forward to this!

fabriziopandini · 2018-10-23T16:09:54Z

keps/sig-cluster-lifecycle/0031-20181022-etcdadm.md

+* Cluster upgrades
+* Cluster downgrades
+* PKI management
+


IMO a really useful task will be "pivoting" from kubeadm local etcd to etcdadm-managed etcd, thus giving users a way forward from the simplest etcd clusters to something more complex.

fabriziopandini · 2018-10-23T16:11:54Z

keps/sig-cluster-lifecycle/0031-20181022-etcdadm.md

+
+etcdadm gives us easy to use CLI commands, which will form the base layer of
+operation.  Automation should ideally describe what it is doing in terms of
+etcdadm commands, though we will also expose etcdadm as a go-library for easier


philips · 2018-10-23T21:33:14Z

Someone should email etcd-dev@googlegroups.com as a heads up to this effort. There might be other interested parties.

justinsb · 2018-10-26T18:06:43Z

Thanks @philips - good idea https://groups.google.com/d/msg/etcd-dev/h2HZ8PU-ttc/sdr_onOhCQAJ

timothysc · 2018-10-29T17:37:41Z

/lgtm

I'll let @roberthbailey cancel the hold after the vote timeout.

detiber · 2018-10-29T19:07:18Z

keps/sig-cluster-lifecycle/0031-20181022-etcdadm.md

+### Implementation Details/Notes/Constraints
+
+* There will be some changes needed to both platform9/etcdadm (e.g. etcd2
+  support) and kopeio/etcd-manager (to rebase on top of etcdadm).


If this is targeting net-new usage why support etcd2 here? It seems like etcd3+ would be sufficient for new and future usage.

For users that are still on etcd2. We're going to strand them in 1.13 otherwise.

dlipovetsky · 2018-10-30T17:45:00Z

keps/sig-cluster-lifecycle/0031-20181022-etcdadm.md

+
+### Implementation Details/Notes/Constraints
+
+* There will be some changes needed to both platform9/etcdadm (e.g. etcd2


My original sketch of etcdadm included an upgrade verb. It would be limited to making changes on the host where etcdadm is run. Should we add this as a note here?

justaugustus · 2018-11-20T04:35:41Z

REMINDER: KEPs are moving to k/enhancements on November 30. Please attempt to merge this KEP before then to signal consensus.
For more details on this change, review this thread.

Any questions regarding this move should be directed to that thread and not asked on GitHub.

neolit123 · 2018-11-21T22:25:07Z

/cc
to remind myself about this.

Unless @roberthbailey unholds in the next few days, I will do that.
this proposal was voted and approved.

roberthbailey · 2018-11-22T05:02:26Z

/hold cancel

I didn't realize that this was blocked on me. It passed the vote, so let's get it merged before the great KEP migration of 2018 commences.

justaugustus · 2018-11-22T05:04:49Z

Huzzah!!

Add KEP for etcdadm

justinsb force-pushed the kep_etcdadm branch from dee32b2 to e74add5 October 22, 2018 20:30

Add KEP for etcdadm

1a7d5d9

justinsb force-pushed the kep_etcdadm branch from e74add5 to 1a7d5d9 October 22, 2018 20:32

neolit123 approved these changes Oct 22, 2018

timothysc approved these changes Oct 22, 2018

k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/kep sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. labels Oct 22, 2018

k8s-ci-robot requested review from jdumars and timothysc October 22, 2018 21:13

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 22, 2018

k8s-ci-robot assigned roberthbailey and timothysc Oct 22, 2018

k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 22, 2018

k8s-ci-robot requested review from chuckha, detiber and fabriziopandini October 22, 2018 21:36

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 22, 2018

gyuho reviewed Oct 22, 2018

danbeaulieu reviewed Oct 22, 2018

Added clarification of cloud interaction

6db2ddc

fabriziopandini reviewed Oct 23, 2018

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 29, 2018

detiber reviewed Oct 29, 2018

dlipovetsky reviewed Oct 30, 2018

wenjiaswe mentioned this pull request Nov 8, 2018

REQUEST: New membership for @wenjiaswe kubernetes/org#227

Closed

k8s-ci-robot requested a review from neolit123 November 21, 2018 22:25

k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 22, 2018

k8s-ci-robot merged commit c5f3779 into kubernetes:master Nov 22, 2018

justaugustus pushed a commit to justaugustus/community that referenced this pull request Dec 1, 2018

Merge pull request kubernetes#2835 from justinsb/kep_etcdadm

89b2183

Add KEP for etcdadm

justinsb mentioned this pull request Dec 14, 2018

Please create repo kubernetes-sigs/etcdadm kubernetes/org#308

Closed
