
doc: dynamic tls design #1644

Merged
merged 1 commit into coreos:master on Nov 21, 2017

Conversation

hongchaodeng
Member

[skip ci]

ref: #1617, #1465

@hongchaodeng
Member Author

hongchaodeng commented Nov 9, 2017

Let me attempt a conclusion:

  • Both options 1 and 2 are feasible, and they are not mutually exclusive.
  • We will start by implementing option 2. It is simpler to start with, and kubeadm has an immediate need for it.

cc @ericchiang @xiang90 @jamiehannaford

- Each etcd cluster has its own CA.

Cons:
- Inappropriate to give etcd pod access to the CA key?
Contributor

Answer: yes, it's inappropriate to give the etcd pod the CA key.

I don't think we should include this as an option.


👍 That was my point during yesterday's meeting where we discussed this.

I don't think we should include this as an option.

SGTM


Similar to option 2, but instead of using an init container, the etcd operator generates the certs and puts them into a secret. Then the etcd operator creates an etcd pod with the secret mounted.

This option is not feasible. The pod IP cannot be known before the pod is created. In self-hosted mode, even if we can know the node addresses, there is still the problem that nodes could be deleted and new ones added. Thus, we can't specify the SANs until the pod is created.
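To make the SAN constraint concrete, here is a minimal sketch of the signing step this option implies. It is not the operator's actual code; `signPeerCert`, `caCert`, `caKey`, and `peerIPs` are hypothetical names. The point is that a serving cert is only valid for the IPs/DNS names baked into its SANs, which is why the pod IP has to be known before the cert can be generated.

```go
package sketch

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"math/big"
	"net"
	"time"
)

// signPeerCert signs a peer serving cert that is valid only for the given IPs.
// caCert/caKey would come from the CA secret; peerIPs must include the pod IP,
// which is exactly the unknown this option stumbles on.
func signPeerCert(caCert *x509.Certificate, caKey *ecdsa.PrivateKey, peerIPs []net.IP) ([]byte, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(time.Now().UnixNano()),
		Subject:      pkix.Name{CommonName: "etcd-peer"},
		NotBefore:    time.Now(),
		NotAfter:     time.Now().Add(365 * 24 * time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature | x509.KeyUsageKeyEncipherment,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth, x509.ExtKeyUsageClientAuth},
		IPAddresses:  peerIPs, // the SANs under discussion: the pod/node IP must be known here
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, caCert, key.Public(), caKey)
	if err != nil {
		return nil, nil, err
	}
	// The DER cert and the key would be PEM-encoded and written into the secret
	// that gets mounted into the etcd pod.
	return der, key, nil
}
```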
Contributor

In self-hosted mode the pod IP is the node IP, right? So that is known ahead of time if the etcd-operator schedules the pod itself (like the DaemonSet controller does).

Contributor

In theory, yeah. I think @hongchaodeng was worried about edge cases where the node IP suddenly changes before the pod is scheduled. I don't think this is much of a problem, though.

If Option 2 is not an option due to security concerns, I'm more of a fan of Option 3 than 1. At least for now.


In theory, yeah. I think @hongchaodeng was worried about edge cases where the node IP suddenly changes before the pod is scheduled. I don't think this is much of a problem, though.

That sounds like a super-extreme edge case I don't see happening. At least, I think you'll have way worse problems if your master node IP changes that frequently. I vote for calculating it based on the node IP.

Contributor

I think @hongchaodeng was worried about edge cases where the node IP suddenly changes before the pod is scheduled.

Does that actually happen?

Member Author
@hongchaodeng hongchaodeng Nov 9, 2017

That's not an edge case. Let me clarify the concerns:

Concern 1: Changing existing nodes

Say an etcd cluster is running on 3 master nodes N1, N2, N3, and we are going to replace N3 with N4. In etcd 3.2+, a peer's cert SANs need to include all peer addresses/IPs. Simply adding N4 as a member won't work; it won't be able to join N1 and N2.

Concern 2: How do we know which nodes are the master nodes?

The etcd operator does not know which node an etcd pod will end up on until the scheduler makes the decision, and thus does not know which SANs should be included.

If we want to know that, we need to include some scheduling logic in the etcd operator.

Contributor

a peer's cert SANs need to include all peer addresses/IPs. Simply adding N4 as a member won't work; it won't be able to join N1 and N2.

Can you expand on that? How does this affect Option 3 but not Option 1? Surely you're doing the same thing, i.e. generating a cert bound to a node IP.

The etcd operator does not know which node an etcd pod will end up on until the scheduler makes the decision, and thus does not know which SANs should be included.

I think Lucas's point is that we can infer it. The operator already has a list of master nodes, so finding the ones that do not have etcd pods on them should be simple.

If the node is replaced in the seconds after the certs are generated but before the pod is scheduled, this of course won't work. But I think this is very rare.
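A rough, purely illustrative sketch of that inference: list the master nodes and collect their internal IPs as SAN candidates. The `masterNodeIPs` helper and the `node-role.kubernetes.io/master` label selector are assumptions, not the operator's actual code, and the `List` call uses a recent client-go signature.

```go
package sketch

import (
	"context"
	"net"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// masterNodeIPs lists the master nodes and returns their InternalIP addresses,
// which would become SAN candidates for the peer certs.
func masterNodeIPs(ctx context.Context, client kubernetes.Interface) ([]net.IP, error) {
	nodes, err := client.CoreV1().Nodes().List(ctx, metav1.ListOptions{
		// Assumption: masters carry this label; clusters may label them differently.
		LabelSelector: "node-role.kubernetes.io/master",
	})
	if err != nil {
		return nil, err
	}
	var ips []net.IP
	for _, n := range nodes.Items {
		for _, addr := range n.Status.Addresses {
			if addr.Type == corev1.NodeInternalIP {
				ips = append(ips, net.ParseIP(addr.Address))
			}
		}
	}
	return ips, nil
}
```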

Member Author
@hongchaodeng hongchaodeng Nov 10, 2017

Can you expand on that? How does this affect Option 3 but not Option 1?

ref: etcd-io/etcd#8797

You are right. Option 1 will definitely not work, at least for the current 3.2.

Member Author

Actually I made a mistake in understanding... Sorry for the confusion.

It's the certificate presented by the remote peer: https://github.com/golang/go/blob/go1.9/src/crypto/tls/common.go#L169:2

So that means as long as the certificate for each peer includes itself, it should be fine.

Ignore the first concern.
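A minimal illustration of why that is enough; the addresses and file paths below are made up. When one member dials another member's peer URL, Go's TLS stack verifies the certificate presented by the remote side against the address that was dialed, so each member's cert only needs SANs for its own addresses.

```go
package sketch

import (
	"crypto/tls"
	"crypto/x509"
	"os"
)

func dialPeer() (*tls.Conn, error) {
	// Peer CA and this member's own cert/key, e.g. mounted from a secret.
	caPEM, err := os.ReadFile("/etc/etcd/peer-ca.crt")
	if err != nil {
		return nil, err
	}
	pool := x509.NewCertPool()
	pool.AppendCertsFromPEM(caPEM)
	myCert, err := tls.LoadX509KeyPair("/etc/etcd/peer.crt", "/etc/etcd/peer.key")
	if err != nil {
		return nil, err
	}
	cfg := &tls.Config{
		RootCAs:      pool,
		Certificates: []tls.Certificate{myCert},
	}
	// The SAN check runs against "10.0.0.5" (the remote member we dial),
	// using the certificate that remote member presents, not ours.
	return tls.Dial("tcp", "10.0.0.5:2380", cfg)
}
```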

Member Author

If the node is replaced in the seconds after the certs are generated but before the pod is scheduled, this of course won't work. But I think this is very rare.

Yeah, we probably do not need to care about that at the beginning.
But this is an automated system, and we would definitely need to handle that if we go with option 3.

Cons:
- Inappropriate to give etcd pod access to the CA key?

### Option 3: CA signing cert + key, etcd operator generates certs and secrets
Contributor

There are also questions of how clients get client certs.

Contributor

At least for our use case, kubeadm will generate the CA and client certs upfront before the operator is deployed. Both sets can be stored in the same dir and mounted into the operator.

Member Author
@hongchaodeng hongchaodeng Nov 9, 2017

Sorry, I should make this clearer: here we are only concerned about peer-to-peer serving certs.
Client certs are a much simpler problem because the client endpoint is known. I can add that later, but let's solve peer certs first.

Member Author

Whichever option we choose, I will write more details about how to generate TLS assets for client, server, and peer.

@jamiehannaford
Contributor

/cc @luxas

@luxas luxas left a comment

Thanks! I vote for option 3


Cons:
- Share the same CA signing cert + key as k8s.
- Requires etcd to be trusted by k8s cluster CA (probably only appropriate for self-hosted etcd).

Yeah, this is only appropriate for self-hosting. You don't want your app etcd db to be able to access the k8s main etcd.

@ericchiang
Contributor

What's the latest plan for HA control planes in kubeadm? What's the TLS plan for the API server?

I think etcd should be able to support option 3, but I'd strongly prefer option 1 for self-hosted etcd bootkube clusters if we ever get there. Having everything go through the CSR endpoints seems way cleaner than the etcd-operator managing its own CA.

@hongchaodeng
Member Author

I think etcd should be able to support option 3, but I'd strongly prefer option 1 for self-hosted etcd bootkube clusters if we ever get there. Having everything go through the CSR endpoints seems way cleaner than the etcd-operator managing its own CA.

That's actually a good point -- we could have different methods for self-hosted and normal app cases.

As I mentioned above, option 1 isn't mutually exclusive with options 2 and 3. So I could think of another plan:

  • Implement option 1 first for the self-hosted case. Actually, the parties involved here seem more interested in the self-hosted case.
  • The normal app case can still use option 1 after we implement it, but we can add more methods (option 2 or 3) for normal app cases later.

Just to clarify, the above plan assumes we reach consensus on option 1.

@jamiehannaford
Contributor

If Option 1 is the default for self-hosted, who controls the deployment of the auto-approver controller? I assume the operator would deploy it by default. Where is this code going to live? Does CoreOS have enough bandwidth to write and maintain another codebase/sub-project?

My preference is to start with Option 3 because it's simpler and actually solves a real-world use case. If we're going to support both options, I think kubeadm will use option 3. Once we know node IPs work well with a pre-generated CA, we can then add CSR support in the 1.10 dev cycle.

But if folks really feel Option 1 is better to implement first, that's fine with me. Just adding my 2c.

@hongchaodeng
Member Author

hongchaodeng commented Nov 10, 2017

If Option 1 is the default for self-hosted, who controls the deployment of the auto-approver controller?

It would probably go in the etcd operator pod?
No need to worry about coding and maintenance; we will take care of that. Let's make sure we pick the best and most secure option.

@hongchaodeng
Member Author

@jamiehannaford @luxas
Do you guys have any concerns about option 1 other than writing the approver component?

I don't see any big con for either option 1 or 3, and I want to move forward. Option 1 is still the better choice because it's what the other k8s components do. We just follow the convention instead of reinventing our own, and it is the safest way.

@jamiehannaford
Contributor

It would probably go in the etcd operator pod?

You mean like a sidecar container?

Do you guys have any concerns about option 1 other than writing the approver component?

Not really, no. If it can be implemented well and covers our use-case, I think it's a good idea. It'd be great if we can get this in before the Nov 22 code freeze, but no worries if that's not realistic. If not, HA etcd will have to wait until kubeadm 1.10.

@jamiehannaford
Contributor

I can also help out with this next week if you want @hongchaodeng. Just describe any outstanding work items in issues and CC me.

@hongchaodeng
Member Author

hongchaodeng commented Nov 10, 2017

@jamiehannaford
Great! Thanks.

I will start digging into and trying out the details of option 1. I need to do this as a final check and also to get a better understanding of the potential work. After that I will write down the work items more specifically.

@ericchiang
Contributor

Trying to summarize this thread a bit.

Option 1:

  • Etcd pods run with a sidecar/init container that requests certificates through the certificates API.
  • Pros:
    • etcd-operator doesn't need to handle a CA private key.
  • Cons:
    • Upstream approver doesn't handle non-node CSRs (though it could in the future). Requires writing a custom CSR approver that approves CSRs made by the etcd pod service account.
    • Requires cluster configuration. Wouldn't work for all etcd clusters created by the operator.
    • Requires a custom sidecar/init container.

Option 3:

  • Etcd-operator gets a CA and CA private key. Signs serving certs and peer certs.
  • Pros:
    • Simple for regular etcd clusters (non self-hosted) because etcd-operator knows the services it's going to create ahead of time.
  • Cons:
    • Hard for self-hosted etcd because the etcd-operator doesn't schedule pods directly onto nodes, so it doesn't know what serving address to sign ahead of time.
    • Self-hosted etcd requires etcd-operator to hold onto the cluster's CA private key (stored in a secret?).

I hinted at this above, but depending on kubeadm's "master join" process, could kubeadm just sign all the certs ahead of time? I haven't caught up with kubeadm's HA plans, but I remember something about a "master join token" in previous proposals. So:

Option 4:

  • Kubeadm uses pre-generated TLS assets. When a new master node joins the cluster, it signs etcd serving certs for that node.

I think some flavor of option 3 would be good for regular etcd clusters created by the operator. The "just generate a CA for this cluster and set up TLS for me" mode.

Giving the etcd-operator the cluster CA is a significant choice. The certificates API was built partially to defer the final signing to external PKI, and going around it is something to be cautious of. However, it's clear that option 1 is a lot of work and isn't going to get done before the 1.9 release.
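For reference, a minimal sketch of the client side of Option 1 as summarized above; `buildPeerCSR` and the CN are made-up names. The sidecar/init container would generate a key and a PEM-encoded CSR carrying only its own IP SAN; the PEM would then go into the `spec.request` field of a certificates.k8s.io CertificateSigningRequest for the custom approver to approve and the cluster CA to sign.

```go
package sketch

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"net"
)

// buildPeerCSR generates a key plus a PEM-encoded CSR for this member's own IP.
func buildPeerCSR(podIP net.IP) ([]byte, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.CertificateRequest{
		Subject:     pkix.Name{CommonName: "etcd-peer"},
		IPAddresses: []net.IP{podIP}, // only this member's own address is needed
	}
	der, err := x509.CreateCertificateRequest(rand.Reader, tmpl, key)
	if err != nil {
		return nil, nil, err
	}
	csrPEM := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE REQUEST", Bytes: der})
	// csrPEM would be submitted through the certificates API, and the signed
	// cert read back from the CSR's status once approved.
	return csrPEM, key, nil
}
```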

@hongchaodeng
Member Author

I have tried a quick hack on option 1 and made sure it works. One small surprise that came out is that we might need to let the user pass in additional SANs for the server cert.

As a next step, I will update this PR with a plan for implementation and testing.

@jamiehannaford One question: do you want to work on this? If yes, it is fine to just implement the self-hosted path and ignore the normal app path. It is also fine to leave the testing to us.

@jamiehannaford
Contributor

@hongchaodeng Yeah, I can help out. Self-hosted path is Option 1, right?

@hongchaodeng
Member Author

Self-hosted path is Option 1, right?

Correct.

@hongchaodeng
Member Author

Updated with the implementation and testing roadmap.

@hongchaodeng
Member Author

hongchaodeng commented Nov 14, 2017

@jamiehannaford @ericchiang @luxas
There is one more thing to discuss in the design: client+server TLS.

  1. Peer TLS requires dynamic TLS because peer addresses are by nature dynamic; we need to generate certs based on the node/pod IPs. Client+server TLS does not require this: the APIServer talks to the etcd servers via a static LB DNS name, so dynamic TLS isn't really needed there.
  2. The user might want to pass in additional SANs for the etcd server cert. For the same reason as above, they might need to pass in the LB DNS name, since it's outside the etcd operator's control.
  3. The user might want to control the CA. The etcd peer CA can be ignored by the user, but the user might have an etcd client outside kubeadm/k8s that wants to talk to the etcd cluster, and thus the user needs to control the etcd server CA. (See the sketch after this list.)
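To make points 1–3 concrete, here is a purely hypothetical shape for how these settings could be expressed; none of the field names are the operator's actual API.

```go
package sketch

// StaticClientServerTLS: the apiserver reaches etcd through a stable endpoint,
// so client/server certs can be pre-generated by the user (who keeps control
// of the server CA), with any extra SANs such as an LB DNS name passed in
// explicitly.
type StaticClientServerTLS struct {
	ServerSecret   string   // pre-generated server cert/key
	ClientSecret   string   // pre-generated client cert/key
	ServerCASecret string   // CA the user controls, for clients outside kubeadm/k8s
	AdditionalSANs []string // e.g. an external LB DNS name
}

// DynamicPeerTLS: peer certs are generated per member, because peer addresses
// change as nodes come and go.
type DynamicPeerTLS struct {
	PeerCASecret string // CA cert+key used to sign per-member peer certs
}
```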

@jamiehannaford
Contributor

the APIServer talks to the etcd servers via a static LB DNS name, so dynamic TLS isn't really needed there.

In kubeadm, the apiserver will talk to the local etcd pod via localhost, not an LB, but your point still stands. We shouldn't need dynamic TLS for the serving certs for our use case. @luxas WDYT?

The etcd peer CA can be ignored by the user, but the user might have an etcd client outside kubeadm/k8s that wants to talk to the etcd cluster, and thus the user needs to control the etcd server CA.

How is the peer CA different from the serving CA? Are you suggesting having different CAs for each?

@hongchaodeng
Member Author

hongchaodeng commented Nov 14, 2017

How is the peer CA different from the serving CA?

Peer is about etcd deployment. Deployer/ops owns peer certs.
On the other hand, server is about using etcd. User/developer owns server certs.

Are you suggesting having different CAs for each?

Yes. I couldn't find any docs on this, but the reasoning is the same as above.

@hongchaodeng
Member Author

hongchaodeng commented Nov 14, 2017

So as an update, we are going to implement dynamic TLS for peers only and keep using static TLS for client and server initially. There are other ways to generate TLS certs for client and server, but let's focus on dynamic peer TLS first. This should get rid of the dependency on hostnames and unblock the kubeadm work.

@hongchaodeng
Member Author

hongchaodeng commented Nov 15, 2017

Shall we move this design doc forward? We will implement dynamic peer TLS as the first step, which unblocks kubeadm. What is needed for client-server is still open.


## Alternatives

### Option 2: CA signing cert + key, generate certs on init container
Contributor

This still needs to be removed.

Member Author

sure


> For the initial design, we focus on the self-hosted etcd use case, although the same methodology applies to app etcd.

The [Kubernetes certificates API][k8s_csr] provides the ability to send and approve CSRs.
Contributor

This case was covered in the bootkube auto-TLS proposal. Might want to link to that since it talks about this case a little more in depth?

https://docs.google.com/document/d/1POXVGyEoySvSnx_OftQ2CIWM0HCk27j2VZSOR4XVCDg/edit#heading=h.o7y0uvuoaz4a

@hongchaodeng
Member Author

Fixed.
This doc covers only the initial plan for self-hosted etcd; more specifically, TLS for the peer ports. We can move forward and supplement it once we have more use cases.

@ericchiang
Contributor

lgtm, feel free to merge

@hongchaodeng hongchaodeng merged commit 51a9931 into coreos:master Nov 21, 2017
@hongchaodeng hongchaodeng deleted the tlsd branch November 21, 2017 17:37
@raoofm

raoofm commented Nov 21, 2017

@hongchaodeng

This doc covers only the initial plan for self-hosted etcd.

Does that mean we cannot use the same approach for etcd as an app, i.e. not a Kubernetes use case?

@ericchiang
Contributor

@raoofm Self-hosted etcd and etcd clusters running on top of the cluster have different TLS requirements because their networking and trust domains are different.

For self-hosted etcd, which runs with host networking, you have to use the node addresses (even as new nodes are added), and it has to be trusted by the API server. For clusters running on top of Kubernetes, the operator can know the service DNS names before it deploys the individual pods, and the etcd cluster's certs only have to be trusted by the apps that want to talk to it.
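Purely as an illustration of that contrast; every name and address below is made up.

```go
package sketch

// Self-hosted etcd runs with host networking, so each member's cert is bound
// to a node address and the whole set must be trusted by the API server.
var selfHostedPeerSANs = []string{"10.0.0.4", "master-1"}

// etcd-as-an-app: the operator creates the services itself, so the DNS names
// are known before the pods exist and only the consuming apps need to trust
// the CA.
var appEtcdServerSANs = []string{
	"example-etcd-client.default.svc",
	"*.example-etcd.default.svc",
}
```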

So whatever solution the operator goes with will likely be aimed at one of these use cases initially. We can expand it to the other use case later.

@hongchaodeng
Member Author

@raoofm
That's the initial plan. We can gather requirements for app etcd later and then start implementing it.
