
doc: dynamic tls design #1644

Merged
merged 1 commit into coreos:master on Nov 21, 2017

Conversation

hongchaodeng
Member

[skip ci]

ref: #1617, #1465

@hongchaodeng
Member Author

hongchaodeng commented Nov 9, 2017

Let me attempt a conclusion:

  • Both options 1 and 2 are feasible, and they are not mutually exclusive.
  • We will start by implementing option 2. It is simpler to start with, and kubeadm has an immediate need for it.

cc @ericchiang @xiang90 @jamiehannaford

- Each etcd cluster has its own CA.

Cons:
- Inappropriate to give etcd pod access to the CA key?
Contributor

Answer: yes, it's inappropriate to give the etcd pod the CA key.

I don't think we should include this as an option.


👍 That was my point during yesterday's meeting where we discussed this.

I don't think we should include this as an option.

SGTM


Similar to option 2, but instead of using an init container, the etcd operator generates the certs and puts them into a secret. Then the etcd operator creates an etcd pod with the secret mounted.

This option is not feasible. The pod IP cannot be known before the pod is created. In self-hosted mode, even if we can know the node addresses, there is still the problem that nodes could be deleted and new ones added. Thus, we can't specify the SANs until the pod is created.
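To make the SAN constraint concrete, here is a minimal sketch of the signing step this option implies. It is not the operator's actual code; `signPeerCert`, `caCert`, `caKey`, and `peerIPs` are hypothetical names. The point is that a serving cert is only valid for the IPs/DNS names baked into its SANs, which is why the pod IP has to be known before the cert can be generated.

```go
package sketch

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"math/big"
	"net"
	"time"
)

// signPeerCert signs a peer serving cert that is valid only for the given IPs.
// caCert/caKey would come from the CA secret; peerIPs must include the pod IP,
// which is exactly the unknown this option stumbles on.
func signPeerCert(caCert *x509.Certificate, caKey *ecdsa.PrivateKey, peerIPs []net.IP) ([]byte, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(time.Now().UnixNano()),
		Subject:      pkix.Name{CommonName: "etcd-peer"},
		NotBefore:    time.Now(),
		NotAfter:     time.Now().Add(365 * 24 * time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature | x509.KeyUsageKeyEncipherment,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth, x509.ExtKeyUsageClientAuth},
		IPAddresses:  peerIPs, // the SANs under discussion: the pod/node IP must be known here
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, caCert, key.Public(), caKey)
	if err != nil {
		return nil, nil, err
	}
	// The DER cert and the key would be PEM-encoded and written into the secret
	// that gets mounted into the etcd pod.
	return der, key, nil
}
```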
Contributor

In self-hosted mode the pod IP is the node IP, right? So that is known ahead of time if the etcd-operator schedules the pod itself (like the DaemonSet controller does).

Contributor

In theory, yeah. I think @hongchaodeng was worried about edge cases where the node IP suddenly changes before the pod is scheduled. I don't think this is much of a problem, though.

If Option 2 is not an option due to security concerns, I'm more of a fan of Option 3 than 1. At least for now.


In theory, yeah. I think @hongchaodeng was worried about edge cases where the node IP suddenly changes before the pod is scheduled. I don't think this is much of a problem, though.

That sounds like a super-extreme edge case I don't see happening. At least, I think you'll have way worse problems if your master node IP changes that frequently. I vote for calculating it based on the node IP.

Contributor

I think @hongchaodeng was worried about edge cases where the node IP suddenly changes before the pod is scheduled.

Does that actually happen?

Member Author
@hongchaodeng hongchaodeng Nov 9, 2017

That's not an edge case. Let me clarify the concerns:

Concern 1: Changing existing nodes

Say an etcd cluster is running on 3 master nodes N1, N2, N3, and we are going to replace N3 with N4. In etcd 3.2+, a peer's cert SANs need to include all peer addresses/IPs. Simply adding N4 as a member won't work; it won't be able to join N1 and N2.

Concern 2: How do we know which nodes are the master nodes?

The etcd operator does not know which node an etcd pod will end up on until the scheduler makes the decision, and thus does not know which SANs should be included.

If we want to know that, we need to include some scheduling logic in the etcd operator.

Contributor

a peer's cert SANs need to include all peer addresses/IPs. Simply adding N4 as a member won't work; it won't be able to join N1 and N2.

Can you expand on that? How does this affect Option 3 but not Option 1? Surely you're doing the same thing, i.e. generating a cert bound to a node IP.

The etcd operator does not know which node an etcd pod will end up on until the scheduler makes the decision, and thus does not know which SANs should be included.

I think Lucas's point is that we can infer it. The operator already has a list of master nodes, so finding the ones that do not have etcd pods on them should be simple.

If the node is replaced in the seconds after the certs are generated but before the pod is scheduled, this of course won't work. But I think this is very rare.
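A rough, purely illustrative sketch of that inference: list the master nodes and collect their internal IPs as SAN candidates. The `masterNodeIPs` helper and the `node-role.kubernetes.io/master` label selector are assumptions, not the operator's actual code, and the `List` call uses a recent client-go signature.

```go
package sketch

import (
	"context"
	"net"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// masterNodeIPs lists the master nodes and returns their InternalIP addresses,
// which would become SAN candidates for the peer certs.
func masterNodeIPs(ctx context.Context, client kubernetes.Interface) ([]net.IP, error) {
	nodes, err := client.CoreV1().Nodes().List(ctx, metav1.ListOptions{
		// Assumption: masters carry this label; clusters may label them differently.
		LabelSelector: "node-role.kubernetes.io/master",
	})
	if err != nil {
		return nil, err
	}
	var ips []net.IP
	for _, n := range nodes.Items {
		for _, addr := range n.Status.Addresses {
			if addr.Type == corev1.NodeInternalIP {
				ips = append(ips, net.ParseIP(addr.Address))
			}
		}
	}
	return ips, nil
}
```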

Member Author
@hongchaodeng hongchaodeng Nov 10, 2017

Can you expand on that? How does this affect Option 3 but not Option 1?

ref: etcd-io/etcd#8797

You are right. Option 1 will definitely not work, at least for the current 3.2.

Member Author

Actually I made a mistake in understanding... Sorry for the confusion.

It's the certificate presented by the remote peer: https://github.com/golang/go/blob/go1.9/src/crypto/tls/common.go#L169:2

So that means as long as the certificate for each peer includes itself, it should be fine.

Ignore the first concern.
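A minimal illustration of why that is enough; the addresses and file paths below are made up. When one member dials another member's peer URL, Go's TLS stack verifies the certificate presented by the remote side against the address that was dialed, so each member's cert only needs SANs for its own addresses.

```go
package sketch

import (
	"crypto/tls"
	"crypto/x509"
	"os"
)

func dialPeer() (*tls.Conn, error) {
	// Peer CA and this member's own cert/key, e.g. mounted from a secret.
	caPEM, err := os.ReadFile("/etc/etcd/peer-ca.crt")
	if err != nil {
		return nil, err
	}
	pool := x509.NewCertPool()
	pool.AppendCertsFromPEM(caPEM)
	myCert, err := tls.LoadX509KeyPair("/etc/etcd/peer.crt", "/etc/etcd/peer.key")
	if err != nil {
		return nil, err
	}
	cfg := &tls.Config{
		RootCAs:      pool,
		Certificates: []tls.Certificate{myCert},
	}
	// The SAN check runs against "10.0.0.5" (the remote member we dial),
	// using the certificate that remote member presents, not ours.
	return tls.Dial("tcp", "10.0.0.5:2380", cfg)
}
```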

Member Author

If the node is replaced in the seconds after the certs are generated but before the pod is scheduled, this of course won't work. But I think this is very rare.

Yeah, we probably do not need to care about that at the beginning.
But this is an automated system, and we would definitely need to handle that if we go with option 3.

Cons:
- Inappropriate to give etcd pod access to the CA key?

### Option 3: CA signing cert + key, etcd operator generates certs and secrets
Contributor

There are also questions of how clients get client certs.

Contributor

At least for our use case, kubeadm will generate the CA and client certs upfront before the operator is deployed. Both sets can be stored in the same dir and mounted into the operator.

Member Author
@hongchaodeng hongchaodeng Nov 9, 2017

Sorry, I should make this clearer: here we are only concerned about peer-to-peer serving certs.
Client certs are a much simpler problem because the client endpoint is known. I can add that later, but let's solve peer certs first.

Member Author

Whichever option we choose, I will write more details about how to generate TLS assets for client, server, and peer.

@jamiehannaford
Contributor

/cc @luxas

@luxas luxas left a comment

Thanks! I vote for option 3


Cons:
- Share the same CA signing cert + key as k8s.
- Requires etcd to be trusted by k8s cluster CA (probably only appropriate for self-hosted etcd).

Yeah, this is only appropriate for self-hosting. You don't want your app etcd db to be able to access the k8s main etcd.

@ericchiang
Contributor

What's the latest plan for HA control planes in kubeadm? What's the TLS plan for the API server?

I think etcd should be able to support option 3, but I'd strongly prefer option 1 for self-hosted etcd bootkube clusters if we ever get there. Having everything go through the CSR endpoints seems way cleaner than the etcd-operator managing its own CA.

@hongchaodeng
Member Author

I think etcd should be able to support option 3, but I'd strongly prefer option 1 for self-hosted etcd bootkube clusters if we ever get there. Having everything go through the CSR endpoints seems way cleaner than the etcd-operator managing its own CA.

That's actually a good point -- we could have different methods for self-hosted and normal app cases.

As I mentioned above, option 1 isn't mutually exclusive with options 2 and 3. So I could think of another plan:

  • Implement option 1 first for the self-hosted case. Actually, the parties involved here seem more interested in the self-hosted case.
  • The normal app case can still use option 1 after we implement it, but we can add more methods (option 2 or 3) for normal app cases later.

Just to clarify, the above plan assumes we reach consensus on option 1.

@jamiehannaford
Contributor

If Option 1 is the default for self-hosted, who controls the deployment of the auto-approver controller? I assume the operator would deploy it by default. Where is this code going to live? Does CoreOS have enough bandwidth to write and maintain another codebase/sub-project?

My preference is to start with Option 3 because it's simpler and actually solves a real-world use case. If we're going to support both options, I think kubeadm will use option 3. Once we know node IPs work well with a pre-generated CA, we can then add CSR support in the 1.10 dev cycle.

But if folks really feel Option 1 is better to implement first, that's fine with me. Just adding my 2c.

@hongchaodeng
Member Author

hongchaodeng commented Nov 10, 2017

If Option 1 is the default for self-hosted, who controls the deployment of the auto-approver controller?

It would probably go in the etcd operator pod?
No need to worry about coding and maintenance; we will take care of that. Let's make sure we pick the best and most secure option.

@hongchaodeng
Member Author

@jamiehannaford @luxas
Do you guys have any concerns about option 1 other than writing the approver component?

I don't see any big con for either option 1 or 3, and I want to move forward. Option 1 is still the better choice because it's what the other k8s components do. We just follow the convention instead of reinventing our own, and it is the safest way.

@jamiehannaford
Contributor

It would probably go in the etcd operator pod?

You mean like a sidecar container?

Do you guys have any concerns about option 1 other than writing the approver component?

Not really, no. If it can be implemented well and covers our use-case, I think it's a good idea. It'd be great if we can get this in before the Nov 22 code freeze, but no worries if that's not realistic. If not, HA etcd will have to wait until kubeadm 1.10.

@jamiehannaford
Contributor

I can also help out with this next week if you want @hongchaodeng. Just describe any outstanding work items in issues and CC me.

@hongchaodeng
Member Author

hongchaodeng commented Nov 10, 2017

@jamiehannaford
Great! Thanks.

I will start digging into and trying out the details of option 1. I need to do this as a final check and also to get a better understanding of the potential work. After that I will write down the work items more specifically.

@ericchiang
Contributor

Trying to summarize this thread a bit.

Option 1:

  • Etcd pods run with a sidecar/init container that requests certificates through the certificates API.
  • Pros:
    • etcd-operator doesn't need to handle a CA private key.
  • Cons:
    • Upstream approver doesn't handle non-node CSRs (though it could in the future). Requires writing a custom CSR approver that approves CSRs made by the etcd pod service account.
    • Requires cluster configuration. Wouldn't work for all etcd clusters created by the operator.
    • Requires a custom sidecar/init container.

Option 3:

  • Etcd-operator gets a CA and CA private key. Signs serving certs and peer certs.
  • Pros:
    • Simple for regular etcd clusters (non self-hosted) because etcd-operator knows the services it's going to create ahead of time.
  • Cons:
    • Hard for self-hosted etcd because the etcd-operator doesn't schedule pods directly onto nodes, so it doesn't know what serving address to sign ahead of time.
    • Self-hosted etcd requires etcd-operator to hold onto the cluster's CA private key (stored in a secret?).

I hinted at this above, but depending on kubeadm's "master join" process, could kubeadm just sign all the certs ahead of time? I haven't caught up with kubeadm's HA plans, but I remember something about a "master join token" in previous proposals. So:

Option 4:

  • Kubeadm uses pre-generated TLS assets. When a new master node joins the cluster, it signs etcd serving certs for that node.

I think some flavor of option 3 would be good for regular etcd clusters created by the operator. The "just generate a CA for this cluster and set up TLS for me" mode.

Giving the etcd-operator the cluster CA is a significant choice. The certificates API was built partially to defer the final signing to external PKI, and going around it is something to be cautious of. However, it's clear that option 1 is a lot of work and isn't going to get done before the 1.9 release.
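For reference, a minimal sketch of the client side of Option 1 as summarized above; `buildPeerCSR` and the CN are made-up names. The sidecar/init container would generate a key and a PEM-encoded CSR carrying only its own IP SAN; the PEM would then go into the `spec.request` field of a certificates.k8s.io CertificateSigningRequest for the custom approver to approve and the cluster CA to sign.

```go
package sketch

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"net"
)

// buildPeerCSR generates a key plus a PEM-encoded CSR for this member's own IP.
func buildPeerCSR(podIP net.IP) ([]byte, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.CertificateRequest{
		Subject:     pkix.Name{CommonName: "etcd-peer"},
		IPAddresses: []net.IP{podIP}, // only this member's own address is needed
	}
	der, err := x509.CreateCertificateRequest(rand.Reader, tmpl, key)
	if err != nil {
		return nil, nil, err
	}
	csrPEM := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE REQUEST", Bytes: der})
	// csrPEM would be submitted through the certificates API, and the signed
	// cert read back from the CSR's status once approved.
	return csrPEM, key, nil
}
```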

@hongchaodeng
Member Author

I have tried a quick hack on option 1 and made sure it works. One small surprise that came out is that we might need to let the user pass in additional SANs for the server cert.

As a next step, I will update this PR with a plan for implementation and testing.

@jamiehannaford One question: do you want to work on this? If yes, it is fine to just implement the self-hosted path and ignore the normal app path. It is also fine to leave the testing to us.

@jamiehannaford
Contributor

@hongchaodeng Yeah, I can help out. Self-hosted path is Option 1, right?

@hongchaodeng
Member Author

Self-hosted path is Option 1, right?

Correct.

@hongchaodeng
Member Author

Updated with the implementation and testing roadmap.

@hongchaodeng
Member Author

hongchaodeng commented Nov 14, 2017

@jamiehannaford @ericchiang @luxas
There is one more thing to discuss in the design: client+server TLS.

  1. Peer TLS requires dynamic TLS because peer addresses are by nature dynamic; we need to generate certs based on the node/pod IPs. Client+server TLS does not require this: the APIServer talks to the etcd servers via a static LB DNS name, so dynamic TLS isn't really needed there.
  2. The user might want to pass in additional SANs for the etcd server cert. For the same reason as above, they might need to pass in the LB DNS name, since it's outside the etcd operator's control.
  3. The user might want to control the CA. The etcd peer CA can be ignored by the user, but the user might have an etcd client outside kubeadm/k8s that wants to talk to the etcd cluster, and thus the user needs to control the etcd server CA. (See the sketch after this list.)
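To make points 1–3 concrete, here is a purely hypothetical shape for how these settings could be expressed; none of the field names are the operator's actual API.

```go
package sketch

// StaticClientServerTLS: the apiserver reaches etcd through a stable endpoint,
// so client/server certs can be pre-generated by the user (who keeps control
// of the server CA), with any extra SANs such as an LB DNS name passed in
// explicitly.
type StaticClientServerTLS struct {
	ServerSecret   string   // pre-generated server cert/key
	ClientSecret   string   // pre-generated client cert/key
	ServerCASecret string   // CA the user controls, for clients outside kubeadm/k8s
	AdditionalSANs []string // e.g. an external LB DNS name
}

// DynamicPeerTLS: peer certs are generated per member, because peer addresses
// change as nodes come and go.
type DynamicPeerTLS struct {
	PeerCASecret string // CA cert+key used to sign per-member peer certs
}
```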

@jamiehannaford
Contributor

the APIServer talks to the etcd servers via a static LB DNS name, so dynamic TLS isn't really needed there.

In kubeadm, the apiserver will talk to the local etcd pod via localhost, not an LB, but your point still stands. We shouldn't need dynamic TLS for the serving certs for our use case. @luxas WDYT?

The etcd peer CA can be ignored by the user, but the user might have an etcd client outside kubeadm/k8s that wants to talk to the etcd cluster, and thus the user needs to control the etcd server CA.

How is the peer CA different from the serving CA? Are you suggesting having different CAs for each?

@hongchaodeng
Member Author

hongchaodeng commented Nov 14, 2017

How is the peer CA different from the serving CA?

Peer is about etcd deployment. Deployer/ops owns peer certs.
On the other hand, server is about using etcd. User/developer owns server certs.

Are you suggesting having different CAs for each?

Yes. I couldn't find any docs on this, but the reasoning is the same as above.

@hongchaodeng
Member Author

hongchaodeng commented Nov 14, 2017

So as an update, we are going to implement dynamic TLS for peers only and keep using static TLS for client and server initially. There are other ways to generate TLS certs for client and server, but let's focus on dynamic peer TLS first. This should get rid of the dependency on hostnames and unblock the kubeadm work.

@hongchaodeng
Member Author

hongchaodeng commented Nov 15, 2017

Shall we move this design doc forward? We will implement dynamic peer TLS as the first step, which unblocks kubeadm. What is needed for client-server is still open.


## Alternatives

### Option 2: CA signing cert + key, generate certs on init container
Contributor

This still needs to be removed.

Member Author

sure


> For the initial design, we focus on the self-hosted etcd use case, although the same methodology applies to app etcd.

The [Kubernetes certificates API][k8s_csr] provides the ability to send and approve CSRs.
Contributor

This case was covered in the bootkube auto-TLS proposal. Might want to link to that since it talks about this case a little more in depth?

https://docs.google.com/document/d/1POXVGyEoySvSnx_OftQ2CIWM0HCk27j2VZSOR4XVCDg/edit#heading=h.o7y0uvuoaz4a

@hongchaodeng
Member Author

Fixed.
This doc covers only the initial plan for self-hosted etcd; more specifically, TLS for the peer ports. We can move forward and supplement it once we have more use cases.

@ericchiang
Contributor

lgtm, feel free to merge

@hongchaodeng hongchaodeng merged commit 51a9931 into coreos:master Nov 21, 2017
@hongchaodeng hongchaodeng deleted the tlsd branch November 21, 2017 17:37
@raoofm

raoofm commented Nov 21, 2017

@hongchaodeng

This doc covers only the initial plan for self-hosted etcd.

Does that mean we cannot use the same approach for etcd as an app, i.e. not a Kubernetes use case?

@ericchiang
Contributor

@raoofm Self-hosted etcd and etcd clusters running on top of the cluster have different TLS requirements because their networking and trust domains are different.

For self-hosted etcd, which runs with host networking, you have to use the node addresses (even as new nodes are added), and it has to be trusted by the API server. For clusters running on top of Kubernetes, the operator can know the service DNS names before it deploys the individual pods, and the etcd cluster's certs only have to be trusted by the apps that want to talk to it.
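Purely as an illustration of that contrast; every name and address below is made up.

```go
package sketch

// Self-hosted etcd runs with host networking, so each member's cert is bound
// to a node address and the whole set must be trusted by the API server.
var selfHostedPeerSANs = []string{"10.0.0.4", "master-1"}

// etcd-as-an-app: the operator creates the services itself, so the DNS names
// are known before the pods exist and only the consuming apps need to trust
// the CA.
var appEtcdServerSANs = []string{
	"example-etcd-client.default.svc",
	"*.example-etcd.default.svc",
}
```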

So whatever solution the operator goes with will likely be aimed at one of these use cases initially. We can expand it to the other use case later.

@hongchaodeng
Member Author

@raoofm
That's the initial plan. We can gather requirements for app etcd later and then start implementing it.
