Peer TLS SAN check couldn't reason about non-FQDN endpoints #8797

hongchaodeng · 2017-10-31T20:19:18Z

BUG REPORT

etcd version: v3.2.9

Problem

As reported in #8534

Root cause analysis

When a new secure peer connection is made, etcd server tries to verify that its remote network address (IP address) is in the SAN of the TLS cert. See: https://github.com/coreos/etcd/blob/v3.2.9/pkg/transport/listener_tls.go#L122

Then it tries to check if the IP address is in the SAN by doing (https://github.com/coreos/etcd/blob/v3.2.9/pkg/transport/listener_tls.go#L122):

Reverse lookup the given IP's DNS name. Check if the resolved DNS name matches any SAN? Because the resolved DNS name is FQDN, it won't match anything in the SAN including the wildcard one.
Forward lookup all DNS names in the SAN except the wildcards to match with remote IP. However since it couldn't reason about wildcards some valid IPs are missing.

Example walkthrough

Background

search domain: .c
SAN: *.b
Peer host: a.b

Steps:

etcd servers gets remote address 10.1.2.3
reverse DNS lookup 10.1.2.3 -> a.b.c
Match round 1 (reverse lookup):
- a.b.c doesn't match *.b
Match round 2 (forward lookup):
- There is no DNS names other than wildcards in SAN. So nothing to look up.

Result: match failed.

Expectation: a.b should match *.b. It should succeed.

The text was updated successfully, but these errors were encountered:

hongchaodeng · 2017-10-31T20:19:52Z

/cc @hasbro17 @xiang90 @raoofm

xiang90 · 2017-10-31T20:31:43Z

I am not sure this is something etcd can address.

We have two options:

verify the SAN matches remote address
do not verify the SAN matches remote address

We did 2 before, and people asked to tight security to do 1.

To make this work, SAN should either contain fqdn or IP address.

hongchaodeng · 2017-10-31T21:50:30Z

verify the SAN matches remote address

I agree that's the correct thing to happen. But what needs to be clarified is how to "verify"?

SAN is provided by the server to client so that client can verify the server, not the reverse way. Now peer server is trying to use its own cert to verify the client -- this is not what SAN is for. Instead, when peer A receives a connection from peer B, A should actively request B's certificate and verify SAN against B's address that A would talk to (not necessarily the TCP connection remote address).

And that's what original issue is about, when B tried to join A, only B verified A's cert, but A didn't verify B's cert. The correct fix expected would be A would try to verify B's cert.

xiang90 · 2017-10-31T21:55:06Z

I agree that's the correct thing to happen. But what needs to be clarified is how to "verify"? SAN is provided by the server to client so that client can verify the server, not the reverse way.

That was the old behavior, only the client verifies the server. etcd server communicates both way, not just single way. A peer can be both server and client to other peers depending on its role (follower vs leader). If we allow one direction, not the other then it is still a broken model. And the bi-directional verification is what you requested exactly previously.

I want to understand what is really needed before making any further decision on this one.

raoofm · 2017-11-01T14:31:11Z

@hongchaodeng @xiang90 why can't we have the IP address in the SAN by automating generation of certs on the fly, rather than generating them before deploying (creating them as secrets).

I mean we can use the IP address received from the downward API (status.podIP) for the pod and use an init container or a script that generates the certs before the actual etcd container comes up.

hongchaodeng · 2017-11-01T16:45:30Z

@raoofm
That's etcd deployment issue, not related here. More specifically, etcd operator issue.

We do have plan to support that via k8s csr endpoint: coreos/etcd-operator#1465 .

raoofm · 2017-11-17T19:54:01Z

@xiang90 @hongchaodeng any progress here?

wenjiaswe · 2018-10-01T18:20:04Z

@wenjiaswe

wenjiaswe · 2018-10-01T19:45:17Z

@raoofm looks like @hongchaodeng 's initiated coreos/etcd-operator#1644 in etcd-operator for this issue. Is there anything needed here? Otherwise we are closing out the long pending issues.

Please feel free to reopen it if it's still relevant.

hongchaodeng mentioned this issue Oct 31, 2017

*: make default etcd version to 3.2.10 coreos/etcd-operator#1607

Merged

1 task

gyuho mentioned this issue Oct 31, 2017

Documentation: explain TLS changes #8798

Closed

14 tasks

hongchaodeng mentioned this issue Nov 10, 2017

doc: dynamic tls design coreos/etcd-operator#1644

Merged

gyuho added type/feature area/tls labels Feb 25, 2018

wenjiaswe closed this as completed Oct 1, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Peer TLS SAN check couldn't reason about non-FQDN endpoints #8797

Peer TLS SAN check couldn't reason about non-FQDN endpoints #8797

hongchaodeng commented Oct 31, 2017

hongchaodeng commented Oct 31, 2017

xiang90 commented Oct 31, 2017

hongchaodeng commented Oct 31, 2017 •

edited

Loading

xiang90 commented Oct 31, 2017 •

edited

Loading

raoofm commented Nov 1, 2017

hongchaodeng commented Nov 1, 2017

raoofm commented Nov 17, 2017

wenjiaswe commented Oct 1, 2018

wenjiaswe commented Oct 1, 2018

Peer TLS SAN check couldn't reason about non-FQDN endpoints #8797

Peer TLS SAN check couldn't reason about non-FQDN endpoints #8797

Comments

hongchaodeng commented Oct 31, 2017

Problem

Root cause analysis

Example walkthrough

hongchaodeng commented Oct 31, 2017

xiang90 commented Oct 31, 2017

hongchaodeng commented Oct 31, 2017 • edited Loading

xiang90 commented Oct 31, 2017 • edited Loading

raoofm commented Nov 1, 2017

hongchaodeng commented Nov 1, 2017

raoofm commented Nov 17, 2017

wenjiaswe commented Oct 1, 2018

wenjiaswe commented Oct 1, 2018

hongchaodeng commented Oct 31, 2017 •

edited

Loading

xiang90 commented Oct 31, 2017 •

edited

Loading