dns discovery: set initial-advertise-peer-urls as https url #8445

Closed
zyf0330 opened this issue Aug 24, 2017 · 41 comments

@zyf0330

zyf0330 commented Aug 24, 2017

I use nginx to proxy the etcd server and terminate SSL; nginx uses HTTP/2. etcd uses DNS discovery. After startup, the cluster does not work normally.
This is my etcd startup command:

etcd --name thor{num} --data-dir /var/lib/etcd --initial-cluster-token etcd-cluster --initial-cluster-state new --discovery-srv example.com --initial-advertise-peer-urls https://thor01.example.com:4760 --listen-peer-urls http://127.0.0.1:2380 --advertise-client-urls https://thor01.example.com:4758 --listen-client-urls http://127.0.0.1:2379

log

2017-08-24 15:47:09.139152 I | etcdmain: etcd Version: 3.2.6
2017-08-24 15:47:09.139187 I | etcdmain: Git SHA: 9d43462
2017-08-24 15:47:09.139190 I | etcdmain: Go Version: go1.8.3
2017-08-24 15:47:09.139195 I | etcdmain: Go OS/Arch: linux/amd64
2017-08-24 15:47:09.139198 I | etcdmain: setting maximum number of CPUs to 4, total number of available CPUs is 4
2017-08-24 15:47:09.139246 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2017-08-24 15:47:09.139307 I | embed: listening for peers on http://127.0.0.1:2380
2017-08-24 15:47:09.139337 I | embed: listening for client requests on 127.0.0.1:2379
2017-08-24 15:47:09.146167 I | etcdserver: name = thor01
2017-08-24 15:47:09.146180 I | etcdserver: data dir = /var/lib/etcd
2017-08-24 15:47:09.146185 I | etcdserver: member dir = /var/lib/etcd/member
2017-08-24 15:47:09.146188 I | etcdserver: heartbeat = 100ms
2017-08-24 15:47:09.146191 I | etcdserver: election = 1000ms
2017-08-24 15:47:09.146194 I | etcdserver: snapshot count = 100000
2017-08-24 15:47:09.146204 I | etcdserver: advertise client URLs = https://101.251.220.234:4758
2017-08-24 15:47:09.147013 I | etcdserver: restarting member f5e2bce5ae996f7 in cluster 771871d99223fdd1 at commit index 3
2017-08-24 15:47:09.147049 I | raft: f5e2bce5ae996f7 became follower at term 15
2017-08-24 15:47:09.147058 I | raft: newRaft f5e2bce5ae996f7 [peers: [], term: 15, commit: 3, applied: 0, lastindex: 3, lastterm: 1]
2017-08-24 15:47:09.149173 W | auth: simple token is not cryptographically signed
2017-08-24 15:47:09.150181 I | etcdserver: starting server... [version: 3.2.6, cluster version: to_be_decided]
2017-08-24 15:47:09.150888 I | etcdserver/membership: added member f5e2bce5ae996f7 [http://thor01.example.com:4760] to cluster 771871d99223fdd1
2017-08-24 15:47:09.150975 I | etcdserver/membership: added member fea6d1bf0db64b9 [http://thor02.example.com:4760] to cluster 771871d99223fdd1
2017-08-24 15:47:09.150998 I | rafthttp: starting peer fea6d1bf0db64b9...
2017-08-24 15:47:09.151021 I | rafthttp: started HTTP pipelining with peer fea6d1bf0db64b9
2017-08-24 15:47:09.151254 I | rafthttp: started streaming with peer fea6d1bf0db64b9 (writer)
2017-08-24 15:47:09.151932 I | rafthttp: started streaming with peer fea6d1bf0db64b9 (writer)
2017-08-24 15:47:09.153172 I | rafthttp: started peer fea6d1bf0db64b9
2017-08-24 15:47:09.153194 I | rafthttp: added peer fea6d1bf0db64b9
2017-08-24 15:47:09.153221 I | rafthttp: started streaming with peer fea6d1bf0db64b9 (stream Message reader)
2017-08-24 15:47:09.153237 I | rafthttp: started streaming with peer fea6d1bf0db64b9 (stream MsgApp v2 reader)
2017-08-24 15:47:09.153266 I | etcdserver/membership: added member 3aea22b89d7c833a [http://thor03.example.com:4760] to cluster 771871d99223fdd1
2017-08-24 15:47:09.153280 I | rafthttp: starting peer 3aea22b89d7c833a...
2017-08-24 15:47:09.153308 I | rafthttp: started HTTP pipelining with peer 3aea22b89d7c833a
2017-08-24 15:47:09.153533 I | rafthttp: started streaming with peer 3aea22b89d7c833a (writer)
2017-08-24 15:47:09.154117 I | rafthttp: started streaming with peer 3aea22b89d7c833a (writer)
2017-08-24 15:47:09.154804 I | rafthttp: started peer 3aea22b89d7c833a
2017-08-24 15:47:09.154825 I | rafthttp: added peer 3aea22b89d7c833a
2017-08-24 15:47:09.154839 I | rafthttp: started streaming with peer 3aea22b89d7c833a (stream Message reader)
2017-08-24 15:47:09.154929 I | rafthttp: started streaming with peer 3aea22b89d7c833a (stream MsgApp v2 reader)
2017-08-24 15:47:10.047389 I | raft: f5e2bce5ae996f7 is starting a new election at term 15
2017-08-24 15:47:10.047454 I | raft: f5e2bce5ae996f7 became candidate at term 16
2017-08-24 15:47:10.047467 I | raft: f5e2bce5ae996f7 received MsgVoteResp from f5e2bce5ae996f7 at term 16
2017-08-24 15:47:10.047476 I | raft: f5e2bce5ae996f7 [logterm: 1, index: 3] sent MsgVote request to fea6d1bf0db64b9 at term 16
2017-08-24 15:47:10.047515 I | raft: f5e2bce5ae996f7 [logterm: 1, index: 3] sent MsgVote request to 3aea22b89d7c833a at term 16
2017-08-24 15:47:11.147341 I | raft: f5e2bce5ae996f7 is starting a new election at term 16
2017-08-24 15:47:11.147371 I | raft: f5e2bce5ae996f7 became candidate at term 17
2017-08-24 15:47:11.147382 I | raft: f5e2bce5ae996f7 received MsgVoteResp from f5e2bce5ae996f7 at term 17
2017-08-24 15:47:11.147390 I | raft: f5e2bce5ae996f7 [logterm: 1, index: 3] sent MsgVote request to fea6d1bf0db64b9 at term 17
2017-08-24 15:47:11.147412 I | raft: f5e2bce5ae996f7 [logterm: 1, index: 3] sent MsgVote request to 3aea22b89d7c833a at term 17
2017-08-24 15:47:12.147331 I | raft: f5e2bce5ae996f7 is starting a new election at term 17
2017-08-24 15:47:12.147369 I | raft: f5e2bce5ae996f7 became candidate at term 18
2017-08-24 15:47:12.147383 I | raft: f5e2bce5ae996f7 received MsgVoteResp from f5e2bce5ae996f7 at term 18
2017-08-24 15:47:12.147396 I | raft: f5e2bce5ae996f7 [logterm: 1, index: 3] sent MsgVote request to fea6d1bf0db64b9 at term 18
2017-08-24 15:47:12.147408 I | raft: f5e2bce5ae996f7 [logterm: 1, index: 3] sent MsgVote request to 3aea22b89d7c833a at term 18
2017-08-24 15:47:13.747326 I | raft: f5e2bce5ae996f7 is starting a new election at term 18
2017-08-24 15:47:13.747360 I | raft: f5e2bce5ae996f7 became candidate at term 19
2017-08-24 15:47:13.747370 I | raft: f5e2bce5ae996f7 received MsgVoteResp from f5e2bce5ae996f7 at term 19
2017-08-24 15:47:13.747393 I | raft: f5e2bce5ae996f7 [logterm: 1, index: 3] sent MsgVote request to fea6d1bf0db64b9 at term 19
2017-08-24 15:47:13.747402 I | raft: f5e2bce5ae996f7 [logterm: 1, index: 3] sent MsgVote request to 3aea22b89d7c833a at term 19
2017-08-24 15:47:14.153415 W | rafthttp: health check for peer fea6d1bf0db64b9 could not connect: invalid character '<' looking for beginning of value
2017-08-24 15:47:14.154961 W | rafthttp: health check for peer 3aea22b89d7c833a could not connect: invalid character '<' looking for beginning of value
2017-08-24 15:47:15.147290 I | raft: f5e2bce5ae996f7 is starting a new election at term 19
2017-08-24 15:47:15.147317 I | raft: f5e2bce5ae996f7 became candidate at term 20
2017-08-24 15:47:15.147327 I | raft: f5e2bce5ae996f7 received MsgVoteResp from f5e2bce5ae996f7 at term 20
2017-08-24 15:47:15.147335 I | raft: f5e2bce5ae996f7 [logterm: 1, index: 3] sent MsgVote request to fea6d1bf0db64b9 at term 20
2017-08-24 15:47:15.147341 I | raft: f5e2bce5ae996f7 [logterm: 1, index: 3] sent MsgVote request to 3aea22b89d7c833a at term 20
2017-08-24 15:47:16.150528 E | etcdserver: publish error: etcdserver: request timed out
......

As you see, it shows added member f5e2bce5ae996f7 [http://thor01.example.com:4760] to cluster 771871d99223fdd1. I think it should be https.

This is nginx config

server {
        listen       4760 ssl http2;
        server_name  thor01.example.com;

        ssl_certificate fullchain.pem;
        ssl_certificate_key privkey.pem;
        ssl_protocols       TLSv1 TLSv1.1 TLSv1.2;
        ssl_ciphers         HIGH:!aNULL:!MD5;
        ssl_session_cache   shared:SSL:10m;
        ssl_session_timeout 10m;

        access_log off;

        location / {
                proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
                proxy_set_header X-Real-IP $remote_addr;
                proxy_set_header Host $host;
                proxy_next_upstream off;
                proxy_pass http://127.0.0.1:2380;
       }
}
server {
        listen       4758 ssl http2;
        server_name  thor01.example.com;

        ssl_certificate fullchain.pem;
        ssl_certificate_key privkey.pem;
        ssl_protocols       TLSv1 TLSv1.1 TLSv1.2;
        ssl_ciphers         HIGH:!aNULL:!MD5;
        ssl_session_cache   shared:SSL:10m;
        ssl_session_timeout 10m;

        access_log off;

        location / {
                proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
                proxy_set_header X-Real-IP $remote_addr;
                proxy_set_header Host $host;
                proxy_next_upstream off;

                proxy_pass http://127.0.0.1:2379;
       }
}
@heyitsanthony
Contributor

How are the SRV records configured?

@zyf0330
Author

zyf0330 commented Aug 24, 2017

dig +noall +answer SRV _etcd-server._tcp.example.com
_etcd-server._tcp.example.com. 1 IN	SRV	0 0 4760 thor01.example.com.
_etcd-server._tcp.example.com. 1 IN	SRV	0 0 4760 thor02.example.com.
_etcd-server._tcp.example.com. 1 IN	SRV	0 0 4760 thor03.example.com.

@heyitsanthony
Contributor

try _etcd-server-ssl._tcp.example.com
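
The SRV service name decides which scheme the discovered peer URLs get. A minimal Go sketch of that mapping, assuming the convention referenced in this comment (`_etcd-server-ssl` for TLS peers, `_etcd-server` for plaintext peers). The helpers `schemeForService` and `peerURL` are hypothetical names for illustration, not etcd's own API.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"strings"
)

// schemeForService maps an SRV service label to a URL scheme.
// Assumption: "etcd-server-ssl" means TLS peers, anything else plaintext.
func schemeForService(service string) string {
	if service == "etcd-server-ssl" {
		return "https"
	}
	return "http"
}

// peerURL builds one peer URL from the target and port of an SRV answer.
func peerURL(service, target string, port uint16) string {
	host := strings.TrimSuffix(target, ".") // drop the trailing root dot
	return schemeForService(service) + "://" + net.JoinHostPort(host, strconv.Itoa(int(port)))
}

func main() {
	// Records shaped like the dig output earlier in this thread.
	records := []*net.SRV{
		{Target: "thor01.example.com.", Port: 4760},
		{Target: "thor02.example.com.", Port: 4760},
	}
	for _, service := range []string{"etcd-server", "etcd-server-ssl"} {
		for _, r := range records {
			fmt.Println(service, "->", peerURL(service, r.Target, r.Port))
		}
	}
}
```

This would explain the http:// member URLs in the log above: only `_etcd-server` records were published, so the plaintext scheme was chosen regardless of the https advertise URL.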

@zyf0330
Author

zyf0330 commented Aug 24, 2017

That pointed me in the right direction, thanks!
After setting the SRV records for SSL, I got this in the etcd log:

rafthttp: health check for peer 20fdc7247054886c could not connect: x509: certificate is valid for thor01.example.com, not example.com
rafthttp: health check for peer 20fdc7247054886c could not connect: x509: certificate is valid for thor03.example.com, not example.com

My cert is from Let's Encrypt.

@heyitsanthony
Contributor

The certificates seem to be matched against example.com instead of *.example.com.

@zyf0330
Author

zyf0330 commented Aug 24, 2017

This is the cert info from certbot. Let's Encrypt certs cannot cover a wildcard domain.

Certificate Name: thor01.example.com
Domains: thor01.example.com
Expiry Date: 2017-11-22 01:43:00+00:00 (VALID: 89 days)
Certificate Path: ...
Private Key Path: ...

@heyitsanthony
Contributor

I believe this is a bug.

Discovery bootstrapping will use discovery-srv (example.com) for the ServerName to avoid a mitm cert attack:

if strings.Contains(clusterStr, "https://") && cfg.PeerTLSInfo.CAFile == "" {
    cfg.PeerTLSInfo.ServerName = cfg.DNSCluster
}

but the machine hosts have subdomain.example.com, so they won't match. It should probably be "*." + cfg.DNSCluster
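
The mismatch can be reproduced with Go's standard hostname check alone. A minimal sketch, assuming only that the TLS client verifies the peer certificate against the configured ServerName; `hostnameError` is a hypothetical helper, and the certificate is a bare struct holding just the SAN list rather than a real signed cert.

```go
package main

import (
	"crypto/x509"
	"fmt"
)

// hostnameError reports what Go's x509 hostname check says when a peer
// presenting a certificate with the given DNS SANs is verified against
// serverName. Only the SAN list matters for this check, so no signed
// certificate is needed.
func hostnameError(sans []string, serverName string) error {
	cert := &x509.Certificate{DNSNames: sans}
	return cert.VerifyHostname(serverName)
}

func main() {
	sans := []string{"thor01.example.com"} // the per-host cert from this issue

	// ServerName forced to the discovery domain: rejected.
	fmt.Println(hostnameError(sans, "example.com"))
	// ServerName equal to the peer's own host name: accepted.
	fmt.Println(hostnameError(sans, "thor01.example.com"))
}
```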

@zyf0330
Author

zyf0330 commented Aug 24, 2017

Thanks a lot! So will you fix it?

@zyf0330
Author

zyf0330 commented Aug 28, 2017

I have attempted a fix, but Go is not my language and solving this problem requires some consideration of the whole project, so I am not able to provide one.

@zyf0330
Author

zyf0330 commented Oct 9, 2017

Could you help me with this commit?
I changed the code to cfg.PeerTLSInfo.ServerName = "*." + cfg.DNSCluster, but it doesn't work. Did any commits between Aug 28 and today change how ServerName is matched?

@zyf0330
Author

zyf0330 commented Oct 9, 2017

I just tested and got:

2017-10-09 11:14:36.779103 W | etcdserver: could not get cluster response from https://node01.example.com:4760: Get https://node01.example.com:4760/members: x509: certificate is valid for node01.example.com, not *.example.com

What is happening?

@xiang90 xiang90 reopened this Oct 9, 2017
@xiang90
Contributor

xiang90 commented Oct 9, 2017

/cc @gyuho can you take a look?

@gyuho gyuho self-assigned this Oct 9, 2017
@gyuho
Contributor

gyuho commented Oct 9, 2017

@zyf0330 You probably need to regenerate the certs with a wildcard, since:

Discovery bootstrapping will use discovery-srv (example.com) for the ServerName to avoid a mitm cert attack:

@zyf0330
Author

zyf0330 commented Oct 10, 2017

Do you mean I need to set the ServerName of the cert to *.example.com when the server's domain is node01.example.com? If so, I can only say that Let's Encrypt doesn't support wildcard domains. And I don't think it is a good idea too.

@gyuho
Contributor

gyuho commented Oct 11, 2017

And I don't think it is a good idea too.

Can you explain why? You are already passing --discovery-srv example.com...

@zyf0330
Author

zyf0330 commented Oct 11, 2017

Actually, the hard problem is that Let's Encrypt doesn't support wildcard domains. As for what I said above, maybe I got something wrong.

@xiang90
Contributor

xiang90 commented Oct 11, 2017

@gyuho Well... actually it seems that the server name does not support wildcard matching. The fix we have is not exactly what @zyf0330 wants.

@heyitsanthony @lclarkmichalek Do you still remember how we expect this check to work? I am sure at some point someone tested it...

@zyf0330
Author

zyf0330 commented Oct 11, 2017

I think an exact match on *.example.com is not the right approach, and it will limit the user's choice of cert ServerName. Why not do a wildcard match, for example using a regex pattern like (.+\.)?${domain}?
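
To make the proposal concrete, here is a minimal Go sketch of what such a pattern would accept. It is purely an illustration of the suggestion in this comment: neither etcd nor Go's TLS stack matches names this way, and `domainMatcher` and `matchesDomain` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"regexp"
)

// domainMatcher builds the pattern (.+\.)?${domain}, anchored at both ends
// and with the domain escaped so that its dots match literally.
func domainMatcher(domain string) *regexp.Regexp {
	return regexp.MustCompile(`^(.+\.)?` + regexp.QuoteMeta(domain) + `$`)
}

// matchesDomain reports whether host is the domain itself or any name
// beneath it, at any depth.
func matchesDomain(domain, host string) bool {
	return domainMatcher(domain).MatchString(host)
}

func main() {
	hosts := []string{
		"example.com",
		"thor01.example.com",
		"a.b.example.com",
		"example2.com",
		"badexample.com",
	}
	for _, host := range hosts {
		fmt.Printf("%-20s %v\n", host, matchesDomain("example.com", host))
	}
}
```

Note that the pattern accepts every name under the domain at any depth, which is broader than an X.509 wildcard (a single label).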

@xiang90
Contributor

xiang90 commented Oct 11, 2017

@zyf0330 It needs an exact match. Regex and wildcards are not supported, so you have to use a wildcard in your cert too.

@zyf0330
Author

zyf0330 commented Oct 11, 2017 via email

@xiang90 xiang90 changed the title from "set initial-advertise-peer-urls as https url" to "dns discovery: set initial-advertise-peer-urls as https url" Oct 14, 2017
@xiang90 xiang90 removed this from the v3.3.0 milestone Oct 14, 2017
@xiang90 xiang90 added this to the v3.4.0 milestone Oct 14, 2017
@bdhess

bdhess commented Oct 23, 2017

I just want to note, since I went down the rabbit hole of trying to get a working TLS config set up with DNS discovery, that perhaps the documentation could be clearer about the requirements. I ended up getting everything working by adding two entries to the Subject Alt Name on my certificate: a DNS name for the hostname, and a DNS name for the etcd discovery address. A wildcard isn't actually required, just potentially more convenient.

@zyf0330
Author

zyf0330 commented Oct 23, 2017

@bdhess Do you set one of the DNS names to the root domain name? For example, if your etcd address is etcd1.example.com, do you set two DNS names on the cert: example.com and etcd1.example.com?

@bdhess

bdhess commented Oct 23, 2017

Yup, I'm setting CN=hostname.example.com as the subject and adding DNS SAN entries for both hostname.example.com and etcd.example.com.

@zyf0330
Author

zyf0330 commented Oct 24, 2017

I agree your way is effective, but I think supporting wildcard domains in SRV discovery is necessary.
And it's hard for me to add the root domain name to my cert issued by Let's Encrypt.

@bdhess

bdhess commented Oct 24, 2017

@gyuho @xiang90 @heyitsanthony actually I believe the change in #8651 would break the configuration I suggested. And it's not clear why you should permit wildcard matches.

Basically, the mental model I have is that a particular etcd node is addressable in multiple contexts: either (1.) under the identity of an individual node, in the context of a direct connection, or (2.) under the identity of the cluster, in the context of a discovered connection.

Wildcards are one way to achieve the goal of having a single certificate that's valid for two different identities. But that's definitely not the only way, nor is it preferred, since the wildcard *.example.com is valid for etcd-node-1.example.com and etcd.example.com, but also for webserver.example.com, database.example.com, and so on. A better approach is to issue a certificate that's only valid for etcd-node-1.example.com and etcd.example.com using SANs.
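
The difference in scope can be checked with Go's standard hostname verification. A minimal sketch, assuming bare certificates that carry only a SAN list; `validFor` is a hypothetical helper, and the host names are the ones used in this comment.

```go
package main

import (
	"crypto/x509"
	"fmt"
)

// validFor reports whether a certificate carrying the given DNS SANs passes
// Go's standard hostname check for host.
func validFor(sans []string, host string) bool {
	cert := &x509.Certificate{DNSNames: sans}
	return cert.VerifyHostname(host) == nil
}

func main() {
	wildcard := []string{"*.example.com"}
	exact := []string{"etcd-node-1.example.com", "etcd.example.com"}

	hosts := []string{
		"etcd-node-1.example.com",
		"etcd.example.com",
		"webserver.example.com",
		"database.example.com",
	}
	for _, host := range hosts {
		fmt.Printf("%-26s wildcard=%-5v exact=%v\n",
			host, validFor(wildcard, host), validFor(exact, host))
	}
}
```

The wildcard certificate passes for all four names, while the certificate with two explicit SANs passes only for the node and the cluster name.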

@zyf0330
Author

zyf0330 commented Oct 25, 2017

@bdhess I agree your way is effective, but that doesn't mean supporting wildcard domains is bad.
The DNS Discovery doc says: "If etcd is using TLS without a custom certificate authority, the discovery domain (e.g., example.com) must match the SRV record domain (e.g., infra1.example.com)." So this behavior is what is expected, and it is a bug that wildcard domains are not supported.
Patch #8651 is not a proper fix for me, because I cannot get a cert for *.example.com from Let's Encrypt. They will support this next year. I would like etcd to do a regex match that behaves like *.example.com.

@bdhess

bdhess commented Oct 31, 2017

Custom wildcard matching is bad from a security perspective, because it is inconsistent with standard X.509 idioms. Assume an etcd deployment with an SRV record at app.example.com and hosts node-1.app.example.com and node-2.app.example.com. A PKI identity issued for metrics.app.example.com would apparently be sufficient to join the etcd cluster, even if the cluster administrator doesn't intend for that to be the case.

In a small shop, the etcd cluster administrator and the PKI administrator may be the same person, so the point is moot. But in a larger organization, maintaining effective security controls on etcd requires that the PKI administrator understand the custom etcd security model. This doesn't scale.

@lclarkmichalek
Contributor

While enumerating the issues with x509 validation is an entertaining pastime, the fact remains that the expected behaviour of TLS validation is to validate wildcard certificates correctly. If you find wildcard certs to be an issue, then you can work around this by not issuing any wildcard certificates.

To take your example of app.example.com a little more concretely, what I usually see is:

etcd.prd.example.com
n00.etcd.prd.example.com
n01.etcd.prd.example.com
n01.etcd.prd.example.com

Now, if I hand you a certificate for *.etcd.prd.example.com, what would you expect to be able to do?

etcd TLS setup is hard enough as it is, mostly because it breaks a lot of expectations people have (from HTTPS validation usually). The security gain from disallowing wildcard certificates is not worth the breaking of expectations, and the subsequent insecure clusters when people realise that running unauthenticated is a whole lot easier.

@bdhess

bdhess commented Oct 31, 2017

I'm not saying that using actual wildcard certificates should be disallowed. I'm saying that the matching rules that are built into etcd shouldn't add wildcarding of their own. I'm specifically talking about changes like #8651.

@xiang90
Contributor

xiang90 commented Oct 31, 2017

@bdhess Previously it tried to do an exact match without the subdomain. How would that even get verified correctly? Maybe I misunderstood something earlier in this issue.

I will go over the thread again later. I probably missed something important.

@bdhess

bdhess commented Oct 31, 2017

@xiang90 using subject alt names

Here's an example cert for an etcd node named node1.etcd.example.com that is a part of the etcd.example.com cluster:

 ❯ openssl x509 -in node1.etcd.example.com.cer -text
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            6e:f0:28:c2:42:01:f2:16:73:70:80:c0:5d:50:86:51:37:37:d6:7f:9d:f4:55:7e:69:64:7d:6a:d6:57:e4:6c
        Signature Algorithm: ecdsa-with-SHA256
        Issuer: CN=etcd.example.com Root Authority
        Validity
            Not Before: Oct 31 18:00:42 2017 GMT
            Not After : Oct 31 19:00:42 2019 GMT
        Subject: CN=node1.etcd.example.com
        Subject Public Key Info:
            Public Key Algorithm: id-ecPublicKey
            EC Public Key:
                pub: 
                    04:5d:3f:80:98:3d:7d:f5:39:fe:05:35:41:83:12:
                    cb:ab:09:3f:99:d0:b0:99:06:5d:9b:3f:4f:35:d1:
                    4b:43:b8:f2:79:eb:d9:5a:84:ba:42:d5:61:3c:8d:
                    c0:bc:14:aa:76:8f:8d:31:56:16:98:bf:00:1a:06:
                    aa:62:70:89:e1
                ASN1 OID: prime256v1
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature
            X509v3 Extended Key Usage: 
                TLS Web Client Authentication, TLS Web Server Authentication
            X509v3 Basic Constraints: critical
                CA:FALSE
            X509v3 Subject Alternative Name: 
                DNS:node1.etcd.example.com, DNS:etcd.example.com
    Signature Algorithm: ecdsa-with-SHA256
        30:46:02:21:00:91:3c:75:92:b4:6c:86:7d:c1:c5:6c:65:d7:
        f5:28:2a:44:fa:d9:04:10:51:52:50:40:34:6c:81:07:55:eb:
        92:02:21:00:cb:5f:3c:c6:66:f1:7f:dc:70:57:80:69:ae:7e:
        5b:4b:4a:4c:d1:ae:cc:13:b7:6d:2d:7e:b8:3c:c3:e3:dc:f9
-----BEGIN CERTIFICATE-----
MIIB2DCCAX2gAwIBAgIgbvAowkIB8hZzcIDAXVCGUTc31n+d9FV+aWR9atZX5Gww
CgYIKoZIzj0EAwIwNDEyMDAGA1UEAxMpZXRjZC5leGFtcGxlLmNvbSBFdGNkIEhv
c3QgUm9vdCBBdXRob3JpdHkwHhcNMTcxMDMxMTgwMDQyWhcNMTkxMDMxMTkwMDQy
WjAhMR8wHQYDVQQDExZub2RlMS5ldGNkLmV4YW1wbGUuY29tMFkwEwYHKoZIzj0C
AQYIKoZIzj0DAQcDQgAEXT+AmD199Tn+BTVBgxLLqwk/mdCwmQZdmz9PNdFLQ7jy
eevZWoS6QtVhPI3AvBSqdo+NMVYWmL8AGgaqYnCJ4aN0MHIwDgYDVR0PAQH/BAQD
AgeAMB0GA1UdJQQWMBQGCCsGAQUFBwMCBggrBgEFBQcDATAMBgNVHRMBAf8EAjAA
MDMGA1UdEQQsMCqCFm5vZGUxLmV0Y2QuZXhhbXBsZS5jb22CEGV0Y2QuZXhhbXBs
ZS5jb20wCgYIKoZIzj0EAwIDSQAwRgIhAJE8dZK0bIZ9wcVsZdf1KCpE+tkEEFFS
UEA0bIEHVeuSAiEAy188xmbxf9xwV4Bprn5bS0pM0a7ME7dtLX64PMPj3Pk=
-----END CERTIFICATE-----
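
A certificate with the same shape (P-256 key, client and server auth, two DNS SANs) can be produced with Go's standard library. A minimal sketch: `newPeerCert` is a hypothetical helper, and the certificate is self-signed only to keep the example short, whereas the one shown above was issued by a separate root authority.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newPeerCert creates a self-signed P-256 certificate whose SANs are the
// node's host name and the discovery domain. Self-signing is an assumption
// made for brevity; a real deployment would sign with its own CA.
func newPeerCert(host, discoveryDomain string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	now := time.Now()
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: host},
		NotBefore:             now,
		NotAfter:              now.Add(2 * 365 * 24 * time.Hour),
		KeyUsage:              x509.KeyUsageDigitalSignature,
		ExtKeyUsage:           []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth, x509.ExtKeyUsageServerAuth},
		BasicConstraintsValid: true,
		DNSNames:              []string{host, discoveryDomain},
	}
	// Template and parent are the same, so the result is self-signed.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	cert, err := newPeerCert("node1.etcd.example.com", "etcd.example.com")
	if err != nil {
		panic(err)
	}
	fmt.Println("SANs:", cert.DNSNames)
	fmt.Println("node name ok:", cert.VerifyHostname("node1.etcd.example.com") == nil)
	fmt.Println("discovery domain ok:", cert.VerifyHostname("etcd.example.com") == nil)
}
```

Because both names are listed explicitly, the same certificate verifies for a direct connection to the node and for the cluster identity used by discovery, without any wildcard.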

@xiang90
Contributor

xiang90 commented Nov 8, 2017

@bdhess @zyf0330 @stephanh @lclarkmichalek

I am inclined to revert #8651.

So if you have 1.example.com and want to use SRV discovery under example.com domain, then you need to have

1.example.com
example.com

as Subject Alternative Names before #8651.

With #8651, you need to have

*.example.com

#8651 might save one SAN entry if you can afford a wildcard in the SAN. But it does not really bring any value if you want to put the exact subdomain into the SAN.

But @zyf0330 said he cannot really have a wildcard cert, so I am not sure what we are fixing here, other than bringing in an incompatible change.

opinions?

@zyf0330
Author

zyf0330 commented Nov 9, 2017

@xiang90 I mean I cannot get a wildcard cert whose domain name is *.example.com, but I can get one for an exact domain like 1.example.com.
#8651 matches just the exact domain name *.example.com, but I want it to match domain with regex whose root domain is example.com, like 1.example.com or a.example.com.
I'm just not clear what @bdhess wants now, because he was opposed to wildcard domain matching. Does he want you to specify a pattern for the domain?

@xiang90
Contributor

xiang90 commented Nov 9, 2017

but I want it to match domain with regex whose

Correct me if I am wrong, but no such thing exists. Hostname verification ONLY supports exact match: you either put *.example.com there to match ONLY *.example.com, or example.com to match example.com.

If you think TLS can use regex to do verification, please show me the RFC and probably a Go example.

@zyf0330
Author

zyf0330 commented Nov 9, 2017

I know.
See the error message "rafthttp: health check for peer 20fdc7247054886c could not connect: x509: certificate is valid for thor01.example.com, not example.com" again.
My question is why the probe requests the URL example.com but gets the cert for thor01.example.com. If it requested thor01.example.com, it should get the correct cert.
I cannot find where this message is produced, or where AddPeer (defined in rafthttp/transport.go) is called.

@xiang90
Contributor

xiang90 commented Nov 9, 2017

@zyf0330

OK. Now I know why you are confused.

etcd does not try to connect to example.com, but it does expect the cert from its peers to have SAN with example.com (without the patch) or *.example.com (with the patch) when the cluster is bootstrapped with DNS SRV.

The reason for the check is to ensure the DNS SRV discovery is not hijacked to return you domains that have nothing to do with example.com.

We are going to revert the patch, and improve the documentation on this.

If you cannot put the root domain in SAN, the only option for you is to disable the check. Maybe you can submit a PR to add a flag for that. I am not sure about it at the moment though.
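
The rule described here can be written as a small standalone Go function. This is a sketch that restates the snippet quoted earlier in this thread; `peerServerName` and its parameters are hypothetical names, not etcd's actual API.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"strings"
)

// peerServerName restates the check quoted earlier in this thread: when the
// discovered cluster uses https and no custom CA file is configured, the
// discovery domain becomes the name every peer certificate must cover.
// An empty result means "no override".
func peerServerName(clusterStr, caFile, discoveryDomain string) string {
	if strings.Contains(clusterStr, "https://") && caFile == "" {
		return discoveryDomain
	}
	return ""
}

func main() {
	cluster := "thor01=https://thor01.example.com:4760,thor02=https://thor02.example.com:4760"
	cfg := &tls.Config{ServerName: peerServerName(cluster, "", "example.com")}
	// With ServerName set, Go's TLS client checks the peer certificate
	// against this name instead of the host name it dialed.
	fmt.Printf("peer certs must be valid for %q\n", cfg.ServerName)
}
```

This is why etcd never connects to example.com itself, yet a peer certificate without example.com in its SAN list is rejected.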

@zyf0330
Author

zyf0330 commented Nov 10, 2017

I don't understand the above explanation.

etcd does not try to connect to example.com, but it does expect the cert from its peers to have SAN with example.com (without the patch) or *.example.com (with the patch) when the cluster is bootstrapped with DNS SRV.

When etcd gets a domain like 1.example.com from the DNS SRV records of example.com, if it thinks *.example.com is valid, why does it think a 1.example.com host with a 1.example.com cert is invalid? I just don't know how it decides, or where the code for this check is.
And if etcd gets the domain 1.example2.com (with a matching 1.example2.com cert, or a non-matching other.com cert) from the DNS SRV records of example.com, then I agree it is right for etcd to treat it as invalid.

@xiang90
Contributor

xiang90 commented Nov 10, 2017

It has nothing to do with etcd; it is how TLS works in general. I would suggest you learn how it works at this point.

@zyf0330
Author

zyf0330 commented Nov 10, 2017

I just read some code and learned something. In DNS discovery mode, cfg.PeerTLSInfo.ServerName is set to "*." + cfg.DNSCluster (with the patch) and is used for all peer URLs. With static discovery, each peer URL is matched against the cert using its own domain.
So these are my thoughts:

  • I know matching the cert is not etcd's job, but etcd sets the ServerName of TLSInfo for it.
  • I haven't read more of the code, so I don't know why etcd doesn't set a separate ServerName in TLSInfo for each peer. Please tell me why, thanks.

Finally, if etcd cannot do this, then just revert that patch. Adding a flag to skip the check is not necessary, at least for me.

@bdhess

bdhess commented Nov 14, 2017

@xiang90 sorry for the delay in my reply-- I support reverting #8651

@gyuho
Contributor

gyuho commented Nov 15, 2017

@bdhess @zyf0330 We've reverted that change in #8884.
And plan to publish another patch release with it, sometime next week.

@gyuho gyuho closed this as completed Nov 15, 2017
zultron added a commit to zultron/etcd that referenced this issue Jun 26, 2018
Using SRV discovery with TLS, the SRV record must be in the DNS SAN or clustering will fail.

This is a new requirement and may cause mysterious failures when upgrading from 3.1 to 3.2.  I was only able to fix this in our configuration after reading through etcd-io#8445; and now I understand the problem it's clear the docs have a hole here.
zultron added a commit to zultron/freeipa-cloud-prov that referenced this issue Jul 9, 2018
In etcd bug 8445, they explain that certs need a domain name as well
as a hostname in the DNS SAN when using SRV records for cluster
discovery.

This would be more elegantly fixed by doing that, of course.  Should
be easy with cfssl, but unknown if FreeIPA will allow it.

etcd-io/etcd#8445

7 participants