gRPC connection fail when --join with TLS enabled #1682

Closed
AstroProfundis opened this issue Aug 14, 2019 · 10 comments · Fixed by #1728
Labels
type/bug The issue is confirmed as a bug.


@AstroProfundis
Contributor

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?

When trying to add TLS support to tidb-operator (#750), we found that --join is not working correctly with TLS enabled.

Assume there are pd-0, pd-1 and pd-2, started sequentially. pd-0 starts normally with --initial-cluster=pd-0=https://pd-0:2380 and TLS enabled. pd-1 then starts with --join=https://pd-0:2380, but it cannot connect to pd-0 and exits.

The errors from pd-1 are:

[2019/08/14 05:50:43.611 -04:00] [ERROR] [join.go:180] ["failed to open directory"] [error="open /home/tidb/deploy/pd-server/member: no such file or directory"]
2019/08/14 05:50:43.612 grpclog.go:45: [info] parsed scheme: "endpoint"
2019/08/14 05:50:43.612 grpclog.go:45: [info] ccResolverWrapper: sending new addresses to cc: [{https://172.16.5.146:2380 0  <nil>}]
2019/08/14 05:50:43.613 grpclog.go:60: [warning] grpc: addrConn.createTransport failed to connect to {https://172.16.5.146:2380 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 172.16.5.146:2380: connect: connection refused". Reconnecting...
2019/08/14 05:50:44.613 grpclog.go:60: [warning] grpc: addrConn.createTransport failed to connect to {https://172.16.5.146:2380 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 172.16.5.146:2380: connect: connection refused". Reconnecting...
2019/08/14 05:50:46.138 grpclog.go:60: [warning] grpc: addrConn.createTransport failed to connect to {https://172.16.5.146:2380 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 172.16.5.146:2380: connect: connection refused". Reconnecting...
2019/08/14 05:50:48.301 grpclog.go:60: [warning] grpc: addrConn.createTransport failed to connect to {https://172.16.5.146:2380 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 172.16.5.146:2380: connect: connection refused". Reconnecting...
2019/08/14 05:50:52.802 grpclog.go:60: [warning] grpc: addrConn.createTransport failed to connect to {https://172.16.5.146:2380 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 172.16.5.146:2380: connect: connection refused". Reconnecting...
[2019/08/14 05:50:53.613 -04:00] [FATAL] [main.go:85] ["join meet error"] [error="context deadline exceeded"] [stack="github.com/pingcap/log.Fatal\n\t/home/jenkins/workspace/release_tidb_3.0/go/pkg/mod/github.com/pingcap/log@v0.0.0-20190715063458-479153f07ebd/global.go:59\nmain.main\n\t/home/jenkins/workspace/release_tidb_3.0/go/src/github.com/pingcap/pd/cmd/pd-server/main.go:85\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:200"]

The errors on pd-0 show:

[2019/08/14 05:57:27.602 -04:00] [WARN] [grpclog.go:60] ["grpc: Server.Serve failed to complete security handshake from \"172.16.5.147:59118\": tls: first record does not look like a TLS handshake"]

So this might be because pd-1 uses a plain connection to connect to pd-0 instead of a TLS-encrypted one.
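The pd-0 warning `tls: first record does not look like a TLS handshake` is what Go's `crypto/tls` server reports when the first bytes it receives are not a TLS record, for example from a client that dialed the port in plaintext. As a hedged illustration only (this is not PD's code; `needsTLS` and `dialPD` are hypothetical helpers), a client can decide between a plain and a TLS connection from the URL scheme and refuse to fall back to plaintext for `https` endpoints:

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net"
	"net/url"
	"time"
)

// needsTLS reports whether a URL such as "https://pd-0:2380" expects a TLS
// connection, judged only by its scheme (an assumption for this sketch).
func needsTLS(rawURL string) (bool, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false, err
	}
	switch u.Scheme {
	case "https":
		return true, nil
	case "http":
		return false, nil
	default:
		return false, fmt.Errorf("unsupported scheme %q in %s", u.Scheme, rawURL)
	}
}

// dialPD opens a raw connection to the host:port in rawURL. For https URLs it
// performs a TLS handshake with tlsCfg; dialing such a port with plain
// net.Dial instead is what makes the server log "first record does not look
// like a TLS handshake".
func dialPD(rawURL string, tlsCfg *tls.Config) (net.Conn, error) {
	secure, err := needsTLS(rawURL)
	if err != nil {
		return nil, err
	}
	u, _ := url.Parse(rawURL) // already validated by needsTLS
	d := &net.Dialer{Timeout: 5 * time.Second}
	if !secure {
		return d.Dial("tcp", u.Host)
	}
	if tlsCfg == nil {
		// Never silently downgrade to plaintext for an https endpoint.
		return nil, fmt.Errorf("%s requires TLS but no tls.Config was given", rawURL)
	}
	conn, err := tls.DialWithDialer(d, "tcp", u.Host, tlsCfg)
	if err != nil {
		return nil, err
	}
	return conn, nil
}

func main() {
	for _, u := range []string{"https://pd-0:2380", "http://pd-0:2380"} {
		secure, err := needsTLS(u)
		fmt.Println(u, "needs TLS:", secure, err)
	}
}
```

Returning an error when no `tls.Config` is available keeps a misconfiguration visible on the client side, rather than surfacing only as a handshake warning on the server.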

We initially found this issue on a Kubernetes deployment, but it is also reproducible with a binary deployment, using the following procedure:

  • deploy a cluster using tidb-ansible with enable_tls = True in inventory.ini, but do not start it
  • log in to pd-0, edit pd-0:/path/to/deploy/script/run_pd.sh to set --initial-cluster=pd-0=https://pd-0:2380 (removing the other nodes), and start it with sudo systemctl start pd-2379.service
  • log in to pd-1, edit pd-1:/path/to/deploy/script/run_pd.sh to delete the --initial-cluster line and add a --join=https://pd-0:2380 line, then start it with sudo systemctl start pd-2379.service
  • watch the logs of pd-0 and pd-1
  2. What did you expect to see?

The processes work normally.

  3. What did you see instead?

pd-1 produces the errors above and exits.

  4. What version of PD are you using (pd-server -V)?
Release Version: v3.0.1
Git Commit Hash: 811ce0b9a1335d1b2a049fd97ef9e186f1c9efc1
Git Branch: HEAD
UTC Build Time:  2019-07-16 01:02:23

This issue also exists with v3.0.2 docker image.

@rleungx
Member

rleungx commented Sep 5, 2019

I think the join-related parameter should be --join=https://pd-0:2379 instead of --join=https://pd-0:2380 @AstroProfundis

@AstroProfundis
Contributor Author

The problem is that when TLS is disabled, --join works normally with port 2380. This parameter comes from "peer-url", the same value tidb-ansible uses (although tidb-ansible doesn't use --join).

@rleungx
Member

rleungx commented Sep 6, 2019

@AstroProfundis The join parameter is used to create an etcd client; the actual address used for joining is still the peer URL.
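To make the distinction concrete: the `--join` URL is only used to reach an existing member as an etcd client (so it should be a client URL, port 2379 here), while the peer URLs learned from the cluster are what end up in etcd's `initial-cluster`. The "PD Config" log in the verification comment shows that outcome, with `initial-cluster` listing both members by peer URL and `initial-cluster-state` set to `existing`. The sketch below is not PD's `join.go`; `member`, `buildInitialCluster` and `joinPortLooksLikePeer` are hypothetical names, the member list is hard-coded instead of fetched from etcd, and sorting by name is only for deterministic output.

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
	"strings"
)

// member is a hypothetical, trimmed-down view of an etcd member as it might
// be returned by a member-list call made against the --join client URL.
type member struct {
	Name     string
	PeerURLs []string
}

// buildInitialCluster renders an initial-cluster value for a joining node:
// every existing member plus itself, all addressed by their peer URLs.
func buildInitialCluster(existing []member, self member) string {
	all := append(append([]member{}, existing...), self)
	sort.Slice(all, func(i, j int) bool { return all[i].Name < all[j].Name })
	var parts []string
	for _, m := range all {
		for _, p := range m.PeerURLs {
			parts = append(parts, m.Name+"="+p)
		}
	}
	return strings.Join(parts, ",")
}

// joinPortLooksLikePeer flags a --join URL whose port equals the peer port,
// i.e. the "--join=https://pd-0:2380" case discussed in this thread.
func joinPortLooksLikePeer(join, peerPort string) (bool, error) {
	u, err := url.Parse(join)
	if err != nil {
		return false, err
	}
	return u.Port() == peerPort, nil
}

func main() {
	existing := []member{{Name: "tls-pd-0", PeerURLs: []string{"https://tls-pd-0.tls-pd-peer.allenz.svc:2380"}}}
	self := member{Name: "tls-pd-1", PeerURLs: []string{"https://tls-pd-1.tls-pd-peer.allenz.svc:2380"}}
	fmt.Println("initial-cluster:", buildInitialCluster(existing, self))

	bad, err := joinPortLooksLikePeer("https://tls-pd-0.tls-pd-peer.allenz.svc:2380", "2380")
	fmt.Println("join points at peer port:", bad, err)
}
```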

@AstroProfundis
Contributor Author

I tried to verify the fix with:

Release Version: v4.0.0-alpha-50-gb66ba448
Git Commit Hash: b66ba4482c5dfb3d976461544b5df3b8442d4d37
Git Branch: master
UTC Build Time:  2019-09-06 08:57:10
  1. With the cmdline parameters unchanged:
/pd-server --data-dir=/var/lib/pd --name=tls-pd-1 --peer-urls=https://0.0.0.0:2380 --advertise-peer-urls=https://tls-pd-1.tls-pd-peer.allenz.svc:2380 --client-urls=https://0.0.0.0:2379 --advertise-client-urls=https://tls-pd-1.tls-pd-peer.allenz.svc:2379 --config=/etc/pd/pd.toml --join=https://tls-pd-0.tls-pd-peer.allenz.svc:2380

The second PD instance still fails to join:

[2019/09/06 11:53:44.943 +00:00] [INFO] [util.go:56] ["Welcome to Placement Driver (PD)"]
[2019/09/06 11:53:44.943 +00:00] [INFO] [util.go:57] [PD] [release-version=v4.0.0-alpha-50-gb66ba448]
[2019/09/06 11:53:44.943 +00:00] [INFO] [util.go:58] [PD] [git-hash=b66ba4482c5dfb3d976461544b5df3b8442d4d37]
[2019/09/06 11:53:44.943 +00:00] [INFO] [util.go:59] [PD] [git-branch=master]
[2019/09/06 11:53:44.943 +00:00] [INFO] [util.go:60] [PD] [utc-build-time="2019-09-06 08:57:10"]
[2019/09/06 11:53:44.943 +00:00] [INFO] [metricutil.go:80] ["disable Prometheus push client"]
[2019/09/06 11:53:44.943 +00:00] [ERROR] [join.go:213] ["failed to open directory"] [error="open /var/lib/pd/member: no such file or directory"]
2019/09/06 11:53:44.945 grpclog.go:45: [info] parsed scheme: "endpoint"
2019/09/06 11:53:44.948 grpclog.go:45: [info] ccResolverWrapper: sending new addresses to cc: [{https://tls-pd-0.tls-pd-peer.allenz.svc:2380 0  <nil>}]
{"level":"warn","ts":"2019-09-06T11:53:44.966Z","caller":"clientv3/retry_interceptor.go:60","msg":"retrying of unary invoker failed","target":"endpoint://client-2ffaae40-3df0-4cc9-bc3f-f41f81843e6c/tls-pd-0.tls-pd-peer.allenz.svc:2380","attempt":0,"error":"rpc error: code = Unavailable desc = transport is closing"}
{"level":"warn","ts":"2019-09-06T11:53:45.967Z","caller":"clientv3/retry_interceptor.go:60","msg":"retrying of unary invoker failed","target":"endpoint://client-2ffaae40-3df0-4cc9-bc3f-f41f81843e6c/tls-pd-0.tls-pd-peer.allenz.svc:2380","attempt":1,"error":"rpc error: code = Unavailable desc = transport is closing"}
{"level":"warn","ts":"2019-09-06T11:53:47.686Z","caller":"clientv3/retry_interceptor.go:60","msg":"retrying of unary invoker failed","target":"endpoint://client-2ffaae40-3df0-4cc9-bc3f-f41f81843e6c/tls-pd-0.tls-pd-peer.allenz.svc:2380","attempt":2,"error":"rpc error: code = Unavailable desc = transport is closing"}
{"level":"warn","ts":"2019-09-06T11:53:50.423Z","caller":"clientv3/retry_interceptor.go:60","msg":"retrying of unary invoker failed","target":"endpoint://client-2ffaae40-3df0-4cc9-bc3f-f41f81843e6c/tls-pd-0.tls-pd-peer.allenz.svc:2380","attempt":3,"error":"rpc error: code = Unavailable desc = transport is closing"}
{"level":"warn","ts":"2019-09-06T11:53:54.342Z","caller":"clientv3/retry_interceptor.go:60","msg":"retrying of unary invoker failed","target":"endpoint://client-2ffaae40-3df0-4cc9-bc3f-f41f81843e6c/tls-pd-0.tls-pd-peer.allenz.svc:2380","attempt":4,"error":"rpc error: code = Unavailable desc = transport is closing"}
{"level":"warn","ts":"2019-09-06T11:53:54.948Z","caller":"clientv3/retry_interceptor.go:60","msg":"retrying of unary invoker failed","target":"endpoint://client-2ffaae40-3df0-4cc9-bc3f-f41f81843e6c/tls-pd-0.tls-pd-peer.allenz.svc:2380","attempt":5,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
[2019/09/06 11:53:54.948 +00:00] [FATAL] [main.go:93] ["join meet error"] [error="context deadline exceeded"] [stack="github.com/pingcap/log.Fatal\n\t/home/jenkins/workspace/build_pd_master/go/pkg/mod/github.com/pingcap/log@v0.0.0-20190715063458-479153f07ebd/global.go:59\nmain.main\n\t/home/jenkins/workspace/build_pd_master/go/src/github.com/pingcap/pd/cmd/pd-server/main.go:93\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:200"]

The error seems unchanged.

  2. With --join changed to port 2379:
/pd-server --data-dir=/var/lib/pd --name=tls-pd-1 --peer-urls=https://0.0.0.0:2380 --advertise-peer-urls=https://tls-pd-1.tls-pd-peer.allenz.svc:2380 --client-urls=https://0.0.0.0:2379 --advertise-client-urls=https://tls-pd-1.tls-pd-peer.allenz.svc:2379 --config=/etc/pd/pd.toml --join=https://tls-pd-0.tls-pd-peer.allenz.svc:2379

The etcd server started successfully:

2019/09/06 12:44:24.861 grpclog.go:45: [info] parsed scheme: "endpoint"
2019/09/06 12:44:24.861 grpclog.go:45: [info] ccResolverWrapper: sending new addresses to cc: [{https://tls-pd-0.tls-pd-peer.allenz.svc:2379 0  <nil>}]
[2019/09/06 12:44:24.897 +00:00] [INFO] [server.go:108] ["PD Config"] [config="{\"client-urls\":\"https://0.0.0.0:2379\",\"peer-urls\":\"https://0.0.0.0:2380\",\"advertise-client-urls\":\"https://tls-pd-1.tls-pd-peer.allenz.svc:2379\",\"advertise-peer-urls\":\"https://tls-pd-1.tls-pd-peer.allenz.svc:2380\",\"name\":\"tls-pd-1\",\"data-dir\":\"/var/lib/pd\",\"force-new-cluster\":false,\"enable-grpc-gateway\":true,\"initial-cluster\":\"tls-pd-0=https://tls-pd-0.tls-pd-peer.allenz.svc:2380,tls-pd-1=https://tls-pd-1.tls-pd-peer.allenz.svc:2380\",\"initial-cluster-state\":\"existing\",\"join\":\"https://tls-pd-0.tls-pd-peer.allenz.svc:2379\",\"lease\":3,\"log\":{\"level\":\"info\",\"format\":\"text\",\"disable-timestamp\":false,\"file\":{\"filename\":\"\",\"log-rotate\":true,\"max-size\":0,\"max-days\":0,\"max-backups\":0},\"development\":false,\"disable-caller\":false,\"disable-stacktrace\":false,\"disable-error-verbose\":true,\"sampling\":null},\"log-file\":\"\",\"log-level\":\"\",\"tso-save-interval\":\"3s\",\"metric\":{\"job\":\"tls-pd-1\",\"address\":\"\",\"interval\":\"15s\"},\"schedule\":{\"max-snapshot-count\":3,\"max-pending-peer-count\":16,\"max-merge-region-size\":20,\"max-merge-region-keys\":200000,\"split-merge-interval\":\"1h0m0s\",\"enable-one-way-merge\":\"false\",\"patrol-region-interval\":\"100ms\",\"max-store-down-time\":\"30m0s\",\"leader-schedule-limit\":4,\"region-schedule-limit\":64,\"replica-schedule-limit\":64,\"merge-schedule-limit\":8,\"hot-region-schedule-limit\":4,\"hot-region-cache-hits-threshold\":3,\"store-balance-rate\":15,\"tolerant-size-ratio\":0,\"low-space-ratio\":0.8,\"high-space-ratio\":0.6,\"scheduler-max-waiting-operator\":3,\"disable-raft-learner\":\"false\",\"disable-remove-down-replica\":\"false\",\"disable-replace-offline-replica\":\"false\",\"disable-make-up-replica\":\"false\",\"disable-remove-extra-replica\":\"false\",\"disable-location-replacement\":\"false\",\"disable-namespace-relocation\":\"false\",\"schedulers-v2\":[{\"type\":\"balance-region\",\"args\":null,\"disable\":false},{\"type\":\"balance-leader\",\"args\":null,\"disable\":false},{\"type\":\"hot-region\",\"args\":null,\"disable\":false},{\"type\":\"label\",\"args\":null,\"disable\":false}]},\"replication\":{\"max-replicas\":3,\"location-labels\":\"region,zone,rack,host\",\"strictly-match-label\":\"false\"},\"namespace\":{},\"pd-server\":{\"use-region-storage\":\"true\"},\"cluster-version\":\"0.0.0\",\"quota-backend-bytes\":\"0 B\",\"auto-compaction-mode\":\"periodic\",\"auto-compaction-retention-v2\":\"1h\",\"TickInterval\":\"500ms\",\"ElectionInterval\":\"3s\",\"PreVote\":true,\"security\":{\"cacert-path\":\"/var/run/secrets/kubernetes.io/serviceaccount/ca.crt\",\"cert-path\":\"/var/lib/pd-tls/cert\",\"key-path\":\"/var/lib/pd-tls/key\"},\"label-property\":null,\"WarningMsgs\":null,\"namespace-classifier\":\"table\",\"DisableStrictReconfigCheck\":false,\"HeartbeatStreamBindInterval\":\"1m0s\",\"LeaderPriorityCheckInterval\":\"1m0s\"}"]
[2019/09/06 12:44:24.912 +00:00] [INFO] [server.go:143] ["start embed etcd"]
[2019/09/06 12:44:24.912 +00:00] [INFO] [etcd.go:117] ["configuring peer listeners"] [listen-peer-urls="[https://0.0.0.0:2380]"]
[2019/09/06 12:44:24.912 +00:00] [INFO] [etcd.go:463] ["starting with peer TLS"] [tls-info="cert = /var/lib/pd-tls/cert, key = /var/lib/pd-tls/key, trusted-ca = /var/run/secrets/kubernetes.io/serviceaccount/ca.crt, client-cert-auth = false, crl-file = "] [cipher-suites="[]"]
[2019/09/06 12:44:24.914 +00:00] [INFO] [systime_mon.go:25] ["start system time monitor"]
[2019/09/06 12:44:24.915 +00:00] [INFO] [etcd.go:127] ["configuring client listeners"] [listen-client-urls="[https://0.0.0.0:2379]"]
[2019/09/06 12:44:24.915 +00:00] [INFO] [etcd.go:600] ["pprof is enabled"] [path=/debug/pprof]
[2019/09/06 12:44:24.915 +00:00] [INFO] [etcd.go:297] ["starting an etcd server"] [etcd-version=3.3.0+git] [git-sha="Not provided (use ./build instead of go build)"] [go-version=go1.12] [go-os=linux] [go-arch=amd64] [max-cpu-set=4] [max-cpu-available=4] [member-initialized=false] [name=tls-pd-1] [data-dir=/var/lib/pd] [wal-dir=] [wal-dir-dedicated=] [member-dir=/var/lib/pd/member] [force-new-cluster=false] [heartbeat-interval=500ms] [election-timeout=3s] [initial-election-tick-advance=true] [snapshot-count=100000] [snapshot-catchup-entries=5000] [initial-advertise-peer-urls="[https://tls-pd-1.tls-pd-peer.allenz.svc:2380]"] [listen-peer-urls="[https://0.0.0.0:2380]"] [advertise-client-urls="[https://tls-pd-1.tls-pd-peer.allenz.svc:2379]"] [listen-client-urls="[https://0.0.0.0:2379]"] [listen-metrics-urls="[]"] [cors="[*]"] [host-whitelist="[*]"] [initial-cluster="tls-pd-0=https://tls-pd-0.tls-pd-peer.allenz.svc:2380,tls-pd-1=https://tls-pd-1.tls-pd-peer.allenz.svc:2380"] [initial-cluster-state=existing] [initial-cluster-token=etcd-cluster] [quota-size-bytes=2147483648] [pre-vote=true] [initial-corrupt-check=false] [corrupt-check-time-interval=0s] [auto-compaction-mode=periodic] [auto-compaction-retention=1h0m0s] [auto-compaction-interval=1h0m0s] [discovery-url=] [discovery-proxy=]

But it is still not fully working:

[2019/09/06 12:44:24.992 +00:00] [INFO] [server.go:662] ["starting initial election tick advance"] [election-ticks=6]
[2019/09/06 12:44:24.992 +00:00] [INFO] [etcd.go:574] ["serving peer traffic"] [address="[::]:2380"]
[2019/09/06 12:44:25.025 +00:00] [INFO] [peer_status.go:51] ["peer became active"] [peer-id=385576565d01bba]
[2019/09/06 12:44:25.025 +00:00] [INFO] [stream.go:424] ["established TCP streaming connection with remote peer"] [stream-reader-type="stream MsgApp v2"] [local-member-id=f70e014ad1908002] [remote-peer-id=385576565d01bba]
[2019/09/06 12:44:25.041 +00:00] [INFO] [stream.go:249] ["set message encoder"] [from=f70e014ad1908002] [to=f70e014ad1908002] [stream-type="stream MsgApp v2"]
[2019/09/06 12:44:25.041 +00:00] [WARN] [stream.go:276] ["established TCP streaming connection with remote peer"] [stream-writer-type="stream MsgApp v2"] [local-member-id=f70e014ad1908002] [remote-peer-id=385576565d01bba]
[2019/09/06 12:44:25.042 +00:00] [INFO] [stream.go:249] ["set message encoder"] [from=f70e014ad1908002] [to=f70e014ad1908002] [stream-type="stream Message"]
[2019/09/06 12:44:25.042 +00:00] [WARN] [stream.go:276] ["established TCP streaming connection with remote peer"] [stream-writer-type="stream Message"] [local-member-id=f70e014ad1908002] [remote-peer-id=385576565d01bba]
[2019/09/06 12:44:25.043 +00:00] [INFO] [stream.go:424] ["established TCP streaming connection with remote peer"] [stream-reader-type="stream Message"] [local-member-id=f70e014ad1908002] [remote-peer-id=385576565d01bba]
[2019/09/06 12:44:25.043 +00:00] [INFO] [raft.go:862] ["f70e014ad1908002 [term: 1] received a MsgApp message with higher term from 385576565d01bba [term: 2]"]
[2019/09/06 12:44:25.043 +00:00] [INFO] [raft.go:712] ["f70e014ad1908002 became follower at term 2"]
[2019/09/06 12:44:25.043 +00:00] [INFO] [node.go:330] ["raft.node: f70e014ad1908002 elected leader 385576565d01bba at term 2"]
[2019/09/06 12:44:25.052 +00:00] [INFO] [cluster.go:344] ["added member"] [cluster-id=fb349a4f363de893] [local-member-id=f70e014ad1908002] [added-peer-id=385576565d01bba] [added-peer-peer-urls="[https://tls-pd-0.tls-pd-peer.allenz.svc:2380]"]
[2019/09/06 12:44:25.053 +00:00] [INFO] [cluster.go:486] ["set initial cluster version"] [cluster-id=fb349a4f363de893] [local-member-id=f70e014ad1908002] [cluster-version=3.3]
[2019/09/06 12:44:25.054 +00:00] [INFO] [capability.go:75] ["enabled capabilities for version"] [cluster-version=3.3]
[2019/09/06 12:44:25.061 +00:00] [INFO] [cluster.go:344] ["added member"] [cluster-id=fb349a4f363de893] [local-member-id=f70e014ad1908002] [added-peer-id=f70e014ad1908002] [added-peer-peer-urls="[https://tls-pd-1.tls-pd-peer.allenz.svc:2380]"]
[2019/09/06 12:44:25.063 +00:00] [INFO] [server.go:1824] ["published local member to cluster through raft"] [local-member-id=f70e014ad1908002] [local-member-attributes="{Name:tls-pd-1 ClientURLs:[https://tls-pd-1.tls-pd-peer.allenz.svc:2379]}"] [request-path=/0/members/f70e014ad1908002/attributes] [cluster-id=fb349a4f363de893] [publish-timeout=11s]
[2019/09/06 12:44:25.067 +00:00] [INFO] [serve.go:191] ["serving client traffic insecurely"] [address="[::]:2379"]
[2019/09/06 12:44:25.070 +00:00] [INFO] [server.go:173] ["create etcd v3 client"] [endpoints="[https://tls-pd-1.tls-pd-peer.allenz.svc:2379]"]
[2019/09/06 12:44:25.089 +00:00] [INFO] [server.go:210] ["init cluster id"] [cluster-id=6733536034233686735]
[2019/09/06 12:44:25.098 +00:00] [WARN] [history_buffer.go:138] ["load history index failed"] [error="leveldb: not found"]
[2019/09/06 12:44:25.098 +00:00] [INFO] [history_buffer.go:146] ["start from history index"] [start-index=0]
[2019/09/06 12:44:25.100 +00:00] [INFO] [namespace_classifier.go:461] ["load namespaces information"] [namespace-count=0] [cost=2.26595ms]
[2019/09/06 12:44:25.104 +00:00] [INFO] [server.go:938] ["server enable region storage"]
[2019/09/06 12:44:25.105 +00:00] [INFO] [server.go:820] ["start watch leader"] [leader="name:\"tls-pd-0\" member_id:253705047027751866 peer_urls:\"https://tls-pd-0.tls-pd-peer.allenz.svc:2380\" client_urls:\"https://tls-pd-0.tls-pd-peer.allenz.svc:2379\" "]
[2019/09/06 12:44:25.118 +00:00] [INFO] [client.go:106] ["server starts to synchronize with leader"] [server=tls-pd-1] [leader=tls-pd-0] [request-index=0]
[2019/09/06 12:44:41.106 +00:00] [ERROR] [redirector.go:98] ["request failed"] [error="Get https://tls-pd-0.tls-pd-peer.allenz.svc:2379/pd/ping: x509: certificate signed by unknown authority"]
[2019/09/06 12:44:41.174 +00:00] [ERROR] [redirector.go:98] ["request failed"] [error="Get https://tls-pd-0.tls-pd-peer.allenz.svc:2379/pd/ping: x509: certificate signed by unknown authority"]
[2019/09/06 12:44:50.773 +00:00] [ERROR] [redirector.go:98] ["request failed"] [error="Get https://tls-pd-0.tls-pd-peer.allenz.svc:2379/pd/ping: x509: certificate signed by unknown authority"]

Is a CA cert missing somewhere? I can see that the PD server itself and the etcd server are both using the correct CA cert, trusted-ca = /var/run/secrets/kubernetes.io/serviceaccount/ca.crt.
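The failing calls come from `redirector.go`: an HTTPS `GET` to the leader's `/pd/ping`. In Go, `x509: certificate signed by unknown authority` means the server's chain did not verify against the client's root pool; when `RootCAs` is left unset, only the system roots are used, which do not include a cluster-private CA such as the Kubernetes one. That points at this particular HTTP client not being given the CA from `cacert-path`, even though the etcd listeners are. A minimal sketch under that assumption (`securityPaths`, `newTLSClientConfig` and `newRedirectClient` are hypothetical names, not PD's API, and this is not necessarily how the eventual fix was written):

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net/http"
	"os"
	"time"
)

// securityPaths mirrors the [security] section of pd.toml shown in this thread.
type securityPaths struct {
	CACertPath string // cacert-path
	CertPath   string // cert-path
	KeyPath    string // key-path
}

// newTLSClientConfig trusts the CA in caPEM and, when certPEM/keyPEM are
// given, presents that certificate to the server as well.
func newTLSClientConfig(caPEM, certPEM, keyPEM []byte) (*tls.Config, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no CA certificate found in cacert PEM")
	}
	cfg := &tls.Config{RootCAs: pool} // without RootCAs, only system roots are trusted
	if len(certPEM) > 0 || len(keyPEM) > 0 {
		pair, err := tls.X509KeyPair(certPEM, keyPEM)
		if err != nil {
			return nil, err
		}
		cfg.Certificates = []tls.Certificate{pair}
	}
	return cfg, nil
}

// newRedirectClient builds an http.Client for forwarding requests to the
// leader, using the same CA and certificate files as the PD server itself.
func newRedirectClient(sec securityPaths) (*http.Client, error) {
	caPEM, err := os.ReadFile(sec.CACertPath)
	if err != nil {
		return nil, fmt.Errorf("read cacert-path: %w", err)
	}
	var certPEM, keyPEM []byte
	if sec.CertPath != "" {
		if certPEM, err = os.ReadFile(sec.CertPath); err != nil {
			return nil, fmt.Errorf("read cert-path: %w", err)
		}
		if keyPEM, err = os.ReadFile(sec.KeyPath); err != nil {
			return nil, fmt.Errorf("read key-path: %w", err)
		}
	}
	cfg, err := newTLSClientConfig(caPEM, certPEM, keyPEM)
	if err != nil {
		return nil, err
	}
	return &http.Client{
		Timeout:   10 * time.Second,
		Transport: &http.Transport{TLSClientConfig: cfg},
	}, nil
}

func main() {
	client, err := newRedirectClient(securityPaths{
		CACertPath: "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt",
		CertPath:   "/var/lib/pd-tls/cert",
		KeyPath:    "/var/lib/pd-tls/key",
	})
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	resp, err := client.Get("https://tls-pd-0.tls-pd-peer.allenz.svc:2379/pd/ping")
	if err != nil {
		fmt.Println("ping failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("ping status:", resp.Status)
}
```

The point of the design is that every outgoing connection a PD member makes (etcd client, gRPC, HTTP redirects) is built from the same `[security]` files, so no path quietly falls back to the system trust store.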

@rleungx
Member

rleungx commented Sep 9, 2019

How did you generate the cacert file and the key file? And what does your config file look like?

@AstroProfundis
Contributor Author

# cat /etc/pd/pd.toml 
[log]
level = "info"
[replication]
location-labels = ["region", "zone", "rack", "host"]

[security]
cacert-path = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
cert-path = "/var/lib/pd-tls/cert"
key-path = "/var/lib/pd-tls/key"

Certs are generated with Kubernetes' certificate management infrastructure, and all the other certs in use (for TiDB, TiKV and clients) are generated the same way. Other components can connect to PD without problems as long as only one PD instance is running; this issue only occurs when trying to run multiple PD instances.

ca.crt:

-----BEGIN CERTIFICATE-----
MIICyDCCAbCgAwIBAgIBADANBgkqhkiG9w0BAQsFADAVMRMwEQYDVQQDEwprdWJl
cm5ldGVzMB4XDTE5MDkwOTA3NDQ1N1oXDTI5MDkwNjA3NDQ1N1owFTETMBEGA1UE
AxMKa3ViZXJuZXRlczCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEBAMhN
Lj68yjB9OL5BERSJ/5aFrSS5zCwa9GKM3STo+JwRSMrsR/wU85vZQZqCihxCpuoT
Xg2mvdeo/xPoJL0/Z5HRZebl623c6kaiCi76LiXxyJz7ZQihvU/6f15hOKkI9tok
gpRo4pO9b5LFRHGxH3122c1OXk74lK7Q6DV5Yx5aQsoChtdKOiDJrNYVztvqVne0
DuX3mbGCJAPCSHvk0ssbsLiO9koxyUYJ57D+OSd/Kbi3DGpQPZXSfNaKsXX/P5A2
67Bmz9MEIW4NcoVbvACVKJNkrhoKSib4P0h+CsKh3WMVT/UPZbd56MQLLYV9fBFN
7eXc9WkK+RPQajWndhsCAwEAAaMjMCEwDgYDVR0PAQH/BAQDAgKkMA8GA1UdEwEB
/wQFMAMBAf8wDQYJKoZIhvcNAQELBQADggEBAI34GxTZqaDJZZE1Py735/HLyBG2
E804XrOImfFjb9o5q45JR2LAm4ISIPTrBc9RU11xL8R7T2g+Kf/HAu7xvi92W9AL
S51ohkc56vQLpMcrLbWmHiKz27kgybMDxB+XNeLDEkqFzz7YFuyJpmEy/X1LKBOL
Molf33elwSsv5TnKkQJE6XbO9oTETwPLHZowxSp4fAPs1k8Y0rtBJlFQwLRXqwd1
udc3ItMHU/UDkpapxKXzVqkv+XryipjEIuGcw/Wm/YwT9ge7DwdaEtV0S7xeEhr9
R8TZsiBP74QmuOoS5ZM8eJUKgyR9AlpqCIteCPWip+7QZEGeT/15pBr516U=
-----END CERTIFICATE-----

pd.crt:

-----BEGIN CERTIFICATE-----
MIIDizCCAnOgAwIBAgIUaLqA1mVZhZqMY4dPBApOoSyh73MwDQYJKoZIhvcNAQEL
BQAwFTETMBEGA1UEAxMKa3ViZXJuZXRlczAeFw0xOTA5MDkwODE5MDBaFw0yMDA5
MDgwODE5MDBaMDsxEDAOBgNVBAoTB1BpbmdDQVAxFjAUBgNVBAsTDVRpREIgT3Bl
cmF0b3IxDzANBgNVBAMTBnRscy1wZDCCASIwDQYJKoZIhvcNAQEBBQADggEPADCC
AQoCggEBAO2AaMnYG9GZubmnE9ew1pwUcBlcAjd56+Ff+kle5MSh+3XW54NE5t6+
F4P3CEtSZYYoApPnEROttkfWvsMkgddpQ3nKxoWQ7+qbQB5jdnwfV8BCicIXV/to
vKhOOl9dXulIjICJMGGvKHd94oQu2rvOsIfI9wn+za7LDSHmqYF2odT9aR3lMlpH
MgwqD08f+H22OpmKxTVvtvGJ/dKWY8NnUzkbvDeGzNlTAWH9wn+HbplLhx77yACd
50taxTVGocgdjEOx78JSeES2qdMN732Yf4RSGS2txEmH2/QKlw1BbNTz7In/je6Q
CxRoRwUpvNeEsJGvlm+/7BNSFM56xO8CAwEAAaOBrDCBqTAdBgNVHSUEFjAUBggr
BgEFBQcDAgYIKwYBBQUHAwEwDAYDVR0TAQH/BAIwADAdBgNVHQ4EFgQU/xY7bdRQ
LzA/5hhgfXAxfMf4UxEwWwYDVR0RBFQwUoIGdGxzLXBkggt0bHMtcGQtcGVlcoIN
dGxzLXBkLmFsbGVueoISdGxzLXBkLXBlZXIuYWxsZW56ghgqLnRscy1wZC1wZWVy
LmFsbGVuei5zdmMwDQYJKoZIhvcNAQELBQADggEBAI3WKDwiF3DPTvNqBJrOCXJh
E6LAF1WWVStiLB/dKhsh4L2laTZlg8YrDGuj+ZrlIwWBcuncbK7Wjq9peU8dQT+a
p5p7hbgFXpyrTKrWsrIkjtJZXX8TwyfKgrfq1ShkzAUeRO0oA15QFjhObfFkHmm2
r6KzlQhJCPSAp4RjoiAjhLsdlLE24YT3bUxkx/gmNQPg4PMPVlIPKnU8jDvlMWDk
fAFUlJpyo71L+g1yzCSfJM+NS+m2LRPTQwjfASWPvcP6mXcqCsUr9EqJOovQd3Gx
3WjrXlYSY1wUXOqm7L7NYCmEHBduyFZJqVJCTlG+9vTYDqku+zIh4G4Ydc//Wpo=
-----END CERTIFICATE-----

pd.key:

-----BEGIN RSA PRIVATE KEY-----
MIIEpQIBAAKCAQEA7YBoydgb0Zm5uacT17DWnBRwGVwCN3nr4V/6SV7kxKH7ddbn
g0Tm3r4Xg/cIS1JlhigCk+cRE622R9a+wySB12lDecrGhZDv6ptAHmN2fB9XwEKJ
whdX+2i8qE46X11e6UiMgIkwYa8od33ihC7au86wh8j3Cf7NrssNIeapgXah1P1p
HeUyWkcyDCoPTx/4fbY6mYrFNW+28Yn90pZjw2dTORu8N4bM2VMBYf3Cf4dumUuH
HvvIAJ3nS1rFNUahyB2MQ7HvwlJ4RLap0w3vfZh/hFIZLa3ESYfb9AqXDUFs1PPs
if+N7pALFGhHBSm814Swka+Wb7/sE1IUznrE7wIDAQABAoIBAQCo9z8VwpLgBm7U
fuImBGBaQEwULppBH5NKDv9AbatxnRAKIO8qO73IYBLYxsn21FL4I8TZtn02s9JH
v6aNrI5XU0M3BaVA5wFYtkTimb50xdOnK29YT0U/zp7RWn461HGuo/eZhoCOLpAq
mrupcLAbBwwePkJKsSVhooHgSXr0Z7Pfqi7pqRDTFsi12OQYhNQDKP9UI13VUcOe
pVarUsKlBuLPtiMJocXWUdqnzP+dKv3PV0qe/z/1mAxPfgH9PL0B8ys6sz1rchQz
/6ysfgSSjaDDQ7Ezwv+HmF+KoSoTLF9Nz85ZNh2qRUJE5ezaCFiDtTc4STwuLelm
ENrcWFf5AoGBAO4Vqaz8ASyp8x2zFDl0kisTdhbdeE3H9Go8y8Iz0OpSC149dyCo
LBZzCJrnnT/EpmmoowCR5Ma0R8DpEJSGr9B47Dxxz9DE+DP8ADSGOwMF9GMQFcjC
E9P+IEY/dJxTai+/xCaEMWkzSRc9lgQphhINC+7SwmSJ5XGE0pBuuljrAoGBAP9f
g/jnRHv0sFKsi3ut2CA/Hb/JyKw7jsOi0muPhYZ9W9kZLWQqmkIhOGnGOIUtiqyK
EMmAi2U5CA6zL9JTC6rmEXy5C/FCkfAiaS34d8BsibXlOXkSAoROrXUmytQFTo11
eZansCNQFvpfQxg+2or49O1xX+QucFpZSurZjIMNAoGBAKAPCjoUVUnMm3f8+3zA
5L923u3ySD2qTqPZaXaO1UWikKfzlJHs3W7eOQvC6FGFiAcCa0snyfDYJGEJjq77
eVki4lakgPyuXtq78Ptevm+C4lBy8OI9r4zWjKYNZPzvizS8rEbkmj9KTjoEmkUE
EXEzOjF9mVhz6D+P9utItZivAoGAWVaB5b6KP88PLCz+surTVBygfKrL0C0Zuakp
ccWI0c7jJeTf803QH1hd0ussdLLE861tSAD3QxcbkYDwNuUkjMnlzjsySVmfkmGH
aDSnOCMAXijt3UQGq2CW4AgNJvUgUO6K9cB+JyxqjXZsE3xRmhKUJMjn4fy5A3J/
ef9XX7UCgYEAxEQrThLlSL+eBUIo3964pPye4xh371Y+8GnYRyRMDulkysjuSAd4
utzctn3fnrAqral3K6hu2Zt+7KduL3fOu9XTj3uVKnQPDM9zAXgF1Qr58ZkGB4C+
VhB2a6pJijzAoaL9KmC8l0PMbpp7Tx0QrqFJYvxLYtYdG83FigyI17Q=
-----END RSA PRIVATE KEY-----

@rleungx
Member

rleungx commented Sep 9, 2019

@AstroProfundis #1740 has fixed the redirect problem.

@AstroProfundis
Contributor Author

@rleungx Thanks, I'll update once verified.

@AstroProfundis AstroProfundis reopened this Sep 9, 2019
@disksing
Contributor

Hi @AstroProfundis, does the fix work?

@AstroProfundis
Contributor Author

Fix confirmed with release-version=v4.0.0-alpha-67-g5c648dc3, thanks!

5 participants