3.3.7 panic: send on closed channel #9956
Comments
It also happened in my case. I know how to reproduce.
@cfc4n Can you provide a way to reproduce?
Yeah, I'm trying...
@cfc4n If you can't reproduce, can you tell me more about how this happened (e.g. sending writes and panic, etc.)?
It seems to happen randomly multiple times a day on the version shipped in Bionic; I had to set up systemd with Restart=on-failure to make sure it stays up (a minimal drop-in sketch follows this comment).
$ etcd --version
$ apt-cache policy etcd
$ journalctl -u etcd
Since this is a testing environment, we're doing a lot of test reboots, and this happens once in a while too; we get more details but the same panic message.
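As a side note, for anyone relying on the same restart-on-crash workaround until a fixed build lands: a systemd drop-in along the lines of the sketch below is enough. The unit name etcd.service, the drop-in path, and the 5-second delay are assumptions about a distro-packaged install, not details taken from this thread.

```
# /etc/systemd/system/etcd.service.d/override.conf  (assumed unit name and path)
[Service]
Restart=on-failure
RestartSec=5s
```

After creating the file, run systemctl daemon-reload and then systemctl restart etcd so systemd picks up the override. This only papers over the crash; the member still restarts, with whatever brief availability gap that implies.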
So in fact, it seems to happen on all servers at once; here is a log from our syslog server, grepping only the servers in quorum.
/cc @jpbetz
This happens to me as well. I have etcd providing storage for a Kubernetes master (from hyperkube:v1.11.4), and I've installed etcd from Bionic. It just randomly falls over, almost always in under an hour. My etcd version is:
It really feels like I should open an issue on Ubuntu's Launchpad; this seems like a downstream issue.
We have also hit this issue when issuing a hard power-off of a host during failure testing. We had 1332 etcd clusters (3 members each) spread across 111 nodes. We performed a test where we did a hard shutdown of all nodes in one of the zones. We expected all etcd clusters to maintain quorum, as they still had 2 members in the remaining zones. Out of the 1332 clusters, 32 hit this issue at the time the hosts in that zone were powered off; the other 1300 survived as expected. The 32 clusters that hit the issue lost quorum: they had lost 1 member to the power-offs, and 1 other member hit the panic and terminated, leaving just 1 surviving member, so this is not good for HA. The cluster members that got the panic seemed to be spread across a variety of nodes.

We ran a similar test previously, just powering off 3 nodes in a zone, and none of the etcd clusters had any issues, so this seems to be a timing condition that can happen when a host shuts down or crashes and is quite difficult to hit. We are using etcd-operator to manage the etcd clusters. In some cases, from the logs it looks like the original peer gets removed from the cluster (as its node was shut down), and etcd-operator starts another peer, but presumably this peer then dies (as the node it was scheduled on was subsequently shut down).

But I also have cases where the panic occurs before another member is scheduled:

Etcd version:
So, does this panic happen on the new node?
No, the panic is always on one of the existing nodes that was not shut down (although it is caused by one of its peers being shut down).
Does anyone have any thoughts on a potential fix for this? I can recreate it quite easily on our test system. We haven't hit it in prod yet, but it's quite serious if it does occur. I'd be quite happy to have the option to sacrifice the gRPC metrics if it meant we could avoid this panic.
Reviewing this thread, the reproduction steps are unclear to me; can you outline them? Once we can reproduce this issue, we can help resolve it.
I'm not sure how easy it will be to reproduce, as in our case we hit it in something like 32 out of 1300 cases. I'm not sure if having multiple etcds on the same node is contributing, or if you would hit this with fewer etcds on each node. The fact that others have seen it probably means it can happen at lower scale; it's just that running at such scale increases the chances of hitting whatever timing condition is causing it. So in terms of reproduction I think it's just a matter of issuing a hard shutdown, but it seems the window to hit the issue is small. If there's any debug patch or anything else that would help to diagnose, I should be able to run that on our test system.
@mcginne could you try to recreate this in your testing environment at a small scale, such as a single cluster on a single node? Use something to randomly perform the hard shutdown and see if you capture the panic logging. If you can get that far, it would be greatly appreciated and would go a long way toward getting to the bottom of this. Meanwhile, I will take a look at the panic and see if @gyuho has any ideas.
Just so you're aware, I have set up 3 nodes with a single etcd cluster running across them. I have a loop that powers off and on one of the hosts in the hope of recreating the issue at smaller scale. It has currently got through ~50 loops with no issues so far, but I will leave it running.
@mcginne awesome! If anyone is going to recreate this, it will be you! Thanks a lot for doing this, and please keep us posted if you have any questions.
Adding to this, we saw it again. This was a 3-node etcd cluster.
Peer 1's logs:
Peer 2's logs:
Peer 3's logs:
Note that peers 2 and 3 die within a second of one another. Peer 2 is the first to die with the panic, followed by peer 3. Then peer 1 dies 6 minutes later with the same error.
I've run hundreds of loops of powering hosts off and on in my simple recreate setup, and unfortunately haven't been able to reproduce it.
Note we still see this in v3.3.11 as well.
@ticpu Yes, these are two different problems here. However, since I am probably not the only one Googling and ending up in this thread, I will report my findings. The original issue stems from a […]. However, I also encountered this on 3.2.17 as packaged in Ubuntu Bionic. The problem is that Ubuntu still packages and uses an old grpc-go which does not include grpc/grpc-go@22c3f92. I will update https://bugs.launchpad.net/ubuntu/+source/etcd/+bug/1800973 to highlight this.
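To make the failure mode concrete for readers landing here from a search: panic: send on closed channel is the Go runtime's response when one goroutine sends on a channel that another goroutine has already closed. The sketch below is not the grpc-go fix (see grpc/grpc-go@22c3f92 and grpc/grpc-go#2695 for that); it is only a minimal illustration, with made-up names, of the race itself and of the common mitigation of signalling shutdown through a separate done channel rather than closing the data channel underneath a sender.

```go
// Illustrative only: not grpc-go or etcd code, just the general shape of a
// "send on closed channel" race and one common way to avoid it.
package main

import (
	"fmt"
	"sync"
)

// racySender shows the failure mode: the receiving side closes ch while the
// sending goroutine may still be executing "ch <- i", which panics with
// "send on closed channel".
func racySender() {
	ch := make(chan int)
	go func() {
		for i := 0; ; i++ {
			ch <- i // panics once ch is closed below
		}
	}()
	<-ch
	close(ch) // races with the send in the goroutine above
}

// guardedSender signals shutdown through a separate done channel, so the data
// channel is never closed underneath a sender; a send after shutdown is
// simply dropped instead of panicking.
type guardedSender struct {
	ch   chan int
	done chan struct{}
	once sync.Once
}

func (s *guardedSender) send(v int) bool {
	select {
	case s.ch <- v:
		return true
	case <-s.done:
		return false // shutdown requested; drop the value
	}
}

func (s *guardedSender) stop() { s.once.Do(func() { close(s.done) }) }

func main() {
	s := &guardedSender{ch: make(chan int, 1), done: make(chan struct{})}
	fmt.Println(s.send(1)) // true: the buffer has room
	s.stop()
	fmt.Println(s.send(2)) // false: the buffer is full and done is closed
}
```

The underlying guideline is that only the owner of a channel should close it; receivers and shutdown paths signal cancellation out of band, which is what the select on done accomplishes here.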
grpc/grpc-go#2695 is merged. We should update etcd's grpc dependency to fix this bug once grpc-go 1.20 is released.
@xiang90 I will make sure to cover this.
This version includes patch from grpc/grpc-go#2695 which fixes etcd-io#9956. Signed-off-by: André Martins <aanm90@gmail.com>
@hexfusion @xiang90 @aanm grpc-go 1.20 has been released with the fix included [0]. Can we help to get the rebase done? We are hitting this daily on any large-scale etcd deployment. [0] https://github.com/grpc/grpc-go/releases/tag/v1.20.0
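For applications that consume grpc-go directly through Go modules, picking up the fixed release is just a version bump; etcd itself vendors grpc-go, so on the etcd side the equivalent step is the rebase requested above. A hypothetical consumer's go.mod might look like this (the module name is made up, and any etcd client requirements are omitted):

```
module example.com/myapp

go 1.12

// v1.20.0 is the grpc-go release reported above as containing grpc/grpc-go#2695.
require google.golang.org/grpc v1.20.0
```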
Going to review and move this forward for the 3.3 backport.
This is fixed by #10911, correct?
Already fixed.
Hi,
I got a panic on 3.3.7.
It happened today and 10 days ago.
Version:
Cmdline:
Stacktrace: