
the kube-apiserver UT failed because of the mock etcd server panic #8589

Closed

abel-von opened this issue Sep 21, 2017 · 8 comments

abel-von commented Sep 21, 2017

Hi, recently we have been updating our Kubernetes version to 1.7, and when we run the kube-apiserver unit tests, some of them occasionally fail.
The log looks like this:

2017-09-20 20:58:36.459910 I | integration: launching 836758074913443343 (unix://localhost:8367580749134433430)
2017-09-20 20:58:36.471974 I | etcdserver: name = 836758074913443343
2017-09-20 20:58:36.472008 I | etcdserver: data dir = /tmp/etcd039571547
2017-09-20 20:58:36.472018 I | etcdserver: member dir = /tmp/etcd039571547/member
2017-09-20 20:58:36.472023 I | etcdserver: heartbeat = 10ms
2017-09-20 20:58:36.472028 I | etcdserver: election = 100ms
2017-09-20 20:58:36.472038 I | etcdserver: snapshot count = 0
2017-09-20 20:58:36.472047 I | etcdserver: advertise client URLs = unix://127.0.0.1:2101291470
2017-09-20 20:58:36.472057 I | etcdserver: initial advertise peer URLs = unix://127.0.0.1:2101191470
2017-09-20 20:58:36.472073 I | etcdserver: initial cluster = 836758074913443343=unix://127.0.0.1:2101191470
2017-09-20 20:58:36.485243 I | etcdserver: starting member 679b2ccdb038a94f in cluster 1a280ee43bd84b0c
2017-09-20 20:58:36.485298 I | raft: 679b2ccdb038a94f became follower at term 0
2017-09-20 20:58:36.485316 I | raft: newRaft 679b2ccdb038a94f [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
2017-09-20 20:58:36.485327 I | raft: 679b2ccdb038a94f became follower at term 1
2017-09-20 20:58:36.513627 I | etcdserver: set snapshot count to default 10000
2017-09-20 20:58:36.513664 I | etcdserver: starting server... [version: 3.1.5, cluster version: to_be_decided]
2017-09-20 20:58:36.513994 I | integration: launched 836758074913443343 (unix://localhost:8367580749134433430)
2017-09-20 20:58:36.514439 I | etcdserver/membership: added member 679b2ccdb038a94f [unix://127.0.0.1:2101191470] to cluster 1a280ee43bd84b0c
2017-09-20 20:58:36.595703 I | raft: 679b2ccdb038a94f is starting a new election at term 1
2017-09-20 20:58:36.595791 I | raft: 679b2ccdb038a94f became candidate at term 2
2017-09-20 20:58:36.595809 I | raft: 679b2ccdb038a94f received MsgVoteResp from 679b2ccdb038a94f at term 2
2017-09-20 20:58:36.595831 I | raft: 679b2ccdb038a94f became leader at term 2
2017-09-20 20:58:36.595850 I | raft: raft.node: 679b2ccdb038a94f elected leader 679b2ccdb038a94f at term 2
2017-09-20 20:58:36.596215 I | etcdserver: setting up the initial cluster version to 3.1
2017-09-20 20:58:36.597814 N | etcdserver/membership: set the initial cluster version to 3.1
2017-09-20 20:58:36.597862 I | etcdserver: published {Name:836758074913443343 ClientURLs:[unix://127.0.0.1:2101291470]} to cluster 1a280ee43bd84b0c
2017-09-20 20:58:39.821197 I | integration: terminating 836758074913443343 (unix://localhost:8367580749134433430)
panic: assertion failed: tx closed

goroutine 842 [running]:
k8s.io/kubernetes/vendor/github.com/boltdb/bolt._assert(0x0, 0x2890b6f, 0x9, 0x0, 0x0, 0x0)
	/home/workstation/go/src/k8s.io/kubernetes/vendor/github.com/boltdb/bolt/db.go:1026 +0xf6
k8s.io/kubernetes/vendor/github.com/boltdb/bolt.(*Cursor).seek(0xc4212696e8, 0x3c5f306, 0x3, 0x3, 0x0, 0x0, 0xc420846a40, 0x410f00, 0x1, 0xc4212696e0, ...)
	/home/workstation/go/src/k8s.io/kubernetes/vendor/github.com/boltdb/bolt/cursor.go:155 +0x6f
k8s.io/kubernetes/vendor/github.com/boltdb/bolt.(*Bucket).Bucket(0xc42017e638, 0x3c5f306, 0x3, 0x3, 0xc420711b80)
	/home/workstation/go/src/k8s.io/kubernetes/vendor/github.com/boltdb/bolt/bucket.go:112 +0x108
k8s.io/kubernetes/vendor/github.com/boltdb/bolt.(*Tx).Bucket(0xc42017e620, 0x3c5f306, 0x3, 0x3, 0x12)
	/home/workstation/go/src/k8s.io/kubernetes/vendor/github.com/boltdb/bolt/tx.go:101 +0x4f
k8s.io/kubernetes/vendor/github.com/coreos/etcd/mvcc/backend.(*batchTx).UnsafeRange(0xc4204f6640, 0x3c5f306, 0x3, 0x3, 0xc420711b60, 0x11, 0x12, 0xc420711b80, 0x11, 0x12, ...)
	/home/workstation/go/src/k8s.io/kubernetes/vendor/github.com/coreos/etcd/mvcc/backend/batch_tx.go:88 +0x6e
k8s.io/kubernetes/vendor/github.com/coreos/etcd/mvcc.(*store).rangeKeys(0xc4202e8820, 0xc420838780, 0x22, 0x30, 0x0, 0x0, 0x0, 0x0, 0x2, 0xc421269b00, ...)
	/home/workstation/go/src/k8s.io/kubernetes/vendor/github.com/coreos/etcd/mvcc/kvstore.go:524 +0x219
k8s.io/kubernetes/vendor/github.com/coreos/etcd/mvcc.(*store).Range(0xc4202e8820, 0xc420838780, 0x22, 0x30, 0x0, 0x0, 0x0, 0x0, 0x2, 0xc420196600, ...)
	/home/workstation/go/src/k8s.io/kubernetes/vendor/github.com/coreos/etcd/mvcc/kvstore.go:155 +0xd2
k8s.io/kubernetes/vendor/github.com/coreos/etcd/etcdserver/api/v3rpc.(*serverWatchStream).sendLoop(0xc421301050)
	/home/workstation/go/src/k8s.io/kubernetes/vendor/github.com/coreos/etcd/etcdserver/api/v3rpc/watch.go:274 +0x84f
k8s.io/kubernetes/vendor/github.com/coreos/etcd/etcdserver/api/v3rpc.(*watchServer).Watch.func1(0xc421301050)
	/home/workstation/go/src/k8s.io/kubernetes/vendor/github.com/coreos/etcd/etcdserver/api/v3rpc/watch.go:125 +0x2b
created by k8s.io/kubernetes/vendor/github.com/coreos/etcd/etcdserver/api/v3rpc.(*watchServer).Watch
	/home/workstation/go/src/k8s.io/kubernetes/vendor/github.com/coreos/etcd/etcdserver/api/v3rpc/watch.go:127 +0x283
FAIL	k8s.io/kubernetes/pkg/registry/core/serviceaccount/storage	5.878s

The failed test cases all involve the Watch API, for example TestWatch in https://github.com/kubernetes/kubernetes/blob/release-1.7/pkg/registry/core/serviceaccount/storage/storage_test.go#L112

We found that the tx closed error was fixed in #7743, but I am not sure whether this is the same issue, or whether I should update the version of etcd vendored in Kubernetes to fix it. @gyuho
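
For context, here is a minimal sketch (not the actual Kubernetes test) of the shutdown race described above, built on etcd's own integration test helpers that these storage tests ultimately rely on. With the vendored etcd 3.1.x, tearing the member down while a watch stream is still open can occasionally trigger the same "assertion failed: tx closed" panic:

```go
package watchrace

import (
	"context"
	"testing"

	"github.com/coreos/etcd/integration"
)

// Sketch of the race: open a watch, then terminate the mock member while the
// server-side watch sendLoop may still be ranging over the bolt-backed store.
func TestWatchDuringTerminate(t *testing.T) {
	clus := integration.NewClusterV3(t, &integration.ClusterConfig{Size: 1})

	// Open a watch stream against the single member, similar to what the
	// registry TestWatch cases do through the kube-apiserver storage layer.
	wch := clus.RandClient().Watch(context.Background(), "foo")
	_ = wch

	// Terminate the member while the watch is still active; on etcd 3.1 the
	// watch sendLoop can race with the backend being closed.
	clus.Terminate(t)
}
```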


gyuho commented Sep 21, 2017

@abel-von The fix has been released via etcd v3.2. Yes, upgrading etcd would fix the issue.
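
For anyone double-checking which etcd is actually vendored, a small sketch (assuming it is built inside the kubernetes tree, so the import resolves to the vendored copy) that prints the etcd server version constant; the mock server log above reports 3.1.5:

```go
package main

import (
	"fmt"

	// When built inside the kubernetes repo this resolves to
	// k8s.io/kubernetes/vendor/github.com/coreos/etcd/version.
	"github.com/coreos/etcd/version"
)

// Prints the version constant compiled into the vendored etcd code; the fix
// discussed here ships in 3.2+.
func main() {
	fmt.Println("vendored etcd server version:", version.Version)
}
```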


jpbetz commented Sep 21, 2017

@abel-von What exact version of kubernetes is this? What does kubectl version print?

abel-von (Author) commented:

@jpbetz it's the 1.7 release. I don't know why others in the Kubernetes community haven't run into this issue.


jpbetz commented Sep 22, 2017 via email

abel-von (Author) commented:

@jpbetz thank you, it's 1.7.3.


xiang90 commented Oct 4, 2017

@gyuho I remember you fixed a bug around this in a previous version of etcd, which k8s 1.7.3 vendors. Can you double-check that?


gyuho commented Oct 4, 2017

We've added gRPC server GracefulStop in v3.2.0, but not in any previous release (see https://github.com/coreos/etcd/blob/release-3.1/embed/etcd.go#L142-L159).
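
A rough sketch of the ordering idea behind that v3.2 change (not the actual embed.Etcd code; srv and closeBackend are placeholders): drain the gRPC server that hosts the watch streams before closing the storage backend, so no sendLoop is left reading from a closed bolt transaction.

```go
package shutdown

import "google.golang.org/grpc"

// Stop drains the gRPC server before tearing down storage. GracefulStop
// stops accepting new connections and RPCs and blocks until all pending
// RPCs -- including long-lived watch streams -- have finished, so the
// backend is only closed once nothing can still read from it.
func Stop(srv *grpc.Server, closeBackend func()) {
	srv.GracefulStop()
	closeBackend()
}
```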


xiang90 commented Oct 4, 2017

@jpbetz

Updating etcd to 3.2 in the k8s vendoring should resolve the issue. I am closing this one since it is already fixed on the etcd side.
