ETCD panics on bolt assertion #8118

Closed
armstrongli opened this issue Jun 16, 2017 · 7 comments

Comments


armstrongli commented Jun 16, 2017

etcd panicked while I was running a benchmark. The source code for the update key/value benchmark is here: armstrongli@33d6876

I ran the benchmark on 5 nodes with the following parameters:

benchmark put --sequential-keys=true --key-prefix=/tess.io --update-total=100000000 --total=10000 --val-size=1000 --clients=100 --conns=100 --cacert=/etc/ssl/etcd/tessca.crt --key=/etc/ssl/etcd/etcd.key --cert=/etc/ssl/etcd/etcd.crt --endpoints=https://....:4001,https://....:4001,https://....:4001

The panic happened when I called defrag on all members.

Here are the logs:

2017-06-16 11:02:36.117063 I | etcdmain: etcd Version: 3.1.9
2017-06-16 11:02:36.117479 I | etcdmain: Git SHA: 0f4a535
2017-06-16 11:02:36.117483 I | etcdmain: Go Version: go1.7.6
2017-06-16 11:02:36.117490 I | etcdmain: Go OS/Arch: linux/amd64
2017-06-16 11:02:36.117494 I | etcdmain: setting maximum number of CPUs to 32, total number of available CPUs is 32
2017-06-16 11:02:36.117524 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2017-06-16 11:02:36.117573 I | embed: listening for peers on http://0.0.0.0:2380
2017-06-16 11:02:36.117602 I | embed: listening for client requests on 0.0.0.0:4001
2017-06-16 11:02:36.224704 W | etcdserver: discovery token ignored since a cluster has already been initialized. Valid log found at "/var/etcd/data/member/wal"
2017-06-16 11:02:36.225755 I | etcdserver: recovered store from snapshot at index 76713019
2017-06-16 11:02:36.225772 I | etcdserver: name = tess-node-nf3nq
2017-06-16 11:02:36.225778 I | etcdserver: data dir = /var/etcd/data
2017-06-16 11:02:36.225785 I | etcdserver: member dir = /var/etcd/data/member
2017-06-16 11:02:36.225791 I | etcdserver: heartbeat = 100ms
2017-06-16 11:02:36.225796 I | etcdserver: election = 1000ms
2017-06-16 11:02:36.225802 I | etcdserver: snapshot count = 10000
2017-06-16 11:02:36.225806 I | etcdserver: discovery URL= https://discovery.etcd.io/d9b4a84a55879a6d8025c1902a5795a2
2017-06-16 11:02:36.225819 I | etcdserver: advertise client URLs = https://tess-node-nf3nq-1189139.33.tess.io:4001
2017-06-16 11:02:37.650301 I | etcdserver: restarting member 487914179a65b0e in cluster c078315b9cc3b5cd at commit index 77078887
2017-06-16 11:02:37.757047 I | raft: 487914179a65b0e became follower at term 3
2017-06-16 11:02:37.757101 I | raft: newRaft 487914179a65b0e [peers: [487914179a65b0e,5220c85f8798114,aee71dc959e69117], term: 3, commit: 77078887, applied: 76713019, lastindex: 77078888, lastterm: 3]
2017-06-16 11:02:37.757340 I | etcdserver/api: enabled capabilities for version 3.1
2017-06-16 11:02:37.757367 I | etcdserver/membership: added member 487914179a65b0e [http://10.148.168.32:2380] to cluster c078315b9cc3b5cd from store
2017-06-16 11:02:37.757378 I | etcdserver/membership: added member 5220c85f8798114 [http://10.148.168.34:2380] to cluster c078315b9cc3b5cd from store
2017-06-16 11:02:37.757389 I | etcdserver/membership: added member aee71dc959e69117 [http://10.148.168.33:2380] to cluster c078315b9cc3b5cd from store
2017-06-16 11:02:37.757409 I | etcdserver/membership: set the cluster version to 3.1 from store
2017-06-16 11:02:37.812222 I | mvcc: restore compact to 20340944
2017-06-16 11:02:42.094942 I | mvcc: store.index: compact 26131720
2017-06-16 11:02:42.099186 I | mvcc: resume scheduled compaction at 26131720
panic: assertion failed: tx closed

goroutine 119 [running]:
panic(0xcbb000, 0xc42d6c0000)
        /usr/local/go/src/runtime/panic.go:500 +0x1a1
github.com/coreos/etcd/cmd/vendor/github.com/boltdb/bolt._assert(0xd6f900, 0xe4471b, 0x9, 0x0, 0x0, 0x0)
        /home/gyuho/go/src/github.com/coreos/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/boltdb/bolt/db.go:1026 +0xff
github.com/coreos/etcd/cmd/vendor/github.com/boltdb/bolt.(*Cursor).seek(0xc4215f8aa0, 0x12d2032, 0x3, 0x3, 0x0, 0x0, 0x0, 0xcf1cd91be100208, 0x8318031000082a1a, 0x80f521f2224df80, ...)
        /home/gyuho/go/src/github.com/coreos/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/boltdb/bolt/cursor.go:155 +0x6f
github.com/coreos/etcd/cmd/vendor/github.com/boltdb/bolt.(*Bucket).Bucket(0xc4202281d8, 0x12d2032, 0x3, 0x3, 0xc4215f8b20)
        /home/gyuho/go/src/github.com/coreos/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/boltdb/bolt/bucket.go:112 +0x108
github.com/coreos/etcd/cmd/vendor/github.com/boltdb/bolt.(*Tx).Bucket(0xc4202281c0, 0x12d2032, 0x3, 0x3, 0x42cd7e)
        /home/gyuho/go/src/github.com/coreos/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/boltdb/bolt/tx.go:101 +0x4f
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/mvcc/backend.(*batchTx).UnsafeRange(0xc42026d780, 0x12d2032, 0x3, 0x3, 0xc4202ec000, 0x11, 0x11, 0xc42d85c000, 0x8, 0x8, ...)
        /home/gyuho/go/src/github.com/coreos/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/mvcc/backend/batch_tx.go:88 +0x79
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/mvcc.(*store).scheduleCompaction(0xc4200f6340, 0x18ebd08, 0xc4275e0210, 0xc400000000)
        /home/gyuho/go/src/github.com/coreos/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/mvcc/kvstore_compaction.go:38 +0x2d3
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/mvcc.(*store).Compact.func2(0x7fa4f6992508, 0xc420a2de80)
        /home/gyuho/go/src/github.com/coreos/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/mvcc/kvstore.go:312 +0xbb
github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/pkg/schedule.(*fifo).run(0xc420166720)
        /home/gyuho/go/src/github.com/coreos/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/pkg/schedule/schedule.go:160 +0xdd
created by github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/pkg/schedule.NewFIFOScheduler
        /home/gyuho/go/src/github.com/coreos/etcd/release/etcd/gopath/src/github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/pkg/schedule/schedule.go:71 +0x1e8

armstrongli (Author) commented

There was a sudden increase in memory usage, and after that etcd no longer responded.
[Screenshot taken 2017-06-16 at 7:12:04 PM]

gyuho (Contributor) commented Jun 16, 2017

This should be fixed via #7743 (which is in 3.2)? Can you try 3.2?

xiang90 (Contributor) commented Jun 16, 2017

@gyuho It seems like a different one. Here boltdb panics during defrag, not during shutdown.
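
For readers unfamiliar with this class of failure, here is a minimal sketch of how a "tx closed" error can arise when a background task keeps using a transaction handle that another operation has closed and replaced. This is not etcd's or bolt's actual code. The `tx` and `backend` types and the `defrag`, `safeRead`, and `staleRead` functions are hypothetical stand-ins, and the sketch assumes the problem is a handle captured before the swap and used after it.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errTxClosed mirrors the message in the panic above; the value itself is hypothetical.
var errTxClosed = errors.New("tx closed")

// tx is a hypothetical stand-in for a storage transaction.
type tx struct{ closed bool }

// read fails if the transaction has already been closed.
func (t *tx) read() error {
	if t.closed {
		return errTxClosed
	}
	return nil
}

// backend is a hypothetical wrapper that owns the current transaction.
type backend struct {
	mu sync.Mutex
	tx *tx
}

func newBackend() *backend { return &backend{tx: &tx{}} }

// defrag closes the current transaction and installs a fresh one,
// holding the lock for the whole swap.
func (b *backend) defrag() {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.tx.closed = true
	b.tx = &tx{}
}

// safeRead looks up the current transaction and reads from it under
// the same lock, so it never observes a closed handle.
func (b *backend) safeRead() error {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.tx.read()
}

// staleRead shows the failure mode: the handle is captured before
// defrag runs and is used only afterwards.
func staleRead(b *backend) error {
	b.mu.Lock()
	old := b.tx
	b.mu.Unlock()

	b.defrag() // stands in for a concurrent defrag request

	return old.read()
}

func main() {
	b := newBackend()
	fmt.Println("stale handle:", staleRead(b))
	b.defrag()
	fmt.Println("locked read:", b.safeRead())
}
```

In this sketch the stale handle returns the error while the locked read succeeds. Whether the fix referenced in this thread works the same way is not stated here.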

gyuho (Contributor) commented Jun 16, 2017

@xiang90 Then is it similar to #7526, which was fixed via #7579?

xiang90 (Contributor) commented Jun 16, 2017

Yes, that is what I thought too.

xiang90 (Contributor) commented Jun 16, 2017

Closing. Please try 3.2 and let us know if it still breaks.

xiang90 closed this as completed Jun 16, 2017

armstrongli (Author) commented

@xiang90 not any more.
