Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backend: skip *bolt.DB.Size call when nil #6662

Merged
merged 3 commits into from
Oct 21, 2016
Merged

Conversation

gyuho
Copy link
Contributor

@gyuho gyuho commented Oct 18, 2016

  1. batchTx.commit(false) from newBatchTx
  2. batchTx.commit(true) from stopping backend
  3. batchTx.commit(false) from inflight mvcc Hash call

would panic because *bolt.Tx.Commit in batchTx.commit
initializes *bolt.Tx.db and *bolt.Tx.meta as nil,
and subsequent *bolt.Tx.Size() call refers to this nil
pointer (panic).

Fix etcd-io/etcdlabs#30.

/cc @xiang90

defer cleanup(b, tmpPath)

// 2. call batchTx.commit(true)
b.batchTx.CommitAndStop()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems to be wrong. CommitAndStop only can be called by backend once it stopped. no other operation can be issued to backend after that. if you want to prevent bad caller that calls backend.X() after backend gets stopped, you should protect the backend side.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ongoing Hash call before stopping would still trigger this panic.
I might be wrong. I will read mvcc, backend again to investigate more.

Thanks!

@gyuho gyuho added the WIP label Oct 18, 2016
@gyuho gyuho changed the title mvcc: fix panic on *bolt.Tx.Size after commit mvcc/backend: exit on racey after-stop commit call Oct 20, 2016
@gyuho gyuho removed the WIP label Oct 20, 2016
@gyuho
Copy link
Contributor Author

gyuho commented Oct 20, 2016

@xiang90 Added more test cases which all panic without this patch.

To make it clear, this is only useful for projects embedding etcd servers. If a project embeds etcd server, this panic would shut down the whole application. And there's no way to know in advance because etcd server stop method returns after calling backend.Close. https://github.com/coreos/etcd/blob/master/etcdserver/server.go#L667-L672

@gyuho gyuho force-pushed the db-panic branch 2 times, most recently from f6b76f2 to 2e153a5 Compare October 20, 2016 17:06
@heyitsanthony
Copy link
Contributor

Why would an in-flight Hash break it but not an in-flight Get or Put?

@gyuho
Copy link
Contributor Author

gyuho commented Oct 20, 2016

@heyitsanthony I think same thing can happen in other in-flight RPC calls like get, put.
Hash just happens to be my use case in using embedded etcd servers. I periodically call
Hash to get states of etcd servers, and there seems to be no easy to prevent or shut down
in-flight Hash calls after stopping backend that internally happens in etcdserver.EtcdServer.run.

@@ -159,6 +161,12 @@ func (t *batchTx) commit(stop bool) {
var err error
// commit the last tx
if t.tx != nil {
if t.stopped {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dropping a commit on the floor like this without an error and acting like it worked is kind of scary; what if the commit is important?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree. let me rethink on this. thanks.

@gyuho
Copy link
Contributor Author

gyuho commented Oct 20, 2016

@xiang90 @heyitsanthony How about checking if db is nil without adding stopped field?

--- a/mvcc/backend/batch_tx.go
+++ b/mvcc/backend/batch_tx.go
@@ -162,7 +162,9 @@ func (t *batchTx) commit(stop bool) {
                if t.pending == 0 && !stop {
                        t.backend.mu.RLock()
                        defer t.backend.mu.RUnlock()
-                       atomic.StoreInt64(&t.backend.size, t.tx.Size())
+                       if t.tx.DB() != nil {
+                               atomic.StoreInt64(&t.backend.size, t.tx.Size())
+                       }
                        return
                }
                start := time.Now()

@gyuho gyuho changed the title mvcc/backend: exit on racey after-stop commit call backend: skip db.Size call on nil db Oct 20, 2016

// TestV3MaintenanceHash ensures concurrent Hash call to embedded EtcdServer
// that is being stopped does not panic.
func TestV3MaintenanceHash(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestV3MaintenanceHashInflight?

@@ -162,7 +162,9 @@ func (t *batchTx) commit(stop bool) {
if t.pending == 0 && !stop {
t.backend.mu.RLock()
defer t.backend.mu.RUnlock()
atomic.StoreInt64(&t.backend.size, t.tx.Size())
if t.tx.DB() != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment?

@heyitsanthony
Copy link
Contributor

lgtm after fixing nits

@gyuho gyuho changed the title backend: skip db.Size call on nil db backend: skip *bolt.DB.Size call when nil Oct 21, 2016
@gyuho gyuho merged commit 1ad038d into etcd-io:master Oct 21, 2016
@gyuho gyuho deleted the db-panic branch October 21, 2016 18:38
gyuho added a commit to gyuho/etcd that referenced this pull request Apr 15, 2017
- Test etcd-io#7322.
- Remove test case added in etcd-io#6662.

Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
gyuho added a commit to gyuho/etcd that referenced this pull request Apr 15, 2017
Revert etcd-io#6662.

Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
gyuho added a commit to gyuho/etcd that referenced this pull request Apr 15, 2017
- Test etcd-io#7322.
- Remove test case added in etcd-io#6662.

Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
gyuho added a commit to gyuho/etcd that referenced this pull request Apr 15, 2017
Revert etcd-io#6662.

Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
gyuho added a commit to gyuho/etcd that referenced this pull request Apr 15, 2017
- Test etcd-io#7322.
- Remove test case added in etcd-io#6662.

Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
gyuho added a commit to gyuho/etcd that referenced this pull request Apr 15, 2017
Revert etcd-io#6662.

Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
gyuho added a commit to gyuho/etcd that referenced this pull request Apr 15, 2017
- Test etcd-io#7322.
- Remove test case added in etcd-io#6662.

Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
gyuho added a commit to gyuho/etcd that referenced this pull request Apr 15, 2017
Revert etcd-io#6662.

Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
gyuho added a commit to gyuho/etcd that referenced this pull request Apr 17, 2017
- Test etcd-io#7322.
- Remove test case added in etcd-io#6662.

Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
gyuho added a commit to gyuho/etcd that referenced this pull request Apr 17, 2017
Revert etcd-io#6662.

Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants