
fix watchablestore runlock bug #13505

Merged: 1 commit, Dec 3, 2021
Conversation

@LeoYang90 commented Nov 25, 2021

```go
tx.RLock()
revs, vs := tx.UnsafeRange(keyBucketName, minBytes, maxBytes, 0)
tx.RUnlock()
```

First, note that the values (vs) returned by UnsafeRange are shallow copies: the slices point directly into bolt db's memory. So if bolt db re-mmaps, that memory is reclaimed, and dereferencing it triggers a SIGSEGV:

```
unexpected fault address 0x7ff096486221
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x1 addr=0x7ff096486221 pc=0x472ee5]

goroutine 335 [running]:
runtime.throw(0x1035d04, 0x5)
	/usr/local/go/src/runtime/panic.go:1117 +0x72 fp=0xc0f035fac0 sp=0xc0f035fa90 pc=0x4381b2
runtime.sigpanic()
	/usr/local/go/src/runtime/signal_unix.go:741 +0x268 fp=0xc0f035faf8 sp=0xc0f035fac0 pc=0x44f9c8
runtime.memmove(0xc0c597a000, 0x7ff09647f0b4, 0x71ed)
	/usr/local/go/src/runtime/memmove_amd64.s:332 +0x3c5 fp=0xc0f035fb00 sp=0xc0f035faf8 pc=0x472ee5
go.etcd.io/etcd/mvcc/mvccpb.(*KeyValue).Unmarshal(0xc0487ef3b0, 0x7ff09647f041, 0x7260, 0x7260, 0x5f, 0x0)
	/workspace/code-repo/mvcc/mvccpb/kv.pb.go:410 +0x8ae fp=0xc0f035fbb8 sp=0xc0f035fb00 pc=0x8bfeee
go.etcd.io/etcd/mvcc.kvsToEvents(0xc00015e0c0, 0xc0003242c0, 0xc17c5a4000, 0x4af7, 0x5000, 0xc17c222000, 0x4af7, 0x5000, 0x11, 0x12, ...)
	/workspace/code-repo/mvcc/watchable_store.go:412 +0xf0 fp=0xc0f035fd30 sp=0xc0f035fbb8 pc=0xa66930
go.etcd.io/etcd/mvcc.(*watchableStore).syncWatchers(0xc000324280, 0x0)
	/workspace/code-repo/mvcc/watchable_store.go:354 +0x8c9 fp=0xc0f035ff20 sp=0xc0f035fd30 pc=0xa66789
go.etcd.io/etcd/mvcc.(*watchableStore).syncWatchersLoop(0xc000324280)
	/workspace/code-repo/mvcc/watchable_store.go:215 +0x1e8 fp=0xc0f035ffd8 sp=0xc0f035ff20 pc=0xa65588
```

@serathius (Member)

Can you please add a test that would prevent future regression?

@LeoYang90 (Author) commented Nov 26, 2021

> Can you please add a test that would prevent future regression?

It is difficult to reproduce this bug in a test case, because the probability of triggering it is low: first we must force bolt db to re-mmap by filling it with a large amount of data.

I have added some comments in the code for now. Do you have any suggestions for a test case? @serathius

@serathius (Member)

Not my area of expertise, but we run our unit tests with data race detection. Maybe we could add a test case that triggers the race here and rely on the race detector to report the error.

Based on the code I think the fix is correct; I just want to make sure we are thinking long term about how to avoid making the same mistake again.

@ptabor (Contributor) commented Dec 2, 2021

Great finding. I wonder whether it could be the cause of #12845, i.e. a page being overwritten at the same time it is being parsed.

@wilsonwang371 (Contributor)

This looks like a correct bug fix. I have a patch, #13499, that logs more information in case of failure. Shall we check in both changes and then wait and see whether we can get this reproduced?

@ptabor @chaochn47 @hexfusion @gyuho

@wilsonwang371 (Contributor)

My only concern is that holding the lock here can significantly reduce throughput. Shall we make a copy of the data and then call kvsToEvents? syncWatchers() is not a time-sensitive operation. @ptabor

@wilsonwang371 (Contributor)

Another related issue: #13067

It makes sense: at our company we have observed this in multiple versions of etcd, but only recently have we started seeing it more frequently. The reason might be that we have improved etcd's performance, so there is a higher chance of this happening.

@wilsonwang371 added the type/bug and priority/important-soon (must be staffed and worked on either currently or very soon, ideally in time for the next release) labels on Dec 2, 2021
@ptabor (Contributor) commented Dec 2, 2021

> My only concern is that here it can significantly reduce the throughput. Shall we make a copy of the data and then call kvsToEvents? syncWatchers() is not a time-sensitive operation. @ptabor

Without benchmarking we will not know for sure (and your framework does not yet support RW transactions while watches are open, right?).

The current kvsToEvents in practice only parses protos. I wouldn't consider that an especially heavy operation compared to making an additional copy of the whole [][]byte just to release the shared lock faster.

@wilsonwang371 (Contributor)

> My only concern is that here it can significantly reduce the throughput. Shall we make a copy of the data and then call kvsToEvents? syncWatchers() is not a time-sensitive operation. @ptabor
>
> Without benchmarking we will not know for sure (and your framework does not support (yet) RW transactions while watches are open, right?)
>
> The current kvsToEvents is in practice only parsing protos. I wouldn't consider this especially heavy operation in comparison to making an additional copy of [][]byte just to release shared lock faster.

Correct, this part is not covered in the script yet. Also, syncWatchers() is called when there are lots of write operations and the watchers are out of sync, so holding the lock there might even alleviate some of the write workload.

I am okay with this; we can wait and see whether there is any feedback later in case of a performance issue.

@wilsonwang371 (Contributor)

Btw, we may also need to backport this change to earlier versions.

@ahrtr (Member) commented Mar 23, 2022

It's a great fix, but the changelog wasn't updated. I've just submitted a PR to update the changelog.
