VReplication: Handle large binlog compressed transactions more efficiently #16328
Conversation
Codecov Report — Attention: patch coverage is incomplete.

@@           Coverage Diff           @@
##             main   #16328   +/-   ##
==========================================
- Coverage   68.72%   68.69%   -0.04%
==========================================
  Files        1547     1548       +1
  Lines      198290   198487     +197
==========================================
+ Hits       136279   136350      +71
- Misses      62011    62137     +126
go/mysql/binlog_event.go (outdated)

    // within the compressed transaction.
    -TransactionPayload(BinlogFormat) ([]BinlogEvent, error)
    +TransactionPayload(BinlogFormat) (func() (BinlogEvent, error), error)
Looks like you're changing this to be compatible with the new Go 1.23 iterators, right? 💯
Yeah, we will almost certainly be using 1.23 on main before the release and I'd like to use the new standard interface if I can.
go/mysql/binlog_event_compression.go (outdated)

    // temporary file.
    if tp.uncompressedSize > zstdInMemoryDecompressorMaxSize {
        // Create a temporary file to stream the uncompressed payload to.
        tmpFile, err := os.CreateTemp("", "binlog-transaction-payload-*")
Not sure if it's worth the complexity or cleanup synchronization, but if we named the file based on header information, couldn't multiple streams share the same file?
It could prove to be worth it in the long run, but at this point I would say no. I say that as the file management and I/O would become much more complex if we went from a single RW "owner" / single FD, to having 1 RW FD and N RO FDs as we'd have to coordinate on I/O and file cleanup. The cleanup could be an issue too as the streams are not aligned and when would we clean them up? We could move the problem from exhausting RAM to exhausting block device space as these files can be 10s or 100s of GBs in size so we'd want to clean them up ASAP. I'm also assuming that these VERY large compressed transactions (many rows changed on wide tables with before and after images in the row events) will be relatively rare. But we'll see. That's definitely an optimization that is possible if it proves warranted.
That was my thought too. For my specific issue, and assuming that 60 GB RAM translates roughly into disk space, we wouldn't have had sufficient disk either to write all these files, but that would have been easier to track down than vttablet being repeatedly OOMKilled.
I know there are already mutexes all over the place, but maybe we could add another one to lock here inside the if statement, which would prevent multiple large files / transactions from being processed concurrently. That would have no impact on regular operations, but would prevent amplification of the same huge transaction taking all RAM / disk.
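The suggestion above — a lock taken only when a payload crosses the in-memory threshold — could look roughly like this. The names `largePayloadMu` and `process` are hypothetical, not from the PR; only the 128MiB constant mirrors the change being reviewed:

```go
package main

import (
	"fmt"
	"sync"
)

// zstdInMemoryDecompressorMaxSize mirrors the PR's 128MiB threshold.
const zstdInMemoryDecompressorMaxSize = 128 << 20

// largePayloadMu serializes only the oversized-payload path, so huge
// transactions cannot amplify disk/RAM usage by running concurrently.
var largePayloadMu sync.Mutex

// process sketches the branch: small payloads are handled in memory
// with no locking, large ones take the mutex and stream.
func process(uncompressedSize uint64) string {
	if uncompressedSize > zstdInMemoryDecompressorMaxSize {
		largePayloadMu.Lock()
		defer largePayloadMu.Unlock()
		return "streamed (serialized)"
	}
	return "in-memory"
}

func main() {
	fmt.Println(process(64 << 20)) // small payload: no lock taken
	fmt.Println(process(10 << 30)) // 10GiB payload: serialized
}
```

Because the mutex sits inside the `if`, regular transactions never touch it, matching the "no impact on regular operations" goal stated above.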
Yeah, that's a good idea too. I probably won't do that here but it's something I'll keep in mind if disk usage does end up being a problem. Thanks for the good suggestions/ideas!
@shlomi-noach This no longer copies anything to disk, so the mmap part is irrelevant now. 😄
go/mysql/binlog_event_compression.go (outdated)

    statefulDecodersPool = sync.Pool{
        New: func() any {
            d, err := zstd.NewReader(nil, zstd.WithDecoderMaxMemory(zstdInMemoryDecompressorMaxSize))
            log.Errorf("Error creating stateful decoder: %v", err)
I saw this a couple of days ago; it helps propagate the real error instead of checking for nil, but what you have is probably sufficient. It doesn't seem like that New function should error out in practice:
https://www.reddit.com/r/golang/comments/1dww9bi/how_to_handle_errors_on_synvpool/lbxinal
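The pattern from that thread — carrying the constructor's error alongside the value in the pooled item so callers see the real error rather than a nil check — can be sketched like this. `decoder`, `newDecoder`, `poolItem`, and `get` are illustrative stand-ins, not Vitess or klauspost/compress APIs:

```go
package main

import (
	"fmt"
	"sync"
)

// decoder stands in for *zstd.Decoder, and newDecoder for
// zstd.NewReader; both are placeholders for this sketch.
type decoder struct{}

func newDecoder() (*decoder, error) { return &decoder{}, nil }

// poolItem pairs the constructed value with any construction error,
// so sync.Pool's error-less New can still propagate failures.
type poolItem struct {
	d   *decoder
	err error
}

var decoders = sync.Pool{
	New: func() any {
		d, err := newDecoder()
		return &poolItem{d: d, err: err}
	},
}

// get surfaces the real constructor error to the caller instead of
// handing back a nil decoder.
func get() (*decoder, error) {
	item := decoders.Get().(*poolItem)
	if item.err != nil {
		return nil, fmt.Errorf("creating stateful decoder: %w", item.err)
	}
	return item.d, nil
}

func main() {
	d, err := get()
	fmt.Println(d != nil, err == nil)
}
```

As noted above, the constructor is unlikely to fail in practice, so logging inside `New` is arguably sufficient; this shape only matters if callers need to act on the error.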
I'm going to revert this change. I can see in my tests that we use significantly more memory with it.
noleak.go:56: found unexpected goroutines:

Goroutine 23 in state chan receive, with vitess.io/vitess/go/mysql.(*AuthServerStatic).installSignalHandlers.func1 on top of the stack:
    vitess.io/vitess/go/mysql.(*AuthServerStatic).installSignalHandlers.func1()
        /home/runner/work/vitess/vitess/go/mysql/auth_server_static.go:273 +0x6d
    created by vitess.io/vitess/go/mysql.(*AuthServerStatic).installSignalHandlers in goroutine 11
        /home/runner/work/vitess/vitess/go/mysql/auth_server_static.go:272 +0x165

Goroutine 74 in state chan receive, with vitess.io/vitess/go/mysql.(*AuthServerStatic).installSignalHandlers.func1 on top of the stack:
    vitess.io/vitess/go/mysql.(*AuthServerStatic).installSignalHandlers.func1()
        /home/runner/work/vitess/vitess/go/mysql/auth_server_static.go:273 +0x6d
    created by vitess.io/vitess/go/mysql.(*AuthServerStatic).installSignalHandlers in goroutine 24
        /home/runner/work/vitess/vitess/go/mysql/auth_server_static.go:272 +0x165

Goroutine 75 in state chan receive, with vitess.io/vitess/go/mysql.(*AuthServerStatic).installSignalHandlers.func2 on top of the stack:
    vitess.io/vitess/go/mysql.(*AuthServerStatic).installSignalHandlers.func2()
        /home/runner/work/vitess/vitess/go/mysql/auth_server_static.go:282 +0xc7
    created by vitess.io/vitess/go/mysql.(*AuthServerStatic).installSignalHandlers in goroutine 24
        /home/runner/work/vitess/vitess/go/mysql/auth_server_static.go:281 +0x25d

It's also not relevant anymore as we're not managing a pool of zstd.Decoders.
After further testing and reading, I can see that I was perfectly wrong here. 😆 What I described above was true prior to klauspost/compress#498; it was in that PR that a key behavior was changed, as noted in the PR description.

I confirmed through testing that in fact no goroutines are leaked, using the unit tests. That was in part due to the unnecessary finalizer, and I no longer see any increase in memory usage in my tests. So, after all this, I'm going to re-apply it.
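The no-leak verification described above can be approximated with a stdlib-only check; `leakCheck` is a hypothetical helper, much cruder than the noleak test utility Vitess actually uses, but it shows the before/after goroutine-count idea:

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// leakCheck is a minimal stand-in for a goroutine-leak detector: it
// compares the goroutine count before and after the work under test,
// allowing a short settle window for finished goroutines to exit.
func leakCheck(work func()) int {
	before := runtime.NumGoroutine()
	work()
	time.Sleep(50 * time.Millisecond) // let completed goroutines drain
	return runtime.NumGoroutine() - before
}

func main() {
	leaked := leakCheck(func() {
		done := make(chan struct{})
		go func() { close(done) }()
		<-done
	})
	fmt.Println("leaked goroutines:", leaked)
}
```

Real leak detectors (like the noleak helper in the test output above) inspect goroutine stacks rather than raw counts, which is what makes them able to name the offending functions.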
Description
We added support for MySQL binlog transaction compression in VReplication in #12950. This initial implementation performed all processing of the compressed events in memory.
The size of these compressed transaction payload events, when uncompressed (the compressed transaction payload event itself is limited by `max_allowed_packet`), could be 10s or even 100s of GiB — think updating 1 million rows in a wide table with JSON, BLOB, and TEXT fields, and keep in mind that for each of those rows changed the binlog ROW event contains a before and after image with `binlog_row_image=full` (the default). So keeping that entire uncompressed transaction in memory can become unfeasible and problematic (the `vttablet` process is likely to get OOM-killed).

This PR changes the processing of the compressed transaction payload events so that when the uncompressed size is above 128MiB we stream the decompression work, and — whether we use the slower streaming method or the faster stateless in-memory buffer method — we read each event within the uncompressed transaction one at a time. This allows us to process larger payloads without allocating more than the maximum 128MiB we allow for in-memory processing.
Memory Usage Comparisons
Below are memory usage comparisons using this manual test on macOS: https://gist.github.com/mattlord/74a18bb96dbcb1157fb72eba07730d49
Main
PR Branch
Related Issue(s)
Checklist