go/vt/wrangler: reduce VReplicationExec calls when getting copy state #14375
Conversation
Signed-off-by: Max Englander <max@planetscale.com>
Review Checklist
Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.
General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
Thank you, @maxenglander ! This is a nice little optimization.
I had some minor suggestions. Beyond that, we'll need to make the equivalent optimization for vtctldclient as well. vtctlclient — which is going away soon — uses wrangler whereas vtctldclient uses the workflow server. So we'd make the same optimization here:
vitess/go/vt/vtctl/workflow/server.go, lines 1053 to 1083 in 2f56827:
```go
func (s *Server) getWorkflowCopyStates(ctx context.Context, tablet *topo.TabletInfo, id int64) ([]*vtctldatapb.Workflow_Stream_CopyState, error) {
	span, ctx := trace.NewSpan(ctx, "workflow.Server.getWorkflowCopyStates")
	defer span.Finish()

	span.Annotate("keyspace", tablet.Keyspace)
	span.Annotate("shard", tablet.Shard)
	span.Annotate("tablet_alias", tablet.AliasString())
	span.Annotate("vrepl_id", id)

	query := fmt.Sprintf("select table_name, lastpk from _vt.copy_state where vrepl_id = %d and id in (select max(id) from _vt.copy_state where vrepl_id = %d group by vrepl_id, table_name)", id, id)
	qr, err := s.tmc.VReplicationExec(ctx, tablet.Tablet, query)
	if err != nil {
		return nil, err
	}

	result := sqltypes.Proto3ToResult(qr)
	if result == nil {
		return nil, nil
	}

	copyStates := make([]*vtctldatapb.Workflow_Stream_CopyState, len(result.Rows))
	for i, row := range result.Rows {
		// These fields are technically varbinary, but this is close enough.
		copyStates[i] = &vtctldatapb.Workflow_Stream_CopyState{
			Table:  row[0].ToString(),
			LastPk: row[1].ToString(),
		}
	}

	return copyStates, nil
}
```
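For reference, a minimal sketch of what that batched variant could look like, assuming it lives next to getWorkflowCopyStates with the same imports plus strconv and strings; the function name, signature, and grouped-map return type are illustrative, not the final API:

```go
// Sketch only: fetch copy state for several stream IDs on one tablet with a
// single VReplicationExec call, instead of one call per stream.
func (s *Server) getWorkflowCopyStatesBatched(ctx context.Context, tablet *topo.TabletInfo, ids []int64) (map[int64][]*vtctldatapb.Workflow_Stream_CopyState, error) {
	if len(ids) == 0 {
		return nil, nil
	}
	// Build the IN list from the stream IDs, e.g. "1, 2, 3".
	idStrs := make([]string, len(ids))
	for i, id := range ids {
		idStrs[i] = strconv.FormatInt(id, 10)
	}
	idList := strings.Join(idStrs, ", ")
	query := fmt.Sprintf("select vrepl_id, table_name, lastpk from _vt.copy_state where vrepl_id in (%s) and id in (select max(id) from _vt.copy_state where vrepl_id in (%s) group by vrepl_id, table_name)", idList, idList)
	qr, err := s.tmc.VReplicationExec(ctx, tablet.Tablet, query)
	if err != nil {
		return nil, err
	}
	result := sqltypes.Proto3ToResult(qr)
	if result == nil {
		return nil, nil
	}
	// Group rows by vrepl_id so each stream's copy states can be looked up.
	copyStates := make(map[int64][]*vtctldatapb.Workflow_Stream_CopyState, len(ids))
	for _, row := range result.Rows {
		id, err := row[0].ToInt64()
		if err != nil {
			return nil, err
		}
		copyStates[id] = append(copyStates[id], &vtctldatapb.Workflow_Stream_CopyState{
			Table:  row[1].ToString(),
			LastPk: row[2].ToString(),
		})
	}
	return copyStates, nil
}
```

Selecting vrepl_id in the batched query is what lets the caller re-associate rows with their streams after the single round trip.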
Co-authored-by: Matt Lord <mattalord@gmail.com> Signed-off-by: Max Englander <max.englander@gmail.com>
Signed-off-by: Max Englander <max@planetscale.com>
@mattlord I can add more tests if you think it's needed, but I think the code changes are in a decent place. I decided not to try the …
Signed-off-by: Max Englander <max@planetscale.com>
My concern is that this could make the typical case slower. Did you get a sense of how much slower this made the average case?
Hey @mattlord, the use case we had was a production MySQL cluster with 110 shards that we are migrating from an external keyspace with external tablets into a Vitess keyspace with 128 shards. Because the source keyspace has different primary vindexes than the target keyspace, this exploded into 14080 VStreams (110 × 128), and therefore 14080 `VReplicationExec` calls. We were seeing that the overall time to complete this block of code was very long. The fact that it was taking so long resulted in various timeouts:
...as well as this error:
Can you break down for me how that is the case? My understanding is that the current implementation fetches copy state once per stream. I think this PR changes things so that it makes the same number of calls or fewer.
Signed-off-by: Max Englander <max@planetscale.com>
I understand. My point was only that to my knowledge this is an exceedingly rare use case in the history of Vitess. I'm not saying that it's an invalid one. My point was that we should not make things worse/slower for the typical case in order to improve this one. That was my concern. It's a matter of HOW we address that use case/issue, not IF.
We may have been overloading various resources like the topo server, which has a cascading effect. As we process each result we also make a topo call:
And those results are processed serially. So if the topo responses are a little slow, that will cause the total time to climb a lot in this particular case. We could process those results concurrently as well, synchronizing on the actual updates to the map (but most importantly making those topo calls in parallel).
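As a rough illustration of that suggestion (not actual wrangler code): the topo lookups run in parallel via an errgroup while a mutex guards only the shared-map writes. The result and shardInfo types and the getShardInfo helper are hypothetical stand-ins for the real per-result topo lookup.

```go
import (
	"context"
	"sync"

	"golang.org/x/sync/errgroup"
)

type result struct{ key string } // placeholder for the per-stream result row

type shardInfo struct{} // placeholder for whatever the topo call returns

// getShardInfo is a hypothetical stand-in for the real per-result topo call.
func getShardInfo(ctx context.Context, r result) (*shardInfo, error) {
	return &shardInfo{}, nil
}

func processResults(ctx context.Context, results []result) (map[string]*shardInfo, error) {
	var (
		mu  sync.Mutex
		out = make(map[string]*shardInfo, len(results))
	)
	eg, gctx := errgroup.WithContext(ctx)
	eg.SetLimit(8) // bound parallelism so the topo server is not overloaded
	for _, r := range results {
		r := r // capture loop variable (needed before Go 1.22)
		eg.Go(func() error {
			// The expensive part: the topo call now runs in parallel.
			si, err := getShardInfo(gctx, r)
			if err != nil {
				return err
			}
			// Synchronize only the shared-map update.
			mu.Lock()
			out[r.key] = si
			mu.Unlock()
			return nil
		})
	}
	if err := eg.Wait(); err != nil {
		return nil, err
	}
	return out, nil
}
```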
You noted the trade-off yourself in the PR description.
The main query goes from being a point select to a range query (see the sketch just below). And this can have an impact when subsequently getting the log records for the stream(s), etc., as well. I made this more efficient here: #14212. It's still a general concern of mine going forward, though, so I'm a little (overly) paranoid about it, as a lot of vreplication in v18+ is potentially impacted.

All that being said, in general I think we are offsetting any additional cost here by batching things like getting the copy states (although in most cases there will only be one stream on a tablet, but still), so this may be a wash in the end or even improve the typical case. So let me just review in full again. 😄
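Concretely, the shift being described is roughly the following; the batched form is a paraphrase of the PR description, not necessarily the exact SQL that was merged:

```go
// Illustrative query shapes only; the exact SQL in the PR may differ.
id := int64(1)      // a single stream ID
idList := "1, 2, 3" // every stream ID on one target shard

// Before: a point select per stream; N streams on a tablet cost N
// VReplicationExec round trips.
perStream := fmt.Sprintf(
	"select table_name, lastpk from _vt.copy_state where vrepl_id = %d "+
		"and id in (select max(id) from _vt.copy_state where vrepl_id = %d "+
		"group by vrepl_id, table_name)",
	id, id)

// After: a range query per target shard; one round trip returns a larger
// result set that is then grouped by vrepl_id on the client side.
perShard := fmt.Sprintf(
	"select vrepl_id, table_name, lastpk from _vt.copy_state where vrepl_id in (%s) "+
		"and id in (select max(id) from _vt.copy_state where vrepl_id in (%s) "+
		"group by vrepl_id, table_name)",
	idList, idList)

_, _ = perStream, perShard
```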
I'm sorry again for the delay. I think this looks OK, but we don't seem to have any test coverage, do we? It looks like we updated the existing tests to adjust for the query changes, but we don't have any tests that cover the case we're doing the work for, do we? Meaning, cases where there are multiple streams per tablet.
Can we add some? Or let me know if I just missed it. I'm talking about unit tests here, as we do have some coverage in the endtoend tests, since there are cases where there are N streams per tablet (e.g. shard merges).
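To make the ask concrete, here is a rough sketch of the kind of unit test being requested. The newFakeTMClient fake, its fields, and the fetchCopyStates function are hypothetical names standing in for the real wrangler test harness and code path under test:

```go
func TestCopyStatesMultipleStreamsPerTablet(t *testing.T) {
	ctx := context.Background()

	// Hypothetical fake that returns a canned result and counts
	// VReplicationExec calls; the real wrangler tests use their own fakes.
	fake := newFakeTMClient()
	fake.setVReplicationExecResult(sqltypes.MakeTestResult(
		sqltypes.MakeTestFields("vrepl_id|table_name|lastpk", "int64|varchar|varchar"),
		"1|t1|pk500", // stream 1's copy state
		"2|t1|pk800", // stream 2's copy state on the same tablet
	))

	// fetchCopyStates stands in for the code path under test: it should
	// issue a single batched query covering both stream IDs.
	copyStates, err := fetchCopyStates(ctx, fake, []int64{1, 2})
	require.NoError(t, err)
	require.Len(t, copyStates, 2)

	// The key assertion for this PR: one call served both streams.
	require.Equal(t, 1, fake.vreplicationExecCallCount)
}
```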
I realized that I completely misread this initially. I thought you were asking me how much slower the … I did not get a sense of how much slower this was for the average case, although I tested it out locally a bunch with …
Definitely. I was hoping to get some initial feedback before investing in tests, which you've now given and which I appreciate! I'll get to work on tests 👷
Co-authored-by: Matt Lord <mattalord@gmail.com> Signed-off-by: Max Englander <max.englander@gmail.com>
Signed-off-by: Max Englander <max@planetscale.com>
I took a somewhat lazy approach and just updated the existing unit tests to have two tables, and tested from there.
This looks good to me. Thanks, @maxenglander ! I only had some minor nits and suggestions. We can discuss/address those along with any that @rohit-nayak-ps may have. @rohit-nayak-ps can you please also review this whenever you have time?
Signed-off-by: Max Englander <max@planetscale.com>
Thanks again, @maxenglander !
Description

During `MoveTables SwitchTraffic`, there is a phase where wrangler queries the copy state of each stream. Currently it does this by making an individual `VReplicationExec` call for each stream.

This can be prohibitively time-consuming for workflows with a very large number of streams. For example, a `MoveTables` workflow where the source and target keyspaces have 128 shards, and where the target keyspace has a different primary vindex than the source keyspace, will end up with 16384 VStreams (128 × 128). Even if each individual `VReplicationExec` takes only a few milliseconds, in aggregate this could easily take 30+ seconds, risking timing out the `SwitchTraffic` action.

This PR makes things a bit more efficient by making one `VReplicationExec` call per target shard and getting all relevant copy states with each call. In my testing with a large number of VStreams, this makes the overall `SwitchTraffic` action much faster for the use case described above. The trade-off is that these `VReplicationExec` queries are more expensive, with larger result sets.

Related Issue(s)

Fixes #14325

Checklist

… was added or is not required