
go/vt/wrangler: reduce VReplicationExec calls when getting copy state #14375

Merged (20 commits) on Dec 28, 2023

Conversation

@maxenglander (Collaborator) commented Oct 26, 2023

Description

During MoveTables SwitchTraffic, there is a phase where wrangler queries the copy state of each stream. Currently it does this by making an individual VReplicationExec call for each stream.

This can be prohibitively time-consuming for workflows with a very large number of streams. For example, a MoveTables workflow where the source and target keyspaces each have 128 shards, and where the target keyspace has a different primary vindex than the source keyspace, will end up with 16384 VStreams. Even if each individual VReplicationExec call takes only a few milliseconds, in aggregate this could easily take 30+ seconds, risking a timeout of the SwitchTraffic action.

This PR makes things more efficient by making one VReplicationExec call per target shard and fetching all relevant copy states in each call. In my testing with a large number of VStreams, this makes the overall SwitchTraffic action much faster for the use case described above. The trade-off is that these VReplicationExec queries are more expensive, with larger result sets.

Related Issue(s)

Fixes #14325

Checklist

  • "Backport to:" labels are not needed
  • Tests were added or are not required
  • New or modified tests pass consistently both locally and on CI
  • Documentation was added or is not required

Signed-off-by: Max Englander <max@planetscale.com>
@vitess-bot (Contributor) commented Oct 26, 2023

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes); new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test; enhancements and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator.
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsIssue A linked issue is missing for this Pull Request NeedsWebsiteDocsUpdate What it says labels Oct 26, 2023
@maxenglander maxenglander added Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: VReplication and removed NeedsWebsiteDocsUpdate What it says labels Oct 26, 2023
@github-actions github-actions bot added this to the v19.0.0 milestone Oct 26, 2023
@maxenglander maxenglander marked this pull request as ready for review October 26, 2023 13:21
@maxenglander maxenglander removed the NeedsIssue A linked issue is missing for this Pull Request label Oct 26, 2023
@dbussink dbussink removed the NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work label Oct 26, 2023
@mattlord (Contributor) left a comment:

Thank you, @maxenglander ! This is a nice little optimization.

I had some minor suggestions. Beyond that, we'll need to make the equivalent optimization for vtctldclient as well: vtctlclient (which is going away soon) uses wrangler, whereas vtctldclient uses the workflow server. So we'd make the same optimization here:

func (s *Server) getWorkflowCopyStates(ctx context.Context, tablet *topo.TabletInfo, id int64) ([]*vtctldatapb.Workflow_Stream_CopyState, error) {
	span, ctx := trace.NewSpan(ctx, "workflow.Server.getWorkflowCopyStates")
	defer span.Finish()

	span.Annotate("keyspace", tablet.Keyspace)
	span.Annotate("shard", tablet.Shard)
	span.Annotate("tablet_alias", tablet.AliasString())
	span.Annotate("vrepl_id", id)

	query := fmt.Sprintf("select table_name, lastpk from _vt.copy_state where vrepl_id = %d and id in (select max(id) from _vt.copy_state where vrepl_id = %d group by vrepl_id, table_name)", id, id)
	qr, err := s.tmc.VReplicationExec(ctx, tablet.Tablet, query)
	if err != nil {
		return nil, err
	}

	result := sqltypes.Proto3ToResult(qr)
	if result == nil {
		return nil, nil
	}

	copyStates := make([]*vtctldatapb.Workflow_Stream_CopyState, len(result.Rows))
	for i, row := range result.Rows {
		// These fields are technically varbinary, but this is close enough.
		copyStates[i] = &vtctldatapb.Workflow_Stream_CopyState{
			Table:  row[0].ToString(),
			LastPk: row[1].ToString(),
		}
	}

	return copyStates, nil
}

Review threads on go/vt/wrangler/vexec.go (resolved)
maxenglander and others added 4 commits October 26, 2023 15:56
Co-authored-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Max Englander <max.englander@gmail.com>
Signed-off-by: Max Englander <max@planetscale.com>
Signed-off-by: Max Englander <max@planetscale.com>
Signed-off-by: Max Englander <max@planetscale.com>
Signed-off-by: Max Englander <max@planetscale.com>
Signed-off-by: Max Englander <max@planetscale.com>
@maxenglander maxenglander requested a review from mattlord October 30, 2023 09:29
@maxenglander (Collaborator, Author) commented:

@mattlord I can add more tests if you think it's needed, but I think the code changes are in a decent place. I decided not to try the JOIN approach we chatted about in Slack.

Signed-off-by: Max Englander <max@planetscale.com>
@maxenglander maxenglander self-assigned this Nov 10, 2023
@mattlord (Contributor) commented Dec 4, 2023

My concern is that GetWorkflows is used quite heavily in vtctldclient (the client going forward). This work makes that significantly heavier for 99.9999% of cases in order to improve the 0.00001% of cases (high number of shards and changing vindexes during the move). It's already relatively heavy and gets called fairly often.

Did you get a sense of how much slower this made the average workflow show or movetables show commands?

@maxenglander (Collaborator, Author) commented Dec 4, 2023

Hey @mattlord the use case we had was a production MySQL cluster with 110 shards that we are migrating from an external keyspace with external tablets into a Vitess keyspace with 128 shards.

Because the source keyspace has different primary vindexes than the target keyspace, this exploded into 14080 VStreams, and therefore 14080 getCopyState calls under the current implementation.

We were seeing that this block of code alone was taking between 40 and 80 seconds to complete on SwitchTraffic.

The fact that it was taking so long resulted in various timeouts:

  • remote operation timeout
  • etcd topo lease timeout

...as well as this error:

cannot switch traffic for workflow import_workflow at this time: replication lag 61s is higher than allowed lag 30s

This work makes that significantly heavier

Can you break down for me how that is the case? My understanding is that the current implementation fetches copy state once per stream. I think this PR will change things so that it makes the same number of calls or fewer.
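The call-count arithmetic behind that claim, using the numbers from this migration, can be sketched as follows (an illustration, not code from the PR):

```go
package main

import "fmt"

func main() {
	// Numbers from the migration described above: an external keyspace
	// with 110 shards moving into a Vitess keyspace with 128 shards,
	// with differing primary vindexes, so every source shard streams
	// to every target shard.
	sourceShards, targetShards := 110, 128

	streams := sourceShards * targetShards
	fmt.Println(streams) // 14080 VStreams, and getCopyState calls before this PR

	fmt.Println(targetShards) // 128: at most one batched call per target shard after
}
```

In the worst case (one stream per target shard) the batched approach makes the same number of calls as before; in every other case it makes fewer.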

Signed-off-by: Max Englander <max@planetscale.com>
Signed-off-by: Max Englander <max@planetscale.com>
@mattlord (Contributor) commented Dec 5, 2023

Hey @mattlord the use case we had was a production MySQL cluster with 110 shards that we are migrating from an external keyspace with external tablets into a Vitess keyspace with 128 shards.

Because the source keyspace has different primary vindexes than the target keyspace, this exploded into 14080 VStreams, and therefore 14080 getCopyState calls under the current implementation.

I understand. My point was only that to my knowledge this is an exceedingly rare use case in the history of Vitess. I'm not saying that it's an invalid one. My point was that we should not make things worse/slower for the typical case in order to improve this one. That was my concern. It's a matter of HOW we address that use case/issue, not IF.

We were seeing that the overall time to complete this block of code on SwitchTraffic was taking between 40-80s by itself.

The fact that it was taking so long resulted in various timeouts:

  • remote operation timeout
  • etcd topo lease timeout

We may have been overloading various resources, like the topo server, which has a cascading effect. As we process each result, we also make a topo call:

si, err := wr.ts.GetShard(ctx, keyspace, primary.Shard)

And those results are processed serially. So if the topo responses are a little slow, that will cause the total time to climb a lot in this particular case. We could process those results concurrently as well, synchronizing on the actual updates to the map (but most importantly making those topo calls in parallel).
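That parallelization could look roughly like this. fetchShard stands in for the wr.ts.GetShard topo call, and all names here are illustrative rather than taken from the PR:

```go
package main

import (
	"fmt"
	"sync"
)

// fetchShard is a stand-in for the wr.ts.GetShard topo call, whose
// latency dominates total time when results are processed serially.
func fetchShard(shard string) string {
	return "shard-info:" + shard
}

func main() {
	shards := []string{"-40", "40-80", "80-c0", "c0-"}

	results := make(map[string]string)
	var mu sync.Mutex
	var wg sync.WaitGroup

	for _, shard := range shards {
		wg.Add(1)
		go func(s string) {
			defer wg.Done()
			info := fetchShard(s) // topo calls run in parallel
			mu.Lock()             // but map updates are synchronized
			results[s] = info
			mu.Unlock()
		}(shard)
	}
	wg.Wait()

	fmt.Println(len(results)) // all shards fetched concurrently
}
```

With N shards, total wall time becomes roughly the slowest single topo call instead of the sum of all of them.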

...as well as this error:

cannot switch traffic for workflow import_workflow at this time: replication lag 61s is higher than allowed lag 30s

This work makes that significantly heavier

Can you break down for me how that is the case? My understanding is that the current implementation fetches copy state once per stream. I think this PR will change things so that it makes the same number of calls or fewer.

You noted the trade-off yourself in the PR description.

The trade-off is that these VReplicationExec queries are more expensive, with larger result sets.

The main query goes from being a point select to a range query. And this can have an impact when subsequently getting the log records for the stream(s), etc. as well. I made this more efficient here: #14212. It's still a general concern of mine going forward, though, so I'm a little (overly) paranoid about it, as a lot of vreplication in v18+ is potentially impacted.

All that being said, in general I think we are offsetting any additional cost here by batching things like getting the copy states (although in most cases there will only be one stream on a tablet, but still) so this may be a wash in the end or even improve the typical case. So let me just review in full again. 😄

@mattlord (Contributor) left a comment:

I'm sorry again for the delay. I think this looks OK, but we don't seem to have any test coverage, do we? It looks like we updated the existing tests to adjust for the query changes, but we don't have any tests that cover the case we're doing the work for, i.e. cases where there are multiple streams per tablet.

Can we add some? Or let me know if I just missed it. I'm talking about unit tests here, as we do have some coverage in the end-to-end tests, since there are cases with N streams per tablet (e.g. shard merges).

Review threads on go/vt/vtctl/workflow/server.go and go/vt/wrangler/vexec.go (resolved)
@maxenglander (Collaborator, Author) commented:

Did you get a sense of how much slower this made the average workflow show or moveables show commands?

I realized that I completely misread this initially. I thought you were asking me how much slower the MoveTables commands were in the extreme case (14k streams). So my last comment completely missed the mark, sorry.

I did not get a sense of how much slower this was for the average case, although I tested it out locally a bunch with examples/local and did not notice any difference.

Can we add some?

Definitely. I was hoping to get some initial feedback before investing in tests, which you've now given, and I appreciate it! I'll get to work on tests 👷

@maxenglander maxenglander added the Skip CI Skip CI actions from running label Dec 19, 2023
maxenglander and others added 5 commits December 19, 2023 10:54
Co-authored-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Max Englander <max.englander@gmail.com>
Co-authored-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Max Englander <max.englander@gmail.com>
Signed-off-by: Max Englander <max@planetscale.com>
Signed-off-by: Max Englander <max@planetscale.com>
Signed-off-by: Max Englander <max@planetscale.com>
@maxenglander maxenglander removed the Skip CI Skip CI actions from running label Dec 19, 2023
Signed-off-by: Max Englander <max@planetscale.com>
@maxenglander (Collaborator, Author) commented:

It looks like we updated the existing tests to adjust for the query changes but we don't have any tests that cover the case we're doing the work for do we? Meaning, cases where there are multiple streams per tablet.

I took a somewhat lazy approach and just updated the unit tests to use two tables, and tested from there.

@mattlord (Contributor) left a comment:

This looks good to me. Thanks, @maxenglander ! I only had some minor nits and suggestions. We can discuss/address those along with any that @rohit-nayak-ps may have. @rohit-nayak-ps can you please also review this whenever you have time?

Review threads on go/vt/vtctl/workflow/server.go and go/vt/wrangler/vexec.go (resolved)
Signed-off-by: Max Englander <max@planetscale.com>
Signed-off-by: Max Englander <max@planetscale.com>
Signed-off-by: Max Englander <max@planetscale.com>
Signed-off-by: Max Englander <max@planetscale.com>
@mattlord (Contributor) left a comment:

Thanks again, @maxenglander !

@rohit-nayak-ps rohit-nayak-ps merged commit 2783e32 into vitessio:main Dec 28, 2023
102 checks passed
@rohit-nayak-ps rohit-nayak-ps deleted the maxeng-vexec-getcopystate branch December 28, 2023 19:26
Labels
Component: VReplication Type: Enhancement Logical improvement (somewhere between a bug and feature)
Development

Successfully merging this pull request may close these issues.

Enhancement: improve MoveTables SwitchTraffic performance (getCopyState)
4 participants