Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Online DDL: migration with same context and SQL now considered duplicate and silently marked as complete #9107

Merged

Conversation

shlomi-noach
Copy link
Contributor

Description

As reminder, a migration has both a UUID (unique to the migration) and context. The context is implicitly set to be unique in a VTGate session, or is (optionally) provided via vtctl ApplySchema -request_context.

Up till now the migration context was purely declarative and had no logical role. One could submit it, one could list it, one could SHOW VITESS_MIGRATIONS LIKE 'the-context' but that's it.

This PR offers a form of implicit idempotency based on the context, adding actual logic to the use of context.

If two migrations are submitted with same context and same SQL/DDL, then they are considered to be duplicates. When the Online DDL executor schedules a migration it checks:

  • if the migration has an explicit context, and
  • there is a previous migration with exact same context and SQL, and the migration is complete
  • then we implicitly mark the scheduled migration as complete and we do not actually run it.

the idea is that the user is requested to provide a unique context for migrations. Take the following vtctl command:

$ vtctl ApplySchema -request_context 1111-2222-3333 -sql "alter table t 1add column i int not null default 0; alter table t2 drop column c"

It's the user's choice to provide the context 1111-2222-3333 and it's the users responsibility to make it unique.

In the above example the user submits two queries. It is possible for vtctl to crash midway, leaving the user in confusion: were both migrations submitted? Was any of them submitted? Was just one of them submitted?

The user could invoke SHOW VITESS_MIGRATIONS LIKE '1111-2222-3333' and compare the result set with the list of migrations the user provided via -sql. But we feel we can automate that better in Vitess.

As of this PR, if the user again submits same ApplySchema with same context and same SQL:

$ vtctl ApplySchema -request_context 1111-2222-3333 -sql "alter table t 1add column i int not null default 0; alter table t2 drop column c"

Then Vitess identifies that either or both of these migrations are duplicates. If any of these duplicates is already marked as complete, then the "new" migrations can be skipped.
Vitess does generate new UUIDs for the new migrations, but, in case of duplication, they will not execute, and very quickly be marked as complete.
Of course, if the "old" duplicate was failed, then the new migration is actually executed.

Related Issue(s)

Checklist

  • Should this PR be backported?
  • Tests were added or are not required
  • Documentation was added or is not required

cc @mscoutermarsh @deepthi @mavenraven

If a migration is found which shares exactly same migration-context and SQL as a previously complete migration, then the migration at hand is implicitly considered as 'complete'

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@shlomi-noach shlomi-noach added Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: Query Serving release notes (needs details) This PR needs to be listed in the release notes in a dedicated section (deprecation notice, etc...) labels Oct 28, 2021
@mavenraven
Copy link
Contributor

mavenraven commented Oct 28, 2021

@shlomi-noach Awesome, thank you so much! I think this will cover everything.

edit: Re: Vitess does generate new UUIDs for the new migrations, but, in case of duplication, they will not execute, and very quickly be marked as complete. How feasible would it be for not make new migrations at all?

@shlomi-noach
Copy link
Contributor Author

shlomi-noach commented Oct 28, 2021

How feasible would it be for not make new migrations at all?

This is doable This is quite complicated. Consider that vtclt ApplySchema could provide 5 statements, and only get back 2 UUIDs. That's quite confusing; which 2 UUID are those?

One might think we can instead return the UUIDs of the existing migrations. However, UUIDs are generated way before the migration is sent to the tablets: consider how there could be multiple shards for a single migration; all shards must use the same migration UUID.

@shlomi-noach
Copy link
Contributor Author

s/This is doable/This is quite complicated/g

@mavenraven
Copy link
Contributor

mavenraven commented Oct 28, 2021

How feasible would it be for not make new migrations at all?

This is doable This is quite complicated. Consider that vtclt ApplySchema could provide 5 statements, and only get back 2 UUIDs. That's quite confusing; which 2 UUID are those?

One might think we can instead return the UUIDs of the existing migrations. However, UUIDs are generated way before the migration is sent to the tablets: consider how there could be multiple shards for a single migration; all shards must use the same migration UUID.

Ok, actually let me rephrase. When we display the progress of the migrations of a given context, it would be ideal if we only showed the "real" ones, otherwise it'll be super confusing for a user. How would we filter out the extra ones? If there's a solution to this, then I'm happy.

@shlomi-noach
Copy link
Contributor Author

shlomi-noach commented Oct 31, 2021

When we display the progress of the migrations of a given context, it would be ideal if we only showed the "real" ones, otherwise it'll be super confusing for a user.

Got it. Right, in the current suggested method the new migration will have a binary "not yet started" then suddenly "I''m all done" status/progress, and not show the actual progress of the running predecessor duplicate. Let me think about this some more.

@shlomi-noach
Copy link
Contributor Author

After some internal discussions about how this might be used, it looks like we're on the right path. This PR is good to be reviewed.

@shlomi-noach shlomi-noach marked this pull request as ready for review November 15, 2021 08:37
@shlomi-noach shlomi-noach requested a review from sougou November 15, 2021 08:39
@@ -405,6 +409,39 @@ func TestSchemaChange(t *testing.T) {
checkTable(t, tableName, true)
testSelectTableMetrics(t)
})

// ### Teh following tests are not strictly 'declarative' but are best served under this endtoend test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// ### Teh following tests are not strictly 'declarative' but are best served under this endtoend test
// ### The following tests are not strictly 'declarative' but are best served under this endtoend test

Copy link
Member

@deepthi deepthi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@shlomi-noach shlomi-noach marked this pull request as draft November 18, 2021 06:25
@shlomi-noach
Copy link
Contributor Author

I've temporarily converted to draft again just because we had another internal discussion about the use cases. I think the PR is good as it is, and worth merging irrespective, but I just want to think a bit more to see we haven't missed anything.

@shlomi-noach
Copy link
Contributor Author

(conflict resolution delayed a bit due to new linter's errors)

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@shlomi-noach
Copy link
Contributor Author

OK I'm happy with the PR as-is. It does not solve the entire discussion around idempotency but it's good basis. Merging.

@shlomi-noach shlomi-noach marked this pull request as ready for review December 5, 2021 06:38
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@shlomi-noach shlomi-noach merged commit c124930 into vitessio:main Dec 5, 2021
@shlomi-noach shlomi-noach deleted the online-ddl-idempotent-context-sql branch December 5, 2021 08:06
@vkovacik
Copy link

@shlomi-noach
Do you plan to add support for parameter request_context provided via vtgate session variable?
Vitess already supports executing online DDL via vtgate using @@ddl_strategy='online' but this PR does not allow to pass @@request_context session variable.

Thanks for this PR, very useful.

@shlomi-noach
Copy link
Contributor Author

@vkovacik I wasn't planning to do so. Right now if you run an online DDL from vtgate, the value request_context is your session identifier (ie, different session - different context).

I don't have a solid idea/suggestion on how to do it in vtgate, I'm open to ideas. Should we introduce a new @@request_context variable and fall back to the session ID if empty?

@vkovacik
Copy link

vkovacik commented Jan 23, 2022 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Query Serving release notes (needs details) This PR needs to be listed in the release notes in a dedicated section (deprecation notice, etc...) Type: Enhancement Logical improvement (somewhere between a bug and feature)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants