Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ddl: add a channel to limit multiple DDL jobs writing at the same time #14342

Merged
merged 13 commits into from
Mar 5, 2020

Conversation

zimulala
Copy link
Contributor

@zimulala zimulala commented Jan 3, 2020

What problem does this PR solve?

When performing DDL operations on multiple connections at the same time, you may encounter a "Write conflict" error.

What is changed and how it works?

Adding a channel on a single TiDB to limit multiple DDL jobs writing to the DDL job list at the same time results in "Write conflict".

Check List

Tests

  • Benchmark test
func addCreateTableJob(ctx sessionctx.Context, d *ddl, dbInfo *model.DBInfo, tblInfo *model.TableInfo) *model.Job {
	job := &model.Job{...}
	task := &limitJobTask{job, make(chan error)}
	d.limitJobCh <- task
	err := <-task.err
...
	return job
}

func BenchmarkAddDDLs(b *testing.B) {
...
	taskCnt := 1
	count := 200
	tblInfos := make([]*model.TableInfo, 0, count*taskCnt)
	for i := 0; i < count*taskCnt; i++ {
		tblInfo := testTableInfo(d, fmt.Sprintf("t_%d", i), 2)
		tblInfos = append(tblInfos, tblInfo)
	}
	wg := sync.WaitGroup{}
	runDDLsFunc := func(start, end int) {
		defer wg.Done()
		for i := start; i < end; i++ {
			addCreateTableJob(ctx, d, dbInfo, tblInfos[i])
		}
	}
	
	b.ResetTimer()
	wg.Add(taskCnt)
	for i := 0; i < taskCnt; i++ {
		go runDDLsFunc(i*count, (i+1)*count)
	}
	wg.Wait()
	b.StopTimer()
}

WC is the error of write confilct.

taskCnt count spend time(after) WC count(after) spend time(before) WC count(before)
1 200 0.117ns 7 0115ns 6
3 200 0.254ns 18 1.092 147
5 200 0.419s 41 10.387 1210
5 400 1.894s 148 68.745 3509

Related changes

  • Need to cherry-pick to the release branch

Release note

  • Adding a channel on a single TiDB to limit multiple DDL jobs writing to the DDL job list at the same time results in "Write conflict".

ddl/ddl_test.go Outdated Show resolved Hide resolved
@bb7133
Copy link
Member

bb7133 commented Jan 4, 2020

Please explain the meaning of ”taskCnt" and "Count" in the table in your PR Description.

@bb7133 bb7133 added the type/enhancement The issue or PR belongs to an enhancement. label Jan 4, 2020
@zimulala
Copy link
Contributor Author

zimulala commented Jan 5, 2020

@bb7133 ”taskCnt" and "Count" are the variables in the benchmark.

@zimulala
Copy link
Contributor Author

zimulala commented Jan 7, 2020

PTAL @bb7133 @crazycs520 @tiancaiamao

ddl/ddl_worker.go Outdated Show resolved Hide resolved
ddl/ddl_worker.go Show resolved Hide resolved
for {
tasks := make([]*limitJobTask, 0, batchAddingJobs)
select {
case task := <-d.limitJobCh:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not quite understand how this limits the task rate.
By let multiple worker conflict on the same channel and increase the runtime lock spin?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Write conflicts occur when multiple DDL jobs are written to the DDL job list at the same time.
We use this channel to reduce such conflicts.

Copy link
Contributor

@crazycs520 crazycs520 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tiancaiamao
Copy link
Contributor

LGTM

@tiancaiamao tiancaiamao added the status/LGT2 Indicates that a PR has LGTM 2. label Jan 10, 2020
ddl/ddl_worker.go Outdated Show resolved Hide resolved
job.StartTS = txn.StartTS()
job.ID = ids[i]
if err = buildJobDependence(t, job); err != nil {
return errors.Trace(err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If one task fails, following tasks are discarded?

Copy link
Contributor Author

@zimulala zimulala Jan 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but we get jobs failed only this function will fail to get jobs from TiKV. Therefore, if this job gets jobs failed, the other job should also fail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL @djshow832

@zimulala
Copy link
Contributor Author

PTAL @coocood

@zimulala zimulala requested a review from coocood January 20, 2020 02:39
@zimulala
Copy link
Contributor Author

PTAL @coocood @AilinKid

Copy link
Contributor

@AilinKid AilinKid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM

for i, task := range tasks {
job := task.job
job.Version = currentVersion
job.StartTS = txn.StartTS()
Copy link
Contributor

@AilinKid AilinKid Feb 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

batch DDL jobs share the same txn.StartTS()? unless these ddls is in same statement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. But not necessarily the same statement.
And this field has no special use in TiFilash and Tools.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems ok

AilinKid
AilinKid previously approved these changes Feb 21, 2020
Copy link
Contributor

@AilinKid AilinKid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zimulala zimulala removed the status/LGT2 Indicates that a PR has LGTM 2. label Mar 3, 2020
@zimulala zimulala added status/can-merge Indicates a PR has been approved by a committer. component/DDL labels Mar 4, 2020
@sre-bot
Copy link
Contributor

sre-bot commented Mar 4, 2020

/run-all-tests

@sre-bot
Copy link
Contributor

sre-bot commented Mar 4, 2020

@zimulala merge failed.

@zimulala zimulala added sig/sql-infra SIG: SQL Infra and removed component/DDL1 labels Mar 4, 2020
@zimulala zimulala requested a review from a team as a code owner March 5, 2020 03:05
@ghost ghost requested review from lzmhhh123 and winoros and removed request for a team March 5, 2020 03:05
@zimulala
Copy link
Contributor Author

zimulala commented Mar 5, 2020

/run-all-tests

@zimulala
Copy link
Contributor Author

zimulala commented Mar 5, 2020

/run-integration-ddl-test

@zimulala
Copy link
Contributor Author

zimulala commented Mar 5, 2020

/run-all-tests

@zimulala zimulala merged commit 6ccdf64 into pingcap:master Mar 5, 2020
@zimulala zimulala deleted the cache-ddl branch March 5, 2020 06:13
Copy link
Contributor

@AilinKid AilinKid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

sre-bot pushed a commit to sre-bot/tidb that referenced this pull request Mar 5, 2020
Signed-off-by: sre-bot <sre-bot@pingcap.com>
@sre-bot
Copy link
Contributor

sre-bot commented Mar 5, 2020

cherry pick to release-3.0 in PR #15148

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sig/sql-infra SIG: SQL Infra status/can-merge Indicates a PR has been approved by a committer. status/LGT3 The PR has already had 3 LGTM. type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants