Skip to content
This repository has been archived by the owner on Nov 24, 2023. It is now read-only.

shard DDL: rewrite shard DDL in pessimism mode for HA #456

Merged
merged 70 commits into from
Feb 11, 2020

Conversation

csuzhangxc
Copy link
Member

@csuzhangxc csuzhangxc commented Jan 15, 2020

What problem does this PR solve?

  • rewrite shard DDL in pessimism mode to adjust the framework of HA

What is changed and how it works?

  • use etcd to coordinate the shard DDL lock instead of gRPC.
    • to reviewers: it's better to review pkg/shardddl/pessimism/doc.go firstly
  • fix some integration test cases, NOTE, in order to make the case stable somethings need to be done:
    • rewrite the coordinator for the scheduler
    • fix the deadlock problem in StreamerController

Things have not to be done in this PR (I'll do them in the next PR)

  • add terror for shard DDL in pessimism mode
  • refactor manual handlers for shard DDL (e.g. unlock-ddl-lock)

Check List

Tests

  • Unit test
  • Integration test

Code changes

  • Has exported function/method change
  • Has exported variable/fields change
  • Has interface methods change
  • Has persistent data change

Side effects

  • Possible performance regression
  • Increased code complexity
  • Breaking backward compatibility

Related changes

  • Need to update the documentation
  • Need to be included in the release note

@csuzhangxc
Copy link
Member Author

@WangXiangUSTC @lichunzhu PTAL

pkg/shardddl/pessimism/doc.go Show resolved Hide resolved
pkg/shardddl/pessimism/info.go Show resolved Hide resolved
pkg/shardddl/pessimism/doc.go Show resolved Hide resolved
p.lk.Clear() // clear all previous locks to support re-Start.

// get the history shard DDL info.
ifm, rev, err := pessimism.GetAllInfo(etcdCli)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we synchronize the variable name of rev to revision-W1 in the explannation go program?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the ref to doc.go in 3ae58b6.

dm/master/shardddl/pessimist.go Show resolved Hide resolved
p.wg.Done()
close(opCh)
}()
pessimism.WatchOperationPut(ctx, etcdCli, "", "", rev, opCh)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this rev be rev2?

Copy link
Member Author

@csuzhangxc csuzhangxc Feb 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rev can work correctly because rev < rev2, but rev2 is more readable, updated in 3ae58b6.

dm/master/shardddl/pessimist.go Show resolved Hide resolved
@csuzhangxc
Copy link
Member Author

/run-all-tests tidb=release-3.0

dm/master/shardddl/pessimist.go Outdated Show resolved Hide resolved
pkg/shardddl/pessimism/operation.go Show resolved Hide resolved
pkg/shardddl/pessimism/operation.go Show resolved Hide resolved
syncer/shardddl/pessimist.go Show resolved Hide resolved
syncer/syncer.go Show resolved Hide resolved
shardLockResolving.WithLabelValues(s.cfg.Name).Set(0)
break
shardOp, err2 := s.pessimist.GetOperation(ec.tctx.Ctx, shardInfo, rev)
shardLockResolving.WithLabelValues(s.cfg.Name).Set(0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do this after err2 is nil?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

YES, if err2 == nil, syncer will abort the replication, then it's not waiting for the lock to be resolved anymore, but can be in resolving after resumed without error.

csuzhangxc and others added 2 commits February 11, 2020 09:29
Co-Authored-By: WangXiangUSTC <wx347249478@gmail.com>
@csuzhangxc
Copy link
Member Author

/run-all-tests tidb=release-3.0

@codecov
Copy link

codecov bot commented Feb 11, 2020

Codecov Report

❗ No coverage uploaded for pull request base (ha-dev@71b6b9a). Click here to learn what that means.
The diff coverage is 58.1395%.

@@             Coverage Diff             @@
##             ha-dev       #456   +/-   ##
===========================================
  Coverage          ?   54.4227%           
===========================================
  Files             ?        174           
  Lines             ?      16890           
  Branches          ?          0           
===========================================
  Hits              ?       9192           
  Misses            ?       6780           
  Partials          ?        918

Copy link
Contributor

@lichunzhu lichunzhu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM

lk *pessimism.LockKeeper

// sources used to get all sources relative to the give task.
sources func(task string) []string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about refract to taskSources or getTaskSources? sources seems weird.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed in e2972a2.

Copy link
Contributor

@WangXiangUSTC WangXiangUSTC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rest LGTM


resp, err := cli.Txn(ctx).Then(putOp, delOp).Commit()
if err != nil {
return 0, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return err?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! fixed in e2972a2.

@csuzhangxc
Copy link
Member Author

/run-all-tests tidb=release-3.0

@csuzhangxc
Copy link
Member Author

@lichunzhu @WangXiangUSTC PTAL again.

Copy link
Contributor

@lichunzhu lichunzhu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lichunzhu lichunzhu added status/LGT1 One reviewer already commented LGTM and removed status/PTAL This PR is ready for review. Add this label back after committing new changes labels Feb 11, 2020
Copy link
Contributor

@WangXiangUSTC WangXiangUSTC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@WangXiangUSTC WangXiangUSTC added status/LGT2 Two reviewers already commented LGTM, ready for merge and removed status/LGT1 One reviewer already commented LGTM labels Feb 11, 2020
@csuzhangxc csuzhangxc merged commit 2a8a594 into pingcap:ha-dev Feb 11, 2020
@csuzhangxc csuzhangxc deleted the ha-shard-ddl branch February 11, 2020 07:46
lichunzhu pushed a commit to lichunzhu/dm that referenced this pull request Apr 6, 2020
@csuzhangxc csuzhangxc removed the needs-update-release-note This PR should be added into release notes. Remove this label once the release notes are updated label Jun 17, 2020
@csuzhangxc csuzhangxc added already-update-docs The docs related to this PR already updated. Add this label once the docs are updated and removed needs-update-docs Should update docs after this PR is merged. Remove this label once the docs are updated labels Oct 27, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
already-update-docs The docs related to this PR already updated. Add this label once the docs are updated priority/normal Minor change, requires approval from ≥1 primary reviewer status/LGT2 Two reviewers already commented LGTM, ready for merge type/enhancement Performance improvement or refactoring
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants