Skip to content
This repository has been archived by the owner on Nov 24, 2023. It is now read-only.

HA: refactor unlock-ddl-lock and break-ddl-lock #522

Merged
merged 16 commits into from
Mar 9, 2020

Conversation

csuzhangxc
Copy link
Member

@csuzhangxc csuzhangxc commented Mar 6, 2020

What problem does this PR solve?

refactor unlock-ddl-lock and break-ddl-lock to support HA.

What is changed and how it works?

  • re-implement the unlock-ddl-lock based on the etcd model.
  • remove break-ddl-lock because it's not needed anymore in the new model.

Check List

Tests

  • Unit test
  • Integration test

Code changes

  • Has exported function/method change
  • Has interface methods change
  • Has persistent data change

Side effects

  • Possible performance regression
  • Increased code complexity
  • Breaking backward compatibility

@csuzhangxc csuzhangxc added priority/normal Minor change, requires approval from ≥1 primary reviewer status/WIP This PR is still work in progress type/enhancement Performance improvement or refactoring labels Mar 6, 2020
@csuzhangxc
Copy link
Member Author

/run-all-tests tidb=release-3.0

@csuzhangxc
Copy link
Member Author

/run-all-tests tidb=release-3.0

@csuzhangxc
Copy link
Member Author

/run-all-tests tidb=release-3.0

@codecov
Copy link

codecov bot commented Mar 7, 2020

Codecov Report

❗ No coverage uploaded for pull request base (master@a711a1c). Click here to learn what that means.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             master      #522   +/-   ##
==========================================
  Coverage          ?   56.846%           
==========================================
  Files             ?       183           
  Lines             ?     18821           
  Branches          ?         0           
==========================================
  Hits              ?     10699           
  Misses            ?      7046           
  Partials          ?      1076

@csuzhangxc
Copy link
Member Author

/run-all-tests tidb=release-3.0

@csuzhangxc
Copy link
Member Author

/run-all-tests tidb=release-3.0

@csuzhangxc csuzhangxc added status/PTAL This PR is ready for review. Add this label back after committing new changes and removed status/WIP This PR is still work in progress labels Mar 7, 2020
@csuzhangxc
Copy link
Member Author

@lichunzhu @WangXiangUSTC PTAL

dm/ctl/master/unlock_ddl_lock.go Outdated Show resolved Hide resolved
dm/master/shardddl/pessimist.go Show resolved Hide resolved
dm/master/shardddl/pessimist.go Show resolved Hide resolved
dm/master/shardddl/pessimist.go Show resolved Hide resolved
dm/master/shardddl/pessimist.go Outdated Show resolved Hide resolved
dm/master/shardddl/pessimist_test.go Outdated Show resolved Hide resolved
dm/master/shardddl/pessimist_test.go Outdated Show resolved Hide resolved
@csuzhangxc
Copy link
Member Author

/run-all-tests tidb=release-3.0

Copy link
Contributor

@lichunzhu lichunzhu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@csuzhangxc
Copy link
Member Author

/run-all-tests tidb=release-3.0

@csuzhangxc
Copy link
Member Author

In some other places:
https://github.com/pingcap/dm/blob/master/pkg/shardddl/pessimism/operation_test.go#L68
https://github.com/pingcap/dm/blob/master/pkg/shardddl/pessimism/operation_test.go#L84
https://github.com/pingcap/dm/blob/master/syncer/shardddl/pessimist.go#L86
We are still using rev instead of rev+1. Should we unify them?
Rest LGTM

for

> https://github.com/pingcap/dm/blob/master/pkg/shardddl/pessimism/operation_test.go#L68
> https://github.com/pingcap/dm/blob/master/pkg/shardddl/pessimism/operation_test.go#L84

WatchOperationPut is watching with the revision returned from PutOperations, so rev should be used rather than rev+1.

for

> https://github.com/pingcap/dm/blob/master/syncer/shardddl/pessimist.go#L86

I updated in the caller (syncer.go) in 1ddb439.

Copy link
Contributor

@lichunzhu lichunzhu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lichunzhu lichunzhu added status/LGT1 One reviewer already commented LGTM and removed status/PTAL This PR is ready for review. Add this label back after committing new changes labels Mar 9, 2020
@csuzhangxc
Copy link
Member Author

/run-all-tests tidb=release-3.0


for _, source := range sources {
if synced, ok := l.ready[source]; ok && synced {
l.ready[source] = false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to revert l.done[source]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only usage of RevertSynced is to revert Synced for ForceSynced.
and currenly we can't RevertSynced is the owner done the exec operation (done happened after that).

@csuzhangxc
Copy link
Member Author

/run-all-tests tidb=release-3.0

// UnlockLock unlocks a shard DDL lock manually when using `unlock-ddl-lock` command.
// ID: the shard DDL lock ID.
// replaceOwner: the new owner used to replace the original DDL for executing DDL to downstream.
// if the original owner is still exist, we should NOT specify any replaceOwner.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in which case the original owner is not existing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and maybe we should choose a replaceOwner automatic later.

Copy link
Member Author

@csuzhangxc csuzhangxc Mar 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in which case the original owner is not existing

operate-source stop takes on the original owner.

and maybe we should choose a replaceOwner automatic later.

YES, it's better to do that automatically.

@csuzhangxc
Copy link
Member Author

/run-all-tests tidb=release-3.0

@@ -61,6 +62,11 @@ func (s *Server) electionNotify(ctx context.Context) {
log.L().Error("scheduler do not started", zap.Error(err))
}

err = s.pessimist.Start(ctx, s.etcdClient)
if err != nil {
log.L().Error("pessimist do not started", zap.Error(err))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if start pessimist failed, do we need close scheduler and skip set leader in L71?

Copy link
Member Author

@csuzhangxc csuzhangxc Mar 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need a mechanism to transfer leader role to another DM-master member when any necessary components (pessimist and scheduler in currenlty) started fail later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and close the pessimist

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need a mechanism to transfer leader role to another DM-master member when any necessary components (pessimist and scheduler in currenlty) started fail later.

ok, we can do this later

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lichunzhu you can open a issue to trace leader transfer support later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

Copy link
Contributor

@WangXiangUSTC WangXiangUSTC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rest LGTM

Copy link
Contributor

@WangXiangUSTC WangXiangUSTC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@csuzhangxc csuzhangxc merged commit 263b86b into pingcap:master Mar 9, 2020
@csuzhangxc csuzhangxc deleted the pessimistic-lock branch March 9, 2020 16:06
lichunzhu pushed a commit to lichunzhu/dm that referenced this pull request Apr 6, 2020
lichunzhu pushed a commit to lichunzhu/dm that referenced this pull request Apr 6, 2020
lichunzhu pushed a commit to lichunzhu/dm that referenced this pull request Apr 7, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
priority/normal Minor change, requires approval from ≥1 primary reviewer status/LGT1 One reviewer already commented LGTM type/enhancement Performance improvement or refactoring
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants