Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unit(dm): add Kill func for unit #4035

Merged
merged 63 commits into from
Feb 16, 2022
Merged

unit(dm): add Kill func for unit #4035

merged 63 commits into from
Feb 16, 2022

Conversation

Ehco1996
Copy link
Contributor

@Ehco1996 Ehco1996 commented Dec 23, 2021

What problem does this PR solve?

Issue Number: close #3737

Root case of #3737

  • when dm-master node offline, dm-worker will lost the keepalive with master
  • dm-worker will use stopWorker to stop running tasks
  • if current running unit is syncer, syncer will try continue syncing until current trasaction end
    tctx.L().Info("received subtask's done")
  • at the same time master online again or another master node beacme leader will try scheduler task to another running worker
  • so there are two workers syncning from same source ,and we may meet duplicate key error

What is changed and how it works?

  • kill running subtasks when worker lost the keep-alive with dm-master (both for master offline or worker offline )

  • add Kill interface func to unit

  • impleament kill in syncer unit

  • change worker.Close to worker.Stop(graceful bool), when ungraceful stop, worker will call killAllSubTasks to force stop and this will cancel syncer.Run immediately

  • only when worker lost keepalive unit should call Kill

Check List

Tests

  • Unit test
  • Integration test

Code changes

  • Has exported function/method change
  • Has exported variable/fields change
  • Has interface methods change

Side effects

  • Possible performance regression
  • Increased code complexity

Related changes

  • Need to cherry-pick to the release branch
  • Need to update the documentation
  • Need to update key monitor metrics in both TiCDC document and official document

Release note

`fix a bug that multiple worker write for same upstream`.

@Ehco1996 Ehco1996 added the area/dm Issues or PRs related to DM. label Dec 23, 2021
@ti-chi-bot
Copy link
Member

ti-chi-bot commented Dec 23, 2021

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • GMHDBJD
  • lance6716

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Dec 23, 2021
@Ehco1996 Ehco1996 changed the title unit(dm): supprt un/graceful stop unit unit(dm): support un/graceful close unit Dec 23, 2021
@ti-chi-bot ti-chi-bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Dec 23, 2021
@codecov-commenter
Copy link

codecov-commenter commented Dec 23, 2021

Codecov Report

Merging #4035 (603bf19) into master (9607554) will increase coverage by 0.0294%.
The diff coverage is 59.9121%.

❗ Current head 603bf19 differs from pull request most recent head 311e86f. Consider uploading reports for the commit 311e86f to get more accurate results

Flag Coverage Δ
cdc 60.1996% <53.6448%> (+0.2773%) ⬆️
dm 51.8508% <62.5194%> (-0.1781%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

@@               Coverage Diff                @@
##             master      #4035        +/-   ##
================================================
+ Coverage   55.6402%   55.6696%   +0.0294%     
================================================
  Files           494        507        +13     
  Lines         61283      63081      +1798     
================================================
+ Hits          34098      35117      +1019     
- Misses        23750      24458       +708     
- Partials       3435       3506        +71     

@ti-chi-bot ti-chi-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 24, 2021
@Ehco1996
Copy link
Contributor Author

/run-dm-integration-tests

dm/dm/worker/subtask.go Outdated Show resolved Hide resolved
dm/dm/worker/subtask.go Outdated Show resolved Hide resolved
@@ -972,7 +973,7 @@ func (s *Syncer) addJob(job *job) error {
s.isTransactionEnd = false
failpoint.Inject("checkCheckpointInMiddleOfTransaction", func() {
s.tctx.L().Info("receive dml job", zap.Any("dml job", job))
time.Sleep(100 * time.Millisecond)
time.Sleep(500 * time.Millisecond)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is because my local computer run this case very fast, sleep 0.1 seconds will cause case failed sometimes

dm/syncer/syncer.go Outdated Show resolved Hide resolved
dm/syncer/syncer.go Outdated Show resolved Hide resolved
@Ehco1996
Copy link
Contributor Author

/unhold

@Ehco1996 Ehco1996 marked this pull request as ready for review December 30, 2021 08:40
@ti-chi-bot ti-chi-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 30, 2021
Copy link
Contributor

@lance6716 lance6716 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weak suggestion: Close having a parameter is strange to me, can we use Kill or something?

EtcdServer has Stop and HardStop

@Ehco1996
Copy link
Contributor Author

Ehco1996 commented Dec 31, 2021

ok i will add new interface func to unit called Kill

@Ehco1996
Copy link
Contributor Author

Ehco1996 commented Jan 4, 2022

/run-dm-integration-tests

@Ehco1996 Ehco1996 changed the title unit(dm): support un/graceful close unit unit(dm): support Kill in syncer Jan 4, 2022
@Ehco1996 Ehco1996 changed the title unit(dm): support Kill in syncer unit(dm): support Kill in syncer unit Jan 4, 2022
@Ehco1996 Ehco1996 changed the title unit(dm): support Kill in syncer unit unit(dm): add Kill func for unit Jan 4, 2022
@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Feb 15, 2022
@Ehco1996 Ehco1996 removed the needs-cherry-pick-release-5.4 Should cherry pick this PR to release-5.4 branch. label Feb 16, 2022
@Ehco1996 Ehco1996 added this to the v5.3.1 milestone Feb 16, 2022
Copy link
Contributor

@GMHDBJD GMHDBJD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rest LGTM

dm/tests/new_relay/run.sh Outdated Show resolved Hide resolved
dm/tests/new_relay/run.sh Outdated Show resolved Hide resolved
@ti-chi-bot ti-chi-bot added the needs-cherry-pick-release-5.4 Should cherry pick this PR to release-5.4 branch. label Feb 16, 2022
Copy link
Contributor

@GMHDBJD GMHDBJD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Feb 16, 2022
@Ehco1996
Copy link
Contributor Author

/merge

@ti-chi-bot
Copy link
Member

This pull request has been accepted and is ready to merge.

Commit hash: 603bf19

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Feb 16, 2022
@Ehco1996
Copy link
Contributor Author

FAIL: server_test.go:2144: testMaster.TestGRPCLongResponse

/run-leak-test
/run-verify

@ti-chi-bot ti-chi-bot merged commit e7b0aae into pingcap:master Feb 16, 2022
ti-chi-bot pushed a commit to ti-chi-bot/tiflow that referenced this pull request Feb 16, 2022
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request created: #4604.

@Ehco1996 Ehco1996 added the needs-cherry-pick-release-5.3 Should cherry pick this PR to release-5.3 branch. label Feb 16, 2022
ti-chi-bot pushed a commit to ti-chi-bot/tiflow that referenced this pull request Feb 16, 2022
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request created: #4610.

zhaoxinyu pushed a commit to zhaoxinyu/ticdc that referenced this pull request Feb 16, 2022
@Ehco1996 Ehco1996 removed needs-cherry-pick-release-5.3 Should cherry pick this PR to release-5.3 branch. needs-cherry-pick-release-5.4 Should cherry pick this PR to release-5.4 branch. labels Feb 17, 2022
@Ehco1996 Ehco1996 removed this from the v5.3.1 milestone Feb 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/dm Issues or PRs related to DM. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

multiple worker write for same upstream
7 participants