
Deleting and creating the same changefeed in a short time may cause deadlock and data loss #7657

Closed
overvenus opened this issue Nov 21, 2022 · 8 comments · Fixed by #8268
Assignees: overvenus
Labels: affects-5.0, affects-5.1, affects-5.2, affects-5.3, affects-5.4 (affects the 5.4.x LTS versions), affects-6.0, affects-6.1 (affects the 6.1.x LTS versions), affects-6.2, affects-6.3, affects-6.4, affects-6.5 (affects the 6.5.x LTS versions), affects-6.6, area/ticdc (Issues or PRs related to TiCDC), found/automation (bugs found by automation cases), severity/major, type/bug (the issue is confirmed as a bug)

Comments

overvenus (Member) commented Nov 21, 2022

What did you do?

Because model.ChangeFeedID is non-unique, the owner and processors have no way
to tell whether they are running the same changefeed. After applying delete and
create operations on the same changefeed name, there is a race condition in
which the owner and a processor end up running two different changefeeds. This
may cause deadlock and data loss.

To fix the issue, we should make sure that:

  1. When the owner receives a delete changefeed request, it only returns a
     response after the changefeed is completely closed on every TiCDC node.
     This prevents data loss.
  2. A unique field is added to model.ChangeFeed so that two incarnations of
     the same changefeed name can be told apart (see the sketch after this
     list). This prevents deadlock and other illegal states.
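
Below is a minimal sketch in Go of what point 2 could look like. It is not the
actual TiCDC implementation (the real fix landed via #8268); ChangeFeedInfo,
the Epoch field, newChangeFeed, and sameIncarnation are illustrative names,
under the assumption that every create operation is tagged with a
cluster-unique value:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// epochCounter stands in for a cluster-unique ID source; in a real cluster
// this could be, e.g., a PD-allocated timestamp. Here it is just a counter.
var epochCounter uint64

// ChangeFeedInfo mirrors the idea of pairing the reusable changefeed name
// with a unique field; the Epoch field name is illustrative.
type ChangeFeedInfo struct {
	ID    string // user-visible changefeed name, reused across delete/create
	Epoch uint64 // unique per incarnation of the changefeed
}

// newChangeFeed creates a fresh incarnation of a changefeed name.
func newChangeFeed(id string) ChangeFeedInfo {
	return ChangeFeedInfo{ID: id, Epoch: atomic.AddUint64(&epochCounter, 1)}
}

// sameIncarnation is the check the owner and processors would perform
// instead of comparing the name alone.
func sameIncarnation(a, b ChangeFeedInfo) bool {
	return a.ID == b.ID && a.Epoch == b.Epoch
}

func main() {
	old := newChangeFeed("cf-1")
	// The user deletes "cf-1" and immediately recreates it under the same name.
	recreated := newChangeFeed("cf-1")
	// Comparing names alone would wrongly say these are the same changefeed;
	// the epoch disambiguates the two incarnations.
	fmt.Println(old.ID == recreated.ID)          // true
	fmt.Println(sameIncarnation(old, recreated)) // false
}
```

With such a field, a processor that still holds the pre-delete incarnation can
detect the mismatch and stop, instead of silently racing against the recreated
changefeed.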

What did you expect to see?

No deadlock and no data loss.

What did you see instead?

The changefeed is in a deadlock state, and its lag keeps increasing.

Versions of the cluster

TiCDC version (execute cdc version):

All released TiCDC versions
overvenus added the type/bug and area/ticdc labels Nov 21, 2022
@AkiraXie

/found automation

ti-chi-bot added the found/automation (bugs found by automation cases) label Nov 21, 2022
@AkiraXie

/severity major

@nongfushanquan (Contributor)

/assign @overvenus

@nongfushanquan (Contributor)

#7730

overvenus (Member, Author) commented Dec 5, 2022

This issue has not been fixed; #7730 is a workaround that only mitigates it.

overvenus (Member, Author) commented Feb 13, 2023

Found a similar case: restarting a changefeed within a very short period. The owner may see the processor's state from before the changefeed restart.

See also #8242

overvenus added a commit to ti-chi-bot/tiflow that referenced this issue Apr 6, 2023
close pingcap#7657

Signed-off-by: Neil Shen <overvenus@gmail.com>
ti-chi-bot pushed a commit that referenced this issue Apr 11, 2023
close #7657

Signed-off-by: Neil Shen <overvenus@gmail.com>