Changefeed Fails to Replicate 'Drop Primary Key' DDL of a table that only has the dropped Primary key as valid indexes. #10890

Closed
asddongmen opened this issue Apr 9, 2024 · 7 comments · Fixed by #10965
Assignees
Labels
affects-5.4 This bug affects the 5.4.x(LTS) versions. affects-6.1 This bug affects the 6.1.x(LTS) versions. affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-8.1 This bug affects the 8.1.x(LTS) versions. area/ticdc Issues or PRs related to TiCDC. report/customer Customers have encountered this bug. severity/moderate type/bug The issue is confirmed as a bug.

Comments

@asddongmen
Contributor

What did you do?

When a changefeed replicates a table whose only valid index is its primary key (PK), the Drop Primary Key DDL on that table is not replicated.
As a result, the subsequent Add Primary Key DDL fails to replicate, because the downstream table still has the old primary key.

What did you expect to see?

Both the Drop Primary Key and the Add Primary Key DDLs are replicated.

What did you see instead?

As described above: neither DDL was replicated.

Versions of the cluster

Upstream TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):

(paste TiDB cluster version here)

Upstream TiKV version (execute tikv-server --version):

(paste TiKV version here)

TiCDC version (execute cdc version):

v8.0.0
@asddongmen asddongmen added type/bug The issue is confirmed as a bug. severity/moderate area/ticdc Issues or PRs related to TiCDC. labels Apr 9, 2024
@asddongmen
Contributor Author

asddongmen commented Apr 9, 2024

😇 How to reproduce?

First:

CREATE TABLE t1 (
    id INT PRIMARY KEY /*T![clustered_index] NONCLUSTERED */,
    name VARCHAR(255),
    email VARCHAR(255) UNIQUE
);

INSERT INTO t1 (id, name, email) VALUES (1, 'Alice Smith', 'alice.smith@example.com');
INSERT INTO t1 (id, name, email) VALUES (2, 'Bob Johnson', 'bob.johnson@example.com');
INSERT INTO t1 (id, name, email) VALUES (3, 'Charlie Brown', 'charlie.brown@example.com');
-- All of the DML above is replicated.

And then:

ALTER TABLE t1 DROP PRIMARY KEY; -- Not replicated. This is unexpected: it should be replicated.

INSERT INTO t1 (id, name, email) VALUES (4, 'dongmen', 'dongmen@example.com'); -- Not replicated. This is expected: t1 no longer has a valid index.

ALTER TABLE t1 ADD PRIMARY KEY (id); -- Not replicated. This is unexpected: it should be replicated.

@asddongmen
Contributor Author

asddongmen commented Apr 9, 2024

Workaround

The current workaround is to set force-replicate=true in the changefeed configuration. This allows the otherwise ineligible table to be replicated, so no DDL or DML is lost.

Solution

Solution 1:

One potential solution is to replicate the Drop Primary Key DDL even when the dropped primary key is the table's only valid index, that is, when the table has no other non-null unique key. However, this could result in data loss.
After the Drop Primary Key DDL is executed, all subsequent DML operations on table t1 are ignored.
As described in the official documentation, by default a changefeed replicates only tables with valid indexes, where a valid index is a primary key or a non-null unique index. Because the primary key is this table's only valid index, the table's DML is no longer replicated once the primary key is dropped.
Consequently, all DML operations executed between Drop Primary Key and Add Primary Key may be lost.
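
The eligibility rule described above can be modeled with a small sketch. This is an illustration only, not TiCDC code: the Index type and the is_valid_index and is_eligible helpers are hypothetical names, and the model assumes a table is eligible when it has a primary key or a unique index whose columns are all NOT NULL.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Index:
    """Hypothetical, simplified description of one index on a table."""
    name: str
    primary: bool = False
    unique: bool = False
    nullable_columns: bool = False  # True if any indexed column allows NULL


def is_valid_index(idx: Index) -> bool:
    # A valid index is a primary key or a unique index over NOT NULL columns.
    return idx.primary or (idx.unique and not idx.nullable_columns)


def is_eligible(indexes: list[Index], force_replicate: bool = False) -> bool:
    # Without force-replicate, a table needs at least one valid index.
    return force_replicate or any(is_valid_index(i) for i in indexes)


# Table t1 from the reproduction: a primary key on id and a nullable UNIQUE email.
pk = Index("PRIMARY", primary=True, unique=True)
uk_email = Index("email", unique=True, nullable_columns=True)

before_drop = [pk, uk_email]
after_drop = [uk_email]  # state after ALTER TABLE t1 DROP PRIMARY KEY

print(is_eligible(before_drop))                       # True
print(is_eligible(after_drop))                        # False
print(is_eligible(after_drop, force_replicate=True))  # True
```

In this model the nullable UNIQUE index on email does not keep t1 eligible, which is why DML issued after the drop falls outside replication unless force-replicate is enabled.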

Solution 2:

Another possible solution is as follows:
When replicating a table with only one valid index, if TiCDC receives a DDL that drops this index, it should stop the changefeed and report an error. The error message should warn the user that this is a risky operation that could lead to data loss, which is why TiCDC refuses to replicate it.
To continue replication, the user must set force-replicate=true in the changefeed configuration to avoid data loss.
However, this setting may lead to data redundancy; please refer to the official documentation.
Lastly, our documentation needs to explain this behavior in detail.
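
A minimal sketch of the check proposed in Solution 2, under stated assumptions: RiskyDropIndexError and check_drop_index are hypothetical names, the table is reduced to the set of names of its valid indexes, and the error text is invented for illustration. It does not reflect how the eventual fix is implemented in TiCDC.

```python
from __future__ import annotations


class RiskyDropIndexError(Exception):
    """Hypothetical error: a DDL would leave a table without a valid index."""


def check_drop_index(valid_indexes: set[str], dropped: str, force_replicate: bool) -> set[str]:
    """Return the valid indexes left after the drop, or raise if none remain.

    valid_indexes holds the names of the table's primary key and non-null
    unique indexes; dropped is the index named by the incoming DDL.
    """
    if dropped not in valid_indexes:
        # Dropping an index that is not a valid index does not change eligibility.
        return set(valid_indexes)
    remaining = valid_indexes - {dropped}
    if not remaining and not force_replicate:
        raise RiskyDropIndexError(
            f"dropping {dropped!r} leaves the table without a valid index; "
            "subsequent DML would not be replicated. "
            "Set force-replicate=true to continue."
        )
    return remaining


# Another valid index exists, so the drop is safe to replicate.
print(check_drop_index({"PRIMARY", "uk_id"}, "PRIMARY", force_replicate=False))

# The primary key is the only valid index, so the changefeed would stop.
try:
    check_drop_index({"PRIMARY"}, "PRIMARY", force_replicate=False)
except RiskyDropIndexError as err:
    print("changefeed stopped:", err)
```

With force_replicate=True the same call returns an empty set instead of raising, which corresponds to the user accepting the data-redundancy trade-off mentioned above.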

@flowbehappy
Collaborator

flowbehappy commented Apr 9, 2024

@benmeadowcroft What do you think about @asddongmen's suggestion? BTW, I prefer solution 2, because data loss is the more serious situation.

@benmeadowcroft

@flowbehappy @asddongmen I see the benefits of solution 2 as well, though we will have to ensure that the process for transitioning indexes on a table without interrupting the changefeed is well documented. For example, does dropping one index and adding another index in the same transaction avoid this problem? Do we recommend they add the second index beforehand, and then later drop the original index? Etc.

We will also need to ensure that the changefeed UX on TiDB Cloud allows for the configuration settings needed to resolve this issue.

@asddongmen
Contributor Author

Thank you for your valuable input.

Does dropping one index and adding another index in the same transaction avoid this problem?

TiDB does not support executing multiple DDL statements in the same transaction.

Do we recommend they add the second index beforehand, and then later drop the original index?

Yes, this is a feasible way to avoid this problem. We can include this in the documentation.

@benmeadowcroft
cc @flowbehappy
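
The difference between the two orderings discussed in this exchange can be illustrated with a short simulation. It is a sketch under assumptions, not TiCDC behavior: first_unsafe_step is a hypothetical helper, the table is reduced to the set of names of its valid indexes, and uk_id stands for a hypothetical non-null unique index on id that is added before the primary key is dropped.

```python
from __future__ import annotations


def first_unsafe_step(initial: set[str], steps: list[tuple[str, str]]) -> int | None:
    """Return the position of the first DDL that leaves no valid index, else None."""
    valid = set(initial)
    for pos, (action, name) in enumerate(steps):
        if action == "add":
            valid.add(name)
        elif action == "drop":
            valid.discard(name)
        else:
            raise ValueError(f"unknown action: {action}")
        if not valid:
            # From this DDL onward the table is ineligible by default.
            return pos
    return None


# Ordering from the reproduction: drop the primary key, then add it back.
drop_first = [("drop", "PRIMARY"), ("add", "PRIMARY")]

# Suggested ordering: add a second valid index first, then swap the primary key.
add_first = [
    ("add", "uk_id"),
    ("drop", "PRIMARY"),
    ("add", "PRIMARY"),
    ("drop", "uk_id"),
]

print(first_unsafe_step({"PRIMARY"}, drop_first))  # 0
print(first_unsafe_step({"PRIMARY"}, add_first))   # None
```

In the second ordering the table always keeps at least one valid index, so no step leaves it ineligible for replication.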

@asddongmen asddongmen changed the title from "Changefeed Fails to Replicate 'Drop Primary Key' DDL" to "Changefeed Fails to Replicate 'Drop Primary Key' DDL of a table that only has the dropped Primary key as valid indexes." Apr 10, 2024
@asddongmen asddongmen assigned asddongmen and unassigned asddongmen Apr 15, 2024
@asddongmen asddongmen self-assigned this Apr 24, 2024
@asddongmen
Contributor Author

asddongmen commented Apr 24, 2024

To do list:

@asddongmen asddongmen added affects-6.1 This bug affects the 6.1.x(LTS) versions. affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-8.1 This bug affects the 8.1.x(LTS) versions. labels Apr 25, 2024
@asddongmen asddongmen added the affects-7.1 This bug affects the 7.1.x(LTS) versions. label May 30, 2024
@ti-chi-bot ti-chi-bot added the affects-5.4 This bug affects the 5.4.x(LTS) versions. label May 30, 2024
@seiya-annie

/found customer

@ti-chi-bot ti-chi-bot bot added the report/customer Customers have encountered this bug. label Jun 3, 2024
hicqu added a commit to ti-chi-bot/tiflow that referenced this issue Jun 12, 2024
commit c092599
Author: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Date:   Wed Jun 12 00:26:59 2024 +0800

    pkg/config, sink(ticdc): support output raw change event for mq and cloud storage sink (pingcap#11226) (pingcap#11290)

    close pingcap#11211

commit 3426e46
Author: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Date:   Tue Jun 11 19:40:29 2024 +0800

    puller(ticdc): fix wrong update splitting behavior after table scheduling (pingcap#11269) (pingcap#11282)

    close pingcap#11219

commit 2a28078
Author: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Date:   Tue Jun 11 16:40:37 2024 +0800

    mysql(ticdc): remove error filter when check isTiDB in backend init (pingcap#11214) (pingcap#11261)

    close pingcap#11213

commit 2425d54
Author: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Date:   Tue Jun 11 16:40:30 2024 +0800

    log(ticdc): Add more error query information to the returned error to facilitate users to know the cause of the failure (pingcap#10945) (pingcap#11257)

    close pingcap#11254

commit 053cdaf
Author: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Date:   Tue Jun 11 15:34:30 2024 +0800

    cdc: log slow conflict detect every 60s (pingcap#11251) (pingcap#11287)

    close pingcap#11271

commit 327ba7b
Author: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Date:   Tue Jun 11 11:42:00 2024 +0800

    redo(ticdc): return internal error in redo writer (pingcap#11011) (pingcap#11091)

    close pingcap#10124

commit d82ae89
Author: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Date:   Mon Jun 10 22:28:29 2024 +0800

    ddl_puller (ticdc): handle dorp pk/uk ddl correctly (pingcap#10965) (pingcap#10981)

    close pingcap#10890

commit f15bec9
Author: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Date:   Fri Jun 7 16:16:28 2024 +0800

    redo(ticdc): enable pprof and set memory limit for redo applier (pingcap#10904) (pingcap#10996)

    close pingcap#10900

commit ba50a0e
Author: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Date:   Wed Jun 5 19:58:26 2024 +0800

    test(ticdc): enable sequence test (pingcap#11023) (pingcap#11037)

    close pingcap#11015

commit 94b9897
Author: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Date:   Wed Jun 5 17:08:56 2024 +0800

    mounter(ticdc): timezone fill default value should also consider tz. (pingcap#10932) (pingcap#10946)

    close pingcap#10931

commit a912d33
Author: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Date:   Wed Jun 5 10:49:25 2024 +0800

    mysql (ticdc): Improve the performance of the mysql sink by refining the transaction event batching logic (pingcap#10466) (pingcap#11242)

    close pingcap#11241

commit 6277d9a
Author: dongmen <20351731+asddongmen@users.noreply.github.com>
Date:   Wed May 29 20:13:22 2024 +0800

    kvClient (ticdc): revert e5999e3 to remove useless metrics (pingcap#11184)

    close pingcap#11073

commit 54e93ed
Author: dongmen <20351731+asddongmen@users.noreply.github.com>
Date:   Wed May 29 17:43:22 2024 +0800

    syncpoint (ticdc): make syncpoint support base64 encoded password (pingcap#11162)

    close pingcap#10516

commit 0ba9329
Author: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Date:   Wed May 29 09:07:21 2024 +0800

    (redo)ticdc: fix the event orderliness in redo log (pingcap#11117) (pingcap#11180)

    close pingcap#11096

Signed-off-by: qupeng <qupeng@pingcap.com>
5 participants