ErrSnapshotSchemaNotFound or ErrSchemaStorageGCed after migrating tables #1067
Labels: area/ticdc, bug-from-user, component/scheduler, priority/P0, type/bug
Bug Report
This issue is part of the problem described in #1056
What did you do? If possible, provide a recipe for reproducing the error.
Run a changefeed on three CDC nodes, with five tables and a total QPS around 5000. The changefeed stops with ErrSnapshotSchemaNotFound or ErrSchemaStorageGCed after scaling in and scaling out the CDC cluster.
What did you expect to see?
That the changefeed runs normally.
What did you see instead?
The errors ErrSnapshotSchemaNotFound or ErrSchemaStorageGCed.
Versions of the cluster
All components are in v4.0.8.
Cause Analysis
This problem is caused by bugs in handling table migrations. Stopping a table is not immediate, so the table's checkpoint keeps advancing for a short time after the owner starts the migration. The owner, however, records the checkpoint in replica-info.startTs at the moment it starts the migration, which is too early. When the table is restarted on another capture that has not previously run this changefeed, that capture's SchemaStorage has no information about schemas before the current checkpoint, yet the table pullers fetch data starting from replica-info.startTs, which is earlier than the current checkpoint. Hence the errors.
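The timing problem described in the cause analysis can be illustrated with a small, self-contained Go sketch. This is not TiCDC code: `toySchemaStorage`, `snapshotAt`, `startTable`, and all timestamps are hypothetical stand-ins chosen for illustration, and the error value only borrows its name from this report. It assumes a capture whose schema storage begins at the current checkpoint (105) while the recorded start timestamp is still the older value (100).

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical error value, named after one of the errors in this report.
var ErrSchemaStorageGCed = errors.New("ErrSchemaStorageGCed")

// toySchemaStorage is an illustrative stand-in for a schema storage that
// only holds schema snapshots from firstTs onward.
type toySchemaStorage struct {
	firstTs uint64
}

// snapshotAt fails when asked for a schema snapshot older than firstTs.
func (s *toySchemaStorage) snapshotAt(ts uint64) error {
	if ts < s.firstTs {
		return ErrSchemaStorageGCed
	}
	return nil
}

// startTable mimics a puller that replays every event with a commit
// timestamp greater than startTs and looks up the schema for each one.
// It returns the first lookup error, or nil if all lookups succeed.
func startTable(s *toySchemaStorage, startTs uint64, commitTs []uint64) error {
	for _, ts := range commitTs {
		if ts <= startTs {
			continue // already covered by the start timestamp
		}
		if err := s.snapshotAt(ts); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	events := []uint64{101, 103, 106, 108}

	// The new capture has never run this changefeed, so its schema storage
	// starts at the current checkpoint (assumed to be 105 here).
	storage := &toySchemaStorage{firstTs: 105}

	// startTs recorded when the migration began (100), before the old
	// capture had actually stopped the table.
	fmt.Println("stale startTs:", startTable(storage, 100, events))

	// For contrast only: a start timestamp that is not behind the storage.
	fmt.Println("current startTs:", startTable(storage, 105, events))
}
```

With the stale start timestamp, the first replayed event (commit timestamp 101) predates everything the new capture's storage knows about, so the lookup fails. The second call only shows that the lookup succeeds when the start timestamp is not behind the storage; it is not a description of how the bug was actually fixed.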