DM may lost data when resume from error in non GTID mode #1751

lance6716 · 2021-06-05T10:03:09Z

Bug Report

What did you do? If possible, provide a recipe for reproducing the error.

Upstream uses non-GTID replication and generates binlog events that each contain more than one row change. Suppose DM is handling event e1: if DM hits an error and then resumes, some row changes of e1 may be lost.

Explanation.

DM splits a rows event and generates one job per row change.

binlog event with 2 row changes
              │
      ┌───────▼───────┐
      │handleRowsEvent│
      └───────┬───────┘
              ▼
         job1, job2

Both jobs above carry the same event position p0 and the same table information t0. DM invokes addJob for them sequentially.
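The split can be sketched as follows. This is a minimal illustration, not DM's actual code: `Position`, `Job`, and `splitRowsEvent` are hypothetical names, and the binlog file name and offset are made up.

```go
package main

import "fmt"

// Position is a simplified binlog position (file name plus offset).
// Hypothetical type for illustration only.
type Position struct {
	Name string
	Pos  uint32
}

// Job is a stripped-down job: one row change plus the position of the
// event it came from and the table it touches.
type Job struct {
	Table string
	Pos   Position
	Row   []interface{}
}

// splitRowsEvent produces one job per row change. Every job carries the
// same event position and the same table.
func splitRowsEvent(table string, pos Position, rows [][]interface{}) []Job {
	jobs := make([]Job, 0, len(rows))
	for _, row := range rows {
		jobs = append(jobs, Job{Table: table, Pos: pos, Row: row})
	}
	return jobs
}

func main() {
	p0 := Position{Name: "mysql-bin.000001", Pos: 1234}
	jobs := splitRowsEvent("t0", p0, [][]interface{}{{1, "a"}, {2, "b"}})
	for i, j := range jobs {
		fmt.Printf("job%d table=%s pos=%s:%d row=%v\n", i+1, j.Table, j.Pos.Name, j.Pos.Pos, j.Row)
	}
}
```

Because the position is attached per event rather than per row, nothing in a job distinguishes job1 from job2 once only p0 is recorded.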

In addJob, DM will:

1. send the job to a random worker
2. check whether this job needs a flush
3. if a flush is needed, wait for the workers to finish all jobs
4. save the checkpoint for the table
5. if a flush is needed, flush the checkpoints
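A rough model of these five steps, under stated assumptions: `toySyncer`, its fields, and the `needFlush` parameter are all hypothetical (in DM the flush decision is made inside addJob, here it is passed in to keep the sketch short), and the workers only acknowledge jobs instead of executing SQL.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

type job struct {
	table string
	pos   uint32 // position of the event the job came from
	sql   string
}

// toySyncer is a hypothetical model of the steps, not DM's real syncer.
type toySyncer struct {
	queues     []chan job        // one queue per worker
	wg         sync.WaitGroup    // counts jobs not yet finished
	tablePoint map[string]uint32 // in-memory table checkpoints
	flushed    map[string]uint32 // checkpoints persisted by a flush
}

func newToySyncer(workers int) *toySyncer {
	s := &toySyncer{tablePoint: map[string]uint32{}, flushed: map[string]uint32{}}
	for i := 0; i < workers; i++ {
		q := make(chan job, 16)
		s.queues = append(s.queues, q)
		go func() {
			for range q {
				// A real worker would execute the job's SQL here.
				s.wg.Done()
			}
		}()
	}
	return s
}

func (s *toySyncer) addJob(j job, needFlush bool) {
	s.wg.Add(1)
	s.queues[rand.Intn(len(s.queues))] <- j // step 1: random worker
	// step 2: the flush decision is the needFlush argument in this sketch
	if needFlush {
		s.wg.Wait() // step 3: wait for all dispatched jobs
	}
	s.tablePoint[j.table] = j.pos // step 4: save table checkpoint
	if needFlush {
		for t, p := range s.tablePoint { // step 5: flush checkpoints
			s.flushed[t] = p
		}
	}
}

func main() {
	s := newToySyncer(4)
	s.addJob(job{table: "t0", pos: 100, sql: "row change 1"}, true)
	// The flushed checkpoint already equals the event position here,
	// although the second job of the same event has not been dispatched.
	fmt.Println("flushed t0 checkpoint:", s.flushed["t0"])
	s.addJob(job{table: "t0", pos: 100, sql: "row change 2"}, false)
	s.wg.Wait()
}
```

The point of the sketch is the ordering: the flush in step 5 runs inside the addJob call for the first job, before addJob is invoked for the second job of the same event.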

Now suppose job1 needs a flush (for example, because the 30s flush interval has been reached). After step 5, the checkpoint for table t0 is flushed with p0. If the SQL of job2 then fails to execute, after resuming DM will skip t0's binlog events whose position <= p0, so job2 is lost.
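The scenario can be reproduced with a toy model. Everything here is an assumption made for illustration: `event`, `syncer`, and `replay` are hypothetical names, the checkpoint is flushed after every job (as if each job needed a flush), and the failing SQL is simulated by a row that errors once.

```go
package main

import (
	"errors"
	"fmt"
)

// event is one binlog rows event: several row changes sharing one position.
type event struct {
	pos  uint32
	rows []string
}

// syncer is a toy model, not DM's real syncer.
type syncer struct {
	applied    []string // rows written downstream
	checkpoint uint32   // flushed checkpoint for table t0
	failRow    string   // row whose SQL fails once
}

var errExec = errors.New("exec failed")

// replay processes events from the start, skipping those at or before the
// flushed checkpoint, which mirrors the "position <= p0" comparison.
func (s *syncer) replay(events []event) error {
	for _, ev := range events {
		if ev.pos <= s.checkpoint {
			continue // treated as already synced
		}
		for _, row := range ev.rows {
			if row == s.failRow {
				s.failRow = "" // fail only once, so a retry could succeed
				return errExec
			}
			s.applied = append(s.applied, row)
			// Assumption: every job needs a flush, so the checkpoint is
			// flushed with the event position after each job.
			s.checkpoint = ev.pos
		}
	}
	return nil
}

func main() {
	s := &syncer{failRow: "row2"}
	events := []event{{pos: 100, rows: []string{"row1", "row2"}}}

	err := s.replay(events) // job1 succeeds and flushes p0, job2 fails
	fmt.Println("first run:", err, s.applied)

	err = s.replay(events) // resume: the whole event is skipped
	fmt.Println("after resume:", err, s.applied)
}
```

After the resume, the run reports no error, yet row2 never appears in `applied`: the event was skipped as a whole because its position is not greater than the flushed checkpoint.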

lance6716 · 2021-06-07T02:58:56Z

In version 1.0, replication from the relay unit to the sync unit is always non-GTID.

lance6716 added the affected-v2.0.2, affected-v2.0.3, severity/critical, and type/bug labels Jun 5, 2021

lance6716 mentioned this issue Jun 5, 2021

syncer: don't skip jobs from same event when comparing table checkpoint #1752

Merged

lance6716 changed the title ~~DM may lost data when resume from synchronization error in non GTID mode~~ DM may lost data when resume from error in non GTID mode Jun 5, 2021

jebter assigned lance6716 Jun 7, 2021

ti-chi-bot closed this as completed in #1752 Jun 17, 2021

ti-chi-bot mentioned this issue Jun 17, 2021

syncer: don't skip jobs from same event when comparing table checkpoint (#1752) #1781

Merged
