How to fix data loss when using transactions while some Kafka partitions are down #2016

Open
jackjoesh opened this issue Jun 20, 2023 · 4 comments

@jackjoesh

Background:
As we all know, inflightMessages only adds the tx-commit record. That means if we start a local transaction like the following:
begin transaction
insert A (no tx commit)
insert B (tx commit)
end transaction

If "insert A" and "insert B" are dispatched to different Kafka partitions, the callback for "insert B" can complete before the callback for "insert A" (out of order).
The event timeline is as follows:

1 The "insert B" callback completes and updates this transaction's position (tx commit).
2 Some Kafka partitions go down.
3 Those partitions have not received the "insert A" data yet.

In the end, does this mean we lose the "insert A" data? Is there any way to fix this problem?

@osheroff
Collaborator

The design of Maxwell's InflightMessage buffer is geared towards at-least-once delivery: all rows have to commit before maxwell.positions is incremented.

So if you have:

row 1 <- succeeded
row 2 <- blocked
row 3 <- succeeded

rows 1 and 3 will reach kafka but maxwell.positions will not commit. If the partition stays blocked long enough, maxwell's buffers will fill and the process will stop. If maxwell crashes and row 2 is still not committed, maxwell will replay from row 1.
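A minimal Java sketch of the at-least-once rule described above, assuming each message is identified by a single increasing `long` position (a simplification of a real binlog position). The class and method names (`AtLeastOnceTracker`, `add`, `complete`) are hypothetical and are not Maxwell's API. The idea is that the stored position only advances over an unbroken prefix of acknowledged messages.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical sketch, not Maxwell's InflightMessageList: messages are registered
// in binlog order and the committable position only moves past a message once it
// and every earlier message have been acknowledged by Kafka.
final class AtLeastOnceTracker {
    // Insertion order is binlog order; the value records whether the send was acked.
    private final LinkedHashMap<Long, Boolean> inflight = new LinkedHashMap<>();

    void add(long position) {
        inflight.put(position, Boolean.FALSE);
    }

    // Called from the producer callback. Returns the new position that is safe to
    // store, or empty if an earlier message is still in flight.
    OptionalLong complete(long position) {
        if (!inflight.containsKey(position)) {
            throw new IllegalArgumentException("unknown position " + position);
        }
        inflight.put(position, Boolean.TRUE); // re-putting a key keeps its order

        long committable = -1;
        Iterator<Map.Entry<Long, Boolean>> it = inflight.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Boolean> head = it.next();
            if (!head.getValue()) {
                break; // blocked message: nothing after it may be committed yet
            }
            committable = head.getKey();
            it.remove(); // acknowledged prefix no longer needs buffering
        }
        return committable < 0 ? OptionalLong.empty() : OptionalLong.of(committable);
    }

    public static void main(String[] args) {
        AtLeastOnceTracker tracker = new AtLeastOnceTracker();
        tracker.add(1);
        tracker.add(2);
        tracker.add(3);
        System.out.println(tracker.complete(1)); // row 1 succeeded
        System.out.println(tracker.complete(3)); // row 3 succeeded, row 2 still blocked
        System.out.println(tracker.complete(2)); // row 2 finally succeeded
    }
}
```

Completing rows 1, 3, then 2 yields position 1, then nothing, then position 3: the acknowledgement of row 3 alone never moves the position past the blocked row 2.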

@jackjoesh
Author

jackjoesh commented Jun 21, 2023

Hi osheroff, it seems I didn't get my point across, so let me explain again.

InflightMessage does not record a row unless it is the final tx-commit row of a local transaction, even when that row is an insert.

So, in my case above:
begin transaction
insert A (no tx commit)
insert B (tx commit)
end transaction

"insert A" (no tx commit) is not added to InflightMessage.

So suppose "insert A" is at position 1 and "insert B" (tx commit) is at position 2.

The whole process then follows this timeline:
1 Because the partitions deliver out of order, the "insert B" callback completes before the "insert A" callback and updates the transaction's position to 2 (tx commit).
2 Some Kafka partitions go down.
3 Those partitions have not received the "insert A" data yet (position 1 was never actually sent to Kafka successfully).

In the end, when Maxwell restarts, it resumes from position 2 (insert B) rather than position 1 (insert A), so the Kafka message for insert A is never delivered, right? @osheroff
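The failure mode described above can be modeled in a few lines of Java. This is a hypothetical simplification, not Maxwell's code: positions are plain `long`s, and `CommitOnlyTracking` assumes, as this comment describes, that only rows carrying the tx commit are tracked. With insert A (position 1) unacknowledged and insert B (position 2) acknowledged, the stored position ends up at 2.

```java
import java.util.List;

// Hypothetical model of the scenario in this issue (not Maxwell's code): only the
// row that carries the transaction commit is tracked, so an earlier row of the
// same transaction can still be in flight when the position is stored.
final class CommitOnlyTracking {
    static final class Row {
        final long position;    // simplified binlog position
        final boolean txCommit; // true only for the last row of a transaction
        final boolean acked;    // true once the Kafka send callback succeeded

        Row(long position, boolean txCommit, boolean acked) {
            this.position = position;
            this.txCommit = txCommit;
            this.acked = acked;
        }
    }

    // Position that would be stored when only tx-commit rows are tracked:
    // untracked rows are skipped entirely, which is where the data loss comes from.
    // Returns 0 if nothing can be stored yet.
    static long storedPosition(List<Row> rowsInBinlogOrder) {
        long stored = 0;
        for (Row row : rowsInBinlogOrder) {
            if (!row.txCommit) {
                continue; // never added to the in-flight buffer
            }
            if (!row.acked) {
                break; // a tracked row is still in flight
            }
            stored = row.position;
        }
        return stored;
    }

    public static void main(String[] args) {
        // insert A: position 1, no tx commit, stuck on a partition that went down.
        // insert B: position 2, tx commit, acknowledged first.
        List<Row> rows = List.of(new Row(1, false, false), new Row(2, true, true));
        System.out.println("stored position: " + storedPosition(rows)
                + ", insert A acked: " + rows.get(0).acked);
    }
}
```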

osheroff added a commit that referenced this issue Jun 21, 2023
If rows in a transaction are flowing to different partitions, data loss
was possible if one partition got stuck while the "commit" message made
progress.

This has the net effect of serializing Maxwell's ability to make
progress in a binlog. I think that's mostly a good thing.
@osheroff
Collaborator

Ah, yeah, sorry. I missed that the rows inside a transaction are sent to different partitions.

#2019 addresses this, but I do want to be a bit cautious about the possible increase in memory usage here.
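One way to close this gap, sketched in Java under the assumption that every row (not only the tx-commit row) is registered: a position is only reported at a tx-commit row, and only once that row and every row before it have been acknowledged. `AllRowsTracker` and its methods are hypothetical; this illustrates the idea in the commit message above and is not the actual #2019 change. It also shows where the extra memory goes: every in-flight row now occupies a buffer entry.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.OptionalLong;

// Hypothetical sketch (not the actual #2019 change): every row is tracked, but a
// position is only reported at a tx-commit row, and only after that row and all
// earlier rows have been acknowledged.
final class AllRowsTracker {
    private static final class Entry {
        final long position;
        final boolean txCommit;
        boolean acked;

        Entry(long position, boolean txCommit) {
            this.position = position;
            this.txCommit = txCommit;
        }
    }

    // Rows in binlog order; every row costs one entry until its prefix is acked.
    private final Deque<Entry> inflight = new ArrayDeque<>();

    void add(long position, boolean txCommit) {
        inflight.addLast(new Entry(position, txCommit));
    }

    // Called from the producer callback for any row, commit or not.
    OptionalLong ack(long position) {
        boolean found = false;
        for (Entry e : inflight) {
            if (e.position == position) {
                e.acked = true;
                found = true;
                break;
            }
        }
        if (!found) {
            throw new IllegalArgumentException("unknown position " + position);
        }

        long committable = -1;
        // Drop the acknowledged prefix, remembering the last transaction boundary in it.
        while (!inflight.isEmpty() && inflight.peekFirst().acked) {
            Entry head = inflight.pollFirst();
            if (head.txCommit) {
                committable = head.position;
            }
        }
        return committable < 0 ? OptionalLong.empty() : OptionalLong.of(committable);
    }

    int size() {
        return inflight.size();
    }

    public static void main(String[] args) {
        AllRowsTracker tracker = new AllRowsTracker();
        tracker.add(1, false); // insert A
        tracker.add(2, true);  // insert B (tx commit)
        System.out.println(tracker.ack(2)); // B acked first: position must not move
        System.out.println(tracker.ack(1)); // A acked: the whole transaction is safe
    }
}
```

Because the position can only move once the slowest partition acknowledges its row, progress through the binlog is effectively serialized, which matches the trade-off noted in the commit message.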

@jackjoesh
Author

Thank you, I get it. Could you also make DEFAULT_CAPACITY in InflightMessageList configurable? We would need to increase this capacity.
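A minimal sketch of what a configurable capacity could look like, assuming a buffer that makes producers wait when it is full (earlier in the thread, a full buffer is what stops Maxwell's progress). `BoundedInflightBuffer` and its methods are hypothetical, not Maxwell's InflightMessageList; the point is only that the limit is a constructor argument, which a config option could supply, rather than a hard-coded DEFAULT_CAPACITY.

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch: the in-flight limit is passed in (e.g. from a config
// option) instead of being fixed by a DEFAULT_CAPACITY constant.
final class BoundedInflightBuffer {
    private final int capacity;
    private final Semaphore slots;

    BoundedInflightBuffer(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive: " + capacity);
        }
        this.capacity = capacity;
        this.slots = new Semaphore(capacity);
    }

    // Reserve a slot before sending a row; blocks while the buffer is full.
    void add() throws InterruptedException {
        slots.acquire();
    }

    // Non-blocking variant: returns false instead of waiting when full.
    boolean tryAdd() {
        return slots.tryAcquire();
    }

    // Free a slot once the row no longer needs to be buffered. Callers must call
    // this exactly once per successful add/tryAdd, or the limit drifts.
    void complete() {
        slots.release();
    }

    // Approximate number of rows currently in flight.
    int inflight() {
        return capacity - slots.availablePermits();
    }

    public static void main(String[] args) {
        BoundedInflightBuffer buffer = new BoundedInflightBuffer(2);
        System.out.println(buffer.tryAdd() + " " + buffer.tryAdd() + " " + buffer.tryAdd());
    }
}
```

Raising the capacity lets more rows sit in flight while one partition is slow, at the cost of the extra memory usage mentioned above.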

osheroff added a commit that referenced this issue Jul 5, 2023
osheroff added a commit that referenced this issue Sep 9, 2023
Right idea, but dumb implementation.
This reverts commit c20370c.