[HUDI-4287] Optimize Flink checkpoint meta mechanism to fix mistaken pending instants #5913
+1 for the fix.
Fixed conflicts with https://issues.apache.org/jira/projects/HUDI/issues/HUDI-4311, which also attempted to fix the rollback data loss in the batch write scenario. This PR may still be needed because a rollback can also happen at runtime, for example in StreamWriteOperatorCoordinator#notifyCheckpointComplete when committing the instant. Such a rollback does not crash the job, so the CkpMeta never gets re-bootstrapped.
@danny0405 Updated. Please take a look and let me know whether it's OK.
We already added stats to the write functions so that they can recover and recommit if the job fails while committing an instant.
@chenshzh would you please rebase onto the latest master first?
Branch updated from bb42bb2 to 59a5e7c.
@chenshzh can you please check on the CI failures?
Branch updated from 66cadb8 to fbb9ed3.
@hudi-bot run azure
@alexeykudinkin could you help look at the CI failure once more? I have rebased onto the latest master, and the failure does not seem to be caused by this PR's changes, because we find it failed in
Hello @chenshzh, do you think this is still an issue? We can have an offline talk if possible. Are you in the Hudi DingTalk group now?
@danny0405 Yes, I'm in the group now, and I would be glad to discuss it.
You can add me as a friend and we can have a talk.
@danny0405 Is this still a valid patch? Can you follow up?
No, it should already be fixed in master.
Change Logs
Details are in https://issues.apache.org/jira/projects/HUDI/issues/HUDI-4287
Optimize the mechanism by adding a CANCELLED CkpMessage state with the highest priority, corresponding to the DELETE of an instant during a rollback action.
Revert to re-sending pending instants to ckp_meta during bootstrap.
Cover the compaction and clustering scenarios.
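To illustrate the priority rule in the change log, here is a minimal Java sketch; it is not Hudi's actual CkpMetadata/CkpMessage code. The class `CkpMetaSketch`, its methods, and the relative order of INFLIGHT and COMPLETED are hypothetical; only the idea that CANCELLED outranks every other state comes from this PR. Each instant may accumulate several messages, and its effective state is the highest-priority one, so an instant deleted by a rollback is never reported as pending again.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch, not Hudi's CkpMetadata/CkpMessage API.
public class CkpMetaSketch {

  // CANCELLED has the highest priority (per the PR); the order of
  // INFLIGHT and COMPLETED below it is an assumption of this sketch.
  public enum State {
    INFLIGHT(0), COMPLETED(1), CANCELLED(2);

    final int priority;

    State(int priority) {
      this.priority = priority;
    }
  }

  // One checkpoint message: an instant time plus the state it reports.
  public record Message(String instant, State state) {}

  private final List<Message> messages = new ArrayList<>();

  public void add(String instant, State state) {
    messages.add(new Message(instant, state));
  }

  // Effective state = highest-priority state among all messages for the instant.
  public Optional<State> effectiveState(String instant) {
    return messages.stream()
        .filter(m -> m.instant().equals(instant))
        .map(Message::state)
        .max(Comparator.comparingInt((State s) -> s.priority));
  }

  // Instants still INFLIGHT after priority resolution; cancelled ones drop out.
  public List<String> pendingInstants() {
    return distinctInstantsIn(State.INFLIGHT);
  }

  // Latest COMPLETED instant, kept so it can be recommitted on recovery.
  // Assumes fixed-width timestamps, so string order equals time order.
  public Optional<String> lastCompletedInstant() {
    return distinctInstantsIn(State.COMPLETED).stream().max(Comparator.naturalOrder());
  }

  // Distinct instants (in first-seen order) whose effective state equals target.
  private List<String> distinctInstantsIn(State target) {
    return messages.stream()
        .map(Message::instant)
        .distinct()
        .filter(i -> effectiveState(i).orElseThrow() == target)
        .toList();
  }
}
```

For example, an instant recorded as INFLIGHT and then CANCELLED is excluded from `pendingInstants()`, while an instant recorded as INFLIGHT and then COMPLETED can be returned by `lastCompletedInstant()` for recommit on recovery. Resolving by priority rather than by arrival order means a late INFLIGHT message cannot resurrect a rolled-back instant. The sketch needs Java 16 or later (records, `Stream.toList()`).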
Impact
The last successful instant is kept when recovering, so that it can be recommitted.
Risk level (write none, low, medium or high below)
Medium. It can be quickly verified by a data quality check.
Documentation Update
none