[Bug] Fix bug that routine load may lost some data #5093

Merged
merged 10 commits into from
Dec 23, 2020

Conversation

@morningman morningman commented Dec 16, 2020

Proposed changes

In the previous implementation, the job progress (for example, the consumed Kafka offset)
was updated whether a subtask's transaction was committed or aborted.
Under normal circumstances, an aborted transaction consumes no data and reports
zero progress, so updating the progress leaves it unchanged.
However, under high cluster load, a subtask may fail partway through execution on the BE side.
In that case, although the task is aborted, part of its progress is still recorded.
The next subtask then skips that data instead of consuming it, resulting in data loss.
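
The difference between the old and the fixed behavior can be sketched roughly as follows. This is a minimal, simplified model, not the actual Doris code: the class `RoutineLoadProgressSketch`, the `TaskResult` type, and both method names are hypothetical and exist only for illustration.

```java
// Hypothetical, simplified model of routine load progress; not the actual Doris classes.
class RoutineLoadProgressSketch {
    enum TxnStatus { COMMITTED, ABORTED }

    /** What one subtask reports back: the txn outcome and the offset it reached. */
    static final class TaskResult {
        final TxnStatus status;
        final long reachedOffset; // next offset the subtask would read from

        TaskResult(TxnStatus status, long reachedOffset) {
            this.status = status;
            this.reachedOffset = reachedOffset;
        }
    }

    /** Old behavior: advance the job offset no matter how the txn ended. */
    static long updateProgressAlways(long currentOffset, TaskResult result) {
        return Math.max(currentOffset, result.reachedOffset);
    }

    /** Fixed behavior: only a committed txn may advance the job offset. */
    static long updateProgressIfCommitted(long currentOffset, TaskResult result) {
        if (result.status != TxnStatus.COMMITTED) {
            return currentOffset; // aborted: the next subtask re-reads from here
        }
        return Math.max(currentOffset, result.reachedOffset);
    }

    public static void main(String[] args) {
        long start = 100L;
        // A subtask that failed partway through but still reported offset 150.
        TaskResult failedHalfway = new TaskResult(TxnStatus.ABORTED, 150L);
        System.out.println("always:       " + updateProgressAlways(start, failedHalfway));
        System.out.println("if committed: " + updateProgressIfCommitted(start, failedHalfway));
    }
}
```

With the aborted result above, the first method moves the offset from 100 to 150, so offsets 100 through 149 would never be loaded; the second leaves it at 100, so the next subtask reads them again.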

Bad case:
LoadedRows = 0
committedOffset > 0
txn: aborted
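
The three conditions of the bad case can be expressed as a single predicate, which may be handy when scanning task results or logs for affected subtasks. This is only a sketch: `BadCaseCheckSketch` and `isBadCase` are hypothetical names, and the parameters mirror the conditions listed above rather than real Doris fields.

```java
// Hypothetical diagnostic; the names mirror the bad case above, not actual Doris fields.
class BadCaseCheckSketch {
    /**
     * Returns true when an aborted transaction loaded no rows
     * but still reported a positive committed offset.
     */
    static boolean isBadCase(long loadedRows, long committedOffset, boolean txnAborted) {
        return txnAborted && loadedRows == 0 && committedOffset > 0;
    }

    public static void main(String[] args) {
        System.out.println(isBadCase(0, 42, true));   // aborted, no rows, offset moved
        System.out.println(isBadCase(0, 0, true));    // aborted, nothing consumed
        System.out.println(isBadCase(10, 42, false)); // normal committed task
    }
}
```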

This PR also adjusts some log levels and config settings.

Types of changes

  • Bugfix (non-breaking change which fixes an issue)

@morningman morningman added the kind/fix and area/load labels Dec 16, 2020
@morningman morningman self-assigned this Dec 16, 2020
@apache apache deleted a comment from yangzhg Dec 16, 2020
@apache apache deleted a comment from EmmyMiao87 Dec 16, 2020
// step2: update job progress
// only committed task need to update progress

error commit...

EmmyMiao87
EmmyMiao87 previously approved these changes Dec 17, 2020

@EmmyMiao87 EmmyMiao87 left a comment

LGTM

@EmmyMiao87 EmmyMiao87 added the approved label Dec 17, 2020

@hf200012 hf200012 left a comment

After applying the patch, no data loss has occurred so far.

EmmyMiao87
EmmyMiao87 previously approved these changes Dec 21, 2020

@EmmyMiao87 EmmyMiao87 left a comment

LGTM

@morningman morningman marked this pull request as draft December 21, 2020 06:00
@morningman morningman removed the approved label Dec 21, 2020
@morningman morningman marked this pull request as ready for review December 22, 2020 04:50

@EmmyMiao87 EmmyMiao87 left a comment

LGTM

@morningman morningman merged commit c57145b into apache:master Dec 23, 2020
acelyc111 pushed a commit to acelyc111/incubator-doris that referenced this pull request Dec 25, 2020
morningman added a commit to baidu-doris/incubator-doris that referenced this pull request Jan 5, 2021
Labels
area/load, kind/fix
Development

Successfully merging this pull request may close these issues.

[Bug] Routine load lose data
4 participants