[Bug] Fix bug that routine load may lost some data #5093
Conversation
In the previous implementation, we tried to update the job progress (such as the consumed Kafka offsets) regardless of whether a subtask was committed or aborted. Under normal circumstances an aborted transaction consumes no data and all of its progress is 0, so even if we update the progress, it remains unchanged. However, under high cluster load a subtask may fail partway through execution on the BE side. In that case, although the task is aborted, part of the progress has already been updated, causing the next subtask to skip that data when consuming, which results in data loss. Also adjust some log levels and configs.
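The fix described above can be sketched as follows. This is a minimal illustration, not the actual Doris classes: `RoutineLoadProgress`, `TxnState`, and `onTxnFinished` are hypothetical names standing in for the FE-side progress-update path.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the fix: only a COMMITTED transaction may
// advance the routine-load job progress (e.g. consumed Kafka offsets).
// Names are illustrative, not the real Doris API.
public class RoutineLoadProgress {
    enum TxnState { COMMITTED, ABORTED }

    // partition id -> next offset to consume
    private final Map<Integer, Long> committedOffsets = new HashMap<>();

    public void onTxnFinished(TxnState state, Map<Integer, Long> taskOffsets) {
        // Before the fix, progress was updated unconditionally, so a
        // half-executed, aborted task could still advance the offsets
        // and the next task would skip (lose) that data.
        if (state != TxnState.COMMITTED) {
            return; // aborted task: discard its partial progress
        }
        committedOffsets.putAll(taskOffsets);
    }

    public long offsetOf(int partition) {
        return committedOffsets.getOrDefault(partition, 0L);
    }
}
```

With this guard, an aborted task's partial offsets are dropped, and the next subtask resumes from the last committed position instead of skipping data.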
// step2: update job progress
// only committed tasks need to update progress
error commit...
LGTM
After applying the patch, no data loss has been observed so far.
e2fedbe to d298be5
LGTM
LGTM
Proposed changes
Bad case:
LoadedRows = 0
committedOffset > 0
txn: aborted
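The bad case above can be quantified with a small sketch (the method name is hypothetical, not a Doris API): every offset recorded as consumed beyond what was actually loaded is silently skipped by the next subtask.

```java
// Hypothetical illustration of the bad case: an aborted txn loaded
// 0 rows, but the committed offset was still advanced, so the rows
// in [loadedRows, committedOffset) are skipped by the next subtask.
public class BadCaseDemo {
    public static long lostRows(long committedOffset, long loadedRows) {
        return committedOffset - loadedRows;
    }
}
```

For example, with `LoadedRows = 0` and a committed offset of 500, the next subtask starts at 500 and the first 500 rows are lost.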
Also adjust some log levels and configs.
Types of changes
Checklist