[fix](routine-load) fix routine load pause when Kafka data deleted after TTL #37288

Merged
merged 1 commit into from
Jul 8, 2024

Conversation

sollhui
Contributor

@sollhui sollhui commented Jul 4, 2024

Proposed changes

When using routine load, the lag remains positive even after the data load has completed:

  Lag: {"0":16,"1":15,"2":16,"3":16,"4":16,"5":16,"6":15,"7":16,"8":16,"9":16,"10":15,"11":16,"12":15,"13":15,"14":16,"15":16,"16":17,"17":15,"18":16,"19":15,"20":16,"21":16,"22":16,"23":16,"24":15,"25":17,"26":17,"27":16,"28":16,"29":16,"30":16,"31":17,"32":14,"33":16,"34":17,"35":16,"36":15,"37":15,"38":15,"39":16,"40":16,"41":16,"42":15,"43":15,"44":17,"45":16,"46":15,"47":15,"48":16,"49":17,"50":16,"51":15,"52":16,"53":15,"54":15,"55":17,"56":16,"57":17,"58":16,"59":16,"60":15,"61":15,"62":16,"63":16,"64":17,"65":16,"66":15,"67":16,"68":17,"69":16,"70":15,"71":17}

In addition, the routine load job is paused with an "out of range" error once the Kafka data reaches its TTL and is deleted.

The cause is that the EOF event has its own offset, and that offset needs to be counted in the progress statistics.
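To illustrate the cause described above, here is a minimal sketch and not the actual Doris code. It assumes a hypothetical event stream of `("msg" | "eof", partition, offset)` tuples, and it assumes that the EOF offset is the next offset to read and that the partition's end offset can sit beyond the last delivered message. All function names are made up for this example.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (kind, partition, offset), kind is "msg" or "eof".
Event = Tuple[str, int, int]


def next_offsets(events: Iterable[Event], count_eof: bool) -> Dict[int, int]:
    """Return the next offset to consume per partition."""
    progress: Dict[int, int] = {}
    for kind, partition, offset in events:
        if kind == "msg":
            candidate = offset + 1  # resume right after the last message
        elif kind == "eof" and count_eof:
            candidate = offset  # assumption: EOF offset is the next offset to read
        else:
            continue  # EOF ignored: progress stays at the last message
        progress[partition] = max(progress.get(partition, 0), candidate)
    return progress


def lag(latest: Dict[int, int], progress: Dict[int, int]) -> Dict[int, int]:
    """Lag per partition: end offset minus the next offset to consume."""
    return {p: end - progress.get(p, 0) for p, end in latest.items()}


def is_out_of_range(resume_offset: int, log_start: int, log_end: int) -> bool:
    """True if the stored offset no longer exists in the partition log."""
    return resume_offset < log_start or resume_offset > log_end


# Messages at offsets 0..83, but the partition end offset is 100 (assumed gap).
events = [("msg", 0, o) for o in range(84)] + [("eof", 0, 100)]
latest = {0: 100}

print(lag(latest, next_offsets(events, count_eof=False)))  # {0: 16}
print(lag(latest, next_offsets(events, count_eof=True)))   # {0: 0}
```

Under the same assumptions, if retention later deletes the data so that the log starts at offset 100, a job that stored 84 as its resume offset would be out of range (`is_out_of_range(84, 100, 100)` is true), while a job that counted the EOF offset would not.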

Note (important):
Even after this fix, if you set

"property.enable.partition.eof" = "false"

in your routine load job, the job will still hit this problem, because the fix relies on the offset carried by the EOF event. This option is true by default in Doris.
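As a rough aid for the note above, the following sketch checks whether a job's Kafka properties turn the EOF option off. It is only an illustration: the helper name and the idea of passing job properties as a plain dict are assumptions, and the default of "true" is taken from the description above.

```python
from typing import Mapping

EOF_KEY = "property.enable.partition.eof"


def eof_enabled(job_properties: Mapping[str, str]) -> bool:
    """Return True unless the job explicitly sets the EOF property to "false".

    A missing key is treated as "true", following the default stated in this PR.
    """
    value = job_properties.get(EOF_KEY, "true")
    return value.strip().lower() != "false"


# Hypothetical job property sets, for illustration only.
default_job = {"kafka_topic": "example_topic"}
risky_job = {"kafka_topic": "example_topic", EOF_KEY: "false"}

for name, props in (("default_job", default_job), ("risky_job", risky_job)):
    if not eof_enabled(props):
        print(f"{name}: EOF disabled, the lag / out of range problem may still occur")
    else:
        print(f"{name}: EOF enabled")
```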

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@sollhui
Contributor Author

sollhui commented Jul 4, 2024

run buildall

Contributor

github-actions bot commented Jul 4, 2024

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 39569 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 371ecaad17ca25ebe167b46dfcd7f8caa6d34f87, data reload: false

------ Round 1 ----------------------------------
q1	17633	4310	4232	4232
q2	2021	189	187	187
q3	10452	1181	1148	1148
q4	10190	852	848	848
q5	7511	2635	2595	2595
q6	218	136	135	135
q7	940	601	603	601
q8	9218	2049	2050	2049
q9	8841	6468	6445	6445
q10	8897	3666	3691	3666
q11	450	231	236	231
q12	468	229	225	225
q13	18009	2963	2971	2963
q14	278	215	217	215
q15	522	472	485	472
q16	488	367	364	364
q17	954	639	650	639
q18	8028	7402	7397	7397
q19	4670	1412	1434	1412
q20	650	330	338	330
q21	4843	3079	3244	3079
q22	381	336	344	336
Total cold run time: 115662 ms
Total hot run time: 39569 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4322	4217	4219	4217
q2	373	269	250	250
q3	2987	2750	2810	2750
q4	1934	1717	1749	1717
q5	5573	5594	5466	5466
q6	241	137	130	130
q7	2200	1841	1810	1810
q8	3239	3382	3362	3362
q9	8666	8656	8776	8656
q10	4068	3887	3832	3832
q11	588	481	489	481
q12	800	625	626	625
q13	15886	3161	3165	3161
q14	313	298	281	281
q15	533	483	489	483
q16	485	418	422	418
q17	1804	1520	1468	1468
q18	8281	8024	7718	7718
q19	1838	1668	1710	1668
q20	2131	1883	1881	1881
q21	5074	4910	4914	4910
q22	625	580	528	528
Total cold run time: 71961 ms
Total hot run time: 55812 ms

@doris-robot

TPC-DS: Total hot run time: 170548 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 371ecaad17ca25ebe167b46dfcd7f8caa6d34f87, data reload: false

query1	937	371	371	371
query2	6489	2699	2629	2629
query3	6640	201	208	201
query4	19143	17410	17424	17410
query5	3648	474	484	474
query6	274	174	159	159
query7	4582	289	299	289
query8	325	313	291	291
query9	9252	2473	2457	2457
query10	603	305	280	280
query11	10647	9955	9959	9955
query12	120	90	83	83
query13	1654	378	369	369
query14	8871	7811	6257	6257
query15	222	188	194	188
query16	7878	311	310	310
query17	1811	556	535	535
query18	1929	280	283	280
query19	204	157	161	157
query20	87	83	83	83
query21	206	135	127	127
query22	4515	4027	3990	3990
query23	33684	33635	33392	33392
query24	10673	2914	2909	2909
query25	667	405	421	405
query26	931	165	160	160
query27	2327	317	324	317
query28	6372	2166	2157	2157
query29	918	670	670	670
query30	256	159	167	159
query31	983	753	771	753
query32	103	55	62	55
query33	769	331	308	308
query34	1019	487	483	483
query35	766	650	653	650
query36	1125	938	961	938
query37	146	80	80	80
query38	2985	2860	2836	2836
query39	868	802	774	774
query40	207	126	126	126
query41	55	52	52	52
query42	131	103	103	103
query43	604	573	567	567
query44	1247	739	721	721
query45	196	168	163	163
query46	1088	709	696	696
query47	1842	1802	1803	1802
query48	380	313	305	305
query49	943	404	428	404
query50	764	379	385	379
query51	6997	6669	6679	6669
query52	102	96	94	94
query53	369	302	289	289
query54	872	445	437	437
query55	71	70	70	70
query56	297	266	268	266
query57	1138	1039	1027	1027
query58	249	248	243	243
query59	3616	3440	3225	3225
query60	312	287	284	284
query61	117	96	95	95
query62	608	430	450	430
query63	320	286	291	286
query64	8853	2312	1756	1756
query65	3148	3074	3084	3074
query66	742	322	334	322
query67	15765	14944	14947	14944
query68	9113	560	544	544
query69	697	431	343	343
query70	1360	1121	1145	1121
query71	543	275	276	275
query72	9174	2802	2631	2631
query73	2297	327	323	323
query74	5841	5499	5474	5474
query75	6043	2668	2704	2668
query76	5487	951	951	951
query77	814	322	302	302
query78	9571	11470	9501	9501
query79	9616	528	520	520
query80	1447	470	475	470
query81	552	216	222	216
query82	253	111	107	107
query83	339	179	172	172
query84	277	90	88	88
query85	740	316	281	281
query86	341	316	315	315
query87	3365	3115	3083	3083
query88	4327	2377	2353	2353
query89	476	377	376	376
query90	2458	185	191	185
query91	131	103	102	102
query92	64	49	49	49
query93	2702	504	507	504
query94	1509	210	207	207
query95	420	320	329	320
query96	605	269	268	268
query97	3155	3097	3014	3014
query98	214	202	196	196
query99	1089	819	836	819
Total cold run time: 290682 ms
Total hot run time: 170548 ms

@doris-robot

ClickBench: Total hot run time: 30.76 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 371ecaad17ca25ebe167b46dfcd7f8caa6d34f87, data reload: false

query1	0.04	0.04	0.03
query2	0.08	0.04	0.04
query3	0.22	0.05	0.06
query4	1.68	0.09	0.07
query5	0.52	0.48	0.48
query6	1.13	0.73	0.73
query7	0.03	0.01	0.01
query8	0.05	0.04	0.04
query9	0.55	0.49	0.48
query10	0.53	0.55	0.54
query11	0.16	0.11	0.11
query12	0.14	0.12	0.12
query13	0.60	0.59	0.59
query14	0.78	0.77	0.79
query15	0.82	0.81	0.83
query16	0.36	0.35	0.35
query17	1.01	1.01	1.03
query18	0.20	0.27	0.23
query19	1.79	1.73	1.71
query20	0.02	0.01	0.01
query21	15.41	0.74	0.64
query22	4.88	6.52	2.16
query23	18.31	1.33	1.28
query24	2.15	0.23	0.23
query25	0.16	0.08	0.09
query26	0.27	0.18	0.18
query27	0.08	0.08	0.10
query28	13.23	1.01	1.00
query29	12.65	3.26	3.24
query30	0.25	0.07	0.05
query31	2.90	0.40	0.39
query32	3.25	0.49	0.47
query33	2.89	2.93	2.88
query34	17.16	4.38	4.44
query35	4.49	4.48	4.46
query36	0.67	0.49	0.50
query37	0.18	0.15	0.15
query38	0.15	0.14	0.14
query39	0.04	0.03	0.03
query40	0.17	0.14	0.14
query41	0.10	0.04	0.06
query42	0.05	0.04	0.05
query43	0.04	0.04	0.03
Total cold run time: 110.19 s
Total hot run time: 30.76 s

Contributor

@dataroaring dataroaring left a comment

LGTM

Contributor

github-actions bot commented Jul 7, 2024

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) Jul 7, 2024
Contributor

github-actions bot commented Jul 7, 2024

PR approved by anyone and no changes requested.

Contributor

@liaoxin01 liaoxin01 left a comment

LGTM

@liaoxin01 liaoxin01 merged commit b210e8f into apache:master Jul 8, 2024
30 of 34 checks passed
@sollhui sollhui deleted the routine_load_unstable branch July 17, 2024 03:43
yiguolei pushed a commit that referenced this pull request Jul 17, 2024
…ter TTL (#37288) (#37983)

pick (#37288)

dataroaring pushed a commit that referenced this pull request Jul 17, 2024
…ter TTL (#37288)

dataroaring pushed a commit that referenced this pull request Aug 11, 2024
…ter TTL(#37288) (#39183)

pick (#37288)

GoGoWen pushed a commit to GoGoWen/incubator-doris that referenced this pull request Aug 27, 2024
…ter TTL(apache#37288) (apache#39183)

pick (apache#37288)
