[fix](hyperscan) Fix hyper scan fall back to re2 #44547

Merged: 2 commits, Nov 27, 2024

Conversation

Contributor

@zhiqiang-hhhh zhiqiang-hhhh commented Nov 25, 2024

What problem does this PR solve?

  • Core change
    When Hyperscan fails to prepare a pattern, we should not call set_error on the FunctionContext.
    set_error tries to cancel the query, whereas in this case we want to fall back to RE2.

  • Refactoring
    Rename FunctionRegexp so that regexp matching can be distinguished from regexp_extract.

  • Reproduce

SELECT * FROM regexp_test_chinese WHERE city REGEXP "^上海|^北京" ORDER BY id;

Note: the | in the SQL above is a Chinese character.
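
The fallback behaviour described above can be sketched in isolation. This is a minimal illustration, not the Doris code: `QueryContext`, `fast_prepare`, `prepare`, and `Matcher` are hypothetical names, `std::regex` stands in for RE2, and the "fast engine" is a stub that rejects backreferences (a construct Hyperscan does not support) instead of calling the real Hyperscan API. The point is only that a failure in the fast path returns quietly, and the query-level error is set only if the fallback also fails.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical stand-in for FunctionContext: set_error() marks the query as failed.
struct QueryContext {
    bool cancelled = false;
    void set_error(const std::string&) { cancelled = true; }
};

// Hypothetical stand-in for the fast engine. It rejects backreferences;
// a real engine may reject patterns for other reasons.
std::optional<std::regex> fast_prepare(QueryContext& /*ctx*/, const std::string& pattern) {
    static const std::regex backref(R"(\\[1-9])");
    if (std::regex_search(pattern, backref)) {
        // Deliberately no ctx.set_error() here: that would cancel the query
        // before the caller has a chance to try the fallback engine.
        return std::nullopt;
    }
    try {
        return std::regex(pattern);
    } catch (const std::regex_error&) {
        return std::nullopt;  // still not a query-level error
    }
}

struct Matcher {
    std::regex re;
    bool used_fallback;
};

// Try the fast engine first, then the general engine (std::regex here).
// Only a failure of the fallback is reported through the context.
std::optional<Matcher> prepare(QueryContext& ctx, const std::string& pattern) {
    if (auto fast = fast_prepare(ctx, pattern)) {
        return Matcher{*fast, false};
    }
    try {
        return Matcher{std::regex(pattern), true};
    } catch (const std::regex_error& e) {
        ctx.set_error(e.what());
        return std::nullopt;
    }
}
```

With this shape, a pattern such as `(ab)\1` is rejected by the stub fast engine but still yields a usable matcher from the fallback, and the context is left untouched. A pattern that neither engine can compile, such as `(`, is the only case that sets the error.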

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@zhiqiang-hhhh
Contributor Author

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 40112 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 27c4b6305cee78603f23aa355d1b9ab32aedb09a, data reload: false

------ Round 1 ----------------------------------
q1	17576	7868	7328	7328
q2	2051	179	171	171
q3	10800	1125	1208	1125
q4	10445	806	737	737
q5	7637	2738	2698	2698
q6	243	150	147	147
q7	1002	631	617	617
q8	9233	1865	1948	1865
q9	6583	6388	6443	6388
q10	7034	2319	2370	2319
q11	462	265	263	263
q12	426	225	222	222
q13	17774	3019	3098	3019
q14	244	214	219	214
q15	574	536	519	519
q16	658	577	607	577
q17	993	563	544	544
q18	7403	6754	6777	6754
q19	1341	1016	981	981
q20	462	190	182	182
q21	4085	3263	3128	3128
q22	384	314	319	314
Total cold run time: 107410 ms
Total hot run time: 40112 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7220	7332	7303	7303
q2	322	233	225	225
q3	2925	2872	3006	2872
q4	2054	1801	1824	1801
q5	5699	5667	5763	5667
q6	230	144	148	144
q7	2243	1822	1834	1822
q8	3427	3550	3579	3550
q9	8910	9005	8880	8880
q10	3597	3573	3552	3552
q11	601	509	497	497
q12	838	621	636	621
q13	11762	3260	3259	3259
q14	306	270	276	270
q15	582	528	522	522
q16	700	652	645	645
q17	1888	1634	1615	1615
q18	8321	7777	7546	7546
q19	1761	1399	1438	1399
q20	2143	1863	1894	1863
q21	5624	5365	5449	5365
q22	631	549	585	549
Total cold run time: 71784 ms
Total hot run time: 59967 ms

@doris-robot

TPC-DS: Total hot run time: 197785 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 27c4b6305cee78603f23aa355d1b9ab32aedb09a, data reload: false

query1	1237	933	943	933
query2	6255	2145	2021	2021
query3	10826	4108	4099	4099
query4	68036	29033	23760	23760
query5	4989	465	449	449
query6	420	188	182	182
query7	5655	301	296	296
query8	312	223	212	212
query9	9408	2702	2711	2702
query10	459	250	242	242
query11	17513	15600	15867	15600
query12	159	110	104	104
query13	1540	414	424	414
query14	10989	7424	7664	7424
query15	211	185	172	172
query16	7464	462	447	447
query17	1070	566	562	562
query18	1918	310	295	295
query19	219	167	158	158
query20	124	115	120	115
query21	215	112	109	109
query22	4605	4603	4490	4490
query23	35099	34366	34236	34236
query24	5672	2471	2473	2471
query25	520	428	383	383
query26	645	156	153	153
query27	1795	286	288	286
query28	4333	2486	2494	2486
query29	687	421	417	417
query30	216	156	157	156
query31	1013	818	850	818
query32	70	55	58	55
query33	402	276	292	276
query34	994	533	550	533
query35	873	727	726	726
query36	1110	962	976	962
query37	127	76	74	74
query38	4462	4357	4367	4357
query39	1545	1505	1545	1505
query40	206	106	103	103
query41	46	42	42	42
query42	107	96	102	96
query43	540	493	491	491
query44	1205	845	849	845
query45	188	170	181	170
query46	1147	707	728	707
query47	2052	1927	1926	1926
query48	413	328	318	318
query49	740	406	397	397
query50	880	391	389	389
query51	7553	7320	7086	7086
query52	105	87	91	87
query53	257	180	186	180
query54	522	394	405	394
query55	77	76	79	76
query56	247	234	233	233
query57	1339	1216	1140	1140
query58	222	214	224	214
query59	3218	3000	2971	2971
query60	286	238	276	238
query61	111	121	110	110
query62	769	684	658	658
query63	215	186	196	186
query64	1387	653	623	623
query65	3390	3316	3241	3241
query66	707	298	302	298
query67	15956	15550	15608	15550
query68	4057	562	564	562
query69	443	255	255	255
query70	1198	1138	1153	1138
query71	371	243	246	243
query72	6394	4177	4025	4025
query73	754	363	362	362
query74	10259	9002	8975	8975
query75	3472	2707	2702	2702
query76	1765	1101	1146	1101
query77	487	272	262	262
query78	10827	9658	9593	9593
query79	1446	605	606	605
query80	849	431	439	431
query81	516	237	244	237
query82	1296	119	119	119
query83	175	144	147	144
query84	342	69	71	69
query85	873	311	305	305
query86	340	335	305	305
query87	4733	4552	4600	4552
query88	3410	2276	2208	2208
query89	431	292	302	292
query90	2014	195	190	190
query91	139	104	103	103
query92	62	51	49	49
query93	1714	541	537	537
query94	767	296	282	282
query95	344	242	249	242
query96	634	281	279	279
query97	2881	2670	2711	2670
query98	223	197	203	197
query99	1758	1302	1307	1302
Total cold run time: 322046 ms
Total hot run time: 197785 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 38.33% (9978/26032)
Line Coverage: 29.44% (83541/283754)
Region Coverage: 28.60% (42982/150303)
Branch Coverage: 25.18% (21839/86718)
Coverage Report: http://coverage.selectdb-in.cc/coverage/27c4b6305cee78603f23aa355d1b9ab32aedb09a_27c4b6305cee78603f23aa355d1b9ab32aedb09a/report/index.html

@doris-robot

ClickBench: Total hot run time: 32.49 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 27c4b6305cee78603f23aa355d1b9ab32aedb09a, data reload: false

query1	0.04	0.03	0.03
query2	0.07	0.03	0.04
query3	0.23	0.07	0.07
query4	1.61	0.11	0.11
query5	0.42	0.43	0.42
query6	1.17	0.67	0.65
query7	0.02	0.01	0.02
query8	0.04	0.03	0.03
query9	0.57	0.50	0.51
query10	0.56	0.56	0.57
query11	0.15	0.10	0.10
query12	0.14	0.12	0.12
query13	0.60	0.61	0.60
query14	2.86	2.76	2.85
query15	0.90	0.84	0.83
query16	0.44	0.37	0.39
query17	1.03	1.06	1.07
query18	0.23	0.21	0.21
query19	1.98	1.77	1.99
query20	0.01	0.01	0.01
query21	15.36	0.60	0.59
query22	2.79	1.99	2.45
query23	16.78	1.17	0.85
query24	2.75	0.62	1.35
query25	0.21	0.08	0.14
query26	0.43	0.14	0.14
query27	0.05	0.05	0.05
query28	11.10	1.09	1.07
query29	12.56	3.19	3.24
query30	0.25	0.07	0.06
query31	2.86	0.40	0.39
query32	3.27	0.46	0.47
query33	3.07	3.12	3.01
query34	17.08	4.46	4.48
query35	4.53	4.51	4.52
query36	0.67	0.49	0.48
query37	0.09	0.06	0.07
query38	0.05	0.04	0.04
query39	0.03	0.02	0.02
query40	0.15	0.12	0.12
query41	0.08	0.03	0.02
query42	0.04	0.02	0.02
query43	0.04	0.03	0.02
Total cold run time: 107.31 s
Total hot run time: 32.49 s

@zhiqiang-hhhh zhiqiang-hhhh marked this pull request as ready for review November 25, 2024 12:37
xiaokang previously approved these changes Nov 26, 2024
Contributor

@xiaokang xiaokang left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Nov 26, 2024
Contributor

PR approved by at least one committer and no changes requested.

Contributor

PR approved by anyone and no changes requested.

@@ -499,9 +496,6 @@ Status FunctionLikeBase::hs_prepare(FunctionContext* context, const char* expres
hs_free_database(*database);
*database = nullptr;
*scratch = nullptr;
if (context) {
Contributor

Please leave a comment here so that nobody adds this code back later.

Contributor Author

OK, I will add some tests and a comment.

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Nov 26, 2024
@zhiqiang-hhhh
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 39832 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 8aeb490adebad6534abc660197d72d595e28f4ef, data reload: false

------ Round 1 ----------------------------------
q1	17579	7516	7336	7336
q2	2033	179	177	177
q3	10693	1075	1188	1075
q4	10556	730	750	730
q5	7600	2708	2753	2708
q6	236	152	150	150
q7	1003	622	606	606
q8	9373	1872	1961	1872
q9	6640	6373	6405	6373
q10	6960	2314	2320	2314
q11	461	269	260	260
q12	413	220	221	220
q13	17767	3047	2975	2975
q14	247	210	218	210
q15	566	518	527	518
q16	644	597	592	592
q17	990	538	572	538
q18	7328	6625	6598	6598
q19	1341	1034	1032	1032
q20	464	183	177	177
q21	3981	3070	3091	3070
q22	375	312	301	301
Total cold run time: 107250 ms
Total hot run time: 39832 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7288	7284	7255	7255
q2	323	230	236	230
q3	2940	2805	2962	2805
q4	1994	1770	1802	1770
q5	5719	5732	5638	5638
q6	225	142	147	142
q7	2216	1823	1838	1823
q8	3417	3526	3520	3520
q9	8826	8968	8911	8911
q10	3624	3539	3565	3539
q11	603	508	530	508
q12	863	601	609	601
q13	11572	3326	3232	3232
q14	334	270	276	270
q15	582	523	533	523
q16	711	655	654	654
q17	1862	1628	1720	1628
q18	8332	7744	7776	7744
q19	2196	1550	1560	1550
q20	2094	1871	1867	1867
q21	5656	5418	5363	5363
q22	630	574	579	574
Total cold run time: 72007 ms
Total hot run time: 60147 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 38.35% (9977/26013)
Line Coverage: 29.45% (83496/283495)
Region Coverage: 28.61% (42988/150279)
Branch Coverage: 25.20% (21832/86618)
Coverage Report: http://coverage.selectdb-in.cc/coverage/8aeb490adebad6534abc660197d72d595e28f4ef_8aeb490adebad6534abc660197d72d595e28f4ef/report/index.html

@doris-robot

TPC-DS: Total hot run time: 196595 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 8aeb490adebad6534abc660197d72d595e28f4ef, data reload: false

query1	1295	947	956	947
query2	6245	2111	2027	2027
query3	10923	4024	3972	3972
query4	67661	28979	23432	23432
query5	4956	470	452	452
query6	399	185	183	183
query7	5516	294	285	285
query8	316	233	245	233
query9	8498	2655	2665	2655
query10	433	268	269	268
query11	17086	15290	16097	15290
query12	153	107	108	107
query13	1483	452	438	438
query14	10868	7648	7775	7648
query15	209	185	176	176
query16	7404	483	465	465
query17	1140	582	578	578
query18	1806	334	293	293
query19	206	145	149	145
query20	117	118	111	111
query21	212	110	103	103
query22	4743	4447	4491	4447
query23	34459	34166	34345	34166
query24	5600	2532	2449	2449
query25	505	389	399	389
query26	654	154	146	146
query27	1940	283	288	283
query28	4322	2501	2491	2491
query29	668	404	408	404
query30	229	155	148	148
query31	1009	844	840	840
query32	65	57	58	57
query33	444	286	272	272
query34	921	526	522	522
query35	880	755	716	716
query36	1105	952	999	952
query37	128	74	74	74
query38	4613	4367	4482	4367
query39	1514	1598	1471	1471
query40	211	103	96	96
query41	44	42	41	41
query42	108	100	100	100
query43	519	485	490	485
query44	1176	850	833	833
query45	191	172	167	167
query46	1137	687	700	687
query47	2053	1943	1934	1934
query48	433	320	308	308
query49	725	386	402	386
query50	854	400	402	400
query51	7261	7243	7029	7029
query52	99	83	84	83
query53	252	178	183	178
query54	507	382	398	382
query55	76	74	74	74
query56	255	243	243	243
query57	1313	1138	1160	1138
query58	218	215	219	215
query59	3089	3021	3143	3021
query60	265	245	254	245
query61	109	105	105	105
query62	822	671	656	656
query63	212	184	180	180
query64	1404	657	654	654
query65	3266	3195	3286	3195
query66	720	301	301	301
query67	15992	15829	15631	15631
query68	3810	577	570	570
query69	442	252	257	252
query70	1198	1160	1150	1150
query71	338	245	246	245
query72	6332	4071	4201	4071
query73	765	359	355	355
query74	9557	9058	9009	9009
query75	3399	2677	2655	2655
query76	1913	1001	1051	1001
query77	452	281	269	269
query78	10495	9378	9408	9378
query79	1989	589	602	589
query80	1387	417	438	417
query81	519	248	227	227
query82	1277	122	114	114
query83	272	149	144	144
query84	281	79	68	68
query85	1119	299	291	291
query86	411	319	315	315
query87	4800	4463	4543	4463
query88	3588	2167	2109	2109
query89	415	293	312	293
query90	1994	194	186	186
query91	135	101	101	101
query92	76	50	51	50
query93	2924	542	534	534
query94	815	312	286	286
query95	342	251	241	241
query96	622	286	283	283
query97	2927	2718	2663	2663
query98	215	197	201	197
query99	1616	1331	1300	1300
Total cold run time: 320461 ms
Total hot run time: 196595 ms

@doris-robot

ClickBench: Total hot run time: 33.84 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 8aeb490adebad6534abc660197d72d595e28f4ef, data reload: false

query1	0.03	0.03	0.03
query2	0.08	0.03	0.03
query3	0.23	0.07	0.07
query4	1.64	0.10	0.10
query5	0.44	0.42	0.40
query6	1.14	0.67	0.66
query7	0.02	0.01	0.01
query8	0.04	0.03	0.04
query9	0.55	0.52	0.49
query10	0.54	0.56	0.56
query11	0.14	0.10	0.11
query12	0.14	0.11	0.11
query13	0.62	0.60	0.61
query14	2.83	2.79	2.76
query15	0.91	0.83	0.83
query16	0.38	0.38	0.38
query17	1.07	1.05	1.02
query18	0.23	0.21	0.21
query19	1.85	1.78	1.91
query20	0.01	0.01	0.01
query21	15.36	0.61	0.59
query22	2.36	2.60	2.07
query23	17.05	0.97	0.78
query24	2.85	2.09	1.92
query25	0.27	0.15	0.13
query26	0.54	0.15	0.14
query27	0.04	0.04	0.04
query28	9.38	1.10	1.08
query29	12.50	3.20	3.18
query30	0.25	0.07	0.07
query31	2.87	0.38	0.37
query32	3.29	0.47	0.47
query33	2.98	3.02	3.09
query34	17.18	4.49	4.49
query35	4.57	4.53	4.58
query36	0.66	0.48	0.48
query37	0.09	0.05	0.06
query38	0.05	0.04	0.04
query39	0.04	0.02	0.02
query40	0.15	0.12	0.12
query41	0.08	0.02	0.03
query42	0.03	0.03	0.02
query43	0.03	0.03	0.03
Total cold run time: 105.51 s
Total hot run time: 33.84 s

Contributor

@xiaokang xiaokang left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Nov 26, 2024
Contributor

PR approved by at least one committer and no changes requested.

@yiguolei yiguolei added usercase Important user case type label dev/2.1.x dev/3.0.x labels Nov 26, 2024
@yiguolei yiguolei merged commit df5bfe8 into apache:master Nov 27, 2024
29 of 32 checks passed
github-actions bot pushed a commit that referenced this pull request Nov 27, 2024
* Core change
When Hyperscan fails to prepare a pattern, we should not call set_error
on the FunctionContext. set_error tries to cancel the query, whereas in
this case we want to fall back to RE2.

* Refactoring
Rename FunctionRegexp so that regexp matching can be distinguished from
regexp_extract.

* Reproduce
```sql
SELECT * FROM regexp_test_chinese WHERE city REGEXP "^上海|^北京" ORDER BY id;
```
Note: the `|` in the SQL above is a Chinese character.
github-actions bot pushed a commit that referenced this pull request Nov 27, 2024
@zhiqiang-hhhh zhiqiang-hhhh deleted the fix-hs-fallback-master branch November 27, 2024 03:19
yiguolei pushed a commit that referenced this pull request Nov 27, 2024
yiguolei pushed a commit that referenced this pull request Nov 27, 2024
…44652)

Cherry-picked from #44547

Co-authored-by: zhiqiang <hezhiqiang@selectdb.com>
yiguolei pushed a commit that referenced this pull request Nov 28, 2024
…44653)

Cherry-picked from #44547

Co-authored-by: zhiqiang <hezhiqiang@selectdb.com>
@yiguolei yiguolei mentioned this pull request Jan 19, 2025
Labels
approved Indicates a PR has been approved by one committer. dev/2.1.8-merged dev/3.0.4-merged reviewed usercase Important user case type label