[fix](inverted index)Support Chinese column name with inverted index #36321

Merged: 2 commits merged into apache:master from qidaye:fix_chinese_col_with_index on Jun 16, 2024

@qidaye (Contributor) commented Jun 14, 2024

Proposed changes

  1. The plain `std::string` to `std::wstring` conversion only supports ASCII characters. For non-ASCII characters, use `StringUtil::string_to_wstring` instead.
  2. Fix `check_terms_stats_v2` in `index_tool` and add field info to its printed output.

Issue Number: #34118
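
To illustrate the first point, here is a minimal sketch of why a byte-wise widening breaks on a UTF-8 column name. It does not show the code from this PR or the internals of `StringUtil::string_to_wstring`; `naive_widen` and `utf8_to_wstring` are hypothetical helpers written only for this example, and the decoder assumes well-formed UTF-8 and a `wchar_t` wide enough to hold a full code point (as on Linux).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: widens byte by byte. Each UTF-8 byte becomes its own
// wchar_t, so this is only correct when every byte is ASCII.
std::wstring naive_widen(const std::string& s) {
    return std::wstring(s.begin(), s.end());
}

// Hypothetical helper (not Doris code): decodes UTF-8 into one wchar_t per
// code point. Assumes well-formed input; invalid lead bytes become U+FFFD.
std::wstring utf8_to_wstring(const std::string& s) {
    std::wstring out;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::uint32_t cp = 0xFFFD;  // replacement character by default
        std::size_t extra = 0;      // number of continuation bytes expected
        if (lead < 0x80) {
            cp = lead;
        } else if ((lead >> 5) == 0x06) {
            cp = lead & 0x1F;
            extra = 1;
        } else if ((lead >> 4) == 0x0E) {
            cp = lead & 0x0F;
            extra = 2;
        } else if ((lead >> 3) == 0x1E) {
            cp = lead & 0x07;
            extra = 3;
        }
        // Fold in the low 6 bits of each continuation byte.
        for (std::size_t k = 1; k <= extra && i + k < s.size(); ++k) {
            const unsigned char cont = static_cast<unsigned char>(s[i + k]);
            cp = (cp << 6) | (cont & 0x3F);
        }
        out.push_back(static_cast<wchar_t>(cp));
        i += extra + 1;
    }
    return out;
}
```

For the single character U+540D, encoded in UTF-8 as the three bytes `"\xE5\x90\x8D"`, `naive_widen` yields three wide characters while `utf8_to_wstring` yields one. A wide-string lookup keyed on the field name would presumably miss in the first case. On platforms with a 16-bit `wchar_t`, code points above U+FFFF would additionally need surrogate pairs, which this sketch does not handle.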

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@qidaye (Contributor Author) commented Jun 14, 2024

run buildall

@zzzxl1993 (Contributor) left a comment

LGTM

Contributor

PR approved by anyone and no changes requested.

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@airborne12 (Member) left a comment

LGTM

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.44% (8987/24661)
Line Coverage: 28.02% (73664/262926)
Region Coverage: 27.49% (38262/139204)
Branch Coverage: 24.19% (19510/80652)
Coverage Report: http://coverage.selectdb-in.cc/coverage/004a16bd29194309f35ad7c58161f85752124c66_004a16bd29194309f35ad7c58161f85752124c66/report/index.html

@doris-robot

TPC-H: Total hot run time: 39406 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 004a16bd29194309f35ad7c58161f85752124c66, data reload: false

------ Round 1 ----------------------------------
q1	17619	4341	4273	4273
q2	2025	190	191	190
q3	10447	1135	1113	1113
q4	10198	738	801	738
q5	7507	2642	2605	2605
q6	219	135	134	134
q7	947	634	596	596
q8	9217	2050	2047	2047
q9	8779	6477	6446	6446
q10	8860	3677	3720	3677
q11	436	235	238	235
q12	495	230	222	222
q13	18709	2940	2982	2940
q14	262	224	221	221
q15	504	483	482	482
q16	510	371	364	364
q17	952	625	651	625
q18	7967	7474	7364	7364
q19	5968	1491	1384	1384
q20	650	314	322	314
q21	4860	3107	3823	3107
q22	382	329	331	329
Total cold run time: 117513 ms
Total hot run time: 39406 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4315	4215	4208	4208
q2	367	267	266	266
q3	2959	3012	2875	2875
q4	1965	1746	1702	1702
q5	5567	5505	5452	5452
q6	223	126	133	126
q7	2181	1931	1868	1868
q8	3337	3434	3416	3416
q9	8670	8678	8787	8678
q10	4056	3892	3790	3790
q11	592	495	513	495
q12	834	626	657	626
q13	16369	3157	3178	3157
q14	305	275	271	271
q15	527	490	486	486
q16	501	431	452	431
q17	1826	1520	1525	1520
q18	8001	8088	7787	7787
q19	2138	1541	1664	1541
q20	2166	1864	1861	1861
q21	8816	4912	4715	4715
q22	628	539	589	539
Total cold run time: 76343 ms
Total hot run time: 55810 ms

@qidaye (Contributor Author) commented Jun 14, 2024

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.44% (8987/24664)
Line Coverage: 28.01% (73663/263019)
Region Coverage: 27.48% (38266/139246)
Branch Coverage: 24.18% (19507/80678)
Coverage Report: http://coverage.selectdb-in.cc/coverage/ea0acf799c45145718dfb875a1934034211d1266_ea0acf799c45145718dfb875a1934034211d1266/report/index.html

@doris-robot

TPC-H: Total hot run time: 39574 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ea0acf799c45145718dfb875a1934034211d1266, data reload: false

------ Round 1 ----------------------------------
q1	17633	4265	4292	4265
q2	2017	188	188	188
q3	10464	1126	1099	1099
q4	10189	778	723	723
q5	7531	2604	2612	2604
q6	218	138	135	135
q7	963	617	587	587
q8	9229	2067	2052	2052
q9	8895	6497	6453	6453
q10	8900	3725	3695	3695
q11	456	240	237	237
q12	481	234	222	222
q13	17757	2999	2968	2968
q14	268	227	222	222
q15	503	477	469	469
q16	503	371	382	371
q17	966	768	706	706
q18	8024	7424	7337	7337
q19	4376	1432	1541	1432
q20	675	323	329	323
q21	5086	3140	3951	3140
q22	402	352	346	346
Total cold run time: 115536 ms
Total hot run time: 39574 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4457	4216	4191	4191
q2	361	261	268	261
q3	2978	2785	2956	2785
q4	1974	1708	1724	1708
q5	5611	5530	5511	5511
q6	222	126	128	126
q7	2216	1919	1891	1891
q8	3308	3391	3458	3391
q9	8674	8660	8779	8660
q10	4100	3871	3759	3759
q11	614	507	513	507
q12	826	627	642	627
q13	16416	3158	3137	3137
q14	314	269	278	269
q15	518	477	491	477
q16	487	426	443	426
q17	1812	1537	1510	1510
q18	8252	8091	7867	7867
q19	1845	1716	1698	1698
q20	2146	1834	1827	1827
q21	8964	4820	4824	4820
q22	643	552	554	552
Total cold run time: 76738 ms
Total hot run time: 56000 ms

@doris-robot

TPC-DS: Total hot run time: 173718 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit ea0acf799c45145718dfb875a1934034211d1266, data reload: false

query1	956	382	395	382
query2	6468	2495	2392	2392
query3	6641	211	206	206
query4	19463	17159	17184	17159
query5	3689	514	464	464
query6	253	159	171	159
query7	4578	307	291	291
query8	331	300	296	296
query9	8528	2438	2426	2426
query10	568	297	287	287
query11	10574	10026	10108	10026
query12	126	84	91	84
query13	1652	366	377	366
query14	10062	7473	7447	7447
query15	265	204	204	204
query16	7745	272	266	266
query17	1877	555	546	546
query18	1962	288	299	288
query19	205	157	179	157
query20	110	84	83	83
query21	225	149	138	138
query22	4531	4542	4296	4296
query23	34223	33483	33642	33483
query24	10642	2830	2763	2763
query25	578	351	363	351
query26	705	153	151	151
query27	2211	316	316	316
query28	5778	2070	2065	2065
query29	876	608	601	601
query30	242	151	150	150
query31	932	734	734	734
query32	91	55	55	55
query33	641	280	273	273
query34	875	473	477	473
query35	727	641	625	625
query36	1092	946	924	924
query37	140	70	76	70
query38	2893	2745	2743	2743
query39	866	796	808	796
query40	205	125	125	125
query41	55	53	59	53
query42	120	103	96	96
query43	595	579	567	567
query44	1097	723	739	723
query45	196	168	170	168
query46	1067	734	709	709
query47	1878	1754	1829	1754
query48	380	311	307	307
query49	858	418	429	418
query50	789	402	387	387
query51	6841	6695	6713	6695
query52	109	93	97	93
query53	368	296	286	286
query54	900	454	452	452
query55	78	78	73	73
query56	295	268	287	268
query57	1178	1057	1085	1057
query58	255	250	246	246
query59	3570	3393	3194	3194
query60	288	272	270	270
query61	91	88	93	88
query62	625	450	456	450
query63	313	293	290	290
query64	8477	2255	1744	1744
query65	3370	3105	3067	3067
query66	747	323	324	323
query67	15557	15083	14944	14944
query68	6109	552	529	529
query69	589	489	371	371
query70	1198	1062	1082	1062
query71	456	275	279	275
query72	7786	5193	5667	5193
query73	775	320	320	320
query74	5907	5480	5510	5480
query75	3643	2657	2705	2657
query76	3192	932	978	932
query77	612	312	310	310
query78	10511	9733	9698	9698
query79	2435	516	522	516
query80	1771	478	474	474
query81	562	228	220	220
query82	1371	109	102	102
query83	299	173	166	166
query84	278	80	93	80
query85	1256	274	270	270
query86	475	341	285	285
query87	3240	3096	3038	3038
query88	3542	2344	2333	2333
query89	464	384	365	365
query90	1709	192	187	187
query91	134	98	101	98
query92	63	48	52	48
query93	2200	504	493	493
query94	1057	210	188	188
query95	406	312	312	312
query96	593	269	260	260
query97	3244	3089	3043	3043
query98	222	202	196	196
query99	1201	834	837	834
Total cold run time: 271923 ms
Total hot run time: 173718 ms

@doris-robot

ClickBench: Total hot run time: 30.1 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit ea0acf799c45145718dfb875a1934034211d1266, data reload: false

query1	0.04	0.04	0.03
query2	0.08	0.04	0.04
query3	0.23	0.05	0.05
query4	1.68	0.07	0.07
query5	0.49	0.46	0.48
query6	1.12	0.72	0.71
query7	0.02	0.02	0.02
query8	0.05	0.04	0.04
query9	0.54	0.49	0.50
query10	0.55	0.56	0.54
query11	0.16	0.12	0.11
query12	0.15	0.12	0.12
query13	0.59	0.59	0.59
query14	0.81	0.78	0.76
query15	0.83	0.81	0.81
query16	0.36	0.37	0.35
query17	1.00	0.99	1.00
query18	0.23	0.23	0.24
query19	1.78	1.75	1.76
query20	0.02	0.01	0.01
query21	15.40	0.64	0.64
query22	5.13	7.19	1.49
query23	18.25	1.40	1.30
query24	2.08	0.21	0.21
query25	0.16	0.09	0.09
query26	0.27	0.18	0.18
query27	0.08	0.09	0.08
query28	13.61	1.02	0.99
query29	12.67	3.37	3.29
query30	0.25	0.06	0.07
query31	2.85	0.39	0.39
query32	3.27	0.48	0.47
query33	2.90	2.95	2.90
query34	17.00	4.40	4.43
query35	4.48	4.42	4.44
query36	0.64	0.45	0.49
query37	0.18	0.15	0.15
query38	0.15	0.15	0.14
query39	0.05	0.04	0.03
query40	0.17	0.17	0.14
query41	0.10	0.04	0.04
query42	0.06	0.05	0.04
query43	0.05	0.04	0.03
Total cold run time: 110.53 s
Total hot run time: 30.1 s

@xiaokang (Contributor) left a comment

LGTM

@github-actions bot added the `approved` label (indicates a PR has been approved by one committer) on Jun 16, 2024
Contributor

PR approved by at least one committer and no changes requested.

@xiaokang xiaokang merged commit a425382 into apache:master Jun 16, 2024
26 of 29 checks passed
@qidaye qidaye deleted the fix_chinese_col_with_index branch June 17, 2024 02:05
qidaye added a commit to qidaye/incubator-doris that referenced this pull request Jun 17, 2024
…pache#36321)

1. `std::string` to `std::wstring` conversion only supports ASCII
characters. For non-ASCII characters, we need to use
`StringUtil::string_to_wstring`
2. Fix index_tool check_terms_stats_v2 and add field info to print

Issue Number: apache#34118
qidaye added a commit to qidaye/incubator-doris that referenced this pull request Jun 17, 2024
…pache#36321)

1. `std::string` to `std::wstring` conversion only supports ASCII
characters. For non-ASCII characters, we need to use
`StringUtil::string_to_wstring`
2. Fix index_tool check_terms_stats_v2 and add field info to print

Issue Number: apache#34118
xiaokang pushed a commit that referenced this pull request Jun 17, 2024
…36321 (#36374)

1. `std::string` to `std::wstring` conversion only supports ASCII
characters. For non-ASCII characters, we need to use
`StringUtil::string_to_wstring`
2. Fix index_tool check_terms_stats_v2 and add field info to print

pick from master #36321
dataroaring pushed a commit that referenced this pull request Jun 21, 2024
…36321)

1. `std::string` to `std::wstring` conversion only supports ASCII
characters. For non-ASCII characters, we need to use
`StringUtil::string_to_wstring`
2. Fix index_tool check_terms_stats_v2 and add field info to print

Issue Number: #34118
@morningman morningman mentioned this pull request Jun 22, 2024