[branch-2.0](timezone) refactor tzdata load to accelerate and unify timezone parsing (#37062) #37270

Merged
merged 1 commit into from
Jul 31, 2024

zclllyybb
Contributor

pick #37062

  1. Revert [chore](be) Add default timezone files #25097. We decided to rely on the OS for tzdata and no longer maintain an independent copy, to keep results consistent.
  2. Refactor timezone loading and remove the rwlock.

before:

```sql
mysql [optest]>select count(convert_tz(d, 'Asia/Shanghai', 'America/Los_Angeles')), count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) from dates;
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
| count(convert_tz(cast(d as DATETIMEV2(6)), 'Asia/Shanghai', 'America/Los_Angeles')) | count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) |
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
|                                                                            16000000 |                                               16000000 |
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
1 row in set (6.88 sec)
```

now:

```sql
mysql [optest]>select count(convert_tz(d, 'Asia/Shanghai', 'America/Los_Angeles')), count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) from dates;
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
| count(convert_tz(cast(d as DATETIMEV2(6)), 'Asia/Shanghai', 'America/Los_Angeles')) | count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) |
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
|                                                                            16000000 |                                               16000000 |
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
1 row in set (2.61 sec)
```
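
For readers wondering how a timezone lookup can work without an rwlock: one common pattern is to load every zone once at startup into a mapping that is never mutated afterwards, so concurrent readers need no lock. The sketch below only illustrates that pattern in Python and is not taken from the diff. `build_timezone_cache`, `find_timezone` and `fake_load` are hypothetical names, and whether the BE code is structured exactly this way is an assumption.

```python
from types import MappingProxyType


def build_timezone_cache(names, load_one):
    """Load every zone exactly once and return a read-only mapping.

    `load_one` is a hypothetical loader (for example, one that parses a
    tzdata file provided by the OS). It is called once per name, at startup.
    """
    return MappingProxyType({name: load_one(name) for name in names})


def find_timezone(cache, name):
    """Lookup without a lock: the mapping is never mutated after startup."""
    return cache.get(name)  # None means "unknown time zone"


# Stand-in loader that records how often it is called.
calls = []


def fake_load(name):
    calls.append(name)
    return {"name": name}


CACHE = build_timezone_cache(["Asia/Shanghai", "America/Los_Angeles"], fake_load)
```

Because the mapping is wrapped in `MappingProxyType`, an accidental write after startup raises `TypeError` instead of silently racing with readers.
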
  3. Timezone offset strings such as 'UTC+8' are no longer supported, as already stated in
    https://doris.apache.org/docs/dev/query/query-variables/time-zone/#usage
  4. Nereids now parses timezone names case-insensitively.
  5. Fix a bug in Nereids timezone parsing: DST should be determined from the input value, but was previously determined from the current time.
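
To make the parsing rules in this list concrete, here is a minimal Python sketch, not the Nereids code. It assumes the accepted offset form is sign, two-digit hour, colon, two-digit minute (as in the `'+00:00'` used in the query above), and it uses a small hard-coded name table in place of the OS tzdata. `parse_timezone`, `_OFFSET_RE` and `_KNOWN_ZONES` are hypothetical names. Range checks on the hour are omitted.

```python
import re

# Assumed offset grammar: sign, two-digit hour, colon, two-digit minute.
_OFFSET_RE = re.compile(r"^([+-])(\d{2}):(\d{2})$")

# Hypothetical stand-in for the zone names that the OS tzdata provides,
# keyed by lowercase name so that lookups are case-insensitive.
_KNOWN_ZONES = {
    name.lower(): name
    for name in ("Asia/Shanghai", "America/Los_Angeles", "UTC")
}


def parse_timezone(text):
    """Return ("offset", seconds) or ("zone", canonical_name).

    Raises ValueError for anything else, including strings like 'UTC+8'.
    """
    match = _OFFSET_RE.match(text)
    if match:
        sign, hours, minutes = match.groups()
        if int(minutes) >= 60:
            raise ValueError(f"invalid offset minutes: {text!r}")
        seconds = int(hours) * 3600 + int(minutes) * 60
        return ("offset", -seconds if sign == "-" else seconds)

    canonical = _KNOWN_ZONES.get(text.lower())
    if canonical is None:
        raise ValueError(f"unsupported time zone: {text!r}")
    return ("zone", canonical)
```

With this sketch, `parse_timezone("asia/shanghai")` resolves to the canonical `Asia/Shanghai`, while `parse_timezone("UTC+8")` raises `ValueError` because it is neither a plain offset nor a known zone name.
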

doc pr: apache/doris-website#810
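
The DST fix described above (check DST by the input rather than by the current time) can be illustrated with a small Python sketch. It is not the Nereids implementation. The transition table is hand-entered for America/Los_Angeles in 2024 purely for illustration, `offset_at` is a hypothetical helper, and the input is simplified to a UTC instant.

```python
from bisect import bisect_right
from datetime import datetime, timedelta, timezone

UTC = timezone.utc

# Hand-entered 2024 transitions for America/Los_Angeles as
# (UTC instant, UTC offset from that instant on). Illustration only:
# a real implementation would read these from tzdata.
_TRANSITIONS = [
    (datetime(2024, 1, 1, tzinfo=UTC), timedelta(hours=-8)),       # PST
    (datetime(2024, 3, 10, 10, tzinfo=UTC), timedelta(hours=-7)),  # PDT begins
    (datetime(2024, 11, 3, 9, tzinfo=UTC), timedelta(hours=-8)),   # PST resumes
]
_STARTS = [start for start, _ in _TRANSITIONS]


def offset_at(instant):
    """UTC offset in effect at `instant` (aware datetime inside the table)."""
    index = bisect_right(_STARTS, instant) - 1
    if index < 0:
        raise ValueError("instant is before the first table entry")
    return _TRANSITIONS[index][1]


value = datetime(2024, 1, 15, 12, tzinfo=UTC)  # the input being parsed (winter)
now = datetime(2024, 7, 31, 12, tzinfo=UTC)    # stand-in for the current time (summer)

offset_from_input = offset_at(value)  # what the fix uses
offset_from_now = offset_at(now)      # what the buggy path effectively used
```

Here the two offsets differ by one hour, which is the kind of error that appears when a winter value is parsed during summer using the offset in effect at the current time.
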

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@zclllyybb
Contributor Author

run buildall

Contributor

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

be/src/util/timezone_utils.cpp
@doris-robot

TPC-H: Total hot run time: 49999 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 365b586953a1e488c3ac6f90770d979f4c7f20b8, data reload: false

------ Round 1 ----------------------------------
q1	17891	4416	4352	4352
q2	2069	160	149	149
q3	10300	1894	1972	1894
q4	10107	1264	1345	1264
q5	8497	3909	3849	3849
q6	233	150	124	124
q7	2068	1616	1601	1601
q8	9360	2739	2721	2721
q9	10700	10303	10284	10284
q10	8668	3530	3524	3524
q11	433	248	251	248
q12	480	315	304	304
q13	18338	3968	4044	3968
q14	358	332	337	332
q15	504	459	472	459
q16	692	577	574	574
q17	1128	953	963	953
q18	7284	6972	7015	6972
q19	1791	1652	1655	1652
q20	531	317	274	274
q21	4426	4095	4046	4046
q22	539	455	459	455
Total cold run time: 116397 ms
Total hot run time: 49999 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4347	4314	4274	4274
q2	326	225	226	225
q3	4192	4208	4142	4142
q4	2749	2758	2775	2758
q5	7241	7104	7123	7104
q6	241	118	121	118
q7	3236	2899	2838	2838
q8	4377	4500	4483	4483
q9	16918	16776	16878	16776
q10	4237	4299	4333	4299
q11	762	702	668	668
q12	1026	866	851	851
q13	6878	3736	3732	3732
q14	457	425	416	416
q15	513	467	461	461
q16	737	680	688	680
q17	3795	3856	3798	3798
q18	8875	8875	8743	8743
q19	1725	1711	1653	1653
q20	2343	2096	2135	2096
q21	8481	8548	8570	8548
q22	1039	947	990	947
Total cold run time: 84495 ms
Total hot run time: 79610 ms

@doris-robot

TPC-DS: Total hot run time: 203493 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 365b586953a1e488c3ac6f90770d979f4c7f20b8, data reload: false

query1	950	425	376	376
query2	6547	2634	2639	2634
query3	6919	208	208	208
query4	21066	17969	18116	17969
query5	19738	6531	6545	6531
query6	300	218	241	218
query7	4161	302	312	302
query8	426	422	435	422
query9	3110	2695	2613	2613
query10	435	314	327	314
query11	11345	10774	10773	10773
query12	124	76	75	75
query13	5606	707	677	677
query14	18155	13245	13922	13245
query15	370	249	262	249
query16	6454	288	265	265
query17	1757	1440	881	881
query18	2301	416	417	416
query19	207	146	153	146
query20	82	78	80	78
query21	192	108	98	98
query22	5256	5007	5019	5007
query23	32812	31982	31944	31944
query24	6947	6529	6496	6496
query25	522	453	440	440
query26	532	163	164	163
query27	1855	301	295	295
query28	6199	2351	2313	2313
query29	2859	2799	2938	2799
query30	245	165	174	165
query31	905	724	785	724
query32	75	57	55	55
query33	402	248	266	248
query34	852	468	489	468
query35	1102	942	909	909
query36	1301	1168	1406	1168
query37	89	60	62	60
query38	3043	2914	2916	2914
query39	1388	1327	1323	1323
query40	200	94	96	94
query41	47	46	44	44
query42	88	84	85	84
query43	698	669	604	604
query44	1129	713	716	713
query45	247	237	243	237
query46	1237	964	964	964
query47	1851	1659	1645	1645
query48	1008	712	707	707
query49	627	389	388	388
query50	875	647	666	647
query51	4736	4621	4673	4621
query52	92	81	83	81
query53	451	329	324	324
query54	2632	2478	2485	2478
query55	97	80	83	80
query56	218	211	213	211
query57	1130	1148	1066	1066
query58	218	219	182	182
query59	4091	4337	4169	4169
query60	214	215	209	209
query61	102	97	99	97
query62	767	437	483	437
query63	489	339	344	339
query64	2586	1525	1440	1440
query65	3595	3534	3560	3534
query66	763	390	379	379
query67	15745	15458	16185	15458
query68	8569	634	651	634
query69	575	355	344	344
query70	1545	1287	1403	1287
query71	405	316	319	316
query72	6586	3487	3529	3487
query73	729	324	322	322
query74	6404	5851	5841	5841
query75	5130	3740	3612	3612
query76	5031	1160	1194	1160
query77	839	264	248	248
query78	12656	12141	11371	11371
query79	10555	627	638	627
query80	1717	394	404	394
query81	503	233	237	233
query82	1651	99	102	99
query83	177	134	134	134
query84	263	70	73	70
query85	1011	322	319	319
query86	345	287	324	287
query87	3186	2982	3062	2982
query88	5226	2308	2298	2298
query89	484	292	290	290
query90	1882	206	213	206
query91	180	156	144	144
query92	58	51	55	51
query93	7274	593	544	544
query94	756	217	207	207
query95	1100	1054	1053	1053
query96	643	334	323	323
query97	6459	6333	6421	6333
query98	186	168	171	168
query99	2990	896	899	896
Total cold run time: 317677 ms
Total hot run time: 203493 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.82% (8098/21411)
Line Coverage: 29.47% (66234/224741)
Region Coverage: 28.96% (34167/117973)
Branch Coverage: 24.85% (17557/70638)
Coverage Report: http://coverage.selectdb-in.cc/coverage/365b586953a1e488c3ac6f90770d979f4c7f20b8_365b586953a1e488c3ac6f90770d979f4c7f20b8/report/index.html

@doris-robot

ClickBench: Total hot run time: 30.68 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 365b586953a1e488c3ac6f90770d979f4c7f20b8, data reload: false

query1	0.02	0.02	0.01
query2	0.07	0.02	0.03
query3	0.25	0.05	0.04
query4	1.80	0.07	0.06
query5	0.53	0.54	0.51
query6	1.24	0.60	0.60
query7	0.02	0.01	0.01
query8	0.03	0.02	0.03
query9	0.53	0.48	0.48
query10	0.54	0.54	0.54
query11	0.12	0.09	0.09
query12	0.12	0.09	0.10
query13	0.62	0.62	0.61
query14	0.80	0.78	0.80
query15	0.79	0.75	0.76
query16	0.35	0.36	0.39
query17	0.97	1.01	1.02
query18	0.25	0.23	0.25
query19	1.95	1.83	1.83
query20	0.01	0.01	0.01
query21	15.46	0.55	0.53
query22	1.84	1.78	1.50
query23	17.12	1.07	0.97
query24	6.12	1.50	1.06
query25	0.39	0.14	0.05
query26	0.65	0.16	0.14
query27	0.05	0.04	0.04
query28	6.50	0.77	0.75
query29	12.63	2.26	2.31
query30	0.63	0.52	0.54
query31	2.86	0.38	0.37
query32	3.38	0.50	0.50
query33	3.06	3.11	3.08
query34	15.25	4.81	4.78
query35	4.86	4.83	4.83
query36	1.05	1.01	1.03
query37	0.06	0.04	0.04
query38	0.04	0.02	0.02
query39	0.02	0.01	0.02
query40	0.16	0.15	0.14
query41	0.07	0.01	0.02
query42	0.02	0.01	0.02
query43	0.02	0.02	0.01
Total cold run time: 103.25 s
Total hot run time: 30.68 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 365b586953a1e488c3ac6f90770d979f4c7f20b8 with default session variables
Stream load json:         19 seconds loaded 2358488459 Bytes, about 118 MB/s
Stream load orc:          58 seconds loaded 1101869774 Bytes, about 18 MB/s
Stream load parquet:      31 seconds loaded 861443392 Bytes, about 26 MB/s
Insert into select:       22.0 seconds inserted 10000000 Rows, about 454K ops/s

@xiaokang xiaokang merged commit 397b2c6 into apache:branch-2.0 Jul 31, 2024
23 of 25 checks passed
@zclllyybb zclllyybb deleted the pick_tz_20 branch November 23, 2024 13:37
mongo360 pushed a commit to mongo360/doris that referenced this pull request Dec 11, 2024
…imezone parsing (apache#37062) (apache#37270)
