[feature-wip](plsql)(step1) Support PL-SQL #30817

Merged: 11 commits, Feb 6, 2024

@xinyiZzz xinyiZzz commented Feb 4, 2024

Proposed changes

1. Motivation

PL-SQL (stored procedures) is a collection of SQL statements that is defined and called much like a function. It supports control statements such as conditionals and loops, supports cursors for processing result sets, and lets business logic be written in SQL.

Hive supports PL-SQL through Hplsql, which is largely compatible with Oracle, Impala, MySQL, Redshift, PostgreSQL, DB2, etc. This PR adds PL-SQL support to Doris based on Hplsql, to achieve compatibility with the stored procedures of database systems such as Oracle and PostgreSQL.

Reference documentation:
Hive: http://mail.hplsql.org
Oracle: https://docs.oracle.com/en/database/oracle/oracle-database/21/lnpls/plsql-language-fundamentals.html#GUID-640DB3AA-15AF-4825-BD6C-1D4EB5AB7715
Mysql: https://dev.mysql.com/doc/refman/8.0/en/create-procedure.html
antlr4: https://github.com/antlr/antlr4/blob/master/doc/options.md
https://github.com/Peefy/CompileDragonBook/blob/master/doc/NOTE_ANTLR.md
http://lab.antlr.org/

Similar pr: #20776

2. Implementation

The following case illustrates how a stored procedure is executed when a client connects to Doris FE over the Mysql protocol.

CREATE OR REPLACE PROCEDURE A(IN name STRING, OUT result int)
BEGIN
          select count(*) from test;
          select count(*) into result from test where k = name;
END

declare result INT default = 0;
call A('xxx', result);
print result;

image

  1. Add procedure: persist the Procedure Name and Source (raw SQL) in Doris FE metadata.
  2. Call procedure: extract the actual parameter Values and the Procedure Name from the Call Stmt. Use the Procedure Name to look up the Source in the metadata, extract the Name and Type of each formal parameter, and match them with the actual parameter Values to form complete variables <Name, Type, Value>.
  3. Execute Doris Statement
    • Use the Doris Logical Plan Builder to parse the Doris Statement syntax in the Source, replace parameter variables, remove the into variable clause, and generate a Plan Tree that conforms to Doris syntax.
    • Use stmtExecutor to execute the SQL and wrap the query result set iterator in a QueryResult.
    • Output the query results to the Mysql Channel, or write them into Cursors, OUT parameters, and variables.
    • Multiple statements are supported, compatible with Stored Programs in the Mysql protocol.
  4. Execute PL-SQL Statement
    • Use the Plsql Logical Plan Builder to parse and execute the PL-SQL Statement syntax in the Source, including Loop, Cursor, IF, Declare, etc.; this largely reuses HplSQL.
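
Steps 1 and 2 above can be sketched as follows. This is a minimal Python illustration, not the Doris FE code (which is Java): `PROCEDURES`, `add_procedure`, `bind_call`, and `Var` are hypothetical names, the metadata store is a plain dict, and the procedure header is read with a regular expression instead of the real grammar. It also assumes every formal parameter is written as `MODE name TYPE`.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical stand-in for Doris FE metadata: procedure name -> raw SQL source.
PROCEDURES: Dict[str, str] = {}

# Matches "PROCEDURE <name>(<formal parameters>)". Assumes no nested parentheses,
# so a type such as DECIMAL(10, 2) is out of scope for this sketch.
HEADER = re.compile(r"PROCEDURE\s+(\w+)\s*\((.*?)\)", re.IGNORECASE | re.DOTALL)


@dataclass
class Var:
    """A bound variable <Name, Type, Value>, plus its IN/OUT mode."""
    name: str
    type: str
    value: Optional[str]
    mode: str


def add_procedure(source: str) -> str:
    """Step 1: persist the procedure name and its raw SQL."""
    match = HEADER.search(source)
    if match is None:
        raise ValueError("not a CREATE PROCEDURE statement")
    name = match.group(1).upper()
    PROCEDURES[name] = source
    return name


def bind_call(name: str, actual_args: List[Optional[str]]) -> List[Var]:
    """Step 2: look up the source and pair formal parameters with actual values."""
    source = PROCEDURES[name.upper()]
    formals = [f.split() for f in HEADER.search(source).group(2).split(",")]
    if len(formals) != len(actual_args):
        raise ValueError("argument count mismatch")
    return [
        Var(param_name, param_type.upper(), actual, mode.upper())
        for (mode, param_name, param_type), actual in zip(formals, actual_args)
    ]
```

For the example procedure, `bind_call("A", ["'xxx'", None])` binds `name` to `'xxx'` as an IN STRING and leaves `result` as an OUT INT with no value yet.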

3. TODO

  1. Support drop procedure.
  2. Create procedure only in PlSqlOperation.
  3. Support declare variable in the Doris Parser.
  4. Support insert into variable in Select Statement.
  5. Handle parameters and fields that have the same name.
  6. Check whether a Cursor that exits halfway causes a memory leak.
  7. LogicalPlanBuilder uses getOriginSql(ctx) during syntax parsing to obtain the original SQL; check whether special characters cause problems.
  8. Support complex types such as Map and Struct.
  9. Test syntax such as Package.
  10. Support UDF.
  11. In Oracle, create procedure must have AS or IS after RIGHT_PAREN,
    but Mysql and Hive do not support AS or IS. Compatibility with Oracle will be discussed and resolved later.
  12. Built-in functions need separate management.
  13. Add Doris statements: begin_transaction_stmt, end_transaction_stmt, commit_stmt, rollback_stmt.
  14. Add plsql statements: cmp_stmt, copy_from_local_stmt, copy_stmt, create_local_temp_table_stmt, merge_stmt.

4. Some questions

  1. JDBC cannot execute stored procedures that return results; the results can only be put Into a variable or written to a table. When multiple result sets are returned, JDBC must use a prepareCall statement; otherwise an error is reported at Send EOF Packet when the Statement that returned results executes Finalize.
  2. A PL-SQL Cursor can keep multiple Query result set iterators open at the same time. Doris BE caches the intermediate state of these Queries (such as HashTables) and their results until iteration over the result set completes. If a Cursor goes unused for a long time, a lot of memory is wasted.
  3. plsql/Var.defineType() looks up the Plsql Var type by Mysql type name string; the mapping between Doris types and Plsql Var types still needs to be implemented.
  4. Currently every PL-SQL Statement is forwarded to the Master FE for creation and computation, which may affect other services on Doris FE and is limited by Doris FE performance. Consider moving execution to Doris BE.
  5. A Doris Statement returns results in the format xxxx\n, xxxx\n, 2 rows affected (0.03 sec), whereas PL-SQL Print outputs variable values unformatted, so JDBC cannot easily obtain the real results.
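
Question 3 amounts to a lookup from a type name string to a variable type. Below is a minimal sketch of such a mapping, written in Python for brevity. `VarType`, `TYPE_NAMES`, and `define_type` are hypothetical names, and the entries are illustrative assumptions, not the actual contents of plsql/Var.defineType().

```python
from enum import Enum


class VarType(Enum):
    """Hypothetical subset of variable types; not the real Plsql Var enum."""
    BOOL = "BOOL"
    BIGINT = "BIGINT"
    DOUBLE = "DOUBLE"
    DECIMAL = "DECIMAL"
    STRING = "STRING"
    DATE = "DATE"


# Assumed mapping from type names to variable types (illustrative only).
TYPE_NAMES = {
    "BOOLEAN": VarType.BOOL,
    "TINYINT": VarType.BIGINT,
    "SMALLINT": VarType.BIGINT,
    "INT": VarType.BIGINT,
    "BIGINT": VarType.BIGINT,
    "FLOAT": VarType.DOUBLE,
    "DOUBLE": VarType.DOUBLE,
    "DECIMAL": VarType.DECIMAL,
    "CHAR": VarType.STRING,
    "VARCHAR": VarType.STRING,
    "STRING": VarType.STRING,
    "DATE": VarType.DATE,
}


def define_type(type_name: str) -> VarType:
    """Resolve a type name such as 'varchar(32)' to a variable type."""
    # Drop any length/precision suffix, e.g. "DECIMAL(10, 2)" -> "DECIMAL".
    base = type_name.strip().upper().split("(", 1)[0].strip()
    if base not in TYPE_NAMES:
        raise ValueError(f"no variable type registered for {type_name!r}")
    return TYPE_NAMES[base]
```

With a table like this, an unmapped type such as `MAP<STRING,INT>` fails loudly instead of being silently treated as a string, which makes gaps like the complex types in the TODO list easy to spot.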

5. Some thoughts

The execution of Doris Statements described above reuses the Doris Logical Plan Builder for syntax parsing: the statement is parsed top-down into a Plan Tree, and stmtExecutor is called to execute it. PL-SQL operations such as replacing variables and removing Into Variable are coupled into Doris syntax parsing. The advantage is that Doris syntax can be supported with few changes; the disadvantage is that this intrudes on the Doris syntax parsing flow.
HplSQL performs its own syntax parsing pass, independent of Hive, to implement variable substitution and other operations, and finally outputs SQL that conforms to Hive syntax. The following is a simple syntax parsing flow; parsing of select, where, expression, table name, join, agg, order, and other SQL syntax must all be re-implemented. The advantage is complete independence from the original system, but the changes are too complicated.
image
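
To make the trade-off concrete, the sketch below performs the two PL-SQL-specific operations (variable substitution and removal of the into variable clause) as a standalone text rewrite, in the spirit of the independent-pass approach. It is a Python illustration under simplifying assumptions, not HplSQL or Doris code: `rewrite_select` is a hypothetical helper, it only handles `SELECT ... INTO <variable>`, and it works on words rather than on a parse tree.

```python
import re
from typing import Dict, Optional, Tuple

INTO_CLAUSE = re.compile(r"\s+INTO\s+(\w+)", re.IGNORECASE)
WORD = re.compile(r"\b\w+\b")


def rewrite_select(sql: str, variables: Dict[str, str]) -> Tuple[str, Optional[str]]:
    """Return (plain SQL without INTO, name of the INTO target or None)."""
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError("this sketch only rewrites SELECT statements")

    # Remember the target variable, then drop the first INTO clause.
    match = INTO_CLAUSE.search(sql)
    into_target = match.group(1) if match else None
    plain = INTO_CLAUSE.sub("", sql, count=1)

    # Replace every word that names a known variable with its value.
    # Naive on purpose: a column with the same name would be replaced too.
    plain = WORD.sub(lambda m: variables.get(m.group(0), m.group(0)), plain)
    return plain, into_target
```

For the example procedure, `rewrite_select("select count(*) into result from test where k = name", {"name": "'xxx'"})` returns `("select count(*) from test where k = 'xxx'", "result")`. The word-level substitution also shows why parameters and fields with the same name are a problem: without grammar context, a column called `name` cannot be told apart from the parameter.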


@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

xinyiZzz commented Feb 4, 2024

run buildall

github-actions bot commented Feb 4, 2024

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 37570 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ea1c4964f8432bc7b64fe0c2cbcb776de1159ba2, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17785	4838	4650	4650
q2	2295	150	139	139
q3	12208	942	927	927
q4	4814	785	752	752
q5	7937	2970	2908	2908
q6	190	122	121	121
q7	1182	771	749	749
q8	9430	2083	2072	2072
q9	7645	6408	6405	6405
q10	8141	2443	2439	2439
q11	416	225	214	214
q12	800	279	281	279
q13	18023	3337	3306	3306
q14	282	245	254	245
q15	526	491	501	491
q16	474	401	427	401
q17	955	568	474	474
q18	6960	5976	5927	5927
q19	1558	1399	1375	1375
q20	659	368	336	336
q21	6851	3200	3052	3052
q22	801	317	308	308
Total cold run time: 109932 ms
Total hot run time: 37570 ms
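
The two totals above can be reproduced from the rows, on the assumption (consistent with the numbers, but not stated by the bot) that the four columns are one cold run, two hot runs, and the faster of the two hot runs. A small Python check using the Round 1 rows:

```python
# (query, cold ms, hot run 1 ms, hot run 2 ms), copied from Round 1 above.
ROUND_1 = [
    ("q1", 17785, 4838, 4650), ("q2", 2295, 150, 139),
    ("q3", 12208, 942, 927), ("q4", 4814, 785, 752),
    ("q5", 7937, 2970, 2908), ("q6", 190, 122, 121),
    ("q7", 1182, 771, 749), ("q8", 9430, 2083, 2072),
    ("q9", 7645, 6408, 6405), ("q10", 8141, 2443, 2439),
    ("q11", 416, 225, 214), ("q12", 800, 279, 281),
    ("q13", 18023, 3337, 3306), ("q14", 282, 245, 254),
    ("q15", 526, 491, 501), ("q16", 474, 401, 427),
    ("q17", 955, 568, 474), ("q18", 6960, 5976, 5927),
    ("q19", 1558, 1399, 1375), ("q20", 659, 368, 336),
    ("q21", 6851, 3200, 3052), ("q22", 801, 317, 308),
]

# Cold total: sum of the first timing column.
total_cold = sum(cold for _, cold, _, _ in ROUND_1)
# Hot total: sum of the faster of the two hot runs for each query.
total_hot = sum(min(hot1, hot2) for _, _, hot1, hot2 in ROUND_1)

print(total_cold)  # 109932
print(total_hot)   # 37570
```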

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4626	4458	4473	4458
q2	320	233	231	231
q3	3015	2901	2893	2893
q4	1823	1643	1704	1643
q5	5276	5265	5303	5265
q6	196	114	116	114
q7	2216	1800	1790	1790
q8	3153	3258	3265	3258
q9	8409	8321	8367	8321
q10	5888	3534	3559	3534
q11	548	460	454	454
q12	758	577	579	577
q13	7750	3102	3112	3102
q14	296	262	257	257
q15	540	503	491	491
q16	519	476	477	476
q17	1835	1649	1702	1649
q18	7962	7989	7575	7575
q19	6337	1517	1561	1517
q20	2163	1914	1925	1914
q21	4952	4538	4494	4494
q22	607	481	441	441
Total cold run time: 69189 ms
Total hot run time: 54454 ms

@doris-robot

TPC-DS: Total hot run time: 181306 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit ea1c4964f8432bc7b64fe0c2cbcb776de1159ba2, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	929	344	335	335
query2	6536	2001	1871	1871
query3	6708	206	208	206
query4	31559	22005	21939	21939
query5	4278	349	402	349
query6	264	169	178	169
query7	4603	283	274	274
query8	241	169	187	169
query9	9074	2331	2328	2328
query10	413	210	201	201
query11	18647	15361	15413	15361
query12	128	76	75	75
query13	1633	411	412	411
query14	9103	6853	7029	6853
query15	241	177	191	177
query16	8093	248	244	244
query17	1855	556	504	504
query18	2104	265	262	262
query19	242	141	141	141
query20	82	72	77	72
query21	193	132	126	126
query22	4901	4600	4422	4422
query23	30878	30136	30065	30065
query24	11767	2764	2751	2751
query25	598	349	349	349
query26	1751	143	144	143
query27	3069	303	304	303
query28	7795	1875	1867	1867
query29	969	612	590	590
query30	287	133	140	133
query31	916	699	682	682
query32	95	54	51	51
query33	728	225	218	218
query34	1128	440	461	440
query35	852	765	746	746
query36	1071	840	886	840
query37	138	58	57	57
query38	3196	3138	3070	3070
query39	1308	1253	1231	1231
query40	271	91	90	90
query41	35	35	35	35
query42	92	94	86	86
query43	532	473	489	473
query44	1078	702	710	702
query45	191	182	172	172
query46	1081	650	660	650
query47	1542	1502	1515	1502
query48	407	343	346	343
query49	1176	279	285	279
query50	779	370	370	370
query51	5209	5200	5157	5157
query52	94	91	80	80
query53	346	273	265	265
query54	268	216	211	211
query55	76	79	73	73
query56	216	191	202	191
query57	981	903	905	903
query58	213	174	173	173
query59	2495	2295	2124	2124
query60	250	211	205	205
query61	81	90	80	80
query62	657	354	357	354
query63	295	275	270	270
query64	6370	3728	3498	3498
query65	3279	3246	3222	3222
query66	1162	326	308	308
query67	14265	14280	14160	14160
query68	4331	539	534	534
query69	455	331	320	320
query70	1227	1178	1169	1169
query71	329	252	252	252
query72	5983	2817	2669	2669
query73	696	317	313	313
query74	6628	6248	6209	6209
query75	3057	2292	2337	2292
query76	2500	950	928	928
query77	335	236	227	227
query78	9170	8755	8599	8599
query79	3409	504	515	504
query80	1761	354	343	343
query81	523	194	198	194
query82	1108	79	83	79
query83	251	127	128	127
query84	292	77	81	77
query85	2225	332	322	322
query86	489	294	315	294
query87	3396	3201	3187	3187
query88	4694	2334	2327	2327
query89	466	366	375	366
query90	2067	165	165	165
query91	150	133	122	122
query92	58	44	45	44
query93	5057	481	489	481
query94	1293	177	173	173
query95	7971	7739	7762	7739
query96	612	275	273	273
query97	4215	4100	4115	4100
query98	211	201	191	191
query99	1075	687	685	685
Total cold run time: 297848 ms
Total hot run time: 181306 ms

@doris-robot

ClickBench: Total hot run time: 30.88 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit ea1c4964f8432bc7b64fe0c2cbcb776de1159ba2, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.04	0.04	0.03
query2	0.06	0.03	0.02
query3	0.22	0.06	0.06
query4	1.68	0.10	0.10
query5	0.53	0.51	0.52
query6	1.20	0.65	0.67
query7	0.01	0.01	0.02
query8	0.04	0.02	0.03
query9	0.53	0.50	0.50
query10	0.56	0.54	0.58
query11	0.12	0.09	0.08
query12	0.10	0.08	0.09
query13	0.60	0.60	0.60
query14	0.78	0.80	0.82
query15	0.79	0.79	0.76
query16	0.38	0.39	0.38
query17	0.98	1.04	1.00
query18	0.21	0.27	0.27
query19	1.82	1.74	1.81
query20	0.01	0.01	0.01
query21	15.41	0.58	0.57
query22	2.52	2.36	2.08
query23	17.46	0.74	0.76
query24	2.34	0.71	1.10
query25	0.31	0.33	0.08
query26	0.40	0.14	0.14
query27	0.05	0.05	0.05
query28	12.42	0.83	0.84
query29	12.47	3.10	3.11
query30	0.61	0.54	0.52
query31	2.79	0.36	0.35
query32	3.36	0.47	0.48
query33	3.22	3.25	3.24
query34	15.73	4.25	4.29
query35	4.35	4.37	4.30
query36	1.10	1.05	1.07
query37	0.06	0.04	0.05
query38	0.03	0.03	0.03
query39	0.02	0.01	0.02
query40	0.16	0.13	0.12
query41	0.07	0.02	0.01
query42	0.02	0.01	0.01
query43	0.02	0.02	0.02
Total cold run time: 105.58 s
Total hot run time: 30.88 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit ea1c4964f8432bc7b64fe0c2cbcb776de1159ba2 with default session variables
Stream load json:         18 seconds loaded 2358488459 Bytes, about 124 MB/s
Stream load orc:          59 seconds loaded 1101869774 Bytes, about 17 MB/s
Stream load parquet:      31 seconds loaded 861443392 Bytes, about 26 MB/s
Insert into select:       14.5 seconds inserted 10000000 Rows, about 689K ops/s

xinyiZzz force-pushed the 20240102_hplsql branch 2 times, most recently from 41a05ff to d0f982c on February 4, 2024 12:16

github-actions bot commented Feb 4, 2024

clang-tidy review says "All clean, LGTM! 👍"

xinyiZzz force-pushed the 20240102_hplsql branch 2 times, most recently from af77b5a to 0fcf45b on February 5, 2024 03:34

xinyiZzz commented Feb 5, 2024

run buildall

github-actions bot commented Feb 5, 2024

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 36968 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit c9e3f3649a92cc7f64eae6db4903854d3989c08a, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17714	4715	4414	4414
q2	2051	148	143	143
q3	10584	976	914	914
q4	4649	708	733	708
q5	7711	2838	2771	2771
q6	181	122	120	120
q7	1160	743	727	727
q8	9381	1995	2022	1995
q9	7297	6394	6377	6377
q10	8120	2421	2388	2388
q11	425	207	198	198
q12	778	288	276	276
q13	18019	3289	3296	3289
q14	272	250	242	242
q15	522	486	489	486
q16	481	393	416	393
q17	947	571	526	526
q18	6852	5949	5959	5949
q19	1570	1369	1364	1364
q20	585	343	335	335
q21	6726	3047	3144	3047
q22	812	315	306	306
Total cold run time: 106837 ms
Total hot run time: 36968 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4530	4331	4399	4331
q2	330	235	253	235
q3	2996	2834	2849	2834
q4	1926	1675	1671	1671
q5	5175	5220	5189	5189
q6	190	115	114	114
q7	2132	1735	1735	1735
q8	3113	3251	3300	3251
q9	8357	8270	8239	8239
q10	5765	3590	3575	3575
q11	546	452	473	452
q12	739	560	556	556
q13	13621	3093	3094	3093
q14	290	265	262	262
q15	531	482	482	482
q16	499	471	476	471
q17	1830	1705	1716	1705
q18	8073	7680	7571	7571
q19	10254	1550	1527	1527
q20	2144	1924	1904	1904
q21	5072	4605	4625	4605
q22	592	489	491	489
Total cold run time: 78705 ms
Total hot run time: 54291 ms

@doris-robot

TPC-DS: Total hot run time: 181186 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c9e3f3649a92cc7f64eae6db4903854d3989c08a, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	921	342	334	334
query2	6555	2017	1956	1956
query3	6698	213	212	212
query4	31775	22183	21999	21999
query5	4273	428	354	354
query6	260	166	164	164
query7	4604	285	287	285
query8	235	171	180	171
query9	9083	2278	2266	2266
query10	407	230	210	210
query11	18397	15320	15404	15320
query12	126	79	74	74
query13	1630	412	416	412
query14	9457	7233	6796	6796
query15	239	178	183	178
query16	8185	251	245	245
query17	2037	526	502	502
query18	2098	270	270	270
query19	364	139	134	134
query20	82	78	82	78
query21	192	132	135	132
query22	4887	4572	4499	4499
query23	31014	30048	30018	30018
query24	10308	2720	2756	2720
query25	533	348	329	329
query26	723	146	151	146
query27	2171	297	305	297
query28	5731	1871	1860	1860
query29	857	601	596	596
query30	280	133	141	133
query31	915	697	736	697
query32	89	54	54	54
query33	630	224	226	224
query34	819	456	477	456
query35	868	764	773	764
query36	1038	903	945	903
query37	94	59	56	56
query38	3218	3106	3105	3105
query39	1297	1275	1238	1238
query40	179	95	92	92
query41	41	36	32	32
query42	103	91	93	91
query43	499	488	478	478
query44	1061	694	707	694
query45	196	175	171	171
query46	1050	655	653	653
query47	1605	1431	1554	1431
query48	422	380	352	352
query49	1042	286	285	285
query50	773	384	389	384
query51	5255	5179	5179	5179
query52	93	90	80	80
query53	330	270	273	270
query54	283	221	223	221
query55	90	74	79	74
query56	220	198	209	198
query57	983	894	908	894
query58	199	182	188	182
query59	2478	2346	2357	2346
query60	241	216	214	214
query61	86	83	88	83
query62	625	357	323	323
query63	304	269	267	267
query64	4637	3148	3394	3148
query65	3266	3209	3203	3203
query66	830	323	313	313
query67	14389	14272	14140	14140
query68	4390	528	537	528
query69	473	319	323	319
query70	1276	1248	1249	1248
query71	310	245	271	245
query72	5960	2832	2672	2672
query73	690	330	326	326
query74	6605	6164	6132	6132
query75	3005	2382	2367	2367
query76	2585	896	961	896
query77	365	231	225	225
query78	9155	8691	8508	8508
query79	2706	518	489	489
query80	2079	347	331	331
query81	539	193	195	193
query82	858	83	79	79
query83	251	130	126	126
query84	294	78	80	78
query85	2162	327	319	319
query86	484	303	294	294
query87	3405	3211	3287	3211
query88	3855	2338	2304	2304
query89	456	348	372	348
query90	1975	164	165	164
query91	154	125	125	125
query92	52	41	43	41
query93	2774	461	450	450
query94	1345	172	173	172
query95	478	7942	7729	7729
query96	590	277	271	271
query97	4244	4109	4149	4109
query98	217	192	192	192
query99	1267	717	710	710
Total cold run time: 279369 ms
Total hot run time: 181186 ms

@doris-robot

ClickBench: Total hot run time: 30.84 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit c9e3f3649a92cc7f64eae6db4903854d3989c08a, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.03	0.03	0.03
query2	0.06	0.02	0.02
query3	0.23	0.06	0.06
query4	1.68	0.10	0.10
query5	0.53	0.51	0.51
query6	1.18	0.64	0.61
query7	0.01	0.01	0.01
query8	0.04	0.02	0.02
query9	0.57	0.51	0.50
query10	0.55	0.55	0.58
query11	0.11	0.09	0.09
query12	0.11	0.09	0.10
query13	0.60	0.61	0.60
query14	0.80	0.78	0.80
query15	0.80	0.77	0.77
query16	0.37	0.37	0.40
query17	1.01	0.99	1.03
query18	0.23	0.24	0.26
query19	1.82	1.73	1.80
query20	0.02	0.01	0.01
query21	15.40	0.57	0.55
query22	2.22	2.46	1.73
query23	17.52	0.94	0.77
query24	2.48	1.32	1.01
query25	0.37	0.12	0.13
query26	0.68	0.13	0.13
query27	0.04	0.04	0.06
query28	12.11	0.86	0.82
query29	12.52	3.18	3.25
query30	0.68	0.55	0.50
query31	2.79	0.35	0.35
query32	3.37	0.49	0.48
query33	3.20	3.23	3.25
query34	15.88	4.28	4.27
query35	4.28	4.25	4.27
query36	1.08	1.06	1.07
query37	0.06	0.05	0.06
query38	0.04	0.03	0.03
query39	0.03	0.02	0.01
query40	0.16	0.14	0.13
query41	0.07	0.02	0.02
query42	0.02	0.01	0.02
query43	0.02	0.02	0.02
Total cold run time: 105.77 s
Total hot run time: 30.84 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit c9e3f3649a92cc7f64eae6db4903854d3989c08a with default session variables
Stream load json:         18 seconds loaded 2358488459 Bytes, about 124 MB/s
Stream load orc:          59 seconds loaded 1101869774 Bytes, about 17 MB/s
Stream load parquet:      31 seconds loaded 861443392 Bytes, about 26 MB/s
Insert into select:       13.7 seconds inserted 10000000 Rows, about 729K ops/s

github-actions bot commented Feb 6, 2024

clang-tidy review says "All clean, LGTM! 👍"

xinyiZzz commented Feb 6, 2024

run buildall

Are we going to support this feature only with the Nereids planner? Are there no plans to support the legacy planner?

Contributor Author

Yes, only the Nereids planner is supported.

OK, will the legacy planner be supported later?

Contributor Author

No plans at the moment. Do you need to use the legacy planner?

I was wondering how the grammars could be merged for the legacy planner, since its SQL statements are defined in a cup file.

Contributor Author

Actually, it's easy: the legacy planner only needs to forward the original SQL of the create procedure stmt and call stmt to PlSqlOperation for execution.

However, many new features are only supported in the Nereids planner, so supporting the legacy planner seems unnecessary.

@doris-robot

TPC-H: Total hot run time: 37117 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 28b4e0460f3b55b8fb30f7449698aacff0ad98f7, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17701	4709	4458	4458
q2	2041	145	135	135
q3	10654	967	917	917
q4	4642	781	724	724
q5	7706	2858	2875	2858
q6	185	125	122	122
q7	1131	728	747	728
q8	9306	2009	2027	2009
q9	7303	6342	6378	6342
q10	8134	2453	2484	2453
q11	400	208	205	205
q12	774	287	282	282
q13	17998	3335	3295	3295
q14	273	242	248	242
q15	530	495	484	484
q16	480	420	423	420
q17	938	531	551	531
q18	6842	6067	5880	5880
q19	1578	1359	1344	1344
q20	595	335	315	315
q21	6956	3126	3067	3067
q22	789	313	306	306
Total cold run time: 106956 ms
Total hot run time: 37117 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4473	4489	4458	4458
q2	344	247	236	236
q3	2992	2853	2870	2853
q4	1862	1676	1582	1582
q5	5240	5278	5270	5270
q6	192	116	117	116
q7	2151	1806	1762	1762
q8	3109	3251	3259	3251
q9	8374	8266	8257	8257
q10	5790	3612	3574	3574
q11	553	475	459	459
q12	746	575	559	559
q13	12718	3087	3074	3074
q14	272	251	258	251
q15	548	489	491	489
q16	515	487	471	471
q17	1862	1731	1705	1705
q18	8048	7643	7510	7510
q19	10420	1540	1550	1540
q20	2136	1910	1889	1889
q21	4916	4675	4486	4486
q22	534	447	462	447
Total cold run time: 77795 ms
Total hot run time: 54239 ms

@doris-robot

TPC-DS: Total hot run time: 181505 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 28b4e0460f3b55b8fb30f7449698aacff0ad98f7, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	920	335	330	330
query2	6567	2083	1899	1899
query3	6701	213	214	213
query4	31754	21988	21952	21952
query5	4312	364	354	354
query6	260	183	159	159
query7	4603	284	277	277
query8	245	184	180	180
query9	9141	2279	2261	2261
query10	407	216	224	216
query11	18398	15439	15394	15394
query12	126	76	75	75
query13	1622	416	421	416
query14	9078	6977	7028	6977
query15	270	179	179	179
query16	8140	262	243	243
query17	2026	539	494	494
query18	2093	262	264	262
query19	365	145	141	141
query20	84	74	79	74
query21	196	122	126	122
query22	5150	4780	4743	4743
query23	30950	30084	30014	30014
query24	10606	2736	2796	2736
query25	542	337	347	337
query26	705	143	144	143
query27	2202	303	309	303
query28	5885	1837	1831	1831
query29	875	618	605	605
query30	280	133	150	133
query31	921	687	736	687
query32	87	53	53	53
query33	614	221	216	216
query34	830	446	457	446
query35	850	752	759	752
query36	1056	899	960	899
query37	92	54	59	54
query38	3235	3073	3114	3073
query39	1302	1258	1247	1247
query40	183	94	93	93
query41	36	36	36	36
query42	111	92	91	91
query43	512	509	494	494
query44	1056	695	704	695
query45	189	181	172	172
query46	1058	642	658	642
query47	1611	1427	1511	1427
query48	426	350	360	350
query49	1038	286	280	280
query50	759	371	371	371
query51	5315	5116	5147	5116
query52	98	86	89	86
query53	343	267	269	267
query54	294	210	220	210
query55	84	73	76	73
query56	217	198	197	197
query57	964	913	902	902
query58	195	177	178	177
query59	2490	2333	2243	2243
query60	244	211	214	211
query61	82	82	80	80
query62	660	376	344	344
query63	285	272	274	272
query64	4902	3700	3442	3442
query65	3244	3218	3188	3188
query66	820	315	309	309
query67	14353	14431	14273	14273
query68	4414	545	535	535
query69	449	330	338	330
query70	1317	1193	1163	1163
query71	318	238	251	238
query72	6036	2835	2651	2651
query73	711	324	323	323
query74	6566	6197	6154	6154
query75	2995	2341	2355	2341
query76	2588	970	900	900
query77	342	227	227	227
query78	9251	8887	8449	8449
query79	2501	489	500	489
query80	2052	351	335	335
query81	547	194	197	194
query82	851	82	84	82
query83	250	122	124	122
query84	279	79	80	79
query85	2175	326	328	326
query86	484	315	286	286
query87	3406	3156	3247	3156
query88	3803	2306	2340	2306
query89	419	363	345	345
query90	1862	163	164	163
query91	155	132	119	119
query92	51	44	46	44
query93	3542	510	487	487
query94	1284	173	179	173
query95	467	7802	7724	7724
query96	613	269	278	269
query97	4217	4118	4106	4106
query98	212	195	198	195
query99	1171	709	663	663
Total cold run time: 280387 ms
Total hot run time: 181505 ms

@doris-robot

ClickBench: Total hot run time: 31.52 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 28b4e0460f3b55b8fb30f7449698aacff0ad98f7, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.03	0.03	0.03
query2	0.06	0.02	0.02
query3	0.23	0.07	0.06
query4	1.67	0.10	0.10
query5	0.53	0.51	0.52
query6	1.18	0.64	0.65
query7	0.02	0.02	0.01
query8	0.03	0.02	0.03
query9	0.54	0.50	0.50
query10	0.56	0.54	0.55
query11	0.12	0.09	0.08
query12	0.11	0.09	0.09
query13	0.60	0.61	0.61
query14	0.78	0.80	0.78
query15	0.80	0.77	0.77
query16	0.38	0.39	0.40
query17	1.02	0.98	0.98
query18	0.25	0.27	0.22
query19	1.89	1.82	1.77
query20	0.02	0.01	0.01
query21	15.40	0.58	0.57
query22	2.48	2.26	2.17
query23	17.28	0.85	0.80
query24	2.46	1.21	1.76
query25	0.32	0.19	0.21
query26	0.64	0.13	0.14
query27	0.05	0.05	0.04
query28	10.92	0.85	0.86
query29	12.48	3.15	3.09
query30	0.58	0.54	0.54
query31	2.78	0.34	0.35
query32	3.35	0.47	0.48
query33	3.22	3.25	3.23
query34	15.75	4.23	4.26
query35	4.31	4.31	4.24
query36	1.10	1.04	1.04
query37	0.06	0.04	0.04
query38	0.04	0.03	0.02
query39	0.02	0.01	0.02
query40	0.16	0.14	0.14
query41	0.06	0.01	0.02
query42	0.02	0.02	0.01
query43	0.03	0.02	0.02
Total cold run time: 104.33 s
Total hot run time: 31.52 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 28b4e0460f3b55b8fb30f7449698aacff0ad98f7 with default session variables
Stream load json:         18 seconds loaded 2358488459 Bytes, about 124 MB/s
Stream load orc:          59 seconds loaded 1101869774 Bytes, about 17 MB/s
Stream load parquet:      33 seconds loaded 861443392 Bytes, about 24 MB/s
Insert into select:       13.1 seconds inserted 10000000 Rows, about 763K ops/s

github-actions bot added the approved label (indicates a PR has been approved by one committer) on Feb 6, 2024

github-actions bot commented Feb 6, 2024

PR approved by at least one committer and no changes requested.

github-actions bot commented Feb 6, 2024

PR approved by anyone and no changes requested.

@xinyiZzz xinyiZzz merged commit 2a2cced into apache:master Feb 6, 2024
33 of 42 checks passed
yiguolei pushed a commit that referenced this pull request Feb 6, 2024
# 1. Motivation
PL-SQL (stored procedures) is a collection of SQL statements that is defined and called much like a function. It supports control statements such as conditionals and loops, supports cursors for processing result sets, and lets business logic be written in SQL.

Hive supports PL-SQL through Hplsql, which is largely compatible with Oracle, Impala, MySQL, Redshift, PostgreSQL, DB2, etc. This PR adds PL-SQL support to Doris based on Hplsql, to achieve compatibility with the stored procedures of database systems such as Oracle and PostgreSQL.

Reference documentation:
Hive: http://mail.hplsql.org
Oracle: https://docs.oracle.com/en/database/oracle/oracle-database/21/lnpls/plsql-language-fundamentals.html#GUID-640DB3AA-15AF-4825-BD6C-1D4EB5AB7715
Mysql: https://dev.mysql.com/doc/refman/8.0/en/create-procedure.html

# 2. Implementation
The following example shows how a client connected to Doris FE over the Mysql protocol executes a stored procedure.
```
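-- Define the procedure: one IN parameter and one OUT parameter.
CREATE OR REPLACE PROCEDURE A(IN name STRING, OUT result INT)
BEGIN
          select count(*) from test;
          select count(*) into result from test where k = name;
END

-- Declare the output variable, call the procedure, and print the result.
declare result INT default 0;
call A('xxx', result);
print result;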
```
![image](https://github.com/apache/doris/assets/13197424/0b78e039-0350-4ef1-bef3-0ebbf90274cd)

1. Add procedure: persist the Procedure Name and Source (raw SQL) into Doris FE metadata.
2. Call procedure: extract the Procedure Name and the actual parameter Values from the Call Stmt. Use the Procedure Name to find the Source in the metadata, extract the Name and Type of each Procedure parameter, and match them with the actual parameter Values to form complete variables <Name, Type, Value>.
3. Execute Doris Statement
     - Use Doris Logical Plan Builder to parse the Doris Statement syntax in Source, replace parameter variables, remove the into variable clause, and generate a Plan Tree that conforms to Doris syntax.
     - Use stmtExecutor to execute SQL and encapsulate the query result set iterator into QueryResult.
     - Output the query results to Mysql Channel, or write them into Cursor, parameters, and variables.
     - Stored Programs, compatible with the Mysql protocol, support multiple statements.
4. Execute PL-SQL Statement
     - Use Plsql Logical Plan Builder to parse and execute the PL-SQL Statement syntax in Source, including Loop, Cursor, IF, Declare, etc., largely reusing HplSQL.
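
Steps 2 and 3 above can be illustrated with a minimal Python sketch. It is not the Doris implementation (which works on the parse tree in Java): the names `Var`, `parse_params`, `bind`, `to_literal` and `rewrite` are hypothetical, and the regular expressions are a toy stand-in for the Logical Plan Builder. It only shows how formal parameters are paired with call values into <Name, Type, Value>, and how the into clause is removed and variables are substituted before the statement is handed to the executor.

```python
import re
from dataclasses import dataclass


@dataclass
class Var:
    """One bound procedure variable: <Name, Type, Value>."""
    name: str
    type: str
    value: object


def parse_params(source):
    """Extract (mode, name, type) for each formal parameter in the header.

    Toy parser: assumes 'MODE name TYPE' triples and no commas inside types.
    """
    header = re.search(r"PROCEDURE\s+\w+\s*\((.*?)\)", source,
                       re.IGNORECASE | re.DOTALL)
    params = []
    for part in header.group(1).split(","):
        mode, name, type_name = part.split()
        params.append((mode.upper(), name, type_name.upper()))
    return params


def bind(source, actual_values):
    """Step 2: match formal parameters with the actual values of the call."""
    return [Var(name, type_name, value)
            for (_, name, type_name), value
            in zip(parse_params(source), actual_values)]


def to_literal(value):
    """Render a Python value as a SQL literal (strings are single-quoted)."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def rewrite(statement, variables):
    """Step 3: drop the 'INTO <variable>' clause and substitute variables.

    Returns (plain_sql, into_target); into_target is None without INTO.
    Toy version: it only targets SELECT ... INTO <variable>.
    """
    into = re.search(r"\s+INTO\s+(\w+)", statement, re.IGNORECASE)
    target = into.group(1) if into else None
    sql = statement
    if into:
        sql = statement[:into.start()] + statement[into.end():]
    for var in variables:
        pattern = rf"\b{re.escape(var.name)}\b"
        # A function replacement avoids backslash escapes in the literal.
        sql = re.sub(pattern, lambda _m, v=var: to_literal(v.value), sql)
    return sql, target


source = "CREATE OR REPLACE PROCEDURE A(IN name STRING, OUT result int) ..."
variables = bind(source, ["xxx", 0])
print(rewrite("select count(*) into result from test where k = name",
              variables))
```

For the example procedure, the second statement becomes the plain query `select count(*) from test where k = 'xxx'`, with `result` remembered as the variable that receives the query result.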

# 3. TODO
1. Support drop procedure.
2. Create procedure only in `PlSqlOperation`.
3. Doris Parser supports declare variable.
4. Select Statement supports insert into variable.
5. Handle parameters and fields that have the same name.
6. If Cursor exits halfway, will there be a memory leak?
7. Syntax parsing in LogicalPlanBuilder uses getOriginSql(ctx) to obtain the original SQL. Is there any problem with special characters?
8. Support complex types such as Map and Struct.
9. Test syntax such as Package.
10. Support UDF.
11. In Oracle, create procedure must have AS or IS after RIGHT_PAREN, but Mysql and Hive do not support AS or IS. Compatibility issues with Oracle will be discussed and resolved later.
12. Built-in functions require separate management.
13. Doris statement add stmt: begin_transaction_stmt, end_transaction_stmt, commit_stmt, rollback_stmt.
14. Add plsql stmt: cmp_stmt, copy_from_local_stmt, copy_stmt, create_local_temp_table_stmt, merge_stmt.

# 4. Some questions
1. JDBC does not support executing stored procedures that return results directly. The execution results can only be stored Into a variable or written into a table, because when multiple result sets are returned, JDBC needs to execute the call with prepareCall; otherwise the Statement that returned the result executes Finalize, and sending the EOF Packet reports an error.
2. A PL-SQL Cursor can open multiple Query result set iterators at the same time. Doris BE caches the intermediate state of these Queries (such as HashTable) and the query results until iteration of the Query result set completes. If a Cursor stays unused for a long time, a lot of memory is wasted.
3. plsql/Var.defineType() finds the Plsql Var type from the Mysql type name string; the mapping between Doris types and Plsql Var types still needs to be implemented.
4. Currently, a PL-SQL Statement is forwarded to Master FE for creation and calculation, which may affect other services on Doris FE and is limited by the performance of Doris FE. Consider moving it to Doris BE for execution.
5. The result returned by a Doris Statement has the format ```xxxx\n, xxxx\n, 2 rows affected (0.03 sec)```. PL-SQL uses Print to print variable values in an unformatted form, so JDBC cannot easily obtain the real results.
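
For question 3, the kind of lookup involved can be sketched as follows. This is only an illustration in Python: the alias groups, the kind names and the `define_type` helper are assumptions made for the example, not the actual table used by plsql/Var.defineType() or by Doris. It resolves a type name string to a variable kind and falls back to a null kind for anything unmapped, which is where complex types such as Map or Struct would currently end up.

```python
# Hypothetical alias groups: variable kind -> type name strings that map to it.
TYPE_ALIASES = {
    "BIGINT": {"TINYINT", "SMALLINT", "INT", "INTEGER", "BIGINT"},
    "DOUBLE": {"FLOAT", "DOUBLE"},
    "DECIMAL": {"DECIMAL", "NUMERIC"},
    "STRING": {"CHAR", "VARCHAR", "STRING", "TEXT"},
    "DATE": {"DATE"},
    "TIMESTAMP": {"DATETIME", "TIMESTAMP"},
    "BOOL": {"BOOLEAN", "BOOL"},
}

# Invert the groups once so each lookup is a single dictionary access.
LOOKUP = {alias: kind
          for kind, aliases in TYPE_ALIASES.items()
          for alias in aliases}


def define_type(type_name):
    """Resolve a type name string to a variable kind, ignoring case and
    any length or precision suffix such as VARCHAR(32) or DECIMAL(10, 2)."""
    if type_name is None:
        return "NULL"
    base = type_name.strip().upper().split("(", 1)[0].strip()
    return LOOKUP.get(base, "NULL")


for name in ["int", "varchar(32)", "decimal(10, 2)", "map<string,int>"]:
    print(name, "->", define_type(name))
```

Keeping the mapping in one table makes it easy to see which Doris types are still unmapped when the table is extended.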

# 5. Some thoughts
Executing a Doris Statement as described above reuses Doris Logical Plan Builder for syntax parsing: the statement is parsed top-down into a Plan Tree, and stmtExecutor is called to execute it. PL-SQL operations such as variable replacement and removal of the Into Variable clause are coupled into Doris syntax parsing. The advantage is that compatibility with Doris grammar is achieved with only a few changes; the disadvantage is that this intrudes on the Doris grammar parsing process.
HplSQL performs its own syntax parsing, independently of Hive, to implement variable substitution and other operations, and finally outputs SQL that conforms to Hive syntax. The following is a simple syntax parsing process; the parsing of select, where, expression, table name, join, agg, order and other grammars must all be re-implemented. The advantage is that it is completely independent of the original system, but the changes are too complicated.
![image](https://github.com/apache/doris/assets/13197424/7539e485-0161-44de-9100-1a01ebe6cc07)
yiguolei pushed a commit that referenced this pull request Feb 16, 2024
mymeiyi pushed a commit to mymeiyi/doris that referenced this pull request Feb 19, 2024
@Zhao-1111

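Sorry to bother you. I saw the TODO item about SQL parameters and fields having the same name in plsql. Which one is planned to take priority? In Oracle the field takes priority; will this be made consistent with that?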

@xinyiZzz
Contributor Author

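> Sorry to bother you. I saw the TODO item about SQL parameters and fields having the same name in plsql. Which one is planned to take priority? In Oracle the field takes priority; will this be made consistent with that?

@Zhao-1111 Currently the SQL parameter takes priority. The goal is to be compatible with Oracle's behavior, and we will try to fix this in the near future.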

Labels
approved Indicates a PR has been approved by one committer. meta-change reviewed
6 participants