[feature-wip](plsql)(step1) Support PL-SQL #30817
run buildall
clang-tidy review says "All clean, LGTM! 👍"
TPC-H: Total hot run time: 37570 ms
TPC-DS: Total hot run time: 181306 ms
ClickBench: Total hot run time: 30.88 s
Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
TPC-H: Total hot run time: 36968 ms
TPC-DS: Total hot run time: 181186 ms
ClickBench: Total hot run time: 30.84 s
Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Are we going to support this feature only with the Nereids planner? Are there no plans to support it with the legacy planner?
Yes, it only supports the Nereids planner.
OK, will the legacy planner be supported later?
No plans at the moment. Do you need to use the legacy planner?
I was wondering how the grammars can be merged for the legacy planner, since its SQL statements are defined in a CUP file.
Actually, it's easy. The legacy planner only needs to forward the original SQL of the `create procedure` stmt and the `call` stmt to `PlSqlOperation` for execution.
However, many new features are only supported in the Nereids planner, so it seems unnecessary to support the legacy planner.
TPC-H: Total hot run time: 37117 ms
TPC-DS: Total hot run time: 181505 ms
ClickBench: Total hot run time: 31.52 s
Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
# 1. Motivation

PL-SQL (Stored procedure) is a collection of SQL statements that is defined and used much like a function. It supports conditional judgments, loops and other control statements, supports cursor processing of result sets, and lets business logic be written in SQL.

Hive uses Hplsql to support PL-SQL and is largely compatible with Oracle, Impala, MySQL, Redshift, PostgreSQL, DB2, etc. We support PL-SQL in Doris based on Hplsql to achieve compatibility with the stored procedures of database systems such as Oracle and PostgreSQL.

Reference documentation:

Hive: http://mail.hplsql.org
Oracle: https://docs.oracle.com/en/database/oracle/oracle-database/21/lnpls/plsql-language-fundamentals.html#GUID-640DB3AA-15AF-4825-BD6C-1D4EB5AB7715
Mysql: https://dev.mysql.com/doc/refman/8.0/en/create-procedure.html

# 2. Implementation

Take the following case as an example to explain how a client connected to Doris FE over the Mysql protocol executes a stored procedure.

```
CREATE OR REPLACE PROCEDURE A(IN name STRING, OUT result int)
BEGIN
    select count(*) from test;
    select count(*) into result from test where k = name;
END

declare result INT default = 0;
call A('xxx', result);
print result;
```

![image](https://github.com/apache/doris/assets/13197424/0b78e039-0350-4ef1-bef3-0ebbf90274cd)

1. Add procedure and persist the Procedure Name and Source (raw SQL) into Doris FE metadata.
2. Call procedure, extract the actual parameter Value and Procedure Name in Call Stmt. Use Procedure Name to find the Source in the metadata, extract the Name and Type of the Procedure parameter, and match them with the actual parameter Value to form a complete variable <Name, Type, Value>.
3. Execute Doris Statement
   - Use Doris Logical Plan Builder to parse the Doris Statement syntax in Source, replace parameter variables, remove the into variable clause, and generate a Plan Tree that conforms to Doris syntax.
   - Use stmtExecutor to execute SQL and encapsulate the query result set iterator into QueryResult.
   - Output the query results to Mysql Channel, or write them into Cursor, parameters, and variables.
   - Stored Programs compatible with Mysql protocol support multiple statements.
4. Execute PL-SQL Statement
   - Use Plsql Logical Plan Builder to parse and execute PL-SQL Statement syntax in Source, including Loop, Cursor, IF, Declare, etc., largely reusing HplSQL.

# 3. TODO

1. Support drop procedure.
2. Create procedure only in `PlSqlOperation`.
3. Doris Parser supports declare variable.
4. Select Statement supports insert into variable.
5. Parameters and fields have the same name.
6. If Cursor exits halfway, will there be a memory leak?
7. Use getOriginSql(ctx) in syntax parsing LogicalPlanBuilder to obtain the original SQL. Is there any problem with special characters?
8. Support complex types such as Map and Struct.
9. Test syntax such as Package.
10. Support UDF.
11. In Oracle, create procedure must have AS or IS after RIGHT_PAREN, but Mysql and Hive do not support AS or IS. Compatibility issues with Oracle will be discussed and resolved later.
12. Built-in functions require separate management.
13. Doris statement add stmt: begin_transaction_stmt, end_transaction_stmt, commit_stmt, rollback_stmt.
14. Add plsql stmt: cmp_stmt, copy_from_local_stmt, copy_stmt, create_local_temp_table_stmt, merge_stmt.

# 4. Some questions

1. JDBC does not support executing stored procedures that return results. The results can only be selected Into a variable or written into a table, because when multiple result sets are returned JDBC must execute the call with prepareCall; otherwise, when the Statement that returned the result is finalized, sending the EOF Packet reports an error.
2. A PL-SQL Cursor can open multiple Query result set iterators at the same time. Doris BE caches the intermediate state of these Queries (such as HashTable) and the query results until iteration of the Query result set completes. If a Cursor goes unused for a long time, a lot of memory is wasted.
3. In plsql/Var.defineType(), the Plsql Var type is looked up by the Mysql type name string; the mapping between Doris types and Plsql Var still needs to be implemented.
4. Currently, a PL-SQL Statement is forwarded to Master FE for creation and calculation, which may affect other services on Doris FE and is limited by the performance of Doris FE. Consider moving it to Doris BE for execution.
5. The format of the result returned by Doris Statement is ```xxxx\n, xxxx\n, 2 rows affected (0.03 sec)```. PL-SQL uses Print to print variable values without formatting, so JDBC cannot easily obtain the real results.

# 5. Some thoughts

The execution of Doris Statement described above reuses Doris Logical Plan Builder for syntax parsing, parses the statement from top to bottom into a Plan Tree, and calls stmtExecutor for execution. PL-SQL operations such as replacing variables and removing Into Variable are coupled into Doris syntax parsing. The advantage is that compatibility with Doris grammar needs only a few changes; the disadvantage is that it intrudes on the Doris grammar parsing process.

HplSQL performs its own syntax parsing, independent of Hive, to implement variable substitution and other operations, and finally outputs SQL that conforms to Hive syntax. The following is a simple syntax parsing flow; the parsing of select, where, expression, table name, join, agg, order and other grammar in the SQL must all be re-implemented. The advantage is that it is completely independent of the original system, but the changes are too complicated.

![image](https://github.com/apache/doris/assets/13197424/7539e485-0161-44de-9100-1a01ebe6cc07)
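Step 2 of the implementation (matching the declared parameters against the actual call values to form `<Name, Type, Value>`) can be illustrated with a minimal Java sketch. `ProcedureArgBinder` and `BoundVar` are hypothetical names invented for this illustration and are not classes from the PR; the string-based parsing is an assumption made for brevity, whereas the PR itself works on the parsed syntax tree.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Doris: binds CALL arguments to declared parameters. */
final class ProcedureArgBinder {

    /** One bound variable <Name, Type, Value>, plus its IN/OUT mode. */
    static final class BoundVar {
        final String mode;
        final String name;
        final String type;
        final String value;

        BoundVar(String mode, String name, String type, String value) {
            this.mode = mode;
            this.name = name;
            this.type = type;
            this.value = value;
        }
    }

    // Matches declarations such as "IN name STRING" or "OUT result int".
    // Simplification: types with parentheses, e.g. DECIMAL(10, 2), are not handled.
    private static final Pattern PARAM = Pattern.compile(
            "(?:(IN|OUT|INOUT)\\s+)?(\\w+)\\s+(\\w+)", Pattern.CASE_INSENSITIVE);

    /** Pairs each declared parameter in the procedure source with the actual value at the same position. */
    static List<BoundVar> bind(String source, List<String> actualValues) {
        int open = source.indexOf('(');
        int close = open < 0 ? -1 : source.indexOf(')', open);
        if (close < 0) {
            throw new IllegalArgumentException("no parameter list in source");
        }
        String paramList = source.substring(open + 1, close).trim();
        String[] decls = paramList.isEmpty() ? new String[0] : paramList.split("\\s*,\\s*");
        if (decls.length != actualValues.size()) {
            throw new IllegalArgumentException(
                    "expected " + decls.length + " arguments, got " + actualValues.size());
        }
        List<BoundVar> vars = new ArrayList<>();
        for (int i = 0; i < decls.length; i++) {
            Matcher m = PARAM.matcher(decls[i].trim());
            if (!m.matches()) {
                throw new IllegalArgumentException("cannot parse parameter: " + decls[i]);
            }
            // A missing mode is treated as IN in this sketch.
            String mode = m.group(1) == null ? "IN" : m.group(1).toUpperCase();
            vars.add(new BoundVar(mode, m.group(2), m.group(3), actualValues.get(i)));
        }
        return vars;
    }
}
```

For the example procedure `A`, calling `ProcedureArgBinder.bind(source, List.of("'xxx'", "result"))` would pair `name` (STRING) with `'xxx'` and the OUT parameter `result` (int) with the caller's variable `result`.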
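The "remove the into variable clause" step in the description can be sketched as follows. This is only an illustration under stated assumptions: `IntoClauseStripper` is a hypothetical class, and it uses a regular expression as a stand-in, whereas the description says the PR does this inside the Doris Logical Plan Builder while walking the parse tree. It handles only a single target variable directly before `FROM`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Doris: splits "SELECT ... INTO var FROM ..." into plain SQL and a target. */
final class IntoClauseStripper {

    /** The rewritten SQL plus the variable that should receive the result (null if there was no INTO). */
    static final class Result {
        final String sql;
        final String targetVar;

        Result(String sql, String targetVar) {
            this.sql = sql;
            this.targetVar = targetVar;
        }
    }

    // Matches " INTO <identifier>" only when it is immediately followed by FROM,
    // so "INSERT INTO t SELECT ..." is left untouched.
    private static final Pattern INTO = Pattern.compile(
            "\\s+INTO\\s+(\\w+)(?=\\s+FROM\\b)", Pattern.CASE_INSENSITIVE);

    static Result strip(String stmt) {
        Matcher m = INTO.matcher(stmt);
        if (!m.find()) {
            return new Result(stmt, null);
        }
        // Cut the INTO clause out and remember which variable it named.
        String sql = stmt.substring(0, m.start()) + stmt.substring(m.end());
        return new Result(sql, m.group(1));
    }
}
```

Applied to the second statement of the example procedure, the sketch yields a statement that plain Doris SQL can execute, while the remembered target tells the executor where to write the single result value.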
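Sorry to bother you. I saw the TODO item about SQL parameters and fields having the same name in PL-SQL. Which one is planned to take priority? In Oracle the field takes priority; will this be made consistent with that?
@Zhao-1111 Currently the SQL parameter takes priority. The goal is to be compatible with Oracle's behavior, and we will try to fix this soon.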
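The two precedence rules discussed in these comments can be contrasted with a small Java sketch. `NameResolver` and its `Policy` enum are hypothetical names used only for illustration; the sketch assumes exact, case-sensitive name matching and a pre-computed set of column names, neither of which is taken from the PR.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical helper, not part of Doris: decides what an identifier inside a SQL statement refers to. */
final class NameResolver {

    /** PARAMETER_FIRST mirrors the behavior described as current; COLUMN_FIRST mirrors Oracle. */
    enum Policy { PARAMETER_FIRST, COLUMN_FIRST }

    /**
     * Returns the text to place in the rewritten SQL for one identifier:
     * the parameter's value if the identifier resolves to a parameter,
     * otherwise the identifier itself (a column reference or an unknown name).
     */
    static String resolve(String ident, Set<String> columns, Map<String, String> params, Policy policy) {
        boolean isColumn = columns.contains(ident);
        boolean isParam = params.containsKey(ident);
        if (isParam && (!isColumn || policy == Policy.PARAMETER_FIRST)) {
            return params.get(ident);
        }
        return ident;
    }
}
```

With a table that has a column called `name` and a procedure parameter also called `name`, `PARAMETER_FIRST` substitutes the parameter value into `where k = name`, while `COLUMN_FIRST` leaves `name` as a column reference. Names that are only a parameter or only a column resolve the same way under both policies.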
Proposed changes
Reference documentation:
Hive: http://mail.hplsql.org
Oracle: https://docs.oracle.com/en/database/oracle/oracle-database/21/lnpls/plsql-language-fundamentals.html#GUID-640DB3AA-15AF-4825-BD6C-1D4EB5AB7715
Mysql: https://dev.mysql.com/doc/refman/8.0/en/create-procedure.html
antlr4: https://github.com/antlr/antlr4/blob/master/doc/options.md
https://github.com/Peefy/CompileDragonBook/blob/master/doc/NOTE_ANTLR.md
http://lab.antlr.org/
Similar pr: #20776