Description
Background
Doris currently supports three kinds of user-defined functions: Native UDF, Remote UDF, and Java UDF.
Native UDF is written in C++ and has the best performance, but it is harder to write and debug. It can also depend on specific system library versions (such as libc), so a UDF may stop working after those libraries are upgraded.
Remote UDF solves the language problem well: in theory, UDF logic can be written in any language. The drawback is that users must implement their own high-performance UDF service, and every call pays the cost of an RPC, which limits efficiency.
Java UDF is the main user-defined function solution in Doris. It lowers the migration cost for users coming from the big data ecosystem, since ecosystems such as Hive and Spark already have a large number of ready-made UDFs. Java UDF is implemented by starting a JVM and calling the UDF logic from the BE side through JNI.
Motivation
Support for Wasm UDF is motivated by the following points:
Embeddable in multiple programming languages. Users can write function logic in many languages (Rust, C/C++, Golang, Java, TypeScript, Haskell).
Secure by default. No file, network, or environment access unless explicitly enabled.
High performance. The WebAssembly engine compiles bytecode into native machine code instead of interpreting it, which greatly improves execution efficiency and can come close to native performance.
How to implement
The BE needs to start a Wasm runtime and call the UDF logic through the wasmtime C API (wasmtime-c-api).
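The BE-side call path can be sketched roughly as follows. This is a minimal sketch, not Doris code: `WasmI32BinaryFn`, `NullableInt32Column`, and `evaluate_udf` are hypothetical names, and the call into the Wasm runtime is replaced by a plain C++ callable so the example builds without wasmtime. It assumes a scalar UDF exported with the signature (i32, i32) -> i32 and NULL-in/NULL-out semantics.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for an exported Wasm function of type (i32, i32) -> i32.
// In the BE this would wrap a call into the Wasm runtime (for wasmtime, something
// built on wasmtime_func_call); here it is an ordinary C++ callable so the
// sketch compiles and runs on its own.
using WasmI32BinaryFn = std::function<int32_t(int32_t, int32_t)>;

// Hypothetical nullable column layout: a value array plus a null map (1 = NULL).
struct NullableInt32Column {
    std::vector<int32_t> values;
    std::vector<uint8_t> null_map;
};

// Evaluate a binary scalar UDF over two input columns, one row at a time.
// Rows where either argument is NULL produce NULL without calling the function,
// so the guest only ever sees non-NULL i32 values.
NullableInt32Column evaluate_udf(const WasmI32BinaryFn& fn,
                                 const NullableInt32Column& lhs,
                                 const NullableInt32Column& rhs) {
    const std::size_t rows = lhs.values.size();
    if (rhs.values.size() != rows || lhs.null_map.size() != rows ||
        rhs.null_map.size() != rows) {
        throw std::invalid_argument("input columns must have the same row count");
    }

    NullableInt32Column result;
    result.values.assign(rows, 0);
    result.null_map.assign(rows, 0);
    for (std::size_t i = 0; i < rows; ++i) {
        if (lhs.null_map[i] || rhs.null_map[i]) {
            result.null_map[i] = 1;  // NULL in, NULL out
            continue;
        }
        result.values[i] = fn(lhs.values[i], rhs.values[i]);  // one call per row
    }
    return result;
}
```

For example, passing a callable that adds its two arguments, with inputs {1, NULL, 3} and {10, 20, 30}, yields {11, NULL, 33}. One call per row is the simplest scheme; an actual implementation might instead pass whole batches through the guest's linear memory to reduce per-call overhead, a trade-off the design would still need to settle.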
Scheduling
I have a preliminary plan for supporting Wasm UDF in Doris:
Phase I: implement the CREATE FUNCTION statement for Wasm UDF.
Phase II: add the wasmtime library and wasmtime-c-api to the BE, and implement the basic data types based on the Wasm basic ABI types.
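For Phase II, the basic data types could map onto the WebAssembly core number types (i32, i64, f32, f64). The sketch below shows one possible mapping, not a settled design: the `SqlType` enum is a hypothetical subset of Doris column types, and the choices to sign-extend TINYINT and SMALLINT to i32, to pass BOOLEAN as 0/1 in an i32, and to reject LARGEINT and VARCHAR are assumptions. Types without a single-value encoding would need a memory-based convention (for example, a pointer and length into the guest's linear memory), which this sketch leaves out.

```cpp
#include <cstdint>
#include <optional>

// Core WebAssembly number types as defined by the Wasm specification.
enum class WasmValType { I32, I64, F32, F64 };

// Hypothetical subset of Doris column types considered in the basic ABI phase.
enum class SqlType { BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, VARCHAR };

// Map a SQL type to the Wasm value type used to pass it across the boundary.
// std::nullopt means the type has no single core-value encoding (assumption:
// such types would later go through linear memory instead).
constexpr std::optional<WasmValType> to_wasm_type(SqlType t) {
    switch (t) {
        case SqlType::BOOLEAN:
        case SqlType::TINYINT:
        case SqlType::SMALLINT:
        case SqlType::INT:
            return WasmValType::I32;  // narrow types are widened to i32
        case SqlType::BIGINT:
            return WasmValType::I64;
        case SqlType::FLOAT:
            return WasmValType::F32;
        case SqlType::DOUBLE:
            return WasmValType::F64;
        case SqlType::LARGEINT:  // 128-bit, no core Wasm scalar type
        case SqlType::VARCHAR:   // variable length, needs memory
            return std::nullopt;
    }
    return std::nullopt;
}

// Assumed widening convention: signed narrow integers are sign-extended,
// and BOOLEAN is passed as 0 or 1.
constexpr int32_t widen_to_i32(int8_t v) { return static_cast<int32_t>(v); }
constexpr int32_t widen_to_i32(int16_t v) { return static_cast<int32_t>(v); }
constexpr int32_t bool_to_i32(bool b) { return b ? 1 : 0; }
```

Keeping the mapping in one function would make it easy to reject unsupported argument or return types when a function is created, rather than failing at call time.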
Use case
No response
Related issues
Remote UDF: #7519
Lua UDF: #5979
Java UDF: #8389