[Feature] Support Wasm UDF #20267

Open
2 of 3 tasks
wzymumon opened this issue May 31, 2023 · 1 comment
Labels
diffculty/hard doris-future kind/feature Categorizes issue or PR as related to a new feature.

Comments

@wzymumon
Contributor

Search before asking

  • I had searched in the issues and found no similar issues.

Description

Background

Doris currently supports three kinds of user-defined functions: Native UDF, Remote UDF, and Java UDF.

Native UDF is written in C++ and has the best performance, but it is harder to write and debug, and it may depend on specific versions of system libraries (such as libc) that become incompatible after an upgrade.

Remote UDF solves the language problem well: in theory, UDF logic can be written in any language. The disadvantage is that users need to implement their own high-performance UDF service, and the RPC overhead hurts efficiency.

Java UDF is the main user-defined function solution in Doris. It reduces migration costs for users coming from the big data ecosystem, since systems such as Hive and Spark already have a large number of ready-made UDFs. Java UDF is implemented by starting a JVM and calling the UDF logic from the BE side through JNI.

Motivation

Support for Wasm UDF is motivated by the following points:

  1. Multi-language support. Users can write UDF logic in any language that compiles to Wasm, such as Rust, C/C++, Golang, Java, TypeScript, or Haskell.
  2. Secure by default. No file, network, or environment access, unless explicitly enabled.
  3. High performance. The WebAssembly engine compiles bytecode into native machine code instead of interpreting it, which can bring execution efficiency close to that of native code.

How to implement

The BE needs to start a Wasm runtime and call the UDF logic through wasmtime-c-api.
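
To make the guest side of this concrete, here is a minimal sketch of what a user-written UDF could look like before it is compiled to a Wasm module. The function name `add_one` and its signature are hypothetical and not part of any existing Doris interface; the sketch only assumes that the BE would look up an exported function by name and call it with Wasm numeric values.

```cpp
// Guest-side sketch: a UDF as it might be exported from a Wasm module.
// The name `add_one` and its signature are hypothetical, for illustration only.
#include <cstdint>

// extern "C" disables C++ name mangling, so a host could find the export
// under the plain name "add_one" once this is compiled to Wasm.
extern "C" int32_t add_one(int32_t x) {
    // Add in unsigned arithmetic so the result wraps instead of
    // triggering signed-overflow undefined behavior.
    return static_cast<int32_t>(static_cast<uint32_t>(x) + 1u);
}
```

A function like this takes and returns only a 32-bit integer, which maps directly onto the Wasm `i32` value type, so no memory sharing between host and guest is needed. It would be built with a Wasm-targeting toolchain (for example Emscripten, or clang with a wasm32 target) and the resulting module handed to the runtime.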

Scheduling

I have a preliminary plan to support Wasm UDF in Doris.

Phase I: complete the Create-Function-Statement for Wasm UDF.

Phase II: add the wasmtime library and wasmtime-c-api to BE, and implement the basic data types based on the Wasm basic ABI types.
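
As a rough illustration of the Phase II type work, the sketch below maps a few Doris scalar type names onto the four numeric value types of core WebAssembly (`i32`, `i64`, `f32`, `f64`). The mapping, the `WasmValType` enum, and the `to_wasm_type` helper are all assumptions made for this example, not an agreed design; types that cannot be passed as a single numeric value (such as strings) are left unmapped.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>

// The four numeric value types of core WebAssembly.
enum class WasmValType { I32, I64, F32, F64 };

// Hypothetical mapping from Doris scalar type names to Wasm value types.
// Returns std::nullopt for types this sketch does not cover, e.g. strings,
// which would need a convention based on the module's linear memory.
constexpr std::optional<WasmValType> to_wasm_type(std::string_view doris_type) {
    if (doris_type == "BOOLEAN" || doris_type == "TINYINT" ||
        doris_type == "SMALLINT" || doris_type == "INT") {
        return WasmValType::I32;  // narrower integers are widened to i32
    }
    if (doris_type == "BIGINT") return WasmValType::I64;
    if (doris_type == "FLOAT") return WasmValType::F32;
    if (doris_type == "DOUBLE") return WasmValType::F64;
    return std::nullopt;
}
```

Wasm has no 8-bit or 16-bit value types, which is why the narrower integer types share `i32` here; a real implementation would also have to decide how to represent NULL values and variable-length data.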

Use case

No response

Related issues

Remote UDF: #7519
Lua UDF: #5979
Java UDF: #8389

Are you willing to submit PR?

  • Yes I am willing to submit a PR!


@taptao
Contributor

taptao commented Sep 7, 2023

I'm working on this; please assign it to me.
