Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: implement udf server in databend #12729

Merged
merged 31 commits into from
Sep 17, 2023
Merged

feat: implement udf server in databend #12729

merged 31 commits into from
Sep 17, 2023

Conversation

gitccl
Copy link
Contributor

@gitccl gitccl commented Sep 6, 2023

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

This pr merges previous #12168 , #12417 and #12463 .
In this pr, we also change the syntax of create function command to make it similar to snowflake's syntax.

CREATE FUNCTION [IF NOT EXISTS] <udf_name> (<arg_type>, ...) RETURNS <return_type> LANGUAGE <language> HANDLER=<handler> ADDRESS=<udf_server_address>

This change is Reviewable

gitccl and others added 7 commits July 25, 2023 22:49
* feat: implement create and alter udf server

* feat: use protobuf to serialize these UDF types

* fix: fix typo

* refactor: change create udf server syntax

* chore: remove udf from proto_conv

* fix: fix fmt
* ci: add udfserver sqllogictest

* Update .github/workflows/reuse.linux.yml

Co-authored-by: everpcpc <everpcpc@users.noreply.github.com>

---------

Co-authored-by: everpcpc <everpcpc@users.noreply.github.com>
@gitccl gitccl requested a review from drmingdrmer as a code owner September 6, 2023 13:58
@vercel
Copy link

vercel bot commented Sep 6, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
databend ⬜️ Ignored (Inspect) Visit Preview Sep 17, 2023 3:02am

@github-actions github-actions bot added the pr-feature this PR introduces a new feature to the codebase label Sep 6, 2023
Copy link
Member

@drmingdrmer drmingdrmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 71 files at r1, 8 of 57 files at r2, all commit messages.
Reviewable status: 11 of 72 files reviewed, all discussions resolved

@xudong963 xudong963 self-requested a review September 7, 2023 02:39
@xudong963
Copy link
Member

I'll review the PR today

Copy link
Member

@xudong963 xudong963 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, great work @gitccl

Currently, our Python UDF syntax is similar to snowflake:

Snowflake:

create or replace function addone(i int)
returns int
language python
runtime_version = '3.8'
handler = 'addone_py'
as
$$
def addone_py(i):
  return i+1
$$;

Databend:

CREATE FUNCTION split_and_join (VARCHAR, VARCHAR, VARCHAR) RETURNS VARCHAR LANGUAGE python HANDLER = 'split_and_join' ADDRESS = 'http://0.0.0.0:8815';

@BohuTANG @sundy-li Any thoughts?

.github/workflows/reuse.linux.yml Show resolved Hide resolved
@xudong963
Copy link
Member

Given the security concerns we discussed internally, maybe we still need to add a flag to make it a feature, so we can avoid our cloud attacking. cc @Xuanwo @flaneur2020 @BohuTANG

@xudong963
Copy link
Member

cc @soyeric128 After the PR merged, need to add a doc. You can refer the README: https://github.com/datafuselabs/databend/pull/12729/files#diff-bf3534b9c183942327d5e8579cec90a1634bc242312a7cff036d80c035d11d98

@xudong963
Copy link
Member

Seems the ee test isn't related to the PR

image

@BohuTANG
Copy link
Member

BohuTANG commented Sep 7, 2023

Given the security concerns we discussed internally, maybe we still need to add a flag to make it a feature, so we can avoid our cloud attacking. cc @Xuanwo @flaneur2020 @BohuTANG

Add a new query config is ok, then we can enable it for some users. @flaneur2020

@gitccl
Copy link
Contributor Author

gitccl commented Sep 7, 2023

It seems that ci is stuck.

@Xuanwo
Copy link
Member

Xuanwo commented Sep 7, 2023

It seems that ci is stuck.

Please fix the conflicts first to make CI running, thanks!

@xudong963
Copy link
Member

image

A flaky test cc @TCeason

@TCeason
Copy link
Collaborator

TCeason commented Sep 8, 2023

image

A flaky test cc @TCeason

Get

@xudong963
Copy link
Member

Databend Query start failure, cause: Code: 1002, Text = TOML parse error at line 41, column 1
553
   |
554
41 | enable_udf_server = true
555
   | ^^^^^^^^^^^^^^^^^
556
unknown field `enable_udf_server`, expected one of `tenant_id`, `cluster_id`, `num_cpus`, `mysql_handler_host`, `mysql_handler_port`, `mysql_handler_tcp_keepalive_timeout_secs`, `max_active_sessions`, `max_server_memory_usage`, `max_memory_limit_enabled`, `clickhouse_handler_host`, `clickhouse_handler_port`, `clickhouse_http_handler_host`, `clickhouse_http_handler_port`, `http_handler_host`, `http_handler_port`, `http_handler_result_timeout_secs`, `flight_sql_handler_host`, `flight_sql_handler_port`, `flight_api_address`, `admin_api_address`, `metric_api_address`, `http_handler_tls_server_cert`, `http_handler_tls_server_key`, `http_handler_tls_server_root_ca_cert`, `flight_sql_tls_server_cert`, `flight_sql_tls_server_key`, `api_tls_server_cert`, `api_tls_server_key`, `api_tls_server_root_ca_cert`, `rpc_tls_server_cert`, `rpc_tls_server_key`, `rpc_tls_query_server_root_ca_cert`, `rpc_tls_query_service_domain_name`, `table_engine_memory_enabled`, `wait_timeout_mills`, `max_query_log_size`, `management_mode`, `jwt_key_file`, `jwt_key_files`, `default_storage_format`, `default_compression`, `users`, `share_endpoint_address`, `share_endpoint_auth_token_file`, `quota`, `internal_enable_sandbox_tenant`, `internal_merge_on_read_mutation`, `table_disk_cache_mb_size`, `table_meta_cache_enabled`, `table_cache_block_meta_count`, `table_memory_cache_mb_size`, `table_disk_cache_root`, `table_cache_snapshot_count`, `table_cache_statistic_count`, `table_cache_segment_count`, `table_cache_bloom_index_meta_count`, `table_cache_bloom_index_filter_count`, `table_cache_bloom_index_data_bytes`, `disable_system_table_load`, `openai_api_key`
557
, source: None
558
TOML parse error at line 41, column 1

cc @gitccl

@gitccl
Copy link
Contributor Author

gitccl commented Sep 14, 2023

Databend Query start failure, cause: Code: 1002, Text = TOML parse error at line 41, column 1
553
   |
554
41 | enable_udf_server = true
555
   | ^^^^^^^^^^^^^^^^^
556
unknown field `enable_udf_server`, expected one of `tenant_id`, `cluster_id`, `num_cpus`, `mysql_handler_host`, `mysql_handler_port`, `mysql_handler_tcp_keepalive_timeout_secs`, `max_active_sessions`, `max_server_memory_usage`, `max_memory_limit_enabled`, `clickhouse_handler_host`, `clickhouse_handler_port`, `clickhouse_http_handler_host`, `clickhouse_http_handler_port`, `http_handler_host`, `http_handler_port`, `http_handler_result_timeout_secs`, `flight_sql_handler_host`, `flight_sql_handler_port`, `flight_api_address`, `admin_api_address`, `metric_api_address`, `http_handler_tls_server_cert`, `http_handler_tls_server_key`, `http_handler_tls_server_root_ca_cert`, `flight_sql_tls_server_cert`, `flight_sql_tls_server_key`, `api_tls_server_cert`, `api_tls_server_key`, `api_tls_server_root_ca_cert`, `rpc_tls_server_cert`, `rpc_tls_server_key`, `rpc_tls_query_server_root_ca_cert`, `rpc_tls_query_service_domain_name`, `table_engine_memory_enabled`, `wait_timeout_mills`, `max_query_log_size`, `management_mode`, `jwt_key_file`, `jwt_key_files`, `default_storage_format`, `default_compression`, `users`, `share_endpoint_address`, `share_endpoint_auth_token_file`, `quota`, `internal_enable_sandbox_tenant`, `internal_merge_on_read_mutation`, `table_disk_cache_mb_size`, `table_meta_cache_enabled`, `table_cache_block_meta_count`, `table_memory_cache_mb_size`, `table_disk_cache_root`, `table_cache_snapshot_count`, `table_cache_statistic_count`, `table_cache_segment_count`, `table_cache_bloom_index_meta_count`, `table_cache_bloom_index_filter_count`, `table_cache_bloom_index_data_bytes`, `disable_system_table_load`, `openai_api_key`
557
, source: None
558
TOML parse error at line 41, column 1

cc @gitccl

This problem is caused by the 1.0.56 version of databend-query loading the latest configuration, but version 1.0.56 does not support the enable_udf_server configuration. Is it expected to use the old version databend-query to load the new version configuration?

@sundy-li
Copy link
Member

Is it expected to use the old version databend-query to load the new version configuration?

Currently, it's forbidden. Maybe we should fetch the old config file from git to start the old version process.

@sundy-li
Copy link
Member

@sundy-li sundy-li merged commit f06d9b3 into databendlabs:main Sep 17, 2023
andylokandy pushed a commit to andylokandy/databend that referenced this pull request Nov 27, 2023
Co-authored-by: everpcpc <everpcpc@users.noreply.github.com>
Co-authored-by: xudong.w <wxd963996380@gmail.com>
Co-authored-by: sundy-li <543950155@qq.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-feature this PR introduces a new feature to the codebase
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tracking: Implement Databend UDF Server
10 participants