[feature] Support udf through RPC #7519
Let me see.
Do the RPC addresses support load balancing?
_brpc_stub_cache is split into _internal_client_cache and _function_client_cache. Does this cause any compatibility problems?
be/src/exec/tablet_sink.cpp
Outdated
_node_info.brpc_port)) {
ExecEnv::GetInstance()->brpc_stub_cache()->erase(_open_closure->cntl.remote_side());
}
ExecEnv::GetInstance()->brpc_internal_client_cache()->erase(
Why is it no longer necessary to check whether the node is unavailable?
Creating a new stub does not cost much.
ptype->set_type(PParameterType::NULL_TYPE);
continue;
}
switch (_children[i]->type().type) {
Do we need to support TYPE_STRING?
Yes! I will add it later.
bd0c3bc to c35abc1
d6ad395 to f1cd42a
if (_client == nullptr) {
return Status::InternalError(
fmt::format("rpc env init error: {}/{}", _fn.hdfs_location, _rpc_function_symbol));
hdfs_location?
be/src/exprs/rpc_fn_call.cpp
Outdated
Status st = _eval_children(context, row, &response);
if (!st.ok() || response.status().status_code() != 0 ||
(response.result().has_null() && response.result().null_map(0))) {
res_val.is_null = true;
Is it ok to return null if RPC failed?
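To make the concern concrete, here is a minimal sketch of how a failed call could be kept distinct from a legitimate NULL result. It is an illustration only, not the PR's code: `RpcReply`, `EvalResult`, and `evaluate` are hypothetical names, and the fields are simplified stand-ins for the real response message.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified view of an RPC reply; not the PR's actual types.
struct RpcReply {
    bool transport_ok;  // false if the RPC itself failed
    int status_code;    // non-zero if the function service reported an error
    bool is_null;       // true if the function legitimately returned NULL
    int value;          // meaningful only when is_null is false
};

// Hypothetical evaluation result that keeps "failed" apart from "NULL".
struct EvalResult {
    bool ok;            // false means the call failed
    bool is_null;       // true means SQL NULL (only meaningful when ok)
    int value;
    std::string error;  // non-empty when ok is false
};

EvalResult evaluate(const RpcReply& reply) {
    if (!reply.transport_ok) {
        return {false, true, 0, "rpc failed"};
    }
    if (reply.status_code != 0) {
        return {false, true, 0, "function service returned an error"};
    }
    if (reply.is_null) {
        return {true, true, 0, ""};
    }
    return {true, false, reply.value, ""};
}
```

With this shape the caller can decide whether a failure should abort the query or be downgraded to NULL, instead of the two cases being indistinguishable.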
PFunctionCallResponse response;
request.set_function_name(_symbol);
int64_t name_hash = 0;
murmur_hash3_x64_64(_symbol.c_str(), _symbol.size(), 21217891, &name_hash);
Does murmur_hash3_x64_64 work for ARM64?
@@ -63,3 +67,150 @@ message PUniqueId {
required int64 lo = 2;
};

message PGenericType {
This is the same data type definition as the one in data.proto. How about unifying them?
I will do this in PR #7939.
That's a good idea, but I think the data type definition in data.proto cannot describe complex types very well; PGenericType + PList + PMap + PField + PStruct + PDecimal may be better. Also, the data type should not be too complex, so that it stays easy to use across languages.
452dd4c to 8754929
…ocol. This brings several benefits:
1. The UDF implementation language is not limited to C++; users can implement a UDF in any language they are familiar with.
2. The UDF is decoupled from Doris: a UDF crash will not core dump Doris along with it, UDF computing resources are separated from Doris, and Doris services are not affected.
But an RPC UDF has a fixed overhead, so its performance is much slower than a C++ UDF, especially when the amount of data is large.
Create a function like:
```
CREATE FUNCTION rpc_add(INT, INT) RETURNS INT PROPERTIES (
  "SYMBOL"="add_int",
  "OBJECT_FILE"="127.0.0.1:9999",
  "TYPE"="RPC"
);
```
The function service needs to implement the `check_fn` and `fn_call` methods.
Note: THIS IS AN EXPERIMENTAL FEATURE, THE INTERFACE AND DATA STRUCTURE MAY CHANGE IN THE FUTURE !!
LGTM
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
Documentation needs to be added later.
Proposed changes
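Support UDFs implemented through the GRPC protocol. This brings several benefits:
1. The UDF implementation language is not limited to C++; users can implement a UDF in any language they are familiar with.
2. The UDF is decoupled from Doris: a UDF crash will not core dump Doris along with it, UDF computing resources are separated from Doris, and Doris services are not affected.
But an RPC UDF has a fixed overhead, so its performance is much slower than a C++ UDF, especially when the amount of data is large.
A function is created with a CREATE FUNCTION statement whose PROPERTIES set "SYMBOL" (for example "add_int"), "OBJECT_FILE" (the service address, for example "127.0.0.1:9999"), and "TYPE"="RPC".
The function service needs to implement the check_fn and fn_call methods.
#7502 #7578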
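As a rough illustration of the two methods a function service has to provide, the sketch below models them in-process for the `add_int` symbol. Everything in it is an assumption made for illustration: `CallRequest`, `CallResponse`, and `AddIntService` are hypothetical names, the plain structs stand in for the protobuf messages defined by this PR, and no real RPC transport is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for the request/response messages. The real
// messages are defined in the PR's .proto file and are not reproduced here.
struct CallRequest {
    std::string function_name;               // e.g. "add_int", the SYMBOL property
    std::vector<std::vector<int32_t>> args;  // one column of values per argument
};

struct CallResponse {
    int status_code = 0;          // 0 means success
    std::vector<int32_t> result;  // one value per input row
};

// Toy in-process service backing a two-argument integer addition.
class AddIntService {
public:
    // check_fn: report whether this service can serve the given function.
    // Returns 0 if supported, non-zero otherwise.
    int check_fn(const std::string& name, std::size_t arg_count) const {
        return (name == "add_int" && arg_count == 2) ? 0 : 1;
    }

    // fn_call: evaluate the function over a whole batch of rows.
    CallResponse fn_call(const CallRequest& req) const {
        CallResponse resp;
        if (check_fn(req.function_name, req.args.size()) != 0 ||
            req.args[0].size() != req.args[1].size()) {
            resp.status_code = 1;  // unsupported function or ragged columns
            return resp;
        }
        for (std::size_t i = 0; i < req.args[0].size(); ++i) {
            resp.result.push_back(req.args[0][i] + req.args[1][i]);
        }
        return resp;
    }
};
```

Passing whole columns per call, rather than one row at a time, is what lets the fixed per-call overhead be spread over many rows.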
In vectorized mode, the RPC UDF call is a little slower than the native function call.
In non-vectorized mode it is much slower; the query time of an RPC UDF in vectorized mode is about equal to that of a native function call in non-vectorized mode.
Types of changes
What types of changes does your code introduce to Doris?
Put an x in the boxes that apply
Checklist
Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.
Further comments
If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...