Refactor: Unify Expr::ScalarFunction and Expr::ScalarUDF, introduce unresolved functions by name
#8258
  /// ScalarFunction expression invokes a built-in scalar function
  #[derive(Clone, PartialEq, Eq, Hash, Debug)]
  pub struct ScalarFunction {
      /// The function
-     pub fun: built_in_function::BuiltinScalarFunction,
+     pub func_def: ScalarFunctionDefinition,
This is the major change
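To make the shape of this change concrete, here is a minimal, self-contained sketch of a unified function definition. It is an illustration under assumptions, not the code from the PR: `BuiltinScalarFunction` and `ScalarUDF` are simplified stand-ins, and the `BuiltIn` and `UDF` variant names, the `Arc<str>` payload, and the `new` constructor are assumed. Only `func_def`, `args`, `ScalarFunctionDefinition::Name`, and `new_udf` appear in the diff snippets on this page.

```rust
use std::sync::Arc;

/// Stand-in for the built-in function enum (hypothetical two-member subset).
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub enum BuiltinScalarFunction {
    Abs,
    Sqrt,
}

/// Stand-in for a user-defined scalar function (hypothetical: name only).
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
pub struct ScalarUDF {
    pub name: String,
}

/// The ways a scalar function call can identify its function.
/// Variant names other than `Name` are assumptions.
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
pub enum ScalarFunctionDefinition {
    /// Resolved to a built-in function
    BuiltIn(BuiltinScalarFunction),
    /// Resolved to a user-defined function
    UDF(Arc<ScalarUDF>),
    /// Not resolved yet: only the function name is known
    Name(Arc<str>),
}

/// Simplified expression type with a single function-call variant.
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
pub enum Expr {
    Literal(i64),
    ScalarFunction(ScalarFunction),
}

#[derive(Clone, PartialEq, Eq, Hash, Debug)]
pub struct ScalarFunction {
    /// The function definition (built-in, UDF, or unresolved name)
    pub func_def: ScalarFunctionDefinition,
    /// List of expressions to feed to the function as arguments
    pub args: Vec<Expr>,
}

impl ScalarFunction {
    /// Build a call to a built-in function.
    pub fn new(fun: BuiltinScalarFunction, args: Vec<Expr>) -> Self {
        Self { func_def: ScalarFunctionDefinition::BuiltIn(fun), args }
    }

    /// Build a call to a user-defined function.
    pub fn new_udf(udf: Arc<ScalarUDF>, args: Vec<Expr>) -> Self {
        Self { func_def: ScalarFunctionDefinition::UDF(udf), args }
    }
}
```

With one struct covering all three cases, code that only cares about the arguments can ignore `func_def`, while code that must execute the function matches on it.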
Thank you @2010YOUY01 -- I think this looks great 🦾. I had a few small comments, but otherwise I think it is ready to go.
I want to leave this PR open for a few days to allow people a chance to comment.
> then we can add an AnalyzerRule to resolve the function.
> This PR only includes refactor preparation for UDF Expr support.
While waiting, it might help to create another draft PR on top of this one that implements name resolution, so people can see how that would work.
Some other todos (for myself):
- File a ticket to do the same thing for AggregateUDF and WindowUDF (to maintain a consistent interface)
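The name-resolution step mentioned above could look roughly like the following sketch. Everything here is hypothetical: the `Registry` map stands in for the functions registered in the session state, `resolve` is an invented helper rather than an actual `AnalyzerRule` implementation, and the types are reduced to the minimum needed to show the idea.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Stand-in for a user-defined scalar function (hypothetical: name only).
#[derive(Clone, PartialEq, Debug)]
pub struct ScalarUDF {
    pub name: String,
}

/// Reduced function definition: resolved UDF or unresolved name.
#[derive(Clone, PartialEq, Debug)]
pub enum ScalarFunctionDefinition {
    UDF(Arc<ScalarUDF>),
    Name(Arc<str>),
}

#[derive(Clone, PartialEq, Debug)]
pub enum Expr {
    Literal(i64),
    ScalarFunction { func_def: ScalarFunctionDefinition, args: Vec<Expr> },
}

/// Hypothetical registry standing in for functions registered in the session.
pub type Registry = HashMap<String, Arc<ScalarUDF>>;

/// Recursively replace every `Name` definition with the registered UDF,
/// or return an error if the name is unknown.
pub fn resolve(expr: Expr, registry: &Registry) -> Result<Expr, String> {
    match expr {
        Expr::ScalarFunction { func_def, args } => {
            // Resolve the arguments first so nested calls are handled too.
            let mut resolved_args = Vec::with_capacity(args.len());
            for arg in args {
                resolved_args.push(resolve(arg, registry)?);
            }
            let func_def = match func_def {
                ScalarFunctionDefinition::Name(name) => {
                    let udf = registry
                        .get(&*name)
                        .ok_or_else(|| format!("unknown function: {}", name))?;
                    ScalarFunctionDefinition::UDF(Arc::clone(udf))
                }
                // Already resolved: keep as is.
                resolved => resolved,
            };
            Ok(Expr::ScalarFunction { func_def, args: resolved_args })
        }
        // Leaves contain no function names to resolve.
        other => Ok(other),
    }
}
```

The point of the sketch is that expressions can be built from a bare name without any session state, and the lookup is deferred to a later pass that does have access to the registry.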
        ScalarFunction::new_udf(fun, transform_vec(args, &mut transform)?),
    ),
    ScalarFunctionDefinition::Name(_) => {
        return internal_err!(
I think it is probably ok to support tree walks on unresolved function names. I don't see any reason to throw an error.
Yes, I believe it's necessary, but I'm not sure how it will be implemented right now, so I plan to leave this change to the next PR, which resolves function Expr names.
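As a rough illustration of the suggestion in this thread, a tree walk can treat an unresolved name like any other function definition, because rewriting the arguments does not depend on which function is being called. The sketch below uses reduced, hypothetical types and an invented `transform_expr` helper; it is not the `transform_vec`-based code from the PR.

```rust
use std::sync::Arc;

/// Reduced function definition (hypothetical): a built-in identified by a
/// static name, or an unresolved name.
#[derive(Clone, PartialEq, Debug)]
pub enum ScalarFunctionDefinition {
    BuiltIn(&'static str),
    Name(Arc<str>),
}

#[derive(Clone, PartialEq, Debug)]
pub enum Expr {
    Literal(i64),
    ScalarFunction { func_def: ScalarFunctionDefinition, args: Vec<Expr> },
}

/// Bottom-up rewrite: children are transformed first, then `f` is applied to
/// the rebuilt node. The function definition is carried over untouched, so
/// the `Name` variant needs no special case and no error.
pub fn transform_expr<F>(expr: Expr, f: &mut F) -> Result<Expr, String>
where
    F: FnMut(Expr) -> Result<Expr, String>,
{
    let rebuilt = match expr {
        Expr::ScalarFunction { func_def, args } => {
            let mut new_args = Vec::with_capacity(args.len());
            for arg in args {
                new_args.push(transform_expr(arg, &mut *f)?);
            }
            Expr::ScalarFunction { func_def, args: new_args }
        }
        leaf => leaf,
    };
    f(rebuilt)
}
```

A rewrite that only touches literals, for instance, passes straight through a call whose function is still just a name.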
@alamb Thank you! Review comments are addressed.
Thanks @2010YOUY01 -- I had one comment on the name() signature, but I also think we can do that as a follow-on PR. As a reminder, I want to leave this open a few more days to allow for comment prior to merging.
datafusion/expr/src/expr.rs
    /// List of expressions to feed to the functions as arguments
    pub args: Vec<Expr>,
}

impl ScalarFunctionDefinition {
    /// Function's name for display
    pub fn name(&self) -> String {
I think it would be better to let the callsites decide whether they need to make the string copy, so passing back &str would, I think, make for a better API:
- pub fn name(&self) -> String {
+ pub fn name(&self) -> &str {
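The suggested signature can be sketched as follows. This is an illustration under assumptions rather than the code that was merged: the types are simplified stand-ins, the `BuiltIn` and `UDF` variant names are assumed, and the `Display` impl is a hypothetical caller added only to show a callsite that borrows.

```rust
use std::fmt;
use std::sync::Arc;

/// Stand-in for the built-in function enum (hypothetical two-member subset).
#[derive(Clone, Copy, Debug)]
pub enum BuiltinScalarFunction {
    Abs,
    Sqrt,
}

impl BuiltinScalarFunction {
    /// Hypothetical helper returning a static display name.
    pub fn name(&self) -> &'static str {
        match self {
            BuiltinScalarFunction::Abs => "abs",
            BuiltinScalarFunction::Sqrt => "sqrt",
        }
    }
}

/// Stand-in for a user-defined scalar function (hypothetical: name only).
#[derive(Clone, Debug)]
pub struct ScalarUDF {
    pub name: String,
}

#[derive(Clone, Debug)]
pub enum ScalarFunctionDefinition {
    BuiltIn(BuiltinScalarFunction),
    UDF(Arc<ScalarUDF>),
    Name(Arc<str>),
}

impl ScalarFunctionDefinition {
    /// Function's name for display. Borrowing avoids an allocation; callers
    /// that need ownership can call `.to_string()` themselves.
    pub fn name(&self) -> &str {
        match self {
            ScalarFunctionDefinition::BuiltIn(fun) => fun.name(),
            ScalarFunctionDefinition::UDF(udf) => udf.name.as_str(),
            ScalarFunctionDefinition::Name(name) => name.as_ref(),
        }
    }
}

/// A formatting callsite only needs a borrowed name.
impl fmt::Display for ScalarFunctionDefinition {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.name())
    }
}
```

Every variant already stores its name somewhere that outlives the call, so returning a borrow costs nothing, and the rare caller that needs an owned `String` pays for the copy explicitly.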
datafusion/expr/src/expr.rs
- Expr::ScalarUDF(ScalarUDF { fun, args }) => {
-     fmt_function(f, fun.name(), false, args, true)
+ Expr::ScalarFunction(ScalarFunction { func_def, args }) => {
+     fmt_function(f, &func_def.name(), false, args, true)
For example, this callsite doesn't need an owned String; &str would work well.
Looks great -- thanks again @2010YOUY01
Which issue does this PR close?
Part of #8157
Rationale for this change
See #8157 for the rationale:
Creation of Exprs for BuiltinScalarFunctions can be done statelessly.
However, we can't do that for ScalarUDFs: because they are registered within SessionState, creating such an Expr is only possible with access to that state.
For the ongoing plan to migrate functions from BuiltinScalarFunction to a ScalarUDF-based implementation (ref #8045), we would like the ScalarUDF-based functions to continue to support the original Expr construction API. The Expr struct should therefore have a variant for an unresolved name, and then we can add an AnalyzerRule to resolve the function.
This PR only includes refactor preparation for UDF Expr support.
What changes are included in this PR?
Unify Expr::ScalarFunction and Expr::ScalarUDF into a single variant. This way we can support an unresolved function name variant.
Also, merge the execution path for BuiltinScalarFunction and ScalarUDF, to make future work towards #8045 a bit easier.
Are these changes tested?
Covered by existing tests.
Are there any user-facing changes?
Yes, see comments for API changes