-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move make_array
, array_{intersect, union, distinct} to
datafusion-functions-array`
#9222
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is a start on moving array functions -- I still need to work on this a bit more. FYI @jayzhan211
@@ -229,121 +228,6 @@ fn check_datatypes(name: &str, args: &[&ArrayRef]) -> Result<()> { | |||
Ok(()) | |||
} | |||
|
|||
/// Convert one or more [`ArrayRef`] of the same type into a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the code from here to the datafusion-function-array crate
@@ -564,7 +564,7 @@ enum ScalarFunction { | |||
Sqrt = 17; | |||
Tan = 18; | |||
Trunc = 19; | |||
Array = 20; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
array is an alias for make_array it seems
); | ||
} | ||
|
||
fn align_array_dimensions<O: OffsetSizeTrait>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copied this over from array_expressions where it is used in one other place. I'll try and figure out how to remove it shortly
@@ -1,254 +0,0 @@ | |||
// Licensed to the Apache Software Foundation (ASF) under one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved this code to to_string.rs
to align with keeping the ScalarFunction definitions inline with their implementations
use std::any::Any; | ||
|
||
// Create static instances of ScalarUDFs for each function | ||
make_udf_function!(ArrayToString, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved to to_string.rs
} | ||
|
||
// TODO figure out how to generalize this so it isn't hard coded by name | ||
match (fn_name(left), fn_name(right)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure why we have specific rewrites here for functions (that can't be done by constant folding or some other method). I don't like special casing the names
I need to look into it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Weijun-H do you happen to remember the rationale for this code?
BuiltinScalarFunction::MakeArray, | ||
values, | ||
))) | ||
let make_array = self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is interesting -- now the SQL planner needs to replace the array literals with a scalar function, but it does the lookup by name. Rather than having this string hard coded maybe there should be a list of such functions somewhere 🤔
make_array, | ||
array, | ||
"returns an Arrow array using the specified input expressions.", | ||
udf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be make_array
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I struggled to come up with a good name. The macro is basically creating two functions
/// fluent expr style
fn make_array(array: Expr) -> Expr {
}
/// return a ScalarUDF
fn udf() -> ScalarUdf {
...
}
The udf one is used to register the function:
/// Registers all enabled packages with a [`FunctionRegistry`]
pub fn register_all(registry: &mut dyn FunctionRegistry) -> Result<()> {
let functions: Vec<Arc<ScalarUDF>> = vec![
to_string::udf(),
make_array::udf(),
set_ops::union_udf(),
set_ops::intersect_udf(),
set_ops::distinct_udf(),
];
So it can't be the expr_fn name , but udf
is also pretty generic 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as_udf() ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or maybe make_udf
🤔
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should name it make_array_udf
or something else including a function name since we might not be able to reuse udf
or make_udf
for other array functions. Prefixing it with array_function seems like a better choice.
Draft while I figure out:
Which issue does this PR close?
Part of #8045
re #9100
Rationale for this change
Set the pattern for how more array functions can be moved from
datafusion
todatafusion-functions-array
cratesWhat changes are included in this PR?
make_array
,array_intersect
,array_union
andarray_distinct
function to datafusion-functions-arrayAre these changes tested?
Yes, by existing tests
Are there any user-facing changes?
No