Remove duplicate function name in its aliases list #10661
@@ -65,7 +65,6 @@ impl OptimizerRule for PushDownLimit {
         };

         let Limit { skip, fetch, input } = limit;
-        let input = input;
I'm not sure why the clippy check in CI didn't detect this, but it showed up when I ran clippy locally.
         }
     }

     for function in all_default_aggregate_functions() {
         let udaf = new_state.register_udaf(function).unwrap();
         if let Some(udaf) = udaf {
-            assert!(false, "Function {} already registered", udaf.name());
+            unreachable!("Function {} already registered", udaf.name());
Clippy suggests that I use panic or unreachable instead of assert. I'm not sure which one is better.
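For context, here is a minimal standard-library sketch of the pattern clippy flags. The `register` and `register_all` helpers and the `HashMap` registry are hypothetical stand-ins, not DataFusion APIs. Since `assert!(false, ...)` always fails, `panic!` or `unreachable!` states the intent directly; `unreachable!` additionally signals that the branch is believed impossible.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a function registry: maps a name to an id and
/// returns the previous entry, if any (the same shape as `HashMap::insert`).
fn register(registry: &mut HashMap<String, u32>, name: &str, id: u32) -> Option<u32> {
    registry.insert(name.to_string(), id)
}

/// Registers every name and aborts on the first duplicate.
fn register_all(names: &[&str]) -> HashMap<String, u32> {
    let mut registry = HashMap::new();
    for (id, &name) in names.iter().enumerate() {
        if let Some(previous) = register(&mut registry, name, id as u32) {
            // Before: `assert!(false, "...")`, which clippy's
            // `assertions_on_constants` lint flags because the condition
            // is a constant.
            // After: state directly that this branch must never run.
            unreachable!("Function {name} already registered with id {previous}");
        }
    }
    registry
}
```

Calling `register_all(&["median", "median"])` panics on the second entry, while a list of unique names returns the populated map.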
     function_factory: None,
 };

 for function in all_default_functions() {
This is a nice test -- though it is basically a copy of SessionContext::new
I wonder if we could make a test that is a bit simpler for example by creating a new
https://docs.rs/datafusion/latest/datafusion/execution/registry/struct.MemoryFunctionRegistry.html
Something like (untested)
let mut registry = MemoryFunctionRegistry::new();
for function in all_default_array_functions() {
    // Copy the name first: `function` is moved into the registry below
    let name = function.name().to_string();
    let existing_function = registry.register_udf(function).unwrap();
    assert!(existing_function.is_none(), "{name} was already registered");
}
// and similarly for the aggregate and array functions
🤔
Thank you @goldmedal 🙏
Thanks, @alamb! Actually, I tried to add some tests for datafusion/datafusion/execution/src/registry.rs Lines 174 to 176 in 4709fc6
Then, I found that the aliases are only inserted by datafusion/datafusion/core/src/execution/context/mod.rs Lines 2278 to 2283 in 4709fc6
That's why I added tests for I'm not sure, but maybe we should also insert aliases in |
I think this would make sense -- though maybe we can do it as a follow-on PR
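To make the alias discussion concrete, here is a rough sketch of a registry that inserts a function under its name and under every alias. `Udf` and `Registry` are hypothetical types written for illustration only; they are not the DataFusion `MemoryFunctionRegistry` or `SessionState` implementations. The sketch assumes aliases share one map with primary names, which is why a function that lists its own name as an alias collides with itself.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for a scalar UDF: a primary name plus aliases.
#[derive(Debug)]
struct Udf {
    name: String,
    aliases: Vec<String>,
}

/// Hypothetical registry keyed by both the primary name and every alias.
#[derive(Default)]
struct Registry {
    functions: HashMap<String, Arc<Udf>>,
}

impl Registry {
    /// Registers `udf` under its name and aliases and returns every key that
    /// was already present, so callers can detect collisions.
    fn register(&mut self, udf: Udf) -> Vec<String> {
        let udf = Arc::new(udf);
        let mut collisions = Vec::new();
        let keys = std::iter::once(&udf.name).chain(udf.aliases.iter());
        for key in keys {
            // `HashMap::insert` returns the previous value for an existing key.
            if self.functions.insert(key.clone(), Arc::clone(&udf)).is_some() {
                collisions.push(key.clone());
            }
        }
        collisions
    }
}
```

With this shape, registering a function named `median` whose alias list also contains `median` reports `median` as a collision even in an otherwise empty registry.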
Thank you for this find @goldmedal -- very cool. I think we can simplify the test a bit which would be good to consider.
@@ -2860,6 +2863,57 @@ mod tests {
         Ok(())
     }

     #[tokio::test]
     async fn test_register_default_functions() -> Result<()> {
         let config = SessionConfig::new();
While I understand the proposal to test with code like SessionState::new, I think it is problematic because if SessionState::new is changed, then this test will no longer match what it is doing.
I think the core of this test is verifying that there are no duplicate names in the alias lists.
Thus, perhaps we could write tests for all_default_functions, etc like
let mut names = HashSet::new();
for func in all_default_functions() {
    // HashSet::insert returns false when the value is already present
    assert!(names.insert(func.name().to_string()), "{} duplicated", func.name());
    for alias in func.aliases() {
        assert!(names.insert(alias.to_string()), "alias {alias} duplicated");
    }
}
What do you think?
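A self-contained version of that idea, using only the standard library, might look like the following. `Func` and `duplicate_names` are hypothetical names for illustration; in the real test the loop would run over `all_default_functions()` instead of a hand-built slice.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a function with a name and aliases.
struct Func {
    name: &'static str,
    aliases: &'static [&'static str],
}

/// Returns every name or alias that appears more than once across `funcs`.
fn duplicate_names(funcs: &[Func]) -> Vec<&'static str> {
    let mut seen = HashSet::new();
    let mut duplicates = Vec::new();
    for func in funcs {
        let names = std::iter::once(func.name).chain(func.aliases.iter().copied());
        for name in names {
            // `HashSet::insert` returns `false` when the value is already present.
            if !seen.insert(name) {
                duplicates.push(name);
            }
        }
    }
    duplicates
}
```

Collecting the duplicates instead of asserting on the first one lets a failing test report every offending name at once. Because the check does not depend on how `SessionState::new` registers functions, it keeps working if that constructor changes.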
Good idea. I'll add this test and remove the test for SessionState::new. Thanks
340329d to 8c2c828
     let mut names = HashSet::new();
     for func in all_default_aggregate_functions() {
         assert!(
             names.insert(func.name().to_string().to_lowercase()),
I think the function name and alias should be case-insensitive. Therefore, lowercase them here.
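As a small illustration of the case-insensitive check, assuming names are compared after lowercasing (the helper name `insert_case_insensitive` is hypothetical):

```rust
use std::collections::HashSet;

/// Inserts the lowercased form of `name` and reports whether it was new.
fn insert_case_insensitive(names: &mut HashSet<String>, name: &str) -> bool {
    names.insert(name.to_lowercase())
}
```

With this helper, `MEDIAN` and `median` count as the same name, so the second insert returns `false`. Note that `str::to_lowercase` already returns a `String`, so a preceding `.to_string()` is not needed.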
I'm confused about the new
I removed the duplicate name as I did for other functions in 4086688. However, the planner can't resolve the lowercase function name. I have tried and confirmed that other functions (e.g., Do you have any idea? @alamb I found this function was created in #10644. |
I think I got it now. In #10644, See:
That's why we don't need to add a lowercase alias for I reverted the change for |
so as
Why is adding a lowercase alias a workaround for UDAF? Without the alias, we will need to check the lowercase in many places.
Thanks for the information! @jayzhan211
Since I thought we needed to list all case patterns for the function name (e.g.,
The function call will be transferred to
It will also be changed to
So, how about setting UDAF names to lowercase (e.g., |
Thank you @goldmedal and @jayzhan211 -- I think this is a really nice improvement. Not only does it fix the bug, it also makes the code easier to work with and avoids introducing the same regression again.
I took the liberty of merging up from main to resolve a merge conflict
This makes sense to me. Shall we file a follow-on ticket for it? I don't think we need to fix it in this PR (as this PR doesn't make the situation any better or worse, from what I can tell).
Thanks @alamb, I agree with you. Fixing it in a follow-up ticket is better. |
Thanks, @goldmedal and @alamb. I think we can merge this
Thanks again @alamb @jayzhan211 |
* remove duplicate name in function name aliases for array-function
* add test for registering default functions
* rename test case
* add tests for agg and core function and refactor the test
* remove unused mut
* add comments for public function
* cargo fmt
* fix clippy
* fix missing list_element
* fix clippy
* remove duplicate aliase name for median
* add test for function list and remove the test for SessionState
* remove the debug message
* revert the change for medain and remove case insensitive test
---------
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Which issue does this PR close?
Closes #10658
Rationale for this change
What changes are included in this PR?
Are these changes tested?
yes
Also checked starting the CLI with the DEBUG level.
Are there any user-facing changes?
No