Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(functions): support Dictionary for string and int functions #7262

Merged
merged 2 commits into from
Aug 11, 2023

Conversation

appletreeisyellow
Copy link
Contributor

Which issue does this PR close?

Closes #5471.

Rationale for this change

Running upper(col) where col is a dictionary results in an internal error:

Internal error: The "upper" function can only accept strings.. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker

Other functions like length and character_length also have the same issue. Here is a list of all the functions with the same issue:

$ grep utf8_to_str_type

datafusion/expr/src/built_in_function.rs
600:                utf8_to_str_type(&input_expr_types[0], "btrim")
628:                utf8_to_str_type(&input_expr_types[0], "initcap")
630:            BuiltinScalarFunction::Left => utf8_to_str_type(&input_expr_types[0], "left"),
632:                utf8_to_str_type(&input_expr_types[0], "lower")
634:            BuiltinScalarFunction::Lpad => utf8_to_str_type(&input_expr_types[0], "lpad"),
636:                utf8_to_str_type(&input_expr_types[0], "ltrim")
638:            BuiltinScalarFunction::MD5 => utf8_to_str_type(&input_expr_types[0], "md5"),
651:                utf8_to_str_type(&input_expr_types[0], "regex_replace")
654:                utf8_to_str_type(&input_expr_types[0], "repeat")
657:                utf8_to_str_type(&input_expr_types[0], "replace")
660:                utf8_to_str_type(&input_expr_types[0], "reverse")
663:                utf8_to_str_type(&input_expr_types[0], "right")
665:            BuiltinScalarFunction::Rpad => utf8_to_str_type(&input_expr_types[0], "rpad"),
667:                utf8_to_str_type(&input_expr_types[0], "rtrimp")
711:                utf8_to_str_type(&input_expr_types[0], "split_part")
718:                utf8_to_str_type(&input_expr_types[0], "substr")
740:                utf8_to_str_type(&input_expr_types[0], "translate")
742:            BuiltinScalarFunction::Trim => utf8_to_str_type(&input_expr_types[0], "trim"),
744:                utf8_to_str_type(&input_expr_types[0], "upper")
$ grep utf8_to_int_type

datafusion/expr/src/built_in_function.rs
597:                utf8_to_int_type(&input_expr_types[0], "bit_length")
603:                utf8_to_int_type(&input_expr_types[0], "character_length")
645:                utf8_to_int_type(&input_expr_types[0], "octet_length")
715:                utf8_to_int_type(&input_expr_types[0], "strpos")

What changes are included in this PR?

Support Dictionary data type for string functions and int functions

Are these changes tested?

Yes, tests are added for all the functions listed above

Are there any user-facing changes?

User will be able to use upper function and other string and int functions where col is a dictionary without problem

@github-actions github-actions bot added logical-expr Logical plan and expressions core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Aug 10, 2023
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks wonderful @appletreeisyellow -- the fix is so beautiful and the tests very well done. 🏅

@alamb
Copy link
Contributor

alamb commented Aug 11, 2023

I will plan to merge this later today unless anyone else has comments

Copy link
Contributor

@liukun4515 liukun4515 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@alamb alamb merged commit 7a5354f into apache:main Aug 11, 2023
@alamb
Copy link
Contributor

alamb commented Aug 11, 2023

Thank you @appletreeisyellow and @liukun4515

@alamb
Copy link
Contributor

alamb commented Aug 11, 2023

🤔 I just tested this out in IOx and it looks like it still doesn't work for arrays

create table foo(x varchar) as values ('foo'), ('bar');
create table foo_dict as select arrow_cast(x, 'Dictionary(Int32, Utf8)') as x from foo;
select upper(x) from foo_dict;

Results in an internal error:

DataFusion CLI v28.0.0
❯ create table foo(x varchar) as values ('foo'), ('bar');
create table foo_dict as select arrow_cast(x, 'Dictionary(Int32, Utf8)') as x from foo;
select upper(x) from foo_dict;

0 rows in set. Query took 0.034 seconds.

0 rows in set. Query took 0.039 seconds.

Internal error: The "upper" function can only accept strings.. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker

I will reopen #5471

@alamb
Copy link
Contributor

alamb commented Aug 11, 2023

TO be clear I think this PR is an improvement, it just isn't the entire fix

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
3 participants