feat(spark): Custom annotation for more string functions #4156
Conversation
Can we consolidate `_annotate_by_same_args` and `_annotate_by_args` somehow? They look very similar; perhaps a flag could help?

Some parts are overlapping, but the "access pattern" is different. I think as each dialect is starting to form its own annotators we should keep that logic separate from the common functions; if some boilerplate parts are getting repetitive, we should of course factor these out in time. Regarding the Spark logic, I came up with a new set of rules for that annotator that biases
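The flag-based consolidation floated above could look roughly like the sketch below. The function name, signature, and type strings are purely illustrative, not sqlglot's actual helpers; it only demonstrates how one annotator with a flag could cover both "all args must agree" and "derive from args" access patterns.

```python
from typing import List


def annotate_by_args(arg_types: List[str], require_same: bool = False) -> str:
    """Illustrative merge of two similar annotators via a flag
    (hypothetical names; not sqlglot's real API).

    - require_same=False: derive a result type from the arguments.
    - require_same=True:  keep a concrete type only when every
      argument agrees, otherwise fall back to "unknown".
    """
    if not arg_types:
        return "unknown"
    if require_same:
        first = arg_types[0]
        return first if all(t == first for t in arg_types) else "unknown"
    # Simplified derivation: BINARY "wins" only when every operand is BINARY.
    return "binary" if all(t == "binary" for t in arg_types) else "string"
```

Whether this is worth doing depends on how far the dialect-specific access patterns diverge, which is the maintainer's point above.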
Force-pushed from 1caba37 to ed25681, then from ed25681 to 0c21fec.
Let's test unknowns as well
@georgesittas I did add a few test cases with UNKNOWNs in the previous commit; do you have more cases in mind?

Ah, I didn't see them, apologies. LGTM.
This is a follow-up on #4004: during my investigation for `SUBSTRING` I came across other Spark string functions (`CONCAT`, `LPAD`, `RPAD`) whose return type depends on their input types. Although it's not documented clearly, these functions return a `BINARY` only if all of their "string" operands are `BINARY`; otherwise they return a `STRING`.

Docs: Databricks LPAD | Databricks CONCAT
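The rule described above can be sketched as a small standalone function. This is a minimal model of the behavior, not sqlglot's annotator code; the enum and function names are illustrative, and UNKNOWN propagation (discussed in the review) is deliberately left out.

```python
from enum import Enum


class DataType(Enum):
    STRING = "string"
    BINARY = "binary"


def spark_string_fn_return_type(arg_types) -> DataType:
    """Sketch of the Spark rule for CONCAT/LPAD/RPAD: the result is
    BINARY only when every string-like operand is BINARY; if any
    operand is STRING, the result is STRING.
    """
    types = list(arg_types)
    if types and all(t is DataType.BINARY for t in types):
        return DataType.BINARY
    return DataType.STRING
```

For example, `CONCAT(binary_col, binary_col)` would annotate as `BINARY`, while `CONCAT(binary_col, string_col)` would annotate as `STRING`.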