feat(spark): Custom annotation for more string functions #4156
Conversation
Can we consolidate `_annotate_by_same_args` and `_annotate_by_args` somehow? They look very similar; perhaps a flag could help?

Some parts are overlapping, but the "access pattern" is different. I think as each dialect is starting to form its own annotators we should keep that logic separate from the common functions; if some boilerplate parts are getting repetitive, we should of course factor these out in time. Regarding the Spark logic, I came up with a new set of rules for that annotator that biases
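The flag-based consolidation floated above could look roughly like the sketch below. The function name, signature, and type strings are purely illustrative, not sqlglot's actual helpers; it only demonstrates how one annotator with a flag could cover both "all args must agree" and "derive from args" access patterns.

```python
from typing import List


def annotate_by_args(arg_types: List[str], require_same: bool = False) -> str:
    """Illustrative merge of two similar annotators via a flag
    (hypothetical names; not sqlglot's real API).

    - require_same=False: derive a result type from the arguments.
    - require_same=True:  keep a concrete type only when every
      argument agrees, otherwise fall back to "unknown".
    """
    if not arg_types:
        return "unknown"
    if require_same:
        first = arg_types[0]
        return first if all(t == first for t in arg_types) else "unknown"
    # Simplified derivation: BINARY "wins" only when every operand is BINARY.
    return "binary" if all(t == "binary" for t in arg_types) else "string"
```

Whether this is worth doing depends on how far the dialect-specific access patterns diverge, which is the maintainer's point above.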
Force-pushed from 1caba37 to ed25681, then from ed25681 to 0c21fec.
Let's test unknowns as well
@georgesittas I did add a few test cases with UNKNOWNs in the previous commit; do you have more cases in mind?

Ah, I didn't see them, apologies. LGTM.
This is a follow-up on #4004: during my investigation for `SUBSTRING` I came across other Spark string functions (`CONCAT`, `LPAD`, `RPAD`) whose return type depends on their input types. Although it's not documented clearly, these functions return a `BINARY` only if all of their "string" operands are `BINARY`; otherwise they return a `STRING`.

Docs: Databricks LPAD | Databricks CONCAT
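The rule described above can be sketched as a small standalone function. This is a minimal model of the behavior, not sqlglot's annotator code; the enum and function names are illustrative, and UNKNOWN propagation (discussed in the review) is deliberately left out.

```python
from enum import Enum


class DataType(Enum):
    STRING = "string"
    BINARY = "binary"


def spark_string_fn_return_type(arg_types) -> DataType:
    """Sketch of the Spark rule for CONCAT/LPAD/RPAD: the result is
    BINARY only when every string-like operand is BINARY; if any
    operand is STRING, the result is STRING.
    """
    types = list(arg_types)
    if types and all(t is DataType.BINARY for t in types):
        return DataType.BINARY
    return DataType.STRING
```

For example, `CONCAT(binary_col, binary_col)` would annotate as `BINARY`, while `CONCAT(binary_col, string_col)` would annotate as `STRING`.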