
Substring type inference missing binary case #4002

Closed
racevedoo opened this issue Aug 29, 2024 · 1 comment · Fixed by #4004
Comments

@racevedoo

Fully reproducible code snippet

>>> import sqlglot
>>> from sqlglot.optimizer import optimize
>>> optimize(sqlglot.parse_one("select col from tbl", read="spark"), schema={"tbl": {"col": "BINARY"}}).expressions[0].type
DataType(this=Type.BINARY, nested=False)
>>> optimize(sqlglot.parse_one("select substring(col, 2, 3) as x from tbl", read="spark"), schema={"tbl": {"col": "BINARY"}}).expressions[0].type
DataType(this=Type.VARCHAR)

In the second query, the expected output type is Type.BINARY, because that is what Spark itself returns. In the Spark shell (Spark 3.5.1), substring on a binary value yields binary:

scala> val df = spark.sql("select cast('thing' as binary)")
df: org.apache.spark.sql.DataFrame = [CAST(thing AS BINARY): binary]

scala> val df = spark.sql("select substring(cast('thing' as binary), 1, 3)")
df: org.apache.spark.sql.DataFrame = [substring(CAST(thing AS BINARY), 1, 3): binary]
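To make the Spark behavior concrete, here is a minimal Python model of substring(expr, pos, len) for a positive, 1-based pos, where slicing keeps the input's kind (bytes in, bytes out). The function name spark_substring is hypothetical, and the sketch deliberately does not model Spark's handling of zero or negative positions.

```python
def spark_substring(value, pos, length):
    """Hypothetical model of Spark's substring for pos >= 1 (1-based).

    Slicing preserves the input type: str -> str, bytes -> bytes.
    Zero and negative positions are intentionally not modeled here.
    """
    if pos < 1:
        raise ValueError("this sketch only models positive, 1-based positions")
    if length <= 0:
        return value[:0]  # empty value of the same type as the input
    start = pos - 1  # convert the 1-based SQL position to a 0-based index
    return value[start:start + length]


print(spark_substring(b"thing", 1, 3))  # prints: b'thi'
print(spark_substring("thing", 2, 3))   # prints: hin
```

The point of the model is that the result type follows the argument type, which is what the type annotation in the repro above does not do.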

As far as I can tell, the optimizer's type annotation hardcodes the return type of exp.Substring to VARCHAR:

exp.DataType.Type.VARCHAR: {
    exp.ArrayConcat,
    exp.Concat,
    exp.ConcatWs,
    exp.DateToDateStr,
    exp.GroupConcat,
    exp.Initcap,
    exp.Lower,
    exp.Substring,
    ...
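A minimal sketch of the difference between a fixed result type and an argument-dependent one. Everything here is hypothetical: the Type enum is a small stand-in for sqlglot's exp.DataType.Type, and annotate_fixed / annotate_substring are illustrative helpers, not sqlglot APIs. This is not necessarily how the fix in #4004 was implemented.

```python
from enum import Enum


class Type(Enum):
    # Hypothetical stand-in for sqlglot's exp.DataType.Type (a few members only).
    VARCHAR = "VARCHAR"
    TEXT = "TEXT"
    BINARY = "BINARY"
    VARBINARY = "VARBINARY"


BINARY_TYPES = {Type.BINARY, Type.VARBINARY}

# Fixed-result rule, analogous to the excerpt above: SUBSTRING is always VARCHAR.
FIXED_RESULT = {"SUBSTRING": Type.VARCHAR}


def annotate_fixed(func, arg_type):
    """Ignores the argument type entirely, which loses BINARY inputs."""
    return FIXED_RESULT[func]


def annotate_substring(arg_type):
    """Argument-dependent rule: a binary input keeps its binary type;
    anything else falls back to the existing VARCHAR default."""
    if arg_type in BINARY_TYPES:
        return arg_type
    return Type.VARCHAR


print(annotate_fixed("SUBSTRING", Type.BINARY))  # prints: Type.VARCHAR
print(annotate_substring(Type.BINARY))           # prints: Type.BINARY
```

Keeping VARCHAR as the fallback means string inputs are annotated exactly as before; only binary inputs change.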

@racevedoo (author)

Thanks for the quick fix!


2 participants