Substring type inference missing binary case #4002

racevedoo · 2024-08-29T10:54:05Z

Fully reproducible code snippet

>>> import sqlglot
>>> from sqlglot.optimizer import optimize
>>> optimize(sqlglot.parse_one("select col from tbl", read="spark"), schema={"tbl": {"col": "BINARY"}}).expressions[0].type
DataType(this=Type.BINARY, nested=False)
>>> optimize(sqlglot.parse_one("select substring(col, 2, 3) as x from tbl", read="spark"), schema={"tbl": {"col": "BINARY"}}).expressions[0].type
DataType(this=Type.VARCHAR)

In the second expression, the expected output type is Type.BINARY, but VARCHAR is inferred instead. In the Spark shell (Spark 3.5.1), substring on a binary input returns binary:

scala> val df = spark.sql("select cast('thing' as binary)")
df: org.apache.spark.sql.DataFrame = [CAST(thing AS BINARY): binary]

scala> val df = spark.sql("select substring(cast('thing' as binary), 1, 3)")
df: org.apache.spark.sql.DataFrame = [substring(CAST(thing AS BINARY), 1, 3): binary]

From my understanding, the return data type for Substring is hardcoded to VARCHAR:

sqlglot/sqlglot/dialects/dialect.py

Lines 609 to 617 in fcaae87

    exp.DataType.Type.VARCHAR: {
        exp.ArrayConcat,
        exp.Concat,
        exp.ConcatWs,
        exp.DateToDateStr,
        exp.GroupConcat,
        exp.Initcap,
        exp.Lower,
        exp.Substring,
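
To make the expected behavior concrete, here is a minimal sketch of the difference between a fixed return type and one derived from the first argument. It does not use sqlglot: `Expr`, `FIXED_TYPES`, `BY_ARGUMENT`, `annotate` and `substring` are hypothetical names made up for this illustration, and the sketch is not a description of how the library or the eventual fix is implemented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Expr:
    """Hypothetical expression node for this sketch; not a sqlglot class."""
    kind: str                       # "COLUMN", "LITERAL", "SUBSTRING", "LOWER", ...
    args: Tuple["Expr", ...] = ()   # child expressions
    value: object = None            # column name or literal value
    type: Optional[str] = None      # filled in by annotate()


# Fixed result types, in the spirit of the table quoted above.
FIXED_TYPES: Dict[str, str] = {"LOWER": "VARCHAR", "INITCAP": "VARCHAR"}

# Input-dependent result types: SUBSTRING takes the type of its first argument.
BY_ARGUMENT: Dict[str, Callable[[Expr], Optional[str]]] = {
    "SUBSTRING": lambda e: e.args[0].type,
}


def annotate(expr: Expr, column_types: Dict[str, str]) -> Expr:
    """Annotate the children first, then derive this node's type."""
    for child in expr.args:
        annotate(child, column_types)

    if expr.kind == "COLUMN":
        expr.type = column_types.get(expr.value)
    elif expr.kind == "LITERAL":
        expr.type = "INT" if isinstance(expr.value, int) else "VARCHAR"
    elif expr.kind in BY_ARGUMENT:
        expr.type = BY_ARGUMENT[expr.kind](expr)
    else:
        expr.type = FIXED_TYPES.get(expr.kind)
    return expr


def substring(col: str) -> Expr:
    """Build substring(col, 2, 3), as in the snippet from the report."""
    return Expr(
        "SUBSTRING",
        (Expr("COLUMN", value=col), Expr("LITERAL", value=2), Expr("LITERAL", value=3)),
    )


print(annotate(substring("col"), {"col": "BINARY"}).type)   # BINARY
print(annotate(substring("col"), {"col": "VARCHAR"}).type)  # VARCHAR
```

With a lookup like `BY_ARGUMENT` consulted before the fixed table, a binary column keeps its type through substring, which matches what the Spark shell reports above, while string inputs still come out as VARCHAR.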


racevedoo · 2024-08-29T13:25:27Z

Thanks for the quick fix!

georgesittas assigned VaggelisD Aug 29, 2024

VaggelisD mentioned this issue Aug 29, 2024

fix(spark): Custom annotation for SUBSTRING() #4004

Merged

georgesittas closed this as completed in #4004 Aug 29, 2024
