[FEA] Support left join(SMJ) when lower(String) and in are a part of the join condition on a single side of the join #11214

Open
viadea opened this issue Jul 17, 2024 · 3 comments
Labels
feature request New feature or request

Comments

@viadea
Collaborator

viadea commented Jul 17, 2024

I would like us to support AST in a left join (SMJ) when lower(String) is part of the join condition.

Env:
24.08 snapshot jar

Mini repro:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
import spark.sqlContext.implicits._
import org.apache.spark.sql.functions._

val data = Seq(
    Row(Row("Adam ","","Green"),"1","M",1000,"2024-01-01"),
    Row(Row("Bob ","Middle","Green"),"2","M",2000,"2024-12-12"),
    Row(Row("Cathy ","","Green"),"3","F",3000,"2022-03-04")
)

val schema = (new StructType()
  .add("name",new StructType()
    .add("firstname",StringType)
    .add("middlename",StringType)
    .add("lastname",StringType)) 
  .add("id",StringType)
  .add("gender",StringType)
  .add("salary",IntegerType)
  .add("birthday_str",StringType))

val df = spark.createDataFrame(spark.sparkContext.parallelize(data),schema).withColumn("birthday_dt",current_date().as("dt"))
df.printSchema

df.write.format("parquet").mode("overwrite").save("/tmp/testparquet")
spark.read.parquet("/tmp/testparquet").createOrReplaceTempView("df")

spark.conf.set("spark.sql.autoBroadcastJoinThreshold","1")

val query="""
select count(*) as cnt 
from df a left join df b 
on lower(a.id) in ('1','2')
and a.name=b.name

"""

spark.sql(query).collect

Not-supported messages:

      !Exec <SortMergeJoinExec> cannot run on GPU because not all expressions can be replaced
        @Expression <AttributeReference> name#321 could run on GPU
        @Expression <AttributeReference> name#334 could run on GPU
        !Expression <In> lower(id#322) IN (1,2) cannot run on GPU because AST is required and this expression does not support AST
          !Expression <Lower> lower(id#322) cannot run on GPU because AST is required and this expression does not support AST
            @Expression <AttributeReference> id#322 could run on GPU
@revans2
Collaborator

revans2 commented Jul 23, 2024

@viadea I am confused by this. lower is only part of the problem with the query you put up. Are you just asking about supporting lower in a non-equi-join case? Or do you want/need to support in also?

In either case we should be able to do this with changes based on #9759, but we would need to be sure to support it for all join cases, not just broadcast.

However that does not mean that we would support lower or in as a part of AST. It just means that we could support the example query you put up. Is that good enough?
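For readers who want to experiment before any plugin change lands, here is a minimal sketch of one hand rewrite that folds the one-sided condition into the equality key, so the join condition becomes a plain equi-join. It is an illustration only: it is not a description of #9759 or of what the plugin does internally, and this thread does not confirm that the rewritten plan stays on the GPU. The tables `l` and `r` and their contents are hypothetical stand-ins for the repro data, with a flat string `name` instead of the struct.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical standalone session; in spark-shell the existing `spark` can be used instead.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("one-sided-join-condition")
  .getOrCreate()
import spark.implicits._

// Small stand-in tables (not the data from the repro). A null id is included on purpose,
// and "adam" appears twice on the right side so that matches can fan out.
Seq[(String, String)](("1", "adam"), ("2", "bob"), ("3", "cathy"), (null, "dave"))
  .toDF("id", "name").createOrReplaceTempView("l")
Seq("adam", "adam", "bob", "cathy", "dave").toDF("name").createOrReplaceTempView("r")

// Same shape as the query in the issue: a left-side-only condition in the ON clause.
val original = spark.sql("""
  select count(*) as cnt
  from l a left join r b
  on lower(a.id) in ('1','2') and a.name = b.name
""")

// Rewrite: when the left-only condition is false or null, the key becomes null,
// and a null key never equals anything, so the left row is still kept but unmatched.
val rewritten = spark.sql("""
  select count(*) as cnt
  from l a left join r b
  on (case when lower(a.id) in ('1','2') then a.name end) = b.name
""")

val originalCnt = original.head.getLong(0)
val rewrittenCnt = rewritten.head.getLong(0)
assert(originalCnt == rewrittenCnt)
```

The rewrite relies on SQL null semantics for the left outer join: a left row whose condition is not true is null-extended in both forms. Whether a CASE expression over a struct key such as `name` in the repro is accepted on the GPU is an open assumption here.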

@viadea
Collaborator Author

viadea commented Jul 24, 2024

@revans2 I am not sure if the title is accurate, since I do not know why we have to fall back for such a query.
The ask is to make the above sample query run on GPU.
TBH, I am not sure whether it is because of the in, the lower, or both, since both of them show up in the CPU fallback messages.
Again, as long as the example join query can run on GPU, it is the goal of this FEA.

@revans2
Collaborator

revans2 commented Jul 30, 2024

as long as the example join query can run on GPU, it is the goal of this FEA.

Sounds good, we can make that work.

@revans2 revans2 changed the title from "[FEA] Support AST in a left join(SMJ) when lower(String)" to "[FEA] Support left join(SMJ) when lower(String) and in are a part of the join condition on a single side of the join" Jul 30, 2024
@mattahrens mattahrens removed the ? - Needs Triage Need team to review and classify label Jul 30, 2024