[SPARK-50792][SQL][FOLLOWUP] Improve the push down information for binary #49555

Status: Open · wants to merge 1 commit into base: master
```diff
@@ -22,7 +22,7 @@ import org.apache.commons.lang3.StringUtils
 import org.apache.spark.SparkException
 import org.apache.spark.sql.catalyst
 import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
-import org.apache.spark.sql.types.{DataType, IntegerType, StringType}
+import org.apache.spark.sql.types.{BinaryType, DataType, IntegerType, StringType}
 import org.apache.spark.util.ArrayImplicits._

 /**
```
```diff
@@ -388,12 +388,13 @@ private[sql] object HoursTransform {
 }

 private[sql] final case class LiteralValue[T](value: T, dataType: DataType) extends Literal[T] {
-  override def toString: String = {
-    if (dataType.isInstanceOf[StringType]) {
-      s"'${StringUtils.replace(s"$value", "'", "''")}'"
-    } else {
-      s"$value"
-    }
-  }
+  override def toString: String = dataType match {
+    case StringType => s"'${StringUtils.replace(s"$value", "'", "''")}'"
+    case BinaryType =>
+      assert(value.isInstanceOf[Array[Byte]])
+      val bytes = value.asInstanceOf[Array[Byte]]
+      bytes.map("%02X".format(_)).mkString("X'", "", "'")
```
**Member:**

It seems the `X` literal isn't universal; it depends on the dialect (#49452).

**Contributor (Author):**

Yes, you're right. The output is just a generic representation, like the others.

**Member:**

How about using `ApacheHex.encodeHexString(binary, ...` as in other places in the file?

**Contributor:**

Where else do we use `encodeHexString`?

**Contributor:**

Anyway, this is for display only; we just need a widely used, pretty string format for binary values.

**Contributor (Author)** — @beliefer, Jan 21, 2025:

Spark SQL already parses the format `X'123456'`.

The binary `Literal` shows the format `0x123456`:

```scala
  override def toString: String = value match {
    case null => "null"
    case binary: Array[Byte] => "0x" + ApacheHex.encodeHexString(binary, false)
```

Which one is better?
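To make the two candidates concrete, here is a small Python sketch (illustrative only, not part of the PR) rendering the same three bytes in both display formats under discussion:

```python
# The three bytes 0x12 0x34 0x56, rendered two ways.
data = bytes([0x12, 0x34, 0x56])

# SQL-style hex literal, the parser-accepted form:
sql_style = "X'" + data.hex().upper() + "'"

# 0x-prefixed form, as in the Literal.toString snippet quoted above:
prefix_style = "0x" + data.hex().upper()

print(sql_style)     # X'123456'
print(prefix_style)  # 0x123456
```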

```diff
+    case _ => s"$value"
+  }
 }
```
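A minimal Python sketch of the dispatch in the diff above (hypothetical, assuming string-typed type tags; the real code matches on Catalyst `DataType` objects): single quotes are doubled for strings, and bytes are rendered as an `X'…'` hex literal.

```python
def literal_to_string(value, data_type: str) -> str:
    # Mirrors the Scala match: StringType, BinaryType, fallthrough.
    if data_type == "string":
        # Escape embedded single quotes by doubling them, then wrap in quotes.
        return "'" + str(value).replace("'", "''") + "'"
    if data_type == "binary":
        assert isinstance(value, (bytes, bytearray))
        # Each byte as two upper-case hex digits, wrapped as X'...',
        # like bytes.map("%02X".format(_)).mkString("X'", "", "'").
        return "X'" + "".join(f"{b:02X}" for b in value) + "'"
    return str(value)

print(literal_to_string("it's", "string"))                     # 'it''s'
print(literal_to_string(bytes([0x12, 0x34, 0x56]), "binary"))  # X'123456'
print(literal_to_string(42, "int"))                            # 42
```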

```diff
@@ -3107,13 +3107,10 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHel
     sql(s"CREATE TABLE $tableName (binary_col BINARY)")
     sql(s"INSERT INTO $tableName VALUES ($binary)")

-    val select = s"SELECT * FROM $tableName WHERE binary_col = $binary"
-    val df = sql(select)
-    val filter = df.queryExecution.optimizedPlan.collect {
-      case f: Filter => f
-    }
-    assert(filter.isEmpty, "Filter is not pushed")
-    assert(df.collect().length === 1, s"Binary literal test failed: $select")
+    val df = sql(s"SELECT * FROM $tableName WHERE binary_col = $binary")
+    checkFiltersRemoved(df)
+    checkPushedInfo(df, "PushedFilters: [binary_col IS NOT NULL, binary_col = X'123456']")
+    checkAnswer(df, Row(Array(18, 52, 86)))
   }
 }
```
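A side note on the expected answer in the test: the literal `X'123456'` decodes to the three bytes 0x12, 0x34, 0x56, whose integer values are 18, 52 and 86 — hence `Row(Array(18, 52, 86))`. A quick Python check (illustrative only):

```python
# Decode the hex digits from X'123456' and list the byte values,
# matching the expected Row(Array(18, 52, 86)) in the test above.
values = list(bytes.fromhex("123456"))
print(values)  # [18, 52, 86]
```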
