[SPARK-48935][SQL][TESTS] Make `checkEvaluation` directly check the `Collation` expression itself in UT #47401
… constructor of `StringType`
@@ -28,14 +28,14 @@ class CollationExpressionSuite extends SparkFunSuite with ExpressionEvalHelper {
assert(collationId == 0)
val collateExpr = Collate(Literal("abc"), "UTF8_BINARY")
assert(collateExpr.dataType === StringType(collationId))
collateExpr.dataType.asInstanceOf[StringType].collationId == 0
This fixes a missed check.
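As a minimal sketch of why the flagged line (`collateExpr.dataType.asInstanceOf[StringType].collationId == 0`) checks nothing on its own: a bare comparison produces a Boolean that is discarded, so it can never fail a test unless it is passed to `assert`. This is an assumption about what the missed check refers to; `StringTypeSketch` and `MissedCheckSketch` are hypothetical stand-ins, not Spark classes.

```scala
// Hypothetical stand-in for Spark's StringType; only the collation id is modelled.
final case class StringTypeSketch(collationId: Int)

object MissedCheckSketch {
  // A bare comparison: the Boolean is computed and then thrown away,
  // so this "check" can never fail a test.
  def bareComparison(dataType: Any): Unit = {
    dataType.asInstanceOf[StringTypeSketch].collationId == 0
    ()
  }

  // The same comparison wrapped in assert: a mismatch raises an AssertionError.
  def assertedComparison(dataType: Any): Unit =
    assert(dataType.asInstanceOf[StringTypeSketch].collationId == 0)

  // Returns true if running `check` raised an AssertionError.
  def fails(check: => Unit): Boolean =
    try { check; false } catch { case _: AssertionError => true }
}
```

With a non-zero collation id, `bareComparison` passes silently while `assertedComparison` fails, which is the difference the fix relies on.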
val nullStr = Literal.create(null, StringType)
// Supported collations (StringTypeBinaryLcase)
val binaryCollation = StringType(CollationFactory.collationNameToId("UTF8_BINARY"))
val lowercaseCollation = StringType(CollationFactory.collationNameToId("UTF8_LCASE"))
// LikeAll
The following modifications are not directly related to this PR; they are made only to keep these checks consistent with the other checks in this PR.
cc @cloud-fan
@@ -88,7 +88,10 @@ class StringType private(val collationId: Int) extends AtomicType with Serializa
  */
 @Stable
 case object StringType extends StringType(0) {
-  private[spark] def apply(collationId: Int): StringType = new StringType(collationId)
+  private[spark] def apply(collationId: Int): StringType = {
+    assert (collationId >= 0 && collationId <= (1 << 12))
Where did we check the collation id before?
It did not check `collationId`; it only checked `collationName`:
val collationId = CollationFactory.collationNameToId(collation)
spark/common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java
Line 391 in 1a428c1
throw collationInvalidNameException(originalName);
spark/common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java
Line 615 in 1a428c1
throw collationInvalidNameException(originalName);
Additionally, an invalid `collationId` may cause exceptions in the code below, but unfortunately these are all methods that only fail when they are actually called:
class StringType private(val collationId: Int) extends AtomicType with Serializable {
  /**
   * Support for Binary Equality implies that strings are considered equal only if
   * they are byte for byte equal. E.g. all accent or case-insensitive collations are considered
   * non-binary. If this field is true, byte level operations can be used against this datatype
   * (e.g. for equality and hashing).
   */
  def supportsBinaryEquality: Boolean =
    CollationFactory.fetchCollation(collationId).supportsBinaryEquality
  def isUTF8BinaryCollation: Boolean =
    collationId == CollationFactory.UTF8_BINARY_COLLATION_ID
  def isUTF8BinaryLcaseCollation: Boolean =
    collationId == CollationFactory.UTF8_LCASE_COLLATION_ID
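To illustrate the point about deferred failures, here is a rough sketch under simplified assumptions: a type that stores its id without validation accepts any integer at construction, and the error only surfaces when a method that looks the id up is called. `LazyValidationSketch`, its collation table, and `collationName` are hypothetical and do not reproduce Spark's `CollationFactory`.

```scala
object LazyValidationSketch {
  // Hypothetical collation table; the real ids and lookup live in CollationFactory.
  private val knownCollations: Map[Int, String] =
    Map(0 -> "UTF8_BINARY", 1 -> "UTF8_LCASE")

  // Construction performs no validation, like a plain `new StringType(collationId)`.
  final class StringTypeSketch(val collationId: Int) {
    // The lookup, and therefore any failure, happens only when this is called.
    def collationName: String =
      knownCollations.getOrElse(
        collationId,
        throw new IllegalArgumentException(s"Unknown collation id: $collationId"))
  }
}
```

In this sketch `new LazyValidationSketch.StringTypeSketch(-5)` succeeds, and only a later call to `collationName` throws, which is the behaviour the proposed assertion in `apply` was meant to catch earlier.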
@@ -67,16 +67,16 @@ class CollationExpressionSuite extends SparkFunSuite with ExpressionEvalHelper {
   }

   test("collation on non-explicit default collation") {
-    checkEvaluation(Collation(Literal("abc")).replacement, "UTF8_BINARY")
+    checkEvaluation(Collation(Literal("abc")), "UTF8_BINARY")
Does `checkEvaluation` take care of `RuntimeReplaceable`?
Yes, as follows:
Lines 87 to 90 in 1a428c1
protected def checkEvaluation(
    expression: => Expression, expected: Any, inputRow: InternalRow = EmptyRow): Unit = {
  // Make it as method to obtain fresh expression everytime.
  def expr = prepareEvaluation(expression)
Lines 79 to 82 in 1a428c1
private def prepareEvaluation(expression: Expression): Expression = {
  val serializer = new JavaSerializer(new SparkConf()).newInstance()
  val resolver = ResolveTimeZone
  val expr = resolver.resolveTimeZones(replace(expression))
The replacement is applied recursively:
Lines 74 to 75 in 1a428c1
protected def replace(expr: Expression): Expression = expr match {
  case r: RuntimeReplaceable => replace(r.replacement)
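The call chain quoted above can be sketched with a toy expression tree. This is only an illustration of the recursive unwrapping, not Spark's implementation: `ReplaceSketch`, `Lit`, `Replaceable`, `CollationOf`, and `eval` are hypothetical names, and the sketch assumes every literal has the default collation and ignores child expressions.

```scala
object ReplaceSketch {
  sealed trait Expr
  final case class Lit(value: String) extends Expr

  // Stand-in for RuntimeReplaceable: an expression that is evaluated via its replacement.
  sealed trait Replaceable extends Expr { def replacement: Expr }

  final case class CollationOf(child: Expr) extends Replaceable {
    // Simplified assumption: the child always has the default collation.
    def replacement: Expr = Lit("UTF8_BINARY")
  }

  // Same shape as the quoted `replace`: unwrap replacements until none remain.
  def replace(expr: Expr): Expr = expr match {
    case r: Replaceable => replace(r.replacement)
    case other => other
  }

  // Evaluation always goes through `replace` first, as `prepareEvaluation` does.
  def eval(expr: Expr): String = replace(expr) match {
    case Lit(v) => v
    case other => throw new IllegalStateException(s"Cannot evaluate $other")
  }
}
```

Because `eval` unwraps the replacement itself, evaluating `CollationOf(Lit("abc"))` and evaluating its `.replacement` give the same result, so calling `.replacement` at the call site is redundant.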
@@ -88,7 +88,10 @@ class StringType private(val collationId: Int) extends AtomicType with Serializa
  */
 @Stable
 case object StringType extends StringType(0) {
-  private[spark] def apply(collationId: Int): StringType = new StringType(collationId)
+  private[spark] def apply(collationId: Int): StringType = {
I checked several callers of this function. The input collation id is mostly calculated from collation name. This assertion doesn't seem to be necessary and it's not cheap. Shall we revert?
I'm fine with other cleanups in this PR.
Okay, I have reverted it.
Thank you!
…Collation` expression itself in UT
`collationId` should be added to the constructor of `StringType`
Thanks, merging to master!
…Collation` expression itself in UT

### What changes were proposed in this pull request?
The PR aims to:
- make `checkEvaluation` directly check the `Collation` expression itself in UT, rather than `Collation(...).replacement`.
- fix a missed check in UT.

### Why are the changes needed?
When checking a `RuntimeReplaceable` expression in UT, there is no need to write `checkEvaluation(Collation(Literal("abc")).replacement, "UTF8_BINARY")`, because `checkEvaluation` already applies the replacement internally.
https://github.com/apache/spark/blob/1a428c1606645057ef94ac8a6cadbb947b9208a6/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala#L75

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Updated existing UT.
- Passed GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#47401 from panbingkun/SPARK-48935.

Authored-by: panbingkun <panbingkun@baidu.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
The PR aims to:
- make `checkEvaluation` directly check the `Collation` expression itself in UT, rather than `Collation(...).replacement`.
- fix a missed check in UT.

Why are the changes needed?
When checking a `RuntimeReplaceable` expression in UT, there is no need to write `checkEvaluation(Collation(Literal("abc")).replacement, "UTF8_BINARY")`, because `checkEvaluation` already applies the replacement internally.
spark/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala
Line 75 in 1a428c1

Does this PR introduce any user-facing change?
No.

How was this patch tested?
- Updated existing UT.
- Passed GA.

Was this patch authored or co-authored using generative AI tooling?
No.