[SPARK-48935][SQL][TESTS] Make checkEvaluation directly check the Collation expression itself in UT #47401

Closed
wants to merge 5 commits into from

Conversation

@panbingkun panbingkun commented Jul 18, 2024

What changes were proposed in this pull request?

This PR aims to:

  • make checkEvaluation directly check the Collation expression itself in UT, rather than Collation(...).replacement.
  • fix a missed check in UT.

Why are the changes needed?

When checking a RuntimeReplaceable expression in a UT, there is no need to write checkEvaluation(Collation(Literal("abc")).replacement, "UTF8_BINARY"), because checkEvaluation already performs the same replacement internally.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

  • Updated the existing UT.
  • Pass GA.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Jul 18, 2024
@@ -28,14 +28,14 @@ class CollationExpressionSuite extends SparkFunSuite with ExpressionEvalHelper {
assert(collationId == 0)
val collateExpr = Collate(Literal("abc"), "UTF8_BINARY")
assert(collateExpr.dataType === StringType(collationId))
collateExpr.dataType.asInstanceOf[StringType].collationId == 0
Contributor Author

Fix a missed check.
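
To illustrate why a bare comparison like the one above counts as a missed check, here is a minimal standalone sketch. `FakeStringType` and `MissedCheckDemo` are hypothetical names used only for illustration and are not Spark classes. The point is that a Boolean expression used as a statement is evaluated and then discarded, so it cannot fail a test unless it is wrapped in `assert`.

```scala
// Hypothetical stand-in for Spark's StringType; only the field used here is modeled.
final case class FakeStringType(collationId: Int)

object MissedCheckDemo {
  // Returns true if running `body` throws an AssertionError.
  def fails(body: => Unit): Boolean =
    try { body; false } catch { case _: AssertionError => true }

  // A value that should NOT pass a "collationId == 0" check.
  val wrong: FakeStringType = FakeStringType(collationId = 1)

  // Bare comparison: evaluated, result discarded, nothing is verified.
  val bareComparisonFails: Boolean = fails { wrong.collationId == 0; () }

  // Same comparison wrapped in assert: the mismatch is reported.
  val assertedComparisonFails: Boolean = fails { assert(wrong.collationId == 0) }
}
```

In this sketch only the asserted form detects the wrong ID; the bare form passes silently, which is presumably the kind of check this change restores.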

val nullStr = Literal.create(null, StringType)
// Supported collations (StringTypeBinaryLcase)
val binaryCollation = StringType(CollationFactory.collationNameToId("UTF8_BINARY"))
val lowercaseCollation = StringType(CollationFactory.collationNameToId("UTF8_LCASE"))
// LikeAll
Contributor Author

The following modifications are not strictly related to this PR; they are made only to keep these checks consistent with the rest of the checks in this PR.

@panbingkun panbingkun marked this pull request as ready for review July 18, 2024 08:57
@panbingkun
Contributor Author

cc @cloud-fan

@@ -88,7 +88,10 @@ class StringType private(val collationId: Int) extends AtomicType with Serializa
*/
@Stable
case object StringType extends StringType(0) {
- private[spark] def apply(collationId: Int): StringType = new StringType(collationId)
+ private[spark] def apply(collationId: Int): StringType = {
+ assert (collationId >= 0 && collationId <= (1 << 12))
Contributor

Where did we check the collation ID before?

Contributor Author

Nothing checked collationId before; only the collation name was checked:

val collationId = CollationFactory.collationNameToId(collation)

throw collationInvalidNameException(originalName);

throw collationInvalidNameException(originalName);

Contributor Author

Additionally, the methods below may throw an exception for an invalid collationId, but unfortunately they only do so when they are actually called:

class StringType private(val collationId: Int) extends AtomicType with Serializable {
/**
* Support for Binary Equality implies that strings are considered equal only if
* they are byte for byte equal. E.g. all accent or case-insensitive collations are considered
* non-binary. If this field is true, byte level operations can be used against this datatype
* (e.g. for equality and hashing).
*/
def supportsBinaryEquality: Boolean =
CollationFactory.fetchCollation(collationId).supportsBinaryEquality
def isUTF8BinaryCollation: Boolean =
collationId == CollationFactory.UTF8_BINARY_COLLATION_ID
def isUTF8BinaryLcaseCollation: Boolean =
collationId == CollationFactory.UTF8_LCASE_COLLATION_ID
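
The difference between failing on use and failing at construction can be sketched as follows. This is a toy model under stated assumptions: `FakeCollations`, `LazyStringType`, `EagerStringType`, and the valid ID range 0 to 1 are all hypothetical, and `require` is used here instead of the `assert` proposed in the diff.

```scala
// Hypothetical registry standing in for CollationFactory; only IDs 0 and 1 exist here.
object FakeCollations {
  private val names = Map(0 -> "UTF8_BINARY", 1 -> "UTF8_LCASE")

  // Throws for an unknown ID, but only when somebody actually looks it up.
  def fetch(id: Int): String =
    names.getOrElse(id, throw new IllegalArgumentException(s"invalid collation id: $id"))
}

// Lazy variant: construction always succeeds; the failure surfaces on first use.
final class LazyStringType(val collationId: Int) {
  def collationName: String = FakeCollations.fetch(collationId)
}

// Eager variant: the constructor rejects an ID outside the assumed range.
final class EagerStringType(val collationId: Int) {
  require(collationId >= 0 && collationId <= 1, s"invalid collation id: $collationId")
  def collationName: String = FakeCollations.fetch(collationId)
}

object ValidationDemo {
  // Returns true if evaluating `body` throws an IllegalArgumentException.
  def throws(body: => Any): Boolean =
    try { body; false } catch { case _: IllegalArgumentException => true }
}
```

In this sketch `new LazyStringType(99)` succeeds and only `collationName` throws, while `new EagerStringType(99)` throws immediately. The eager check runs on every construction, which is the cost weighed later in this review.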

@@ -67,16 +67,16 @@ class CollationExpressionSuite extends SparkFunSuite with ExpressionEvalHelper {
}

test("collation on non-explicit default collation") {
- checkEvaluation(Collation(Literal("abc")).replacement, "UTF8_BINARY")
+ checkEvaluation(Collation(Literal("abc")), "UTF8_BINARY")
Contributor

Does checkEvaluation take care of RuntimeReplaceable?

Contributor Author

Yea, as follows:

protected def checkEvaluation(
expression: => Expression, expected: Any, inputRow: InternalRow = EmptyRow): Unit = {
// Make it as method to obtain fresh expression everytime.
def expr = prepareEvaluation(expression)

private def prepareEvaluation(expression: Expression): Expression = {
val serializer = new JavaSerializer(new SparkConf()).newInstance()
val resolver = ResolveTimeZone
val expr = resolver.resolveTimeZones(replace(expression))

replace is recursive:

protected def replace(expr: Expression): Expression = expr match {
case r: RuntimeReplaceable => replace(r.replacement)
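
The call chain quoted above can be modeled with a small self-contained sketch. Everything here (`Expr`, `Lit`, `Replaceable`, `CollationOf`, `ToyEvalHelper`) is a hypothetical simplification and not Spark's Catalyst API; the toy `replace` only unwraps the root node, which is the part shown in the excerpt.

```scala
// Toy expression model; hypothetical, not Spark's Catalyst classes.
trait Expr { def eval(): Any }

final case class Lit(value: Any) extends Expr {
  def eval(): Any = value
}

// An expression that cannot be evaluated directly and must first be swapped
// for its `replacement`.
trait Replaceable extends Expr {
  def replacement: Expr
  def eval(): Any = throw new UnsupportedOperationException("replace before evaluating")
}

// The toy always reports the default collation name for its child.
final case class CollationOf(child: Expr) extends Replaceable {
  def replacement: Expr = Lit("UTF8_BINARY")
}

object ToyEvalHelper {
  // Same shape as the quoted `replace`: keep unwrapping until the root is no
  // longer replaceable.
  def replace(expr: Expr): Expr = expr match {
    case r: Replaceable => replace(r.replacement)
    case other => other
  }

  // The helper replaces before evaluating, so callers need not do it themselves.
  def checkEvaluation(expression: => Expr, expected: Any): Unit =
    assert(replace(expression).eval() == expected)
}
```

Under these assumptions, `ToyEvalHelper.checkEvaluation(CollationOf(Lit("abc")), "UTF8_BINARY")` and the same call with `.replacement` appended to the expression behave identically, which is why the explicit `.replacement` in the tests is redundant.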

@@ -88,7 +88,10 @@ class StringType private(val collationId: Int) extends AtomicType with Serializa
*/
@Stable
case object StringType extends StringType(0) {
- private[spark] def apply(collationId: Int): StringType = new StringType(collationId)
+ private[spark] def apply(collationId: Int): StringType = {
Contributor

I checked several callers of this function. The input collation ID is mostly calculated from the collation name. This assertion doesn't seem to be necessary, and it's not cheap. Shall we revert it?

I'm fine with other cleanups in this PR.

Contributor Author

Okay, I have reverted it.
Thank you!

@panbingkun panbingkun changed the title [SPARK-48935][SQL] Restrictions oncollatinId should be added to the constructor of StringType [SPARK-48935][SQL] Make checkEvaluation directly check the Collation expression itself in UT Jul 23, 2024
@panbingkun panbingkun changed the title [SPARK-48935][SQL] Make checkEvaluation directly check the Collation expression itself in UT [SPARK-48935][SQL][TESTS] Make checkEvaluation directly check the Collation expression itself in UT Jul 24, 2024
@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 4de4ed1 Jul 24, 2024
ilicmarkodb pushed a commit to ilicmarkodb/spark that referenced this pull request Jul 29, 2024
…Collation` expression itself in UT

### What changes were proposed in this pull request?
This PR aims to:
- make `checkEvaluation` directly check the `Collation` expression itself in UT, rather than `Collation(...).replacement`.
- fix a missed check in UT.

### Why are the changes needed?
When checking a `RuntimeReplaceable` expression in a UT, there is no need to write `checkEvaluation(Collation(Literal("abc")).replacement, "UTF8_BINARY")`, because `checkEvaluation` already performs the same replacement internally.
https://github.com/apache/spark/blob/1a428c1606645057ef94ac8a6cadbb947b9208a6/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala#L75

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Updated the existing UT.
- Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#47401 from panbingkun/SPARK-48935.

Authored-by: panbingkun <panbingkun@baidu.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
fusheng-rd pushed a commit to fusheng-rd/spark that referenced this pull request Aug 6, 2024
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024