[SPARK-47210][SQL] Addition of implicit casting without indeterminate support #45383

Closed
Changes from 74 commits
Commits
80 commits
b34544a
Implicit casting on collated expressions
mihailom-db Mar 5, 2024
fdbfa44
Fix doc files
mihailom-db Mar 5, 2024
ce9b027
Fix contains, startWith, endWith tests
mihailom-db Mar 5, 2024
e537190
Fix imports
mihailom-db Mar 5, 2024
b5a79c1
Fix docs and incorporate changes
mihailom-db Mar 6, 2024
8321d0c
Fix tests in CollationSuite
mihailom-db Mar 6, 2024
d178233
Add test and incorporate changes
mihailom-db Mar 7, 2024
a4b9be7
Fix godlen files
mihailom-db Mar 7, 2024
a6e7662
Incorporate StringType in findWiderCommonType
mihailom-db Mar 8, 2024
e1d7ad5
Merge branch 'master' into SPARK-47210
mihailom-db Mar 8, 2024
b3b1356
Fix ArrayType(StringType, _) casting in findWiderCommonType
mihailom-db Mar 11, 2024
7773d13
Fix type mismatch error
mihailom-db Mar 11, 2024
198a728
Merge branch 'apache:master' into SPARK-47210
mihailom-db Mar 11, 2024
255b1ab
Incorporate changes and fix errors
mihailom-db Mar 11, 2024
9ce417f
Merge branch 'master' into SPARK-47210
mihailom-db Mar 12, 2024
50f3aa2
Fix errors
mihailom-db Mar 12, 2024
ca0c84d
Rework casting
mihailom-db Mar 13, 2024
880a1b1
Merge branch 'master' into SPARK-47210
mihailom-db Mar 13, 2024
56d6c7c
Fix failing tests
mihailom-db Mar 14, 2024
94e5259
Fix array cast errors
mihailom-db Mar 14, 2024
ccb52ba
Fix additional errors
mihailom-db Mar 14, 2024
9b1387b
Fix explicit collation search
mihailom-db Mar 17, 2024
c9974e1
Fix scala style errors
mihailom-db Mar 18, 2024
fca9a65
Add support for ImplicitCastInputTypes
mihailom-db Mar 18, 2024
660d664
Fix accidental change in license header
mihailom-db Mar 18, 2024
c8edd93
Fix null casting
mihailom-db Mar 19, 2024
a91490b
Fix failing tests
mihailom-db Mar 19, 2024
49a8d61
Move implicit casting when strings present
mihailom-db Mar 19, 2024
4c4cd84
Fix unintentional changes
mihailom-db Mar 19, 2024
66122a6
improve types.py
mihailom-db Mar 20, 2024
50f46e4
Refactor code
mihailom-db Mar 21, 2024
cc86a87
Merge branch 'master' into SPARK-47210
mihailom-db Mar 21, 2024
c01e80c
Fix imports and failing tests
mihailom-db Mar 21, 2024
cc797a2
Disable casting of StructTypes
mihailom-db Mar 21, 2024
5d001ee
Fix imports
mihailom-db Mar 21, 2024
c68fc7d
Fix concat tests
mihailom-db Mar 21, 2024
1c926ab
Fix unnecessary repetition
mihailom-db Mar 21, 2024
dec39bf
Remove Elt test
mihailom-db Mar 21, 2024
e808446
Remove tests for Repeat
mihailom-db Mar 21, 2024
ca1a23a
Merge branch 'master' into SPARK-47210
mihailom-db Mar 21, 2024
116931c
Merge branch 'apache:master' into SPARK-47210
mihailom-db Mar 22, 2024
af487a2
Fix failing tests
mihailom-db Mar 22, 2024
4ba7055
Fix nullability for StringType->StringType
mihailom-db Mar 22, 2024
e490e42
Improve comments and switch tests from E2E to unit tests
mihailom-db Mar 24, 2024
00e88e7
Add new tests and remove compatibility test
mihailom-db Mar 25, 2024
85b4d16
Fix conflict resolution mistake
mihailom-db Mar 25, 2024
30f7225
Merge branch 'apache:master' into SPARK-47210
mihailom-db Mar 25, 2024
e89a354
Add indeterminate collation tests
mihailom-db Mar 26, 2024
788dc06
Fix test
mihailom-db Mar 26, 2024
75c0140
Block Alias on Indeterminate
mihailom-db Mar 27, 2024
2918413
Merge remote-tracking branch 'upstream/master' into SPARK-47210
mihailom-db Mar 28, 2024
f6ed55a
Remove introduction of indeterminate collation
mihailom-db Mar 28, 2024
98960c0
Fix import problem
mihailom-db Mar 28, 2024
de623c8
Fix failing tests
mihailom-db Mar 28, 2024
a92b4e1
Fix pyspark error
mihailom-db Mar 28, 2024
f7f3011
Merge branch 'apache:master' into SPARK-47210
mihailom-db Mar 28, 2024
f67808e
Fix errors
mihailom-db Mar 29, 2024
815ce42
Fix schema error
mihailom-db Mar 29, 2024
7fca38a
Merge remote-tracking branch 'upstream/master' into SPARK-47210
mihailom-db Mar 29, 2024
b19b0eb
Fix collated tests
mihailom-db Mar 29, 2024
a111f03
Add isExplicit flag
mihailom-db Mar 29, 2024
55bdd9b
Fix import error
mihailom-db Mar 29, 2024
a7228be
Fix imports in TypeCoercion
mihailom-db Mar 31, 2024
27a72c6
Merge remote-tracking branch 'upstream/master' into SPARK-47210
mihailom-db Apr 1, 2024
18ada04
Add support for explicit propagation in arrays
mihailom-db Apr 1, 2024
38670af
Fix tests to follow recent changes
mihailom-db Apr 1, 2024
01d891e
Incorporate changes
mihailom-db Apr 1, 2024
c5daf86
Fix error
mihailom-db Apr 1, 2024
9ac5678
Change var to val in StringType
mihailom-db Apr 1, 2024
0f1757d
Fix import style
mihailom-db Apr 1, 2024
506c8c0
Revert explicit flag addition
mihailom-db Apr 1, 2024
f743cf8
Narrow down expressions casting
mihailom-db Apr 2, 2024
4f8fe1d
Incorporate minor changes
mihailom-db Apr 2, 2024
52bf4dc
Incorporate changes
mihailom-db Apr 2, 2024
7cbeafe
Special case expressions
mihailom-db Apr 3, 2024
3e92e92
Return new line
mihailom-db Apr 3, 2024
b23e106
Remove indentation cosmetic
mihailom-db Apr 3, 2024
880ebed
Add more cosmetic changes
mihailom-db Apr 3, 2024
f96ecd9
Incorporate changes
mihailom-db Apr 3, 2024
e1e0cf4
Merge branch 'apache:master' into SPARK-47210
mihailom-db Apr 3, 2024
29 changes: 24 additions & 5 deletions common/utils/src/main/resources/error/error-classes.json
@@ -467,6 +467,24 @@
     ],
     "sqlState" : "42704"
   },
+  "COLLATION_MISMATCH" : {
+    "message" : [
+      "Could not determine which collation to use for string functions and operators."
+    ],
+    "subClass" : {
+      "EXPLICIT" : {
+        "message" : [
+          "Error occurred due to the mismatch between explicit collations: <explicitTypes>. Decide on a single explicit collation and remove others."
+        ]
+      },
+      "IMPLICIT" : {
+        "message" : [
+          "Error occurred due to the mismatch between multiple implicit non-default collations. Use COLLATE function to set the collation explicitly."
+        ]
+      }
+    },
+    "sqlState" : "42P21"
+  },
   "COLLECTION_SIZE_LIMIT_EXCEEDED" : {
     "message" : [
       "Can't create array with <numberOfElements> elements which exceeding the array size limit <maxRoundedArrayLength>,"
@@ -688,11 +706,6 @@
           "To convert values from <srcType> to <targetType>, you can use the functions <functionNames> instead."
         ]
       },
-      "COLLATION_MISMATCH" : {
-        "message" : [
-          "Collations <collationNameLeft> and <collationNameRight> are not compatible. Please use the same collation for both strings."
-        ]
-      },
       "CREATE_MAP_KEY_DIFF_TYPES" : {
         "message" : [
           "The given keys of function <functionName> should all be the same type, but they are <dataType>."
@@ -1598,6 +1611,12 @@
     ],
     "sqlState" : "22003"
   },
+  "INDETERMINATE_COLLATION" : {
+    "message" : [
+      "Function called requires knowledge of the collation it should apply, but indeterminate collation was found. Use COLLATE function to set the collation explicitly."
+    ],
+    "sqlState" : "42P22"
+  },
   "INDEX_ALREADY_EXISTS" : {
     "message" : [
       "Cannot create the index <indexName> on table <tableName> because it already exists."
41 changes: 41 additions & 0 deletions docs/sql-error-conditions-collation-mismatch-error-class.md
@@ -0,0 +1,41 @@
---
layout: global
title: COLLATION_MISMATCH error class
displayTitle: COLLATION_MISMATCH error class
license: |
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
---

<!--
DO NOT EDIT THIS FILE.
It was generated automatically by `org.apache.spark.SparkThrowableSuite`.
-->

[SQLSTATE: 42P21](sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation)

Could not determine which collation to use for string functions and operators.

This error class has the following derived error classes:

## EXPLICIT

Error occurred due to the mismatch between explicit collations: `<explicitTypes>`. Decide on a single explicit collation and remove others.

## IMPLICIT

Error occurred due to the mismatch between multiple implicit non-default collations. Use COLLATE function to set the collation explicitly.
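
For readers unfamiliar with how a parent error class and its derived classes turn into one message, here is a minimal Python sketch. The message texts and the SQLSTATE are copied from the `COLLATION_MISMATCH` entry in this PR; the `render` function, the bracketed prefix, the trailing `SQLSTATE:` suffix, and the sample collation names are assumptions for illustration and are not Spark's actual rendering code.

```python
import re

# Trimmed copy of the COLLATION_MISMATCH entry added by this PR.
ERROR_CLASSES = {
    "COLLATION_MISMATCH": {
        "message": [
            "Could not determine which collation to use for string functions and operators."
        ],
        "subClass": {
            "EXPLICIT": {
                "message": [
                    "Error occurred due to the mismatch between explicit collations: "
                    "<explicitTypes>. Decide on a single explicit collation and remove others."
                ]
            },
            "IMPLICIT": {
                "message": [
                    "Error occurred due to the mismatch between multiple implicit "
                    "non-default collations. Use COLLATE function to set the collation explicitly."
                ]
            },
        },
        "sqlState": "42P21",
    }
}


def render(error_class: str, params: dict) -> str:
    """Hypothetical renderer: parent text, then sub-class text, then <name> substitution."""
    main, _, sub = error_class.partition(".")
    entry = ERROR_CLASSES[main]
    template = "\n".join(entry["message"])
    if sub:
        template += " " + "\n".join(entry["subClass"][sub]["message"])
    # Replace every <placeholder> with the matching parameter value.
    body = re.sub(r"<(\w+)>", lambda m: str(params[m.group(1)]), template)
    return f"[{error_class}] {body} SQLSTATE: {entry['sqlState']}"


print(render("COLLATION_MISMATCH.EXPLICIT", {"explicitTypes": "UNICODE, UNICODE_CI"}))
print(render("COLLATION_MISMATCH.IMPLICIT", {}))
```

The only parameterised sub-class is `EXPLICIT`, which is why `IMPLICIT` can be rendered with an empty parameter map.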


4 changes: 0 additions & 4 deletions docs/sql-error-conditions-datatype-mismatch-error-class.md
@@ -76,10 +76,6 @@ If you have to cast `<srcType>` to `<targetType>`, you can set `<config>` as `<c
 cannot cast `<srcType>` to `<targetType>`.
 To convert values from `<srcType>` to `<targetType>`, you can use the functions `<functionNames>` instead.

-## COLLATION_MISMATCH
-
-Collations `<collationNameLeft>` and `<collationNameRight>` are not compatible. Please use the same collation for both strings.
-
 ## CREATE_MAP_KEY_DIFF_TYPES

 The given keys of function `<functionName>` should all be the same type, but they are `<dataType>`.
14 changes: 14 additions & 0 deletions docs/sql-error-conditions.md
@@ -390,6 +390,14 @@ Cannot find a short name for the codec `<codecName>`.

 The value `<collationName>` does not represent a correct collation name. Suggested valid collation name: [`<proposal>`].

+### [COLLATION_MISMATCH](sql-error-conditions-collation-mismatch-error-class.html)
+
+[SQLSTATE: 42P21](sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation)
+
+Could not determine which collation to use for string functions and operators.
+
+For more details see [COLLATION_MISMATCH](sql-error-conditions-collation-mismatch-error-class.html)
+
 ### [COLLECTION_SIZE_LIMIT_EXCEEDED](sql-error-conditions-collection-size-limit-exceeded-error-class.html)

 [SQLSTATE: 54000](sql-error-conditions-sqlstates.html#class-54-program-limit-exceeded)
@@ -939,6 +947,12 @@ For more details see [INCONSISTENT_BEHAVIOR_CROSS_VERSION](sql-error-conditions-

 Max offset with `<rowsPerSecond>` rowsPerSecond is `<maxSeconds>`, but 'rampUpTimeSeconds' is `<rampUpTimeSeconds>`.

+### INDETERMINATE_COLLATION
+
+[SQLSTATE: 42P22](sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation)
+
+Function called requires knowledge of the collation it should apply, but indeterminate collation was found. Use COLLATE function to set the collation explicitly.
+
 ### INDEX_ALREADY_EXISTS

 [SQLSTATE: 42710](sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation)
9 changes: 4 additions & 5 deletions python/pyspark/sql/types.py
@@ -264,11 +264,10 @@ def fromCollationId(self, collationId: int) -> "StringType":
         return StringType(StringType.collationNames[collationId])

     def collationIdToName(self) -> str:
-        return (
-            " collate %s" % StringType.collationNames[self.collationId]
-            if self.collationId != 0
-            else ""
-        )
+        if self.collationId == 0:
+            return ""
+        else:
+            return " collate %s" % StringType.collationNames[self.collationId]

     @classmethod
     def collationNameToId(cls, collationName: str) -> int:
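
The `collationIdToName` change above is a pure restructuring: collation id 0 yields an empty suffix, and any other id yields `" collate <name>"`. A standalone sketch of that behaviour follows. The `COLLATION_NAMES` table and the `simple_string` helper are illustrative assumptions, not copies of PySpark's actual attributes.

```python
# Illustrative table only; the real list is StringType.collationNames in pyspark.
COLLATION_NAMES = ["UTF8_BINARY", "UTF8_BINARY_LCASE", "UNICODE", "UNICODE_CI"]


def collation_suffix(collation_id: int) -> str:
    """Mirrors the refactored collationIdToName: empty suffix for id 0."""
    if collation_id == 0:
        return ""
    return " collate %s" % COLLATION_NAMES[collation_id]


def simple_string(collation_id: int) -> str:
    """Hypothetical helper showing how the suffix might be appended to a type name."""
    return "string" + collation_suffix(collation_id)


print(simple_string(0))
print(simple_string(2))
```

Under these assumptions a default-collated string keeps the plain name `string`, so existing schema strings stay unchanged.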
@@ -47,9 +47,9 @@ private[sql] object ArrowUtils {
     case LongType => new ArrowType.Int(8 * 8, true)
     case FloatType => new ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
     case DoubleType => new ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE)
-    case StringType if !largeVarTypes => ArrowType.Utf8.INSTANCE
+    case _: StringType if !largeVarTypes => ArrowType.Utf8.INSTANCE
     case BinaryType if !largeVarTypes => ArrowType.Binary.INSTANCE
-    case StringType if largeVarTypes => ArrowType.LargeUtf8.INSTANCE
+    case _: StringType if largeVarTypes => ArrowType.LargeUtf8.INSTANCE
     case BinaryType if largeVarTypes => ArrowType.LargeBinary.INSTANCE
     case DecimalType.Fixed(precision, scale) => new ArrowType.Decimal(precision, scale)
     case DateType => new ArrowType.Date(DateUnit.DAY)
@@ -77,6 +77,7 @@ object AnsiTypeCoercion extends TypeCoercionBase {
       UnpivotCoercion ::
       WidenSetOperationTypes ::
       new AnsiCombinedTypeCoercionRule(
+        CollationTypeCasts ::
         InConversion ::
         PromoteStrings ::
         DecimalPrecision ::
@@ -92,7 +93,7 @@ object AnsiTypeCoercion extends TypeCoercionBase {
         ImplicitTypeCasts ::
         DateTimeOperations ::
         WindowFrameCoercion ::
-        GetDateFieldOperations:: Nil) :: Nil
+        GetDateFieldOperations :: Nil) :: Nil

   val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
     case (t1, t2) if t1 == t2 => Some(t1)
@@ -138,15 +139,16 @@ object AnsiTypeCoercion extends TypeCoercionBase {
   @scala.annotation.tailrec
   private def findWiderTypeForString(dt1: DataType, dt2: DataType): Option[DataType] = {
     (dt1, dt2) match {
-      case (StringType, _: IntegralType) => Some(LongType)
-      case (StringType, _: FractionalType) => Some(DoubleType)
-      case (StringType, NullType) => Some(StringType)
+      case (_: StringType, _: IntegralType) => Some(LongType)
+      case (_: StringType, _: FractionalType) => Some(DoubleType)
+      case (st: StringType, NullType) => Some(st)
       // If a binary operation contains interval type and string, we can't decide which
       // interval type the string should be promoted as. There are many possible interval
       // types, such as year interval, month interval, day interval, hour interval, etc.
-      case (StringType, _: AnsiIntervalType) => None
-      case (StringType, a: AtomicType) => Some(a)
-      case (other, StringType) if other != StringType => findWiderTypeForString(StringType, other)
+      case (_: StringType, _: AnsiIntervalType) => None
+      case (_: StringType, a: AtomicType) => Some(a)
+      case (other, st: StringType) if !other.isInstanceOf[StringType] =>
+        findWiderTypeForString(st, other)
       case _ => None
     }
   }
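
The recurring change in these hunks, from `case StringType` to `case _: StringType`, swaps a match against one specific value (the default-collation `StringType` object) for a match against the type, so collated strings are no longer skipped. The following Python sketch is a rough analogue of that difference. The `StringType` dataclass, `DEFAULT_STRING`, and all function names are hypothetical stand-ins, and it assumes that Scala equality on `StringType` compares collation ids.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StringType:
    """Stand-in for Spark's StringType; equality compares collation ids."""
    collation_id: int = 0


# Stands in for the Scala `StringType` companion object (default collation).
DEFAULT_STRING = StringType()


def matches_old(dt: object) -> bool:
    # Analogue of `case StringType`: equal to the default-collation instance only.
    return dt == DEFAULT_STRING


def matches_new(dt: object) -> bool:
    # Analogue of `case _: StringType`: any StringType, whatever its collation.
    return isinstance(dt, StringType)


def wider_with_null(dt: object) -> Optional[StringType]:
    # Analogue of `case (st: StringType, NullType) => Some(st)`:
    # the collated type itself is returned, so its collation is preserved.
    return dt if isinstance(dt, StringType) else None


collated = StringType(collation_id=2)
print(matches_old(collated), matches_new(collated), wider_with_null(collated))
```

With the old pattern a collated string would fall through to a later case, while the new pattern handles it and, in the `NullType` case, keeps its collation instead of resetting it to the default.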
@@ -182,7 +184,7 @@ object AnsiTypeCoercion extends TypeCoercionBase {

       // If a function expects a StringType, no StringType instance should be implicitly cast to
       // StringType with a collation that's not accepted (aka. lockdown unsupported collations).
-      case (_: StringType, StringType) => None
+      case (_: StringType, _: StringType) => None
       case (_: StringType, _: StringTypeCollated) => None

       // If a function expects integral type, fractional input is not allowed.
@@ -191,7 +193,7 @@ object AnsiTypeCoercion extends TypeCoercionBase {
       // Ideally the implicit cast rule should be the same as `Cast.canANSIStoreAssign` so that it's
       // consistent with table insertion. To avoid breaking too many existing Spark SQL queries,
       // we make the system to allow implicitly converting String type as other primitive types.
-      case (StringType, a @ (_: AtomicType | NumericType | DecimalType | AnyTimestampType)) =>
+      case (_: StringType, a @ (_: AtomicType | NumericType | DecimalType | AnyTimestampType)) =>
         Some(a.defaultConcreteType)

       // When the target type is `TypeCollection`, there is another branch to find the
@@ -0,0 +1,135 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.spark.sql.catalyst.analysis

import javax.annotation.Nullable

import scala.annotation.tailrec

import org.apache.spark.sql.catalyst.analysis.TypeCoercion.{hasStringType}
import org.apache.spark.sql.catalyst.expressions.{ArrayJoin, BinaryExpression, CaseWhen, Cast, Coalesce, Collate, Concat, ConcatWs, CreateArray, Expression, Greatest, If, In, InSubquery, Least, Substring}
import org.apache.spark.sql.errors.QueryCompilationErrors
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types.{AbstractDataType, ArrayType, DataType, StringType}

object CollationTypeCasts extends TypeCoercionRule {
override val transform: PartialFunction[Expression, Expression] = {
case e if !e.childrenResolved => e
case sc@(_: In
| _: InSubquery
| _: CreateArray
| _: If
Contributor

We should add more case matches, as some expressions do not require all their inputs to be the same type:

case ifExpr: If =>
  ifExpr.withNewChildren(ifExpr.predicate +: collateToSingleType(Seq(ifExpr.trueValue, ifExpr.falseValue)))

| _: ArrayJoin
| _: CaseWhen
| _: Concat
| _: Greatest
| _: Least
| _: Coalesce
| _: BinaryExpression
| _: ConcatWs
| _: Substring) =>
val newChildren = collateToSingleType(sc.children)
Contributor

This is the tricky part of adding a new rule. We must make sure we follow the behavior of existing implicit cast rules and only add implicit cast for certain children of an expression. For example, the true and false branches of If expression need implicit cast, but not the if condition. ExpectsInputTypes does not indicate that the expression requires all its children to be the same type, and should not be handled here.

Please carefully check all implicit cast rules and revisit this new rule. @mihailom-db

Contributor Author

This rule only touches children that have StringType as their DataType. I can reorder execution to first filter out the StringType arguments and then work on them. Agreed on ExpectsInputTypes: I have now checked, and it should not have any function with two StringType inputs, so we are safe not to cast it.

Contributor

Let's not assume that it's safe to only cast string-type inputs for any expression. Please follow the existing implicit cast rules and explicitly match expressions that need some of their inputs to be the same type.

Contributor Author

This is the problem. If I understood correctly, we do not want to replicate the code from ImplicitTypeCasts. That rule is concerned with transforming data types into their expected DataTypes, whereas CollationTypeCasts is concerned with transforming StringTypes into their expected collated StringType. The difference is that the collated StringType is calculated from the expression parameters that have StringTypes, not from what someone expects the input types to be. As you mentioned for If: the new collation rule is not meant to fail in place of IfCoercion, which fails when it cannot find a wider type for the left and right branches. It is meant to fail in order to catch a customer providing differently collated strings.

Contributor

> If I understood correctly, we do not want to replicate the code from ImplicitTypeCasts.

I don't think this is possible without significant refactoring. Correctness is more important than duplicated code at this stage.
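
To make the request in this thread concrete, here is a minimal Python sketch of matching one expression explicitly and casting only the children that must agree. For `If`, the two branches are unified while the predicate is left alone. `Expr`, `If`, `cast_to`, and `coerce_if` are hypothetical names for illustration; none of them are Spark APIs, and the target collation is passed in rather than computed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Expr:
    name: str
    collation: Optional[str] = None  # None means "not a string expression"


@dataclass(frozen=True)
class If:
    predicate: Expr
    true_value: Expr
    false_value: Expr


def cast_to(expr: Expr, collation: str) -> Expr:
    """Wrap a string expression in a cast only when its collation differs."""
    if expr.collation is None or expr.collation == collation:
        return expr
    return Expr(f"cast({expr.name} as string collate {collation})", collation)


def coerce_if(node: If, target: str) -> If:
    """Unify only the two branches; the predicate is never rewritten."""
    return If(
        node.predicate,
        cast_to(node.true_value, target),
        cast_to(node.false_value, target),
    )


node = If(Expr("x > 1"), Expr("a", "UTF8_BINARY"), Expr("b", "UNICODE"))
print(coerce_if(node, "UNICODE"))
```

The design point is that the set of children to unify is chosen per expression type, instead of being inferred from which children happen to be strings.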

sc.withNewChildren(newChildren)
}
/**
* Extracts the StringType from a data type that has already passed the
* hasStringType filter.
*/
@tailrec
private def extractStringType(dt: DataType): StringType = dt match {
case st: StringType => st
case ArrayType(et, _) => extractStringType(et)
}

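/**
* Casts the given expression to the collated StringType `st`, but only if the
* expression already has a StringType (or an array of it) whose collation differs.
* @param expr expression to cast
* @param st target collated StringType
* @return the cast expression, or None if no cast applies
*/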
def castStringType(expr: Expression, st: StringType): Option[Expression] =
castStringType(expr.dataType, st).map { dt => Cast(expr, dt)}

private def castStringType(inType: AbstractDataType, castType: StringType): Option[DataType] = {
@Nullable val ret: DataType = inType match {
case st: StringType if st.collationId != castType.collationId => castType
case ArrayType(arrType, nullable) =>
Contributor

I disagree with special-casing the array type. The code looks broken. It assumes the children of the given expression can have both string type and array-of-string type, then tries to find a common collation between the string-type child and the array element. This makes no sense without knowing the semantics of the given expression.

Contributor Author

A simple example is ConcatWs. It can have ArrayType(StringType, _) for input strings and StringType for separator as parameters. What collations do we want for this then? We need to cast the ArrayType into a proper collation if separator is explicit.

Contributor

Then please match the ConcatWs expression explicitly to handle this case. What I disagree with is doing this blindly for all expressions.

castStringType(arrType, castType).map(ArrayType(_, nullable)).orNull
case _ => null
}
Option(ret)
}
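
As a reading aid for the private `castStringType` above, this Python sketch models the same recursion: a string type is retargeted only when its collation differs, an array is rebuilt around a retargeted element type, and everything else yields "no cast". The `StringType` and `ArrayType` dataclasses are simplified stand-ins, and `None` plays the role of Scala's `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StringType:
    collation_id: int = 0


@dataclass(frozen=True)
class ArrayType:
    element_type: object
    contains_null: bool = True


def cast_string_type(in_type: object, cast_type: StringType) -> Optional[object]:
    """Return the retargeted type, or None when no cast applies."""
    if isinstance(in_type, StringType):
        # Same collation: nothing to do, so no cast is produced.
        return cast_type if in_type.collation_id != cast_type.collation_id else None
    if isinstance(in_type, ArrayType):
        # Recurse into the element type and keep the array's nullability flag.
        inner = cast_string_type(in_type.element_type, cast_type)
        return None if inner is None else ArrayType(inner, in_type.contains_null)
    return None


nested = ArrayType(ArrayType(StringType(0), contains_null=False))
print(cast_string_type(nested, StringType(2)))
```

Note that this shows only the mechanics. Whether an array child should share a collation with a sibling string child depends on the expression, which is the objection raised in the thread above.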

/**
* Collates input expressions to a single collation.
*/
def collateToSingleType(exprs: Seq[Expression]): Seq[Expression] = {
val st = getOutputCollation(exprs)

exprs.map(e => castStringType(e, st).getOrElse(e))
}

/**
* Based on the data types of the input expressions this method determines
* a collation type which the output will have. This function accepts Seq of
* any expressions, but will only be affected by collated StringTypes or
* complex DataTypes with collated StringTypes (e.g. ArrayType)
*/
def getOutputCollation(expr: Seq[Expression]): StringType = {
val explicitTypes = expr.filter(_.isInstanceOf[Collate])
.map(_.dataType.asInstanceOf[StringType].collationId)
.distinct

explicitTypes.size match {
// We have 1 explicit collation
case 1 => StringType(explicitTypes.head)
// Multiple explicit collations occurred
case size if size > 1 =>
throw QueryCompilationErrors
.explicitCollationMismatchError(
explicitTypes.map(t => StringType(t).typeName)
)
// Only implicit or default collations present
case 0 =>
val implicitTypes = expr.map(_.dataType)
.filter(hasStringType)
.map(extractStringType)
.filter(dt => dt.collationId != SQLConf.get.defaultStringType.collationId)
.distinctBy(_.collationId)

if (hasMultipleImplicits(implicitTypes)) {
throw QueryCompilationErrors.implicitCollationMismatchError()
}
else {
implicitTypes.headOption.getOrElse(SQLConf.get.defaultStringType)
}
}
}
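
The precedence implemented by `getOutputCollation` can be summarised as: a single explicit collation wins, several distinct explicit collations are an error, and otherwise at most one non-default implicit collation is allowed. Below is a minimal Python model of that decision. Inputs are simplified to `(collation, is_explicit)` pairs, and `CollationMismatch`, `output_collation`, and the `UTF8_BINARY` default are assumptions of this sketch, not Spark names.

```python
from typing import List, Optional, Tuple

DEFAULT = "UTF8_BINARY"  # assumed session default collation


class CollationMismatch(Exception):
    """Stand-in for the COLLATION_MISMATCH error class."""


def output_collation(
    inputs: List[Tuple[Optional[str], bool]], default: str = DEFAULT
) -> str:
    """inputs: (collation name or None for non-strings, True if set via COLLATE)."""
    # dict.fromkeys de-duplicates while keeping first-seen order.
    explicit = list(dict.fromkeys(c for c, is_exp in inputs if c is not None and is_exp))
    if len(explicit) == 1:
        return explicit[0]
    if len(explicit) > 1:
        raise CollationMismatch("EXPLICIT: " + ", ".join(explicit))
    # No explicit collation: look at non-default implicit collations only.
    implicit = list(dict.fromkeys(c for c, _ in inputs if c is not None and c != default))
    if len(implicit) > 1:
        raise CollationMismatch("IMPLICIT")
    return implicit[0] if implicit else default


print(output_collation([("UNICODE", True), ("UNICODE_CI", False)]))
print(output_collation([("UNICODE", False), ("UTF8_BINARY", False)]))
print(output_collation([(None, False)]))
```

In this model, as in the Scala code, default-collated inputs never cause a conflict: they simply adopt whichever single non-default collation is present.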

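/**
* This check is always performed when there is no explicit collation. It returns true
* if there is more than one implicit non-default collation. Collations are
* distinguished by their collationId.
* @param dataTypes StringTypes collected from the input expressions
* @return true if more than one distinct non-default collation is present
*/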
private def hasMultipleImplicits(dataTypes: Seq[StringType]): Boolean =
dataTypes.map(_.collationId)
.filter(dt => !(dt == SQLConf.get.defaultStringType.collationId)).distinct.size > 1

}