-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-21247][SQL] Type comparison should respect case-sensitive SQL conf #18460
Changes from 8 commits
037120b
cded719
180e1b3
2343d97
8fa884e
b2d1338
c72aa18
67a037c
52d19d3
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -17,6 +17,7 @@ | |
|
||
package org.apache.spark.sql.catalyst.analysis | ||
|
||
import java.util.Locale | ||
import javax.annotation.Nullable | ||
|
||
import scala.annotation.tailrec | ||
|
@@ -100,6 +101,17 @@ object TypeCoercion { | |
case (_: TimestampType, _: DateType) | (_: DateType, _: TimestampType) => | ||
Some(TimestampType) | ||
|
||
case (t1 @ StructType(fields1), t2 @ StructType(fields2)) if t1.sameType(t2) => | ||
Some(StructType(fields1.zip(fields2).map { case (f1, f2) => | ||
// Since `t1.sameType(t2)` is true, two StructTypes have the same DataType | ||
// except `name` (in case of `spark.sql.caseSensitive=false`) and `nullable`. | ||
// - Different names: use a lower case name because findTightestCommonType is commutative. | ||
// - Different nullabilities: `nullable` is true iff one of them is nullable. | ||
val name = if (f1.name == f2.name) f1.name else f1.name.toLowerCase(Locale.ROOT) | ||
val dataType = findTightestCommonType(f1.dataType, f2.dataType).get | ||
StructField(name, dataType, nullable = f1.nullable || f2.nullable) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should we follow what we are doing for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. See the example, sql("SELECT 1 as a UNION ALL (SELECT 1 as A)").show()
sql("SELECT 1 as A UNION ALL (SELECT 1 as a)").show() There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This PR works as you want. This function is used to compare the equality only. BTW, for this function, it should use one of lower or upper case because it should be commutative.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
The above code changes the case, right? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure, right. It's for commutativity. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please see TypeCoercionSuite.checkWidenType. In order to use the first type name, we need to loosen this test helper function and to break the existing commutative assumption. I'm ok for that if you want. |
||
})) | ||
|
||
case _ => None | ||
} | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -26,6 +26,7 @@ import org.json4s.jackson.JsonMethods._ | |
|
||
import org.apache.spark.annotation.InterfaceStability | ||
import org.apache.spark.sql.catalyst.expressions.Expression | ||
import org.apache.spark.sql.internal.SQLConf | ||
import org.apache.spark.util.Utils | ||
|
||
/** | ||
|
@@ -80,7 +81,11 @@ abstract class DataType extends AbstractDataType { | |
* (`StructField.nullable`, `ArrayType.containsNull`, and `MapType.valueContainsNull`). | ||
*/ | ||
private[spark] def sameType(other: DataType): Boolean = | ||
DataType.equalsIgnoreNullability(this, other) | ||
if (SQLConf.get.caseSensitiveAnalysis) { | ||
DataType.equalsIgnoreNullability(this, other) | ||
} else { | ||
DataType.equalsIgnoreCaseAndNullability(this, other) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Since we already have |
||
} | ||
|
||
/** | ||
* Returns the same data type but set all nullability fields are true | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the output nested column name is lower case? Is Hive behaving like this?
In addition, could you add one more test and check whether we also respect case sensitivity conf when we resolve the queries that contain the nested column in the references?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Hive does like the following.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a test case, does
nested column in the references
meanWHERE
clause?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do both
tab.Col1.a
andtab.Col1.A
work well when case sensitivity is off?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When case sensitive is off, Spark considers them in lower cases.
For example, in the test case, we need table name
struct1
andstruct2
. In case ofa = A
, it raises ambiquous column exceptions.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, does the case sensitivity conf works in nest column name resolution?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Hive, your example works like the following.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see. I thought so. Let me check that again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. It works. I updated the test cases.