Make Hive column matches case-insensitive #11327
Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
```scala
val distinctFields = distinctColumns.map(a => tableSchema.apply(a.name))
// In hive column names are case-insensitive but the default tableSchema lookup is
// case-sensitive
val fieldMap = CaseInsensitiveMap(tableSchema.map(f => (f.name, f)).toMap)
```
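Here is a rough sketch of the idea behind this change, in plain Scala without Spark. `Field`, `CaseInsensitiveLookup`, `fieldMap`, and `resolve` are hypothetical names. Keying the map by the lower-cased column name only approximates what Spark's `CaseInsensitiveMap` (which the patch uses) does, so treat this as an illustration, not the plugin's code.

```scala
import java.util.Locale

// Hypothetical stand-in for Spark's StructField: just a name and a type string.
final case class Field(name: String, dataType: String)

object CaseInsensitiveLookup {
  // Build a lookup table keyed by the lower-cased column name, approximating
  // the effect of wrapping the schema in Spark's CaseInsensitiveMap.
  def fieldMap(schema: Seq[Field]): Map[String, Field] =
    schema.map(f => f.name.toLowerCase(Locale.ROOT) -> f).toMap

  // Resolve each requested column name regardless of its case.
  // Throws NoSuchElementException if a name is not in the schema in any case.
  def resolve(schema: Seq[Field], requested: Seq[String]): Seq[Field] = {
    val byName = fieldMap(schema)
    requested.map(n => byName(n.toLowerCase(Locale.ROOT)))
  }
}
```

With a schema of `id` and `nAme`, `resolve(schema, Seq("NAME"))` returns the `nAme` field, while a name that is absent in every casing still throws. Note that `toMap` silently keeps the last of two names that differ only in case. That is acceptable only because Hive refuses to create such a schema in the first place.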
What happens when `spark.sql.caseSensitive` is set to true?
It turns out that Hive is always case-insensitive, so even if I try to create a table with two columns whose names differ only in case, I get an error from Hive:
```scala
scala> spark.conf.set("spark.sql.caseSensitive", true)
scala> spark.sql("""create table testcase_text(id int, nAme string, Name string)""").collect
24/08/15 15:38:30 WARN ResolveSessionCatalog: A Hive serde table will be created as there is no table provider specified. You can set spark.sql.legacy.createHiveTableByDefault to false so that native data source table will be created instead.
org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Duplicate column name name in the table definition.
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:110)
at org.apache.spark.sql.hive.HiveExternalCatalog.createTable(HiveExternalCatalog.scala:244)
at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.createTable(ExternalCatalogWithListener.scala:94)
```
If I use a column name with the wrong case in a query while `spark.sql.caseSensitive` is true, Spark reports an error during logical planning, before any GPU code runs.
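This is also why a case-insensitive map cannot merge two distinct columns: Hive rejects any table whose column names collide once case is ignored. The hypothetical helper below (not Hive's actual validation code; `DuplicateColumns` and `caseInsensitiveDuplicates` are illustrative names) sketches that collision check.

```scala
import java.util.Locale

object DuplicateColumns {
  // Return the lower-cased names that appear more than once when column
  // names are compared case-insensitively (e.g. "nAme" and "Name").
  def caseInsensitiveDuplicates(columns: Seq[String]): Set[String] =
    columns
      .groupBy(_.toLowerCase(Locale.ROOT))
      .collect { case (key, names) if names.size > 1 => key }
      .toSet
}
```

For the columns in the `create table` statement above (`id`, `nAme`, `Name`), it returns `Set("name")`, the same name that appears in Hive's "Duplicate column name" error.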
LGTM. It might be good to add a caseSensitive test that verifies the failure (a follow-up issue is fine), so that we catch it if Spark changes this behavior and fixes that setup, since we would need to change our code at that point too.
This fixes #11318.
I added two tests. The partitioning test passes without these changes, but I wanted to be sure that we were doing the right thing.
I didn't add tests for running with `spark.sql.caseSensitive = true`, because in that mode Spark fails while planning the query, on both the CPU and the GPU, before any GPU code runs. I can add tests for that if we really want to verify it.