Describe the bug
GPU queries match column names case-sensitively on Hive text tables.
Steps/Code to reproduce bug
Spark-SQL:
```sql
create table testcase_text(id int, nAme string);
insert into testcase_text values(1, 'Tom');
select name from testcase_text;
```
The GPU run fails with:
```
java.lang.IllegalArgumentException: name does not exist. Available: id, nAme
at org.apache.spark.sql.types.StructType.$anonfun$apply$1(StructType.scala:282)
at scala.collection.immutable.Map$Map2.getOrElse(Map.scala:236)
at org.apache.spark.sql.types.StructType.apply(StructType.scala:281)
at org.apache.spark.sql.hive.rapids.GpuHiveTableScanExec.$anonfun$getRequestedOutputDataSchema$3(GpuHiveTableScanExec.scala:236)
at scala.collection.immutable.List.map(List.scala:293)
at org.apache.spark.sql.hive.rapids.GpuHiveTableScanExec.getRequestedOutputDataSchema(GpuHiveTableScanExec.scala:236)
at org.apache.spark.sql.hive.rapids.GpuHiveTableScanExec.inputRDD$lzycompute(GpuHiveTableScanExec.scala:337)
at org.apache.spark.sql.hive.rapids.GpuHiveTableScanExec.inputRDD(GpuHiveTableScanExec.scala:323)
at org.apache.spark.sql.hive.rapids.GpuHiveTableScanExec.internalDoExecuteColumnar(GpuHiveTableScanExec.scala:359)
at com.nvidia.spark.rapids.GpuExec.doExecuteColumnar(GpuExec.scala:396)
at com.nvidia.spark.rapids.GpuExec.doExecuteColumnar$(GpuExec.scala:394)
at org.apache.spark.sql.hive.rapids.GpuHiveTableScanExec.doExecuteColumnar(GpuHiveTableScanExec.scala:76)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeColumnar$1(SparkPlan.scala:221)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:232)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:229)
at org.apache.spark.sql.execution.SparkPlan.executeColumnar(SparkPlan.scala:217)
at com.nvidia.spark.rapids.GpuColumnarToRowExec.doExecute(GpuColumnarToRowExec.scala:365)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:194)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:232)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:229)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:190)
at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:340)
at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:421)
at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:451)
at org.apache.spark.sql.execution.HiveResult$.hiveResultString(HiveResult.scala:76)
at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.$anonfun$run$2(SparkSQLDriver.scala:76)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:109)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:95)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:76)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:396)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:516)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1$adapted(SparkSQLCLIDriver.scala:510)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at scala.collection.IterableLike.foreach(IterableLike.scala:74)
at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:510)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:298)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:973)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1061)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1070)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
```
Note 1: Hive Parquet tables work fine on GPU.
Note 2: It does not matter whether spark.sql.caseSensitive is set to true or false; this issue always shows up on GPU.
Expected behavior
The CPU run works fine by default and honors spark.sql.caseSensitive:
```
spark-sql> set spark.rapids.sql.enabled=false;
spark.rapids.sql.enabled false
Time taken: 0.016 seconds, Fetched 1 row(s)
spark-sql> select name from testcase_text;
Tom
Time taken: 2.382 seconds, Fetched 1 row(s)
spark-sql> set spark.sql.caseSensitive=true;
spark.sql.caseSensitive true
Time taken: 0.014 seconds, Fetched 1 row(s)
spark-sql> select name from testcase_text;
Error in query: Column 'name' does not exist. Did you mean one of the following? [spark_catalog.default.testcase_text.id, spark_catalog.default.testcase_text.nAme]; line 1 pos 7;
'Project ['name]
+- SubqueryAlias spark_catalog.default.testcase_text
   +- HiveTableRelation [`default`.`testcase_text`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [id#23, nAme#24], Partition Cols: []]
```
Environment details (please complete the following information)
Dataproc 2.1
So this is not the most straightforward thing. It looks like Hive always matches column names case-insensitively, except that I have found a few places where people complain that partition-by columns are case-sensitive if the file system they are running on is case-sensitive (everything except some Mac file systems).
So for the sake of simplicity I am just going to try to make all of the column-name matches case-insensitive.
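The fix described above can be sketched roughly as follows. This is an illustrative standalone example, not the actual GpuHiveTableScanExec code: the column names come from the repro table, and the `resolve` helper is a hypothetical stand-in for the schema lookup that currently throws.

```scala
// Hypothetical sketch of case-insensitive column-name resolution,
// mirroring Hive's behavior. With a case-sensitive lookup, requesting
// "name" against declared columns ["id", "nAme"] fails (as in the
// reported IllegalArgumentException); an equalsIgnoreCase match succeeds.
object CaseInsensitiveLookup {
  // Find the declared column whose name matches `requested`, ignoring case.
  def resolve(declared: Seq[String], requested: String): Option[String] =
    declared.find(_.equalsIgnoreCase(requested))

  def main(args: Array[String]): Unit = {
    val declared = Seq("id", "nAme") // columns of testcase_text
    // Resolves to the declared column "nAme" despite the case mismatch.
    println(resolve(declared, "name").getOrElse("not found"))
  }
}
```

Note that a purely case-insensitive match silently ignores spark.sql.caseSensitive=true; matching the CPU behavior exactly would require consulting that setting when choosing between equals and equalsIgnoreCase.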