schema mismatch failure error message for parquet vectorized reader #11446

Open
Tracked by #11380
Feng-Jiang28 opened this issue Sep 9, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@Feng-Jiang28
Collaborator

Feng-Jiang28 commented Sep 9, 2024

Running the code below triggers a schema mismatch failure in the Parquet vectorized reader, which the Spark test case intercepts. The test expects an error message like "Column: [a], Expected: string, Found: INT32", but the plugin produces "Column: a, Expected: string, Found: required int32 a". Both messages carry the same information; only the formatting differs.
Reproduce:

spark.conf.set("spark.sql.parquet.enableVectorizedReader", "true")
Seq(("bcd", 2)).toDF("a", "b").coalesce(1).write.mode("overwrite").parquet("/home/fejiang/Documents/temp5")
Seq((1, "abc")).toDF("a", "b").coalesce(1).write.mode("append").parquet("/home/fejiang/Documents/temp5")
spark.read.parquet("/home/fejiang/Documents/temp5").collect()

CPU:

scala> spark.read.parquet("/home/fejiang/Documents/temp5").collect()
24/09/09 10:04:03 ERROR Executor: Exception in task 0.0 in stage 5.0 (TID 6)
org.apache.spark.sql.execution.QueryExecutionException: Parquet column cannot be converted in file file:///home/fejiang/Documents/temp5/part-00000-f417699a-491e-4c03-85c9-18edf1aef53f-c000.snappy.parquet. Column: [a], Expected: string, Found: INT32
	at org.apache.spark.sql.errors.QueryExecutionErrors$.unsupportedSchemaColumnConvertError(QueryExecutionErrors.scala:706)

GPU:

scala> spark.read.parquet("/home/fejiang/Documents/temp5").collect()
24/09/09 10:05:09 ERROR Executor: Exception in task 1.0 in stage 7.0 (TID 9)
org.apache.spark.sql.execution.QueryExecutionException: Parquet column cannot be converted in file file:///home/fejiang/Documents/temp5/part-00000-f2e48e82-663b-4236-8b1f-387a9eadc848-c000.snappy.parquet. Column: a, Expected: string, Found: required int32 a
	at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.throwTypeIncompatibleError(GpuParquetScan.scala:1025)
	at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.$anonfun$filterBlocks$12(GpuParquetScan.scala:757)
	at com.nvidia.spark.rapids.GpuParquetFileFilterHandler.$anonfun$filterBlocks$12$adapted(GpuParquetScan.scala:757)
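
One possible direction for aligning the two messages, as a rough sketch only: judging from the CPU output above, Spark appears to print the column path as a bracketed list and the Parquet physical type name in upper case, while the plugin prints the bare column name and the full Parquet type string. The `MismatchMessage` object and its `format` method below are hypothetical names for illustration; they are not part of Spark or the plugin, and the real fix would need to take these values from the Parquet schema at the point where the error is thrown.

```scala
import java.util.Locale

// Hypothetical helper (not an existing Spark or spark-rapids API):
// builds the detail part of the error in the format seen in the CPU output.
object MismatchMessage {
  def format(
      columnPath: Seq[String],      // e.g. Seq("a"); nested columns have several parts
      expectedCatalogType: String,  // e.g. "string"
      physicalTypeName: String      // e.g. "int32" or "INT32"
  ): String = {
    // Render the path as "[a]" or "[s, x]" rather than the bare name "a".
    val column = columnPath.mkString("[", ", ", "]")
    // Assumption: only the upper-cased physical type name is wanted,
    // without the repetition ("required") and field name.
    val found = physicalTypeName.toUpperCase(Locale.ROOT)
    s"Column: $column, Expected: $expectedCatalogType, Found: $found"
  }
}
```

With the values from this reproduction, `MismatchMessage.format(Seq("a"), "string", "int32")` yields the detail string the Spark test case expects.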

Feng-Jiang28 added the "bug" and "? - Needs Triage" labels on Sep 9, 2024
mattahrens removed the "? - Needs Triage" label on Sep 10, 2024