
[BUG] java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.catalog.CatalogTable... in Databricks 13.3 runtimes #11184

pxLi opened this issue Jul 15, 2024 · 13 comments


pxLi commented Jul 15, 2024

Describe the bug
Starting on Jul 13, a lot of our IT cases began failing on the DB 13.3 runtime:

java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.catalog.CatalogTable.copy(Lorg/apache/spark/sql/catalyst/TableIdentifier;Lorg/apache/spark/sql/catalyst/catalog/CatalogTableType;Lorg/apache/spark/sql/catalyst/catalog/CatalogStorageFormat;Lorg/apache/spark/sql/types/StructType;Lscala/Option;Lscala/collection/Seq;Lscala/Option;Ljava/lang/String;JJLjava/lang/String;Lscala/collection/immutable/Map;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/collection/Seq;ZZLscala/collection/immutable/Map;Lscala/Option;Lscala/Option;Lscala/collection/immutable/Set;Lorg/apache/spark/sql/catalyst/catalog/DeltaRuntimeProperties;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/collection/Seq;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/collection/immutable/Set;Lscala/Option;)Lorg/apache/spark/sql/catalyst/catalog/CatalogTable;

[2024-07-13T16:01:05.647Z] E                   : java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.catalog.CatalogTable.copy(Lorg/apache/spark/sql/catalyst/TableIdentifier;Lorg/apache/spark/sql/catalyst/catalog/CatalogTableType;Lorg/apache/spark/sql/catalyst/catalog/CatalogStorageFormat;Lorg/apache/spark/sql/types/StructType;Lscala/Option;Lscala/collection/Seq;Lscala/Option;Ljava/lang/String;JJLjava/lang/String;Lscala/collection/immutable/Map;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/collection/Seq;ZZLscala/collection/immutable/Map;Lscala/Option;Lscala/Option;Lscala/collection/immutable/Set;Lorg/apache/spark/sql/catalyst/catalog/DeltaRuntimeProperties;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/collection/Seq;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/collection/immutable/Set;Lscala/Option;)Lorg/apache/spark/sql/catalyst/catalog/CatalogTable;
[2024-07-13T16:01:05.647Z] E                   	at org.apache.spark.sql.rapids.shims.GpuCreateDataSourceTableAsSelectCommand.run(GpuCreateDataSourceTableAsSelectCommandShims.scala:89)
[2024-07-13T16:01:05.647Z] E                   	at com.nvidia.spark.rapids.GpuExecutedCommandExec.sideEffectResult$lzycompute(GpuExecutedCommandExec.scala:52)
[2024-07-13T16:01:05.648Z] E                   	at com.nvidia.spark.rapids.GpuExecutedCommandExec.sideEffectResult(GpuExecutedCommandExec.scala:50)
[2024-07-13T16:01:05.648Z] E                   	at com.nvidia.spark.rapids.GpuExecutedCommandExec.executeCollect(GpuExecutedCommandExec.scala:61)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$3(QueryExecution.scala:286)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:166)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$2(QueryExecution.scala:286)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$9(SQLExecution.scala:303)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:533)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:226)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:1148)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:155)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:482)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$1(QueryExecution.scala:285)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$withMVTagsIfNecessary(QueryExecution.scala:259)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:280)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:265)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:465)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:69)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:465)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:39)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:339)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:335)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:39)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:39)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:441)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.execution.QueryExecution.$anonfun$eagerlyExecuteCommands$1(QueryExecution.scala:265)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:395)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:265)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:217)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:214)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:356)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:956)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.DataFrameWriter.createTable(DataFrameWriter.scala:797)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:774)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:654)
[2024-07-13T16:01:05.648Z] E                   	at sun.reflect.GeneratedMethodAccessor444.invoke(Unknown Source)
[2024-07-13T16:01:05.648Z] E                   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[2024-07-13T16:01:05.648Z] E                   	at java.lang.reflect.Method.invoke(Method.java:498)
[2024-07-13T16:01:05.648Z] E                   	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
[2024-07-13T16:01:05.648Z] E                   	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
[2024-07-13T16:01:05.648Z] E                   	at py4j.Gateway.invoke(Gateway.java:306)
[2024-07-13T16:01:05.648Z] E                   	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
[2024-07-13T16:01:05.648Z] E                   	at py4j.commands.CallCommand.execute(CallCommand.java:79)
[2024-07-13T16:01:05.648Z] E                   	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:199)
[2024-07-13T16:01:05.648Z] E                   	at py4j.ClientServerConnection.run(ClientServerConnection.java:119)
[2024-07-13T16:01:05.648Z] E                   	at java.lang.Thread.run(Thread.java:750)

Example failing cases:


[2024-07-13T16:01:05.648Z] FAILED ../../src/main/python/parquet_test.py::test_buckets[-reader_confs0][DATAGEN_SEED=1720883956, TZ=UTC, IGNORE_ORDER, ALLOW_NON_GPU(DataWritingCommandExec,ExecutedCommandExec,WriteFilesExec)] - py4j.protocol.Py4JJavaError: An error occurred while calling o551437.saveAs...
[2024-07-13T16:01:05.648Z] FAILED ../../src/main/python/parquet_test.py::test_buckets[-reader_confs1][DATAGEN_SEED=1720883956, TZ=UTC, IGNORE_ORDER, ALLOW_NON_GPU(DataWritingCommandExec,ExecutedCommandExec,WriteFilesExec)] - py4j.protocol.Py4JJavaError: An error occurred while calling o551645.saveAs...
[2024-07-13T16:01:05.648Z] FAILED ../../src/main/python/parquet_test.py::test_buckets[-reader_confs2][DATAGEN_SEED=1720883956, TZ=UTC, IGNORE_ORDER, ALLOW_NON_GPU(DataWritingCommandExec,ExecutedCommandExec,WriteFilesExec)] - py4j.protocol.Py4JJavaError: An error occurred while calling 


[2024-07-13T15:01:08.386Z] =========================== short test summary info ============================

[2024-07-13T15:01:08.386Z] FAILED ../../src/main/python/explain_test.py::test_explain_bucketd_scan[DATAGEN_SEED=1720882809, TZ=UTC, ALLOW_NON_GPU(ANY)] - py4j.protocol.Py4JJavaError: An error occurred while calling o735.saveAsTable.
[2024-07-13T15:01:08.386Z] FAILED ../../src/main/python/explain_test.py::test_explain_bucket_column_not_read[DATAGEN_SEED=1720882809, TZ=UTC, ALLOW_NON_GPU(ANY)] - py4j.protocol.Py4JJavaError: An error occurred while calling o839.saveAsTable.

Steps/Code to reproduce bug
Run the parquet test cases on a Databricks 13.3 runtime.
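
A minimal standalone sketch of the operation these tests exercise (assumptions: the RAPIDS plugin jar is installed on the cluster, spark.rapids.sql.enabled=true, and the table name is illustrative). A bucketed saveAsTable is what test_buckets drives, and per the stack trace above it routes through GpuCreateDataSourceTableAsSelectCommand into the incompatible CatalogTable.copy call:

```scala
// Hedged repro sketch, not the actual integration-test code: a bucketed
// CTAS write on a GPU-enabled 13.3 cluster. The stack trace shows this
// path reaching GpuCreateDataSourceTableAsSelectCommand.run and then
// failing in CatalogTable.copy on the affected runtimes.
spark.range(1000)
  .selectExpr("id", "id % 10 AS key")
  .write
  .format("parquet")
  .bucketBy(4, "key")
  .mode("overwrite")
  .saveAsTable("repro_bucketed_table")
```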

Expected behavior
The tests should pass.

Environment details (please complete the following information)

  • Environment location: Databricks


pxLi added the labels bug (Something isn't working) and ? - Needs Triage (Need team to review and classify), then removed ? - Needs Triage, Jul 15, 2024

pxLi commented Jul 15, 2024

UPDATE: this reproduces only on the Azure 13.3 runtime.

pxLi changed the title from "[BUG] java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.catalog.CatalogTable... in Databricks 13.3 runtime" to "[BUG] java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.catalog.CatalogTable... in Azure Databricks 13.3 runtime" Jul 15, 2024

sameerz commented Jul 15, 2024

@pxLi are we building the DBR 13.3 shim on AWS Databricks and then running tests on Azure Databricks?


jlowe commented Jul 15, 2024

At first I could not replicate this on Azure Databricks, but then I discovered I was using a different Azure Databricks URL than the one CI is using. When I use the same Azure Databricks URL I'm able to replicate the issue, which implies that a change that breaks the plugin has been pushed to one Azure Databricks workspace but not another.


pxLi commented Jul 16, 2024

@pxLi are we building the DBR 13.3 shim on AWS Databricks and then running tests on Azure Databricks?

Yes, we do. We run the build+deploy after the tests pass on the AWS Databricks runtime, and we run IT on Azure only every two weekdays to double-confirm that the plugin works on different CSP DB instances. Apparently this time the Azure 13.3 LTS runtime is not identical to the AWS one, and as Jason mentioned, different URLs can resolve to different runtimes even within Azure.


pxLi commented Jul 16, 2024

Now it fails on the AWS 13.3 runtime too. Looks like the runtime update has been rolled out...

Current hashes from select current_version():

Azure 13.3:
{"dbr_version":null,"dbsql_version":null,"u_build_hash":"80cb8aa4b7284dc3c0f8047e102517d3f6326f84","r_build_hash":"4e8b4bdede528ea22ac005b80e72035f5cd0b293"}
AWS 13.3:
{"dbr_version":null,"dbsql_version":null,"u_build_hash":"80cb8aa4b7284dc3c0f8047e102517d3f6326f84","r_build_hash":"4e8b4bdede528ea22ac005b80e72035f5cd0b293"}

Unfortunately, we didn't record the hashes of previous images for comparison.
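For future comparisons, the fingerprint is cheap to capture at the start of every test run; a minimal sketch using the same current_version() function quoted above (where to archive the output is up to the pipeline):

```scala
// Sketch: record the Databricks runtime build hashes alongside the CI logs
// so a later failure can be diffed against the runtime that last passed.
spark.sql("SELECT current_version()").show(truncate = false)
```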

It turns out everything works fine when both build and test are done on the same runtime. The error seems to be related to inconsistent runtime versions caused by Databricks' way of rolling out runtime upgrades (we failed by building the artifact on an older AWS runtime but testing on an upgraded Azure runtime, which we cannot control).

And now AWS (our CI region) got upgraded two to three days after the Azure one.


pxLi commented Jul 17, 2024

Closing, as all 13.3 runtimes have become consistent across the AWS and Azure regions we are using.


jlowe commented Jul 18, 2024

Saw this again in the nightly Azure Databricks test run.

jlowe reopened this Jul 18, 2024

jlowe commented Jul 18, 2024

The most recent failure may be related to using a stale artifact that was built before the runtime was updated. Leaving this open until the nightly Azure Databricks test pipeline succeeds.


pxLi commented Jul 19, 2024

Saw this again in the nightly Azure Databricks test run.

I think you were seeing this with the 24.06.0 jar (post-release test);
we are preparing 24.06.1 and will release it soon to fix the issue.

I will close this after 24.06.1 gets released


pxLi commented Jul 23, 2024

24.06.1 has been released and passed the post-release CI today.

Please file a new ticket if any new issues arise due to the DB runtime upgrade. Thank you!

pxLi closed this as completed Jul 23, 2024
pxLi reopened this Jul 24, 2024

pxLi commented Jul 24, 2024

OK, now we've met the same issue again: the newly built plugin cannot support the old runtime (users could still be running a cluster with old images, or DB may not plan to update some of their CSP regions; they do not want to share the plan/cadence with us), and we do not have the old/specific images to rebuild against with the latest fix.

Unless we keep DB clusters alive forever for each hash version,
or ask users to build the DB shims directly in their environment,
or force them to stop all jobs and restart the cluster (i.e., create a new cluster) to pick up the new runtime image.

ref: https://docs.databricks.com/en/release-notes/runtime/maintenance-updates.html#databricks-runtime-maintenance-updates

To add a maintenance update to an existing cluster, restart the cluster. 
For the maintenance updates on unsupported Databricks Runtime versions, 
see Maintenance updates for Databricks Runtime (archived).

cc @sameerz @GaryShen2008

pxLi changed the title from "[BUG] java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.catalog.CatalogTable... in Azure Databricks 13.3 runtime" to "[BUG] java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.catalog.CatalogTable... in Databricks 13.3 runtimes" Jul 24, 2024

sameerz commented Jul 24, 2024

Going forward we need to consider supporting both the old and new APIs when changes are made in the Databricks environments. Otherwise users will face problems running a newer jar on an already existing older cluster.
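
One possible shape for that (a hedged sketch, not the plugin's actual shim code): instead of link-editing against a specific copy(...) descriptor, rebuild the case class through whatever constructor the running cluster exposes. The withNewSchema helper is hypothetical, and the field index assumes the StructType schema remains the fourth constructor parameter, as in the signature at the top of this issue:

```scala
import org.apache.spark.sql.catalyst.catalog.CatalogTable
import org.apache.spark.sql.types.StructType

// Hypothetical helper, not spark-rapids code: replace one field of a
// CatalogTable without compiling against any particular copy(...) signature,
// which tolerates runtimes that append extra fields to the case class.
def withNewSchema(table: CatalogTable, schema: StructType): CatalogTable = {
  // The primary constructor of a case class carries every field and is thus
  // the widest constructor; productIterator yields fields in the same order.
  val ctor = table.getClass.getConstructors.maxBy(_.getParameterCount)
  val fields = table.productIterator.toArray
  fields(3) = schema // index 3 = the StructType parameter in the signature above
  ctor.newInstance(fields.map(_.asInstanceOf[AnyRef]): _*).asInstanceOf[CatalogTable]
}
```

Reflection trades compile-time safety for link-time tolerance, so a production shim would presumably probe and cache the constructor lookup once rather than repeating it on every call.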


pxLi commented Jul 25, 2024

Moving this to 24.10 for further discussion if needed.
