
[BUG] java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.catalog.CatalogTable... in Databricks 13.3 runtimes #11184

pxLi opened this issue Jul 15, 2024 · 13 comments


pxLi commented Jul 15, 2024

Describe the bug
Starting on Jul 13, a lot of our IT cases began failing on the DB 13.3 runtime:

java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.catalog.CatalogTable.copy(Lorg/apache/spark/sql/catalyst/TableIdentifier;Lorg/apache/spark/sql/catalyst/catalog/CatalogTableType;Lorg/apache/spark/sql/catalyst/catalog/CatalogStorageFormat;Lorg/apache/spark/sql/types/StructType;Lscala/Option;Lscala/collection/Seq;Lscala/Option;Ljava/lang/String;JJLjava/lang/String;Lscala/collection/immutable/Map;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/collection/Seq;ZZLscala/collection/immutable/Map;Lscala/Option;Lscala/Option;Lscala/collection/immutable/Set;Lorg/apache/spark/sql/catalyst/catalog/DeltaRuntimeProperties;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/collection/Seq;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/collection/immutable/Set;Lscala/Option;)Lorg/apache/spark/sql/catalyst/catalog/CatalogTable;

[2024-07-13T16:01:05.647Z] E                   : java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.catalog.CatalogTable.copy(Lorg/apache/spark/sql/catalyst/TableIdentifier;Lorg/apache/spark/sql/catalyst/catalog/CatalogTableType;Lorg/apache/spark/sql/catalyst/catalog/CatalogStorageFormat;Lorg/apache/spark/sql/types/StructType;Lscala/Option;Lscala/collection/Seq;Lscala/Option;Ljava/lang/String;JJLjava/lang/String;Lscala/collection/immutable/Map;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/collection/Seq;ZZLscala/collection/immutable/Map;Lscala/Option;Lscala/Option;Lscala/collection/immutable/Set;Lorg/apache/spark/sql/catalyst/catalog/DeltaRuntimeProperties;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/collection/Seq;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/collection/immutable/Set;Lscala/Option;)Lorg/apache/spark/sql/catalyst/catalog/CatalogTable;
[2024-07-13T16:01:05.647Z] E                   	at org.apache.spark.sql.rapids.shims.GpuCreateDataSourceTableAsSelectCommand.run(GpuCreateDataSourceTableAsSelectCommandShims.scala:89)
[2024-07-13T16:01:05.647Z] E                   	at com.nvidia.spark.rapids.GpuExecutedCommandExec.sideEffectResult$lzycompute(GpuExecutedCommandExec.scala:52)
[2024-07-13T16:01:05.648Z] E                   	at com.nvidia.spark.rapids.GpuExecutedCommandExec.sideEffectResult(GpuExecutedCommandExec.scala:50)
[2024-07-13T16:01:05.648Z] E                   	at com.nvidia.spark.rapids.GpuExecutedCommandExec.executeCollect(GpuExecutedCommandExec.scala:61)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$3(QueryExecution.scala:286)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:166)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$2(QueryExecution.scala:286)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$9(SQLExecution.scala:303)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:533)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:226)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:1148)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:155)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:482)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$1(QueryExecution.scala:285)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$withMVTagsIfNecessary(QueryExecution.scala:259)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:280)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:265)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:465)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:69)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:465)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:39)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:339)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:335)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:39)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:39)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:441)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.execution.QueryExecution.$anonfun$eagerlyExecuteCommands$1(QueryExecution.scala:265)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:395)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:265)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:217)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:214)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:356)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:956)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.DataFrameWriter.createTable(DataFrameWriter.scala:797)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:774)
[2024-07-13T16:01:05.648Z] E                   	at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:654)
[2024-07-13T16:01:05.648Z] E                   	at sun.reflect.GeneratedMethodAccessor444.invoke(Unknown Source)
[2024-07-13T16:01:05.648Z] E                   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[2024-07-13T16:01:05.648Z] E                   	at java.lang.reflect.Method.invoke(Method.java:498)
[2024-07-13T16:01:05.648Z] E                   	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
[2024-07-13T16:01:05.648Z] E                   	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
[2024-07-13T16:01:05.648Z] E                   	at py4j.Gateway.invoke(Gateway.java:306)
[2024-07-13T16:01:05.648Z] E                   	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
[2024-07-13T16:01:05.648Z] E                   	at py4j.commands.CallCommand.execute(CallCommand.java:79)
[2024-07-13T16:01:05.648Z] E                   	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:199)
[2024-07-13T16:01:05.648Z] E                   	at py4j.ClientServerConnection.run(ClientServerConnection.java:119)
[2024-07-13T16:01:05.648Z] E                   	at java.lang.Thread.run(Thread.java:750)

Example failing cases:


[2024-07-13T16:01:05.648Z] FAILED ../../src/main/python/parquet_test.py::test_buckets[-reader_confs0][DATAGEN_SEED=1720883956, TZ=UTC, IGNORE_ORDER, ALLOW_NON_GPU(DataWritingCommandExec,ExecutedCommandExec,WriteFilesExec)] - py4j.protocol.Py4JJavaError: An error occurred while calling o551437.saveAs...
[2024-07-13T16:01:05.648Z] FAILED ../../src/main/python/parquet_test.py::test_buckets[-reader_confs1][DATAGEN_SEED=1720883956, TZ=UTC, IGNORE_ORDER, ALLOW_NON_GPU(DataWritingCommandExec,ExecutedCommandExec,WriteFilesExec)] - py4j.protocol.Py4JJavaError: An error occurred while calling o551645.saveAs...
[2024-07-13T16:01:05.648Z] FAILED ../../src/main/python/parquet_test.py::test_buckets[-reader_confs2][DATAGEN_SEED=1720883956, TZ=UTC, IGNORE_ORDER, ALLOW_NON_GPU(DataWritingCommandExec,ExecutedCommandExec,WriteFilesExec)] - py4j.protocol.Py4JJavaError: An error occurred while calling 


[2024-07-13T15:01:08.386Z] =========================== short test summary info ============================

[2024-07-13T15:01:08.386Z] FAILED ../../src/main/python/explain_test.py::test_explain_bucketd_scan[DATAGEN_SEED=1720882809, TZ=UTC, ALLOW_NON_GPU(ANY)] - py4j.protocol.Py4JJavaError: An error occurred while calling o735.saveAsTable.
[2024-07-13T15:01:08.386Z] FAILED ../../src/main/python/explain_test.py::test_explain_bucket_column_not_read[DATAGEN_SEED=1720882809, TZ=UTC, ALLOW_NON_GPU(ANY)] - py4j.protocol.Py4JJavaError: An error occurred while calling o839.saveAsTable.

Steps/Code to reproduce bug
Run the parquet test cases on a Databricks 13.3 runtime.
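
A minimal standalone sketch of the operation these tests exercise (assumptions: the RAPIDS plugin jar is installed on the cluster, spark.rapids.sql.enabled=true, and the table name is illustrative). A bucketed saveAsTable is what test_buckets drives, and per the stack trace above it routes through GpuCreateDataSourceTableAsSelectCommand into the incompatible CatalogTable.copy call:

```scala
// Hedged repro sketch, not the actual integration-test code: a bucketed
// CTAS write on a GPU-enabled 13.3 cluster. The stack trace shows this
// path reaching GpuCreateDataSourceTableAsSelectCommand.run and then
// failing in CatalogTable.copy on the affected runtimes.
spark.range(1000)
  .selectExpr("id", "id % 10 AS key")
  .write
  .format("parquet")
  .bucketBy(4, "key")
  .mode("overwrite")
  .saveAsTable("repro_bucketed_table")
```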

Expected behavior
The tests should pass.

Environment details (please complete the following information)

  • Environment location: Databricks


pxLi added the labels bug (Something isn't working) and ? - Needs Triage (Need team to review and classify), then removed ? - Needs Triage, Jul 15, 2024

pxLi commented Jul 15, 2024

UPDATE: this reproduces only on the Azure 13.3 runtime.

pxLi changed the title from "[BUG] java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.catalog.CatalogTable... in Databricks 13.3 runtime" to "[BUG] java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.catalog.CatalogTable... in Azure Databricks 13.3 runtime" Jul 15, 2024

sameerz commented Jul 15, 2024

@pxLi are we building the DBR 13.3 shim on AWS Databricks and then running tests on Azure Databricks?


jlowe commented Jul 15, 2024

At first I could not replicate this on Azure Databricks, but then I discovered I was using a different Azure Databricks URL than the one CI is using. When I use the same Azure Databricks URL I'm able to replicate the issue, which implies that a change that breaks the plugin has been pushed to one Azure Databricks workspace but not another.


pxLi commented Jul 16, 2024

@pxLi are we building the DBR 13.3 shim on AWS Databricks and then running tests on Azure Databricks?

Yes, we do. We run the build+deploy after the tests pass on the AWS Databricks runtime, and we run IT on Azure only every two weekdays to double-confirm that the plugin works on different CSP DB instances. Apparently this time the Azure 13.3 LTS runtime is not identical to the AWS one, and as Jason mentioned, different URLs can resolve to different runtimes even within Azure.


pxLi commented Jul 16, 2024

Now it fails on the AWS 13.3 runtime too. Looks like the runtime update has been rolled out...

Current hashes from select current_version():

Azure 13.3:
{"dbr_version":null,"dbsql_version":null,"u_build_hash":"80cb8aa4b7284dc3c0f8047e102517d3f6326f84","r_build_hash":"4e8b4bdede528ea22ac005b80e72035f5cd0b293"}
AWS 13.3:
{"dbr_version":null,"dbsql_version":null,"u_build_hash":"80cb8aa4b7284dc3c0f8047e102517d3f6326f84","r_build_hash":"4e8b4bdede528ea22ac005b80e72035f5cd0b293"}

Unfortunately, we didn't record the hashes of previous images for comparison.
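For future comparisons, the fingerprint is cheap to capture at the start of every test run; a minimal sketch using the same current_version() function quoted above (where to archive the output is up to the pipeline):

```scala
// Sketch: record the Databricks runtime build hashes alongside the CI logs
// so a later failure can be diffed against the runtime that last passed.
spark.sql("SELECT current_version()").show(truncate = false)
```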

It turns out everything works fine when both build and test are done on the same runtime. The error seems to be related to inconsistent runtime versions caused by Databricks' way of rolling out runtime upgrades (we failed by building the artifact on an older AWS runtime but testing on an upgraded Azure runtime, which we cannot control).

And now AWS (our CI region) got upgraded two to three days after the Azure one.


pxLi commented Jul 17, 2024

Closing, as all 13.3 runtimes have become consistent across the AWS and Azure regions we are using.


jlowe commented Jul 18, 2024

Saw this again in the nightly Azure Databricks test run.

jlowe reopened this Jul 18, 2024

jlowe commented Jul 18, 2024

The most recent failure may be related to using a stale artifact that was built before the runtime was updated. Leaving this open until the nightly Azure Databricks test pipeline succeeds.


pxLi commented Jul 19, 2024

Saw this again in the nightly Azure Databricks test run.

I think you were seeing this with the 24.06.0 jar (post-release test);
we are preparing 24.06.1 and will release it soon to fix the issue.

I will close this after 24.06.1 gets released


pxLi commented Jul 23, 2024

24.06.1 has been released and passed the post-release CI today.

Please file a new ticket if any new issues arise due to the DB runtime upgrade. Thank you!

pxLi closed this as completed Jul 23, 2024
pxLi reopened this Jul 24, 2024

pxLi commented Jul 24, 2024

OK, now we've met the same issue again: the newly built plugin cannot support the old runtime (users could still be running a cluster with old images, or DB may not plan to update some of their CSP regions; they do not want to share the plan/cadence with us), and we do not have the old/specific images to rebuild against with the latest fix.

Unless we keep DB clusters alive forever for each hash version,
or ask users to build the DB shims directly in their environment,
or force them to stop all jobs and restart the cluster (i.e., create a new cluster) to pick up the new runtime image.

ref: https://docs.databricks.com/en/release-notes/runtime/maintenance-updates.html#databricks-runtime-maintenance-updates

To add a maintenance update to an existing cluster, restart the cluster. 
For the maintenance updates on unsupported Databricks Runtime versions, 
see Maintenance updates for Databricks Runtime (archived).

cc @sameerz @GaryShen2008

pxLi changed the title from "[BUG] java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.catalog.CatalogTable... in Azure Databricks 13.3 runtime" to "[BUG] java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.catalog.CatalogTable... in Databricks 13.3 runtimes" Jul 24, 2024

sameerz commented Jul 24, 2024

Going forward we need to consider supporting both the old and new APIs when changes are made in the Databricks environments. Otherwise users will face problems running a newer jar on an already existing older cluster.
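
One possible shape for that (a hedged sketch, not the plugin's actual shim code): instead of link-editing against a specific copy(...) descriptor, rebuild the case class through whatever constructor the running cluster exposes. The withNewSchema helper is hypothetical, and the field index assumes the StructType schema remains the fourth constructor parameter, as in the signature at the top of this issue:

```scala
import org.apache.spark.sql.catalyst.catalog.CatalogTable
import org.apache.spark.sql.types.StructType

// Hypothetical helper, not spark-rapids code: replace one field of a
// CatalogTable without compiling against any particular copy(...) signature,
// which tolerates runtimes that append extra fields to the case class.
def withNewSchema(table: CatalogTable, schema: StructType): CatalogTable = {
  // The primary constructor of a case class carries every field and is thus
  // the widest constructor; productIterator yields fields in the same order.
  val ctor = table.getClass.getConstructors.maxBy(_.getParameterCount)
  val fields = table.productIterator.toArray
  fields(3) = schema // index 3 = the StructType parameter in the signature above
  ctor.newInstance(fields.map(_.asInstanceOf[AnyRef]): _*).asInstanceOf[CatalogTable]
}
```

Reflection trades compile-time safety for link-time tolerance, so a production shim would presumably probe and cache the constructor lookup once rather than repeating it on every call.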


pxLi commented Jul 25, 2024

Moving this to 24.10 for further discussion if needed.
