[BUG] Dataproc notebook IT test failure - NoSuchMethodError: org.apache.spark.network.util.ByteUnit.toBytes #4787

Closed · tgravescs opened this issue Feb 15, 2022 · 10 comments
Labels: bug (Something isn't working), P0 (Must have for release)

Comments
@tgravescs (Collaborator)

Describe the bug
The Dataproc notebook integration test build is failing with:

03:55:48  Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
03:55:48  : java.lang.NoSuchMethodError: org.apache.spark.network.util.ByteUnit.toBytes(J)D
03:55:48  	at com.nvidia.spark.rapids.RapidsConf$.<init>(RapidsConf.scala:311)
03:55:48  	at com.nvidia.spark.rapids.RapidsConf$.<clinit>(RapidsConf.scala)
03:55:48  	at com.nvidia.spark.rapids.RapidsPluginUtils$.fixupConfigs(Plugin.scala:130)
03:55:48  	at com.nvidia.spark.rapids.RapidsDriverPlugin.init(Plugin.scala:168)
03:55:48  	at org.apache.spark.internal.plugin.DriverPluginContainer.$anonfun$driverPlugins$1(PluginContainer.scala:53)
03:55:48  	at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)
03:55:48  	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
03:55:48  	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
03:55:48  	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
03:55:48  	at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)
03:55:48  	at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290)
03:55:48  	at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
03:55:48  	at org.apache.spark.internal.plugin.DriverPluginContainer.<init>(PluginContainer.scala:46)
03:55:48  	at org.apache.spark.internal.plugin.PluginContainer$.apply(PluginContainer.scala:210)
03:55:48  	at org.apache.spark.internal.plugin.PluginContainer$.apply(PluginContainer.scala:193)
03:55:48  	at org.apache.spark.SparkContext.<init>(SparkContext.scala:554)
03:55:48  	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
03:55:48  	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
03:55:48  	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
03:55:48  	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
03:55:48  	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
03:55:48  	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
03:55:48  	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
03:55:48  	at py4j.Gateway.invoke(Gateway.java:238)
03:55:48  	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
03:55:48  	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
03:55:48  	at py4j.GatewayConnection.run(GatewayConnection.java:238)
03:55:48  	at java.lang.Thread.run(Thread.java:748)
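
For context: the (J)D descriptor in the NoSuchMethodError means the plugin was compiled against a toBytes that takes a long and returns a double. Hitting the error at link time means the ByteUnit class actually loaded at runtime exposes some other signature, i.e. a different copy of the class is shadowing Spark's own. A minimal diagnostic sketch (hypothetical, pasted into a spark-shell started with the same classpath) to see which jar the class came from and what it really exposes:

// Which jar did ByteUnit load from, and which toBytes signatures does it have?
val cls = Class.forName("org.apache.spark.network.util.ByteUnit")
println(cls.getProtectionDomain.getCodeSource.getLocation)
cls.getMethods.filter(_.getName == "toBytes").foreach { m =>
  // The plugin expects: double toBytes(long), i.e. descriptor (J)D
  println(s"${m.getReturnType.getName} toBytes(${m.getParameterTypes.map(_.getName).mkString(", ")})")
}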
tgravescs added the bug (Something isn't working), ? - Needs Triage (Need team to review and classify), and P0 (Must have for release) labels on Feb 15, 2022
@tgravescs (Collaborator, Author)

My guess is that either we have a setup issue or perhaps they changed the Spark function.

@revans2 (Collaborator) commented Feb 15, 2022

https://github.com/apache/spark/blob/master/common/network-common/src/main/java/org/apache/spark/network/util/ByteUnit.java has not changed since 2019, so my guess is that it is a setup issue.

@tgravescs (Collaborator, Author)

Dataproc could have changed it internally in their version, but that seems like a weird change, so I agree it is likely a setup issue.

@sameerz (Collaborator) commented Feb 15, 2022

@NvTimLiu are you investigating whether we have a possible setup issue?

cc: @GaryShen2008

sameerz removed the ? - Needs Triage (Need team to review and classify) label on Feb 15, 2022
@NvTimLiu (Collaborator)

Checking.

@NvTimLiu (Collaborator) commented Feb 18, 2022

In the failing Dataproc notebook job (rapids_Dataproc_Notebook), we copied the cudf/rapids jars into spark/jars; after that, even starting up a local Spark hits the error below:

export SPARK_CONF_DIR=/ && spark-shell --master local[*] --conf spark.plugins=com.nvidia.spark.SQLPlugin

sa_116163337916449219958@spark-rapids-notebook-cuda11-2-w-0:/usr/lib/spark/conf$ spark-shell --master local
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/02/18 13:42:43 INFO org.apache.spark.SparkEnv: Registering MapOutputTracker
22/02/18 13:42:43 INFO org.apache.spark.SparkEnv: Registering BlockManagerMaster
22/02/18 13:42:43 INFO org.apache.spark.SparkEnv: Registering BlockManagerMasterHeartbeat
22/02/18 13:42:43 INFO org.apache.spark.SparkEnv: Registering OutputCommitCoordinator
22/02/18 13:42:43 WARN com.nvidia.spark.rapids.RapidsPluginUtils: RAPIDS Accelerator 22.04.0-SNAPSHOT using cudf 22.04.0-SNAPSHOT.
java.lang.NoSuchMethodError: org.apache.spark.network.util.ByteUnit.toBytes(J)D
  at com.nvidia.spark.rapids.RapidsConf$.<init>(RapidsConf.scala:311)
  at com.nvidia.spark.rapids.RapidsConf$.<clinit>(RapidsConf.scala)
  at com.nvidia.spark.rapids.RapidsPluginUtils$.fixupConfigs(Plugin.scala:130)
  at com.nvidia.spark.rapids.RapidsDriverPlugin.init(Plugin.scala:168)
  at org.apache.spark.internal.plugin.DriverPluginContainer.$anonfun$driverPlugins$1(PluginContainer.scala:53)
  at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)
  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
  at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
  at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)
  at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290)
  at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
  at org.apache.spark.internal.plugin.DriverPluginContainer.<init>(PluginContainer.scala:46)
  at org.apache.spark.internal.plugin.PluginContainer$.apply(PluginContainer.scala:210)
  at org.apache.spark.internal.plugin.PluginContainer$.apply(PluginContainer.scala:193)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:554)
  at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2680)
  at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:945)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:939)
  at org.apache.spark.repl.Main$.createSparkSession(Main.scala:106)
  ... 55 elided
<console>:14: error: not found: value spark
       import spark.implicits._
              ^
<console>:14: error: not found: value spark
       import spark.sql
              ^
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.2
      /_/

Using Scala version 2.12.14 (OpenJDK 64-Bit Server VM, Java 1.8.0_312)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
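
Since the same error reproduces with a local master, whatever is shadowing ByteUnit must be on the driver's own classpath, i.e. somewhere under spark/jars itself. A hedged sketch (assuming SPARK_HOME is /usr/lib/spark on the Dataproc image) to scan that directory for every jar bundling a copy of the class; more than one hit means two versions are competing:

// Scan $SPARK_HOME/jars for every jar carrying its own ByteUnit.class.
import java.io.File
import java.util.jar.JarFile

val jarsDir = new File(sys.env.getOrElse("SPARK_HOME", "/usr/lib/spark"), "jars")
for (jar <- jarsDir.listFiles().filter(_.getName.endsWith(".jar"))) {
  val jf = new JarFile(jar)
  if (jf.getEntry("org/apache/spark/network/util/ByteUnit.class") != null)
    println(jar.getName)  // each hit is a competing copy of the class
  jf.close()
}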

@NvTimLiu (Collaborator)

Another nightly IT Dataproc job (rapids_it-Datapro) runs fine; we found that it transfers the cudf/rapids jars to the YARN nodes via --jars instead of copying them into spark/jars.

We changed the failing job (rapids_Dataproc_Notebook) to use --jars as well, and it passes too.

I still do not know the difference between --jars and copying the jars into spark/jars; I will continue investigating.
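
For reference, the general distinction: files under spark/jars are part of the Spark distribution itself and land on the base classpath of every JVM Spark starts, so a stale jar there can shadow Spark's own classes; --jars only adds the files to the application's classpath and ships them to the executors, leaving the distribution untouched. A hedged check from inside a session started with --jars, showing that the jars are tracked per-application in the conf:

// Jars passed via --jars are recorded in Spark's conf and distributed
// per application, not baked into the base classpath like spark/jars.
for (k <- Seq("spark.jars", "spark.yarn.dist.jars"))
  println(s"$k = ${sc.getConf.getOption(k).getOrElse("(not set)")}")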

@tgravescs (Collaborator, Author)

Check to see if Dataproc is using a prepackaged Spark distribution, meaning jars copied into spark/jars only get picked up on the client host and not on the node managers. On YARN you often have a Spark distribution in HDFS, so jars are not copied from the client machine.

Look for the configs spark.yarn.archive or spark.yarn.jars, e.g. with the sketch below.
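
A small hedged sketch of that check from a running session (or grep the same keys in spark-defaults.conf):

// If either key is set, YARN containers take their Spark jars from the
// archive in HDFS, and anything copied into the client's spark/jars
// never reaches the node managers.
for (k <- Seq("spark.yarn.archive", "spark.yarn.jars"))
  println(s"$k = ${sc.getConf.getOption(k).getOrElse("(not set)")}")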

NvTimLiu self-assigned this on Feb 21, 2022
@NvTimLiu (Collaborator) commented Feb 21, 2022

I found the problem: we copied rapids-4-spark-sql-meta_2.11 into the spark/jars dir, which messed up the Spark jar loading.

+ gcloud compute scp --zone us-central1-b workspace/cuda11/cudf-22.04.0-SNAPSHOT-cuda11.jar workspace/cuda11/rapids-4-spark-integration-tests_2.12-22.04.0-SNAPSHOT-spark312.jar workspace/cuda11/rapids-4-spark-sql-meta_2.11-22.04.0-SNAPSHOT-spark24.jar workspace/cuda11/rapids-4-spark-udf-examples_2.12-22.04.0-SNAPSHOT.jar workspace/cuda11/rapids-4-spark_2.12-22.04.0-SNAPSHOT.jar user@spark-rapids-xxx-m:/home/user

I've updated our Jenkins scripts to NOT copy these jars into the Dataproc spark/jars dir.
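
If the suspicion is that the spark24 meta jar bundles its own copies of org.apache.spark classes (Spark-2.x-era bytecode that would shadow the runtime's Spark 3.1 classes once it sits in spark/jars), a hedged way to confirm, assuming the jar file is in the current directory:

// List any org.apache.spark classes bundled inside the suspect jar; any
// hit can shadow the runtime's own Spark classes from spark/jars.
import java.util.jar.JarFile
import scala.collection.JavaConverters._

val jf = new JarFile("rapids-4-spark-sql-meta_2.11-22.04.0-SNAPSHOT-spark24.jar")
jf.entries().asScala.map(_.getName)
  .filter(n => n.startsWith("org/apache/spark/") && n.endsWith(".class"))
  .foreach(println)
jf.close()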

@NvTimLiu (Collaborator)

MR686 has been merged, and the latest run, rapids_Dataproc_Notebook/436, passed. This issue is fixed.
