
[Bug] Spark loader Task not serializable #467

Closed
1 task done
haohao0103 opened this issue May 18, 2023 · 0 comments · Fixed by #471
Labels
bug Something isn't working

Comments

@haohao0103 (Contributor)

haohao0103 commented May 18, 2023

Bug Type

None

Before submit

  • I had searched in the issues and found no similar issues.

Environment

  • Server Version: v1.0.0
  • Toolchain Version: v1.0.0

Expected & Actual behavior

Error message:
ERROR Printer: Failed to start loading, cause: org.apache.spark.SparkException: Task not serializable
java.util.concurrent.ExecutionException: org.apache.spark.SparkException: Task not serializable
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:192)
        at org.apache.hugegraph.loader.spark.HugeGraphSparkLoader.load(HugeGraphSparkLoader.java:193)
        at org.apache.hugegraph.loader.spark.HugeGraphSparkLoader.main(HugeGraphSparkLoader.java:88)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:966)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:191)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:214)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1054)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1063)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.spark.SparkException: Task not serializable
        at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:416)
        at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:406)
        at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162)
        at org.apache.spark.SparkContext.clean(SparkContext.scala:2487)
        at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$1(RDD.scala:1019)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
        at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:1018)
        at org.apache.spark.sql.Dataset.$anonfun$foreachPartition$1(Dataset.scala:2912)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.spark.sql.Dataset.$anonfun$withNewRDDExecutionId$1(Dataset.scala:3695)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
        at org.apache.spark.sql.Dataset.withNewRDDExecutionId(Dataset.scala:3693)
        at org.apache.spark.sql.Dataset.foreachPartition(Dataset.scala:2912)
        at org.apache.spark.sql.Dataset.foreachPartition(Dataset.scala:2923)
        at org.apache.hugegraph.loader.spark.HugeGraphSparkLoader.lambda$load$1(HugeGraphSparkLoader.java:171)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.NotSerializableException: java.util.concurrent.ThreadPoolExecutor
Serialization stack:
        - object not serializable (class: java.util.concurrent.ThreadPoolExecutor, value: java.util.concurrent.ThreadPoolExecutor@65daf1e0[Running, pool size = 3, active threads = 3, queued tasks = 0, completed tasks = 0])
        - field (class: org.apache.hugegraph.loader.spark.HugeGraphSparkLoader, name: executor, type: interface java.util.concurrent.ExecutorService)
        - object (class org.apache.hugegraph.loader.spark.HugeGraphSparkLoader, org.apache.hugegraph.loader.spark.HugeGraphSparkLoader@3a21317)
        - element of array (index: 0)
        - array (class [Ljava.lang.Object;, size 2)
        - field (class: java.lang.invoke.SerializedLambda, name: capturedArgs, type: class [Ljava.lang.Object;)
        - object (class java.lang.invoke.SerializedLambda, SerializedLambda[capturingClass=class org.apache.hugegraph.loader.spark.HugeGraphSparkLoader, functionalInterfaceMethod=org/apache/spark/api/java/function/ForeachPartitionFunction.call:(Ljava/util/Iterator;)V, implementation=invokeSpecial org/apache/hugegraph/loader/spark/HugeGraphSparkLoader.lambda$null$18e75a97$1:(Lorg/apache/hugegraph/loader/mapping/InputStruct;Ljava/util/Iterator;)V, instantiatedMethodType=(Ljava/util/Iterator;)V, numCaptured=2]);

Command:
sh bin/hugegraph-spark-loader.sh --master local --name spark-hugegraph-loader --file example/spark/struct.json --username admin --token admin --host 127.0.0.1 --port 8080 --graph graph-test

HugeGraphSparkLoader implements Serializable, but its ExecutorService field (executor) does not. The foreachPartition lambda calls an instance method, so it captures the whole HugeGraphSparkLoader object (see the SerializedLambda capturedArgs in the stack above), which drags the non-serializable executor into the serialized closure. That is the problem. I tried marking the field transient, and it worked.
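The failure mode and the transient fix can be reproduced outside Spark with plain Java serialization. The sketch below uses hypothetical classes (not from the project) to show that a Serializable object carrying a ThreadPoolExecutor field cannot be serialized, while marking the field transient makes serialization succeed because the field is simply skipped:

```java
import java.io.*;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Minimal reproduction sketch: class names are hypothetical,
// standing in for HugeGraphSparkLoader and its executor field.
public class TransientExecutorDemo {

    static class LoaderWithPlainField implements Serializable {
        // ThreadPoolExecutor is not Serializable, so this field poisons the object graph
        ExecutorService executor = Executors.newFixedThreadPool(3);
    }

    static class LoaderWithTransientField implements Serializable {
        // transient: skipped during serialization, so the enclosing object can be shipped
        transient ExecutorService executor = Executors.newFixedThreadPool(3);
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Mirrors the NotSerializableException in the stack trace above
        LoaderWithPlainField plain = new LoaderWithPlainField();
        try {
            serialize(plain);
            System.out.println("plain: serialized (unexpected)");
        } catch (NotSerializableException e) {
            System.out.println("plain: NotSerializableException");
        }
        plain.executor.shutdown();

        // With transient, the executor is skipped and serialization succeeds
        LoaderWithTransientField fixed = new LoaderWithTransientField();
        byte[] bytes = serialize(fixed);
        fixed.executor.shutdown();
        System.out.println("transient: serialized " + (bytes.length > 0));
    }
}
```

One caveat worth noting: a transient field is null after deserialization, so this fix is only safe if the executor is used exclusively on the driver side (or is lazily re-created where it is needed).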

Vertex/Edge example

No response

Schema [VertexLabel, EdgeLabel, IndexLabel]

No response

@haohao0103 haohao0103 added the bug Something isn't working label May 18, 2023
@imbajin imbajin changed the title spark loader Task not serializable [Bug] Spark loader Task not serializable May 18, 2023
haohao0103 pushed a commit to haohao0103/incubator-hugegraph-toolchain that referenced this issue May 19, 2023
[Bug] Spark loader Task not serializable