fix: Spark loader Task not serializable #471

Merged
merged 14 commits into from
May 22, 2023

Conversation

haohao0103
Contributor

@haohao0103 haohao0103 commented May 19, 2023

fix #467

@imbajin imbajin changed the title from "fix:Spark loader Task not serializableAlan zhao" to "fix: Spark loader Task not serializable" May 19, 2023
@@ -77,7 +77,7 @@ public class HugeGraphSparkLoader implements Serializable {
     private final LoadOptions loadOptions;
     private final Map<ElementBuilder, List<GraphElement>> builders;

-    private final ExecutorService executor;
+    private final transient ExecutorService executor;
Member

Is this the root cause? And how can we verify it?

Contributor Author

Thanks. I think the root cause is that, when building the vertex and edge data, we use instance fields of HugeGraphSparkLoader such as loadOptions and builders. Because those fields are referenced inside the Spark task, Spark has to serialize the entire HugeGraphSparkLoader object and send it over the network, which requires the loader and every non-transient field it holds to be serializable. ExecutorService does not implement Serializable, so Spark throws the "Task not serializable" error. Marking the executor as transient excludes it from serialization, and that is safe here because the executor is not needed while the vertices and edges are being built after the job is submitted. So I think adding transient solves the problem.
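The reasoning above can be reproduced outside Spark with plain Java serialization, which is what Spark uses to ship task closures. The following is only a minimal sketch: `TransientExecutorSketch`, `PlainLoader`, `TransientLoader`, `failsToSerialize`, and `roundTrip` are hypothetical names made up for illustration, not classes or methods from HugeGraphSparkLoader.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class TransientExecutorSketch {

    /** Hypothetical loader whose executor is a regular (serialized) field. */
    static class PlainLoader implements Serializable {
        private static final long serialVersionUID = 1L;
        final String options;
        final ExecutorService executor = Executors.newSingleThreadExecutor();

        PlainLoader(String options) {
            this.options = options;
        }
    }

    /** Same shape, but the executor is excluded from serialization. */
    static class TransientLoader implements Serializable {
        private static final long serialVersionUID = 1L;
        final String options;
        final transient ExecutorService executor = Executors.newSingleThreadExecutor();

        TransientLoader(String options) {
            this.options = options;
        }
    }

    /** Returns true if Java serialization rejects the object. */
    static boolean failsToSerialize(Object value) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(value);
            return false;
        } catch (NotSerializableException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Serializes and deserializes the loader, roughly what shipping it to a worker does. */
    static TransientLoader roundTrip(TransientLoader loader) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(loader);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (TransientLoader) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PlainLoader plain = new PlainLoader("opts");
        TransientLoader loader = new TransientLoader("opts");

        // The non-transient executor makes the whole object unserializable.
        System.out.println("plain loader fails to serialize: " + failsToSerialize(plain));

        // With transient, the object serializes; the executor is simply not carried over.
        TransientLoader copy = roundTrip(loader);
        System.out.println("options: " + copy.options + ", executor: " + copy.executor);

        plain.executor.shutdown();
        loader.executor.shutdown();
    }
}
```

One consequence worth keeping in mind: a transient field is not restored on deserialization, so the copy's `executor` is `null`. That is acceptable only as long as the deserialized side never touches the executor, which matches the argument made in the comment above.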

Member

Makes sense.

@codecov

codecov bot commented May 19, 2023

Codecov Report

Merging #471 (345ef1c) into master (55f4282) will not change coverage.
The diff coverage is n/a.

@@            Coverage Diff            @@
##             master     #471   +/-   ##
=========================================
  Coverage     62.51%   62.51%           
  Complexity      894      894           
=========================================
  Files            91       91           
  Lines          4396     4396           
  Branches        516      516           
=========================================
  Hits           2748     2748           
  Misses         1445     1445           
  Partials        203      203           
Impacted Files Coverage Δ
...e/hugegraph/loader/spark/HugeGraphSparkLoader.java 0.00% <ø> (ø)

@imbajin imbajin requested a review from simon824 May 19, 2023 11:07
@imbajin imbajin merged commit 44a0818 into apache:master May 22, 2023
Successfully merging this pull request may close these issues.

[Bug] Spark loader Task not serializable
3 participants