Bug Fix: without unpersist method in RandomForest.scala #2775

Closed
omgteam wants to merge 2 commits

Conversation

@omgteam
Contributor

omgteam commented Oct 12, 2014

While training gradient-boosted decision trees on large-scale sparse data, Spark spilled a large amount of data to disk. I traced it to the following bug:
In version 1.1.0, the train method in DecisionTree.scala persists treeInput in memory but never unpersists it, which caused heavy disk usage.
In the current GitHub version (possibly 1.2.0), the train method in RandomForest.scala persists baggedInput but likewise never unpersists it.

After adding the unpersist calls, it works correctly.
https://issues.apache.org/jira/browse/SPARK-3918
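
For readers unfamiliar with the pattern, here is a minimal sketch of pairing persist with unpersist inside a routine that reuses a derived RDD. It is not the MLlib code from this patch: `UnpersistSketch` and `sumOfSquares` are hypothetical names, the storage level is an assumption, and it assumes spark-core is on the classpath with a local master.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object UnpersistSketch {
  // Hypothetical stand-in for a training routine; not the actual MLlib code.
  def sumOfSquares(input: RDD[Double]): Double = {
    // The derived RDD is read more than once, so it is cached.
    val cached = input.map(x => x * x).persist(StorageLevel.MEMORY_AND_DISK)
    try {
      cached.count()               // first pass materializes the cached blocks
      cached.fold(0.0)(_ + _)      // second pass reads them back
    } finally {
      // Release the cached blocks once the routine no longer needs them.
      cached.unpersist()
    }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("unpersist-sketch")
    val sc = new SparkContext(conf)
    try {
      println(sumOfSquares(sc.parallelize(Seq(1.0, 2.0, 3.0))))
    } finally {
      sc.stop()
    }
  }
}
```

The try/finally is a design choice of this sketch: it releases the cache even if one of the passes throws, so a failed run does not leave blocks behind.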

@AmplabJenkins

Can one of the admins verify this patch?

@mengxr
Contributor

mengxr commented Oct 13, 2014

Jenkins, this is ok to test.

@mengxr
Contributor

mengxr commented Oct 13, 2014

LGTM. Waiting for Jenkins. Unreferenced RDDs will get cleaned automatically. But it is always good to call unpersist explicitly.
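
One way to see the effect of an explicit call is to ask the SparkContext which RDDs it still tracks as persistent. The following is a rough sketch under stated assumptions: `PersistentRddCheck` and `trackedBeforeAndAfter` are hypothetical names, the data and storage level are arbitrary, and it relies on `SparkContext.getPersistentRDDs` reporting the RDDs currently marked as persisted.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object PersistentRddCheck {
  // Returns whether the RDD is tracked as persistent before and after unpersist.
  def trackedBeforeAndAfter(sc: SparkContext): (Boolean, Boolean) = {
    val data = sc.parallelize(1 to 1000).persist(StorageLevel.MEMORY_ONLY)
    data.count()                                         // materialize the cache
    val before = sc.getPersistentRDDs.contains(data.id)
    data.unpersist()                                     // explicit release
    val after = sc.getPersistentRDDs.contains(data.id)
    (before, after)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("persistent-rdd-check")
    val sc = new SparkContext(conf)
    try {
      println(trackedBeforeAndAfter(sc))
    } finally {
      sc.stop()
    }
  }
}
```

Automatic cleanup only happens after the driver has garbage-collected the RDD reference, so its timing is not under the caller's control; the explicit call releases the blocks at a known point.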

@mengxr
Contributor

mengxr commented Oct 13, 2014

test this please

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21676/

@mengxr
Contributor

mengxr commented Oct 13, 2014

test this please

@SparkQA

SparkQA commented Oct 13, 2014

QA tests have started for PR 2775 at commit 815d543.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 13, 2014

QA tests have finished for PR 2775 at commit 815d543.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21678/

@mengxr
Contributor

mengxr commented Oct 13, 2014

Merged into master. Thanks!
