Bug Fix: without unpersist method in RandomForest.scala #2775

omgteam · 2014-10-12T16:32:09Z

While training Gradient Boosting Decision Trees on large-scale sparse data, Spark spilled a large volume of data to disk. I found the following bug:
In version 1.1.0, the train method in DecisionTree.scala persists treeInput in memory but never unpersists it, which causes heavy disk usage.
In the current GitHub version (probably 1.2.0), the train method in RandomForest.scala persists baggedInput and likewise never unpersists it.

After adding the unpersist calls, it works correctly.
https://issues.apache.org/jira/browse/SPARK-3918
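
For illustration, the persist/unpersist pairing described above might look roughly like the sketch below. This is not the actual patch: `UnpersistSketch`, `train`, and the doubling transformation are hypothetical stand-ins for the real training code, and the `MEMORY_AND_DISK` storage level is an assumption. Only `persist`, `unpersist`, `reduce`, and `getPersistentRDDs` are real Spark calls.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object UnpersistSketch {
  // Hypothetical stand-in for a training routine that makes several passes
  // over a derived RDD (as treeInput / baggedInput are used in train()).
  def train(input: RDD[Double], numIterations: Int): Double = {
    // Cache the derived RDD because it is reused on every iteration.
    // The storage level here is an assumption for illustration.
    val cached = input.map(_ * 2.0).persist(StorageLevel.MEMORY_AND_DISK)
    try {
      var acc = 0.0
      for (_ <- 0 until numIterations) {
        acc += cached.reduce(_ + _)
      }
      acc
    } finally {
      // Release the cached blocks explicitly once training is done,
      // rather than waiting for automatic cleanup.
      cached.unpersist()
    }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("unpersist-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val data = sc.parallelize(1 to 100).map(_.toDouble)
      println(train(data, numIterations = 3))
      // After train() returns, no RDD should still be marked as persistent.
      println(sc.getPersistentRDDs.isEmpty)
    } finally {
      sc.stop()
    }
  }
}
```

Wrapping the work in try/finally is a design choice of this sketch, so that the cache is released even if an iteration throws.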

AmplabJenkins · 2014-10-12T16:37:05Z

Can one of the admins verify this patch?

mengxr · 2014-10-13T05:19:11Z

Jenkins, this is ok to test.

mengxr · 2014-10-13T05:21:53Z

LGTM. Waiting for Jenkins. Unreferenced RDDs get cleaned up automatically, but it is always good to call unpersist explicitly.

mengxr · 2014-10-13T05:22:02Z

test this please

AmplabJenkins · 2014-10-13T05:32:18Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21676/
Test FAILed.

mengxr · 2014-10-13T06:22:07Z

test this please

SparkQA · 2014-10-13T06:29:42Z

QA tests have started for PR 2775 at commit 815d543.

This patch merges cleanly.

SparkQA · 2014-10-13T07:38:35Z

QA tests have finished for PR 2775 at commit 815d543.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2014-10-13T07:38:39Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21678/
Test PASSed.

mengxr · 2014-10-13T17:00:05Z

Merged into master. Thanks!

omgteam added 2 commits October 13, 2014 08:12

Bug: fix without unpersist baggedInput in RandomForest.scala

1a36f83

adjust tab to spaces

815d543

asfgit closed this in 942847f Oct 13, 2014

mengxr mentioned this pull request Oct 14, 2014

[SPARK-3934] [SPARK-3918] [mllib] Bug fixes for RandomForest, DecisionTree #2785

Closed
