Bug fix: missing unpersist call in RandomForest.scala
While training Gradient Boosting Decision Trees on large-scale sparse data, Spark spilled a large amount of data to disk. The cause turned out to be the following bug:
    In version 1.1.0, the train method in DecisionTree.scala persists treeInput in memory but never unpersists it, which caused heavy disk usage.
    In the current GitHub version (possibly 1.2.0), the train method in RandomForest.scala persists baggedInput but likewise never unpersists it.

After adding the unpersist call, training works correctly.
https://issues.apache.org/jira/browse/SPARK-3918

Author: omgteam <Kimlong.Liu@gmail.com>

Closes apache#2775 from omgteam/master and squashes the following commits:

815d543 [omgteam] adjust tab to spaces
1a36f83 [omgteam] Bug: fix without unpersist baggedInput in RandomForest.scala
omgteam authored and mengxr committed Oct 13, 2014
1 parent 92e017f commit 942847f
Showing 1 changed file with 2 additions and 0 deletions.
@@ -176,6 +176,8 @@ private class RandomForest (
       timer.stop("findBestSplits")
     }

+    baggedInput.unpersist()
+
     timer.stop("total")

     logInfo("Internal timing for DecisionTree:")
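
The fix above pairs a persist call with an explicit unpersist once the cached RDD is no longer needed. As a minimal sketch of the same idea, the pairing could be wrapped in a small helper so the cache is released even if the training loop throws. `PersistScope` and `withPersisted` are hypothetical names introduced here for illustration and are not part of Spark or of this commit; the `MEMORY_AND_DISK` default is likewise an assumption.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Hypothetical helper, not part of Spark or of this commit.
object PersistScope {

  /** Persists `rdd` while `body` runs, then always unpersists it.
    *
    * Note: RDD.persist throws if the RDD already has a different
    * storage level assigned, so pass in an RDD that is not yet cached.
    */
  def withPersisted[T, A](
      rdd: RDD[T],
      level: StorageLevel = StorageLevel.MEMORY_AND_DISK // assumed default
  )(body: RDD[T] => A): A = {
    rdd.persist(level)
    try {
      body(rdd)
    } finally {
      // Release cached blocks (memory and disk) as soon as the work is done.
      rdd.unpersist()
    }
  }
}
```

With this shape, a caller would write something like `PersistScope.withPersisted(input) { cached => ... }` and the cached blocks would be dropped when the block exits, instead of lingering until the RDD is garbage collected.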