Bug Fix: without unpersist method in RandomForest.scala #2775
Can one of the admins verify this patch?
Jenkins, this is ok to test.
LGTM. Waiting for Jenkins. Unreferenced RDDs will get cleaned automatically. But it is always good to call
test this please
Test FAILed.
test this please
QA tests have started for PR 2775 at commit
QA tests have finished for PR 2775 at commit
Test PASSed.
Merged into master. Thanks!
While training a Gradient Boosting Decision Tree on large-scale sparse data, Spark spilled large amounts of data to disk, and I found the following bug:
In version 1.1.0, the train method in DecisionTree.scala persists treeInput in memory but never unpersists it. This caused heavy disk usage.
In the current GitHub version (probably 1.2.0), the train method in RandomForest.scala likewise persists baggedInput without ever unpersisting it.
After adding the missing unpersist call, training works correctly.
https://issues.apache.org/jira/browse/SPARK-3918