Allow unique prediction vector for each input matrix #4275
Conversation
```diff
@@ -771,7 +772,7 @@ class LearnerImpl : public Learner {
   // name of objective function
   std::string name_obj_;
   // temporal storages for prediction
```
We can take the chance and fix this comment as well; I assume it's meant to say "temporary"?

Hello @RAMitchell, I'm curious about how this affects overall memory usage. What I assume this PR does is keep both preds vectors in memory, pointed to by the map? Am I correct to assume the values in this map get populated by reference?
@RAMitchell Is it possible to use only one prediction vector with the largest size? The memory is not freed in each session, so a CV search might end up using lots of memory.
Yes this is correct.
I don't understand this question. The map entries each contain a pointer uniquely identifying each DMatrix as well as a vector for the predictions.

It would be possible, but IMO it's not worth it. We already do a similar thing in prediction caching. The extra memory will be 4 bytes times the number of rows in the second matrix. We already use much more memory than this for storing things like gradients (8 bytes per row), labels, weights, and the prediction cache. Not to mention that memory costs proportional to m rows are typically dwarfed by the matrix itself, which costs m*n.
Does anyone have further thoughts on this? @hcho3? Again, I think the impact is minimal and is consistent with an approach we have taken previously, so I would like to merge.
I am inclined to merge this as well. When the Learner class was first written, it was likely assumed by the author that re-sizing the `std::vector` was fast and efficient (i.e. no extra memory cost). But now `preds_` is a `HostDeviceVector`, not a `std::vector`, so resizing imposes an extra memory copy from GPU to CPU.
When training with multiple matrices, such as a test and train matrix, the `preds_` member variable is resized at least twice per boosting iteration. When the GPU is being used, this resize is particularly costly, as copies between the host and device must be performed. This PR uses a `std::map` to provide a unique prediction vector for each DMatrix, so the prediction vectors never need to be resized.
This PR results in around 10% improvement in runtime for tests/benchmark/benchmark_tree.py.