
Allow unique prediction vector for each input matrix #4275

Merged: 1 commit merged into dmlc:master on Mar 20, 2019

Conversation

RAMitchell (Member)

When training with multiple matrices, such as a train and a test matrix, the preds_ member variable is resized at least twice per boosting iteration.

When the GPU is being used, this resize is particularly costly, as copies between the host and device must be performed.

This PR uses a std::map to provide a unique prediction vector for each DMatrix, so the prediction vectors never need to be resized.

This PR results in around a 10% improvement in runtime for tests/benchmark/benchmark_tree.py.
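
A minimal sketch of the idea, assuming a simplified stand-in Learner and a plain std::vector<float> in place of the real prediction storage (the actual PR works with HostDeviceVector inside LearnerImpl; the class and method names below are illustrative only):

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Stand-in for xgboost::DMatrix; only its address is used as a map key.
struct DMatrix {
  std::size_t num_rows;
};

class LearnerSketch {
 public:
  // Return the prediction buffer dedicated to this DMatrix, sizing it once on
  // first use. Subsequent calls reuse the same buffer, so alternating between
  // the train and test matrices never triggers a resize.
  std::vector<float>& PredictionBuffer(DMatrix* dmat) {
    auto& preds = preds_map_[dmat];   // inserts an empty vector on first use
    if (preds.size() != dmat->num_rows) {
      preds.resize(dmat->num_rows);   // happens only once per DMatrix
    }
    return preds;
  }

 private:
  // One prediction vector per input matrix, keyed by the matrix's address.
  std::map<DMatrix*, std::vector<float>> preds_map_;
};
```

Keying on the DMatrix pointer only identifies the matrix; the map does not own it, which matches the description of the map entries later in this thread.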

@@ -771,7 +772,7 @@ class LearnerImpl : public Learner {
// name of objective function
std::string name_obj_;
// temporal storages for prediction
Contributor

Could we take the chance and fix this comment as well? I assume it's meant to say "temporary".

@thvasilo (Contributor)

Hello @RAMitchell, I'm curious about how this affects overall memory usage. When you say the preds_ variable was resized, where did this happen, and how does this PR fix it?
My assumption is that the storage used by the preds_ vector was "shared", so if we predicted on the training data first it would have size N_train, and then on the test data it would be resized to N_test.

What I assume this PR does is keep both prediction vectors in memory, pointed to by the map?

Am I correct to assume the values in this map get populated by reference?

@trivialfis (Member) left a comment

@RAMitchell Is it possible to use only one prediction vector with the largest size? The memory is not freed between sessions, so a CV search might end up using a lot of memory.

@RAMitchell (Member, Author)

> Hello @RAMitchell, I'm curious about how this affects overall memory usage. When you say the preds_ variable was resized, where did this happen, and how does this PR fix it?
> My assumption is that the storage used by the preds_ vector was "shared", so if we predicted on the training data first it would have size N_train, and then on the test data it would be resized to N_test.
>
> What I assume this PR does is keep both prediction vectors in memory, pointed to by the map?

Yes, this is correct.

> Am I correct to assume the values in this map get populated by reference?

I don't understand this question. The map entries each contain a pointer that uniquely identifies each DMatrix, as well as a vector for the predictions.

> @RAMitchell Is it possible to use only one prediction vector with the largest size? The memory is not freed between sessions, so a CV search might end up using a lot of memory.

It would be possible, but IMO it's not worth it. We actually already do a similar thing in prediction caching. The extra memory will be equal to 4 bytes times the number of rows in the second matrix. We already use much more memory than this for storing things like gradients (8 bytes per row), labels, weights, and the prediction cache. Not to mention that memory costs proportional to the number of rows m are typically dwarfed by the matrix itself, which costs m*n.
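
As a rough back-of-the-envelope check of that comparison (the row and column counts below are made-up illustration values, and the dense 4-bytes-per-element layout is an assumption, not a measurement from this PR):

```cpp
#include <cstdio>

int main() {
  // Hypothetical second matrix: 1,000,000 rows x 100 features (illustration only).
  const long long rows = 1'000'000;
  const long long cols = 100;

  const long long extra_preds  = rows * 4;        // one extra float per row:   ~4 MB
  const long long gradients    = rows * 8;        // grad + hess per row:       ~8 MB
  const long long dense_matrix = rows * cols * 4; // the data itself as floats: ~400 MB

  std::printf("extra predictions: %lld bytes\n", extra_preds);
  std::printf("gradients:         %lld bytes\n", gradients);
  std::printf("dense matrix:      %lld bytes\n", dense_matrix);
  return 0;
}
```

On these made-up numbers, the extra per-matrix prediction buffer is on the order of 1% of the matrix itself.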

@RAMitchell (Member, Author)

Does anyone have further thoughts on this? @hcho3?

Again, I think the impact is minimal and consistent with an approach we have taken previously, so I would like to merge.

@hcho3 (Collaborator) left a comment

I am inclined to merge this as well. When the Learner class was first written, the author likely assumed that resizing the std::vector was fast and efficient (i.e., no extra memory cost). But now preds_ is a HostDeviceVector, not a std::vector, so resizing imposes an extra memory copy from GPU to CPU.
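
A simplified illustration of where that extra cost comes from. This is not xgboost's actual HostDeviceVector: MirroredBuffer and its Resize method are made up for the sketch, and only the CUDA runtime calls are real API.

```cpp
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// Made-up stand-in for a host/device mirrored buffer (not xgboost's HostDeviceVector).
class MirroredBuffer {
 public:
  // Resizing while the freshest copy lives on the device forces a
  // device-to-host copy first, then a host-side reallocation, and later a
  // re-upload -- much more work than a plain std::vector::resize.
  void Resize(std::size_t n) {
    if (on_device_) {
      host_.resize(device_size_);
      cudaMemcpy(host_.data(), device_ptr_, device_size_ * sizeof(float),
                 cudaMemcpyDeviceToHost);  // the costly extra copy
      on_device_ = false;
    }
    host_.resize(n);                       // host reallocation
    // The next device access will have to copy the data back to the GPU.
  }

 private:
  std::vector<float> host_;
  float* device_ptr_ = nullptr;
  std::size_t device_size_ = 0;
  bool on_device_ = false;
};
```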

@RAMitchell merged commit 8eab966 into dmlc:master on Mar 20, 2019
@lock bot locked as resolved and limited conversation to collaborators on Jun 18, 2019
@RAMitchell deleted the prediction-map branch on April 19, 2022
4 participants