Allow unique prediction vector for each input matrix #4275
Conversation
```diff
@@ -771,7 +772,7 @@ class LearnerImpl : public Learner {
   // name of objective function
   std::string name_obj_;
   // temporal storages for prediction
```
We can take the chance and fix this comment as well; I assume it's meant to say "temporary"?

Hello @RAMitchell, I'm curious about how this affects overall memory usage. What I assume this PR does is keep both preds vectors in memory, pointed to by the map? Am I correct to assume the values in this map get populated by reference?
@RAMitchell Is it possible to use only one prediction vector with the largest size? The memory is not freed in each session, so a CV search might end up using lots of memory.
Yes this is correct.
I don't understand this question. The map entries each contain a pointer uniquely identifying each DMatrix as well as a vector for the predictions.

It would be possible, but IMO it's not worth it. We already do a similar thing in prediction caching. The extra memory will be 4 bytes times the number of rows in the second matrix. We already use much more memory than this for storing things like gradients (8 bytes per row), labels, weights, and the prediction cache. Not to mention that memory costs proportional to m rows are typically dwarfed by the matrix itself, which costs m*n.
Does anyone have further thoughts on this? @hcho3? Again, I think the impact is minimal and is consistent with an approach we have taken previously, so I would like to merge.
I am inclined to merge this as well. When the Learner class was first written, it was likely assumed by the author that re-sizing the `std::vector` was fast and efficient (i.e. no extra memory cost). But now `preds_` is a `HostDeviceVector`, not a `std::vector`, so resizing imposes an extra memory copy from GPU to CPU.
When training with multiple matrices, such as a test and train matrix, the `preds_` member variable is resized at least twice per boosting iteration. When the GPU is being used, this resize is particularly costly, as copies between the host and device must be performed. This PR uses a `std::map` to provide a unique prediction vector for each DMatrix, so the prediction vectors never need to be resized.
This PR results in around 10% improvement in runtime for tests/benchmark/benchmark_tree.py.