Degraded performance after porting multi-thread build to new MacBook Pro #1851

rmeehan · 2016-12-08T10:14:16Z

Environment info

Operating System: macOS Sierra, 2.9 GHz Intel Core i5
Compiler: gcc, info as follows (having used "brew install gcc --without-multilib"):
gcc --version
Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 8.0.0 (clang-800.0.42.1)
Target: x86_64-apple-darwin16.1.0
Thread model: posix

Package used (python/R/jvm/C++): python
xgboost version used: Latest, 0.6

I've just ported my xgboost code from an old MacBook Air to a new MacBook Pro - same processor but faster. I've built the code as I did before, using the instructions on https://xgboost.readthedocs.io/en/latest/build.html, so I have multi-core xgboost working in a Jupyter notebook, and it nicely maxes out all the CPU cores when training a model. So far so good. Weirdly, though, the performance is way slower (by a factor of 5, I estimate) than on my previous machine. It "feels like" a compilation issue, almost as if the build isn't fully exploiting the processor's features. I've put xgboost into verbose mode so I can see each iteration being logged, and it only manages about one iteration per second (across 4 cores, with a relatively simple set of training data).

Any ideas where to look/how to debug this problem?


rmeehan · 2016-12-08T12:23:38Z

More info - it's related to the multi-threading:

With nthread=4, I get about 3-4 rows like the following logged per second:
[12:17:52] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 52 extra nodes, 0 pruned nodes, max_depth=6
But weirdly, with nthread=1, I get about 10 rows logged per second, i.e. it's roughly 2.5x faster with only 25% of the cores in use.
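
For anyone wanting to reproduce this kind of comparison with numbers rather than by counting log rows, here is a minimal timing sketch. The helper names (`time_per_thread_count`, `speedup_vs_single`) are made up for illustration and are not part of xgboost; the harness just times any callable that accepts a thread count.

```python
import time


def time_per_thread_count(train_fn, thread_counts, repeats=1):
    """Return {nthread: best wall-clock seconds} for calls to train_fn(nthread).

    train_fn is any callable taking the thread count; it is an assumption of
    this sketch, not an xgboost API.
    """
    results = {}
    for n in thread_counts:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            train_fn(n)
            best = min(best, time.perf_counter() - start)
        results[n] = best
    return results


def speedup_vs_single(results):
    """Ratio of the 1-thread time to each time (>1 means faster than 1 thread)."""
    base = results[1]
    return {n: base / t for n, t in results.items()}


# Hypothetical usage with xgboost (params and dtrain are placeholders):
# import xgboost as xgb
# def train_fn(n):
#     xgb.train(dict(params, nthread=n), dtrain, num_boost_round=50)
# print(speedup_vs_single(time_per_thread_count(train_fn, [1, 2, 4])))
```

A speedup below 1 for nthread=4 would match the behaviour described above, where adding threads makes training slower.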

Ayutthaya · 2016-12-08T14:10:50Z

I am not sure if it is related, but I've also noticed a performance degradation since recent commits added coverage tests and accidentally left the TEST_COVER variable in make/config.mk set to 1.
Setting this value to 0 solved the issue for me.
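
A quick way to check a local checkout for this setting is sketched below. It assumes the variable appears in make/config.mk as a plain make-style assignment such as `TEST_COVER = 1`; the helper names and the sample text are hypothetical, and the parser is deliberately simple (it treats `=`, `:=` and `?=` alike and ignores line continuations).

```python
import re


def read_make_var(config_text, name):
    """Return the last value assigned to `name` in make-style text, or None."""
    pattern = re.compile(
        r"^\s*" + re.escape(name) + r"\s*[:?]?=\s*(.*?)\s*(?:#.*)?$"
    )
    value = None
    for line in config_text.splitlines():
        match = pattern.match(line)
        if match:
            value = match.group(1)  # later assignments override earlier ones
    return value


def coverage_enabled(config_text):
    """True if TEST_COVER is set to 1 in the given config text."""
    return read_make_var(config_text, "TEST_COVER") == "1"


# Made-up sample text, not the real contents of make/config.mk.
sample = "OTHER_FLAG = 1\n# TEST_COVER = 0\nTEST_COVER = 1  # coverage build\n"
print(coverage_enabled(sample))

# Hypothetical usage against a checkout:
# with open("make/config.mk") as fh:
#     print(coverage_enabled(fh.read()))
```

If this reports the flag as enabled, setting it to 0 and rebuilding is the change described above.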

AbdealiLoKo · 2016-12-08T17:28:34Z

Ouch. Really sorry about that. I have set TEST_COVER to 0 in #1853. That setting does indeed disable optimization while compiling, which would make things slower. Not sure how thread parallelization would be affected by it though. (I would assume it wouldn't become worse with more threads!)

rmeehan · 2016-12-09T13:08:28Z

Yep - that's improved the performance back to where it should be! Also the multi-threading performance is better than single-threaded/core as you'd expect. Thanks.

rmeehan closed this as completed Dec 9, 2016

lock bot locked as resolved and limited conversation to collaborators Oct 26, 2018
