Degraded performance after porting multi-thread build to new MacBook Pro #1851

Closed
rmeehan opened this issue Dec 8, 2016 · 4 comments


rmeehan commented Dec 8, 2016

Environment info

Operating System: MacOS Sierra, 2.9 GHz Intel Core i5
Compiler: gcc, info as follows (having used "brew install gcc --without-multilib"):
gcc --version
Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 8.0.0 (clang-800.0.42.1)
Target: x86_64-apple-darwin16.1.0
Thread model: posix

Package used (python/R/jvm/C++): python
xgboost version used: Latest, 0.6

I've just ported my xgboost code from an old MacBook Air to a new MacBook Pro (same processor, but faster). I built the code as before, following the instructions at https://xgboost.readthedocs.io/en/latest/build.html, so multi-core xgboost is working in a Jupyter notebook and maxes out all the CPU cores when training a model. So far so good. Strangely, though, performance is far slower than on my previous machine, by a factor of about 5 by my estimate. It feels like a compilation issue, as if the build isn't fully exploiting the processor's features. With xgboost in verbose mode, so that each iteration is logged, it manages only about one iteration per second (across 4 cores, with a relatively simple set of training data).

Any ideas where to look/how to debug this problem?

Author

rmeehan commented Dec 8, 2016

More info - it's related to the multi-threading:

  • With nthread=4, I get about 3-4 rows logged per second, as follows:
    [12:17:52] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 52 extra nodes, 0 pruned nodes, max_depth=6
  • But weirdly, with nthread=1, I get about 10 rows logged per second, i.e. it's 2.5x faster with only 25% of the cores in use.
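For anyone wanting to reproduce this comparison outside a notebook, a minimal timing sketch might look like the following. It is not the code used above: the helper names (`time_per_thread_count`, `make_xgboost_trainer`), the synthetic data, and the sizes are all assumptions made for illustration. Only `nthread` and `max_depth=6` come from this thread.

```python
import time


def time_per_thread_count(train_fn, thread_counts=(1, 2, 4), repeats=3):
    """Return {nthread: best wall-clock seconds} for calls to train_fn(nthread)."""
    results = {}
    for nthread in thread_counts:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            train_fn(nthread)
            # Keep the fastest run to reduce noise from other processes.
            best = min(best, time.perf_counter() - start)
        results[nthread] = best
    return results


def make_xgboost_trainer(n_rows=20000, n_cols=50, num_rounds=20):
    """Build a train_fn over synthetic data (sizes are hypothetical)."""
    # Imported lazily so the timing helper itself has no hard dependencies.
    import numpy as np
    import xgboost as xgb

    rng = np.random.RandomState(0)
    X = rng.rand(n_rows, n_cols)
    y = (X[:, 0] + X[:, 1] > 1.0).astype(int)
    dtrain = xgb.DMatrix(X, label=y)

    def train_fn(nthread):
        params = {"objective": "binary:logistic", "max_depth": 6, "nthread": nthread}
        xgb.train(params, dtrain, num_boost_round=num_rounds)

    return train_fn
```

Calling `time_per_thread_count(make_xgboost_trainer())` returns a dict of timings keyed by thread count. On a healthy build the time would normally be expected to drop as `nthread` rises; if `nthread=1` comes out fastest, that matches the symptom described here.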

@Ayutthaya

I'm not sure whether it's related, but I've also noticed a performance degradation since recent commits added coverage tests and accidentally left the TEST_COVER variable in make/config.mk set to 1.
Setting this value to 0 solved the issue for me.
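A quick way to check a local checkout for this is sketched below. The helper name `find_make_flag` is hypothetical, and the exact layout of the assignment in make/config.mk is an assumption, so the pattern tolerates optional whitespace and the `=`, `:=` and `?=` forms.

```python
import re


def find_make_flag(config_text, name="TEST_COVER"):
    """Return the last uncommented value assigned to `name`, or None if absent."""
    # Lines starting with '#' do not match, so commented-out settings are ignored.
    pattern = re.compile(r"^\s*%s\s*[:?]?=\s*(\S+)" % re.escape(name))
    value = None
    for line in config_text.splitlines():
        match = pattern.match(line)
        if match:
            value = match.group(1)  # a later assignment overrides an earlier one
    return value
```

To use it, read the file's text (for example `open("make/config.mk").read()`, run from the root of the xgboost checkout) and pass it in. A result of `"1"` would suggest the coverage setting is still on and the library needs rebuilding after changing it to 0.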

Contributor

AbdealiLoKo commented Dec 8, 2016

Ouch, really sorry about that. I have set TEST_COVER to 0 in #1853. Setting it to 1 does indeed disable optimization while compiling, which would make things slower. I'm not sure how thread parallelization would be affected by it, though. (I would assume it wouldn't get worse with more threads!)

Author

rmeehan commented Dec 9, 2016

Yep, that's brought the performance back to where it should be! Multi-threaded performance is also better than single-threaded now, as you'd expect. Thanks.

rmeehan closed this as completed Dec 9, 2016
lock bot locked as resolved and limited conversation to collaborators Oct 26, 2018