Use std::thread instead of OMP for GPUs. #4302
Conversation
@hcho3 Can we upgrade the compiler at some point? What's the convention for the supported compiler?
@trivialfis Right now, we assume that C++11 is supported. Do you want something from C++14 or C++17?
@hcho3 No, just a compiler that has complete support for C++11. For example, the one you found. I might be able to work around this later, but if we were to claim C++11 support, 4.8.2 might be too old.
@trivialfis Got it, we should probably bump the version. Any suggestion?
@hcho3 I'm not sure about the status in commercial deployments. Can you find a currently active distribution that uses GCC 4.x? If not, I would suggest sticking to one of those "popular stable distributions" like Debian.
The reason for using CentOS 6 is to maximize the compatibility of the binary wheel across many Linux distributions. Otherwise, the XGBoost wheel may depend on recent versions of GLIBC and other system libraries that other distributions may lack. That said, we can certainly upgrade the GCC version. We just need to use a later version of …
@hcho3 I see. I just checked CentOS; its EOL is at the end of 2020. Let's hold off on it, and I will try to work around it for this PR. Thanks.
Force-pushed from 8d86077 to 3f1ad96
Force-pushed from 42001bd to e23dbea
One lesson I learned from this PR: under no circumstances should one change the device in the master thread. I wish I could add a test for that.
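The guard pattern referenced later in this thread (`SaveCudaContext`) can be sketched as a small RAII class. This is a hedged illustration under assumed semantics, not the actual XGBoost helper:

```cpp
#include <cuda_runtime.h>

// Hypothetical sketch of a device guard: remember the calling thread's
// current CUDA device and restore it on scope exit, so code run inside
// the scope cannot permanently change the master thread's device.
class DeviceGuard {
 public:
  DeviceGuard() { cudaGetDevice(&saved_device_); }
  ~DeviceGuard() { cudaSetDevice(saved_device_); }
  DeviceGuard(DeviceGuard const&) = delete;
  DeviceGuard& operator=(DeviceGuard const&) = delete;

 private:
  int saved_device_ {0};
};
```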
Codecov Report
@@ Coverage Diff @@
## master #4302 +/- ##
==========================================
+ Coverage 67.82% 67.82% +<.01%
==========================================
Files 132 132
Lines 12201 12203 +2
==========================================
+ Hits 8275 8277 +2
Misses 3926 3926
Continue to review full report at Codecov.
I am happy to merge after removing the formatting changes and span-related changes.
* Span for atomic write symbol.
* Use `ExecutePerDevice` instead of OpenMP loop.
* Use `SaveCudaContext` to make sure nothing can change master thread's device.
Force-pushed from 7efc7ab to 6b44d5b
@trivialfis For some reason, one of the GPU agents failed with error "Remote call on Linux GPU slave (i-0c8b6da83669c814e) failed". I restarted the agent.
@hcho3 Thanks for helping out. @RAMitchell Done removing unrelated changes.
Let me do a performance test, then I will merge it if everything is okay.
@trivialfis there is a performance regression here. Using the following command with 8x Tesla V100-SXM2-32GB:
Time appears to have gone from around 20s to 50s. According to my recent profiling, the poor scaling on multi-GPU seems related to the way we are using threads. Another way of achieving what you are trying to do while staying with omp would be to store the omp global number of threads (set according to the nthread parameter) in a temporary variable, change the number of threads to the number of GPUs just for ExecuteShards, then change it back afterwards.
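A minimal sketch of the workaround described above, assuming a hypothetical per-device function; it only shows saving the global OpenMP thread count, overriding it for the sharded loop, and restoring it afterwards:

```cpp
#include <omp.h>
#include <vector>

// Hypothetical illustration: temporarily set the OpenMP thread count to the
// number of GPUs for the per-device loop, then restore the user's nthread
// setting so the rest of the library is unaffected.
void RunShardsWithOmp(std::vector<int> const& devices) {
  int const saved_threads = omp_get_max_threads();  // reflects nthread
  omp_set_num_threads(static_cast<int>(devices.size()));
#pragma omp parallel for schedule(static, 1)
  for (int i = 0; i < static_cast<int>(devices.size()); ++i) {
    // ... run the shard assigned to devices[i] ...
  }
  omp_set_num_threads(saved_threads);  // restore the global setting
}
```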
@RAMitchell I wonder what happened. Let me see if I can reproduce the slowdown. BTW, …
With #4454 it's possible to get rid of the global OpenMP parameter, so I will close this one.
Use `std::thread` in `ExecuteIndexShards`.

* Use `ExecutePerDevice` instead of OpenMP loop.
* Use `SaveCudaContext` to make sure nothing can change the master thread's device.

This decouples the parameter `nthread` from `n_gpus`. See #4162. I added some debugging utilities in `device_helpers.cuh`, please let me keep them ...
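As a rough, hedged sketch of the approach this PR describes (one `std::thread` per device instead of an OpenMP loop), assuming a generic per-shard callable; the real `ExecuteIndexShards` signature in XGBoost may differ:

```cpp
#include <cuda_runtime.h>
#include <cstddef>
#include <thread>
#include <vector>

// Sketch: each worker thread sets its own CUDA device, so the master
// thread's device is never changed and nthread is decoupled from n_gpus.
template <typename Functor>
void ExecutePerDeviceSketch(std::vector<int> const& devices, Functor&& fn) {
  std::vector<std::thread> workers;
  workers.reserve(devices.size());
  for (std::size_t i = 0; i < devices.size(); ++i) {
    workers.emplace_back([&, i] {
      cudaSetDevice(devices[i]);  // affects only this worker thread
      fn(i, devices[i]);          // run this shard's work
    });
  }
  for (auto& t : workers) { t.join(); }
}
```

A caller could pass a lambda that performs the per-device work for the given shard index; the point of the design is that `cudaSetDevice` is only ever called from short-lived worker threads.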