Add multi-GPU unit test environment #3741
Conversation
@RAMitchell @trivialfis The test `gpu_hist` is failing. Should we be concerned with this error?
@trivialfis Would you like to add a multi-GPU test to #3643? If so, I can disable the failing test for now and merge this PR to master.
@hcho3 I will add a C++ test for that.
@hcho3 This is amazing, thanks. My view is we should disable any failing tests, merge this, then work to re-enable them.
@RAMitchell Got it. Let me go ahead and disable the failing test.
@trivialfis One idea I have is to prefix the names of Google Test cases with `MGPU_`:

```cpp
TEST(FooBarTest, MGPU_RunBar) {
  // Use device ordinals 0, 1, 2, ...
}
```
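Selection by such a name prefix works the same way as Google Test's `--gtest_filter=*MGPU_*`. A minimal Python sketch of the idea (the test names below are invented for illustration):

```python
import fnmatch

def select(test_names, pattern):
    """Return the test names matching a glob pattern, as --gtest_filter would."""
    return [name for name in test_names if fnmatch.fnmatchcase(name, pattern)]

names = ["FooBarTest.MGPU_RunBar", "FooBarTest.RunBaz", "HistTest.MGPU_AllReduce"]
print(select(names, "*MGPU_*"))  # only the two multi-GPU cases
print(select(names, "*"))        # every case
```

The multi-GPU CI job would then pass the `*MGPU_*` pattern, while the single-GPU job would negate it.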
Actually, I kind of like the idea of keeping all tests together. For Python tests, we can add an attribute to distinguish multi-GPU tests.
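The attribute idea can be sketched as a plain decorator. Everything here is hypothetical, standing in for whatever attribute mechanism the test runner provides (e.g. a nose `attr` or pytest marker):

```python
def mgpu(test_fn):
    """Hypothetical attribute: tag a test as requiring multiple GPUs."""
    test_fn.mgpu = True
    return test_fn

@mgpu
def test_hist_multi_gpu():
    pass  # would exercise devices 0, 1, 2, ...

def test_hist_single_gpu():
    pass  # runs on a single device

def needs_multi_gpu(fn):
    """A runner could use this to decide which instance a test executes on."""
    return getattr(fn, "mgpu", False)
```

Both kinds of tests then live in the same file, and the CI job selects by attribute rather than by file layout.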
Currently I put the multi-GPU test code inside (at the end of) the normal CUDA test files, wrapped with:

```cpp
#if defined(XGBOOST_USE_NCCL)
TEST(Foo, MultiGpuBar) {
  // do something.
}
#endif
```

But there are other options too; I would love to hear other opinions.
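On the Python side, a similar guarding effect to the `#if defined(XGBOOST_USE_NCCL)` block could be had with a runtime skip. A sketch, where the `N_GPUS` environment variable is an assumed stand-in for real device discovery:

```python
import os
import unittest

def num_gpus():
    # Assumed convention: the CI job exports N_GPUS; default to one device.
    return int(os.environ.get("N_GPUS", "1"))

class TestMultiGpu(unittest.TestCase):
    @unittest.skipIf(num_gpus() < 2, "requires at least 2 GPUs")
    def test_all_reduce(self):
        pass  # multi-GPU logic would go here
```

Unlike the compile-time guard, the skip leaves the test visible in every configuration, which makes it obvious when it was not actually run.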
@trivialfis I don't think that would work, because one of the test configurations compiles with NCCL, so the `#if` guard alone would not separate the two suites. I'm open to suggestions, but for now I am using the `MGPU_` prefix, so that only the prefixed tests will run on the multi-GPU instance.
@hcho3 I agree with keeping all tests together. But I don't think running multi-GPU tests on a single GPU would introduce problems, since any multi-GPU code should also run on a single GPU. If such difficulties do arise, we can pass an explicit device count. Another option for not splitting up the tests would be a dedicated definition, similar to the `XGBOOST_USE_NCCL` guard above. I don't have much experience in this and it's still at an early stage.
@trivialfis It is certainly true that one should be able to run multi-GPU code on a single-GPU machine. However, clearly indicating the target machine of each test is still a good idea. Personally, I prefer the `MGPU_` prefix.

My rationale for separating single- and multi-GPU tests is that provisioning a large instance like p2.8xlarge takes significantly longer than provisioning p2.xlarge. So to reduce test backlog, we want to run on the multi-GPU instance only those tests that necessitate a plurality of GPU devices.

But at any rate, this is an early attempt. We can always improve it later as needs arise.
* Add multi-GPU unit test environment
* Better assertion message
* Temporarily disable failing test
* Distinguish between multi-GPU and single-GPU CPP tests
* Consolidate Python tests. Use attributes to distinguish multi-GPU Python tests from single-CPU counterparts