
[GPU] multi-gpu #620

Closed
guolinke opened this issue Jun 15, 2017 · 10 comments

@guolinke
Collaborator

@huanzhang12
Do you have a plan for multi-GPU support?

@huanzhang12
Contributor

Sorry for my late reply because I was traveling recently.

The multi-GPU case can be viewed as a special case of distributed feature-parallel training, and much of the existing code can (hopefully) be reused. Right now we could theoretically run multiple instances of LightGBM on one machine, each assigned a different GPU and a subset of features, and use the feature-parallel distributed tree learner. But this makes multiple copies of the data and is inefficient.

One way to enable multi-GPU is simply to allow LightGBM to launch multiple instances of the parallel learner that share the input data. This is not perfect, but I think it is the fastest path and requires only minimal changes to the GPU code. @guolinke Do you think it is possible?
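As a rough illustration of this workaround (not an official multi-GPU mode), the sketch below launches one LightGBM process per GPU and lets the processes cooperate through the feature-parallel distributed tree learner. The addresses, ports and file names are placeholders; the parameters themselves (tree_learner, num_machines, machines, local_listen_port, device, gpu_device_id) are standard LightGBM options.

```python
# Illustrative sketch: one LightGBM process per GPU, cooperating via the
# feature-parallel distributed learner. Addresses, ports and file names are placeholders.
import lightgbm as lgb

def train_on_gpu(gpu_id: int, listen_port: int, data_file: str):
    params = {
        "objective": "binary",
        "tree_learner": "feature",        # feature-parallel distributed tree learner
        "num_machines": 2,                # one "machine" per GPU
        "local_listen_port": listen_port,
        "machines": "127.0.0.1:12400,127.0.0.1:12401",
        "device": "gpu",
        "gpu_device_id": gpu_id,          # pin this process to one GPU
    }
    return lgb.train(params, lgb.Dataset(data_file), num_boost_round=100)

# Run in two separate processes, e.g.:
#   process 0: train_on_gpu(0, 12400, "train.bin")
#   process 1: train_on_gpu(1, 12401, "train.bin")
```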

I am currently working on another issue. Currently, we build feature histograms on the GPU and transfer them to the CPU to find the best split. The overhead of this data transfer is significant on datasets with many features and limits the achievable speedup. If we find the best split on the GPU instead, the work stays in the very high-bandwidth GPU local memory and most of the data-transfer overhead is eliminated. To do this, we also need to store the histograms on the GPU, because a future split may need them to construct the feature histogram of the larger child (the subtraction trick). This requires implementing a histogram pool on the GPU (similar to what we have on the CPU), moving histograms to CPU memory only when GPU memory is insufficient. After this is implemented, I expect a significant speedup on the Bosch, YahooLTR and epsilon datasets, especially when the number of bins is larger (e.g. 255).
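For readers unfamiliar with the subtraction trick mentioned above, here is a minimal sketch of the idea (illustrative only, not LightGBM's actual kernels): once the parent's and the smaller child's histograms are known, the larger child's histogram is obtained bin-by-bin by subtraction instead of another pass over the data, which is why keeping histograms resident in GPU memory pays off.

```python
# Illustrative sketch of the histogram subtraction trick (not LightGBM's actual code).
import numpy as np

def larger_child_histogram(parent: np.ndarray, smaller_child: np.ndarray) -> np.ndarray:
    # parent, smaller_child: arrays of shape (num_bins, 3) holding
    # (sum_gradient, sum_hessian, count) per bin
    return parent - smaller_child
```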

@zhukunism

zhukunism commented Jun 26, 2017

@huanzhang12 , I am running LightGBM in R on a machine with 4 GPUs. I tried to specify gpu_device_id with 0, 1, 2, 3, but it always runs on the default device 0. Is gpu_device_id not supported by the R wrapper, or am I doing something wrong? I need your help, thanks!

Btw, the improvement you are working on could be super useful: the dataset I am using has ~4K features, and training is actually slower than using the CPU (16 cores) alone. I reckon the overhead of transferring data to the CPU could become a bottleneck for multi-GPU setups that race for CPU resources.

@Laurae2
Contributor

Laurae2 commented Jun 26, 2017

@zhukunism It currently runs only on a single GPU, as explained in this issue.

@zhukunism

@Laurae2, I am trying to launch multiple sessions and let each LightGBM instance run on a different GPU device, but have had no luck. My machine has 4 GPUs.

So my question is: can it run on one specific GPU by configuring gpu_device_id?

@Laurae2
Contributor

Laurae2 commented Jun 26, 2017

@zhukunism

@Laurae2 , I fixed the issue after checking that doc. I needed to specify both gpu_platform_id and gpu_device_id to use a GPU device other than the default one. Thanks!
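For anyone hitting the same problem, here is a minimal sketch of this configuration, shown with the Python API for illustration (the original report used the R package; the parameter names are the same). The platform and device ids below are placeholders for whatever your machine actually reports.

```python
# Illustrative sketch: selecting a non-default GPU requires both the OpenCL
# platform id and the device id. The values below are placeholders.
import lightgbm as lgb

params = {
    "objective": "regression",
    "device": "gpu",
    "gpu_platform_id": 0,  # OpenCL platform that owns the target GPU
    "gpu_device_id": 2,    # pick a GPU other than the default device 0
}
booster = lgb.train(params, lgb.Dataset("train.bin"))
```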

@wuyunhua

wuyunhua commented May 30, 2019

So @huanzhang12, does training with LightGBM make use of all GPUs available on the machine, like XGBoost does? Thanks.

@StrikerRUS
Collaborator

Closed in favor of #2302. We decided to keep all feature requests in one place.

Contributions implementing this feature are welcome! Please re-open this issue (or post a comment if you are not the topic starter) if you are actively working on it.

@StrikerRUS
Collaborator

For everyone subscribed to this issue: please try our new experimental CUDA version, which was kindly contributed by our friends from IBM. This version supports multi-GPU training. We would really appreciate any early feedback on this experimental feature (please create new issues, do not comment here).

How to install: https://lightgbm.readthedocs.io/en/latest/Installation-Guide.html#build-cuda-version-experimental.

Argument to specify number of GPUs: https://lightgbm.readthedocs.io/en/latest/Parameters.html#num_gpu.
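A minimal sketch of how this experimental CUDA version could be used, assuming the package was built per the linked installation guide (the num_gpu value below is just an example; as a later comment in this thread notes, the parameter was not honored by the CUDA learner that replaced this version):

```python
# Illustrative sketch: training with the experimental CUDA build announced above.
# Assumes LightGBM was compiled with CUDA support per the linked installation guide.
import lightgbm as lgb

params = {
    "objective": "binary",
    "device_type": "cuda",  # use the CUDA tree learner instead of the OpenCL "gpu" one
    "num_gpu": 2,           # number of GPUs to use (see the linked Parameters doc)
}
booster = lgb.train(params, lgb.Dataset("train.bin"))
```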

@flybywind

flybywind commented Sep 19, 2024


Hi @StrikerRUS, I find that multi-GPU support has been removed in the latest release. Since commit 6b56a90 by @shiyu1994, a new CUDA learner was introduced, which later completely replaced the old multi-GPU learner.
So, two questions:

  1. Is there any plan to support a multi-GPU learner in the future? The num_gpu parameter still exists in the docs, but users actually can't use it; if they do, they get the error "Currently cuda version only supports training on a single GPU", which is very confusing.
  2. Currently, what's the best approach if I want to use the full power of multiple GPUs for training?
