[GPU] multi-gpu #620
Sorry for my late reply; I was traveling recently.

The multi-GPU case can be viewed as a special case of distributed feature-parallel training, and much of the existing code can (hopefully) be reused. Right now we can theoretically run multiple instances of LightGBM on one machine, each assigned a different GPU and a subset of features, and use the feature-parallel distributed tree learner. But this makes multiple copies of the data and is inefficient. One way to enable multi-GPU is to simply allow LightGBM to launch multiple instances of the parallel learner that share the input data. This is not perfect, but I think it is the fastest path and needs only minimal changes to the GPU code. @guolinke Do you think this is possible?

I am currently working on another issue. Currently, we build feature histograms on the GPU and transfer them to the CPU to find the best split. The overhead of this data transfer is significant on datasets with many features and limits the achievable speedup. If we find the best split on the GPU instead, it can be done in very high-bandwidth GPU local memory, eliminating most of the data transfer overhead. To do this, we also need to keep the histograms on the GPU, because a future split may need them to construct the feature histogram for the larger child (the subtraction trick; see the sketch below). This requires implementing a histogram pool on the GPU (similar to what we have on the CPU), moving histograms to CPU memory only when GPU memory is insufficient. After this is implemented, I expect a significant speedup on the Bosch, YahooLTR, and epsilon datasets, especially when the number of bins used is larger (255).
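For readers unfamiliar with the subtraction trick mentioned above, here is a minimal NumPy sketch (illustrative only; the variable names and data are hypothetical, not LightGBM's actual implementation). A node's histogram accumulates per-bin gradient sums; only the smaller child's histogram is built from data, and the larger child's is recovered by subtracting it from the parent's, which is why parent histograms must stay resident in a histogram pool:

```python
import numpy as np

# Illustrative sketch of the histogram subtraction trick (not LightGBM's code).
n_bins = 255
rng = np.random.default_rng(0)

binned_feature = rng.integers(0, n_bins, size=10_000)  # bin index of one feature, per row
gradients = rng.normal(size=10_000)                    # per-row gradient values
in_small_child = rng.random(10_000) < 0.3              # rows routed to the smaller child

# Parent histogram: per-bin gradient sums over all rows of the node.
parent_hist = np.bincount(binned_feature, weights=gradients, minlength=n_bins)

# Only the smaller child's histogram is built from data...
small_hist = np.bincount(binned_feature[in_small_child],
                         weights=gradients[in_small_child], minlength=n_bins)

# ...and the larger child's histogram is then recovered by subtraction, for free.
# Keeping parent_hist resident (e.g. in a GPU histogram pool) is what makes this work.
large_hist = parent_hist - small_hist
```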
@huanzhang12 I am running LightGBM in R on a machine with 4 GPUs. I tried to specify gpu_device_id with 0, 1, 2, and 3, but it always runs on the default device 0. Is gpu_device_id not supported by the R wrapper, or am I doing something wrong? I need your help, thanks! By the way, the improvement you are working on could be super useful: the dataset I am using has ~4K features, and training is actually slower than using the CPU (16 cores) alone. I reckon the overhead of transferring data to the CPU can become a bottleneck for multi-GPU setups that race for CPU resources.
@zhukunism It currently runs only on a single GPU, as explained in this issue.
@Laurae2 I am trying to launch multiple sessions and have each LightGBM instance run on a different GPU device, but with no luck. My machine has 4 GPUs. So my question is: can it run on one specific GPU by configuring gpu_device_id?
@Laurae2 I fixed the issue after checking that doc. I need to specify both gpu_platform_id and gpu_device_id to use GPU devices other than the default one. Thanks!
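For anyone hitting the same problem, here is a minimal sketch of the fix using the Python API (the same gpu_platform_id/gpu_device_id parameters apply in the R wrapper; the indices below are placeholders for your own system):

```python
import lightgbm as lgb
import numpy as np

# Toy data, just to make the example self-contained.
X = np.random.rand(1_000, 10)
y = np.random.rand(1_000)

params = {
    "objective": "regression",
    "device": "gpu",        # OpenCL-based GPU tree learner
    "gpu_platform_id": 0,   # OpenCL platform index (placeholder; inspect yours with e.g. clinfo)
    "gpu_device_id": 1,     # device index within that platform (placeholder)
}

booster = lgb.train(params, lgb.Dataset(X, y), num_boost_round=10)
```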
So @huanzhang12, does LightGBM training make use of all GPUs available on the machine, like XGBoost does? Thanks!
Closed in favor of #2302. We decided to keep all feature requests in one place. You are welcome to contribute this feature! Please re-open this issue (or post a comment if you are not the topic starter) if you are actively working on implementing this feature.
For everyone subscribed to this issue: please try our new experimental CUDA version, which was kindly contributed by our friends from IBM. This version supports multi-GPU training. We would really appreciate any early feedback on this experimental feature (please create new issues; do not comment here). How to install: https://lightgbm.readthedocs.io/en/latest/Installation-Guide.html#build-cuda-version-experimental. Argument to specify the number of GPUs: https://lightgbm.readthedocs.io/en/latest/Parameters.html#num_gpu.
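Based on the linked docs, a minimal sketch of trying the experimental CUDA version from Python might look like the following (assuming LightGBM was built with CUDA support as described in the install guide; the num_gpu parameter is taken from the linked Parameters page):

```python
import lightgbm as lgb
import numpy as np

# Toy data, just to make the example self-contained.
X = np.random.rand(100_000, 50)
y = np.random.rand(100_000)

params = {
    "objective": "regression",
    "device_type": "cuda",  # requires a build with the experimental CUDA learner
    "num_gpu": 2,           # number of GPUs to use (see the linked Parameters page)
}

booster = lgb.train(params, lgb.Dataset(X, y), num_boost_round=10)
```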
Hi @StrikerRUS, I find that in the latest release the multi-GPU support has been removed. Since commit 6b56a90 by @shiyu1994, a new CUDA learner has been introduced, which later completely replaced the old multi-GPU learner.

@huanzhang12 Do you have plans for multi-GPU support?