
[WIP] Install documentation for LightGBM on GPU #389

Closed
2 of 3 tasks
Laurae2 opened this issue Apr 9, 2017 · 41 comments

Comments

@Laurae2
Contributor

Laurae2 commented Apr 9, 2017

Related to PR #368

Is there any documentation to install, set up, and use LightGBM on GPU? As the docs folder and the Wiki did not change, I was wondering if there are specific steps required.

Similar to this, but for this repository.

Similar question for:

  • Python package GPU installation/usage
  • R package GPU installation/usage

To-do:

  • Linux
  • Windows
  • Mac OS / macOS (need someone with a Mac who can/wants to do this...)
@huanzhang12
Contributor

@Laurae2 I will work on moving the documentation from my development repository to here.

@huanzhang12
Contributor

For Python, I think just using the GPU-enabled shared library and passing device=gpu as an additional config parameter is sufficient. For the R package I am not sure...
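
As a minimal sketch of what this looks like from the Python side — the device parameter is from this thread, and gpu_platform_id / gpu_device_id are the platform/device selection parameters the docs discussed later in the thread cover; the training call itself is commented out because it assumes a GPU-enabled build is installed:

```python
# Sketch, not official docs: enabling GPU training from the Python package,
# assuming lib_lightgbm was compiled with GPU support.
params = {
    "objective": "binary",
    "device": "gpu",       # the one extra parameter needed for GPU training
    "gpu_platform_id": 0,  # optional: pick the OpenCL platform
    "gpu_device_id": 0,    # optional: pick the device on that platform
}

# With a GPU-enabled build installed, training would then look like:
# import lightgbm as lgb
# bst = lgb.train(params, lgb.Dataset(X, y), num_boost_round=10)
```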

@guolinke
Collaborator

I updated some documents for GPU, @huanzhang12 can you check them?

For the R/Python packages, I think they both need to pass the additional parameter device=gpu to enable it.
@Laurae2 @wxchan you can create a PR to simplify it.

@huanzhang12
Contributor

huanzhang12 commented Apr 10, 2017

@guolinke For the building procedure, we need to mention the dependency packages (cmake >= 3.2, boost >= 1.56, OpenCL >= 1.2) and how to install them.
On Windows it will probably work (nothing I used is platform-dependent), but it is untested.

@huanzhang12
Contributor

@guolinke @Laurae2 It seems I can't make a pull request to the wiki (GitHub does not have this feature), so I am not able to edit it directly. But most of the detailed instructions are on my development repository page. You can probably move most of the material to the wiki, perhaps adding a "GPU tutorial" to help people set up GPU training and achieve a good speedup.

@guolinke
Collaborator

@huanzhang12 you can put your materials into docs/... , and I can add a link to it in the wiki.

@huanzhang12
Contributor

@guolinke OK, I think I will put a tutorial there.

@Laurae2
Contributor Author

Laurae2 commented Apr 10, 2017

@guolinke for the R package we might have to convert from cmake to make in Makevars.

I found this but I don't know how it could be integrated: https://github.com/forexample/r-cmake

@huanzhang12 R on Windows uses MinGW, while on Unix it uses the default installation method. Are these enough on their own, or do we need more to install with GPU support? If cmake is a must, we might try to find a workaround for R.

Also, can we use environment variables? (if there are any to use)
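
A sketch of that environment-variable idea, in Python for illustration (the variable names BOOST_ROOT and OPENCL_ROOT are assumptions — nothing in LightGBM defines them, and the real check would live in the R package's build scripts):

```python
import os

# Hypothetical detection: decide whether a GPU build is even possible by
# reading environment variables the user would set to point at dependencies.
# BOOST_ROOT / OPENCL_ROOT are illustrative names, not LightGBM conventions.
def gpu_build_possible(env=os.environ):
    return bool(env.get("BOOST_ROOT")) and bool(env.get("OPENCL_ROOT"))
```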

We may also attempt to detect whether Boost exists (by checking whether the ../include/LightGBM folder exists):

  • If it does not exist => CPU only (this is our current way of installing the R package)
  • If it exists => install the GPU version
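
The folder-existence check above could be sketched like this (in Python for illustration only; the real logic would live in the R package's install scripts):

```python
from pathlib import Path

# Hypothetical sketch of the detection rule above: install the GPU version
# only when the ../include/LightGBM folder is present next to the package
# source directory, otherwise fall back to the CPU-only build.
def choose_build(pkg_src_dir):
    marker = Path(pkg_src_dir).parent / "include" / "LightGBM"
    return "gpu" if marker.is_dir() else "cpu"
```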

@wxchan
Contributor

wxchan commented Apr 10, 2017

I don't have a machine to test the GPU version right now. @huanzhang12 can you also help update the Python package?

@huanzhang12
Contributor

@Laurae2 It is fine if we don't have cmake. I list cmake 3.x as a dependency just because I need to look for the OpenCL and Boost headers/libraries automatically in CMakeLists.txt, and cmake 3.x can do this more reliably (than older cmake versions or hard-coded paths). Without cmake, if everything has been installed to a standard system location (like /usr/include, /usr/lib64, etc.) and the compiler can find them, compilation with R should be fine.

But the problem is detecting the existence of OpenCL and Boost and enabling GPU support accordingly. If we can't do this automatically, we will probably have to provide two Makevars, and a user must ensure that the necessary dependencies have been installed and manually use the one with GPU support to compile.

@Laurae2 I am not sure which environment variables you want to use. Do you want to detect the existence of OpenCL/Boost using environment variables?

@wxchan Currently it seems the Python package is working, by compiling LightGBM with GPU support as normal and then running python setup.py install. Is there anything else we need to do?

For both Python and R, as @guolinke mentioned, the user needs to pass the additional parameter device=gpu to LightGBM to enable the GPU at run-time. I am not sure if we want to do anything else, like adding a global function SetDevice("xpu") to the Python/R interfaces to globally enable the GPU on all later LightGBM calls. For the time being, I think it is acceptable to just let the user pass an additional parameter (device=gpu) to LightGBM each time.
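
The hypothetical SetDevice idea could look roughly like this — purely illustrative, no such function exists in LightGBM's API:

```python
# Illustrative only: a module-level default device merged into every call's
# parameters, so users would not need to repeat device=gpu each time.
_global_params = {}

def set_device(device):
    _global_params["device"] = device

def with_defaults(params):
    merged = dict(_global_params)
    merged.update(params)  # per-call parameters override the global default
    return merged
```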

BTW, the GPU code can be tested without real GPUs installed. The beauty of OpenCL (unlike CUDA) is that it is a universal standard, targeting a wide range of devices including CPUs and GPUs. You can install the Intel OpenCL runtime or the AMD APP SDK to get OpenCL working on CPUs (slow, but good enough for testing). Currently in .travis.yml we test the GPU code this way (and in fact we already use the Python interface there).
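
To see which OpenCL devices (CPU or GPU) a machine exposes, something like the following works — assuming the third-party pyopencl package, which is not a LightGBM dependency; the helper degrades to an empty list when no OpenCL runtime or Python binding is present:

```python
def opencl_devices():
    """List (platform, device) name pairs visible to OpenCL, or [] when the
    optional pyopencl binding or the OpenCL runtime is unavailable."""
    try:
        import pyopencl as cl  # third-party; an assumption, not a LightGBM dep
        platforms = cl.get_platforms()
    except Exception:
        return []
    pairs = []
    for platform in platforms:
        for device in platform.get_devices():
            pairs.append((platform.name, device.name))
    return pairs
```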

@wxchan
Contributor

wxchan commented Apr 11, 2017

Can it work on Mac? I got a segfault on my machine.

[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 6143
[LightGBM] [Info] Number of data: 7000, number of used features: 28
[LightGBM] [Info] Using GPU Device: HD Graphics 5000, Vendor: Intel
[LightGBM] [Info] Compiling OpenCL Kernel with 256 bins...
Segmentation fault: 11
Thread 3 Crashed:
0   libstdc++.6.dylib             	0x0000000105964ff8 __cxxabiv1::__si_class_type_info::__do_dyncast(long, __cxxabiv1::__class_type_info::__sub_kind, __cxxabiv1::__class_type_info const*, void const*, __cxxabiv1::__class_type_info const*, void const*, __cxxabiv1::__class_type_info::__dyncast_result&) const + 24
1   libc++abi.dylib               	0x00007fffd556a44e __cxxabiv1::__class_type_info::can_catch(__cxxabiv1::__shim_type_info const*, void*&) const + 146
2   libc++abi.dylib               	0x00007fffd556bc0b default_terminate_handler() + 199
3   libobjc.A.dylib               	0x00007fffd6075f33 _objc_terminate() + 124
4   libc++abi.dylib               	0x00007fffd5568d69 std::__terminate(void (*)()) + 8
5   libc++abi.dylib               	0x00007fffd55687de __cxa_throw + 121
6   libboost_filesystem-mt.dylib  	0x0000000105932b72 boost::filesystem::detail::create_directory(boost::filesystem::path const&, boost::system::error_code*) + 274
7   libboost_filesystem-mt.dylib  	0x00000001059328dd boost::filesystem::detail::create_directories(boost::filesystem::path const&, boost::system::error_code*) + 461
8   lib_lightgbm.so               	0x000000010582182a boost::compute::detail::program_binary_path(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) + 618
9   ???                           	0x0000000000000021 0 + 33
10  ???                           	0x000000010113f620 func_new.kwlist + 48

@huanzhang12
Contributor

huanzhang12 commented Apr 11, 2017

@wxchan It should work on Mac, as it detects your Intel HD 5000 GPU as an OpenCL device. Glad to see you got it compiled successfully!

Based on the backtrace, it seems the problem is in the offline cache of Boost.Compute. The offline cache stores compiled GPU kernels so that they do not need to be compiled again the next time you launch LightGBM with the GPU. From the backtrace, I guess it crashed while creating the cache directory.

Based on compute/include/boost/compute/detail/path.hpp, the default cache path is set to

    static const std::string appdata = detail::getenv("HOME")
        + path_delim() + ".boost_compute";

On my Linux machine it creates a folder ".boost_compute" in my home folder; I am not sure how it works on Mac. If you can figure out the exact reason for the crash, you can submit a PR to Boost.Compute.
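
For reference, the Python equivalent of that default — i.e. the directory to inspect or delete when a stale cache is suspected (on Unix-like systems; Mac behaviour is exactly what is in question here):

```python
import os

# Mirror of Boost.Compute's default offline cache location quoted above:
# $HOME/.boost_compute on Unix-like systems.
def boost_compute_cache_dir(home=None):
    home = home or os.environ.get("HOME", "")
    return os.path.join(home, ".boost_compute")
```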

Otherwise, you can remove the macro BOOST_COMPUTE_USE_OFFLINE_CACHE in src/treelearner/gpu_tree_learner.h to disable the offline kernel cache, then make clean and make again.

@wxchan
Contributor

wxchan commented Apr 11, 2017

It works after commenting out the BOOST_COMPUTE_USE_OFFLINE_CACHE line. Could this be added to the instructions?

@huanzhang12
Contributor

@wxchan Glad to know it works for you! I can add this to the instructions, but please report this bug to Boost.Compute so that they can fix it in a future release. The offline cache is a nice feature to have; otherwise the user has to wait for kernel compilation (Compiling OpenCL Kernel with 256 bins...) each time they launch LightGBM, and it can take some time.

I guess it is not a hard problem to fix, probably just adding an #ifdef for the OSX case. You can try to print the variable dir in https://github.com/boostorg/compute/blob/master/include/boost/compute/detail/path.hpp#L46-L48 and see what happens. I don't have any OSX system at hand, so I can't test it.

@Laurae2
Contributor Author

Laurae2 commented Apr 12, 2017

@huanzhang12 @guolinke I managed to compile with the GPU trainer in R + Windows! A PR for enabling GPU in the R package will come very soon.

[screenshot: LightGBM GPU trainer running in R on Windows]

@gugatr0n1c

@Laurae2 Which method did you use? Visual Studio or MinGW?

@Laurae2
Contributor Author

Laurae2 commented Apr 12, 2017

@gugatr0n1c I used only MinGW for the CLI, Python, and R; there is no need for Visual Studio to compile LightGBM and Boost (performance might vary, though).

@huanzhang12 I got the same issue on Windows as @wxchan's issue on Mac. Commenting out line 26 (BOOST_COMPUTE_USE_OFFLINE_CACHE) fixed it.

@huanzhang12
Contributor

huanzhang12 commented Apr 12, 2017

@Laurae2 Glad to know you got it working on Windows! I think we need to look into the BOOST_COMPUTE_USE_OFFLINE_CACHE issue a little, because the offline cache is a good feature to have; otherwise we have to wait a few seconds each time for the GPU kernels to compile, which is annoying.

@Laurae2 You can probably try to remove the OpenMP pragma at https://github.com/Microsoft/LightGBM/blob/master/src/treelearner/gpu_tree_learner.cpp#L569. I think Boost.Compute probably has a bug when using multiple threads to build the kernel. We need to report it to Boost.Compute if we can track down the issue.

@huanzhang12
Contributor

@Laurae2 I was able to fix the offline cache issue in Boost.Compute on Windows. I have created a pull request at the Boost.Compute repository: boostorg/compute#704

@Laurae2
Contributor Author

Laurae2 commented Apr 18, 2017

@huanzhang12 We now have good documentation for Linux and Windows, and on how to choose a device/platform. We still lack documentation for Mac, and I don't have access to my Mac currently.

Anyone volunteering to write some Mac documentation?

@huanzhang12
Contributor

huanzhang12 commented Apr 18, 2017

@Laurae2 Thanks for your hard work on getting the instructions on Windows ready!

Based on the previous issues on Mac you mentioned, it could be tricky to get everything working on Mac. Currently I don't have access to any Mac computers, and I can't find any cloud computing service providing Mac virtual machines with GPUs. So we need to look for some volunteers here.

@Laurae2
Contributor Author

Laurae2 commented Apr 25, 2017

@guolinke Great! Now the last one to do is Mac.

Perhaps you can add a call for contributions for docs on this.

@guolinke guolinke changed the title Documentation for LightGBM on GPU [WIP] Install documentation for LightGBM on GPU Apr 25, 2017
@guolinke
Collaborator

@Laurae2 Did we solve the GPU build for R?

@Laurae2
Contributor Author

Laurae2 commented Apr 25, 2017

@guolinke The GPU build works for R on Windows and on Linux, as they work nearly identically in R (just feed 4 extra variables in Makeconf + modify Makevars + add gpu_tree_learner in include); it is the safest way due to how R confines its own environment variables.

On Linux you are free to use the default compiler, which may be shared with the CLI/Python builds; on Windows it is mandatory to use Rtools' MinGW.

For Mac + R, it should be the same as Linux, except it first requires being able to compile the CPU-only version (gcc, OpenMP issues...).

@zhukunism

@Laurae2, can you please give more details on the GPU build for R on Linux? I am not sure what is involved in the "feed 4 extra variables in Makeconf + modify Makevars + add gpu_tree_learner in include" steps. Thanks!

@Laurae2
Contributor Author

Laurae2 commented Jun 1, 2017

@zhukunism On Linux it depends too much on where things are installed by the OS (or where you install them), so I can't really make it more detailed. The same rules from Windows apply on Linux, with a different file naming scheme:

[screenshot: Linux file naming scheme for the build variables]

@bushmanov

I successfully built LightGBM with GPU support on Ubuntu 16.04 and installed the Python version. I am able to run it in Python with device="gpu".

As far as the R version is concerned, which is installed by further running ./unix_build_package.sh and installing the resulting lightgbm_0.1.tar.gz package, it runs successfully on the CPU but crashes as soon as I insert device="gpu".

I would really appreciate clear instructions on installing the R version on Linux with GPU support.

@guolinke
Collaborator

guolinke commented Jun 5, 2017

@bushmanov
I am working on an easier install method for the R package in this PR: #584.
You are welcome to try it.
For GPU support, you need to set use_gpu <- TRUE in R-package/src/install.libs.R.

@zhukunism

@Laurae2, I set the Boost and OpenCL environment variables as you suggested, but I still get the errors below when building on my Ubuntu machine:

    * installing *source* package ‘lightgbm’ ...
    ** libs
    make: Nothing to be done for 'all'.
    * installing to /home/zhukun/Workspace/Library/lightgbm/install/lightgbm/libs
    ** R
    ** data
    ** demo
    ** preparing package for lazy loading
    ** help
    *** installing help indices
    ** building package indices
    ** testing if installed package can be loaded
    Error in dyn.load(file, DLLpath = DLLpath, ...) :
      unable to load shared object '/home/zhukun/Workspace/Library/lightgbm/install/lightgbm/libs/lightgbm.so':
      /home/zhukun/Workspace/Library/lightgbm/install/lightgbm/libs/lightgbm.so: undefined symbol: clGetCommandQueueInfo
    Error: loading failed
    Execution halted
    ERROR: loading failed

The OpenCL headers and .so are installed properly:

$ ls /usr/include/CL/
cl2.hpp cl_d3d10.h cl_d3d11.h cl_dx9_media_sharing.h cl_egl.h cl_ext.h cl_gl_ext.h cl_gl.h cl.h cl.hpp cl_platform.h opencl.h

$ ls /usr/lib/x86_64-linux-gnu/libOpenCL.so
/usr/lib/x86_64-linux-gnu/libOpenCL.so

Do you have any ideas? many thanks!
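
An undefined-symbol error like the one above usually means lib_lightgbm.so was built without linking against libOpenCL (-lOpenCL). A quick diagnostic, sketched under that assumption — clGetCommandQueueInfo is a standard OpenCL entry point, so a healthy system libOpenCL should export it:

```python
import ctypes
import ctypes.util

def libopencl_exports(symbol="clGetCommandQueueInfo"):
    """Return True if the system libOpenCL exports `symbol`,
    False if the library or the symbol cannot be found."""
    path = ctypes.util.find_library("OpenCL")
    if path is None:
        return False
    try:
        return hasattr(ctypes.CDLL(path), symbol)
    except OSError:
        return False
```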

@Laurae2
Contributor Author

Laurae2 commented Jun 9, 2017

@zhukunism For the R installation, you can now use a precompiled lib, which you have to put in the root of the LightGBM folder (if it was compiled elsewhere). Compile it as you would for the CLI, then compile the R package.

It is now much easier to compile R with GPU support that way. The current R installation can't compile with custom flags (you can override them by editing the package's src/install.libs.R).

Remember to set use_precompile to TRUE if you use a precompiled lib.

@jzun

jzun commented Jun 22, 2017

The following error came up when I used R 3.3.3 to install LightGBM with devtools:

Error in inDL(x, as.logical(local), as.logical(now), ...) : 
unable to load the shared object ‘C:/Users/DIUNI/Documents/R/win-library/3.3/lightgbm/libs/i386/lib_lightgbm.dll’:
LoadLibrary failure:  %1 is not a valid Win32 application. 

PS: cmake 3.9, Rtools 3.4, and VS2017 have been installed.
Any reply will be greatly appreciated!

@guolinke
Collaborator

@jzun It seems your R version is 32-bit (i386); can you use 64-bit R?

@jzun

jzun commented Jun 22, 2017

@guolinke It seems I used the 64-bit one to install; the header line in the R console:

R version 3.3.3 (2017-03-06) -- "Another Canoe"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

@guolinke
Collaborator

guolinke commented Jun 22, 2017

@jzun Maybe you are using the 32-bit Rtools?
You can list the folders in your C:\Rtools.

And do you have a folder C:\R32?

@jzun

jzun commented Jun 22, 2017

@guolinke
It worked after I reinstalled R and Rtools without the 32-bit components.
And the GPU version works too!
Thank you!

@jzun

jzun commented Jun 22, 2017

@guolinke
But something confusing came up in the running time test:
I used a GTX 1060 to train the multiclass demo, which took about 0.4–0.5 sec, but only about 0.01 sec on an AMD 1700 CPU, which is a little strange...

@guolinke
Collaborator

@jzun
I guess you are running with a small dataset.
When the data is small, using the GPU cannot yield a speed-up.

@jzun

jzun commented Jun 22, 2017

@guolinke
Yeah, I guess so.
I plan to translate the LightGBM installation guide into Chinese and put it on my WeChat Subscription called "统计译文"; is that okay?
I think we should encourage people to install and use LightGBM, and to report their suggestions and problems through effective channels.

@guolinke
Collaborator

@jzun sure

@Laurae2 Laurae2 closed this as completed Oct 1, 2017
@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023