[R-package] Add GPU install options (fixes #3765) #3779

Merged 7 commits on Jan 19, 2021


@jameslamb (Collaborator) commented Jan 18, 2021

This PR offers a fix for #3765. In that issue, @szilard described some issues using CMake-based builds of the LightGBM R package with GPU support. Specifically, compilation failed because LightGBM couldn't find OpenCL and Boost.

Changes in this PR

This PR adds the following command-line options to build_r.R:

  • --boost-root
  • --boost-dir
  • --boost-include-dir
  • --boost-librarydir
  • --opencl-include-dir
  • --opencl-library

The approach is similar to how the Python package handles these same arguments.
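
To make the idea concrete, here is a minimal sketch of how options like these could be translated into CMake definitions. It is not the code in build_r.R. The function name `to_cmake_flag` is hypothetical, the two OpenCL variable names are the ones discussed later in this thread, and the Boost variable names are assumptions based on CMake's FindBoost module.

```shell
#!/bin/sh
# Hypothetical helper: turn one "--option=value" argument into a CMake -D definition.
# The CMake variable names below are assumptions, not a copy of build_r.R.
to_cmake_flag() {
    key="${1%%=*}"    # text before the first '='
    value="${1#*=}"   # text after the first '='
    case "$key" in
        --boost-root)         echo "-DBOOST_ROOT=$value" ;;
        --boost-dir)          echo "-DBoost_DIR=$value" ;;
        --boost-include-dir)  echo "-DBoost_INCLUDE_DIR=$value" ;;
        --boost-librarydir)   echo "-DBOOST_LIBRARYDIR=$value" ;;
        --opencl-include-dir) echo "-DOpenCL_INCLUDE_DIR=$value" ;;
        --opencl-library)     echo "-DOpenCL_LIBRARY=$value" ;;
        *)
            echo "unrecognized option: $key" >&2
            return 1
            ;;
    esac
}

# Example: print the CMake definition for one option.
to_cmake_flag --opencl-library=/usr/local/cuda/lib64/libOpenCL.so
```

Rejecting unknown options, rather than silently passing them through, keeps a typo in an option name from turning into a confusing CMake failure later.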

How I tested this

I adopted @szilard's reproducible example from #3765.

  1. Create a new directory and clone LightGBM into it
mkdir lgb-gpu-test
cd lgb-gpu-test
git clone --recursive https://github.com/microsoft/LightGBM.git
pushd LightGBM
    git fetch origin fix/r-gpu
    git checkout fix/r-gpu
popd
  2. Create a file named Dockerfile with the following content. The laptop I do GPU development on has CUDA 10.2 installed, so I chose a CUDA 10.2 image, but I expect this would work for other versions.
Dockerfile:
FROM nvidia/cuda:10.2-devel-ubuntu18.04

ENV DEBIAN_FRONTEND="noninteractive"

RUN apt-get update && \
    apt-get install \
    	-y \
    	software-properties-common \
    	apt-transport-https

RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9 && \
    add-apt-repository 'deb [arch=amd64] https://cran.rstudio.com/bin/linux/ubuntu bionic-cran40/' && \
    apt-get update && \
    apt-get install -y \
    		r-base

RUN apt-get install -y \
	git \
	wget \
	libcurl4-openssl-dev \
	default-jdk-headless \
	libssl-dev \
	libxml2-dev \
	cmake

ENV MAKE="make -j$(nproc)"

RUN R -e 'install.packages(c("R6","data.table","jsonlite"), repos = "https://cran.rstudio.com/")'

RUN apt-get install -y \
		libboost-dev \
		libboost-system-dev \
		libboost-filesystem-dev \
		ocl-icd-opencl-dev \
		opencl-headers \
		clinfo

RUN mkdir -p /etc/OpenCL/vendors && \
    echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd   ## otherwise lightgbm segfaults at runtime (compiles fine without it)

COPY LightGBM /tmp/LightGBM

RUN cd /tmp/LightGBM && \
	git submodule init && \
	git submodule update --recursive && \
    Rscript build_r.R \
    	--use-gpu \
    	--opencl-library=/usr/lib/x86_64-linux-gnu/libOpenCL.so \
    	--boost-librarydir=/usr/lib/x86_64-linux-gnu
  3. Build the image
docker build --no-cache -t test-lgb-gpu -f Dockerfile .
  4. Run a container from that image. Open two terminals. In one, open an R session in the container. In the other, open a shell running nvidia-smi.
nvidia-docker run \
	-w /tmp/LightGBM \
	-it test-lgb-gpu \
	R
nvidia-docker run \
	-w /tmp/LightGBM \
	-it test-lgb-gpu \
	watch -n 3 nvidia-smi
  5. In the R shell, install some packages and then run @szilard's test script from #3765 (comment).
Test R script:
install.packages(c("ROCR", "curl"), repos = "https://cran.r-project.org")

library(data.table)
library(ROCR)
library(lightgbm)
library(Matrix)

set.seed(123)

d_train <- fread("https://s3.amazonaws.com/benchm-ml--main/train-1m.csv", showProgress=FALSE)
d_test <- fread("https://s3.amazonaws.com/benchm-ml--main/test.csv", showProgress=FALSE)

d_all <- rbind(d_train, d_test)
d_all$dep_delayed_15min <- ifelse(d_all$dep_delayed_15min=="Y",1,0)

d_all_wrules <- lgb.convert_with_rules(d_all)       
d_all <- d_all_wrules$data
cols_cats <- names(d_all_wrules$rules) 

d_train <- d_all[1:nrow(d_train)]
d_test <- d_all[(nrow(d_train)+1):(nrow(d_train)+nrow(d_test))]

p <- ncol(d_all)-1
dlgb_train <- lgb.Dataset(data = as.matrix(d_train[,1:p]), label = d_train$dep_delayed_15min, free_raw_data = FALSE)

md <- lgb.train(
	data = dlgb_train, 
	objective = "binary", 
	nrounds = 100, num_leaves = 512, learning_rate = 0.1, 
	categorical_feature = cols_cats,
	device = "gpu",
	verbose = 0
)

phat <- predict(md, data = as.matrix(d_test[,1:p]))
rocr_pred <- prediction(phat, d_test$dep_delayed_15min)
cat(performance(rocr_pred, "auc")@y.values[[1]],"\n")

Based on the output of nvidia-smi, I'm fairly sure that training is actually taking advantage of the GPU.
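
For a less manual check than watching the nvidia-smi screen, something like the following sketch could be run in the second terminal while training is in progress. The helper name `gpu_is_busy` and the 10% threshold are hypothetical choices for illustration, and it assumes nvidia-smi's `--query-gpu` CSV output with one utilization value per line.

```shell
#!/bin/sh
# Hypothetical helper: succeed if any GPU reports utilization above a threshold.
# Reads one integer percentage per line on stdin, the format produced by
#   nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits
gpu_is_busy() {
    threshold="${1:-10}"   # percent; an arbitrary default chosen for illustration
    while read -r util; do
        # Skip blank or non-numeric lines (for example "[N/A]").
        case "$util" in
            ''|*[!0-9]*) continue ;;
        esac
        if [ "$util" -gt "$threshold" ]; then
            return 0
        fi
    done
    return 1
}

# Only query the driver when nvidia-smi is actually available.
if command -v nvidia-smi >/dev/null 2>&1; then
    if nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits | gpu_is_busy 10; then
        echo "GPU is in use"
    else
        echo "GPU looks idle"
    fi
fi
```

A single sample can miss short bursts of activity, so this is a rough signal rather than proof that training ran on the GPU.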


Notes for Reviewers

  • I think we should have a CI job for R + GPU, but that's outside the scope of this PR. I've opened #3780 ([R-package] Add an R GPU job in CI) to track it.
  • I chose not to hard-code any default values into build_r.R. I think this will make the script more stable, even if it means users need to do a little more configuration. Let me know if you disagree.

@StrikerRUS (Collaborator)

@jameslamb Please note my comment in #3765:

I'm not sure that a newer version from an Ubuntu PPA is better than the preinstalled native version from NVIDIA if you are really using NVIDIA cards for training.
#3765 (comment)

So I believe it is worth running the tests without

RUN apt-get install -y \
		...
		ocl-icd-opencl-dev \
		opencl-headers \
		...

before merging this PR. Could you please do this? I guess you already have easy access to the environment described in your opening comment.

@jameslamb (Collaborator, Author)

I just tried this, and compilation failed:

-- The C compiler identification is GNU 7.5.0
-- The CXX compiler identification is GNU 7.5.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- R version passed into FindLibR.cmake: 4.0.3
-- Found LibR: /usr/lib/R  
-- LIBR_EXECUTABLE: /usr/bin/R
-- LIBR_INCLUDE_DIRS: /usr/share/R/include
-- LIBR_CORE_LIBRARY: /usr/lib/R/lib/libR.so
-- Found OpenMP_C: -fopenmp (found version "4.5") 
-- Found OpenMP_CXX: -fopenmp (found version "4.5") 
-- Found OpenMP: TRUE (found version "4.5")  
-- Looking for CL_VERSION_2_2
-- Looking for CL_VERSION_2_2 - not found
-- Looking for CL_VERSION_2_1
-- Looking for CL_VERSION_2_1 - not found
-- Looking for CL_VERSION_2_0
-- Looking for CL_VERSION_2_0 - not found
-- Looking for CL_VERSION_1_2
-- Looking for CL_VERSION_1_2 - not found
-- Looking for CL_VERSION_1_1
-- Looking for CL_VERSION_1_1 - not found
-- Looking for CL_VERSION_1_0
-- Looking for CL_VERSION_1_0 - not found
CMake Error at /usr/share/cmake-3.10/Modules/FindPackageHandleStandardArgs.cmake:137 (message):
  Could NOT find OpenCL (missing: OpenCL_INCLUDE_DIR)
Call Stack (most recent call first):
  /usr/share/cmake-3.10/Modules/FindPackageHandleStandardArgs.cmake:378 (_FPHSA_FAILURE_MESSAGE)
  /usr/share/cmake-3.10/Modules/FindOpenCL.cmake:132 (find_package_handle_standard_args)
  CMakeLists.txt:138 (find_package)


-- Configuring incomplete, errors occurred!
See also "/tmp/RtmpH3RoRU/R.INSTALL9e1d4796b2/lightgbm/src/build/CMakeFiles/CMakeOutput.log".
See also "/tmp/RtmpH3RoRU/R.INSTALL9e1d4796b2/lightgbm/src/build/CMakeFiles/CMakeError.log".
Error in .run_shell_command("cmake", c(cmake_args, "..")) : 
  Command failed with exit code: 1
* removing '/usr/local/lib/R/site-library/lightgbm'
Error in .run_shell_command(install_cmd, install_args) : 
  Command failed with exit code: 1
Execution halted
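
The failure above reports a missing OpenCL_INCLUDE_DIR. CMake's FindOpenCL module locates that directory by searching for the header CL/cl.h, so a quick way to find a value to pass is to check candidate directories for that file. The sketch below does this. The function name `find_opencl_include_dir` is hypothetical and the candidate paths are assumptions that will differ between systems.

```shell
#!/bin/sh
# Hypothetical helper: print the first directory that contains CL/cl.h.
find_opencl_include_dir() {
    for dir in "$@"; do
        if [ -f "$dir/CL/cl.h" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}

# Candidate locations are assumptions; adjust them for your system.
find_opencl_include_dir /usr/local/cuda/include /usr/include \
    || echo "CL/cl.h not found in the candidate directories" >&2
```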

@StrikerRUS (Collaborator) commented Jan 18, 2021

I just tried this, and compilation failed

OK, expected.

Please try passing -DOpenCL_LIBRARY=/usr/local/cuda/lib64/libOpenCL.so -DOpenCL_INCLUDE_DIR=/usr/local/cuda/include/ as script arguments, to make sure that this use case can also be covered with new arguments.

@jameslamb (Collaborator, Author)

Building like this, with the installation of ocl-icd-opencl-dev and opencl-headers removed, succeeded 🎉

Rscript build_r.R \
    	--use-gpu \
    	--opencl-library=/usr/local/cuda/lib64/libOpenCL.so \
    	--opencl-include-dir=/usr/local/cuda/include/
Full Dockerfile:
FROM nvidia/cuda:10.2-devel-ubuntu18.04

ENV DEBIAN_FRONTEND="noninteractive"

RUN apt-get update && \
    apt-get install \
    	-y \
    	software-properties-common \
    	apt-transport-https

RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9 && \
    add-apt-repository 'deb [arch=amd64] https://cran.rstudio.com/bin/linux/ubuntu bionic-cran40/' && \
    apt-get update && \
    apt-get install -y \
    		r-base

RUN apt-get install -y \
	git \
	wget \
	libcurl4-openssl-dev \
	default-jdk-headless \
	libssl-dev \
	libxml2-dev \
	cmake

ENV MAKE="make -j$(nproc)"

RUN R -e 'install.packages(c("R6","data.table","jsonlite"), repos = "https://cran.rstudio.com/")'

RUN apt-get install -y \
		libboost-dev \
		libboost-system-dev \
		libboost-filesystem-dev \
		clinfo

RUN mkdir -p /etc/OpenCL/vendors && \
    echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd   ## otherwise lightgbm segfaults at runtime (compiles fine without it)

COPY LightGBM /tmp/LightGBM

RUN cd /tmp/LightGBM && \
	git submodule init && \
	git submodule update --recursive && \
    Rscript build_r.R \
    	--use-gpu \
    	--opencl-library=/usr/local/cuda/lib64/libOpenCL.so \
    	--opencl-include-dir=/usr/local/cuda/include/
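
The Dockerfile above registers the NVIDIA driver with the OpenCL ICD loader by writing a file into /etc/OpenCL/vendors. Inside the container, a sketch like the following could confirm which driver libraries are registered. The function name `list_registered_icds` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list the driver libraries registered with the OpenCL ICD loader.
# Each *.icd file in the vendors directory holds the name or path of a driver library.
list_registered_icds() {
    vendors_dir="${1:-/etc/OpenCL/vendors}"
    for icd in "$vendors_dir"/*.icd; do
        # If the glob matched nothing, the pattern is left unexpanded, so skip it.
        [ -f "$icd" ] || continue
        printf '%s: %s\n' "$(basename "$icd")" "$(cat "$icd")"
    done
}

list_registered_icds
```

No output means no driver is registered, which is the situation the Dockerfile comment warns about.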
Build logs:
-- Found OpenMP: TRUE (found version "4.5")  
-- Looking for CL_VERSION_2_2
-- Looking for CL_VERSION_2_2 - not found
-- Looking for CL_VERSION_2_1
-- Looking for CL_VERSION_2_1 - not found
-- Looking for CL_VERSION_2_0
-- Looking for CL_VERSION_2_0 - not found
-- Looking for CL_VERSION_1_2
-- Looking for CL_VERSION_1_2 - found
-- Found OpenCL: /usr/local/cuda/lib64/libOpenCL.so (found version "1.2") 
-- OpenCL include directory: /usr/local/cuda/include
-- Boost version: 1.65.1
-- Found the following Boost libraries:
--   filesystem
--   system
-- Performing Test MM_PREFETCH
-- Performing Test MM_PREFETCH - Success
-- Using _mm_prefetch
-- Performing Test MM_MALLOC
-- Performing Test MM_MALLOC - Success
-- Using _mm_malloc
-- Configuring done
-- Generating done
...
[100%] Built target _lightgbm
Found library file: /tmp/RtmpaLCutl/R.INSTALL9e5da08f48/lightgbm/src/lib_lightgbm.so to move to /usr/local/lib/R/site-library/00LOCK-lightgbm/00new/lightgbm/libs
Removing 'build/' directory
** R
** data
** demo
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
*** copying figures
** building package indices
** testing if installed package can be loaded from temporary location
** checking absolute paths in shared objects and dynamic libraries
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (lightgbm)

I ran the testing code with verbose = 1 and can see the following logs, confirming that the GPU was utilized:

[LightGBM] [Info] Number of positive: 192982, number of negative: 807018
[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 1095
[LightGBM] [Info] Number of data points in the train set: 1000000, number of used features: 8
[LightGBM] [Info] Using GPU Device: GeForce RTX 2070 with Max-Q Design, Vendor: NVIDIA Corporation
[LightGBM] [Info] Compiling OpenCL Kernel with 256 bins...
[LightGBM] [Info] GPU programs have been built
[LightGBM] [Info] Size of histogram bin entry: 8
[LightGBM] [Info] 8 dense feature groups (7.63 MB) transferred to GPU in 0.020953 secs. 0 sparse feature groups
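
The same check can be scripted by scanning the training output for the "Using GPU Device" line shown above. This is only a sketch: the function name `gpu_device_from_log` is hypothetical, and it assumes the log line keeps the exact prefix seen in this output.

```shell
#!/bin/sh
# Hypothetical helper: print the GPU device named in LightGBM's training log.
# Reads the log on stdin and fails if no "Using GPU Device" line is present.
gpu_device_from_log() {
    device="$(sed -n 's/^\[LightGBM] \[Info] Using GPU Device: //p' | head -n 1)"
    [ -n "$device" ] && echo "$device"
}

# Example using the log line from the output above.
echo '[LightGBM] [Info] Using GPU Device: GeForce RTX 2070 with Max-Q Design, Vendor: NVIDIA Corporation' \
    | gpu_device_from_log
```

In practice the training output would be piped in instead, for example from an Rscript run of the test script saved under a file name of your choosing.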

@StrikerRUS (Collaborator) left a comment

LGTM except one typo! But I haven't dug deep into R code.

Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
@jameslamb (Collaborator, Author)

... I haven't dug deep into R code.

I think that since we have so many CI jobs for R + CMake, and since this PR didn't change any CI scripts or tests, we can be pretty confident that the changes to build_r.R didn't break building the CPU package with Rscript build_r.R. All of the non-GPU command-line options are covered by tests.

@github-actions

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 24, 2023