Initial GPU acceleration support for LightGBM (#368)
* add dummy gpu solver code

* initial GPU code

* fix crash bug

* first working version

* use asynchronous copy

* use a better kernel for root

* parallel read histogram

* sparse features now work, but without acceleration; computed on CPU

* compute sparse feature on CPU simultaneously

* fix big bug; add gpu selection; add kernel selection

* better debugging

* clean up

* add feature scatter

* Add sparse_threshold control

* fix a bug in feature scatter

* clean up debug

* temporarily add OpenCL kernels for k=64,256

* fix up CMakeLists and the USE_GPU definition

* add OpenCL kernels as string literals

* Add boost.compute as a submodule

* add boost dependency to CMakeLists

* fix opencl pragma

* use pinned memory for histogram

* use pinned buffer for gradients and hessians

* better debugging message

* add double precision support on GPU

* fix boost version in CMakeLists

* Add a README

* reconstruct GPU initialization code for ResetTrainingData

* move data to GPU in parallel

* fix a bug during feature copy

* update gpu kernels

* update gpu code

* initial port to LightGBM v2

* speedup GPU data loading process

* Add 4-bit bin support to GPU

* re-add sparse_threshold parameter

* remove kMaxNumWorkgroups and allows an unlimited number of features

* add feature mask support for skipping unused features

* enable kernel cache

* use GPU kernels without feature masks when all features are used

* README.

* README.

* update README

* fix typos (#349)

* change compiler to gcc on Apple as default

* clean vscode related file

* refine API of constructing from sampling data.

* fix bug in the last commit.

* more efficient algorithm to sample k from n.

* fix bug in filter bin

* change to boost from average output.

* fix tests.

* only stop training when all classes are finished in multi-class.

* limit the max tree output. change hessian in multi-class objective.

* robust tree model loading.

* fix test.

* convert the probabilities to raw score in boost_from_average of classification.

* fix the average label for binary classification.

* Add boost_from_average to docs (#354)

* don't use "ConvertToRawScore" for self-defined objective function.

* boost_from_average doesn't seem to work well in binary classification; remove it.

* For a better jump link (#355)

* Update Python-API.md

* for a better jump in page

A space is needed between `#` and the header's content, according to GitHub's markdown format [guideline](https://guides.github.com/features/mastering-markdown/)

After adding the spaces, we can jump to the exact position on the page by clicking the link.

* fixed something mentioned by @wxchan

* Update Python-API.md

* add FitByExistingTree.

* adapt GPU tree learner for FitByExistingTree

* avoid NaN output.

* update boost.compute

* fix typos (#361)

* fix broken links (#359)

* update README

* disable GPU acceleration by default

* fix image url

* cleanup debug macro

* remove old README

* do not save sparse_threshold_ in FeatureGroup

* add details for new GPU settings

* ignore submodule when doing pep8 check

* allocate workspace for at least one thread during building Feature4

* move sparse_threshold to class Dataset

* remove duplicated code in GPUTreeLearner::Split

* Remove duplicated code in FindBestThresholds and BeforeFindBestSplit

* do not rebuild ordered gradients and hessians for sparse features

* support feature groups in GPUTreeLearner

* Initial parallel learners with GPU support

* add option device, cleanup code

* clean up FindBestThresholds; add some omp parallel

* constant hessian optimization for GPU

* Fix GPUTreeLearner crash when there are zero features

* use np.testing.assert_almost_equal() to compare lists of floats in tests

* travis for GPU
huanzhang12 authored and guolinke committed Apr 9, 2017
1 parent db3d1f8 commit 0bb4a82
Showing 30 changed files with 4,163 additions and 246 deletions.
3 changes: 3 additions & 0 deletions .gitmodules
@@ -0,0 +1,3 @@
[submodule "include/boost/compute"]
path = compute
url = https://github.com/boostorg/compute
28 changes: 26 additions & 2 deletions .travis.yml
@@ -11,24 +11,48 @@ before_install:
- export PATH="$HOME/miniconda/bin:$PATH"
- conda config --set always_yes yes --set changeps1 no
- conda update -q conda
- sudo add-apt-repository ppa:george-edison55/cmake-3.x -y
- sudo apt-get update -q
- bash .travis/amd_sdk.sh;
- tar -xjf AMD-SDK.tar.bz2;
- AMDAPPSDK=${HOME}/AMDAPPSDK;
- export OPENCL_VENDOR_PATH=${AMDAPPSDK}/etc/OpenCL/vendors;
- mkdir -p ${OPENCL_VENDOR_PATH};
- sh AMD-APP-SDK*.sh --tar -xf -C ${AMDAPPSDK};
- echo libamdocl64.so > ${OPENCL_VENDOR_PATH}/amdocl64.icd;
- export LD_LIBRARY_PATH=${AMDAPPSDK}/lib/x86_64:${LD_LIBRARY_PATH};
- chmod +x ${AMDAPPSDK}/bin/x86_64/clinfo;
- ${AMDAPPSDK}/bin/x86_64/clinfo;
- export LIBRARY_PATH="$HOME/miniconda/lib:$LIBRARY_PATH"
- export LD_RUN_PATH="$HOME/miniconda/lib:$LD_RUN_PATH"
- export CPLUS_INCLUDE_PATH="$HOME/miniconda/include:$AMDAPPSDK/include/:$CPLUS_INCLUDE_PATH"

install:
- sudo apt-get install -y libopenmpi-dev openmpi-bin build-essential
- sudo apt-get install -y cmake
- conda install --yes atlas numpy scipy scikit-learn pandas matplotlib
- conda install --yes -c conda-forge boost=1.63.0
- pip install pep8


script:
- cd $TRAVIS_BUILD_DIR
- mkdir build && cd build && cmake .. && make -j
- cd $TRAVIS_BUILD_DIR/tests/c_api_test && python test.py
- cd $TRAVIS_BUILD_DIR/python-package && python setup.py install
- cd $TRAVIS_BUILD_DIR/tests/python_package_test && python test_basic.py && python test_engine.py && python test_sklearn.py && python test_plotting.py
- cd $TRAVIS_BUILD_DIR && pep8 --ignore=E501 .
- cd $TRAVIS_BUILD_DIR && pep8 --ignore=E501 --exclude=./compute .
- rm -rf build && mkdir build && cd build && cmake -DUSE_MPI=ON ..&& make -j
- cd $TRAVIS_BUILD_DIR/tests/c_api_test && python test.py
- cd $TRAVIS_BUILD_DIR/python-package && python setup.py install
- cd $TRAVIS_BUILD_DIR/tests/python_package_test && python test_basic.py && python test_engine.py && python test_sklearn.py && python test_plotting.py
- cd $TRAVIS_BUILD_DIR
- rm -rf build && mkdir build && cd build && cmake -DUSE_GPU=ON -DBOOST_ROOT="$HOME/miniconda/" -DOpenCL_INCLUDE_DIR=$AMDAPPSDK/include/ ..
- sed -i 's/std::string device_type = "cpu";/std::string device_type = "gpu";/' ../include/LightGBM/config.h
- make -j$(nproc)
- sed -i 's/std::string device_type = "gpu";/std::string device_type = "cpu";/' ../include/LightGBM/config.h
- cd $TRAVIS_BUILD_DIR/tests/c_api_test && python test.py
- cd $TRAVIS_BUILD_DIR/python-package && python setup.py install
- cd $TRAVIS_BUILD_DIR/tests/python_package_test && python test_basic.py && python test_engine.py && python test_sklearn.py && python test_plotting.py

notifications:
email: false
38 changes: 38 additions & 0 deletions .travis/amd_sdk.sh
@@ -0,0 +1,38 @@
#!/bin/bash

# Original script from https://github.com/gregvw/amd_sdk/

# Location from which to get the nonce and file name
URL="http://developer.amd.com/tools-and-sdks/opencl-zone/opencl-tools-sdks/amd-accelerated-parallel-processing-app-sdk/"
URLDOWN="http://developer.amd.com/amd-license-agreement-appsdk/"

NONCE1_STRING='name="amd_developer_central_downloads_page_nonce"'
FILE_STRING='name="f"'
POSTID_STRING='name="post_id"'
NONCE2_STRING='name="amd_developer_central_nonce"'

#For newest FORM=`wget -qO - $URL | sed -n '/download-2/,/64-bit/p'`
FORM=`wget -qO - $URL | sed -n '/download-5/,/64-bit/p'`

# Get nonce from form
NONCE1=`echo $FORM | awk -F ${NONCE1_STRING} '{print $2}'`
NONCE1=`echo $NONCE1 | awk -F'"' '{print $2}'`
echo $NONCE1

# get the postid
POSTID=`echo $FORM | awk -F ${POSTID_STRING} '{print $2}'`
POSTID=`echo $POSTID | awk -F'"' '{print $2}'`
echo $POSTID

# get file name
FILE=`echo $FORM | awk -F ${FILE_STRING} '{print $2}'`
FILE=`echo $FILE | awk -F'"' '{print $2}'`
echo $FILE

FORM=`wget -qO - $URLDOWN --post-data "amd_developer_central_downloads_page_nonce=${NONCE1}&f=${FILE}&post_id=${POSTID}"`

NONCE2=`echo $FORM | awk -F ${NONCE2_STRING} '{print $2}'`
NONCE2=`echo $NONCE2 | awk -F'"' '{print $2}'`
echo $NONCE2

wget --content-disposition --trust-server-names $URLDOWN --post-data "amd_developer_central_nonce=${NONCE2}&f=${FILE}" -O AMD-SDK.tar.bz2;
19 changes: 18 additions & 1 deletion CMakeLists.txt
@@ -9,6 +9,7 @@ PROJECT(lightgbm)

OPTION(USE_MPI "MPI based parallel learning" OFF)
OPTION(USE_OPENMP "Enable OpenMP" ON)
OPTION(USE_GPU "Enable GPU-accelerated training (EXPERIMENTAL)" OFF)

if(APPLE)
OPTION(APPLE_OUTPUT_DYLIB "Output dylib shared library" OFF)
@@ -34,8 +35,17 @@ else()
endif()
endif(USE_OPENMP)

if(USE_GPU)
find_package(OpenCL REQUIRED)
include_directories(${OpenCL_INCLUDE_DIRS})
MESSAGE(STATUS "OpenCL include directory:" ${OpenCL_INCLUDE_DIRS})
find_package(Boost 1.56.0 COMPONENTS filesystem system REQUIRED)
include_directories(${Boost_INCLUDE_DIRS})
ADD_DEFINITIONS(-DUSE_GPU)
endif(USE_GPU)

if(UNIX OR MINGW OR CYGWIN)
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -pthread -O3 -Wall -std=c++11")
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -pthread -O3 -Wall -std=c++11 -Wno-ignored-attributes")
endif()

if(MSVC)
@@ -65,11 +75,13 @@ endif()


SET(LightGBM_HEADER_DIR ${PROJECT_SOURCE_DIR}/include)
SET(BOOST_COMPUTE_HEADER_DIR ${PROJECT_SOURCE_DIR}/compute/include)

SET(EXECUTABLE_OUTPUT_PATH ${PROJECT_SOURCE_DIR})
SET(LIBRARY_OUTPUT_PATH ${PROJECT_SOURCE_DIR})

include_directories (${LightGBM_HEADER_DIR})
include_directories (${BOOST_COMPUTE_HEADER_DIR})

if(APPLE)
if (APPLE_OUTPUT_DYLIB)
@@ -105,6 +117,11 @@ if(USE_MPI)
TARGET_LINK_LIBRARIES(_lightgbm ${MPI_CXX_LIBRARIES})
endif(USE_MPI)

if(USE_GPU)
TARGET_LINK_LIBRARIES(lightgbm ${OpenCL_LIBRARY} ${Boost_LIBRARIES})
TARGET_LINK_LIBRARIES(_lightgbm ${OpenCL_LIBRARY} ${Boost_LIBRARIES})
endif(USE_GPU)

if(WIN32 AND (MINGW OR CYGWIN))
TARGET_LINK_LIBRARIES(lightgbm Ws2_32)
TARGET_LINK_LIBRARIES(_lightgbm Ws2_32)
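The USE_GPU option above also injects a USE_GPU compile definition (via ADD_DEFINITIONS(-DUSE_GPU)), which is how the rest of the code base gates its GPU-specific paths. A minimal sketch of that pattern, assuming only what the CMake change shows (not a literal excerpt from this commit):

#include <cstdio>

int main() {
#ifdef USE_GPU
  // Compiled with cmake -DUSE_GPU=ON: OpenCL/Boost.Compute paths are built in.
  std::printf("GPU-enabled build\n");
#else
  // Default build: CPU-only tree learners.
  std::printf("CPU-only build\n");
#endif
  return 0;
}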
1 change: 1 addition & 0 deletions compute
Submodule compute added at 1380a0
5 changes: 3 additions & 2 deletions include/LightGBM/bin.h
@@ -59,7 +59,6 @@ class BinMapper {
explicit BinMapper(const void* memory);
~BinMapper();

static double kSparseThreshold;
bool CheckAlign(const BinMapper& other) const {
if (num_bin_ != other.num_bin_) {
return false;
@@ -258,6 +257,7 @@ class BinIterator {
* \return Bin data
*/
virtual uint32_t Get(data_size_t idx) = 0;
virtual uint32_t RawGet(data_size_t idx) = 0;
virtual void Reset(data_size_t idx) = 0;
virtual ~BinIterator() = default;
};
@@ -383,12 +383,13 @@ class Bin {
* \param num_bin Number of bin
* \param sparse_rate Sparse rate of this bin (num_bin0 / num_data)
* \param is_enable_sparse True if enable sparse feature
* \param sparse_threshold Threshold for treating a feature as a sparse feature
* \param is_sparse Will set to true if this bin is sparse
* \param default_bin Default bin for zero values
* \return The bin data object
*/
static Bin* CreateBin(data_size_t num_data, int num_bin,
double sparse_rate, bool is_enable_sparse, bool* is_sparse);
double sparse_rate, bool is_enable_sparse, double sparse_threshold, bool* is_sparse);

/*!
* \brief Create object for bin data of one feature, used for dense feature
18 changes: 18 additions & 0 deletions include/LightGBM/config.h
@@ -97,6 +97,11 @@ struct IOConfig: public ConfigBase {
int num_iteration_predict = -1;
bool is_pre_partition = false;
bool is_enable_sparse = true;
/*! \brief The threshold of zero-element percentage for treating a feature as a sparse feature.
* Default is 0.8, where a feature is treated as sparse when over 80% of its values are zero.
* When set to 1.0, all features are processed as dense features.
*/
double sparse_threshold = 0.8;
bool use_two_round_loading = false;
bool is_save_binary_file = false;
bool enable_load_from_binary_file = true;
@@ -188,6 +193,16 @@ struct TreeConfig: public ConfigBase {
// max_depth < 0 means no limit
int max_depth = -1;
int top_k = 20;
/*! \brief OpenCL platform ID. Usually each GPU vendor exposes one OpenCL platform.
* Default value is -1, using the system-wide default platform
*/
int gpu_platform_id = -1;
/*! \brief OpenCL device ID in the specified platform. Each GPU in the selected platform has a
* unique device ID. Default value is -1, using the default device in the selected platform
*/
int gpu_device_id = -1;
/*! \brief Set to true to use double precision math on GPU (default using single precision) */
bool gpu_use_dp = false;
LIGHTGBM_EXPORT void Set(const std::unordered_map<std::string, std::string>& params) override;
};

@@ -216,11 +231,14 @@ struct BoostingConfig: public ConfigBase {
// only used for the regression. Will boost from the average labels.
bool boost_from_average = true;
std::string tree_learner_type = "serial";
std::string device_type = "cpu";
TreeConfig tree_config;
LIGHTGBM_EXPORT void Set(const std::unordered_map<std::string, std::string>& params) override;
private:
void GetTreeLearnerType(const std::unordered_map<std::string,
std::string>& params);
void GetDeviceType(const std::unordered_map<std::string,
std::string>& params);
};

/*! \brief Config for Network */
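For orientation, a hedged sketch of how the new knobs could be set through the ConfigBase::Set interface shown above. The user-facing key names ("device", "gpu_platform_id", "gpu_device_id", "gpu_use_dp") are assumptions inferred from the member names in this diff, not confirmed by it:

#include <string>
#include <unordered_map>
#include <LightGBM/config.h>

int main() {
  LightGBM::BoostingConfig config;
  std::unordered_map<std::string, std::string> params = {
    {"device", "gpu"},          // assumed key; routes training to the GPU tree learner
    {"gpu_platform_id", "-1"},  // -1: system-wide default OpenCL platform
    {"gpu_device_id", "-1"},    // -1: default device in the selected platform
    {"gpu_use_dp", "false"}     // keep single-precision math on the GPU (default)
  };
  config.Set(params);  // each ConfigBase reads only the keys it knows (assumption)
  return 0;
}

(The new sparse_threshold lives in IOConfig rather than TreeConfig or BoostingConfig, so it would be parsed by the IO side of the same params map.)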
25 changes: 25 additions & 0 deletions include/LightGBM/dataset.h
@@ -355,6 +355,9 @@ class Dataset {
inline int Feture2SubFeature(int feature_idx) const {
return feature2subfeature_[feature_idx];
}
inline uint64_t GroupBinBoundary(int group_idx) const {
return group_bin_boundaries_[group_idx];
}
inline uint64_t NumTotalBin() const {
return group_bin_boundaries_.back();
}
@@ -421,19 +424,36 @@ class Dataset {
const int sub_feature = feature2subfeature_[i];
return feature_groups_[group]->bin_mappers_[sub_feature]->num_bin();
}

inline int FeatureGroupNumBin(int group) const {
return feature_groups_[group]->num_total_bin_;
}

inline const BinMapper* FeatureBinMapper(int i) const {
const int group = feature2group_[i];
const int sub_feature = feature2subfeature_[i];
return feature_groups_[group]->bin_mappers_[sub_feature].get();
}

inline const Bin* FeatureBin(int i) const {
const int group = feature2group_[i];
return feature_groups_[group]->bin_data_.get();
}

inline const Bin* FeatureGroupBin(int group) const {
return feature_groups_[group]->bin_data_.get();
}

inline BinIterator* FeatureIterator(int i) const {
const int group = feature2group_[i];
const int sub_feature = feature2subfeature_[i];
return feature_groups_[group]->SubFeatureIterator(sub_feature);
}

inline BinIterator* FeatureGroupIterator(int group) const {
return feature_groups_[group]->FeatureGroupIterator();
}

inline double RealThreshold(int i, uint32_t threshold) const {
const int group = feature2group_[i];
const int sub_feature = feature2subfeature_[i];
@@ -461,6 +481,9 @@ class Dataset {
/*! \brief Get Number of used features */
inline int num_features() const { return num_features_; }

/*! \brief Get Number of feature groups */
inline int num_feature_groups() const { return num_groups_;}

/*! \brief Get Number of total features */
inline int num_total_features() const { return num_total_features_; }

@@ -516,6 +539,8 @@ class Dataset {
Metadata metadata_;
/*! \brief index of label column */
int label_idx_ = 0;
/*! \brief Threshold for treating a feature as a sparse feature */
double sparse_threshold_;
/*! \brief store feature names */
std::vector<std::string> feature_names_;
/*! \brief store feature names */
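The group-level accessors added here let a consumer walk whole feature groups rather than individual features, which is the granularity the GPU learner copies at. A hedged sketch of that loop (train_data assumed to be a constructed const Dataset*):

#include <cstdint>
#include <LightGBM/dataset.h>

void EnumerateGroups(const LightGBM::Dataset* train_data) {
  for (int group = 0; group < train_data->num_feature_groups(); ++group) {
    uint64_t bin_start = train_data->GroupBinBoundary(group);  // first global bin of this group
    int group_bins = train_data->FeatureGroupNumBin(group);    // total bins inside the group
    const LightGBM::Bin* bin_data = train_data->FeatureGroupBin(group);
    // ... a device loader would stage bin_data at offset bin_start here ...
    (void)group_bins; (void)bin_data;
  }
}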
17 changes: 15 additions & 2 deletions include/LightGBM/feature_group.h
@@ -25,10 +25,11 @@ class FeatureGroup {
* \param bin_mappers Bin mapper for features
* \param num_data Total number of data
* \param is_enable_sparse True if enable sparse feature
* \param sparse_threshold Threshold for treating a feature as a sparse feature
*/
FeatureGroup(int num_feature,
std::vector<std::unique_ptr<BinMapper>>& bin_mappers,
data_size_t num_data, bool is_enable_sparse) : num_feature_(num_feature) {
data_size_t num_data, double sparse_threshold, bool is_enable_sparse) : num_feature_(num_feature) {
CHECK(static_cast<int>(bin_mappers.size()) == num_feature);
// use bin at zero to store default_bin
num_total_bin_ = 1;
@@ -46,7 +47,7 @@
}
double sparse_rate = 1.0f - static_cast<double>(cnt_non_zero) / (num_data);
bin_data_.reset(Bin::CreateBin(num_data, num_total_bin_,
sparse_rate, is_enable_sparse, &is_sparse_));
sparse_rate, is_enable_sparse, sparse_threshold, &is_sparse_));
}
/*!
* \brief Constructor from memory
@@ -120,6 +121,18 @@
uint32_t default_bin = bin_mappers_[sub_feature]->GetDefaultBin();
return bin_data_->GetIterator(min_bin, max_bin, default_bin);
}

/*!
* \brief Returns a BinIterator that can access the entire feature group's raw data.
* The RawGet() function of the iterator should be called for best efficiency.
* \return A pointer to the BinIterator object
*/
inline BinIterator* FeatureGroupIterator() {
uint32_t min_bin = bin_offsets_[0];
uint32_t max_bin = bin_offsets_.back() - 1;
uint32_t default_bin = 0;
return bin_data_->GetIterator(min_bin, max_bin, default_bin);
}

inline data_size_t Split(
int sub_feature,
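The doc comment above recommends RawGet() on this iterator; a hedged sketch of the access pattern that implies — reading one group's raw bin values into a flat buffer, e.g. before shipping them to the device (group, num_data, and the Dataset pointer are assumed to be in scope):

std::vector<uint32_t> raw_bins(static_cast<size_t>(num_data));
LightGBM::BinIterator* iter = train_data->FeatureGroupIterator(group);
iter->Reset(0);  // start from the first data row
for (LightGBM::data_size_t i = 0; i < num_data; ++i) {
  raw_bins[i] = iter->RawGet(i);  // raw per-group bin value; no sub-feature offset applied
}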
9 changes: 6 additions & 3 deletions include/LightGBM/tree_learner.h
@@ -24,8 +24,9 @@ class TreeLearner {
/*!
* \brief Initialize tree learner with training dataset
* \param train_data The used training data
* \param is_constant_hessian True if all hessians share the same value
*/
virtual void Init(const Dataset* train_data) = 0;
virtual void Init(const Dataset* train_data, bool is_constant_hessian) = 0;

virtual void ResetTrainingData(const Dataset* train_data) = 0;

@@ -71,10 +72,12 @@

/*!
* \brief Create object of tree learner
* \param type Type of tree learner
* \param learner_type Type of tree learner
* \param device_type Type of device ("cpu" or "gpu")
* \param tree_config config of tree
*/
static TreeLearner* CreateTreeLearner(const std::string& type,
static TreeLearner* CreateTreeLearner(const std::string& learner_type,
const std::string& device_type,
const TreeConfig* tree_config);
};

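Putting the two interface changes together — the extra device_type argument to the factory and the is_constant_hessian flag to Init — a hedged sketch of the new call sequence (mirroring the gbdt.cpp hunk below; tree_config and train_data assumed available):

#include <memory>
#include <LightGBM/tree_learner.h>

std::unique_ptr<LightGBM::TreeLearner> learner(
    LightGBM::TreeLearner::CreateTreeLearner("serial", "gpu", &tree_config));
learner->Init(train_data, /*is_constant_hessian=*/false);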
4 changes: 2 additions & 2 deletions src/boosting/gbdt.cpp
@@ -92,10 +92,10 @@ void GBDT::ResetTrainingData(const BoostingConfig* config, const Dataset* train_

if (train_data_ != train_data && train_data != nullptr) {
if (tree_learner_ == nullptr) {
tree_learner_ = std::unique_ptr<TreeLearner>(TreeLearner::CreateTreeLearner(new_config->tree_learner_type, &new_config->tree_config));
tree_learner_ = std::unique_ptr<TreeLearner>(TreeLearner::CreateTreeLearner(new_config->tree_learner_type, new_config->device_type, &new_config->tree_config));
}
// init tree learner
tree_learner_->Init(train_data);
tree_learner_->Init(train_data, is_constant_hessian_);

// push training metrics
training_metrics_.clear();
6 changes: 2 additions & 4 deletions src/io/bin.cpp
@@ -339,12 +339,10 @@ template class OrderedSparseBin<uint8_t>;
template class OrderedSparseBin<uint16_t>;
template class OrderedSparseBin<uint32_t>;

double BinMapper::kSparseThreshold = 0.8f;

Bin* Bin::CreateBin(data_size_t num_data, int num_bin, double sparse_rate,
bool is_enable_sparse, bool* is_sparse) {
bool is_enable_sparse, double sparse_threshold, bool* is_sparse) {
// sparse threshold
if (sparse_rate >= BinMapper::kSparseThreshold && is_enable_sparse) {
if (sparse_rate >= sparse_threshold && is_enable_sparse) {
*is_sparse = true;
return CreateSparseBin(num_data, num_bin);
} else {
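With the compile-time BinMapper::kSparseThreshold gone, the sparse/dense choice is now driven by the runtime sparse_threshold parameter. A standalone sketch that mirrors the condition in Bin::CreateBin above (illustration, not the library code):

#include <cstdio>

bool UseSparseBin(double sparse_rate, bool is_enable_sparse, double sparse_threshold) {
  // Same predicate as Bin::CreateBin: sparse storage only when enabled
  // and the fraction of zeros reaches the user threshold.
  return is_enable_sparse && sparse_rate >= sparse_threshold;
}

int main() {
  std::printf("%d\n", UseSparseBin(0.95, true, 0.8));  // 1: mostly zeros -> sparse bin
  std::printf("%d\n", UseSparseBin(0.95, true, 1.0));  // 0: threshold 1.0 forces dense bins
  return 0;
}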