Initial GPU acceleration support for LightGBM #368
Merged
97 commits
4810c79 add dummy gpu solver code (huanzhang12)
e41ba15 initial GPU code (huanzhang12)
6dde565 fix crash bug (huanzhang12)
2dce7d1 first working version (huanzhang12)
146b2dd use asynchronous copy (huanzhang12)
1f39a03 use a better kernel for root (huanzhang12)
435674d parallel read histogram (huanzhang12)
22f478a sparse features now works, but no acceleration, compute on CPU (huanzhang12)
cfd77ae compute sparse feature on CPU simultaneously (huanzhang12)
40c3212 fix big bug; add gpu selection; add kernel selection (huanzhang12)
c3398c9 better debugging (huanzhang12)
76a13c7 clean up (huanzhang12)
2dc4555 add feature scatter (huanzhang12)
d4c1c01 Add sparse_threshold control (huanzhang12)
97da274 fix a bug in feature scatter (huanzhang12)
a96ca80 clean up debug (huanzhang12)
9be6438 temporarily add OpenCL kernels for k=64,256 (huanzhang12)
cbef453 fix up CMakeList and definition USE_GPU (huanzhang12)
4d08152 add OpenCL kernels as string literals (huanzhang12)
624d405 Add boost.compute as a submodule (huanzhang12)
11b241f add boost dependency into CMakeList (huanzhang12)
5142f19 fix opencl pragma (huanzhang12)
508b48c use pinned memory for histogram (huanzhang12)
1a63b99 use pinned buffer for gradients and hessians (huanzhang12)
e2166b1 better debugging message (huanzhang12)
3b24e33 add double precision support on GPU (huanzhang12)
e7336ee fix boost version in CMakeList (huanzhang12)
b29fec7 Add a README (huanzhang12)
97fed3e reconstruct GPU initialization code for ResetTrainingData (huanzhang12)
164dbd1 move data to GPU in parallel (huanzhang12)
c1c605e fix a bug during feature copy (huanzhang12)
c5ab1ae update gpu kernels (huanzhang12)
947629a update gpu code (huanzhang12)
105b0dd initial port to LightGBM v2 (huanzhang12)
ba2c0a3 speedup GPU data loading process (huanzhang12)
a6cb794 Add 4-bit bin support to GPU (huanzhang12)
ed929cb re-add sparse_threshold parameter (huanzhang12)
2cd3d85 remove kMaxNumWorkgroups and allows an unlimited number of features (huanzhang12)
4d2758f add feature mask support for skipping unused features (huanzhang12)
62bc04e enable kernel cache (huanzhang12)
e4dd344 use GPU kernels withoug feature masks when all features are used (huanzhang12)
61b09a3 REAdme.
da20fc0 REAdme.
2d43e36 update README (huanzhang12)
9602cd7 update to v2 (huanzhang12)
cd52bb0 fix typos (#349) (wxchan)
be91a98 change compile to gcc on Apple as default (chivee)
8f1d05e clean vscode related file (chivee)
411383f refine api of constructing from sampling data. (guolinke)
487660e fix bug in the last commit. (guolinke)
882f420 more efficient algorithm to sample k from n. (guolinke)
7d0f338 fix bug in filter bin (guolinke)
0b44817 change to boost from average output. (guolinke)
85a3ba4 fix tests. (guolinke)
f615ba0 only stop training when all classes are finshed in multi-class. (guolinke)
fbed3ca limit the max tree output. change hessian in multi-class objective. (guolinke)
8eb961b robust tree model loading. (guolinke)
10cd85f fix test. (guolinke)
e57ec49 convert the probabilities to raw score in boost_from_average of class… (guolinke)
39965a0 fix the average label for binary classification. (guolinke)
8ac77dc Add boost_from_average to docs (#354) (Laurae2)
25f6268 don't use "ConvertToRawScore" for self-defined objective function. (guolinke)
bf3dfb6 boost_from_average seems doesn't work well in binary classification. … (guolinke)
22df883 For a better jump link (#355) (JayveeHe)
9f4d2f0 add FitByExistingTree. (guolinke)
f54ac4d adapt GPU tree learner for FitByExistingTree (huanzhang12)
59c473b avoid NaN output. (guolinke)
a0549d1 update boost.compute (huanzhang12)
5e945d2 fix typos (#361) (zhangyafeikimi)
3891cdb fix broken links (#359) (wxchan)
48b4d9d update README (huanzhang12)
7248e58 disable GPU acceleration by default (huanzhang12)
56fe2cc fix image url (huanzhang12)
1c51775 cleanup debug macro (huanzhang12)
78ae386 Initial GPU acceleration (huanzhang12)
2690181 Merge remote-tracking branch 'gpudev/master' (huanzhang12)
f3573d5 remove old README (huanzhang12)
12e5b82 do not save sparse_threshold_ in FeatureGroup (huanzhang12)
1159854 add details for new GPU settings (huanzhang12)
c719ead ignore submodule when doing pep8 check (huanzhang12)
15c97b4 allocate workspace for at least one thread during builing Feature4 (huanzhang12)
cb35a02 move sparse_threshold to class Dataset (huanzhang12)
a039a3a remove duplicated code in GPUTreeLearner::Split (huanzhang12)
35ab97f Remove duplicated code in FindBestThresholds and BeforeFindBestSplit (huanzhang12)
28c1715 do not rebuild ordered gradients and hessians for sparse features (huanzhang12)
2af1860 support feature groups in GPUTreeLearner (huanzhang12)
475cf8c Merge remote-tracking branch 'upstream/master' (huanzhang12)
4d5d957 Initial parallel learners with GPU support (huanzhang12)
4b44173 add option device, cleanup code (huanzhang12)
b948c1f clean up FindBestThresholds; add some omp parallel (huanzhang12)
50f7da1 Merge remote-tracking branch 'upstream/master' (huanzhang12)
3a16753 Merge remote-tracking branch 'upstream/master' (huanzhang12)
2b0514e constant hessian optimization for GPU (huanzhang12)
e72d8cd Fix GPUTreeLearner crash when there is zero feature (huanzhang12)
a68ae52 use np.testing.assert_almost_equal() to compare lists of floats in tests (huanzhang12)
2ac5103 travis for GPU (huanzhang12)
edb30a6 Merge remote-tracking branch 'upstream/master' (huanzhang12)
@@ -0,0 +1,3 @@
```
[submodule "include/boost/compute"]
path = compute
url = https://github.com/boostorg/compute
```
@@ -0,0 +1,38 @@
```bash
#!/bin/bash

# Original script from https://github.com/gregvw/amd_sdk/

# Location from which get nonce and file name from
URL="http://developer.amd.com/tools-and-sdks/opencl-zone/opencl-tools-sdks/amd-accelerated-parallel-processing-app-sdk/"
URLDOWN="http://developer.amd.com/amd-license-agreement-appsdk/"

NONCE1_STRING='name="amd_developer_central_downloads_page_nonce"'
FILE_STRING='name="f"'
POSTID_STRING='name="post_id"'
NONCE2_STRING='name="amd_developer_central_nonce"'

#For newest FORM=`wget -qO - $URL | sed -n '/download-2/,/64-bit/p'`
FORM=`wget -qO - $URL | sed -n '/download-5/,/64-bit/p'`

# Get nonce from form
NONCE1=`echo $FORM | awk -F ${NONCE1_STRING} '{print $2}'`
NONCE1=`echo $NONCE1 | awk -F'"' '{print $2}'`
echo $NONCE1

# get the postid
POSTID=`echo $FORM | awk -F ${POSTID_STRING} '{print $2}'`
POSTID=`echo $POSTID | awk -F'"' '{print $2}'`
echo $POSTID

# get file name
FILE=`echo $FORM | awk -F ${FILE_STRING} '{print $2}'`
FILE=`echo $FILE | awk -F'"' '{print $2}'`
echo $FILE

FORM=`wget -qO - $URLDOWN --post-data "amd_developer_central_downloads_page_nonce=${NONCE1}&f=${FILE}&post_id=${POSTID}"`

NONCE2=`echo $FORM | awk -F ${NONCE2_STRING} '{print $2}'`
NONCE2=`echo $NONCE2 | awk -F'"' '{print $2}'`
echo $NONCE2

wget --content-disposition --trust-server-names $URLDOWN --post-data "amd_developer_central_nonce=${NONCE2}&f=${FILE}" -O AMD-SDK.tar.bz2;
```
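The script above recovers each hidden form value with two awk passes: the first splits the page on a `name="..."` marker and keeps what follows, the second takes the text between the next pair of double quotes. The sketch below isolates that step so it can be tried offline. It assumes the value attribute comes right after the name attribute, as the script itself does; `extract_field` and `SAMPLE_FORM` are hypothetical names, and the form fragment is made up for illustration rather than copied from the AMD page.

```shell
#!/bin/bash
# Hypothetical helper (not part of the PR): print the quoted value that
# follows a given name="..." marker in a chunk of HTML.
extract_field() {
  local form="$1" marker="$2"
  # Pass 1: split on the marker and keep the text after it.
  # Pass 2: split on double quotes; field 2 is the first quoted string.
  echo "$form" | awk -F "$marker" '{print $2}' | awk -F'"' '{print $2}'
}

# Made-up form fragment, for illustration only.
SAMPLE_FORM='<input type="hidden" name="f" value="example.tar.bz2" /><input type="hidden" name="post_id" value="123" />'

FILE=$(extract_field "$SAMPLE_FORM" 'name="f"')
POSTID=$(extract_field "$SAMPLE_FORM" 'name="post_id"')

echo "$FILE"
echo "$POSTID"
```

Quoting the variables, as done here, keeps the extraction from breaking if the page content ever contains characters the shell would otherwise split or expand.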
Can we avoid the use of Boost? It seems you only use it for aligned_alloc.
I use Boost.Compute as the interface API to the GPU, so unfortunately the Boost dependency cannot be easily removed.
But if GPU support is not enabled at compile time, Boost is not required.
@guolinke Can you check the licensing?
Boost licensing shouldn't be an issue, as Boost users are free to do whatever they want with it. The only obligation is to not remove the Boost license from the submodule (compiled code built from it does not need to carry the Boost license; the license only has to accompany the source before compilation).
Users are able to fetch the LightGBM repository without Boost by cloning without `--recursive`, while those who want the full repository can use `--recursive` on git.
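To make the difference between the two kinds of clone concrete, here is a small offline sketch. It does not touch the LightGBM repository: it builds two throwaway local repositories (`dep` and `main`, both hypothetical stand-ins), registers one as a submodule of the other at the path `compute`, and then compares a plain clone with a `--recursive` clone. The `protocol.file.allow=always` setting is only there because recent Git versions refuse local-path submodules by default; it would not be needed for a normal HTTPS submodule URL.

```shell
#!/bin/bash
set -e
# Per-command config so the demo needs no global Git setup.
GIT="git -c protocol.file.allow=always -c user.name=demo -c user.email=demo@example.com -c init.defaultBranch=main"
WORK=$(mktemp -d)

# Stand-in dependency repository (plays the role of the submodule).
$GIT init -q "$WORK/dep"
echo "header" > "$WORK/dep/lib.hpp"
$GIT -C "$WORK/dep" add lib.hpp
$GIT -C "$WORK/dep" commit -q -m "add lib"

# Stand-in main repository that registers the dependency at path "compute".
$GIT init -q "$WORK/main"
$GIT -C "$WORK/main" submodule --quiet add "$WORK/dep" compute
$GIT -C "$WORK/main" commit -q -m "add submodule"

# Plain clone: the submodule directory exists but is left empty.
$GIT clone -q "$WORK/main" "$WORK/plain"
# Recursive clone: the submodule contents are fetched as well.
$GIT clone -q --recursive "$WORK/main" "$WORK/full"

if [ -e "$WORK/plain/compute/lib.hpp" ]; then PLAIN_HAS=yes; else PLAIN_HAS=no; fi
if [ -e "$WORK/full/compute/lib.hpp" ]; then FULL_HAS=yes; else FULL_HAS=no; fi
echo "plain clone has submodule files: $PLAIN_HAS"
echo "recursive clone has submodule files: $FULL_HAS"

rm -rf "$WORK"
```

A plain clone can still be completed later by running `git submodule update --init --recursive` inside it.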