[CUDA] New CUDA version Part 1 #4630
Merged
shiyu1994 merged 176 commits into microsoft:master from shiyu1994:cuda-tree-learner-subset on Mar 23, 2022
Commits (176)
94aed50
new cuda framework
shiyu1994 18df6b2
add histogram construction kernel
9b21d2b
before removing multi-gpu
634a4f1
new cuda framework
23bcaa2
tree learner cuda kernels
6c14cd9
single tree framework ready
aa0b3de
single tree training framework
bc85ced
remove comments
18d957a
boosting with cuda
28186c0
optimize for best split find
60c7e4e
data split
57547fb
move boosting into cuda
608fd70
parallel synchronize best split point
277be8b
merge split data kernels
ffcf765
before code refactor
a58c1e1
use tasks instead of features as units for split finding
72d41c9
refactor cuda best split finder
f7a7658
fix configuration error with small leaves in data split
shiyu1994 b6efd10
skip histogram construction of too small leaf
shiyu1994 6f4e39d
skip split finding of invalid leaves
shiyu1994 4072bb8
support row wise with CUDA
shiyu1994 88ecde9
copy data for split by column
shiyu1994 dec7501
copy data from host to CPU by column for data partition
shiyu1994 2dccb7f
add synchronize best splits for one leaf from multiple blocks
shiyu1994 0168d2c
partition dense row data
shiyu1994 0570fe0
fix sync best split from task blocks
shiyu1994 374018c
add support for sparse row wise for CUDA
shiyu1994 40c49cc
remove useless code
shiyu1994 dc41a00
add l2 regression objective
shiyu1994 bd065b7
sparse multi value bin enabled for CUDA
shiyu1994 a5fadfb
fix cuda ranking objective
shiyu1994 3202b79
support for number of items <= 2048 per query
cd687c9
speedup histogram construction by interleaving global memory access
320c449
split optimization
eb1d7fa
add cuda tree predictor
dd177f5
remove comma
ee836d6
refactor objective and score updater
0467fce
before use struct
f05da3c
use structure for split information
400622a
use structure for leaf splits
d9d3aa9
return CUDASplitInfo directly after finding best split
45cf7a7
split with CUDATree directly
9dea18d
use cuda row data in cuda histogram constructor
572e2b0
clean src/treelearner/cuda
fe58d4c
gather shared cuda device functions
dc461dc
put shared CUDA functions into header file
ba565c1
change smaller leaf from <= back to < for consistent result with CPU
a781ef5
add tree predictor
c8a6fab
remove useless cuda_tree_predictor
a7504dc
predict on CUDA with pipeline
896d47b
add global sort algorithms
fe6ed74
add global argsort for queries with many items in ranking tasks
7808455
remove limitation of maximum number of items per query in ranking
7a0d218
add cuda metrics
ca42f3b
fix CUDA AUC
c681102
remove debug code
ea60566
add regression metrics
5c84788
remove useless file
c2c2407
don't use mask in shuffle reduce
b43d367
add more regression objectives
951aa37
fix cuda mape loss
b50ce5b
use template for different versions of BitonicArgSortDevice
f51fd70
add multiclass metrics
35c742d
add ndcg metric
510d878
fix cross entropy objectives and metrics
95f4612
fix cross entropy and ndcg metrics
bb997d0
add support for customized objective in CUDA
17b78d1
complete multiclass ova for CUDA
72aa863
merge master
8537b8c
separate cuda tree learner
8fb8562
use shuffle based prefix sum
883ed15
clean up cuda_algorithms.hpp
e7ffc3f
add copy subset on CUDA
d7c4bb4
add bagging for CUDA
d9bf3e5
clean up code
95fd61a
copy gradients from host to device
285c2d6
support bagging without using subset
1a09c19
add support of bagging with subset for CUDAColumnData
740f853
add support of bagging with subset for dense CUDARowData
f42e87e
refactor copy sparse subrow
0b9ca24
use copy subset for column subset
9a94240
add reset train data and reset config for CUDA tree learner
1f6dd90
add USE_CUDA ifdef to cuda tree learner files
4ca7586
check that dataset doesn't contain CUDA tree learner
25f57e3
remove printf debug information
12794b0
use full new cuda tree learner only when using single GPU
44e47ec
Merge branch 'master' of https://github.com/microsoft/LightGBM into c…
7e18687
disable all CUDA code when using CPU version
469e992
recover main.cpp
f2812c8
add cpp files for multi value bins
8e884b2
update LightGBM.vcxproj
9b9a63c
update LightGBM.vcxproj
e0c9f6f
fix lint errors
3bba6d7
fix lint errors
8f9f03e
update Makevars
01d772d
fix the case with 0 feature and 0 bin
e57dd15
fix lint errors
a5b9f7a
recover default device type to cpu
5f03d45
fix na_as_missing case
b2aaa9f
fix UpdateDataIndexToLeafIndexKernel
shiyu1994 0726d87
create CUDA trees when needed in CUDADataPartition::UpdateTrainScore
shiyu1994 1dea6bc
add refit by tree for cuda tree learner
shiyu1994 14b9ce9
fix test_refit in test_engine.py
4b936de
create set of large bin partitions in CUDARowData
4193768
add histogram construction for columns with a large number of bins
shiyu1994 0b6e79e
add find best split for categorical features on CUDA
shiyu1994 25f20a7
add bitvectors for categorical split
shiyu1994 82c33e4
cuda data partition split for categorical features
shiyu1994 ca16070
fix split tree with categorical feature
shiyu1994 c8716f1
fix categorical feature splits
shiyu1994 4bcaa03
refactor cuda_data_partition.cu with multi-level templates
shiyu1994 536f603
refactor CUDABestSplitFinder by grouping task information into struct
shiyu1994 015e099
pre-allocate space for vector split_find_tasks_ in CUDABestSplitFinder
shiyu1994 4c260d2
fix misuse of reference
shiyu1994 89d8214
remove useless changes
shiyu1994 54bc66a
add support for path smoothing
shiyu1994 86e208a
virtual destructor for LightGBM::Tree
shiyu1994 d888f1d
fix overlapped cat threshold in best split infos
shiyu1994 5efe0fb
reset histogram pointers in data partition and split finder in Reset…
shiyu1994 559a569
merge with LightGBM/master
shiyu1994 0bb88fb
comment useless parameter
shiyu1994 0678d9a
fix reverse case when na is missing and default bin is zero
shiyu1994 26130d9
fix mfb_is_na and mfb_is_zero and is_single_feature_column
shiyu1994 d49e92a
remove debug log
shiyu1994 3214d68
fix cat_l2 when one-hot
shiyu1994 361d2b0
merge master
shiyu1994 85ea408
switch shared histogram size according to CUDA version
shiyu1994 2af0f5d
gpu_use_dp=true when cuda test
shiyu1994 d0a628f
revert modification in config.h
shiyu1994 e0018ea
fix setting of gpu_use_dp=true in .ci/test.sh
shiyu1994 e54b51a
fix linter errors
shiyu1994 541235f
fix linter error
shiyu1994 a2ead3c
recover main.cpp
shiyu1994 2a81af6
separate cuda_exp and cuda
shiyu1994 9881075
fix ci bash scripts
shiyu1994 52b1e88
add USE_CUDA_EXP flag
shiyu1994 09054a1
Merge branch 'master' into cuda-tree-learner-subset
shiyu1994 c2a0be8
switch off USE_CUDA_EXP
shiyu1994 0651cca
Merge remote-tracking branch 'LightGBM/master' into cuda-tree-learner…
shiyu1994 fbc3760
revert changes in python-packages
shiyu1994 c58635b
more careful separation for USE_CUDA_EXP
shiyu1994 93d5950
fix CUDARowData::DivideCUDAFeatureGroups
shiyu1994 9f6aa8a
revert config.h
shiyu1994 12d8161
fix test settings for cuda experimental version
shiyu1994 354845e
skip some tests due to unsupported features or differences in impleme…
shiyu1994 cb49dd1
fix lint issue by adding a blank line
shiyu1994 2e2c696
fix lint errors by resorting imports
shiyu1994 3433674
fix lint errors by resorting imports
shiyu1994 0c94bdd
Merge branch 'master' into cuda-tree-learner-subset
shiyu1994 c72d555
fix lint errors by resorting imports
shiyu1994 63a9dc1
merge cuda.yml and cuda_exp.yml
shiyu1994 31ac33b
update python version in cuda.yml
shiyu1994 5f1f38d
remove cuda_exp.yml
shiyu1994 ba22deb
remove unrelated changes
shiyu1994 b008424
fix compilation warnings
shiyu1994 fad4b91
resolve conflicts with master
shiyu1994 55a94b5
Merge branch 'cuda-tree-learner-subset' of https://github.com/shiyu19…
shiyu1994 d77dd23
recover task
shiyu1994 6a9d530
use multi-level template in histogram construction
shiyu1994 4adca58
ignore NVCC related lines in parameter_generator.py
shiyu1994 8d99b2b
Merge remote-tracking branch 'LightGBM/master' into cuda-tree-learner…
shiyu1994 1e23342
update job name for CUDA tests
shiyu1994 f44b881
apply review suggestions
shiyu1994 d7b65c4
Update .github/workflows/cuda.yml
shiyu1994 a6a51fd
Update .github/workflows/cuda.yml
shiyu1994 9135582
update header
shiyu1994 cd101ae
remove useless TODOs
shiyu1994 9af98ac
remove [TODO(shiyu1994): constrain the split with min_data_in_group] …
shiyu1994 e34fcce
#include <LightGBM/utils/log.h> for USE_CUDA_EXP only
shiyu1994 499639d
fix include order
shiyu1994 6fe4874
fix include order
shiyu1994 3cf4c74
remove extra space
shiyu1994 34fdfe4
address review comments
shiyu1994 3bb91ae
add warning when cuda_exp is used together with deterministic
shiyu1994 e47d009
add comment about gpu_use_dp in .ci/test.sh
shiyu1994 53430dd
revert changing order of included headers
shiyu1994

Conversation
I think it's better to handle this here:
LightGBM/src/io/config.cpp, lines 342 to 346 in d130bb1
Also, we can avoid the if/else complication by using the $TASK env variable as the value for the device_type config value and the Python-package installation flag.
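A hypothetical sketch of that idea, for illustration only (the variable names below are assumptions, not code from this PR or from .ci/test.sh): one CI task identifier drives both the device_type config value and the Python-package install flag, instead of branching with if/else per task.

```python
# Hypothetical illustration only -- not code from this PR or from .ci/test.sh.
# The idea: reuse one CI task identifier for both the LightGBM device_type
# config value and the Python-package installation flag.
import os

task = os.environ.get("TASK", "cpu")          # e.g. "cuda" or "cuda_exp" in CI
device_type = task                            # value passed as device_type=<task>
install_flag = "--" + task.replace("_", "-")  # e.g. "--cuda-exp" build option
print(f"device_type={device_type}, install flag {install_flag}")
```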
Sorry for the delay. I'll handle these unresolved comments today.
I'm not sure whether I understand your idea correctly. The old CUDA version only supports double-precision training, but the new CUDA version will support both double-precision and single-precision training. Users can specify the mode through gpu_use_dp. We use single-precision training in the new CUDA version by default because it is faster without hurting accuracy. However, to ensure that results are identical to those on CPU (which uses double-precision histograms), we need to switch to double-precision training in the CI test. That's why we need to replace the default gpu_use_dp setting here in test.sh.
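For illustration, a minimal Python sketch of how these parameters fit together; device_type and gpu_use_dp are documented LightGBM parameters and cuda_exp is the experimental device type introduced by this PR, but the snippet itself is an assumed usage example, not part of the diff.

```python
# Minimal sketch, assuming the experimental CUDA device type from this PR.
# gpu_use_dp=True forces double-precision histograms so that results match
# the CPU version, which is what the CI test relies on.
import numpy as np
import lightgbm as lgb

X = np.random.rand(1000, 10)
y = np.random.rand(1000)

params = {
    "objective": "regression",
    "device_type": "cuda_exp",  # new experimental CUDA tree learner
    "gpu_use_dp": True,         # double precision; single precision is the default
}

booster = lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=10)
```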
After trying, I found that it would further complicate the bash code, since single quotes treat everything inside them literally. Also, the Python option cuda-exp is not identical to the device type cuda_exp (we use - instead of _ for consistency with the other Python build options).
OK, I got it. Thanks for the explanation! Please add a short comment explaining why we need to change the gpu_use_dp param here, in the CI files. It wasn't obvious to me, so I thought that single precision isn't supported, just like in our current CUDA implementation. However, that's not true according to your comments.

OK, agree with you. Thanks for trying that!