Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update CUDA treelearner according to changes introduced for linear trees #3750

Merged
merged 3 commits into from
Jan 15, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .github/workflows/cuda.yml
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,7 @@ jobs:
export ROOT_DOCKER_FOLDER=/LightGBM
cat > docker.env <<EOF
TASK=cuda
METHOD=source
COMPILER=gcc
GITHUB_ACTIONS=true
OS_NAME=linux
Expand Down
4 changes: 2 additions & 2 deletions src/treelearner/cuda_tree_learner.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -534,8 +534,8 @@ void CUDATreeLearner::InitGPU(int num_gpu) {
copyDenseFeature();
}

Tree* CUDATreeLearner::Train(const score_t* gradients, const score_t *hessians) {
Tree *ret = SerialTreeLearner::Train(gradients, hessians);
Tree* CUDATreeLearner::Train(const score_t* gradients, const score_t *hessians, bool is_first_tree) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/gha run cuda-builds

Copy link
Contributor

@ChipKerchner ChipKerchner Jan 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is CUDA not being built automatically with each pull request?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. We didn't set it to be built automatically as it was constantly failing.
Now I think we can set it as ordinary test which will be required to be passed to merge new pull requests.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/gha run cuda-builds

Copy link
Collaborator Author

@StrikerRUS StrikerRUS Jan 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, the error is still here. But with fixes from #3748 SOME tests are now passed (🎉), while without it all tests were failing.
The error is (I believe the same as before)

[LightGBM] [Fatal] [CUDA] invalid argument /LightGBM/src/treelearner/cuda_tree_learner.cpp 414

Full test report (. - success, F - failure, s - skipped, x - expected to fail):

============================= test session starts ==============================
platform linux -- Python 3.8.2, pytest-6.2.1, py-1.10.0, pluggy-0.13.1
rootdir: /LightGBM
collected 238 items

../tests/c_api_test/test_.py ..                                          [  0%]
../tests/python_package_test/test_basic.py ...F..FF.....                 [  6%]
../tests/python_package_test/test_consistency.py ......                  [  8%]
../tests/python_package_test/test_dask.py FFFFFFFFFFFFFFFFFFFFFFFFFF..   [ 20%]
../tests/python_package_test/test_dual.py s                              [ 21%]
../tests/python_package_test/test_engine.py FF...F.....FFFF...FFF..FFFF. [ 32%]
.FFFF.FFFF.....F...FFFFFF...FF.FFF.FF.F                                  [ 49%]
../tests/python_package_test/test_plotting.py F...F                      [ 51%]
../tests/python_package_test/test_sklearn.py .F.F.F.FFFFFFFF.FF..FF...FF [ 62%]
FF.FF.xFF..F..FF...FF.F..F.FFF..F.F..FF.....FF.FxFF..F..FF...FF.F......F [ 92%]
.F..FF.....FF.F..                                                        [100%]
= 123 failed, 112 passed, 1 skipped, 2 xfailed, 69 warnings in 152.56s (0:02:32) =

https://github.com/microsoft/LightGBM/runs/1682722249?check_suite_focus=true

@guolinke @jameslamb @btrotta I think we need to merge this PR first to sync CUDA code with the rest codebase. Meantime I'll prepare new minimal reproducible example to show it @ChipKerchner .

UPD: all the rest failures can be fixed by #3450 (comment).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll provide a pull request to fix the above failures.

Tree *ret = SerialTreeLearner::Train(gradients, hessians, is_first_tree);
return ret;
}

Expand Down
2 changes: 1 addition & 1 deletion src/treelearner/cuda_tree_learner.h
Original file line number Diff line number Diff line change
Expand Up @@ -45,7 +45,7 @@ class CUDATreeLearner: public SerialTreeLearner {
~CUDATreeLearner();
void Init(const Dataset* train_data, bool is_constant_hessian) override;
void ResetTrainingDataInner(const Dataset* train_data, bool is_constant_hessian, bool reset_multi_val_bin) override;
Tree* Train(const score_t* gradients, const score_t *hessians);
Tree* Train(const score_t* gradients, const score_t *hessians, bool is_first_tree);
void SetBaggingData(const Dataset* subset, const data_size_t* used_indices, data_size_t num_data) override {
SerialTreeLearner::SetBaggingData(subset, used_indices, num_data);
if (subset == nullptr && used_indices != nullptr) {
Expand Down