setup CUDA CI job #3424
Force-pushed from 039ae7d to c32f4f4.
pull_request_review_comment:
  types: [created]
A plain issue_comment trigger would probably be easier, but it requires the workflow config to be in the master branch, so I cannot test it in this PR right now.
https://docs.github.com/en/free-pro-team@latest/actions/reference/events-that-trigger-workflows#issue_comment
oh interesting. I think it's ok to leave it like this if it works!
- name: Remove old folder with repository
  run: sudo rm -rf $GITHUB_WORKSPACE
This step is needed because actions/checkout@v1 fails to remove old files from previous runs (CMake temporary build files in particular), since they were created inside Docker by another user.
Warning: Unable to run "git clean -ffdx" and "git reset --hard HEAD" successfully, delete source folder instead.
Error: One or more errors occurred. (One or more errors occurred. (Access to the path '/home/guoke/actions-runner/_work/LightGBM/LightGBM/build/CMakeFiles/3.18.1/CMakeCCompiler.cmake' is denied.)) (One or more errors occurred. (Access to the path '/home/guoke/actions-runner/_work/LightGBM/LightGBM/build/CMakeFiles/cmake.check_cache' is denied.)) (One or more errors occurred. (Access to the path '/home/guoke/actions-runner/_work/LightGBM/LightGBM/build/CMakeFiles/3.18.1/CMakeDetermineCompilerABI_C.bin' is denied.)) (One or more errors occurred. (Access to the path '/home/guoke/actions-runner/_work/LightGBM/LightGBM/build/CMakeFiles/CMakeError.log' is denied.)) (One or more errors occurred. (Access to the path '/home/guoke/actions-runner/_work/LightGBM/LightGBM/build/CMakeFiles/3.18.1/CMakeDetermineCompilerABI_CXX.bin' is denied.)) (One or more errors occurred. (Access to the path '/home/guoke/actions-runner/_work/LightGBM/LightGBM/build/CMakeCache.txt' is denied.)) (Access to the path '/home/guoke/actions-runner/_work/LightGBM/LightGBM/build/CMakeFiles/3.18.1/CMakeCCompiler.cmake' is denied.) (Access to the path '/home/guoke/actions-runner/_work/LightGBM/LightGBM/build/CMakeFiles/cmake.check_cache' is denied.) (Access to the path '/home/guoke/actions-runner/_work/LightGBM/LightGBM/build/CMakeFiles/3.18.1/CMakeDetermineCompilerABI_C.bin' is denied.) (Access to the path '/home/guoke/actions-runner/_work/LightGBM/LightGBM/build/CMakeFiles/CMakeError.log' is denied.) (Access to the path '/home/guoke/actions-runner/_work/LightGBM/LightGBM/build/CMakeFiles/3.18.1/CMakeDetermineCompilerABI_CXX.bin' is denied.) (Access to the path '/home/guoke/actions-runner/_work/LightGBM/LightGBM/build/CMakeCache.txt' is denied.)
Error: Exit code 1 returned from process: file name '/home/guoke/actions-runner/bin/Runner.PluginHost', arguments 'action "GitHub.Runner.Plugins.Repository.v1_0.CheckoutTask, Runner.Plugins"'.
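For anyone debugging a similar checkout failure, here is a minimal sketch for confirming that file ownership is the cause. The helper name list_foreign_files is hypothetical and not part of this PR; it only assumes that find and id are available on the runner:

```shell
# Hypothetical diagnostic helper (not part of this PR).
# Prints every path under the given directory that is NOT owned by the
# current user, e.g. build files written by root inside the container.
list_foreign_files() {
  dir="$1"
  find "$dir" ! -user "$(id -u)" -print
}

# Usage on the runner (GITHUB_WORKSPACE is set by GitHub Actions):
# list_foreign_files "$GITHUB_WORKSPACE"
```

If this prints paths under build/, one possible alternative to deleting the whole folder would be running sudo chown -R "$(id -u):$(id -g)" "$GITHUB_WORKSPACE" at the end of the job, though that is untested on this runner.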
$ROOT_DOCKER_FOLDER/.ci/setup.sh || exit -1
$ROOT_DOCKER_FOLDER/.ci/test.sh
EOF
sudo docker run --env-file docker.env -v "$GITHUB_WORKSPACE":"$ROOT_DOCKER_FOLDER" --rm --gpus all nvidia/cuda:11.0-devel-ubuntu20.04 /bin/bash $ROOT_DOCKER_FOLDER/docker-script.sh
sudo is used here to work around the following error:
docker: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.40/containers/create: dial unix /var/run/docker.sock: connect: permission denied.
See 'docker run --help'.
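A common alternative to sudo is adding the runner user to the docker group (sudo usermod -aG docker <user>, followed by a re-login or a runner restart). Note that membership in that group is effectively root-equivalent. A minimal sketch for checking whether the current user is already in the group; in_group is a hypothetical helper, not something from this PR:

```shell
# Hypothetical helper (not part of this PR).
# Succeeds if group $1 appears in the space-separated list $2.
# When $2 is omitted, the current user's groups from `id -nG` are used.
in_group() {
  group="$1"
  groups_list="${2-$(id -nG)}"
  for g in $groups_list; do
    [ "$g" = "$group" ] && return 0
  done
  return 1
}

# Usage on the runner:
# in_group docker && echo "docker can be used without sudo"
```

Group changes only take effect in new login sessions, so the runner service would likely need a restart after usermod.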
test:
  name: CUDA
  runs-on: [self-hosted, linux]
  if: github.event.comment.body == '/gha run cuda-builds' && contains('OWNER,MEMBER,COLLABORATOR', github.event.comment.author_association)
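For reference, the gate in the if: expression above can be sketched in shell as follows. should_run_cuda is a hypothetical helper used only for illustration. Unlike the workflow expression, where contains() on a string is a substring check and string comparison ignores case, this sketch matches exactly and case-sensitively:

```shell
# Hypothetical helper (not part of this PR): mirrors the workflow gate.
# $1 = comment body, $2 = author_association of the commenter.
should_run_cuda() {
  body="$1"
  association="$2"
  # The comment must be exactly the trigger phrase.
  [ "$body" = "/gha run cuda-builds" ] || return 1
  # Only repository owners, org members and collaborators may trigger it.
  case "$association" in
    OWNER|MEMBER|COLLABORATOR) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
# should_run_cuda "/gha run cuda-builds" "MEMBER" && echo "run CUDA job"
```

Restricting the trigger by association matters here because the job runs on a self-hosted machine, where arbitrary commenters should not be able to start builds.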
/gha run cuda-builds
The comment above triggered the following build: https://github.com/microsoft/LightGBM/runs/1190615093?check_suite_focus=true.
/gha run cuda-builds
this is so cool!!
Force-pushed from c32f4f4 to 369a081.
Further possible improvements:
Unfortunately, I'm not sure I'll be able to work on the items listed above.
@StrikerRUS It seems that currently neither Azure Pipelines nor GitHub Actions supports powering machines off and on.
Yeah, also they are a ton of pain! For instance:
Yep, that's true! 😞
Unfortunately, switching to P100 didn't help get rid of the segfault. When I run
However, I don't think that it should block this PR from merging.
@StrikerRUS so is it a hardware problem, or?
@guolinke TBH, I have no idea...
Sorry, this is a new feature in CMake 3.18 and I'm not familiar with it.
@ChipKerchner Thanks for your super fast response! I'll try to re-run with an older CMake. Will it be possible to adapt the CMakeLists.txt code to the recent CMake changes in the future?
Just tested CMake
Looks like we could do the same:
I'm not sure that it's related to https://github.com/microsoft/LightGBM/blob/master/src/treelearner/cuda_tree_learner.cpp#L414
The pointers or the size seem to be incorrect.
@StrikerRUS any updates on this PR?
Which updates do you mean? I think this PR is ready.
Great! I thought the CUDA job could not run.
It can, but unfortunately it fails with the following runtime error: #3424 (comment).
Any insights regarding these errors? @ChipKerchner
This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.
Closed #3402.