[ci] CUDA CI jobs failing: "Certificate verification failed" #4646
Comments
It's hard to tell if this is still an issue by looking at more recent CI jobs, since they're now failing before trying to install cmake. But I can see at https://ngc.nvidia.com/catalog/containers/nvidia:cuda/tags that none of the …
I was able to reproduce this in Docker locally:
docker run -it nvcr.io/nvidia/cuda:9.0-devel /bin/bash
echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections
apt-get update
apt-get install --no-install-recommends -y \
curl \
lsb-release \
software-properties-common
curl \
-s \
-L \
--insecure \
https://apt.kitware.com/keys/kitware-archive-latest.asc \
| apt-key add -
#curl -sL https://apt.kitware.com/keys/kitware-archive-latest.asc | apt-key add -
apt-add-repository "deb https://apt.kitware.com/ubuntu/ $(lsb_release -cs) main" -y
apt-get update
apt-get install --no-install-recommends -y \
cmake
cmake --version
# cmake version 3.5.1
v3.5.1 was released in March 2016.
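Because --insecure only disables certificate checks for that single curl call, a certificate problem elsewhere may only show up indirectly, for example as an old cmake being installed. A more direct check is to ask openssl whether the image's certificate store can verify apt.kitware.com. This is a sketch that assumes the openssl command-line tool is present in the image (it may need to be installed with apt-get first); tls_verify_ok and parse_verify_code are hypothetical helper names, not part of LightGBM's CI scripts.

```shell
#!/usr/bin/env bash
# Sketch: check whether this machine's CA store can verify a host's TLS chain.
# Assumes the `openssl` CLI is installed; helper names are hypothetical.

# Read `openssl s_client` output on stdin and print the numeric verify code.
# s_client prints a line like "    Verify return code: 0 (ok)"; keep the last one.
parse_verify_code() {
    sed -n 's/^ *Verify return code: \([0-9]*\).*/\1/p' | tail -n 1
}

# Return 0 if HOST:443 presents a chain that verifies against the local CA store.
tls_verify_ok() {
    local host="$1"
    local code
    code=$(openssl s_client -connect "${host}:443" -servername "${host}" </dev/null 2>/dev/null \
        | parse_verify_code)
    echo "${host}: verify return code ${code:-unknown}"
    [ "${code}" = "0" ]
}

# Example, run once inside nvcr.io/nvidia/cuda:9.0-devel and once inside ubuntu:16.04:
#   tls_verify_ok apt.kitware.com || echo "certificate verification failed"
```

A non-zero code (for example 10, "certificate has expired") in one image but 0 in the other would point at the image's certificate store rather than at Kitware's server.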
I don't think the issue is with Kitware's apt package channel, and now I'm more convinced that it is about outdated certificates in the NVIDIA images. Following the instructions at https://apt.kitware.com/, I'm able to successfully install cmake in the ubuntu:16.04 image:
docker run -it ubuntu:16.04 /bin/bash
apt-get update
apt-get install apt-transport-https wget
wget -O - https://apt.kitware.com/keys/kitware-archive-latest.asc 2>/dev/null \
| gpg --dearmor - \
| tee /usr/share/keyrings/kitware-archive-keyring.gpg >/dev/null
echo 'deb [signed-by=/usr/share/keyrings/kitware-archive-keyring.gpg] https://apt.kitware.com/ubuntu/ xenial main' \
| tee /etc/apt/sources.list.d/kitware.list >/dev/null
apt-get update
apt-get install -y --no-install-recommends \
cmake
cmake --version
# cmake version 3.20.5
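The two reproductions end up with very different cmake versions (3.5.1 in the NVIDIA image, 3.20.5 in ubuntu:16.04). One way to see where apt is taking cmake from in each image is apt-cache policy, which lists the candidate version together with the repository it comes from. The helper below is a hypothetical convenience for pulling out just the candidate version; it assumes a Debian/Ubuntu image where the Kitware repository has already been added.

```shell
#!/usr/bin/env bash
# Sketch: show which cmake version apt would install.
# candidate_version is a hypothetical helper name.

# Read `apt-cache policy <pkg>` output on stdin and print the candidate version.
candidate_version() {
    sed -n 's/^ *Candidate: //p' | head -n 1
}

# Usage inside the container, after `apt-get update`:
#   apt-cache policy cmake                      # full version table with source URLs
#   apt-cache policy cmake | candidate_version  # just the version apt would pick
```

If the Kitware repository could not be fetched during apt-get update, its packages would not appear in the version table at all, so the full apt-cache policy output is usually more informative than the candidate line alone.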
@StrikerRUS I've opened an issue with NVIDIA documenting the challenges we faced: https://gitlab.com/nvidia/container-images/cuda/-/issues/140
This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.
Description
CUDA CI jobs in this project have been failing for the last few days with errors like the following.
These errors are happening because installations of cmake are failing.
I agree with #4636 (comment): based on the timing, it seems like this could be related to the recent expiration of the root certificate used by Let's Encrypt (https://scotthelme.co.uk/lets-encrypt-old-root-expiration/).
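Given the Let's Encrypt root expiration linked above, one rough way to gauge how old an image's CA store is would be to look for the two root certificates involved: the expired DST Root CA X3 and its replacement, ISRG Root X1. This sketch assumes the Debian/Ubuntu layout in which the ca-certificates package links each root into /etc/ssl/certs under a name like ISRG_Root_X1.pem; report_le_roots and show_dst_expiry are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: report which Let's Encrypt-related roots a CA directory contains.
# Assumes the Debian/Ubuntu ca-certificates layout, e.g. /etc/ssl/certs/ISRG_Root_X1.pem.
# Helper names are hypothetical.

report_le_roots() {
    local certs_dir="${1:-/etc/ssl/certs}"
    local root
    for root in ISRG_Root_X1 DST_Root_CA_X3; do
        if [ -e "${certs_dir}/${root}.pem" ]; then
            echo "${root}: present"
        else
            echo "${root}: missing"
        fi
    done
}

# Print the expiry date of the old DST root if it is still installed (needs openssl).
show_dst_expiry() {
    local certs_dir="${1:-/etc/ssl/certs}"
    local pem="${certs_dir}/DST_Root_CA_X3.pem"
    [ -e "${pem}" ] || return 0
    openssl x509 -in "${pem}" -noout -enddate
}

# Usage inside a container:
#   report_le_roots
#   show_dst_expiry
```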
Reproducible example
This has been happening on CUDA CI jobs for the last few days.
For example, I saw this on jobs for #4636, like https://github.com/microsoft/LightGBM/pull/4636/checks?check_run_id=3759447547.
Environment info
LightGBM CUDA CI jobs.
Additional Comments
The CUDA CI jobs run in Docker containers using nvcr.io/nvidia/cuda:${cuda_version}-devel:
LightGBM/.github/workflows/cuda.yml, line 92 in a77260f
This issue should be resolved by those images being updated upstream. I think it could also be worked around by forcing an update of openssl at runtime.
Some relevant links:
- Kitware's apt repository: https://apt.kitware.com/
- LightGBM/.ci/setup.sh, lines 85 to 106 in a77260f
- nvidia/cuda images: https://gitlab.com/nvidia/container-images/cuda/-/issues
- cmake: https://gitlab.kitware.com/cmake/cmake/-/issues
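The runtime workaround suggested above, forcing an update of openssl before cmake is installed, might look roughly like this near the top of a setup script such as .ci/setup.sh. This is only a sketch: upgrading ca-certificates alongside openssl is an assumption, not something stated in this issue, and the DRY_RUN switch and function names are hypothetical, added so the commands can be previewed without root.

```shell
#!/usr/bin/env bash
# Sketch of a runtime workaround: refresh TLS-related packages before
# adding https://apt.kitware.com/ and installing cmake.
# Assumptions: Debian/Ubuntu base image, run as root; also upgrading
# ca-certificates is a guess, not confirmed in this issue.
# DRY_RUN and the helper names are hypothetical.
set -euo pipefail

# Print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

refresh_tls_packages() {
    export DEBIAN_FRONTEND=noninteractive
    run apt-get update
    # Installing already-present packages upgrades them to the newest
    # version available from the configured repositories.
    run apt-get install --no-install-recommends -y openssl ca-certificates
    run update-ca-certificates
}

# Preview:  DRY_RUN=1 refresh_tls_packages
# Apply:    refresh_tls_packages   (then continue with the Kitware apt setup)
```

Whether this helps depends on the base image's Ubuntu release having received updated packages; if it has not, waiting for the upstream nvidia/cuda images to be rebuilt remains the cleaner fix.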