Add aws-k8s-1.21-nvidia variant #1799
Conversation
Force-pushed 41ff0f8 to 95a3bdf.
Force-pushed 95a3bdf to aab3bcd.
Force-pushed aab3bcd to d5fc8a0.
packages/nvidia-container-toolkit/nvidia-container-toolkit-config.toml (three resolved review threads, outdated)
What=overlay
Where=PREFIX/lib/modules
Type=overlay
Options=noatime,nosuid,nodev,lowerdir=/lib/modules,upperdir=/var/lib/kernel-modules/upper,workdir=/var/lib/kernel-modules/work,context=system_u:object_r:state_t:s0
@bcressey should we add anything specific to the SELinux policy covering files in `/var/lib/kernel-modules`?
Yes, this would be good to do for all the host directories we create for overlayfs mechanics.
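For illustration only, the kind of file-context entry that would cover this might look like the following (a sketch assuming standard SELinux fcontext syntax and the `state_t` label from the mount options above; Bottlerocket's actual policy sources may express this differently):

```
/var/lib/kernel-modules(/.*)? gen_context(system_u:object_r:state_t,s0)
```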
sources/driverdog/src/main.rs
Outdated
@@ -0,0 +1,359 @@
/*!
driverdog is a tool to link kernel modules at runtime. It uses a toml configuration file with the following shape:
Why the extra indent? That shows up in the readme too.
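For readers without the diff context, such a configuration might look roughly like this (purely hypothetical; the table and key names are illustrative assumptions, not driverdog's real schema):

```toml
# Hypothetical sketch only -- the names below are assumptions, not the real schema.
[nvidia-tesla]
# Pre-built module objects shipped by the kmod package
objects-source = "/usr/share/nvidia/tesla/modules"
# Destination under /lib/modules/<kernel> for the linked .ko files
lib-modules-path = "kernel/drivers/extra/video/nvidia/tesla"
```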
sources/shimpei/src/main.rs
Outdated
/// Path to hooks definition, provided by runtime "vendor"
const HOOK_CONFIG_PATH: &str = "/usr/share/oci-add-hooks/hook.json";
Since this is actually vendor-specific, it might be worthwhile to hard-code the (only) vendor path we have today: `/usr/share/oci-add-hooks/nvidia.json`. When we get to the point of adding a second vendor, we'd need a second compiled version of shimpei pointing to that second path anyway.
I could make this change, but I would prefer if we don't hard-code the vendor-specific name in shimpei. I understand that the contents of the `hook.json` file are vendor specific, but there won't be any case in which we have two "vendor runtimes" enabled in the same AMI, and shimpei doesn't really care about the hook that will be executed; it only knows that there is a `hook.json` file that `oci-add-hooks` will use.
The `hook.json` file can be provided by a `kmod-<kernel>-<vendor>` package, just like I did with `kmod-5.10-nvidia`, and we will save ourselves the N shimpei compilations per vendor.
We could avoid a second copy of `shimpei` by doing the multicall / busybox thing where we inspect arg0 and see if we're called as `nvidia-oci-hook`, `amdgpupro-hook`, etc., and use the matching `/usr/share/oci-add-hooks/<prefix>.json` as our input.
Then packages that wanted to use `shimpei` could just install a symlink, e.g. `/usr/bin/nvidia-oci-hook -> /usr/bin/shimpei`.
packages/libnvidia-container/0004-Use-NVIDIA_PATH-to-look-up-binaries.patch
Outdated
%build
%cross_go_configure %{goimport}
GO111MODULE=off go build -o oci-add-hooks -ldflags "-linkmode=external"
GO111MODULE=off go build -buildmode=pie -ldflags="-linkmode=external" -o oci-add-hooks .
[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "nvidia"
Why are we setting this as the default? Wouldn't the typical experience be to have it available as an option set through RuntimeClass?
The typical experience with the EKS Optimized AMI, CO'OS, and the GPU Operator is that users don't have to set a `RuntimeClass` to run their workloads. This setting helps us have parity with at least the EKS Optimized AMI.
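For context, the opt-in alternative would be a RuntimeClass along these lines (a sketch; the handler must match the runtime name configured in containerd), with pods opting in via `runtimeClassName: nvidia`:

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia  # must match the containerd runtime name
```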
Force-pushed d5fc8a0 to 7ced1ff.
packages/nvidia-container-toolkit/nvidia-container-toolkit.spec
Outdated
Force-pushed 7ced1ff to 9bfdfb8.
Force-pushed 9bfdfb8 to 6e3af9e: rebased develop to fix conflicts.
Reviewed everything except the kmod packaging and driverdog.
packages/nvidia-container-toolkit/nvidia-container-toolkit.spec
Outdated
Force-pushed 6e3af9e to 3780bbb: rebased changes in libnvidia-container + k8s device plugin.
Force-pushed 657d8e1 to 93d010e: fixed the build for aarch64 and added the k8s device plugin to the nvidia variant.
Force-pushed 93d010e to 2a7cbdb.
include:
  - variant: aws-dev
    arch: x86_64
    supported: false
    fetch-upstream: "false"
Do we have to override this for each item in the list, even if the default isn't changing?
I could try removing it and see if it works 👍
It didn't 😄 , I changed it back per our offline conversation.
%global spdx_id %(bottlerocket-license-tool -l $PWD/rpmbuild/BUILD/Licenses.toml spdx-id nvidia)
%global license_file %(bottlerocket-license-tool -l $PWD/rpmbuild/BUILD/Licenses.toml path nvidia -p ./licenses)
Suggested change:
%global spdx_id %(bottlerocket-license-tool -l %{_builddir}/Licenses.toml spdx-id nvidia)
%global license_file %(bottlerocket-license-tool -l %{_builddir}/Licenses.toml path nvidia -p ./licenses)
# NVIDIA .run scripts from 0 to 199
Source0: https://us.download.nvidia.com/tesla/%{nvidia_tesla_470_version}/NVIDIA-Linux-%{_cross_arch}-%{nvidia_tesla_470_version}.run
nit: it is considered bad form to conditionally include files like this in the spec file since then a source RPM won't be complete - whether it includes the x86 or arm64 run file will depend on what it was built for rather than what the end user of the srpm wants to build for.
In the context of our distro, it's fine since we don't build source RPMs, but I'd still prefer to include both `.run` files and conditionally invoke the right one, just as a stylistic twitch.
@@ -0,0 +1 @@
D /lib/modules/KERNEL_VERSION/kernel/drivers/extra/video/nvidia/tesla 0755 root root -
nit:
D /lib/modules/__KERNEL_VERSION__/kernel/drivers/extra/video/nvidia/tesla 0755 root root -
install -d %{buildroot}%{_cross_factorydir}%{_cross_sysconfdir}/{drivers,ld.so.conf.d}

KERNEL_VERSION=$(cat %{kernel_sources}/include/config/kernel.release)
sed -e "s|KERNEL_VERSION|$KERNEL_VERSION|" %{S:200} > nvidia.conf
obligatory nit:
sed -e "s|KERNEL_VERSION|${KERNEL_VERSION}|" %{S:200} > nvidia.conf
install -m 755 nvidia-ngx-updater %{buildroot}%{_cross_libexecdir}/nvidia/tesla/bin/%{nvidia_tesla_470_version}
%endif

# TODO: add remaining libraries once we implement "image per variant" in the rpm2img script
Suggested change:
# TODO: add remaining libraries
# TODO: add remaining libraries once we implement "image per variant" in the rpm2img script
# misc
# Add libnvidia-ml.so for testing purposes
install -m755 libnvidia-ml.so.%{nvidia_tesla_470_version} %{buildroot}%{_cross_libdir}/nvidia/tesla/%{nvidia_tesla_470_version}
nit:
install -m 755 libnvidia-ml.so.%{nvidia_tesla_470_version} %{buildroot}%{_cross_libdir}/nvidia/tesla/%{nvidia_tesla_470_version}
Force-pushed 2a7cbdb to 9182ec2: documentation changes.
Force-pushed 9182ec2 to 69ee262: rebased on upstream.
BUILDING.md
Outdated
1. Fetch the drivers `.run` archive from the URL provided in the [kmod-5.10-nvidia package](packages/kmod-5.10-nvidia/Cargo.toml), i.e.:

   ```shell
   curl -LO https://us.download.nvidia.com/tesla/470.82.01/NVIDIA-Linux-x86_64-470.82.01.run
   ```

2. Extract the sources and copy the `LICENSE` file to the `licenses` directory in your Bottlerocket root directory:
It seems cleaner just to document using the HTML license file; then we won't have driver links that go out of date.
QUICKSTART-EKS.md
Outdated
Also be aware that when operating in GovCloud the IAM ARNs will need to be updated to the `arn:aws-us-gov` partition.
For example:
`arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy`
will be updated to:
`arn:aws-us-gov:iam::aws:policy/AmazonEKSWorkerNodePolicy`
nit: I like the cleanup but I tend to want it in a separate commit - can still be in this PR though
QUICKSTART-EKS.md
Outdated
The `aws-k8s-1.21-nvidia` variant includes the required packages and configurations to leverage NVIDIA GPUs.
It comes with the [NVIDIA Tesla driver](https://docs.nvidia.com/datacenter/tesla/drivers/index.html) along with the libraries required by the [CUDA toolkit](https://developer.nvidia.com/cuda-toolkit) included in your orchestrated containers.
It also includes the [NVIDIA k8s device plugin](https://github.com/NVIDIA/k8s-device-plugin), so please make sure you don't have a daemonset running the device plugin in your cluster before launching a new node using this variant.
Suggested change:
It also includes the [NVIDIA k8s device plugin](https://github.com/NVIDIA/k8s-device-plugin).
If you already have a daemonset for the device plugin in your cluster, you may need to use taints and tolerations to keep it from running on Bottlerocket nodes.
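As a concrete illustration of the suggested approach (a sketch; the taint key and value here are made-up placeholders, not a convention defined by this PR):

```shell
# Taint Bottlerocket NVIDIA nodes so an existing device-plugin daemonset
# (which lacks a matching toleration) won't be scheduled onto them.
kubectl taint nodes <bottlerocket-node-name> example.com/device-plugin=bundled:NoSchedule
```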
QUICKSTART-EKS.md
Outdated
It comes with the [NVIDIA Tesla driver](https://docs.nvidia.com/datacenter/tesla/drivers/index.html) along with the libraries required by the [CUDA toolkit](https://developer.nvidia.com/cuda-toolkit) included in your orchestrated containers.
It also includes the [NVIDIA k8s device plugin](https://github.com/NVIDIA/k8s-device-plugin), so please make sure you don't have a daemonset running the device plugin in your cluster before launching a new node using this variant.

With this variant, most of the existing NVIDIA tools (like [DCGM](https://github.com/NVIDIA/dcgm-exporter) and the [GPU Feature Discovery](https://github.com/NVIDIA/gpu-feature-discovery)) work as you would expect, you can install them in your cluster following the instructions provided for each project.
Suggested change:
Additional NVIDIA tools such as [DCGM](https://github.com/NVIDIA/dcgm-exporter) and [GPU Feature Discovery](https://github.com/NVIDIA/gpu-feature-discovery) will work as expected.
You can install them in your cluster by following the `helm install` instructions provided for each project.
QUICKSTART-EKS.md
Outdated
It also includes the [NVIDIA k8s device plugin](https://github.com/NVIDIA/k8s-device-plugin), so please make sure you don't have a daemonset running the device plugin in your cluster before launching a new node using this variant.

With this variant, most of the existing NVIDIA tools (like [DCGM](https://github.com/NVIDIA/dcgm-exporter) and the [GPU Feature Discovery](https://github.com/NVIDIA/gpu-feature-discovery)) work as you would expect, you can install them in your cluster following the instructions provided for each project.
Even though you could use the [GPU Operator](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/getting-started.html#install-nvidia-gpu-operator) to install these tools, we recommend installing these tools individually via `helm install` since the Operator could break your workloads on upgrades.
This is hard to understand without more details. I'd suggest saying something like:
Suggested change:
The [GPU Operator](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/getting-started.html#install-nvidia-gpu-operator) can also be used to install these tools.
However, it is cumbersome to select the right subset of features to avoid conflicts with the software included in the variant.
Therefore we recommend installing the tools individually if they are required.
README.md
Outdated
@@ -54,6 +54,9 @@ The following variants support EKS, as described above:
- `aws-k8s-1.19`
- `aws-k8s-1.20`
- `aws-k8s-1.21`
- `aws-k8s-1.21-nvidia`

Please refer to [this document](QUICKSTART-EKS.md#aws-k8s-121-nvidia-variant) to learn more about the `aws-k8s-1.21-nvidia` variant.
I'd prefer not to link to variant specifics here, or else we'll be doing that everywhere on this page. In general, we need to figure out a more structured approach to getting started with particular variants, since this isn't scaling very well.
SECURITY_FEATURES.md
Outdated
@@ -134,6 +134,8 @@ All binaries are linked with the following options:

Together these enable [full RELRO support](https://www.redhat.com/en/blog/hardening-elf-binaries-using-relocation-read-only-relro) which makes [ROP](https://en.wikipedia.org/wiki/Return-oriented_programming) attacks more difficult to execute.

**Note:** the `aws-k8s-1.21-nvidia` variant includes the NVIDIA k8s device plugin which is compiled without the `-wl,-z,now` flags, and other precompiled NVIDIA libraries that could haven't been compiled with hardening flags.
Suggested change:
**Note:** Certain variants, such as the ones for NVIDIA, include precompiled binaries that may not have been built with these hardening flags.
Force-pushed 69ee262 to 36e2eff.
There are some rpm commands in the Dockerfile executed as 'root', which results in macros like `%{_builddir}` pointing to `/root` instead of `/home/builder`. By changing `%_topdir` to `/home/builder`, macros that rely on this value will always point to the build directory.

Signed-off-by: Arnaldo Garcia Rincon <agarrcia@amazon.com>
Force-pushed 36e2eff to 06847f1.
Force-pushed 06847f1 to d9c1e57.
BUILDING.md
Outdated
   cargo make fetch-licenses -e BUILDSYS_UPSTREAM_LICENSES_FETCH=true
   ```

3. Build your image, setting the `BUILDSYS_UPSTREAM_SOURCE_FALLBACK` flag to `true`, if you haven't cache the driver's sources:
Suggested change:
3. Build your image, setting the `BUILDSYS_UPSTREAM_SOURCE_FALLBACK` flag to `true`, if you haven't cached the driver's sources:
This commit adds the NVIDIA Tesla 470 driver. The drivers are subpackages of `kmod-5.10-nvidia`, since we don't want to have a spec file per driver type per driver version. The spec file for `kmod-5.10-nvidia` holds the drivers that are compatible with the 5.10 kernel. New spec files should be added for newer kernel and driver versions. The `kmod-5.10-nvidia` package provides a tmpfilesd conf file to create the lib modules directory where the kernel modules will be created.

Each subpackage installs the libraries and binaries underneath `%{_cross_bindir}/nvidia/<type>` and `%{_cross_libdir}/nvidia/<type>` respectively, to prevent collisions while building the subpackages. Kernel module objects are installed in `%{_cross_datadir}/nvidia/<type>/modules`, so that `driverdog` can link them at runtime.

Each subpackage provides a drop-in configuration file for containerd that sets the `NVIDIA_PATH` environment variable. This environment variable must be set to the directory that contains the NVIDIA userland tools, which will be mounted on the containers by `libnvidia-container`. The environment variable is set for containerd, since `libnvidia-container` is called by the runtime to set up the containers.

Signed-off-by: Arnaldo Garcia Rincon <agarrcia@amazon.com>
The *-nvidia variants use a different runtime instead of runc, to inject prestart hooks using shimpei.

Signed-off-by: Arnaldo Garcia Rincon <agarrcia@amazon.com>
The `fetch-upstream` variable is used to fetch upstream sources when they aren't provided in the lookaside cache.

Signed-off-by: Arnaldo Garcia Rincon <agarrcia@amazon.com>
Signed-off-by: Arnaldo Garcia Rincon <agarrcia@amazon.com>
Force-pushed d9c1e57 to 54415cb.
We should include a `releases-url`, but not necessarily in this PR.
@@ -27,30 +27,59 @@ jobs:
    variant: [aws-k8s-1.18, aws-k8s-1.19, aws-k8s-1.20, aws-k8s-1.21, aws-ecs-1]
    arch: [x86_64, aarch64]
    supported: [true]
    fetch-upstream: ["false"]
nit. our previous booleans (and values) were unquoted, but these ones are quoted. I think I would unquote these for consistency.
The previous boolean values were used within the context of the GitHub workflows, which are literal booleans. Since GH workflows are defined in YAML, I can't use `false` and expect it to get translated to `"false"`; I could get a `1`, or another value that doesn't work for the environment variables passed to `cargo make`.
🌃
Issue number:
N/A
Description of changes:
Draft PR for aws-k8s-1.21-nvidia variant.
This PR adds the `aws-k8s-1.21-nvidia` variant, along with all the building blocks required to support NVIDIA GPUs in future releases. This variant includes all the configurations and software required to leverage GPUs in containerized workloads. In this variant, the default containerd runtime's name is `nvidia`, and it uses the `shimpei` helper program as the OCI runtime binary.

kmod-5.10-nvidia

The drivers are subpackages of `kmod-5.10-nvidia`, since we don't want to have a spec file per driver type per driver version. The spec file for `kmod-5.10-nvidia` will hold the drivers that are compatible with the 5.10 kernel. New spec files should be added for newer kernel and driver versions. The `kmod-5.10-nvidia` package provides a tmpfilesd conf file to create the lib modules directory where the kernel modules will be created.

Each subpackage installs the libraries and binaries underneath `%{_cross_bindir}/nvidia/<type>` and `%{_cross_libdir}/nvidia/<type>` respectively, to prevent collisions while building the subpackages. Kernel module objects are installed in `%{_cross_datadir}/nvidia/<type>/modules`, so that `driverdog` can link them at runtime.

Each subpackage provides a drop-in configuration file for containerd that sets the `NVIDIA_PATH` environment variable. This environment variable must be set to the directory that contains the NVIDIA userland tools, which will be mounted on the containers by `libnvidia-container`. The environment variable is set for containerd, since `libnvidia-container` is called by the runtime to set up the containers.
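As a rough illustration of that mechanism (a sketch; the drop-in path, file name, and tools directory below are assumptions, not the exact files added by this PR):

```ini
# Hypothetical drop-in: /etc/systemd/system/containerd.service.d/nvidia-tesla.conf
[Service]
# Directory with the NVIDIA userland tools that libnvidia-container
# mounts into containers at setup time.
Environment="NVIDIA_PATH=/usr/libexec/nvidia/tesla/bin"
```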
Remaining tasks:

- `oci-add-hooks` instead of `nvidia-container-runtime`
Testing done:
I launched p3/g4dn/g5g instances and deployed the `vectoradd-cuda` sample app. I had access to `nvidia-smi` and I ran the sample app:

Terms of contribution:
By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.