builder/daemonless: pass through /dev/kvm if present #410
base: master
Conversation
Fedora 34 is ancient and EOL. Bump to Fedora 40 so that it's closer to the buildroot that devs are using locally to build openshift-builder. As part of the bump, we now also need to pull in netavark.
Not sure on the history there, but there are no daemonsets nowadays in the openshift-controller-manager namespace. There's a deployment object though and I see it cycle the pods when patching the builder pod, so I assume that's the correct object to watch now.
Currently, if `devices.kubevirt.io/kvm` resources are requested in the build object, the resource request makes it to the build pod, but it doesn't have any visible effect because the nested build process itself doesn't have access to the device.

The only reason we'd have `/dev/kvm` in our pod is if the user wants to use it in their build. So just pass it through if we find it.

The use case for this is being able to build artifacts which would normally require privileges.

One example is base bootable container (bootc) images. Building these currently requires privileges because the build itself uses containerization features. In the future, this should work with user namespacing, currently in Tech Preview. However, because we need not just uid 0 but `CAP_SYS_ADMIN`, and capabilities would still be restricted by default, we would still require access to non-default SCCs. (And of course, the builder would also have to be adapted to pass through the capabilities.)

Another example is building disk images and shipping them in container images. This is done for example by Kubevirt and podman-machine. Two common ways to build disk images currently are via loopback devices or virtualization. The former can't be used because loopback devices are not namespaced and require privileges. This patch enables the latter. Using virtualization enables us to build these artifacts all while using the _default_ OpenShift restricted SCC.
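To sketch the idea (illustrative only, not the literal diff; the helper name is made up, and the exact way devices are threaded into the nested buildah invocation may differ from the actual change):

```go
package builder

import "os"

// kvmDevices is a hypothetical helper illustrating the approach: if
// the device plugin surfaced /dev/kvm in the build pod (i.e. the user
// requested devices.kubevirt.io/kvm), return it in the same
// "--device"-style string form that `buildah build --device` accepts,
// so it can be forwarded to the nested build. If the device is
// absent, return nothing and behavior is unchanged.
func kvmDevices() []string {
	if _, err := os.Stat("/dev/kvm"); err == nil {
		return []string{"/dev/kvm"}
	}
	return nil
}
```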
Just to emphasize this: doing this would have a pretty drastic impact on image mode convergence. For example, it would immediately enable us to build RHCOS in a container-native way via the OpenShift Build API (because we currently use coreos-assembler, which is already made to use virtualization for composes). E.g. this was tested to work with this patch:

```dockerfile
ARG BUILD_CONTEXTDIR=.

FROM quay.io/coreos-assembler/coreos-assembler:latest as builder
USER root
ENV COSA_SKIP_OVERLAY=1
RUN --mount=type=bind,rw=true,src=.,dst=/src,bind-propagation=shared \
    cosa init --force /src --variant c9s && cosa fetch && cosa build ostree && \
    rm -rf cache/* tmp/* && mv -v builds/latest/$(arch)/*.ociarchive /src/out.ociarchive

FROM oci-archive:${BUILD_CONTEXTDIR}/out.ociarchive
RUN --mount=type=bind,rw=true,src=.,dst=/src,bind-propagation=shared \
    rm /src/out.ociarchive
```

And this in turn enables us to have better CI on openshift/os, multi-PR testing, integration in clusterbot, etc...

At the higher image mode level, the most obvious impact is on bootc-image-builder. This would allow building image mode disk images unprivileged via OpenShift by just bringing up a podman machine and doing the work there.

Otherwise, for base composes, it's obviously very desirable for users to be able to build their base container images with the default OpenShift restricted SCC. The build process there currently assumes privileges, but we could have tooling to make this ergonomic (in the short-term, I'm pretty sure …).

Regardless of how things are set up in the future, I think in the short to mid-term this patch would be extremely valuable.
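For completeness, requesting the device on the build side looks roughly like this (a sketch using the openshift/api Go types; it assumes the KubeVirt KVM device plugin is advertising `devices.kubevirt.io/kvm` on the nodes):

```go
package main

import (
	"fmt"

	buildv1 "github.com/openshift/api/build/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	// Request one KVM device from the kubevirt.io device plugin for
	// the build pod. With this patch, the builder then forwards
	// /dev/kvm into the nested build environment.
	build := buildv1.Build{
		Spec: buildv1.BuildSpec{
			CommonSpec: buildv1.CommonSpec{
				Resources: corev1.ResourceRequirements{
					Limits: corev1.ResourceList{
						corev1.ResourceName("devices.kubevirt.io/kvm"): resource.MustParse("1"),
					},
				},
			},
		},
	}
	fmt.Println(build.Spec.Resources.Limits)
}
```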
Sorry, going to add more here. (You can tell I'm excited about this. :)) A lot of this is context for folks I'm discussing this with and not necessarily for the maintainers of this repo directly.

How does this intersect with openshift/enhancements#1637? Well, with that enhancement, the node image becomes a layered build like any other. This means that testing changes to that node layer shouldn't require any privileges or KVM.

I'd also like RHCOS itself to eventually become a layered build on top of rhel-bootc, but there's a lot of work needed before that can happen. In the short/mid-term, RHCOS will instead be inheriting definitions (and maybe even lockfiles) from rhel-bootc, and so it would still be a base compose that requires privileges. And suffice to say, it's extremely valuable to be able to easily test changes to that base compose as part of Prow.
I am not sure I'd want to scope in podman-machine for that type of production use; there are already e.g. Kata containers and peer pods, which are designed to run generic workloads and are shipped today.
This makes sense to me 👍
Yeah, the appeal in this case is the ability to model the workload as an OpenShift Build and use all the well-known build APIs surrounding that and the tooling built on top of it (like ci-operator).
@jlebon: The following tests failed, say `/retest` to rerun all failed tests or `/retest-required` to rerun all mandatory failed tests:
Hmm, CI failures don't seem to be related to the changes here. At least, I see the same failures in other PRs.
CI failures are due to existing issues: https://issues.redhat.com/browse/OCPBUGS-44257
Can you clarify which "extra" permissions the build needs in this use case? Build containers already run privileged with uid 0 by default...
/approve
Code changes otherwise look good. Unfortunately we don't have a good means to verify this works without some form of end-to-end test - and even then, would it require KubeVirt to be installed on the cluster?
Yes, it requires KubeVirt installed. The way I've tested this is by launching a cluster with clusterbot with the …
/approve cancel

Going back on my decision to "green light" this feature.

If using layered operators is already a precondition for this feature to work, then I don't recommend using OpenShift BuildConfigs. We have declared this component "feature complete," and I am reluctant to add anything that doesn't have some form of automated testing behind it. Otherwise it is very easy for future maintainers to introduce regressions. Our future feature sets for building container images are going to land upstream, in either the Tekton or Shipwright projects.

Better options for this feature/use case: …
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jlebon

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
On Mon, Nov 18, 2024, at 2:13 PM, Adam Kaplan wrote:

> If using layered operators is already a precondition for this feature …

I made an independent version in https://github.com/cgwalters/kvm-device-plugin
Yeah, understandable. But the thing is that OpenShift CI is still very heavily reliant on the Build API currently, and that's the primary motivation for this PR. I can look at adding a test for this if it helps.
Still very interested in this. Please let me know if writing a test for it would unblock it.