builder/daemonless: pass through /dev/kvm if present #410

Open · jlebon wants to merge 3 commits into master

Conversation

jlebon (Member) commented Nov 12, 2024

Currently, if `devices.kubevirt.io/kvm` resources are requested in the
build object, the resource request makes it to the build pod, but it
doesn't really have any visible effect because the nested build process
itself doesn't have access to it.
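
For illustration, requesting the KVM device-plugin resource on a build
could look like this minimal sketch (the name, git URL, and strategy are
placeholders; `devices.kubevirt.io/kvm` is the extended resource
advertised by the KubeVirt device plugin):

```yaml
apiVersion: build.openshift.io/v1
kind: BuildConfig
metadata:
  name: kvm-build                          # placeholder name
spec:
  source:
    git:
      uri: https://github.com/example/repo # placeholder repository
  strategy:
    dockerStrategy: {}
  resources:
    limits:
      devices.kubevirt.io/kvm: "1"         # lands the build pod on a node with /dev/kvm
  output:
    to:
      kind: ImageStreamTag
      name: kvm-build:latest
```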

The only reason we'd have `/dev/kvm` in our pod is if the user wants to
use it in their build. So just pass it through if we find it.
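
Mechanically, the check can be as simple as this minimal Go sketch (not
the actual diff from this PR; how the extra `--device` argument gets
wired into the nested buildah invocation is an assumption):

```go
package main

import (
	"fmt"
	"os"
)

// kvmDeviceArgs returns extra --device arguments for the nested build
// invocation when /dev/kvm exists in the build pod. If the device
// plugin didn't inject it, the build runs exactly as before.
func kvmDeviceArgs() []string {
	if _, err := os.Stat("/dev/kvm"); err == nil {
		return []string{"--device", "/dev/kvm"}
	}
	return nil
}

func main() {
	fmt.Println(kvmDeviceArgs())
}
```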

The use case for this is being able to build artifacts which would
normally require privileges.

One example is base bootable container (bootc) images. Building these
currently requires privileges because the build itself uses
containerization features.

In the future, this should work with user namespacing, currently in Tech
Preview. However, because we need not just uid 0 but `CAP_SYS_ADMIN`,
and capabilities would still be restricted by default, we would still
require access to non-default SCCs. (And of course, the builder would
also have to be adapted to pass through the capabilities.)

Another example is building disk images and shipping them in container
images. This is done for example by KubeVirt and podman-machine. Two
common ways to build disk images currently are via loopback devices or
virtualization. The former can't be used because loopback devices are
not namespaced and require privileges. This patch enables the latter.

Using virtualization enables us to build these artifacts all while using
the default OpenShift restricted SCC.

Fedora 34 is ancient and EOL. Bump to Fedora 40 so that it's closer to
the buildroot that devs are using locally to build openshift-builder.

As part of the bump, we now need to also pull in netavark.

Not sure about the history there, but there are no daemonsets nowadays
in the openshift-controller-manager namespace. There's a deployment
object though, and I see it cycle the pods when patching the builder
pod, so I assume that's the correct object to watch now.

jlebon (Member, Author) commented Nov 12, 2024

Just to emphasize this: doing this would have a pretty drastic impact on image mode convergence. For example, it would immediately enable us to build RHCOS in a container-native way via the OpenShift Build API (because we currently use coreos-assembler, which is already made to use virtualization for composes). E.g. this was tested to work with this patch:

```Dockerfile
ARG BUILD_CONTEXTDIR=.

# Stage 1: run a full coreos-assembler (cosa) compose; this is the part
# that needs /dev/kvm.
FROM quay.io/coreos-assembler/coreos-assembler:latest as builder
USER root
ENV COSA_SKIP_OVERLAY=1
RUN --mount=type=bind,rw=true,src=.,dst=/src,bind-propagation=shared \
      cosa init --force /src --variant c9s && cosa fetch && cosa build ostree && \
      rm -rf cache/* tmp/* && mv -v builds/latest/$(arch)/*.ociarchive /src/out.ociarchive

# Stage 2: re-import the OCI archive produced above as the final image.
FROM oci-archive:${BUILD_CONTEXTDIR}/out.ociarchive
RUN --mount=type=bind,rw=true,src=.,dst=/src,bind-propagation=shared \
      rm /src/out.ociarchive
```

And this in turn enables us to have better CI on openshift/os, multi-PR testing, integration in clusterbot, etc.

At the higher image mode level, the most obvious impact is on bootc-image-builder. This would allow building image mode disk images unprivileged via OpenShift by just bringing up a podman machine and doing the work there. Otherwise, for base composes, it's obviously very desirable for users to be able to build their base container images with the default OpenShift restricted SCC. The build process there currently assumes privileges, but we could have tooling to make this ergonomic (in the short term, I'm pretty sure `cosa supermin-run podman build ...` in a Containerfile would work). Or alternatively, keep pursuing the move away from rpm-ostree in a way that drops any required privileges (see also https://gitlab.com/fedora/bootc/tracker/-/issues/32).

Regardless of how things are set up in the future, I think in the short to mid-term, this patch would be extremely valuable.

jlebon (Member, Author) commented Nov 12, 2024

Sorry, going to add more here. (You can tell I'm excited about this. :)) A lot of this is context for folks I'm discussing this with and not necessarily for the maintainers of this repo directly.

How does this intersect with openshift/enhancements#1637? Well, with that enhancement, the node image becomes a layered build like any other. This means that testing changes to that node layer shouldn't require any privileges or KVM.

And I'd also like RHCOS itself to eventually become a layered build on top of rhel-bootc. But there's a lot of work needed before that can happen. In the short/mid-term, RHCOS will instead be inheriting definitions (and maybe even lockfiles) from rhel-bootc. And so it would still be a base compose that requires privileges. And suffice it to say, it's extremely valuable to be able to easily test changes to that base compose as part of Prow.

cgwalters (Member) commented:

> At the higher image mode level, the most obvious impact is on bootc-image-builder. This would allow building image mode disk images unprivileged via OpenShift by just bringing up a podman machine and doing the work there.

I am not sure I'd want to scope in podman-machine for that type of production use; there's already e.g. Kata containers and peer pods that are designed to run generic workloads and are shipped today.

cgwalters (Member) commented:

> The only reason we'd have `/dev/kvm` in our pod is if the user wants to use it in their build. So just pass it through if we find it.

This makes sense to me 👍

jlebon (Member, Author) commented Nov 12, 2024

> > At the higher image mode level, the most obvious impact is on bootc-image-builder. This would allow building image mode disk images unprivileged via OpenShift by just bringing up a podman machine and doing the work there.
>
> I am not sure I'd want to scope in podman-machine for that type of production use; there's already e.g. Kata containers and peer pods that are designed to run generic workloads and are shipped today.

Yeah, the appeal in this case is the ability to model the workload as an OpenShift Build and to use all the well-known build APIs surrounding that, and the tooling built on top of it (like ci-operator).

openshift-ci bot (Contributor) commented Nov 12, 2024

@jlebon: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/unit | 5ef8a3d | link | true | /test unit |
| ci/prow/e2e-aws-ovn-builds-techpreview | 5ef8a3d | link | false | /test e2e-aws-ovn-builds-techpreview |
| ci/prow/e2e-aws-ovn-builds | 5ef8a3d | link | true | /test e2e-aws-ovn-builds |
| ci/prow/security | 5ef8a3d | link | false | /test security |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

jlebon (Member, Author) commented Nov 12, 2024

Hmm, CI failures don't seem to be related to the changes here. At least, I see the same failures in other PRs.

adambkaplan (Contributor) commented:

CI failures are due to existing issues:

https://issues.redhat.com/browse/OCPBUGS-44257
https://issues.redhat.com/browse/OCPBUGS-44243

> The only reason we'd have `/dev/kvm` in our pod is if the user wants to use it in their build. So just pass it through if we find it.
>
> The use case for this is being able to build artifacts which would normally require privileges.

Can you clarify which "extra" permissions the build needs in this use case? Build containers already run privileged with uid 0 by default...

adambkaplan (Contributor) left a comment:

/approve

Code changes otherwise look good. Unfortunately, we don't have a good means to verify this works end to end without some form of end-to-end test; and even then, would it require KubeVirt to be installed on the cluster?

openshift-ci bot added the `approved` label (indicates a PR has been approved by an approver from all required OWNERS files) on Nov 18, 2024.

jlebon (Member, Author) commented Nov 18, 2024

> Unfortunately, we don't have a good means to verify this works end to end without some form of end-to-end test; and even then, would it require KubeVirt to be installed on the cluster?

Yes, it requires KubeVirt installed. The way I've tested this is by launching a cluster via clusterbot with the virtualization-support option and then installing OpenShift CNV. I don't have the logs anymore, but I can confirm that it works. :)

adambkaplan (Contributor) commented:

/approve cancel

Going back on my decision to "green light" this feature.

> Yes, it requires KubeVirt installed. The way I've tested this is by launching a cluster via clusterbot with the virtualization-support option and then installing OpenShift CNV. I don't have the logs anymore, but I can confirm that it works. :)

If using layered operators is already a precondition for this feature to work, then I don't recommend using OpenShift BuildConfigs. We have declared this component "feature complete," and I am reluctant to add anything that doesn't have some form of automated testing behind it. Otherwise it is very easy for future maintainers to introduce regressions. Our future feature sets for building container images are going to land upstream either in the Tekton or Shipwright projects.

Better options for this feature/use case:

  1. Ensure `/dev/kvm` can be added as a device if detected in the Tekton buildah Task (published on Tekton Hub). Red Hat's version is published here. I believe the buildah-bud.sh script has all the logic (a sketch of the idea follows below).
  2. Similar idea for Shipwright with our buildah ClusterBuildStrategy. The downstream build strategy definition is in our operator repo.
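
For illustration, a hypothetical version of option 1 in a buildah task
step could look like this (the variable names and surrounding script are
assumptions, not taken from the actual buildah-bud.sh):

```bash
#!/usr/bin/env bash
set -euo pipefail

# Illustrative task parameters, not the real Task's variables.
IMAGE="${IMAGE:-quay.io/example/app:latest}"
CONTEXT="${CONTEXT:-.}"

# Forward /dev/kvm to the build only when the device plugin exposed it.
DEVICE_ARGS=()
if [ -e /dev/kvm ]; then
  DEVICE_ARGS+=(--device /dev/kvm)
fi

buildah bud "${DEVICE_ARGS[@]}" -t "${IMAGE}" "${CONTEXT}"
```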

openshift-ci bot (Contributor) commented Nov 18, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jlebon
Once this PR has been reviewed and has the lgtm label, please assign bparees for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of the required OWNERS files.

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot removed the `approved` label (indicates a PR has been approved by an approver from all required OWNERS files) on Nov 18, 2024.

cgwalters (Member) commented Nov 18, 2024 via email

jlebon (Member, Author) commented Nov 18, 2024

> Our future feature sets for building container images are going to land upstream either in the Tekton or Shipwright projects.

Yeah, understandable. But the thing is that OpenShift CI is still very heavily reliant on the Build API currently, and that's the primary motivation for this PR.

I can look at adding a test for this if it helps.

jlebon (Member, Author) commented Nov 22, 2024

Still very interested in this. Please let me know if writing a test for it would unblock it.
