Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add --mountns or equivalent #1838

Open
cgwalters opened this issue Jan 3, 2023 · 3 comments
Open

Add --mountns or equivalent #1838

cgwalters opened this issue Jan 3, 2023 · 3 comments
Labels
kind/feature A request for, or a PR adding, new functionality stale-issue

Comments

@cgwalters
Copy link
Contributor

cgwalters commented Jan 3, 2023

For a use case I have, I'd like to execute a skopeo binary from inside a container image, but have it fetch data from containers-storage: which requires entering the host mount namespace.

Several low-level commands have gradually gained support for calling setns(); e.g. there's now mount -N: -N, --namespace <ns> perform mount in another namespace.

The key benefit of this is that it's performed after dynamic linking is done, so assuming no further external binaries are run it avoids a host dependency.

But a general well known problem with setns() and Go is that the runtime will happily spawn threads in the background to service goroutines which may not propagate the namespace.

I came across https://cs.github.com/containers/podman/blob/864288b8dabbe3eb89854b737cc7fbd93077aa1e/libpod/container_copy_linux.go?q=org%3Acontainers+setns+lang%3Ago#L17 which seems related.

Thinking about this, I wonder if we could add support to containers/storage for having it take an explicit mount namespace and perform operations there?

(I tried the below code, which worked for skopeo inspect --mount-namespace 1 containers-storage:docker.io/library/busybox but skopeo copy --mount-namespace 1 containers-storage:docker.io/library/busybox oci:/tmp/busybox fails trying to access some files, I think because the accesses are running on threads spawned before we were able to unshare...so to do this right we'd need to have the main entrypoint be either C or Rust)

diff --git a/cmd/skopeo/main.go b/cmd/skopeo/main.go
index 3f8a9621..af8132c8 100644
--- a/cmd/skopeo/main.go
+++ b/cmd/skopeo/main.go
@@ -3,6 +3,7 @@ package main
 import (
 	"context"
 	"fmt"
+	"os"
 	"strings"
 	"time"
 
@@ -13,6 +14,7 @@ import (
 	"github.com/containers/storage/pkg/reexec"
 	"github.com/sirupsen/logrus"
 	"github.com/spf13/cobra"
+	"golang.org/x/sys/unix"
 )
 
 // gitCommit will be the hash that the binary was built from
@@ -27,6 +29,7 @@ type globalOptions struct {
 	policyPath         string                  // Path to a signature verification policy file
 	insecurePolicy     bool                    // Use an "allow everything" signature verification policy
 	registriesDirPath  string                  // Path to a "registries.d" registry configuration directory
+	mountns            string                  // PID or path to specified mount namespace
 	overrideArch       string                  // Architecture to use for choosing images, instead of the runtime one
 	overrideOS         string                  // OS to use for choosing images, instead of the runtime one
 	overrideVariant    string                  // Architecture variant to use for choosing images, instead of the runtime one
@@ -83,6 +86,7 @@ func createApp() (*cobra.Command, *globalOptions) {
 	rootCommand.PersistentFlags().BoolVar(&opts.debug, "debug", false, "enable debug output")
 	rootCommand.PersistentFlags().StringVar(&opts.policyPath, "policy", "", "Path to a trust policy file")
 	rootCommand.PersistentFlags().BoolVar(&opts.insecurePolicy, "insecure-policy", false, "run the tool without any policy check")
+	rootCommand.PersistentFlags().StringVar(&opts.mountns, "mount-namespace", "", "Enter target mount namespace, specified via PID or magic link /proc/<pid>/ns/mnt")
 	rootCommand.PersistentFlags().StringVar(&opts.registriesDirPath, "registries.d", "", "use registry configuration files in `DIR` (e.g. for container signature storage)")
 	rootCommand.PersistentFlags().StringVar(&opts.overrideArch, "override-arch", "", "use `ARCH` instead of the architecture of the machine for choosing images")
 	rootCommand.PersistentFlags().StringVar(&opts.overrideOS, "override-os", "", "use `OS` instead of the running OS for choosing images")
@@ -121,6 +125,25 @@ func (opts *globalOptions) before(cmd *cobra.Command) error {
 	if opts.tlsVerify.Present() {
 		logrus.Warn("'--tls-verify' is deprecated, please set this on the specific subcommand")
 	}
+	if opts.mountns != "" {
+		if !strings.HasPrefix(opts.mountns, "/") {
+			opts.mountns = fmt.Sprintf("/proc/%s/ns/mnt", opts.mountns)
+		}
+
+		// AIUI, we need to unshare() because the process is already threaded
+		if err := unix.Unshare(unix.CLONE_NEWNS); err != nil {
+			return fmt.Errorf("failed to unshare mount namespace: %w", err)
+		}
+		fd, err := os.Open(opts.mountns)
+		if err != nil {
+			return err
+		}
+		defer fd.Close()
+
+		if err := unix.Setns(int(fd.Fd()), unix.CLONE_NEWNS); err != nil {
+			return fmt.Errorf("failed to enter mount namespace: %w", err)
+		}
+	}
 	return nil
 }
 
@mtrmac
Copy link
Collaborator

mtrmac commented Jan 3, 2023

Some very minimal thoughts so far:

  • Switching a mount namespace is per-process, so it will affect not just the c-storage store, but also config files and the like. That might not matter too much in some cases, but if we are worried about host-side dynamic libraries, I would expect host-side config files to also be an issue.
  • Doing this safely vs. Goroutines is indeed non-trivial. The solution for that has been github.com/containers/storage/pkg/reexec, i.e. to delegate the operation in question into a subprocess, and in that subprocess, do the Goroutine-incompatible operation at the very start of main(). I think might not be strictly sufficient due to package initializers, but it seems to work well enough.
  • Notably, even some very basic c/storage operations (creating a layer from a tar file, and vice versa), use the pkg/reexec mechanism. So, exec("/proc/self/exe") must be possible for c/storage to work, and I think that makes the original intent of not triggering a dynamic linking step in the host namespace impossible to achieve. (Or I might well be misunderstanding how chroot and mount namespaces interact in this case.)
    • There are some situations in which reexec might not be necessary right now. Using the overlay driver, and exporting only tarballs of child vs. immediate-parent layers, and using short enough path names with few enough image layers, might work. So it might work in practice … but it seems hard to promise that no more uses of pkg/reexec will be triggered by code, configuration or environment changes.

Either way, I’m way out of my area of expertise. @giuseppe knows much more than me.

@giuseppe
Copy link
Member

giuseppe commented Jan 4, 2023

we do a few lazy loading of files in containers/storage. That would be affected if we join a different mount namespace.

Do you need to access the storage on the host only in read-only mode?

In this case, I wonder if an easier approach like getting a reference to the storage from the target mount namespace and treating it like an additional image store would work.

@mtrmac mtrmac added the kind/feature A request for, or a PR adding, new functionality label Jan 7, 2023
@github-actions
Copy link

github-actions bot commented Feb 7, 2023

A friendly reminder that this issue had no activity for 30 days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature A request for, or a PR adding, new functionality stale-issue
Projects
None yet
Development

No branches or pull requests

3 participants