Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consider unifying mountEntry.srcFD and mountEntry.idmapFD approach #3935

Open
thaJeztah opened this issue Jul 11, 2023 · 2 comments
Open

consider unifying mountEntry.srcFD and mountEntry.idmapFD approach #3935

thaJeztah opened this issue Jul 11, 2023 · 2 comments

Comments

@thaJeztah
Copy link
Member

I'm somewhat curious why we use a string for srcFD and an int for idmapFD. Wondering if we should unify the approach (probably can be done in a follow-up)

Originally posted by @thaJeztah in #3717 (comment)

One uses a string with the path (/proc/self/fd/<FD>), whereas the other uses the bare FD (int).

We could consider using an int for both, and update mountViaFDs() (which is not exported) to internalize constructing the path if the value is not -1;

func mountViaFDs(source, srcFD, target, dstFD, fstype string, flags uintptr, data string) error {
src := source
if srcFD != "" {
src = srcFD
}

@cyphar
Copy link
Member

cyphar commented Jul 18, 2023

I suspect that the best solution would actually be to entirely remove the idmapping logic from C and instead implement it in Go (if we can use move_mount(2) entirely without needing to worry about backward compatibility, we can move the mountfd logic there too since clone_tree allocates an anonymous mount namespace that doesn't have the same source mount limitations as mount --bind when mounted using move_mount).

While it's not perfect, given that the mountfd stuff is really quite niche, I would be in favour of requiring the new mount API for it to be used (this would limit the feature to Linux 5.2 and newer -- @alban is this a deal-breaker?) -- but it would mean we could completely eliminate the mountfd code from the C code in runc and it would completely unify the implementations (you would open all the bind-mount sources on the host with clone_tree -- possibly adding the idmap mount attrs -- and pass them to the Go code in rootfs_linux.go. this would also critically let us avoid leaking file descriptors from the host namespace into containers entirely). It could also work for non-bind-mount mapped mounts (something that was also missed in the initial implementation) though I suspect this will have issues with sb->s_user_ns.

Unfortunately we cannot use CLONE_NEWUSER in a multi-threaded program so we would need to do some dodgy stuff to get that bit working, but I suspect that even with that complication, it would make things a lot simpler. We can probably spawn the processes using syscall.SpawnProcess since we just need the mappings, though there's the question of what process we want to spawn (and would it be more efficient to do a re-exec of ourselves and then do all the mappings in C? annoyingly you cannot temporarily create a user namespace and join the original one due to how permissions work... each userns has to be done in a separate threadgroup).

@rata
Copy link
Member

rata commented Jul 21, 2023

@cyphar for idmap mounts forcing a minimum of 5.2 is not an issue, as we need newer kernels for idmap mounts anyways.

For opening the mount sources, I think it might be fine too. I'm changing the k8s userns implementation to idmap mounts (we hit several limitations and concerns from other SIGs when we tried to avoid it), and opening the mount sources is also a patch we added for userns support in k8s, because /var/lib/kubelet/pods has permissions for root:root and nothing for others.

I think this bug with what @thaJeztah reported is closed now that #3939 is merged (that part is unified). But do you want to open another issue with this idea of avoiding the C parts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants