Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

devpts silently not mounted in unprivileged (user namespace) mode #1186

Closed
olifre opened this issue Dec 1, 2017 · 16 comments
Closed

devpts silently not mounted in unprivileged (user namespace) mode #1186

olifre opened this issue Dec 1, 2017 · 16 comments

Comments

@olifre
Copy link
Contributor

olifre commented Dec 1, 2017

Version of Singularity:

2.4

Expected behavior

devpts is mounted also in unprivileged, user namespace mode.

Actual behavior

Silently fails.

Steps to reproduce behavior

Start any container, have

allow setuid = no
mount devpts = yes

set.

This issue exists since #944 .
The more correct fix would have been to actually analyze why it's failing. Sadly, that has not been done (lack of QA), but a silent deactivation of the feature was implemented.

A simple analysis reveals the reason is the gid=5 parameter used in the devpts mount (while the 5 is resolved from the tty gid):
https://github.com/singularityware/singularity/blob/f2d7795d04bad5af371c3a4a0a6f372ba07ff810/src/lib/runtime/mounts/dev/dev.c#L114
Naturally, this is usually wanted, but can't work in user namespace containers, unless the tty-gid is mapped. Otherwise, EINVALID will be received.
This has already been discussed a few years ago for more full-fledged container solutions:
opencontainers/runc#225

So it would be better to:

  • drop the gid=5 from the mount parameters if user namespaces are used. This will not help for programms actually checking ownership of the devices (but one may argue that's a somewhat backwards idea anyways, since programs should rather check effective permissions), but at least makes devpts mounts work again.
  • potentially add support for the workaround described in add FAQ about what to do if programs want to chown stuff hpc/charliecloud#67 (i.e. map tty group by mapping the user's group to it). This should help for most programs doing such ownership checks, but of course may look awkward to the users.

Pinging @hollowec , who initially (thankfully) implemented devpts mounting, @gmkurtzer who merged the workaround silencing the actual error, and @DrDaveD who is interested in the topic (and implemented the workaround - I don't blame him for that, but I'd rather have expected a proper problem analysis by the core team before merging a silencing of an error).

I'd also like to point out it's kind of bad that the failure discussed in #944 was actually not discovered by the test suite.

@hollowec
Copy link
Contributor

hollowec commented Dec 1, 2017

Hi @olifre,

My apologies for the delay in getting back to you on #857... I've been somewhat busy lately. As discussed in that issue, I've been testing a branch which disables passing "gid=" to the devpts mount when setuid is disabled. However, unfortunately, in further testing in recent weeks I found this is not sufficient in RHEL 7.4 and EINVAL is still returned during the devpts mount. In my initial testing of this branch under Fedora 26, with a newer kernel, I did not have this issue. It appears the older RHEL7 kernel devpts implementation requires that you be UID0/root in the namespace to mount (even without gid passed), which isn't the case during singularity mounts when user namespaces are being used. With newer kernels, this isn't an issue, but I can't think of an simple solution for #857 for RHEL 7.4. I'm currently testing a new branch which doesn't pass "gid=" to the mount when setuid is disabled, but will also not error out on RHEL7.4.

@olifre
Copy link
Contributor Author

olifre commented Dec 1, 2017

Hi @hollowec , thanks for your reply!

I'm currently testing a new branch which doesn't pass "gid=" to the mount when setuid is disabled, but will also not error out on RHEL7.4.

That sounds like the best way to go in this case. It's still a bit strange, since looking through:
https://github.com/torvalds/linux/commits/master/fs/devpts/inode.c
it seems the feature was added in 2012, and no significant modification seems to have happened after.
So it might be worthwhile to find out where exactly the EINVAL is generated to get to the root of this issue.

@olifre
Copy link
Contributor Author

olifre commented Dec 1, 2017

My guess is that it's either:
torvalds/linux@eedf265#diff-6d314687c035fb9da87d65b2b9ffb698L463
or
torvalds/linux@e98d413#diff-6d314687c035fb9da87d65b2b9ffb698L281

My interpretation of the second EINVAL is that this was fired if the uid / gid of the user was not mapped to uid = 0 and gid = 0 inside the container.
That also matches the commit message here:
torvalds/linux@ec2aa8e

So in short, for CentOS 7.4 things might work only if the user's uid / gid is mapped to 0 / 0 inside the container.

@hollowec
Copy link
Contributor

hollowec commented Dec 1, 2017

Yes, I think you're correct... it's likely the second kernel commit you mentioned that resolves the issue. I'll continue testing, and will try to submit a PR sometime next week.

@hollowec
Copy link
Contributor

hollowec commented Dec 5, 2017

I've submitted PR #1196 as a potential fix for this.

@olifre
Copy link
Contributor Author

olifre commented Dec 6, 2017

@hollowec This looks good (did not have time to test yet)!

Do you think it's a good idea to add support to map of the uid / gid of the user starting singularity to another uid / gid in the user namespace, for example to 0 / 0, as charliecloud supports?
This would (likely) allow to use devpts also on CentOS 7.4 inside a user namespace, if I read the kernel code correctly.

@hollowec
Copy link
Contributor

hollowec commented Dec 6, 2017

Hi @olifre,

Actually, I noticed the root uid/gid mapping in the container user/mount namespace you're requesting was recently merged into development. See PR #1032: pass '-f' (or '--fakeroot') to shell or exec to run in this mode.

@olifre
Copy link
Contributor Author

olifre commented Dec 6, 2017

Hi @hollowec ,

I just tested with a build from current development with your three commits applied on top. Here's what I get on a fresh CentOS 7:

[olifre@centos-7-userns ~]$ singularity shell -f -C foo
Singularity: Invoking an interactive shell within container...

Singularity foo:~> exit
[olifre@centos-7-userns ~]$ singularity shell -C foo
ERROR  : The feature you are requesting requires privilege you do not have
ABORT  : Retval = 255

So at least, it looks like -f does the trick with CentOS 7.4 and makes the devpts mount work!
However, something else is broken (probably not related to your change?).
Debug log shows:

DEBUG   [U=38503,P=23666]  singularity_config_get_value_impl()       No configuration entry found for 'allow ipc ns'; returning default value 'yes'
DEBUG   [U=38503,P=23666]  singularity_config_get_bool_char_impl()   Return singularity_config_get_bool(allow ipc ns, yes) = 1
DEBUG   [U=38503,P=23666]  singularity_registry_get()                Returning value from registry: 'UNSHARE_IPC' = '1'
DEBUG   [U=38503,P=23666]  singularity_runtime_ns_ipc()              Using IPC namespace: CLONE_NEWIPC
DEBUG   [U=38503,P=23666]  singularity_priv_escalate()               Temporarily escalating privileges (U=38503)
ERROR   [U=38503,P=23666]  singularity_priv_escalate()               The feature you are requesting requires privilege you do not have
ABORT   [U=38503,P=23666]  singularity_priv_escalate()               Retval = 255

@hollowec
Copy link
Contributor

hollowec commented Dec 6, 2017

Hi @olifre,

Yes, I noticed that for some reason with the current development branch I have to pass '-u' (or '-f') in order to use user namespaces (even with "allow setuid = no" configured). I'm not sure if that's intentional, or a bug, but it's unrelated to my PR.

@olifre
Copy link
Contributor Author

olifre commented Dec 6, 2017

Hi @hollowec ,
thanks for the heads up, yes, using -u (which is probably implied by -f too) things work. This is what I get on a CentOS 7.4 system:

[olifre@centos-7-userns ~]$ singularity shell -u -C foo
WARNING: Couldn't mount /var/singularity/mnt/session/dev/pts, continuing
Singularity: Invoking an interactive shell within container...

Singularity foo:~> exit
[olifre@centos-7-userns ~]$ singularity shell -u -f -C foo
Singularity: Invoking an interactive shell within container...

Singularity foo:~> exit

So we can see that -f indeed manages to work around the issue of mounting devpts inside a user namespace on CentOS 7.4, as I assumed from the kernel commit I linked previously.
Next, I'll look into opening a bug on RedHat's bugtracker - if we're lucky, maybe they'll backport this commit (or parts of it) into 7.5. Once I've done that, I'll add the bugzilla URL for reference here.

In any case, your PR works very well for me and uses the correct approach in all those combinations ;-).

@hollowec
Copy link
Contributor

hollowec commented Dec 6, 2017

OK, sounds good. If you plan on using OpenSSH sshd with this PR with setuid disabled (as discussed in #857), as you mentioned, the sshd code will also have to be modified to not care about pts device group ownership (like xterm).

@olifre
Copy link
Contributor Author

olifre commented Dec 6, 2017

[...] the sshd code will also have to be modified to not care about pts device group ownership (like xterm).

Yes, I know - I'm sadly not sure the OpenSSH people will be very open to such a suggestion.

I have opened the RedHat bug report asking for backporting the kernel fix here:
https://bugzilla.redhat.com/show_bug.cgi?id=1522992

@gmkurtzer
Copy link
Contributor

I am looping in @cclerget as he has been doing a massive overhaul on the development branch including implementing Linux capabilities to limit security escalation potential.

@cclerget
Copy link
Collaborator

cclerget commented Dec 7, 2017

@hollowec @olifre Sorry for the delay, for the need to specify -u or -f with allow setuid = no, I forgot to set an environment variable to force user namespace, I will submit a PR for this one soon

@olifre
Copy link
Contributor Author

olifre commented Dec 16, 2017

the sshd code will also have to be modified to not care about pts device group ownership (like xterm).

I've opened a bug report on OpenSSH bugzilla about this:
https://bugzilla.mindrot.org/show_bug.cgi?id=2813
Note that an alternative would be to offer to map the user's gid to the gid of the tty group (usually "5"). I tested this with Charliecloud using the instructions from here:
hpc/charliecloud#67
Same should work for runC and Docker with user namespace support.
Only for Singularity, it's sadly not yet possible to specify the uid / gid the user's uid / gid should be mapped to.

@dtrudg
Copy link
Contributor

dtrudg commented Mar 24, 2020

The use-case of running an ssh server without modification should now be supported via --fakeroot in which it is possible for the container processes to assume uid/gid necessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants