-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cargo 1.56 beta hang when run inside Gentoo's sandbox #89522
Comments
Seems ... a little surprising? I wouldn't expect that PR to change cargo behavior. |
The most likely reason I can think of would be if the Gentoo sandbox tool doesn't understand the clone3 syscall. |
I can reproduce this in a gentoo container. On a RHEL 8 host (kernel 4.18, without
|
I don't know anything about /* Call the clone syscall with fork semantic. The CTID address is used
to store the child thread ID at its locationm, to erase it in child memory
when the child exits, and do a wakeup on the futex at that address.
The architecture with non-default kernel abi semantic should correctlly
override it with one of the supported calling convention (check generic
kernel-features.h for the clone abi variants). */
static inline pid_t
arch_fork (void *ctid)
{
const int flags = CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID | SIGCHLD; ... where In I suppose it's generally not safe to call I think it's also possible in general that some |
Perhaps we should initialize |
I agree that it seems like a good idea to either revert the patch wholesale on 1.56 (current beta) or apply some minimal fix that avoids this problem while we discuss -- @cuviper would you be up for posting a direct-to-beta branch patch making that change? Once that lands we can bump this to track 1.57 rather than the 1.56 milestone to make sure we still keep it tracked, but we probably want to apply the partial "revert" on master as well to avoid a continuous cherry pick dance. |
I just read through the source of the Gentoo sandbox tool, and I don't think this is an issue with bypassing libc at all; this seems like a bug in the libsandbox library. And as far as I can tell, it isn't specific to clone3; this would happen with any program calling clone instead of fork to spawn a new process, as well. I think it might also happen when calling libsandbox hooks fork, and uses that hook as what it describes as "a poor man's pthread_atfork()"; it acquires its lock, calls the function, then releases its lock, to ensure that the fork doesn't happen while another thread holds the lock. However, libsandbox does not hook either clone or clone3. I would guess that libsandbox assumes that clone is only used for threads. I think the correct fix here would be for libsandbox to hook both clone and clone3, and perform the same logic whenever the flags do not include I would propose that we temporarily disable |
There's a similar issue at https://bugs.gentoo.org/807832 involving java, and it appears to also involve |
In rust-lang#89522 we learned that `clone3` is interacting poorly with Gentoo's `sandbox` tool. We only need that for the unstable pidfd extensions, so otherwise avoid that and use a normal `fork`.
See #89924 for restricting
I suppose they could hook But I'm still concerned about bypassing libc in general. Suppose instead of "a poor man's pthread_atfork()", they could have used the real thing, or any other library might be trying to do the right thing here. Even libc internals will have state it's trying to maintain, like the TID and robust mutex list, so I fear we can't really call libc at all after this. |
That's true, clone3 would be harder to interpose. As far as I can tell, sandbox has support for tracing binaries from the outside by using ptrace, which seems substantially safer in any case. (I can think of other approaches to that kind of sandboxing, as well, which would be much more robust; a mount namespace and a filesystem, for instance, or an overlayfs and a "commit" operation.) |
The clone3 syscall can do many different semantically distinct operations depending on the flags passed to the syscall. I'm here as a libc author at @cuviper request. If a language runtime or library interposes fork it will have a very difficult time complying with all of the standards requirements and libc internals required to bring the forked process back to a consistent state once the fork is complete. Simply calling clone3 as an emulation of fork is possible, but once you do this you may never call a libc function again. My understanding here is that Rust started using clone3-as-fork, and the sandbox library interposes execvp. In this situation the bug is in Rust for calling the C library execvp after clone3-as-fork. Once Rust calls clone3-as-fork it has diverged from the underlying C runtime and may not call another C library function. In general once a process calls clone to emulate pthread_create, fork, or vfork and create a process or thread and bypasses libc in doing so, it may never call a libc function again. There are instances of this in the wild, including the resource-recapturing pseudo-thread that valgrind creates while it waits for all the system threads to exit. You must be extremely careful what you do in the new pseudo-thread/process, and the official answer is that you may never call back into libc once you've done that. The unofficial answer is that anything that doesn't touch the thread-register, or attempt to use resources from the parent, is likely to work, but it's not safe. In this case it looks like libsandbox's wrapper calls pthread_mutext_lock which is a libc function and that will attempt to access the thread register and possible look at global state, and this is in an inconsistent state because the libc atfork handlers have not run to return the runtime to a consistent state (worse if it's a multithreaded fork). To answer @joshtriplett question, calling 'posix_spawn' should not cause any problems because we call internal hidden versions of exec* family functions that cannot be interposed at the ELF level (we do this to provide a consistent implementation of posix_spawn and hide the internal detail that it calls exec* syscalls at some point to create the process). Does that explanation help? |
Thanks @codonell! Now looking forward, I think we're going to need a different way to get the pidfd before we can stabilize that API. That's just too big of a footgun, especially combined with the |
Only use `clone3` when needed for pidfd In rust-lang#89522 we learned that `clone3` is interacting poorly with Gentoo's `sandbox` tool. We only need that for the unstable pidfd extensions, so otherwise avoid that and use a normal `fork`. r? `@Mark-Simulacrum`
i think this analysis is misunderstanding what libsandbox is doing. which is not unreasonable for looking at a ~20 year old code base with many maintainers. the interposed fork() simply forces the ordering:
it doesn't implement fork itself or call such syscalls directly. so glibc's internal consistency is maintained. |
@vapier I think the concern here is that if Rust uses Do you have any thoughts on how the Gentoo sandbox tool could handle sandboxed applications that want to call |
This is undefined behavior unless the unlock is skipped in the child. You cannot call unlock on a mutex the calling thread does not own.
Block it with seccomp (in particular, force it to fail with |
@richfelker Which Linux libc implementations will this actually cause problems with in practice? It works with glibc's "fast" pthread_mutex implementation, and will certainly fail with the "error-checking" and "recursive" types. Does musl have a mutex type that acts like glibc's "fast" type and doesn't care where the unlock happens? Would calling
That will not work for applications that are designed to run on recent kernels and expect clone3 to actually function as designed. Over time, some applications will simply require newer syscalls and fail without them, even if Rust has fallbacks for applications that don't need a pidfd. I'm asking after a way to handle such applications, not break such applications. (I'm wondering if the ideal solution would be for applications using clone3 to just be handled via the external ptrace mechanism.) |
"Which...will this actually cause problems" is the wrong question to ask about UB. It will (or at least should) trap with an appropriate sanitizer. It will "happen to work" with current performance-optimized implementations simply because not recording the owner at all is currently the fastest behavior, but if that ever changed, trapping would be the preferred behavior for musl (generally we prefer for UB to immediately crash, but don't go out of our way making things slower just to achieve that, especially for UB that's not inherently a memory-safety issue).
I guess we have different philosophies about application design. I believe they should have portable fallbacks, even at runtime, not to depend on non-baseline functionality, and should not hard-code "version" assumptions that might be invalid in a context that's not even Linux but another system providing a Linux-syscall-compatible interface -- or a sandbox that has legitimate sandboxing reason not to offer the new interface.
I'm not sure how this works, so it might be viable, but it would break internal ptrace use or other external ptrace use by a debugger. I would consider breaking that to be a much worse sin than breaking |
@richfelker I'm aware of the implications of UB. Rust's use of clone3 isn't itself directly invoking UB; libsandbox as an LD_PRELOADed library is invoking UB already in its handling of fork, and isn't handling clone3 at all, which makes the combination fail, so we're trying to find solutions that will work in practice. libsandbox, as far as I can tell, isn't designed to be a complete sandbox, just a best-effort sandbox for detecting issues in code run as part of gentoo build processes. LD_PRELOAD libraries are already going to rely rather extensively on the specifics and internals of particular software and libraries, in order to do their job; I don't think it's at all unreasonable to ask the practical question of what the best path would be for that LD_PRELOAD library to break less in this circumstance. As I understand it, libsandbox is not necessarily expected to be 100% portable to all possible environments. At the moment, we've gone from "any application spawning another process from Rust on Linux will use clone3" to "any application spawning another process from Rust on Linux and wanting a pidfd will use clone3", but that's still a legitimate thing for an application to expect to work. That isn't inherently broken; libsandbox's non-handling of clone3 is leading to a deadlock on libsandbox's internal lock.
I'm not saying all applications should be written that way. I'm saying some applications will be written that way, and those applications are not broken. (If you believe that no applications should ever be written to require current kernel features, then yes, we have fundamentally different perspectives on applications and portability.) It's perfectly acceptable to write an application that says "requires Linux >= 5.x with xyz syscalls available", and such an application can simply exit with an error if a syscall provided by that kernel doesn't work. (Along similar lines, some applications will at some point start requiring io_uring, and may not necessarily wish to provide fallbacks.) I'm very much hoping we can avoid straying into the specifics of libc design here, and just look for concrete solutions for making libsandbox and applications using clone/clone3 compatible with each other.
I'm not suggesting that an application should detect Linux 5.x and assume that clone3 will work. I'm suggesting that an application may legitimately document that it requires Linux 5.x, and then call clone3 and simply fail on ENOSYS.
That seems less likely to come up inside of a gentoo build process being run under libsandbox, outside of an unusual test suite.
Using clone3 with |
From my perspective, libsandbox is pretty much a red herring. It's just the way you happened to observe the problem.
It does. Not because of the stack or memory sharing issue, but because the contents of the TCB are no longer valid in the child process, and because you cannot call libc (or anything that might cause code in the dynamic linker to run) without a valid TCB. |
As explained above, you can' use |
As far as I can tell, clone and clone3 are both non-portable syscalls with no associated standard, which would tend to make their behavior more of a practical consideration than a theoretical one. Could you please point to what standard or similar that you're using to treat them as having undefined behavior in this context? It seems like you're referencing implementation-specific behavior of specific libc implementations (insofar as whether any given function expects to access TLS/TCB at any specific time), rather than referencing a standard or similar. That's perfectly fine, but then it seems reasonable to further reference the actual behavior of specific libc implementations regarding whether code works in practice. When it comes to something specified in POSIX or SUS or similar, I can understand carefully scrutinizing what the standard defines and doesn't define, and then hesitating before delving further into implementation-specific additional functionality and permissiveness. But in this case, as far as I can tell, I'm not aware of any specific standard making this behavior verboten, I know that real applications beyond just Rust already actually use this behavior in practice and are likely to continue to do so, and I know that this behavior does in fact work in libc implementations. In practice, it seems like what arose here is not an issue with either Rust's behavior or any given libc implementation, but with a particular LD_PRELOAD interoposer library whose behavior is itself relying on internals of libc implementations in a way that happens to be incompatible. Rather than explore more elaborate ways that this could theoretically fail, I think it'd be helpful to look at the practical consequences in C libraries. In practice, it appears to be a non-issue to call (for instance) execv after clone. For instance, the following code appears to compile and run correctly in both glibc and musl: #define _GNU_SOURCE
#include <err.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int child_func(void *ptr_param)
{
unsigned long param = (unsigned long)ptr_param;
fprintf(stderr, "child: param=%lu; calling execv\n", param);
char *const argv[] = { "echo", "hello from echo executed by the child process", NULL };
execv("/bin/echo", argv);
exit(1);
}
unsigned char stack[16384];
int main(void)
{
int pidfd = -1;
int pid = clone(child_func, stack + 16384 - 16, CLONE_PIDFD, (void *)42UL, &pidfd);
if (pid == -1)
err(1, "clone");
printf("parent: pid=%i pidfd=%i\n", pid, pidfd);
return 0;
} On both musl and glibc, this produces output like the following (other than varying PIDs):
In practice, what is wrong with the above program, other than the fact that a library using LD_PRELOAD to interpose For completeness, here's a similar program using #define _GNU_SOURCE
#include <err.h>
#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <syscall.h>
#include <unistd.h>
struct clone_args {
uint64_t flags;
uint64_t pidfd;
uint64_t child_tid;
uint64_t parent_tid;
uint64_t exit_signal;
uint64_t stack;
uint64_t stack_size;
uint64_t tls;
uint64_t set_tid;
uint64_t set_tid_size;
uint64_t cgroup;
};
int main(void)
{
int pidfd = -1;
struct clone_args clone_args = {
.flags = CLONE_PIDFD,
.pidfd = (uint64_t)&pidfd,
};
long ret = syscall(SYS_clone3, &clone_args, sizeof(clone_args));
if (ret == -1)
err(1, "clone3");
else if (ret == 0) {
fprintf(stderr, "child: calling execv\n");
char *const argv[] = { "echo", "hello from echo executed by the child process", NULL };
execv("/bin/echo", argv);
exit(1);
}
printf("parent: pid=%ld pidfd=%i\n", ret, pidfd);
return 0;
} |
To whit, note that even our
@joshtriplett -- we argue against this kind of Rust UB analysis all the time! Even in testing this reported issue, I found that |
@cuviper I absolutely agree! I'm not proposing that we ignore a real but rare race. I'm proposing that we consider whether there exists a problem at all with a given function or combination of functions. |
The only reference to UB in this discussion is the unlock of the lock via The specific code in question comes from sandbox
The emulated
I am telling you it is verboten. The design and usage of kernel syscalls as the building blocks for a language runtime are fuzzy at best. You appear to be asking, as a Rust core language developer, where the contract between the C runtime and Rust runtime is defined in such a way that it appears clear you cannot do what you want to do. There is no such definition. We haven't written down anywhere, but as two libc developers are telling you, there are syscalls that if you call, you irrevocably leave the C runtime behind, and this is a concrete issue. POSIX, SUS, ISO C, ISO C++... they will all avoid saying anything about this issue, because they do not want to get involved in the discrete semantics that exist between a given kernel and the runtime implementation. For example Rust uses The definition of what is allowed exists between us as developers.
This is not how language implementations are designed. Design is done at a higher level to allow room for implementations to change and grow.
At this point you have created a child process on your own. The child doesn't exist from the perspective of the C runtime. When This will immediately crash or corrupt state if the child If you want I can follow up with the Linux Man Pages project and add subsequent sections to Generally we have used the Linux Man Pages project, carefully stewarded by Michael Kerrisk, as the place to iron our such discussions about ABI and API (we had a particularly very very long discussion about allowable futex behaviour there). |
I think that would be valuable! If there are a number like this, perhaps it would also be good to have a note in
Hmm, Rust is using bare |
that's being discussed in the Gentoo bug tracker. i don't think it needs to be hashed out here too when the project in question (rust) isn't specific to Gentoo. let rust worry about the systems/standards they're supporting, and forget about Gentoo's sandbox entirely.
let's assume you're right for the sake of argument. i could not care less. libsandbox does not target POSIX compliant or standard abstract machines -- it is full of gnarly low level checks & API assumptions that go far below the standard. it only needs to run on the subset of real world configs that Gentoo supports. so we can look into this further if such a real world configuration actually materializes. |
@codonell I apologize, that was not at all what I was going for, and I didn't intend to come across as though I was trying to give no future wiggle-room in the implementation of libc interfaces. Quite the opposite: I was trying to understand what would go wrong, in order to understand what the issue was and how to work around it. I would very much like to find ways to make use of recent Linux kernel functionality without causing problems within the implementation of C libraries, without having to implement elaborate or suboptimal workarounds, and with as much reasonable future-proofing as we can to allow for the flexible evolution of both Rust and C libraries. My intent was to better understand what the issue was, not to declare it a non-issue. I was not attempting to be sarcastic when I asked about specifications or about what was wrong with the code I posted, but re-reading my message I feel like I came across that way. Again, my apologies. I'm used to seeing two different kinds of issues in interfaces like this:
I had previously been assuming that we were in case (2), and was trying to understand what the practical issue and considerations were; that was what led me to post code that appeared to me to work, to understand what I was missing. Before your most recent message, it didn't seem clear to me what the practical issue with calls to clone/clone3 were, and in previous messages the proscription of clone/clone3 had seemed less like an explanation and more like a pronouncement ("this is not allowed" rather than "here's what will actually go wrong"), which led me to wonder if I was misunderstanding and we were in case (1) and there was a spec-violation I was not aware of. Because of that, I had been wondering what specification might apply, and what the boundaries of that spec were, and whether we might be able to work around the issue for practical purposes since we have targets specific to particular C libraries rather than to any arbitrary C library (hence my attempts to seek practical implementation-specific issues). FWIW, Rust also runs into this kind of issue regularly from the perspective of being the interface-provider as well, and I tend to be one of the folks pushing for Rust to be more permissive about what it allows and more minimal about what it leaves undefined, in order to support unusual low-level code. So, I'm not trying to apply a different approach to the interfaces we consume than the interfaces we produce. ;) Your message clarified exactly the practical issues I was wondering about (e.g. where does errno get written to). Failures in the error case make perfect sense, and explain how code can appear to work at first but still have non-obvious problems. In the case of a call to I've seen some discussions going around (such as at LPC last year) about how Linux C libraries could expose clone/clone3 wrappers that would allow for this kind of usage. What's the current state of those discussions? Is there a potential approach to a clone3 wrapper we could use in the future, and would help implementing such a wrapper be useful? Or, is there some potential interface that could allow notifying the C library of a new thread after having created it? Or, more minimally, an interface to provide a location for errno, which might allow calling a subset of the C library that doesn't need other thread-local values? (Not for the first time I find myself wishing there was an alternate C library interface that did away with errno entirely, in favor of something more like the underlying syscalls: returning an error value.) |
It's undefined because we don't define the behavior of going behind the implementation's back and breaking invariants it depends on. It's as simple as that. It does not need to be spelled out explicitly that "X is UB". Yes "the TCB" is an implementation detail. That's not the point. The point is that you're creating a process where you intend to use part of the implementation, which depends on its own implementation details, with some of those details having been undermined via making a syscall that produces an inconsistent state. |
Yes, a |
Hey, I'm the original author of clone3() and I gave a session about this at last year's LPC so far two things have happened. There was a brief discussion after an RFE filed by Lennart to make use of clone3() in systemd: https://sourceware.org/bugzilla/show_bug.cgi?id=26371 |
looks like commits from #89924 made it to 1.56.0 I've added |
Yes, but master and beta-1.57 do not have any fix yet, so we need to keep tracking this. |
Assigning priority as discussed in the Zulip thread of the Prioritization Working Group. @rustbot label -I-prioritize +P-medium +T-compiler |
In rust-lang#89522 we learned that `clone3` is interacting poorly with Gentoo's `sandbox` tool. We only need that for the unstable pidfd extensions, so otherwise avoid that and use a normal `fork`.
Only use `clone3` when needed for pidfd In rust-lang#89522 we learned that `clone3` is interacting poorly with Gentoo's `sandbox` tool. We only need that for the unstable pidfd extensions, so otherwise avoid that and use a normal `fork`. This is a re-application of beta rust-lang#89924, now that we're aware that we need more than just a temporary release fix. I also reverted 12fbabd, as that was just fallout from using `clone3` instead of `fork`. r? `@Mark-Simulacrum` cc `@joshtriplett`
In rust-lang#89522 we learned that `clone3` is interacting poorly with Gentoo's `sandbox` tool. We only need that for the unstable pidfd extensions, so otherwise avoid that and use a normal `fork`. (cherry picked from commit 85b55ce)
Backported in #90938. |
Not sure if this is considered a bug/regression of rust. If not, just close this issue.
#81825 break cargo run inside Gentoo's sandbox.
How to Reproduce
sandbox cargo build
on any rust project that has few dependence ( eg. rustc, Firefox or https://github.com/ogham/exa )rustc
orcargo
processes don't exitThe bisect process:
cargo bisect-rustc
with add timeout functionality for bisecting hangs cargo-bisect-rustc#135cargo bisect-rustc --start=2021-07-30 --end=2021-09-30 --script=./test.sh -t 120
Meta
rustc --version --verbose
:Backtrace of cargo
The text was updated successfully, but these errors were encountered: