Can we require bounded stack consumption? #427
By the way, it may be possible to redefine this in terms of the abstract machine by saying "there is some implementation-defined N such that, as long as the maximum abstract stack depth is at most N, the program will not enter a stack overflow state". N would be calculated as the stack size divided by B (the value from my last comment). But I can think of two reasons this is more difficult.

First, given how I've defined B, it may easily be close to or greater than a typical thread stack size just for a single call, if the program contains any functions anywhere that use a huge amount of stack. So N would be 0 or 1, making the guarantee not very useful. This could potentially be ameliorated by, instead of calculating a single maximum stack usage across all functions, saying that each function gets an arbitrary implementation-defined "maximum stack usage" value, and that as long as the sum of the "maximum stack usage" values of all the functions on the abstract call stack is within an implementation-defined "stack size" value, the abstract machine will not enter the stack overflow state. However, this version is more complicated and has more edge cases. An alternative would be to reason about a hypothetical version of the implementation that gives threads larger stacks.

Second, a Unix signal handler can push data onto the stack, and thus potentially cause an overflow, at literally any time during execution. You could potentially subtract some amount from the stack size before dividing by B, effectively reserving that amount of stack for a potential signal handler, though there's no a priori guarantee of how much stack the signal handler might use. Or you could just choose to characterize the implementation's behavior under the assumption that there are no signals.
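As a rough numeric illustration of the "reserve some stack, then divide by B" idea above (all constants below are hypothetical, chosen only to make the arithmetic concrete):

```rust
// Hypothetical sketch: derive the guaranteed abstract call depth N from a
// thread's stack size, a reserve for signal handlers, and the per-call
// worst-case stack usage B. All the numbers here are made up.
fn max_guaranteed_depth(stack_size: u64, signal_reserve: u64, b: u64) -> u64 {
    // Set aside stack for a potential signal handler, then divide by B.
    stack_size.saturating_sub(signal_reserve) / b
}

fn main() {
    // 8 MiB thread stack, 64 KiB reserved for signals, B = 4 KiB per call.
    let n = max_guaranteed_depth(8 << 20, 64 << 10, 4 << 10);
    assert_eq!(n, 2032);
    println!("guaranteed depth N = {n}");
}
```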
Sadly that guarantee is useless in practice because B might be 2^100 or something like that... I don't think I understand what the motivation is for making this a hard guarantee. Which theorems would you want to derive from this hard compiler promise?
There are some misconceptions here. "Each finite prefix satisfies X" being equivalent to "all (potentially infinite) traces satisfy X" is the definition of a safety property (and whether or not it is a bisimulation is completely irrelevant). However, there are also liveness properties, such as "every thread will either terminate or always eventually take a step again" -- for such a property it is not sufficient to just check all finite prefixes. The Rust compiler makes not only safety but also liveness guarantees (this is what is known in C++ as "forward progress" guarantees). Bisimulation is a technique to deal with infinite traces and to prove properties about infinite traces; it is particularly relevant for liveness properties, since for safety properties one can just look at all finite prefixes and never consider infinite traces directly.
To fit asymptotics into the framework, IMO it makes no sense to just look at the stack consumption of tail calls. We'd need a cost model for the Rust AM, and have the compiler guarantee that it matches the cost model. I am pretty sure there is some basic PL research on cost models for time and space, and on cost-preserving compilation. It's pretty far removed from a language like Rust, though. But focusing on just the space consumption of tail calls is pointless IMO. We've treated time and space complexity as a QoI concern so far; why do tail calls require more?
I think it is itself being treated as an end-goal promise; people just want this.
Sure, we can either treat this as a safety property or a liveness property. My contention is that (1) we want this to be a safety property, with concrete falsifiable predictions after finite time, (2) any asymptotic argument isn't going to give us that, (3) doing this would make stack allocation observable under almost any conceivable scheme, and we don't want that either. The discussion about bisimulation relations was just another way to make this point in relation to (2).
How about this for an alternative framing of the problem: we want TCE in Rust. But to do that we need to ensure that TCO is possible. So the job for opsem is to make sure that the model validates TCO under the appropriate conditions. If we can do this, then I think it is quite reasonable to say that Rust "has tail-calls" in the AM.
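To make concrete what TCE buys, here is a hypothetical tail-recursive function: without TCE each call pushes a new frame, while with guaranteed TCE (as proposed for `become`) it would run in constant stack space. A plain recursive call is used below, since `become` is not available on stable Rust:

```rust
// A self-recursive function whose recursive call is in tail position.
// Without TCO, countdown(n) uses O(n) stack frames; with TCE it would
// reuse a single frame. `n` is kept small so this runs either way.
fn countdown(n: u64) -> u64 {
    if n == 0 {
        0
    } else {
        countdown(n - 1) // tail call: nothing happens after it returns
    }
}

fn main() {
    assert_eq!(countdown(10_000), 0);
}
```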
Tbqh, I'm not sure. The reason for discussing it, at least, is that when I mentioned that the application of TCE to `become` would likely be a nonguaranteed QOI, the people in the RFC thread said that `become` isn't sufficient for their use without a full guarantee of TCE. (Potentially due to my wording it as being guaranteed by the target?)

Personally, I think treating TCE as a codegen/QOI guarantee provided by rustc (and/or required of alternative Rust compilers) is sufficient, and it shouldn't matter to developers how exactly guarantees are provided. Either way, I do think we'll want to have a standard way of providing/documenting guarantees at this layer.
For a practically-minded reason. If someone writes a Rust compiler, even if they try to do the minimum possible work to satisfy the spec and thus end up with a 'low-quality' implementation, that compiler is still not going to change the asymptotic order of your programs. (More generally, LLVM in debug mode already generates almost the lowest-quality code imaginable in most respects, so it's hard to do worse.)

For tail calls, on the other hand, it's very easy to just not implement them. Alternatively, depending on how the RFC ends up, there may be targets where they're simply impossible to implement, such as ISO C; even a high-effort compiler (or port of an existing compiler) for such a target would lack support. And if a community of users formed around such a compiler, there would be pressure for authors of portable Rust code to support that compiler by eschewing the use of the feature. Of course, a community can form around a compiler even if it's officially nonconforming. But being conforming would give it a greater sense of legitimacy.
On a similar note here, the fact that B won't be 2^100 is safe to classify as QoI because you wouldn't accidentally write a compiler where B is 2^100. If B exists at all, then in practice it will likely have a reasonable value. Well, at least unless you argue that B always exists because the address space is finite. Ideally that would be ruled out, somehow.
I'm not sure this is true. Miri is a perfect example of a standards-conforming implementation which can change the asymptotic order of programs, because the maintenance of shadow state need not be O(1).
IDK if we can bound stack size in any meaningful way. I just watched LLVM invent an 8192-byte stack variable a couple hours ago.
Well, my understanding of "QoI" is that it's by definition not required of alternative compilers. A codegen guarantee that is required of alternative compilers is exactly what I'm hoping for. A guarantee provided by rustc would be better than nothing. But if you can define it precisely enough to guarantee it – either because you can define it precisely, or because you accept an imprecisely-defined guarantee as good enough – then why isn't that definition also good enough to require of alternative compilers? At least ones that target the same targets as rustc.
Good point, but from a practical standpoint Miri is quite different from a typical compiler in terms of people's expectations of portability, compatibility, and speed. (And it's probably not adding more than an O(log n) factor, I think? Though I could imagine some other sanitizer doing so.)
I mean, if an upper bound of 2^100 is allowed, you run into QoI anyways. Report bugs to your favourite compiler of choice.
But it is very much a completely valid implementation nonetheless, and we should write the spec in a way that keeps it (and any other "pure AM implementations") in mind.
Just setting up the borrow stacks is already a linear overhead. This program exhibits quadratic behavior under Miri and linear behavior on a regular machine:

```rust
fn main() {
    let n = std::env::args().nth(1).unwrap().parse().unwrap();
    let alloc = vec![0; n];
    let mut p = &alloc[..];
    for i in (0..n).rev() {
        p = &p[..i];
    }
}
```

I wouldn't be surprised if the overhead is at most linear, though.
This program is visibly quadratic in Miri if you disable the tag garbage collector (via Miri's `-Zmiri-tag-gc=0` flag). We have a tag GC because this complexity growth is well-known and really annoying from a user perspective. The complexity growth occurs in memory usage as well.

All that being said, I do not think anything about Miri can impact whether or not it is possible to bound the stack usage of the interpreted program. The interpreter itself is a resource hog and it effectively leaks memory in a variety of ways, so it would be very good if we could separate consumption of the executed program from impact on the host machine.
@CAD97 wrote:

> Tbqh, I'm not sure. The reason for discussing it, at least, is that when I mentioned that the application of TCE to `become` would likely be a nonguaranteed QOI, the people in the RFC thread said that `become` isn't sufficient for their use without a full guarantee of TCE.

Maybe, there is just a misunderstanding. The RFC (which currently only discusses guaranteed TCE) specifies that the backend for a target can raise a compiler error if the TCE can *not* be guaranteed, so there is the expectation that TCE should be guaranteed if no compiler error is raised.

I also want to add that there are two possible tail call guarantees that would be interesting for users:

- A "performance" guarantee, maybe best described as: this tail call does not create a new stack frame.
- A "semantic" guarantee, that this tail call will not cause stack exhaustion.

I'm happy to update the RFC specifying if it is possible to guarantee and what exactly can and should be guaranteed. Also let me know if there are any questions regarding the RFC.
On Wed, 5 Jul 2023 at 14:20, @phi-go wrote:

> @CAD97 wrote:
>
> > Tbqh, I'm not sure. The reason for discussing it, at least, is that when I mentioned that the application of TCE to `become` would likely be a nonguaranteed QOI, the people in the RFC thread said that `become` isn't sufficient for their use without a full guarantee of TCE. (Potentially due to my wording it as being guaranteed by the target?)
>
> Maybe, there is just a misunderstanding. The RFC (which currently only discusses guaranteed TCE) specifies that the backend for a target can raise a compiler error if the TCE can *not* be guaranteed, so there is the expectation that TCE should be guaranteed if no compiler error is raised.
>
> I also want to add that there are two possible tail call guarantees that would be interesting for users:
>
> - A "performance" guarantee, maybe best described as: this tail call does not create a new stack frame.

Performance is not part of opsem.

> - A "semantic" guarantee, that this tail call will not cause stack exhaustion.

I am not sure this guarantee *is* possible from an opsem perspective, given that the compiler can seemingly choose to exhaust the stack whenever it feels like it.

> I'm happy to update the RFC specifying if it is possible to guarantee and what exactly can and should be guaranteed. Also let me know if there are any questions regarding the RFC.
From an opsem perspective, I agree we probably can't specify "this call does not create a new stack frame": the AM has absolutely no clue what a stack even is, let alone a stack frame. However, we can probably specify the "no stack exhaustion" part, at the very least for the specific case of tail recursion. *puts on opsem hat*

For the more complex case of TCE, it's harder, because you could TCE into a larger function than the current one, and that could OOM. However, I think we can state something which, if I'm right, translates from opsemese as meaning: "assuming the biggest function you call as part of your 'TCE graph' doesn't overflow the stack, then none of the functions in that graph ever will, regardless of how many times you `become` between them".

(This is assuming we also specify that functions allocate all of their locals as part of the call expression, which I think is possible to require in the opsem.)
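The "TCE graph" bound described above can be sketched as a tiny calculation: if every function in a mutually-tail-calling group reuses one frame, the group's total stack cost is bounded by its largest frame, not by the number of tail-call steps taken (the frame sizes below are hypothetical):

```rust
// Sketch: stack bound for a group of functions that only reach each other
// via tail calls. With TCE the frames replace each other, so the bound is
// the maximum frame size in the group, independent of how many tail calls
// are performed.
fn tce_group_stack_bound(frame_sizes: &[u64]) -> u64 {
    frame_sizes.iter().copied().max().unwrap_or(0)
}

fn main() {
    // Three mutually tail-calling functions with made-up frame sizes.
    assert_eq!(tce_group_stack_bound(&[64, 128, 32]), 128);
}
```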
(GitHub borked your email reply formatting again @chorman0773)
The T-opsem side seems to be mostly in consensus that TCE isn't something which would be guaranteed at an opsem level. However, performance and stack non-exhaustion are important properties to developers (especially those that would use `become`).

It doesn't truly matter at what layer the guarantee happens, so long as developers can expect TCE to occur reliably (and thus that a bug preventing TCE in a guaranteed position would be treated as unsound). It can be left as a QOI issue from the formal perspective, so long as rustc provides an implementation promise of meeting the expected QOI. Notably, the RFC participants want a stronger promise than is currently given.
Also frankly, if Miri is a technically nonconforming implementation because of side metrics like big-O memory usage, that's both not a problem and not new - it's already sketchy around plenty of pointer things, it's a semi-hidden extra flag on the target spec, etc.
What does the AM have? I don't quite know what model or definition is being used in discussions like these, which makes it hard to evaluate whether I'm even following the discussion correctly.

In any case, I'm wondering if, instead of bytes on the stack, it would be more appropriate to just count the bindings that have come into scope and haven't left yet. I'll use MIR's definition of local in the following: "Locals: Memory locations allocated on the stack (conceptually, at least), such as function arguments, local variables, and temporaries." Let's say that, at some point during the execution, a local exists if it is a local in the current function, or if it exists in the caller of the current function (or, more specifically, in the function the current function will return to).

Then suppose it's guaranteed that the number of locals in any one function is statically fixed. That's currently the case, right? The MIR for a function has a unique label for each binding; it can't dynamically create more. Even with dynamically sized locals you'd still have a constant count, their size just might vary.

Then suppose these semantics for explicit tail calls: all the locals in the current function go out of scope (stop existing), except the arguments, which become locals of the target function. That is, a tail call changes the number of existing locals by exactly the difference between the callee's and the caller's local counts.

Maybe the part about locals deeper in the call stack is unnecessary, since that doesn't change in any of the relevant operations, but it does make the difference from a normal function call obvious: a function call increases the count by the callee's local count, while a tail call first removes the caller's locals.

While it doesn't by itself exclude overflowing the stack, it does mean that without unbounded non-tail recursion there will never be a situation where there is an unbounded number of locals that need to be held somewhere in a bounded space.
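The counting rules sketched above can be written out as a toy model (the struct and the numbers are purely illustrative, not part of any real AM):

```rust
// Toy model of the "count existing locals" idea: a call adds the callee's
// locals, a return removes them, and a tail call swaps the caller's local
// count for the callee's.
struct AbstractStack {
    live_locals: usize,
}

impl AbstractStack {
    fn call(&mut self, callee_locals: usize) {
        self.live_locals += callee_locals;
    }
    fn ret(&mut self, callee_locals: usize) {
        self.live_locals -= callee_locals;
    }
    fn tail_call(&mut self, caller_locals: usize, callee_locals: usize) {
        // The caller's locals stop existing before the callee's start to:
        // the net change is exactly the difference of the two counts.
        self.live_locals -= caller_locals;
        self.live_locals += callee_locals;
    }
}

fn main() {
    let mut s = AbstractStack { live_locals: 0 };
    s.call(3); // enter f, which has 3 locals
    s.tail_call(3, 5); // f tail-calls g, which has 5 locals
    s.tail_call(5, 5); // g tail-calls itself: count stays at 5
    assert_eq!(s.live_locals, 5);
    s.ret(5); // g returns directly to f's caller
    assert_eq!(s.live_locals, 0);
}
```

The point of the sketch: however many times `tail_call` runs, `live_locals` stays bounded by the largest per-function count, whereas repeated `call` without `ret` grows without bound.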
This was never a problem in the first place. As long as a local is never read again (either by drop, a read, a retag, or some other AM operation which ultimately impacts some observable effect), the compiler is free to not keep it around in physical memory. That's exactly how TCO works. There is no need for the model to explicitly contain a mechanism for that, as the relation between the AM and the real machine has always included the necessary wiggle room (the "as-if" rule, aka observational equivalence) to be able to do optimizations regarding unobservable quantities.
The quoted claim is not really correct. The current working AM model is https://github.com/RalfJung/minirust, and it contains an explicit stack. However, it's not like the kind of stack in a regular machine, and in particular none of the information in the stack frame has to exist in physical memory. The locals themselves are essentially pointers to memory locations (locals have actual addresses and have to exist at those addresses), but because they don't have to remain valid after last use, the only observable effect of TCO is that locals exist at addresses overlapping locals up the stack that have since been discarded. (That is, you can print the address of the local, and then the nested local, and observe that these address values are equal, even though neither local is nominally out of scope yet.) Whether this is currently legal is still under debate, but it is made explicitly legal under every proposed variation on `become`.
I think that is completely pointless when we don't even guarantee anything about the space and time complexity of loop-based algorithms without recursion.
I think that's exactly what rust-lang/rust#113128 does in Miri. (Well, there are some thorny questions around the details.)
I don't know any production language that makes hard time/space complexity guarantees. I doubt those people worry about that when using tail calls (or loops) elsewhere. I don't think they are making a reasonable request.
opsem totally has a stack and stack frames. They just don't live in memory; they are dedicated machine state. We can totally specify that a tail call doesn't grow the stack. We just can't relate that to when and whether a stack overflow occurs in the real execution.
If rustc produces stack overflows with tail calls, that is a bug. We all agree on that. We don't need a formal guarantee in the opsem for this. rustc is expected to be much more than just a conforming implementation of the AM. (And we are not currently in the business of making any statements about other conforming implementations; we are specifying rustc, see rust-lang/rfcs#3355. So there is absolutely no need for an "adversarial" mindset where an implementation somehow tries to cheat and claims it implements Rust but refuses to get tail calls right.)

Formalism is useful for the specification of UB because those details are often subtle, and they form a contract between unsafe code authors and compiler authors. Experience shows that leaving that contract to an English-language document is insufficient: it is too easy to arrive at conflicting interpretations of that document. I don't see any such risk with the question of whether a tail-call-based algorithm overflows the stack or not. So it is just not needed to take out the big formal hammer here.

Of course it would be great to provide actual space and time complexity preservation guarantees for Rust programs. But then obviously we'd want that for loops as well, not just for tail calls. This is a really hard problem, and I can only advise against blocking tail calls on solving that problem. Trying to "solve" this problem with some hacky language aimed specifically at tail calls is totally pointless IMO and doesn't actually help anyone, so let's not do it.
I didn't say anything about a complexity bound there. I mean that tail recursion is something that should happen at a point in time, with attendant effects on the machine state. However, as the rest of the quoted passage explains, I don't know how we can do that without also making inlining observable, and as a result I would push for the opposite: making tail recursion something that can be formally proved unobservable, so that we don't need tail recursion in the model to still get TCO.
That's a compiler PR though? Or are you saying that tail calls would also be part of the AM? The plan I am laying out above would result in an opsem that says nothing at all about tail calls and treats them exactly like regular calls, so there wouldn't be a tail call operation in the AM at all.
Most of the interpreter is in rust-lang/rust, search in that PR's files for it.
I don't think we should apply the term TCO to `become`. It is possible that this is almost unobservable without talking about stack space exhaustion. However, we do need to explain why something like this is UB:

```rust
let local = x;
let ptr = addr_of!(local);
become f(ptr); // assume this loads from `ptr`
```

So IMO the clearest and cleanest way to do this is to simply have actual tail calls in the AM, which pop the old stack frame before pushing the new one. (Yes, we could just have a whole lot of StorageDead before the call.)
Yes.
While I agree that we want `become` to "actually do a tail call", I think it is better to have this fall out of a theorem rather than literally doing a tail call in the opsem.
> While I agree that we want become to "actually do a tail call", I think it is better to have this fall out of a theorem rather than literally doing a tail call in the opsem

Why? We have a new language primitive for a reason. It would be very strange to not define it the way it is intended to work. We should not conflate this with opportunistic TCO on `return`. `become` has a clear intent, and the job of the opsem is to formalize that intent, not to obscure it.
In RFC 3407 (explicit tail calls), there has been a discussion about whether bounded stack consumption can be formally required. Specifically, while stack consumption has not been treated as observable so far, it seems useful to make tail call elimination mandatory rather than just an optimization or a quality-of-implementation matter.
"Formally required" may or may not have anything to do with operational semantics per se. minirust has a concept of `StackFrame`, but the idea that a `StackFrame` requires a certain number of bytes just to store the abstract data like `func` and `caller_return_info` is clearly in the domain of the concrete machine, and thus target-dependent. So the requirement may be better defined as part of what it means to be a conforming Rust compiler for a given target, similar to the handling of `asm!` or even command-line arguments. Or it may be a guarantee given specifically by rustc (but not necessarily other Rust compilers), but I find that unsatisfactory.

Anyway, I am responding to this comment from @digama0, but I agree that the conversation is getting off-topic there, so I'm posting the response here.
@digama0 I see what you're objecting to.
Well, the idea that "a program which makes a finite number of non-tail nested calls does not consume an unbounded amount of stack" can be almost-equivalently expressed without infinity in this form: there is some value B, determined by the implementation of the whole program, such that each call on the abstract call stack consumes at most B bytes of concrete stack (so a call depth of d consumes at most d × B bytes).
Optimizations don't have to keep B the same, just preserve the property that some B exists.
Technically this definition is not equivalent to the "does not consume an unbounded amount of stack" definition, because that definition would allow stack usage to grow super-linearly (e.g. O(n^2) in the number of calls), while the new definition limits it to linear growth.
But B doesn't have to be anywhere near the actual number of bytes used by any particular function. It also doesn't have to be explicitly calculated. It just has to be an upper bound.
For an AOT implementation, B can be calculated relatively straightforwardly once you have the final program. Basically, for every machine code function, calculate its maximum stack usage, including all paths through the function. Then set B to the maximum of all those values.
Compiler optimizations are then handled as follows:

- Outlining: When calculating the maximum stack usage of a function, also include the maximum stack usage of any outlined chunks that it might call, recursively. (There must be a maximum, or else you're saying that outlining could create infinite recursion where it didn't previously exist.)
- Inlining: Not a problem. Inlined calls use 0 bytes of stack, and 0 is less than B. The stack usage for variables from the inlined function was already accounted for in the calling function.
- Tail calls: If they're implemented as true tail calls at the machine code level, then no extra work is needed. If they're implemented by adding some extra 'driver' frame to the stack, then just add the stack usage of that to B.
Again, the important thing is not the value of B; the important thing is the fact that some procedure exists to calculate an upper bound. The procedure needs to account for all compiler optimizations that can change which concrete frames are created for a given series of abstract calls. If such a procedure exists, then the compiler is correct.
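The AOT procedure described above boils down to a single maximum over per-function worst cases. As a sketch (function names and byte counts below are hypothetical):

```rust
// Sketch of the AOT bound: B is the maximum, over all machine-code
// functions, of that function's worst-case stack usage (already including
// any outlined chunks it may call).
fn compute_b(worst_case_usage: &[(&str, u64)]) -> u64 {
    worst_case_usage.iter().map(|&(_, bytes)| bytes).max().unwrap_or(0)
}

fn main() {
    // Made-up per-function worst-case stack usage in bytes.
    let per_fn = [("main", 256), ("parse", 4096), ("helper", 512)];
    let b = compute_b(&per_fn);
    assert_eq!(b, 4096);
    // An inlined call contributes 0 extra bytes, and 0 <= B, so inlining
    // never invalidates the bound.
}
```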
But what about a JIT implementation, where functions might have their stack usage changed on the fly whenever the JIT decides to optimize or deoptimize them, potentially based on data measured during execution of the program? Well, it doesn't make too much of a difference. There must be some maximum amount of stack the program could use across all possible compilations, under all possible measurement outcomes. The maximum may be impractical to calculate, but it exists. If not, then by definition there is some circumstance where you have unbounded stack usage. For example, an optimization that turns dynamically-sized heap allocations into stack allocations must have a limit on the size of allocations that are transformed this way, because otherwise you could get arbitrarily high stack usage by making larger and larger heap allocations.
(What if the JIT upgrades itself while the program is still running? Then you can still define B after the fact with respect to the 'implementation' consisting of the JIT plus all upgraded versions ever used. You have to include the upgrades in the implementation; you can't just say "an implementation is correct if it only upgrades itself to implementations that are also correct". Because otherwise you could have an implementation that 'upgrades' itself every second to a version that uses 1 more byte of stack than the last, or in other words, one that leaks stack at a rate of 1 byte per second. That can't be considered bounded.)