Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can we require bounded stack consumption? #427

Open
comex opened this issue Jul 4, 2023 · 27 comments
Open

Can we require bounded stack consumption? #427

comex opened this issue Jul 4, 2023 · 27 comments

Comments

@comex
Copy link

comex commented Jul 4, 2023

In RFC 3407 (Explicit Tail Calls), there has been a discussion about whether stack consumption can be formally required. Specifically, while stack consumption has not been treated as observable so far, it seems useful to make tail call elimination mandatory rather than just optimization or quality of implementation.

"Formally required" may or may not have anything to do with operational semantics per se. minirust has a concept of StackFrame, but the idea that a StackFrame requires a certain number of bytes just to store the abstract data like func and caller_return_info is clearly in the domain of the concrete machine, and thus target-dependent. So the requirement may be better defined as part of what it means to be a conforming Rust compiler for a given target, similar to the handling of asm! or even command line arguments. Or it may be a guarantee given specifically by rustc (but not necessarily other Rust compilers), but I find that unsatisfactory.

Anyway, I am responding to this comment from @digama0, but I agree that the conversation is getting off-topic there, so I'm posting the response here.

@digama0 I see what you're objecting to.

Well, the idea that "a program which makes a finite number of non-tail nested calls does not consume an unbounded amount of stack" can be almost-equivalently expressed without infinity in this form:

  • For a given implementation and program, there exists some number B such that for all executions of that program under that implementation, the total stack consumption (not counting consumption by unsized locals) is at most B times the number of nested non-tail calls.

Optimizations don't have to keep B the same, just preserve the property that some B exists.

Technically this definition is not equivalent to the "does not consume an unbounded amount of stack" definition, because that definition would allow stack usage to grow super-linearly (e.g. O(n^2) in the number of calls), while the new definition limits it to linear growth.

But B doesn't have to be anywhere near the actual number of bytes used by any particular function. It also doesn't have to be explicitly calculated. It just has to be an upper bound.

For an AOT implementation, B can be calculated relatively straightforwardly once you have the final program. Basically, for every machine code function, calculate its maximum stack usage, including all paths through the function. Then set B to the maximum of all those values.

Compiler optimizations are then handled as follows:

  • Outlining: When calculating maximum stack usage of a function, also include the maximum stack usage of any outlined chunks that it might call, recursively. (There must be a maximum, or else you're saying that outlining could create infinite recursion where it didn't previously exist.)

  • Inlining: Not a problem. Inlined calls use 0 bytes of stack, and 0 is less than B. The stack usage for variables from the inlined function was already accounted for in the calling function.

  • Tail calls: If they're implemented as true tail calls at the machine code level, then no extra work is needed. If they're implemented by adding some extra 'driver' frame to the stack, then just add the stack usage of that to B.

Again, the important thing is not the value of B; the important thing is the fact that some procedure exists to calculate an upper bound. The procedure needs to account for all compiler optimizations that can change which concrete frames are created for a given series of abstract calls. If the procedure , then the compiler is correct.

But what about a JIT implementation, where the program might have their stack usage randomly changed on the fly when the JIT decides to optimize or deoptimize a function, potentially based on data measured during execution of the program? Well, it doesn't make too much difference. There must be some maximum amount of stack it could use for all possible compilations of a given program, under all possible measurement outcomes. The maximum may be impractical to calculate, but it exists. If not, then by definition there is some circumstance where you have unbounded stack usage. For example, an optimization that turns dynamically-sized heap allocations into stack allocations must have a limit on the size of allocations that are transformed this way, because otherwise you could get arbitrarily high stack usage by making larger and larger heap allocations.

(What if the JIT upgrades itself while the program is still running? Then you can still define B after the fact with respect to the 'implementation' consisting of the JIT plus all upgraded versions ever used. You have to include the upgrades in the implementation; you can't just say "an implementation is correct if it only upgrades itself to implementations that are also correct". Because otherwise you could have an implementation that 'upgrades' itself every second to a version that uses 1 more byte of stack than the last, or in other words, one that leaks stack at a rate of 1 byte per second. That can't be considered bounded.)

@comex
Copy link
Author

comex commented Jul 4, 2023

By the way, it may be possible to redefine this in terms of the abstract machine by saying "there is some implementation-defined N such that as long as the maximum abstract stack depth is at most N, the program will not enter a stack overflow state". N would be calculated as the stack size divided by B (the value from my last comment). But I can think of two reasons this is more difficult.

First, given how I've defined B, it may easily be close to or greater than a typical thread stack size just for a single call, if the program contains any functions anywhere that use a huge amount of stack. So N would be 0 or 1, making the guarantee not very useful. This could potentially be ameliorated by, instead of calculating a single maximum stack usage across all functions, saying that each function gets an arbitrary implementation-defined "maximum stack usage" value, and that as long as the sum of all the "maximum stack usage" values of the functions in the abstract the call stack is within an implementation-defined "stack size" value, the abstract machine will not enter stack overflow state. However, this version is more complicated and has more edge cases. An alternative would be to reason about a hypothetical version of the implementation that gives threads larger stacks.

Second, a Unix signal handler can push data onto the stack, and thus potentially cause an overflow, at literally any time during execution. You could potentially subtract some amount from the stack size before dividing by B, effectively reserving that amount of stack for a potential signal handler, though there's no a priori guarantee of how much stack the signal handler might use. Or you could just choose to characterize the implementation's behavior under the assumption there are no signals.

@RalfJung
Copy link
Member

RalfJung commented Jul 4, 2023

For a given implementation and program, there exists some number B such that for all executions of that program under that implementation, the total stack consumption (not counting consumption by unsized locals) is at most B times the number of nested non-tail calls.

Sadly that guarantee is useless in practice because B might be 2^100 or something like that...

I don't think I understand what the motivation is for making this a hard guarantee. Which theorems would you want to derive from this hard compiler promise?

@digama0

(it is a bisimulation relation, meaning that programs which are equivalent on each finite initial segment of the trace are also equivalent on the whole trace),

There are some misconceptions here. "each finite prefix satisfies X" being equivalent to "all (potentially infinite) traces satisfy X" is the definition of a safety property (and whether or not it is a bisimulation is completely irrelevant). However, there are also liveness properties, such as "every thread will either terminate or always eventually take a step again" -- for such a property it is not sufficient to just check all finite prefixes. The Rust compiler makes not only safety but also liveness guarantees (this is what is known in C++ as "forward progress" guarantees).

Bisimulation is a technique to deal with infinite traces and to prove properties about infinite traces; it is particularly relevant for liveness properties since for safety properties one can just look at all finite prefixes, and never consider infinite traces directly.

@RalfJung
Copy link
Member

RalfJung commented Jul 4, 2023

To fit asymptotics into the framework, IMO it makes no sense to just look at stack consumption of tail calls. We'd need a cost model for the Rust AM, and have the compiler guarantee that it matches the cost model. When writing an algorithm in O(n * log(n)), currently a conforming compiler can totally turn this into an O(n^2) output, or even O(2^n). Similar things apply to space consumption: if you allocate O(n * log(n)) memory, we make no guarantees at all how much memory the final program allocates.

I am pretty sure there is some basic PL research on cost models for time and space, and cost-preserving compilation. It's pretty far removed from a language like Rust though.

But focusing on just space consumption of tail calls is pointless IMO. We've treated time and space complexity as a QoI concern so far, why do tail calls require more?

@digama0
Copy link

digama0 commented Jul 4, 2023

I don't think I understand what the motivation is for making this a hard guarantee. Which theorems would you want to derive from this hard compiler promise?

I think it is itself being treated as an end goal promise, people just want this become thing to have guaranteed TCE so they can write scheme in it and so on. (There is a long discussion about this on the linked thread; have a go at trying to argue that this need not be an opsem guarantee if you like.)

There are some misconceptions here. "each finite prefix satisfies X" being equivalent to "all (potentially infinite) traces satisfy X" is the definition of a safety property. However, there are also liveness properties...

Sure, we can either treat this as a safety property or a liveness property. My contention is that (1) we want this to be a safety property, with concrete falsifiable predictions after finite time, (2) any asymptotic argument isn't going to give us that, (3) doing this would make stack allocation observable under almost any conceivable scheme and we don't want that either. The discussion about bisimulation relations was just another way to make this point in relation to (2).

@digama0
Copy link

digama0 commented Jul 4, 2023

How about this for an alternative framing of the problem:

We want TCE in Rust. But to do that we need to ensure that TCO is possible. So the job for opsem is to make sure that the model validates TCO under the conditions where become is legal. One way to express this is to consider an alternative model which has an actual tail-call operation, popping an AM stack frame and jumping to the target function, and it should be a theorem that this altered AM model has the same behaviors as the AM without tail-calls.

If we can do this, then I think it is quite reasonable to say that Rust "has tail-calls" in the AM, and we can give become a spec which is just as secure as loop { ... } in terms of protecting against stack growth. This has to be paired with a solid compiler implementation + QoI promise ensuring no regressions on this front, of course, but I think this should address the concerns of people who want to rely on tail-calls for implementation correctness.

@CAD97
Copy link

CAD97 commented Jul 4, 2023

We've treated time and space complexity as a QoI concern so far, why do tail calls require more?

Tbqh, I'm not sure. The reason for discussing it, at least, is that when I mentioned that the application of TCE to become would likely be a nonguaranteed QOI, the people in the RFC thread said that become isn't sufficient for their use without a full guarantee of TCE. (Potentially due to my wording it as being guaranteed by the target?)

Personally, I think treating TCE as a codegen/QOI guarantee provided by rustc (and/or required of alternative Rust compilers) is sufficient, and it shouldn't matter to developers how exactly guarantees are provided. Either way, I do think we'll want to have a standard way of providing/documenting guarantees at this layer.

@comex
Copy link
Author

comex commented Jul 4, 2023

But focusing on just space consumption of tail calls is pointless IMO. We've treated time and space complexity as a QoI concern so far, why do tail calls require more?

For a practically-minded reason. If someone writes a Rust compiler, even if they try to do the minimum possible work to satisfy the spec and thus end up with a 'low quality' implementation, that compiler is still not going to turn an O(n * log(n))-time program into O(2^n). To get that effect you would have to go out of your way to do something strange. So there is never going to be a serious compiler, with a community of users depending on it, that does that. Therefore it's safe to treat time complexity as QoI.

(More generally, LLVM in debug mode already generates almost the lowest quality code imaginable in most respects, so it's hard to do worse.)

For tail calls, on the other hand, it's very easy to just not implement them. Alternatively, depending on how the RFC ends up, there may be targets where they're simply impossible to implement, such as ISO C; even a high-effort compiler (or port of an existing compiler) for such a target would lack support.

And if a community of users formed around such a compiler, there would be pressure for authors of portable Rust code to support that compiler by eschewing the use of the feature.

Of course, a community can form around a compiler even if it's officially nonconforming. But being conforming would give it a greater sense of legitimacy.

Sadly that guarantee is useless in practice because B might be 2^100 or something like that...

I don't think I understand what the motivation is for making this a hard guarantee. Which theorems would you want to derive from this hard compiler promise?

On a similar note here, the fact that B won't be 2^100 is safe to classify as QoI because you wouldn't accidentally write a compiler where B is 2^100. If B exists at all, then in practice it will likely have a reasonable value.

Well, at least unless you argue that B always exists because the address space is finite. Ideally that would be ruled out, somehow.

@digama0
Copy link

digama0 commented Jul 4, 2023

If someone writes a Rust compiler, even if they try to do the minimum possible work to satisfy the spec and thus end up with a 'low quality' implementation, that compiler is still not going to turn an O(n * log(n))-time program into O(2^n). To get that effect you would have to go out of your way to do something strange. So there is never going to be a serious compiler, with a community of users depending on it, that does that. Therefore it's safe to treat time complexity as QoI.

I'm not sure this is true. Miri is a perfect example of a standards-conforming implementation which can change the asymptotic order of programs, because the maintenance of shadow state need not be O(1).

@chorman0773
Copy link
Contributor

IDK if we can bound stack size in any meaningful way. I just watched llvm invent an 8192 byte stack variable a couple hours ago in -O0.

@comex
Copy link
Author

comex commented Jul 4, 2023

@CAD97

Personally, I think treating TCE as a codegen/QOI guarantee provided by rustc (and/or required of alternative Rust compilers) is sufficient,

Well, my understanding of "QoI" is that it's by definition not required of alternative compilers. A codegen guarantee that is required of alternative compilers is exactly what I'm hoping for.

A guarantee provided by rustc would be better than nothing. But if you can define it precisely enough to guarantee it – either because you can define it precisely, or because you accept an imprecisely-defined guarantee as good enough – then why isn't that definition also good enough to require of alternative compilers? At least ones that target the same targets as rustc.

@digama0

I'm not sure this is true. Miri is a perfect example of a standards-conforming implementation which can change the asymptotic order of programs, because the maintenance of shadow state need not be O(1).

Good point, but from a practical standpoint Miri is quite different from a typical compiler in terms of people's expectations of portability, compatibility, and speed. (And it's probably not adding more than an O(log n) factor, I think? Though I could imagine some other sanitizer doing so.)

@chorman0773
Copy link
Contributor

Well, my understanding of "QoI" is that it's by definition not required of alternative compilers. A codegen guarantee that is required of alternative compilers is exactly what I'm hoping for.

I mean, if an upper bound of 2^100 is allowed, you run into QoI anyways. Report bugs to your favourite compiler of choice.

Good point, but from a practical standpoint Miri is quite different from a typical compiler in terms of people's expectations of portability, compatibility, and speed.

But it is very much a completely valid implementation nonetheless, and we should write the spec in a way that keeps it (and any other "pure AM imlpementations") in mind.

@digama0
Copy link

digama0 commented Jul 4, 2023

(And [Miri]'s probably not adding more than an O(log n) factor, I think? Though I could imagine some other sanitizer doing so.)

Just setting up the borrow stacks is already a linear overhead. This program exhibits quadratic behavior under Miri and linear behavior on a regular machine:

fn main() {
    let n = std::env::args().nth(1).unwrap().parse().unwrap();
    let alloc = vec![0; n];
    let mut p = &alloc[..];
    for i in (0..n).rev() {
        p = &p[..i];
    }
}

I wouldn't be surprised if the overhead is at most linear, though.

@saethlin
Copy link
Member

saethlin commented Jul 4, 2023

This program is visibly quadratic in Miri if you disable the tag garbage collector with -Zmiri-tag-gc=0: https://github.com/rust-lang/miri/blob/addd7c3234dcda54b7e1f296742705adf2975aab/bench-cargo-miri/slice-get-unchecked/src/main.rs

We have a tag GC because this complexity growth is well-known and really annoying from a user perspective. The complexity growth occurs in memory usage as well.

All that being said, I do not think anything about Miri can impact whether or not it is possible to bound stack usage of the interpreted program. The interpreter itself is a resource hog and it effectively leaks memory in a variety of ways, so it would be very good if we could separate consumption of the executed program from impact on the host machine.

@phi-go
Copy link

phi-go commented Jul 5, 2023

@CAD97 wrote:

Tbqh, I'm not sure. The reason for discussing it, at least, is that when I mentioned that the application of TCE to become would likely be a nonguaranteed QOI, the people in the RFC thread said that become isn't sufficient for their use without a full guarantee of TCE. (Potentially due to my wording it as being guaranteed by the target?)

Maybe, there is just a misunderstanding. The RFC (which currently only discusses guaranteed TCE) specifies that the backend for a target can raise a compiler error if the TCE can not be guaranteed, so there is the expectation that TCE should be guaranteed if no compiler error is raised.


I also want to add that there are two possible tail call guarantees that would be interesting for users:

  • A "performance" guarantee, maybe best described as, this tail call does not create a new stack frame.
  • A "semantic" guarantee, that this tail call will not cause stack exhaustion.

I'm happy to update the RFC specifying if it is possible to guarantee and what exactly can and should be guaranteed.

Also let me know if there are any questions regarding the RFC.

@chorman0773
Copy link
Contributor

chorman0773 commented Jul 5, 2023 via email

@T-Dark0
Copy link

T-Dark0 commented Jul 5, 2023

From an opsem perspective, I agree we probably can't specify "this call does not create a new stack frame": the AM has absolutely no clue what a stack even is, let alone a stack frame.

However, we can probably specify the "no stack exhaustion" part, at the very least for the specific case of tail recursion: puts on opsem hat

A recursive function call performed as part of a become expression will either fail to compile or never OOM.

For the more complex case of TCE, It's harder, because you could TCE into a larger function than the current one, and that could OOM. However, I think we can say something like

Let a tailgroup be a collection of functions that become each other. Let (f, g) be a pair of functions in a tailgroup T. If, for any pair (f, g) in T such that f is the current function, the operation of returning from f and then immediately calling g would not OOM, then any become expression in f either fails to compile or will not OOM.

Which, if I'm right, translates from opsemese as meaning "assuming the biggest function you call as part of your "TCE graph" doesn't overflow the stack, then none of the functions in that graph ever will regardless of how many times you become"

(This is assuming we also specify that functions allocate all of their locals as part of the call expression, which I think is possible to require in the opsem)

@CAD97
Copy link

CAD97 commented Jul 5, 2023

(GitHub borked your email reply formatting again @chorman0773)

I also want to add that there are two possible tail call guarantees that would be interesting for users:

  • A "performance" guarantee, maybe best described as, this tail call does not create a new stack frame.

Performance is not part of opsem.

  • A "semantic" guarantee, that this tail call will not cause stack exhaustion.

I am not sure this guarantee is possible from an opsem perspective, given that the compiler can seemingly choose to exhaust the stack whenever it feels like it.

The T-opsem side seems to be mostly in consensus that TCE isn't something which would1 be guaranteed at an opsem level. However, performance and stack non-exhaustion are important properties to developers (especially those that would use become), so there should be, at some level, a guarantee provided by (the totality of) the rustc compiler.

It doesn't truly matter at what layer the guarantee happens, so long as developers can expect TCE to occur reliably (and thus that a bug preventing TCE in a guaranteed position would be treated as unsound). It can be left to a QOI issue from the formal perspective, so long as rustc provides an implementation promise of meeting the expected QOI. Notably, the RFC participants want a stronger promise than is currently given for #[inline(always)].

Footnotes

  1. I remain idealistically hopeful, but not very, that it would be possible to provide a stronger bound to stack exhaustion in the face of become, roughly of the shape that while going through a control flow path prefix already traveled will not cause stack exhaustion if it didn't the first time through. But this bound is so weak as to potentially not even be useful, since iteration counts are sufficient to split the control flow path and reenable the potential for stack exhaustion. It's certainly not of any assistance to further automatic optimization and would only serve as a guarantee of weakly observable codegen properties.

@talchas
Copy link

talchas commented Jul 5, 2023

Also frankly, if miri is a technically nonconforming implementation because of side metrics like big-O memory usage, that's both not a problem and not new - it's already sketchy around plenty of pointer things, it's a semi-hidden extra flag on the target spec, etc.

@quaternic
Copy link

quaternic commented Jul 6, 2023

From an opsem perspective, I agree we probably can't specify "this call does not create a new stack frame": the AM has absolutely no clue what a stack even is, let alone a stack frame.

What does the AM have? I don't quite know what model or definition is being used in discussions like these, which makes it hard to evaluate if I'm even following the discussion correctly. In any case, I'm wondering if instead of bytes on the stack, it would be more appropriate to just count the bindings that have come into scope and haven't left yet. I'll use MIR's definition of local in the following: "Locals: Memory locations allocated on the stack (conceptually, at least), such as function arguments, local variables, and temporaries.".

Let's say that, at some point during the execution, a local exists if it is a local in the current function, or if it exists in the caller of the current function (or more specifically, the function where returning would take us). In other words, the existing locals are exactly those of all the functions in the call stack.

Then suppose it's guaranteed that

The number of existing locals does not change when execution performs any closed loop in the control flow graph, stepping over (non-tail) function calls, and stopping at any return.

That's currently the case, right? The MIR for a function has a unique label for each binding. It can't dynamically create more. Even with dynamically sized locals you'd still have a constant count, their size just might vary.

Then suppose semantics for explicit tail calls: All the locals in the current function go out of scope (stop existing), except the arguments, which become locals of the target function.

That is, a tail call changes the number of existing locals by exactly the difference #{locals in new function} - #{locals in old function}, so a closed cycle of tail calls always has net zero effect.

Maybe the part about locals deeper in the call stack is unnecessary, since that doesn't change in any of the relevant operations, but it does make the difference to a normal function call obvious: A function call increases the count by #{locals in new function} and the return from that function subtracts the same number.

While it doesn't by itself exclude overflowing the stack, it does mean that without unbounded non-tail recursion there will never be a situation where there is an unbounded number of locals that need to be held somewhere in a bounded space.

@digama0
Copy link

digama0 commented Jul 6, 2023

While it doesn't by itself exclude overflowing the stack, it does mean that without unbounded non-tail recursion there will never be a situation where there is an unbounded number of locals that need to be held somewhere in a bounded space.

This was never a problem in the first place. As long as a local is never read again (either by drop, a read, a retag, or some other AM operation which ultimately impacts some observable effect), the compiler is free to not keep it around in physical memory. That's exactly how TCO works. There is no need for the model to explicitly contain a mechanism for that, as the relation between the AM and the real machine has always included the necessary wiggle room (the "as-if" rule, aka observational equivalence) to be able to do optimizations regarding unobservable quantities.

From an opsem perspective, I agree we probably can't specify "this call does not create a new stack frame": the AM has absolutely no clue what a stack even is, let alone a stack frame.

What does the AM have? I don't quite know what model or definition is being used in discussions like these, which makes it hard to evaluate if I'm even following the discussion correctly. In any case, I'm wondering if instead of bytes on the stack, it would be more appropriate to just count the bindings that have come into scope and haven't left yet. I'll use MIR's definition of local in the following: "Locals: Memory locations allocated on the stack (conceptually, at least), such as function arguments, local variables, and temporaries.".

The quoted claim is not really correct. The current working AM model is https://github.com/RalfJung/minirust, and it contains an explicit stack. However it's not like the kind of stack in a regular machine, and in particular none of the information in the stack frame has to exist in physical memory. The locals themselves are essentially pointers to memory locations (locals have actual addresses and have to exist at those addresses), but because they don't have to remain valid after last use the only observable effect of TCO is that locals exist at addresses overlapping locals up the stack frame that have since been discarded. (That is, you can print the address of the local, and then the nested local, and observe that these address values are equal, even though neither local is nominally out of scope yet.) Whether this is currently legal is still under debate, but it is made explicitly legal under every proposed variation on become.

@RalfJung
Copy link
Member

RalfJung commented Jul 7, 2023

Sure, we can either treat this as a safety property or a liveness property. My contention is that (1) we want this to be a safety property, with concrete falsifiable predictions after finite time,

I think that is completely pointless when we don't even guarantee anything about the space and time complexity of loop-based algorithms without recursion.

We want TCE in Rust. But to do that we need to ensure that TCO is possible. So the job for opsem is to make sure that the model validates TCO under the conditions where become is legal. One way to express this is to consider an alternative model which has an actual tail-call operation, popping an AM stack frame and jumping to the target function, and it should be a theorem that this altered AM model has the same behaviors as the AM without tail-calls.

I think that's exactly what rust-lang/rust#113128 does in Miri: become replaces the topmost stack frame. From an AM perspective I think that is all we need to specify tail calls.

(Well, there are some thorny questions around nounwind and type validity invariants, but they are off-topic here.)

Tbqh, I'm not sure. The reason for discussing it, at least, is that when I mentioned that the application of TCE to become would likely be a nonguaranteed QOI, the people in the RFC thread said that become isn't sufficient for their use without a full guarantee of TCE. (Potentially due to my wording it as being guaranteed by the target?)

I don't know any production language that makes hard time/space complexity guarantees. I doubt those people worry about that when using tail calls (or loops) elsewhere. I don't think they are making a reasonable request.

From an opsem perspective, I agree we probably can't specify "this call does not create a new stack frame": the AM has absolutely no clue what a stack even is, let alone a stack frame.

opsem totally has a stack and stack frames. They just don't live in memory, they are dedicated machine state. We can totally specify that a tail call doesn't grow the stack. We just can't relate that to when and whether a stack overflow occurs in the real execution.

@RalfJung
Copy link
Member

RalfJung commented Jul 7, 2023

If rustc produces stack overflows with tail calls, that is a bug. We all agree on that. We don't need a formal guarantee in the opsem for this. rustc is expected to be much more than just a conforming implementation of the AM. (And we are not currently in the business of making any statements about other conforming implementations, we are specifying rustc, see rust-lang/rfcs#3355. So there is absolutely no need for an "adversarial" mindset where an implementation somehow tries to cheat and claim it implements Rust but refuses to get tail calls right.)

Formalism is useful for the specification of UB because those details are often subtle, and they form a contract between unsafe code authors and compiler authors. Experience shows that leaving that contract to an English-language document is insufficient: it is too easy to arrive at conflicting interpretations of that document. I don't see any such risk with the question of whether a tail-call-based algorithm overflows the stack or not. So it is just not needed to take out the big formal hammer here.

Of course it would be great to provide actual space and time complexity preservation guarantees for Rust programs. But then obviously we'd want that for loops as well, not just for tail calls. This is a really hard problem, and I can only advice against blocking tail calls on solving that problem. Trying to "solve" this problem with some hacky language aimed specifically at tail calls is totally pointless IMO and doesn't actually help anyone, so let's not do it.

@digama0
Copy link

digama0 commented Jul 7, 2023

Sure, we can either treat this as a safety property or a liveness property. My contention is that (1) we want this to be a safety property, with concrete falsifiable predictions after finite time,

I think that is completely pointless when we don't even guarantee anything about the space and time complexity of loop-based algorithms without recursion.

I didn't say anything about a complexity bound there. I mean that tail recursion is something that should happen at a point in time, with attendant affects on the machine state. However, as the rest of the quoted passage explains, I don't know how we can do that without also making inlining observable, and as a result I would push for the opposite, making tail recursion something that can be formally proved as unobservable so that we don't need to say tail recursion in the model to still get TCO.

We want TCE in Rust. But to do that we need to ensure that TCO is possible. So the job for opsem is to make sure that the model validates TCO under the conditions where become is legal. One way to express this is to consider an alternative model which has an actual tail-call operation, popping an AM stack frame and jumping to the target function, and it should be a theorem that this altered AM model has the same behaviors as the AM without tail-calls.

I think that's exactly what rust-lang/rust#113128 does in Miri: become replaces the topmost stack frame. From an AM perspective I think that is all we need to specify tail calls.

That's a compiler PR though? Or are you saying that tail calls would also be part of the AM? The plan I am laying out above would result in an opsem that says nothing at all about tail calls and treats them exactly like regular calls, so there wouldn't be a TailCall terminator in the MIR flavor that is interpreted by the AM.

@saethlin
Copy link
Member

saethlin commented Jul 8, 2023

That's a compiler PR though?

Most of the interpreter is in rust-lang/rust, search in that PR's files for rustc_const_eval for the changes to Miri. The code in rust-lang/miri is mostly extension traits that add more UB detection that hooks into the middle of interpretation. And also shims, that are similarly hooked into the midst of the normal interpreter.

@RalfJung
Copy link
Member

RalfJung commented Jul 8, 2023

I don't think we should apply the term TCO to become, since it's not intended to be just an "optimization". IMO the clearest semantics is to just have the AM actually do a tail call.

It is possible that this is almost unobservable without talking about stack space exhaustion. However, we do need to explain why something like this is UB

let local = x;
let ptr = addr_of!(local);
become f(ptr); // assume this loads from `ptr`

So IMO the clearest and cleanest way to do this is to simply have actual tail calls in the AM, which pop the old stack frame before pushing the new one. (Yes we could just have a whole lot of StorageDead before become, but we also implicitly clean up all remaining locals in return so it would be somewhat asymmetric to not also offer the same possibility for tail calls.)

Or are you saying that tail calls would also be part of the AM?

Yes.

@digama0
Copy link

digama0 commented Jul 8, 2023

I don't think we should apply the term TCO to become, since it's not intended to be just an "optimization". IMO the clearest semantics is to just have the AM actually do a tail call.

While I agree that we want become to "actually do a tail call", I think it is better to have this fall out of a theorem rather than literally doing a tail call in the opsem. (In other words, whether the AM actually has a tail call operation or not should not make a difference and we can prove it.) I think the strategy of ensuring that become is possible to TCO formally in the model will also help ensure that TCO is possible in other cases that don't use become. (I'm thinking in particular about issues with address equality and liveness that make TCO impossible even when only copy types are involved, as highlighted by @CAD97 on the RFC.)

@RalfJung
Copy link
Member

RalfJung commented Jul 8, 2023 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

10 participants