Implement support for tail calls in Wasmtime and Cranelift #1065
Comments
WebAssembly will be getting explicit tail calls soon, so Cranelift will have to support them as well. At the moment, TCE is not supported. I assume we'll add an instruction. |
Agreed. We'll need direct and indirect forms. And, we'll need a new calling convention, since native calling conventions don't support guaranteed tail calling in general. For the calling convention, I imagine we can pick between implementing GHC's and/or HiPE's, and/or inventing something Cranelift-specific. |
I would be interested in this for a project I'm working on.

As far as I understand, the primary issue with calling conventions and tail calls is callee-saved registers. There's generally no efficient way to know, by the time we're in a function prologue, whether they have already been spilled to the stack, and similarly whether we should restore them (and from where) in the epilogue. That's one of the reasons why, AFAIK, all the calling conventions in LLVM that support tail calls don't have any callee-saved registers. That way the tail call can easily deallocate the current frame and just jump into the new function, passing all the arguments in registers or on the stack if needed, without keeping anything on the stack that would need to be manipulated when finally returning.

In LLVM (at least on x64) the HiPE convention means that all integer types are "upgraded" to i64, and all stack slots are 8 bytes wide and 8-byte aligned. It additionally pins 6 registers: one for the process struct (similar to the VM data Cranelift already supports), one for the heap pointer, and 4 for arguments.

Looking through the code and some open issues and PRs, I can see there's quite some work on special-casing some arguments to be pinned in specific registers, etc. I wonder if that could be generalized so that the front-end provides those settings, if needed.

As for what I would consider ideal support for my project: I'd be happy with instructions to execute guaranteed tail calls as well as a way to pin certain registers for needed data structures. I believe it's not very scalable for the code generator itself to support all the various schemes; it would probably be better for the front-end to provide this information somehow.

A separate issue is stackmaps, which is already touched on in a separate PR. |
I am not sure that implementing wasm tail calls will require a specific calling convention, as the set of functions that may be tail-called is not known to the compiler. It is up to the tail-caller to restore registers that it saved when it performs the call, along with popping off any arguments pushed on the stack and possibly shuffling the saved return address. It is always possible to do this, AFAIU. |
@wingo Cranelift will indeed need special calling conventions for tail calls. You're right that if the current tail call proposal advances in its current form, we won't know which wasm functions are tail callable and which are not, but that just means we'll need to mark all wasm functions with the special calling conventions. |
Hah I am an idiot, indeed a new convention is needed -- callees always have to pop spilled arguments, because callers no longer have the information to do this. |
Has any work been done on this lately? |
Could this also be implemented in the x86 codegen? |
Any update on this? Is it easy enough that an outsider could do it? |
Hi @L-as -- no, there unfortunately hasn't been progress on this. I'd love for that to be different; if you or another motivated person wants to tackle it, it's possible, but it's likely a month or more of delicate surgery updating our ABI and runtime code to use a new calling convention that supports tail calls.

It's the sort of problem that feels simple-ish at first -- indeed if you only have args that go in registers, and no callee-saves or spillslots, it can be as simple as some moves to set up the tailcall args and then a jump... but imposing those requirements on the CLIF producer to obtain a guaranteed tailcall would lead to some surprisingly brittle behavior where e.g. changing some unrelated code requires a spill and suddenly breaks the tailcall.

Basically, in the general case, one needs to clean up one's own stackframe, restore callee-saves, and remove one's stack args from the stack, and then set up args on the stack for the callee. But some of those args may need to come from your spillslots or stack args (and if there are enough of them you can't hold all of them in the register file at once), so you need to do the arbitrary-state-permute, possibly with some temps somewhere, or maybe in the worst case you set up the new frame before you pop your own then

The Bugzilla bug for SpiderMonkey's Wasm tail call support here contains a link to a 25-page Google Doc that outlines in good detail the new ABI that the SpiderMonkey folks have designed to support this... it's a lot of work, but understanding that (or at least the reasons for its decisions) would be a good starting point.

Anyway, if you or anyone else has the resources to tackle this and wants to dig in more, I'd be happy to help work out details! |
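To make the "arbitrary-state-permute with some temps" step above concrete, here is a minimal sketch of one common way to turn a set of simultaneous moves into a safe sequence, using a single scratch location to break cycles. Everything here is an illustrative assumption: locations are plain indices, `sequentialize` and `apply` are hypothetical names, each destination is assumed to appear at most once, and the scratch location is assumed not to be used by any input move. It is not how Cranelift's ABI code or register allocator is actually written.

```rust
/// Turn "parallel" moves `(dst, src)` into an ordered list that has the same
/// effect when executed one at a time. Assumes each `dst` appears at most once
/// and that `scratch` is not used by any input move (hypothetical sketch).
fn sequentialize(moves: &[(usize, usize)], scratch: usize) -> Vec<(usize, usize)> {
    // Drop no-op moves such as `r1 <- r1`.
    let mut pending: Vec<(usize, usize)> =
        moves.iter().copied().filter(|&(d, s)| d != s).collect();
    let mut out = Vec::new();

    while !pending.is_empty() {
        // A move is safe to emit if no *other* pending move still reads its
        // destination; otherwise we would clobber a value that is still needed.
        let safe = (0..pending.len()).find(|&i| {
            let dst = pending[i].0;
            pending
                .iter()
                .enumerate()
                .all(|(j, &(_, src))| j == i || src != dst)
        });

        match safe {
            Some(i) => out.push(pending.remove(i)),
            None => {
                // Only cycles remain. Save one source in the scratch location
                // and make every reader of that source read the scratch instead.
                let (_, src) = pending[0];
                out.push((scratch, src));
                for m in pending.iter_mut() {
                    if m.1 == src {
                        m.1 = scratch;
                    }
                }
            }
        }
    }
    out
}

/// Execute an ordered move list against a register/stack-slot file.
fn apply(regs: &mut [u64], moves: &[(usize, usize)]) {
    for &(dst, src) in moves {
        regs[dst] = regs[src];
    }
}
```

For example, rotating three locations (`0 <- 1`, `1 <- 2`, `2 <- 0`) with location 3 as scratch yields a four-move sequence whose first step saves one value into the scratch. A real implementation also has to deal with differing value sizes, with running out of temporaries, and with source and destination areas that overlap on the stack, which is where much of the difficulty described above comes from.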
@cfallin This may be really naïve of me, but what happens if the caller is made responsible for the tail-call elimination? In the simplest case of a recursive tail-call this is just a reentrant function call with new parameters. How much of a pain and/or performance hit would this be? I am just interested in this issue in general, even if it isn't "the best option possible". |
@andrew-johnson-4 can you give a bit more detail what you mean by "caller is made responsible"? E.g., do you mean the callee should return some value that means "now call the next function"? (So The general problem to wrestle with is easiest to see (IMHO) in a minimized example: Then when It's also possible to make it work in a "caller-pop" scheme, but requires dynamic information to be passed around about the size of argument and return-value areas. In the general case, dynamic approaches are less efficient (at the very least they require an extra arg and retval). I suspect that what you're thinking of with "caller is made responsible" is either something like caller-pop, or else is a sort of "trampoline" scheme where the callee returns either "next func to call" or "final return" states and the callsite runs a worklist/driver loop. I'm curious to hear more what you have in mind though! |
Yes, the To synthesize tail-call optimization right now (I want to use cranelift for a pet project) I am looking at the option of writing a function with an I am currently transpiling my interpreted toy language into JIT segments. I have the benefit of whole program optimization, which makes it a bit easier. This I see this as primarily a space optimization problem, not so much a speed problem. For simple functional programs with loops this is a huge deal. "There's no way to satisfy all the constraints for a true tail-call in this case" yes, no unified calling convention. "there's no need to somehow know that the actual returnee had 32 bytes rather than 16 of stack args." that is certainly advantageous to have a unified interface. What would that entail development-wise to support? From a previously linked doc "Agreeing on common conventions allows SpiderMonkey to freely choose the compilation strategy for different WebAssembly functions, and to change that choice over time." This is a design constraint that I specifically don't care about right now, but probably wasmtime does. |
Indeed, that's the "trampoline" approach. It's definitely used in the wild; e.g. when targeting Wasm there's this comment about a Haskell compiler, and IIRC, Clojure provides this (via an explicit API exposed to the user?) as a replacement for tail calls on the JVM. It seems like by far the easiest option at the moment, and could be extended to the corecursive case (tail-call to another function) by returning a "function pointer" (index into function table) as part of the
Probably about a month of effort, depending on who's doing the work and their familiarity with the codebase :-) Adding a new ABI is not too technically challenging on its own, but the cross-cutting interactions with everything else -- backtraces, trampolines elsewhere in the system, unwind info, etc. -- make for a likely somewhat gnarly project. |
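For readers following along, the trampoline approach discussed above can be sketched roughly as follows. This is a minimal illustration in plain Rust, not Cranelift or Wasmtime API: `Step`, `trampoline`, `is_even`, and `is_odd` are hypothetical names, and a Rust function pointer stands in for the "index into function table" mentioned above.

```rust
/// What a trampolined function hands back to the driver loop
/// (hypothetical type, for illustration only).
enum Step {
    /// Final return: the computation is finished with this value.
    Done(bool),
    /// Next function to call, with its argument, in place of a tail call.
    Call(fn(u64) -> Step, u64),
}

/// Mutually recursive pair that would normally need tail calls to run in
/// constant stack space.
fn is_even(n: u64) -> Step {
    if n == 0 {
        Step::Done(true)
    } else {
        Step::Call(is_odd, n - 1)
    }
}

fn is_odd(n: u64) -> Step {
    if n == 0 {
        Step::Done(false)
    } else {
        Step::Call(is_even, n - 1)
    }
}

/// Driver loop at the original call site: each callee returns before the next
/// one starts, so the stack never grows beyond one callee frame.
fn trampoline(mut func: fn(u64) -> Step, mut arg: u64) -> bool {
    loop {
        match func(arg) {
            Step::Done(value) => return value,
            Step::Call(next, next_arg) => {
                func = next;
                arg = next_arg;
            }
        }
    }
}
```

Calling `trampoline(is_even, 1_000_000)` bounces between the two functions a million times without deepening the stack. The cost is the one noted earlier in the thread: every "call" becomes a return plus an indirect call, and every function has to agree on a common return shape.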
I'll try to familiarize myself more with cranelift before I volunteer. There is lots of low-hanging fruit on my end before tce is important. I would be afraid of completely beggaring your devs if I tried now :O |
It's actually somewhat likely that @fitzgen will work on this sometime early next year, or at least, that was my most recent understanding :-) Having others interested in the issue as well is never a bad thing, of course! |
By the way tail calls to |
@cfallin Is it safe to assume that jumping to the entry block is one special case of a true tail-call? Does this violate SSA or other invariants potentially? |
I don't know that we've documented this, but no, Cranelift doesn't allow branching back to the entry block. I know one reason is that there's no block to hoist loop-invariant code into if the loop's entry doesn't have a dominator. I feel like this has come up as a problem in other contexts too but I don't remember them. That said, you can always emit an entry block that immediately jumps to another block with the same signature, forwarding the arguments, and branch back to that second block instead of tail-calling. Either way this only works for a function that tail-calls itself. Cranelift has no way to express a jump to a block in a different function, since block IDs are function-scoped. But in that special case, yes, you could implement tail-calls today. |
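In source-level terms, the workaround described above amounts to something like the following minimal sketch. It is plain Rust rather than CLIF, and `gcd` is just an illustrative function: the function body plays the role of the entry block, the `loop` header plays the role of the second block that receives the forwarded arguments, and the self tail call becomes a branch back to that header with new values.

```rust
/// Self tail call rewritten as a branch back to a loop header.
/// The recursive form would end with `return gcd(b, a % b)`.
fn gcd(a: u64, b: u64) -> u64 {
    // "Entry block": only forwards the incoming arguments to the loop header.
    let (mut a, mut b) = (a, b);
    loop {
        // "Second block": its parameters are the current values of `a` and `b`.
        if b == 0 {
            return a;
        }
        // Instead of tail-calling ourselves, compute the new arguments and
        // jump back to the header with them.
        let next = (b, a % b);
        a = next.0;
        b = next.1;
    }
}
```

As noted above, this only covers a function that tail-calls itself; a tail call into a different function cannot be expressed this way.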
FWIW, I am working on an RFC for this at the moment. |
The RFC in question is bytecodealliance/rfcs#29. |
The big blocker for tail calls was overhauling Wasmtime's trampolines, which I've just posted a PR for: #6262 Will start on Cranelift support for tail calls when that lands. |
So @jameysharp and I did a little profiling/investigation of switching the internal Wasm calling convention over to Supporting caller-save registers with the So here is our updated plan:
Additionally, we would not enable tail calls by default in Wasmtime until the performance issues are addressed. |
FYI, Wasmtime is gaining a As a reminder, tail calls won't be enabled by default until we
|
Tail calls are now enabled by default in Wasmtime except for s390x, so I think this is done. |
To write a functional language compiler using this IR, tail call elimination would be desirable. Are there any plans to support this? I couldn't find any details in the docs.