Knowing the maximum depth of locals #23
One question is whether the dynamic sizing of the local array introduced by let […]
Also discussed here: #7, though the conversation veered off-topic a bit.
A related issue is that […]

Something I've wondered about but haven't found in a prior discussion is whether one could address the initialization problem by having label types list the local variables that are uninitialized (or, technically, not necessarily initialized). That would separate […], where […]
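To make that idea a little more concrete, here is a purely hypothetical sketch of how a validator might represent such label types and check branches against them; nothing like this exists in the proposal, and all names are made up:

```cpp
// Hypothetical sketch only: a label type that also lists the locals which
// are *not necessarily initialized* at that label.  The names (LabelType,
// maybe_uninit, branch_ok) are made up for illustration.
#include <cstdint>
#include <set>

using LocalIdx = uint32_t;

struct LabelType {
  // Locals the label does NOT guarantee to be initialized.
  // (The usual result types are omitted for brevity.)
  std::set<LocalIdx> maybe_uninit;
};

// A branch is valid if it does not rely on more initialization than the
// label promises: every local that is currently not necessarily initialized
// must also be listed in the target label's type.
bool branch_ok(const std::set<LocalIdx>& current_maybe_uninit,
               const LabelType& target) {
  for (LocalIdx i : current_maybe_uninit) {
    if (target.maybe_uninit.count(i) == 0) return false;
  }
  return true;
}
```

Validation at the label itself would then presumably continue with exactly the label's listed set as the not-necessarily-initialized locals.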
Ah, I missed that. Maybe we should pick one of the two threads to continue this discussion in?
Not an unreasonable use case, but it might be hard to get the handshaking between producer and consumer correct. I also don't know how valuable that is in practice, since producers could have the exception handler call some other helper function to "grow" the stack size. If it's really that rare to have an exception, the call overhead shouldn't really matter.
(I'm running with this thread just cuz the other's last comment was half a year ago, though it's definitely worth reading.)
Good point. But it does make me wonder why a streaming compiler can't simply adjust the stack pointer at the point it sees the let.
Sorry for dropping the ball on this... I think there are a couple of avenues here to avoid reserving the slots.

One is "adjusting the stack pointer", though this looks pretty hairy in practice. The baseline compiler's stack frame is fairly complex (see e.g. https://hg.mozilla.org/mozilla-central/file/tip/js/src/wasm/WasmBaselineCompile.cpp#l1078) and is not getting simpler over time. The dynamic area of the frame (between the SP and the fixed locals) has both eval stack data and outgoing argument areas in the process of being built. A strategically placed LET can require quite a bit of data to be moved around. That said, we currently do that for multiple stack-allocated return values, and clearly it can be done somehow. It would allow the distance between the SP and the lowest-numbered local to be known at compile time, which is desirable.

Another is to backpatch the size of the local area at the end of the function, when we know its maximum size. This can work OK if the fixed parts of the frame can be accessed from the FP and the dynamic parts from the SP, but becomes hairier if we can't split those parts of the frame "just so". We're in the process of implementing tail calls and some call optimizations, and some of the designs we're considering may make the FP unavailable for addressing locals because there may be a variable amount of space between the incoming args and the local part of the frame. (Trampolines may allocate space there.)

There are a lot of moving parts and it would be good not to lock ourselves into a solution that requires very specific implementation techniques, be it a two-pass compiler or multiple pointers into the frame, especially if that lock-in is just a result of improving the life of the producer -- istr we have a design principle about shuffling complexity into the producer when possible (even though we violated that with the initial nullref type, rip). Anyway, being able to pre-allocate the locals area by being told the layout of even the non-default-initializable slots in the function prologue is fairly appealing IMO.
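For what it's worth, a minimal sketch of the backpatching avenue, assuming the fixed locals can be addressed independently of the final frame size. This is not SpiderMonkey code; the Assembler below is a stand-in that merely records an immediate so the example is self-contained:

```cpp
// Sketch of "backpatch the size of the local area at the end of the
// function": reserve the frame with a placeholder in the prologue, track a
// high-water mark of let-introduced locals, and patch in the real size once
// the whole body has been compiled.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Assembler {                       // stand-in, not a real masm
  std::vector<int64_t> stream;
  size_t emitFrameAlloc(int64_t imm) {   // think "sub sp, #imm"
    stream.push_back(imm);
    return stream.size() - 1;
  }
  void patch(size_t at, int64_t imm) { stream[at] = imm; }
};

struct BaselineFrame {
  Assembler masm;
  size_t frameAllocPatch = 0;
  uint32_t fixedLocalsBytes = 0;   // locals declared in the function header
  uint32_t curLetBytes = 0;        // bytes of currently active let locals
  uint32_t maxLetBytes = 0;        // high-water mark over the whole body

  void beginFunction(uint32_t declaredLocalsBytes) {
    fixedLocalsBytes = declaredLocalsBytes;
    // Final size not yet known: emit a placeholder and remember where it is.
    frameAllocPatch = masm.emitFrameAlloc(/*imm=*/0);
  }
  void enterLet(uint32_t bytes) {
    curLetBytes += bytes;
    maxLetBytes = std::max(maxLetBytes, curLetBytes);
  }
  void exitLet(uint32_t bytes) { curLetBytes -= bytes; }
  void endFunction() {
    // Now the maximum size of the local area is known.
    masm.patch(frameAllocPatch, fixedLocalsBytes + maxLetBytes);
  }
};
```

As noted above, this relies on being able to address the fixed part of the frame without knowing the final frame size (e.g. off the FP), which is exactly the constraint that tail calls and trampolines may break.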
A few thoughts:
[…]
What is the expected performance of a […]? Because the missing […]
This inlining of functions is one of the features I particularly like about […]. Also, thanks @lars-t-hansen for the super cool link! Though it'll take me a while to digest all of that 😄
@Horcrux7, there shouldn't be a performance difference in accessing locals. In a baseline compiler, the let-instruction itself might have a cost, but in a regular JIT this should disappear (for example, in V8 all locals and stack operands immediately become SSA variables anyway).

@RossTate, compositionality is one reason for the block semantics, disguised as the fact that the initial Wasm design actually was an expression language, and was generalised to a stack machine only later. But also, blocks would require more type annotations otherwise (the full stack), so that branches could still be validated correctly. A frame-rule-like semantics avoids that.
Okay, but can you confirm my observation regarding @kmiller68's concern that that aspect of wasm's design already prevents allocating the stack frame based solely on the initially declared local variables? That is, […]
@RossTate, I suppose. Another way to put it is that this just gives you a way to name slots on the operand stack. Although what that means in practice very much depends on how a given implementation actually uses the stack, organises stack frames, computes their layout, handles variables and spill slots, and so on. See @lars-t-hansen's comment, for example. I can attest that this can get mind-bogglingly complicated in real compilers.
FWIW I'm implementing […]
So if […]
@RossTate, let is more expressive and has different trade-offs. I had a brief slide about this in my Feb slides.
In #35 I point out that let without a provision for specifying the maximum let size is basically a no-go for an interpreter. Also, de Bruijn indexes (i.e. the inner-most let variable is index 0) would require a second stack pointer in an interpreter. A second stack pointer can be avoided with a maximum let size and by continuing to number let-bound locals increasing from the current local count.
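To illustrate the numbering scheme argued for here, a rough interpreter-frame sketch (illustrative names only, not from any existing engine): with a known maximum let size the locals can live in one flat array sized up front, and a local access is a single index, with no second stack pointer.

```cpp
// Interpreter frame sketch: declared locals plus a reserved area sized by a
// (hypothetical) maximum let size, with let-bound locals numbered upward
// from the current local count rather than de Bruijn style.
#include <cstdint>
#include <vector>

struct Frame {
  std::vector<uint64_t> locals;  // declared locals + reserved let slots
  uint32_t liveLocals;           // declared + currently active let locals

  Frame(uint32_t declaredLocals, uint32_t maxLetLocals)
      : locals(declaredLocals + maxLetLocals), liveLocals(declaredLocals) {}

  // A let introducing n locals appends them after the currently live ones.
  uint32_t enterLet(uint32_t n) {
    uint32_t firstNewIndex = liveLocals;  // upward numbering, not de Bruijn
    liveLocals += n;
    return firstNewIndex;
  }
  void exitLet(uint32_t n) { liveLocals -= n; }

  // local.get/local.set are plain array accesses into one flat array.
  uint64_t get(uint32_t i) const { return locals[i]; }
  void set(uint32_t i, uint64_t v) { locals[i] = v; }
};
```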
Let was removed, so this is obsolete. |
Since the number of locals can now change at runtime, it might be valuable for low-latency consumers (e.g. baseline compilers) to know the total number of stack slots they need to reserve for locals. Otherwise, at least for JSC, we will need to do two-phase generation because we don't know the final stack layout until we are done parsing. For what it's worth, I haven't thought about this too hard, so it could be that there's actually a simple solution on the consumer side, which would remove the need to know everything up front.
It was also brought up that some consumers may want to know the type of each slot. It's not clear to me how that would work, though, since the indices could change type per block. It would be interesting to know if some consumer cares about this, so I thought I would bring it up here too.
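For what it's worth, here is a rough sketch of the extra pass this implies on the consumer side (the BodyReader interface and the let opcode value are placeholders, not the real binary encoding): a streaming compiler only learns how many locals a let introduces when it reaches that let, so the high-water mark, and hence the frame layout, is only known after decoding the whole body.

```cpp
// Sketch of the extra pass a consumer would need without an up-front
// declaration of the maximum number of locals.  BodyReader and the let
// opcode value are placeholders; only the structure matters.
#include <algorithm>
#include <cstdint>
#include <vector>

struct BodyReader {                          // placeholder streaming decoder
  virtual bool nextOpcode(uint8_t* op) = 0;
  virtual uint32_t readLetLocalCount() = 0;  // locals introduced by this let
  virtual void skipImmediates() = 0;         // skip the rest of the opcode
  virtual ~BodyReader() = default;
};

constexpr uint8_t kBlock = 0x02, kLoop = 0x03, kIf = 0x04, kEnd = 0x0B;
constexpr uint8_t kLet = 0x17;               // placeholder opcode value

// Computes the maximum number of locals live at any point in the body.
// Without this number in the function header, a single-pass compiler only
// learns it once the entire body has been decoded.
uint32_t maxLocals(BodyReader& body, uint32_t declaredLocals) {
  uint32_t live = declaredLocals, high = declaredLocals;
  std::vector<uint32_t> letLocalsPerBlock;   // locals added by each open block
  uint8_t op;
  while (body.nextOpcode(&op)) {
    if (op == kLet) {
      uint32_t n = body.readLetLocalCount();
      letLocalsPerBlock.push_back(n);
      live += n;
      high = std::max(high, live);
      continue;
    }
    if (op == kBlock || op == kLoop || op == kIf) {
      letLocalsPerBlock.push_back(0);        // opens a block, adds no locals
    } else if (op == kEnd && !letLocalsPerBlock.empty()) {
      live -= letLocalsPerBlock.back();      // the block's let locals die here
      letLocalsPerBlock.pop_back();
    }
    body.skipImmediates();
  }
  return high;
}
```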