Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Knowing the maximum depth of locals #23

Closed
kmiller68 opened this issue May 26, 2020 · 18 comments
Closed

Knowing the maximum depth of locals #23

kmiller68 opened this issue May 26, 2020 · 18 comments

Comments

@kmiller68
Copy link

Since the number of locals can now change at runtime, it might be valuable for low latency consumers (e.g. baseline compilers) to know the total number of stack slots they need to reserve for locals. Otherwise, for at least JSC, we will need to do a two phase generation because we don't know the final stack layout until we are done parsing. For what it's worth, I haven't thought about this too hard though, so it could be there's actually a simple solution on the consumer side, which removes the need to know everything up front.

It was also brought up that some consumers may want to know the types of each slot. It's not clear to me how that would work though, since the indices could change type per block. It would be interesting to know if some consumer cares about this, so I thought I would bring it up here too.

@lukewagner
Copy link
Member

One question is whether the dynamic sizing of the local array introduced by let is worse than that of the operand stack in a way that requires the compiler to do something different than what it already has to do for operands.

@binji
Copy link
Member

binji commented May 26, 2020

Also discussed here: #7, Though the conversation veered off-topic a bit.

@RossTate
Copy link

A related issue is that let does not serve the pattern where a local variable is initialized in the two branches of an if. That's not necessarily an argument against let (it has other merits); it just suggests that let alone is not a solution to local-variable initialization.

Something I've wondered about but haven't found in a prior discussion is whether one could address the initialization problem by having label types list the local variables that are uninitialized (or, technically, not necessarily initialized). That would separate let from initialization and allow code to generate WebAssembly in pretty much the same manner. (If this makes you worry about code size, this aspect of label types can be trivially inferred—even for merge points—except for loops.)

Where let seems particularly useful is when you really do want to increase the stack-frame size. For example, an exception handler might have some local variables (e.g. for storing the exception) that only make sense to allocate space on the stack frame for in the exceptional case.

@kmiller68
Copy link
Author

Also discussed here: #7, Though the conversation veered off-topic a bit.

Ah, I missed that. Maybe we should pick one of the two threads to continue this discussion in?

Where let seems particularly useful is when you really do want to increase the stack-frame size. For example, an exception handler might have some local variables (e.g. for storing the exception) that only make sense to allocate space on the stack frame for in the exceptional case.

Not an unreasonable use case but it might be hard to get the hand shaking between producer and consumer correct. I also don't know how valuable that is in practice, however, since producers could have the exception handler call some other helper function to "grow" the stack size. If it's really that rare to have an exception, the call overhead shouldn't really matter.

@RossTate
Copy link

RossTate commented May 27, 2020

(I'm running with this thread just cuz the other's last comment was half a year ago, though it's definitely worth reading.)

Not an unreasonable use case but it might be hard to get the hand shaking between producer and consumer correct. I also don't know how valuable that is in practice, however, since producers could have the exception handler call some other helper function to "grow" the stack size. If it's really that rare to have an exception, the call overhead shouldn't really matter.

Good point. But it does make me wonder why a streaming compiler can't simply adjust the stack pointer at the point it sees the let and correspondingly adjust all access to the stack frame within the body of the let. Similarly, unwinding code within that scope would use the adjusted stack-frame size. This is what I imagine I'd do in a research compiler, but a production compiler is completely different beast that I still have a ton to learn about, so any insights are appreciated. (Tagging @lars-t-hansen for his insights as well here since my question applies just as well to #7.)

@lars-t-hansen
Copy link
Contributor

Sorry for dropping the ball on this...

I think there are a couple of avenues here to avoid reserving the slots.

One is "adjusting the stack pointer" though this looks pretty hairy in practice. The baseline compiler's stack frame is fairly complex (see eg https://hg.mozilla.org/mozilla-central/file/tip/js/src/wasm/WasmBaselineCompile.cpp#l1078) and is not getting simpler over time. The dynamic area of the frame (between the SP and the fixed locals) has both eval stack data and outgoing argument areas in the process of being built. A strategically placed LET can require quite a bit of data to be moved around. That said, we currently do that for multiple stack-allocated return values, and clearly it can be done somehow. It would allow the distance between the SP and the lowest-numbered local to be known at compile time, which is desirable.

Another is to backpatch the size of the local area at the end of the function, when we know its maximum size. This can work OK if the fixed parts of the frame can be accessed from the FP and the dynamic parts from the SP but becomes hairier if we can't split those parts of the frame "just so". We're in the process of implementing tail calls and some call optimizations, and some of the designs we're considering may make the FP unavailable for addressing locals because there may be a variable amount of space between the incoming args and the local part of the frame. (Trampolines may allocate space there.)

There are a lot of moving parts and it would be good not to lock ourselves into a solution that requires very specific implementation techniques, be it a two-pass compiler or multiple pointers into the frame, especially if that lock-in is just a result of improving the life of the producer -- istr we have a design principle about shuffling complexity into the producer when possible (even though we violated that with the initial nullref type, rip). Anyway, being able to pre-allocate the locals area by being told the layout of even the non-default-initializable slots in the function prologue is fairly appealing IMO.

@rossberg
Copy link
Member

rossberg commented Jun 3, 2020

A few thoughts:

  • As @lars-t-hansen points out, there are many different implementation techniques, so I would be cautious to avoid halfway hacks that only serve particular ones. In particular, just knowing the maximum number of locals, without knowing their types upfront, may be equally insufficient in some contexts.

  • I would assume that the same technique that works for the operand stack (whose max-depth an engine does not know upfront) should work for the "locals stack".

  • One downside of predeclaring locals is that you lose one of the advantages of let, namely, that it composes nicely. E.g., you can create temporaries or inline a function with only a local transformation. That's also the main reason why we originally chose relative indexing for labels and let-bound locals. Predeclarations would be a step back in that sense. (I had complaints from multiple folks writing code-gens for Wasm that introducing a temp cannot be done locally in an emitter. Let would fix this, but with predeclaration we'd be back to square one.)

@Horcrux7
Copy link

Horcrux7 commented Jun 3, 2020

What is the expected performance of a let versus a temporary variable? Also if the let feature is very nice for the compiler, the most important question is which one is better performing.

Because the missing dup my compiler produce already a large count of temporary variables. The let instruction can replace this.

@RossTate
Copy link

RossTate commented Jun 4, 2020

One downside of predeclaring locals is that you lose one of the advantages of let, namely, that it composes nicely. E.g., you can create temporaries or inline a function with only a local transformation.

This inlining of functions is one of the features I particularly like about let. I figured this inlining was an eventual goal of wasm's design, because it otherwise seemed odd that control constructs like block could leave values on the stack that cannot be touched by the body, which I believe effectively prevents predetermining the stack-frame size (but maybe I misunderstood something?).

Also, thanks @lars-t-hansen for the super cool link! Though it'll take me a while to digest all of that 😄

@rossberg
Copy link
Member

rossberg commented Jun 4, 2020

@Horcrux7, there shouldn't be a performance difference in accessing locals. In a baseline compiler, the let-instruction itself might have a cost, but in a regular jit this should disappear (for example, in V8 all locals and stack operands immediately become SSA variables anyway).

@RossTate, compositionality is one reason for the block semantics, disguised as the fact that the initial Wasm design actually was an expression language, and was generalised to a stack machine only later. But also, blocks would require more type annotations otherwise (the full stack), so that branches could still be validated correctly. A frame-rule-like semantics avoids that.

@RossTate
Copy link

RossTate commented Jun 4, 2020

Okay, but can you confirm my observation regarding @kmiller68's concern that that aspect of wasm's design already prevents allocating the stack frame based solely on the initially declared local variables? That is, let isn't providing any new stack-frame functionality; it is just making the stack frame more conveniently accessible.

@rossberg
Copy link
Member

rossberg commented Jun 5, 2020

@RossTate, I suppose. Another way to put it is that this just gives you a way to name slots on the operand stack. Although what that means in practice very much depends on how a given implementation actually uses the stack, organises stack frames, computes their layout, handles variables and spill slots, and so on. See @lars-t-hansen's comment, for example. I can attest that this can get mind-boggingly complicated in real compilers.

@lars-t-hansen
Copy link
Contributor

FWIW I'm implementing let in our baseline compiler to learn more about the pain points. It doesn't look too awful so far. More next week.

@RossTate
Copy link

RossTate commented Jun 8, 2020

Another way to put it is that this just gives you a way to name slots on the operand stack.

So if let does not actually provide any new stack-frame-resizing functionality, as it sounds, then why not just give a way to get/set operands on the stack without let? (I've seen hints that this question has been discussed before, but I haven't been able to find where.)

@rossberg
Copy link
Member

rossberg commented Jun 9, 2020

@RossTate, let is more expressive and has different trade-offs. I had a brief slide in my Feb slides:

Let vs Pick

let is as expressive as pick+swap 
(allows reordering, removing non-top operands)

But mostly complementary

Pick: good for one-off use cases

Let: good for multiple uses, reordering, etc.

More structured and compact than stack twiddling

@RossTate
Copy link

RossTate commented Jun 9, 2020

pick sounds like the analog of local.get. swap does not sound like the analog of set. For fun, let's give the analog of local.set the name pack. pack n replaces the value at slot n in the stack with the top of the stack. It also updates the type of the slot accordingly, disregarding its old type.

pick and pack are actually strictly more expressive than let. In particular, they permit flow-sensitive reasoning, something that will become more and more important. For example, the first thing implementations of virtual methods will do will downcast this to its expected class type. The implementation would only want to bind this to a local variable after the downcast, and likely an immutable one at that, which means it doesn't need a type annotation for that either. In this proposal, you're adding non-null references, which means it will be common to want to update the type of a slot (or local) to the non-null type after the check is performed.

let, however, still has value because it is less expressive. In particular, label types within a let do not need to mention the types of the let-bound slots because those types are fixed, and similarly branches to those labels do not need to check compatibility of those slots. Since the primary utility of let is reduction of type annotation, it's all the more unusual to have let itself require unnecessary type annotations.

@titzer
Copy link
Contributor

titzer commented Aug 3, 2020

in #35 I point out that let without a provision for specifying the maximum let size is basically a no-go for an interpreter. Also, de Bruijn indexes (i.e. inner-most let variable is index 0) would require a second stack pointer in an interpreter. A second stack pointer can be avoided with a maxium let size and continuing to number let-bound locals increasing from the current local count.

@rossberg
Copy link
Member

Let was removed, so this is obsolete.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants