Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal for new Binaryen IR type system #903

Closed
dschuff opened this issue Feb 11, 2017 · 19 comments
Closed

Proposal for new Binaryen IR type system #903

dschuff opened this issue Feb 11, 2017 · 19 comments

Comments

@dschuff
Copy link
Member

dschuff commented Feb 11, 2017

Background: previous general discussion about Binaryen and wasm stack machine is at #663. It turns out Since Binary 0xc, Binaryen's type system has been subtly wrong with respect to the spec, and both V8 and SpiderMonkey were not spec compliant in that way and so nobody noticed. More concretely, @kripken tried to fix that in #899 and that led to some more general discussion about what to do. Probably that discussion should be in an issue and not in a PR, so here's an issue.

@jgravelle-google and I have been discussing it, and we have an idea which essentially carries the discussion in #899 to an end which we think we like, but maybe you can poke holes in it.

Also to recap, the goals are:

  1. Valid wasm is accepted by and convertable to Binaryen IR (but not necessarily losslessly)
  2. Valid Binaryen is convertible to valid wasm
  3. Transforms on Binaryen IR should be context-free rather that context-sensitive (and as a special case, a valid node should be replaceable by another node of the same type and the IR should still be type-valid).

In short, we think we can allow blocks (and everything else in Binaryen IR) to have unreachable type, and basically freeze those types into block signatures (i.e. reify them) when converting to wasm proper. Here are the Binaryen IR properties:

  1. Validation/type check (of nodes other than br) is context-free; in other words a node is valid based on its type and it contents only (the consequence of this is that transforms are also context-free).
  2. Any IR node can have type unreachable.
  3. A node (any kind of node with any type) is valid if:
    a. all operand(s) (including those that follow unreachable nodes in blocks) are recursively valid, and
    b. all operands type-check according to opcode-specific rules (unreachable always type-checks)
    c. A node with unreachable type must have at least one unreachable operand
  4. unreachable nodes may be reified based on their context (see paragraph on conversion to wasm below), and the result remains valid.
  5. br nodes are valid if their operand type matches the type of the target (brs targeting unreachable-typed nodes are invalid)
  6. (special rules for other node kinds?)

On conversion to wasm, types of all block/loop/if must be inferred/reified to concrete types. Inference always happens from context (i.e. outside in) and not from e.g. inside-out. In other words (and in contrast to current Binaryen), block types are not inferred based on their last element, but based on their context; e.g. if they are the operand of a return or are implicitly fall-through-returned, then they are converted to the return type of the function. This happens for every block we emit; we check the context and if it requires a particular type, we emit that type; otherwise it can be void. When this happens, we do not need to reify anything inside (rule 3).

There are a few non-intuitive consequences of these rules.
For example
(i32.add (f32.const 0) (br 0))) is valid (and can have type unreachable) because of rule 3b.

(block f32
 (i32.add 
   (i32.const 1)
   (br 0 (f32.const 0))
 )
)

is valid and the i32.add must have type unreachable. It does not get reified.

(block ;; [unreachable]
 (i32.add 
   (i32.const 1)
   (br 1 (f32.const 0)) ;; targets outer block
 )
)

If the block is reified to i32 then the add can stay unreachable, or have type i32. If the block is reified to something else then it's still valid (it's the first example again). In this nesting, it never needs to take a type (and it's beneficial).

I think this also means that it's a valid transform to take a node with an unreachable child and make it unreachable.
I think this also has the optimization properties that we want, and that conversion from wasm is workable but we haven't thought about all the permutations yet, so... yeah, comments and we can see if it all holds up.

@jgravelle-google
Copy link
Contributor

Thinking more about taking a node with unreachable child and making it unreachable:

I'm going to use U as shorthand for unreachable for the rest of this comment.
Also ([letter]...) is an arbitrary sequence of AST nodes that'll get backreferenced later.

(block f32
  (A...)
  (block U
    (B...)
  )
  (C...)
)

Is it always valid to transform this to:

(block U
  (A...)
  (block U
    (B...)
  )
)

My gut says yes. This is a form of DCE, and the bubbling out of U opens up more opportunities later on.
Further, we got the outer block with type f32 because the context it's in requires it. Therefore we can re-write the type later when we serialize to wasm. So:
(func (result f32) (block f32 (A...) (block U (B...)) (C...))) ->
(func (result f32) (block U (A...) (block U (B...)))) ->
(func (result f32) (block f32 (A...) (block f32 (B...))))

Where can't we bubble out U? if is one such case. For example given

(if i32
  (condition)
  (block U ...)
  (block i32 ...)
)

, we can't rewrite the if with U because it is not always unreachable, so any bubbling out has to stop there.

How does this interact with br_if and br_table? br_if and br are modeled in Binaryen IR as Break nodes, which optionally take a condition and optionally a value. An unconditional break is always unreachable. A conditional break has the value that it passes. That value might be unreachable, and in that case the break should take unreachable as its value and continue propagating it.

Presumably we want a single pass to propogate unreachability as far as it can go that we run early, followed by unreachable-DCE?

@sunfishcode
Copy link
Member

Why is the first example, (i32.add (f32.const 0) (br 0))), valid, under rule 3b or otherwise? The f32.const is reachable (assuming the sequence is reachable), so it's not clear why it wouldn't be checked under the opcode-specific rules for i32.add.

(Historical note: around the 0xc timeframe, the spec actually explicitly permitted implementations to ignore type errors in unreachable code, so implementations weren't wrong or non-compliant.)

@jgravelle-google
Copy link
Contributor

I believe that's a misunderstanding we had while drafting the post that wasn't fixed after the fact - we noticed for the next two examples and you're right it's not valid.
Cutely, (i32.add (br 0) (f32.const 0)) I believe is valid wasm, because the linear form is f32.const 0; br 0; i32.add. But it's saner to not let that typecheck in binaryen IR?

@binji
Copy link
Member

binji commented Feb 11, 2017

@jgravelle-google the order doesn't flip, so

(module
  (func
    (drop (i32.add (f32.const 0) (br 0)))))

desugars to

(module
  (func
    f32.const 0
    br 0
    i32.add
    drop))

which validates because the type stack at i32.add has [any, any]. The other order desugars to

(module
  (func
    br 0
    f32.const 0
    i32.add
    drop))

which does not validate because the stack has [any, f32].

@kripken
Copy link
Member

kripken commented Feb 11, 2017

reify

Does this mean emiting a proper wasm type for a block/if/loop when converting to wasm? Or something more?

(brs targeting unreachable-typed nodes are invalid)

Does this mean that a block has type unreachable if and only if

  1. No brs target it.
  2. Its last element has type unreachable?

Presumably we want a single pass to propogate unreachability as far as it can go that we run early

I'm not sure I understand what this propagation mechanism is. Would we need to propagate unreachability after each transformation? (I hope not?)

On the general approach described in this issue: I think it sounds promising. I've also been moving in the direction of thinking we should keep the unreachable type. I had some different ideas about how to emit proper wasm from such an IR. The ideas are not really complete yet, but something like this: If an unreachable block/if/loop is one that cannot be flowed out of, then it is always valid to "fix" things with two unreachable nodes, that is, given this binaryen IR:

(block U
  ..code..
)

then it is always valid to emit this wasm:

block
  ..wasm code..
  unreachable
end
unreachable

The first extra unreachable makes the block valid, since (a) by assumption it cannot be exited, so it has no brs, and so no brs with a value, and (b) by the same assumption, we cannot actually reach that new unreachable node, so execution will not change; and so it is ok to emit the block in wasm as having no type. (This first unreachable may not be necessary, actually, depending on other type system details.)

The second unreachable, as the first, will not change execution, so it is ok to add in that sense. It fixes our main problem, which is that it makes it ok that the block before it has no type, and the unreachable makes the stack polymorphic so it will validate no matter what tries to pop it.

If that makes sense so far, then this can actually be taken a step further: again, if an unreachable block is one that cannot be exited, then we can just emit

..wasm code..
unreachable

i.e., don't emit a block at all! Just emit the block contents, then an unreachable. This is valid stacky wasm, I believe.

(Also in fact the unreachable is probably not needed, since the ..wasm code.. will end with something unreachable anyhow, but that might depend on other type system details.)

@kripken
Copy link
Member

kripken commented Feb 11, 2017

A shorter way to summarize the last part of my last comment is that, if we allow blocks with unreachable type, and if that means the block is not flowed out of, then we can either (1) emit a proper type based on the outside context, which is what I understood @dschuff's proposal to do - is that correct? - or, (2) emit something like

block
..
end
unreachable

which in effect puts a block on the stack, and makes it so it's ok to pop from the stack. So this can be seen as a wasm way to emit an unreachable block. Just instead of having the unreachable type as the block type, it's an extra byte afterwards. (And, while efficiency isn't the main issue here, we don't even need to waste a byte, as I think we can just not emit the block begin and end, it would still be valid stacky wasm, so we actually save a byte.)

@ghost
Copy link

ghost commented Feb 11, 2017

Let me add one other option to ponder. If there is an unreachable operator used deep in an expression then the expression can simply be split out into a series of live sub-expressions and the rest discarded. This can be done trivially when the IR is in a flat (no-expression) form in which all operators with a result set an SSA definition. I suspect this is why I did not find it necessary to propagate an unreachable type through expressions in general with one IR I am exploring, rather just had to deal with some block related issues.

Writing an explicit unreachable operator after a block looks frustrating to handle, and perhaps it would be easier to work with it this is a flag on the block operator.

Here's another way to look at this: let blocks have no fall-through, so if there is one then the block must end in a br operator, then the presence of that trailing br also flags that the block has a reachable exit. One of the IR's I am exploring ends blocks with a values operator if reachable, similar to this br. Just another possibility.

I still have not got up to speed on the proposed validation rules, but cases such as the following illustrate some of my concerns, and could binaryen really read this into an AST, are values effectively materialized. Is this (i32.add (unreachable)) or (i32.add (unreachable) (unreachable))? There might be solutions such as defining operators such as i32.add to default arguments not supplied, but then it would default to (i32.add (i32.const 0) (i32.const 0)). I see more challenges getting something consistent with multiple values going down this path.

block i32
  unreachable
  i32.add
end

@dschuff
Copy link
Member Author

dschuff commented Feb 14, 2017

Responding in order of original comments:

First, (and to @kripken's question about reifying and @jgravelle-google's suggested transform) I guess we should be a little more precise with what that means, because I think it's actually 2 things. One is an IR->IR transformation, changing the type of a node with unreachable type to another type (I'll call that a "reifying transform") and the other is what type you write when you convert to wasm (I'll just call that "lowering"). Reifying during lowering is obviously a requirement to generate valid wasm block signatures. But with respect to the IR itself, the original post left an open question: "must nodes which are in fact unreachable have unreachable type?" Which I think is equivalent to "are reifying transforms always/ever valid?", and "can/must unreachable operands 'bubble up' to to their parent?"

@sunfishcode (i32.add (f32.const 0) (br 0))) is valid in the spec interpreter (assuming the br targets a void block) even though the f32.const is reachable, because the stack is polymorphic after the br. However (i32.add (br 0)(f32.const 0) ) is invalid because the f32 goes on the stack after the branch. (edit: just as @binji and @jgravelle-google said in the followup comments). Rule 3b was originally written such that any unreachable operand made its parent valid but this is an example of why that shouldn't be. So we could define the validation rule for binary operators to reflect wasm's order dependency in case of unreachable code, or we could keep it symmetric (i.e. i32.add is invalid in binaryen if either of its operators is the wrong type).

re: @kripken's second question: yes, my proposal is your (1): when lowering, emit a proper wasm type based on the outside context. We still want to have validation rules which are defined entirely in terms of Binaryen AST; hence the 2 options I just mentioned for binary operators. Both possibilities allow straightforward lowering to wasm, I guess it's just a question of whether an asymmetric validation rule makes optimizations more difficult. I suppose you could just model it the way you model other side effects when deciding if a transformation like commuting operands is valid, so maybe it's not so bad.

Adding unreachable and dropping dead code sounds like a good optimization but I'm not sure we want to require that for emitting wasm, and I think it would be nice if we didn't have to add extra instructions.

I think if as in @jgravelle-google's example might be a special case, need to think about that.

@ghost
Copy link

ghost commented Feb 14, 2017

On the if operator, if either the then or else branch exit is reachable (or even if there is a branch to the exit from within one of these paths that is reachable and both fall-through paths are unreachable) then it still has a reachable type.

@kripken
Copy link
Member

kripken commented Feb 14, 2017

Seems like overall we are pretty much in agreement here.

So we could define the validation rule for binary operators to reflect wasm's order dependency in case of unreachable code, or we could keep it symmetric

I think we should keep it symmetric as it currently is, because wasm's asymmetry is only due to the stack machine model (I believe?) and in our AST an asymmetry would be surprising. (Of course DCE makes this kind of a moot point anyhow.)

Adding unreachable and dropping dead code sounds like a good optimization

Is that about my proposal to emit an unreachable block as block [..] end unreachable? If so, it's a side issue that it's an optimization. The main point is that that is the "natural" way to emit an unreachable block in wasm, in the sense that it's a simple pattern that always works regardless of context.

Now, maybe that doesn't matter if we can find the right i32/f32/etc. type for the block during lowering anyhow - if we can, then that's a nice clean solution. But I am not 100% sure we can.

Another issue to consider is reading the output back into binaryen. If we emitted block [..] end unreachable as I proposed, then we would recreate the block as none naively, and if we add a type during lowering, then we'd recreate the block as having that type. It would be nice to recreate it as unreachable somehow?

@dschuff
Copy link
Member Author

dschuff commented Feb 14, 2017

@Wllang my understanding is that your last example is valid because the polymorphic stack supplies (any number of) whatever types are needed. I'm not sure I'd want AST with default arguments though; I haven't really thought through the consequences of that but it seems like it would be surprising. WRT if, that sounds about right. It feeds into @jgravelle-google's earlier example: if you modeled the arms of the if as implicit blocks then an optimization which eliminated unreachable blocks with no side effects could also be applied to if (e.g. in that example you could invert the condition and remove the unreachable block).

@kripken WRT symmetry: obviously symmetry would make some optimizations simpler, but it could leave performance on the table by defeating other optimizations; e.g. if we have (i32.add (block i32 (A)) (block i32 (B))) and some optimization renders the tail of block B unreachable (perhaps by turning br_if into br). If A has no side effects we can change A to anything or remove the add entirely. However suppose A does have side effects; say it's ((drop (call $foo [returns f32])) (get_local $bar [returns i32])). If the add is valid no matter what, we can remove the get_local and we don't care that the type is wrong. But if we have symmetry then the type of the block has to stay i32 and we have to leave the get_local (or replace it with unreachable). Maybe that's a contrived example but it illustrates some of the tradeoffs.

WRT finding the right type during lowering for unreachable blocks: Every block is in some context, and has some parent. I think the type should be unambiguous in most cases (e.g. binops, conversions, etc). Possible exceptions might be cases like if or select where the types just have to match (if the type of the other operand is also unreachable, they could both be void), or if the parent is a block (then it would depend on the validation rules for blocks).
Speaking of validation for blocks, IIRC we currently require dropping results of all operands in the list except for the last, correct? In that case it would always be clear what the required type is. If we allowed blocks to be a little more stacky (e.g. any one operand could be left instead of just the last) then we would have to scan the block for non-unreachable operands before assigning a type for unreachable operands, but it seems like it would still be tractable.

Wrt round-tripping (or just plain creating binaryen IR from wasm): not sure, it kind of ties in to whether we require blocks whose end is unreachable to have unreachable type. If so, then during wasm parsing it seems like we'd need to propagate unreachables outward through the scopes when we find an unreachable node (and likewise during optimizations that create unreachable nodes). In that case I think we'd end up with the same IR whether the extra unreachable node was present or not?

@kripken
Copy link
Member

kripken commented Feb 14, 2017

@dschuff:

Wrt symmetry: yeah, I see what you mean. I believe that is currently handled by DCE, though - it should remove all unneeded code including the toplevel add, where possible. So this is a little complexity but it seems worth it.

Wrt finding the right type during lowering for unreachable blocks: Yeah, its those cases like if etc. that worry me. It's not enough to just look up the stack, in general, you may need to look up, reach an if, then go down into the ifs other arm, and so forth.

Speaking of validation for blocks, IIRC we currently require dropping results of all operands in the list except for the last, correct? In that case it would always be clear what the required type is.

All operands but the last must be dropped, yes. (But if the last is unreachable, then we need to look at the brs to it to find the block type.)

Wrt round-tripping: yeah, maybe that isn't much different between the two proposals. Workable in either one, I suspect.

@dschuff
Copy link
Member Author

dschuff commented Feb 14, 2017

Wrt finding the right type during lowering for unreachable blocks: Yeah, its those cases like if etc. that worry me. It's not enough to just look up the stack, in general, you may need to look up, reach an if, then go down into the ifs other arm, and so forth.

I think for if you don't. If both arms of an if or select are unreachable, then the if is void. If one is unreachable, then its type is the type of the reachable arm. But in that case you wouldn't be inferring the if's type, the type of the if would be i32 already. To find the type of an unreachable block which is the arm of an if you just look at the type of the if containing it, just like any other construct.

That leaves the more interesting case of parametric operators, select and drop. If you have (drop (block U (... (br 1) )) then there is no single requirement for the type of the block except it can't be void. I guess in that case for lowering to wasm you'd just pick i32 arbitrarily, or you could omit the drop and use void. You'd of course also want an optimization to remove the drop but that should be orthogonal.

select is also a little more interesting because it's parametric too and doesn't have a type signature in wasm, so you wouldn't need to infer it to lower it; but it would have a type in Binaryen. You could have (select (br) (br)(condition)) or (select (br) (block i32 (i32.const 0)) (condition)) (select (i32.const) (br)(condition)), which would all be valid wasm. If you had (select (br 1) (block U (br 2)) (condition)) or (select (i32.const 0) (block U (br 2)) (condition)) then how would you pick the type to lower the block? You'd want to pick the type based on the type of the select in the IR. I would think it would have type unreachable in the first case and i32 in the second case. In the first case you could go up another level and infer the select from its context.

Thinking more about blocks: could say a block has type unreachable if any of its elements is unreachable, and if no br targets it (regardless of its last element). An example would be (block (br 1) (i32.const 0))
In that case we might (or might not) need to add an unreachable element at the end. (I think this is independent of reifying though); i.e. for lowering you'd pick a type based on the context, and then if the type was not void and the last element was not void or unreachable, you'd add an unreachable or drop at the end to make it valid wasm. I don't think you'd need to add anything after the block. To do a reifying transformation you could infer a type from context and leave the contents of the block alone.

The alternative would be to say that a block is unreachable if no br targets it and its last element is unreachable. Then (block (br 1)(i32.const 0)) would have type i32. Not sure if there's a reason to prefer either of those options but the first seems more flexible?

It kind of ties into the question of whether blocks or other elements which could have type unreachable must have that type. I think it could work either way for emitting wasm; for creating binaryen IR from wasm and for transformations, it would be a bit of extra work to potentially fix up the type of a node that is created or made unreachable by a transformation, but it sounds like that might be a property we want for round-tripping anyway.

@kripken
Copy link
Member

kripken commented Feb 15, 2017

I think for if you don't. If both arms of an if or select are unreachable, then the if is void. If one is unreachable, then its type is the type of the reachable arm. But in that case you wouldn't be inferring the if's type, the type of the if would be i32 already. To find the type of an unreachable block which is the arm of an if you just look at the type of the if containing it, just like any other construct.

Yeah, thinking some more, that does sound right.

(Except for If both arms of an if or select are unreachable, then the if is void which I assume was a typo, the if might need a type if the parent requires it.)

Thinking more about blocks: could say a block has type unreachable if any of its elements is unreachable, and if no br targets it (regardless of its last element).

That does seem like a good way to define it.

if the type was not void and the last element was not void or unreachable, you'd add an unreachable or drop at the end to make it valid wasm.

I don't think that's necessary? If the block type is i32, and there is a br in the middle, then it shouldn't matter what the last element in the block is, the stack is polymorphic, anything goes.

@kripken
Copy link
Member

kripken commented Feb 15, 2017

Exploring this issue, I implemented @dschuff's rule for block types (unreachable iff an element is unreachable, and no breaks), and the simple "emit unreachables to make stuff work" approach. I realize we haven't agreed here yet, I just wanted to see how this can work. Here is the diff.

  • It appears to achieve the goal: I ran a bunch of tests, all pass in v8. (I am actually not able to test in sm right now, because it has the extra rule of disallowing dead code entirely, and the extra unreachables hit that.)
  • Implementing this was simple. The only semi-tricky part was the new type rule for blocks. That hit a few minor things like the relooper fix (was missing a finalize).
  • This makes us fail on the spec tests. I didn't look into it, but it could be the typing change for blocks, which I guess doesn't match wasm anymore. If this is tricky to figure out, and depending on how urgent this is to fix, I think we could consider disabling the spec tests - they were crucial for getting up and running, but now that we have large working test suites, and are diverging more from wasm anyhow, perhaps at some point we can just drop them.

@eholk
Copy link
Contributor

eholk commented Feb 15, 2017

In short, we think we can allow blocks (and everything else in Binaryen IR) to have unreachable type, and basically freeze those types into block signatures (i.e. reify them) when converting to wasm proper.

mir2wasm does something similar to this. Rust has an unreachable type, !, for functions that never return. In general we map Rust types to an appropriate binaryen type, so for example all pointers become i32s. For !, mapping this to void would be reasonable, and this works for functions returning !, but because we can't declare void locals, we are careful to avoid creating locals at all for ! "values". I don't think this is quite the same thing we're talking about here though.

@eholk
Copy link
Contributor

eholk commented Feb 16, 2017

Okay, I just finished reading this thread, so here are some thoughts of varying degrees of vagueness, relatedness, and self-consistency.


Regarding symmetry, I think there's a higher level question of whether the binaryen IR defines an evaluation order (although I realize everyone else in the thread probably already knows if binaryen defines an order). If it does, then obviously you have to respect that, but if not then you have a lot more freedom. For example, if you don't define an evaluation order, the following transformation is always valid. (I'm using ⊥ to represent unreachable)

(i32.add (A...) (B... : ⊥))   =>  (B... : ⊥)

That obviously gives you a ton of freedom in eliminating dead code, although I expect these semantics would be surprising to binaryen users.


I think the safest way to think of unreachable or ⊥ is that is a fresh type variable. This is how the Wasm type system works. In this way, unreachable isn't really more as a type, but more of an assertion that we don't know anything about that context.

A consequence of this is that you don't get to propagate unreachable. So in our i32.add example, we'd have something like:

(i32.add (A...) (B... : t1))

where t1 is a new type variable. Because it occurs in a context that needs an i32, then we'd set t1 to i32. This also means that something like

(i32.add (A... : f32) (B... : ⊥))

is always illegal, because there's an f32 value in an i32 context.


Another way to think of unreachable is as an effect. I think this way makes more sense if you want to reason about DCE or code reordering, or whether unreachable infects its context.

I think unreachable as an effect is pretty much orthogonal to unreachable as a fresh type variable, so it might actually make sense to do both of them. If that's the case, then it would probably reduce confusion to be explicit about when we are talking about each sense of unreachable.

With unreachable as an effect, this isn't so much a type, but another attribute on expressions. So then in the same way you can say "this expression has type i32 and writes memory as an effect" you could say "this expression has type f32 and has the effect of not returning." The type and various effects (writes memory, traps, doesn't return, calls functions) are each independent properties.


As far as when you go to reify an unreachable block (and here reify basically means "produce a concrete solution to all the constraints on the type variable"). Most of the time this will be easy, because most things in Wasm (and presumably binaryen) have concrete types. I'm skeptical that this is always going to be the case (or at least, care will be needed to ensure this property if it's wanted).

In the most general case, you'd have to do something like Hindley-Milner type inference. At the end you still might have type variables, but this means nothing observes that particular type and thus you can pick whatever you want (such as i32). Basically, in this situation, the type doesn't matter, but Wasm requires a type annotation, so you have to invent something.


I'm not sure how helpful this is, but one way to think of type systems is that they define a logic that you can use to prove properties about programs. One way you might use this in binaryen is as a tool to decide whether it a particular optimization is correct and applicable.


Going back to @dschuff's original requirements, here's the way I would think of the requirements.

  1. We want a language BinaryenIR, and a language Wasm, each with an associated type system, where WellTyped[L](M) means that module M is accepted by L's type system.
  2. We want a function WasmFromBinaryen, which is compiles BinaryenIR to Wasm.
  3. We also want a function BinaryenFromWasm, which converts a Wasm modules to binaryen's internal representation.
  4. WasmFromBinaryen(BinaryenFromWasm(M) is not necessarily equal to M.
  5. BinaryenFromWasm(WasmFromBinaryen(M) is also not necessarily equal to M.
  6. If WellTyped[BinaryenIR](M) then WellTyped[Wasm](WasmFromBinaryen(M))
  7. If WellTyped[Wasm](M) then WellTyped[BinaryenIR](BinaryenFromWasm(M))
  8. All of binaryen's optimizations and transformations preserve WellTyped[BinaryenIR].

If you were doing a paper on this, most of the paper would be focused on proving 6 and 7. It's good to be really precise and explicit about what is BinaryenIR and Wasm and think of them as different languages and type systems entirely, especially because they will in practice be very similar. For example, I've seen some papers which use different colors for code in the source language and the target language to keep from mixing them up. As a bonus, you might also want to define a relation between BinaryenIR types and Wasm types (such as i32[BinaryenIR] ~ i32[Wasm], etc.) and also show that the various translations between languages preserve type relations.

@kripken
Copy link
Member

kripken commented Feb 16, 2017

Good summary. Yeah, I think 1-8 are exactly what we want here. Earlier on we had hopes of having binaryen use pure wasm, so there wouldn't be two type systems, but we wasm evolved we've diverged. so formally, we do have two now.

About unreachable being a type vs an effect, I'm not totally sure I see the difference in our case, but the changes proposed in this issue do make unreachable properly describe an ast node that is not "reachable" in the naive sense - you can't get exactly there (you can get out of it, though, if you branch to some outer scope).

@kripken
Copy link
Member

kripken commented Sep 21, 2017

This was done in #911 (and followups mentioned above).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants