Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix v8 -O0 take 2 #899

Closed
wants to merge 17 commits into from
Closed

Fix v8 -O0 take 2 #899

wants to merge 17 commits into from

Conversation

kripken
Copy link
Member

@kripken kripken commented Feb 6, 2017

This is an alternative to #897, which turned out to not be enough. This PR has a more general fix, that seems to get everything to run in v8 and sm.

Principles:

  • Eagerly drop things in asm2wasm. If we wait til later to see if we need drops, then we may have already created things like an if i32, which would mean we need to fix up the children of that if, etc. Instead, asm.js input doesn't have fallthrough values anyhow, so just use that information up front.
  • Update validation rules to check that the fallthrough nodes of a control flow structure - if, block, loop - must, if they are control flow structures as well, have the same type. In other words, it's fine for a block i32 to end in a return, which has unreachable type, but if the block ends in another block, the child block must be i32 as well. Control flow structures are special in wasm, it seems.
  • To facilitate that, this changes how finalize works. finalize without a parameter will not remove a type like i32. In other words, if you have a block i32, the finalizing it will not remove that type, even if it has no brs and ends in a return. This lets us preserve the proper types through optimizations. Once we set a concrete type, it stays unless we explicitly remove it.
  • On the other hand, finalize(type) (with a param) does allow changing the type to anything, it just sets the provided type directly, no matter what it is.
  • Both finalize with and without a param propagate the type into child control flow structure nodes. So whether a concrete type already existed or whether we just changed one, we make the children consistent with the parent.

This PR is not ready to land, though, as these changes broke us on the spec tests. It might be best if someone that understands the type checking rules of the spec better than me looks at that part.

…remove it later, as it may be necessary due to wasm rules from the parent (e.g. the function fallthrough may need a block i32 even if the block ends in a return). at the end of asm2wasm, make sure to create the proper types for the function fallthrough, which is special (as asm2wasm doesn't emit other fallthroughs itself, not before optimization). rework how autodrop operates in order to work in this new scheme, by dropping things directly, not looking at higher scopes to see if used
…ing us to propagate a type necessary from the bottom up
… used, and it can make the block have a value, which is wrong
…their children must have the same type as them if they are concretely typed
@kripken
Copy link
Member Author

kripken commented Feb 6, 2017

Ah, I spoke too soon, I see some errors, still not done on the correctness side.

…ns, the fallthrough must be typed that way (unreachable is not acceptable any more). this requires changing the binaryen C API to allow forcing the type of a block
@kripken
Copy link
Member Author

kripken commented Feb 6, 2017

This also required C API changes, to provide the type of a block.

So this was a lot more work than expected, but it does seem to be ok now, nightly fuzzing + test suite appear to pass. The spec tests are still broken though.

@kripken
Copy link
Member Author

kripken commented Feb 6, 2017

Added some more validation improvements. This now seems to pass the spec tests too.

Note on s2wasm: it has tests that contain typed empty blocks (like (block i32)). Those should not validate. I made it add an unreachable to keep the block valid, but if that is in fact something the wasm backend emits, it sounds like a bug? Or if not, perhaps the test needs to be updated.

…ve put an if at the end of a block, and if the block if is typed, then so now must be the if (it must have both arms of unreachable type, so this is ok to do)
@kripken
Copy link
Member Author

kripken commented Feb 7, 2017

In more "test suite is never enough" news, tests pass and fuzzing found nothing, but a failure occurred on Unreal. Fix pushed, details in there.

@kripken
Copy link
Member Author

kripken commented Feb 7, 2017

Fuzzing found another bug. I guess this makes sense - this is a large change to the type system, so fallout is to be expected. It'll take a while to stabilize again.

The last commit fixes it, but also worries me because it seems to indicate we must handle a lot more corner cases in each pass. We need to find a better approach, I think.

@kripken
Copy link
Member Author

kripken commented Feb 7, 2017

Or maybe I still don't understand the new wasm type system rules. Can someone confirm if this should or should not validate?

  (func $return-block-2 (type $0) (param $x i32) (result i32)
    (if i32
      (get_local $x)
      (block
        (set_local $x
          (get_local $x)
        )
        (return
          (get_local $x)
        )
      )
      (i32.const 2)
    )
  )

@kripken
Copy link
Member Author

kripken commented Feb 7, 2017

Also whether this should or shouldn't:

  (func $return-block-2 (param $x i32) (result i32)
    (if i32
      (get_local $x)
      (return
        (block i32
          (set_local $x (get_local $x))
          (get_local $x)
        )
      )
      (i32.const 2)
    )
  )

@kripken
Copy link
Member Author

kripken commented Feb 7, 2017

And the same two questions with the if replaced by a select (I assume the answers are the same, but just to be sure).

@sunfishcode
Copy link
Member

The first example does not validate, and the second does. The answers are the same for select.

When replacing a return, br, br_table, or unreachable with a block, it's necessary to infer the return type from the surrounding context in order to give the block the required signature.

@kripken
Copy link
Member Author

kripken commented Feb 8, 2017

Thanks. Ok, then this is a serious issue. That transform, of

(return
  (block i32
    (set_local $x (get_local $x))
    (get_local $x)
  )
)

into

(block ?
  (set_local $x (get_local $x))
  (return
    (get_local $x)
  )
)

was previously valid, because in the latter case our type system gives the block type unreachable (since the block ends in a return, and we set ? to none). But now wasm doesn't allow that. (Note that I put ? for the type, because i32 would happen to work, but the problem is the outside context might want f64.)

The result is that in our current type system we cannot replace a node in the ast with another node with the same type (unreachable). I think that's a bad property for an optimizing IR, so something needs to change.

One option is to change our type system to disallow unreachable as the type of block,if,loop. That seems to be more consistent with wasm, if I understand it correctly? It's a substantial type system change for binaryen though, so it might be a lot of work.

A downside to doing that change is that it means we can't do the above transformation - it would be disallowed by type system logic as the first has type unreachable and the latter type none - unless we infer the type from the surroundings. In other words, this change would mean that optimizations can't work locally any more, they need to see the surrounding context. That's also a bad property for an optimizing IR.

Instead, another alternative is to keep the type system as it is, i.e. allow unreachable on block, but fix it up when writing the binary. This is actually what we sort of do now, or try to - we write out unreachable as none. But of course that is far from enough. The question is whether a simple fixup can work here. I don't have a clear picture of that yet. Even if possible, it would mean another divergence of our IR from wasm, which we've tried to avoid (but couldn't always, like the stack machine change).

@ghost
Copy link

ghost commented Feb 8, 2017

Instead, another alternative is to keep the type system as it is, i.e. allow unreachable on block, but fix it up when writing the binary.

Fwiw I've been exploring similar solutions, so settling the types for the encoding as part of the encoding stage. I also throw away a lot more of the type information in some IRs, and derive it at a later stage, to make it easier to work with the IR. For unreachable code, in some depth first walks of the code it returns the reachability of the child, which can be used in the parent to drop the dead code as appropriate. Block ends have a redundant values operator to make the bundle of values returned clear and this is absent if unreachable (in the encoding this is implicit in the block signature), so this is a bit like a fall-through reachable flag. So a permissive IR might be easier to work with, resolving types later, and this seems workable.

@kripken
Copy link
Member Author

kripken commented Feb 9, 2017

An alternative I am considering, and would appreciate feedback on, is as follows:

  • Keep the current binaryen type system as is.
  • Document a new difference between it and wasm: we allow block/if/loops with type unreachable, while wasm doesn't.
  • Fix things up during read/write of binaries. For example, a block/if/loop that is never exited could be emitted with type none, then an unreachable instruction could be emitted. I believe that wasm type system rules would then see that the block/if/loop is not consumed, so it's fine, while the unreachable is always ok to emit. So we increase code size slightly as a workaround.
  • We could also improve that in some cases, e.g., a block that is never exited could simply not be emitted, only its contents would be. This should be ok as such contents would end in a something unreachable like a return anyhow.
  • Slightly complicating this approach is that in the current type system, a block that has a (br) (no value) and ends in a return will have type unreachable - by the property of having the type of the last element. This actually seems inconsistent, since if such a block had a br with a value, it would have that type, and not the type of the last element. Perhaps we should change this, both for consistency and for making the last bullet point simpler. However, I'm not sure how this would interact with other corners of the wasm type system.

@ghost
Copy link

ghost commented Feb 9, 2017

For example, a block/if/loop that is never exited could be emitted with type none, then an unreachable instruction could be emitted.

Or replace the unreachable type with the expected type.

Slightly complicating this approach is that in the current type system, a block that has a (br) (no value) and ends in a return will have type unreachable - by the property of having the type of the last element.

All branches to the block label should have the same type, so if the end of the block is unreachable then use one of the other paths, and if there is an exit path then it seems odd to label the block as unreachable and if there a no values then give it the same type as nop. This sill leaves the problem of a block with no exit paths, but in that case I presume binaryen flags the block as unreachable and it can patch the expected type in later.

@kripken
Copy link
Member Author

kripken commented Feb 16, 2017

Superceded by #911.

@kripken kripken closed this Feb 16, 2017
@kripken kripken deleted the fix-O0-v8-b branch February 17, 2017 00:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants