Skip to content
This repository has been archived by the owner on Nov 24, 2022. It is now read-only.

Investigate hidden binaryen workarounds #31

Closed
TerrorJack opened this issue Oct 18, 2018 · 12 comments
Closed

Investigate hidden binaryen workarounds #31

TerrorJack opened this issue Oct 18, 2018 · 12 comments

Comments

@TerrorJack
Copy link
Member

When comparing binary code emitted by binaryen/new backend, I discovered that binaryen doesn't simply do a post-order traversal on the expression IR as expected; there are quite some hidden transformations/workarounds here and there, which are mostly undocumented and there's no way to opt out even when optimize/shrink level is set to 0. A non-comprehensive list including workarounds I discovered so far:

  • When binaryen adds default passes, some are always added regardless of optimize/shrink level
  • Not reporting an error on receiving zero alignments, instead, falling back to "natural" alignment depending on byte length without a single warning.
  • Combining and remapping locals in the code section, so no more than 4 locals buckets appear in a function definition. Non-parameter local indices don't correspond to input expression IR (although there does exist a bijection)
  • Flattening anonymous blocks.
  • Emitting unreachable instructions not present in the original IR.

I've patched binaryen and shrank the list above to only contain the last entry. As long as the last piece is fit into the jigsaw puzzle, we'll base everything on the new backend right away.

We're not accusing binaryen's deficiencies here (though the caveats probably should be mentioned in docs); however, this debugging experience does reveal some additional mismatches between what binaryen offers and what we expect:

  • When we pass expression IR to the serializer, we don't expect ANY additional magic; any optimization is supposed to be done (and can be done) pre-serialization.
  • The linker/debugger performs aggressive IR rewriting and assumes IR isn't twisted in some way we aren't aware of. There is likely some dangerous chemistry between binaryen's rewriting and our own's.
  • When we serialize the expression IR, there may be validation errors hiding somewhere, in which case:
    • If validation is only done upon real execution: fine.
    • If a validation pass exists before serialization: find and report them. It's fine to have some false negatives.
    • Status quo: binaryen implements the workaround of inserting the stack-polymorphic unreachable instruction in quite some places. Hand it a mal-formed expression IR, it uses this trick to make the IR automagically pass validation...
    • What we expect: If such a trick is really needed, it indicates some fundamental flaw in how we engineer the expression IR. If we don't change the IR, at least the trick should be done pre-serialization.
    • Anyway, it's not okay if we thought we generated a well-typed expression but we really didn't, and when we step out the comfort zone binaryen once offered, we're thrown into a chaotic and cold universe.
@mboes
Copy link
Member

mboes commented Oct 18, 2018

Could you bring this to the attention of the Binaryen devs?

@TerrorJack
Copy link
Member Author

@mboes Other compilers based on binaryen may already be aware of some workarounds and even relying on them, so the switch shall still be inevitable. Still, yes, I'll factor out a short experience summary and "what to fix" list as soon as getting out of this minefield. Some should be doc enhancements, other in code.

@TerrorJack
Copy link
Member Author

Related discussion: WebAssembly/binaryen#903

The only remaining obstacle is sorting out the mess of binaryen vs wasm type system.

@TerrorJack
Copy link
Member Author

One interesting observation: when we re-enable binaryen's type inference for blocks, the unreachable trick is still mandatory to make V8 happy. So possibly we need to extend our type system to contain the unreachable type, even if it's not present in wasm spec. On it and let's see how it goes.

@TerrorJack
Copy link
Member Author

Findings after some more late night binaryen hacking:

The code implementing the unreachable trick is at https://github.com/WebAssembly/binaryen/blob/master/src/wasm/wasm.cpp#L178. Trying to port the trick without triggering a major refactor in current IR.

Still, the current type-related mess in migration to new backend reflects some flaws in expression IR:

  • No distinction between "pure" expressions and control-flow. In binaryen IR, everything is an expression, even a block or br can be one, and expressions can be arbitrarily nested in each other. So I basically ported the same design, only adding a few custom constructors (like Unresolved which allow you to use symbols in pre-link wasm). However, type-checking is tricky to get right when related to control flow.
  • No way to enforce correct-by-construction building of code. When I first started modeling binaryen IR several months ago, I attempted a GADT powered design, which upon completion would guarantee validation as long as it typechecks in Haskell. This effort proves to be more trouble than its worth, main problem: we lose Generic/Data related stuff, so things like serialization/traversal need tons of extra boilerplate. So the current expression IR is uni-typed and doesn't enforce wasm type-safety by construction.

So here is a mini roadmap for the way out:

  • Port the single binaryen hack, ensure the wasm binary code emitted by the new backend validates and runs on V8. This is for early dogfooding of the new backend, the longer we use it, the more confident we are about its quality. The following entries are done in-parallel with resumed todomvc effort.
  • After the binaryen backend is pruned, refactor the expression IR a little bit: separate pure/side-effecting code, much like CmmExpr and CmmNode. An "expression" cleanly maps to a wasm instruction sequence with type [] -> [t*] and can be easily typechecked; as for blocks/loops, we ensure a stronger constraint: all of them are type [] -> [], and so is each of its child instruction. This constraint is very strong, very unlikely to suffer from mistyping, and we don't lose power in terms of translating from Cmm.
  • Using GADTs for super typesafe IR is a waste of time; but we still want to make sure we aren't screwing things up before serialization, so a sensible choice would be a simple "WasmLint" pass which typechecks the expression IR with a single top-down traversal; it doesn't need to deal with wasm stack semantics or handle every wasm instruction; it is off by default and can be switched on by a flag when we suspect something might be wrong regarding the codegen, custom runtime or anything else.

@TerrorJack
Copy link
Member Author

Progress on this front:

A naive port of binaryen's unreachable hack would be:

isUnreachableType :: Expression -> Bool
isUnreachableType expr =
  case expr of
    Unreachable -> True
    Block {..} -> valueType == None && any isUnreachableType bodys
    _ -> False

And when marshalling a Block, we may emit an extra Unreachable instruction depending on isUnreachableType's result.

This makes V8 validates the module happily, but traps at runtime, so we're emitting more Unreachable than needed here, and isUnreachableType need further improvement.

Meanwhile, it seems after masking the function body of scheduleWaitThread with Unreachable, and disabling this hack, V8 also validates the module! So it's possible that a type error is hidden in scheduleWaitThread, and binaryen didn't report it, instead, it somehow deferred the type error to runtime and since it isn't triggered with 100% chance (in fact it almost always worked for small examples) we didn't notice it.

@TerrorJack
Copy link
Member Author

Type error is somewhere near https://github.com/tweag/asterius/blob/wip-wasm-3/asterius/src/Asterius/Builtins.hs#L976. We're pretty close.

@TerrorJack
Copy link
Member Author

After disabling scheduleHandleHeapOverflow and scheduleHandleYield, the new backend works out of the box without any binaryen hack.

It's still an amazement how binaryen managed to defer the type errors to runtime, and since the scheduler didn't need to handle HeapOverflow/Yield return codes in the examples, we never noticed those errors in the first place. Before pruning binaryen and migrate the new backend to master branch, I'd take a bit of time to look into this, and see what should be done to rule out similar errors in the long run.

@mboes
Copy link
Member

mboes commented Oct 22, 2018

Sounds reasonable.

@TerrorJack
Copy link
Member Author

@mchakravarty @mboes There are several possible options regarding how to handle the binaryen backend:

  • Prune it completely from codebase, the Haskell bindings can be found from earlier commits or an "archive" branch
  • Move the Haskell binding package to a separate repo for archive purpose. We may still update it when binaryen updates C API, but its development is detached with mainline asterius and becomes a "side project". It's still possible other Haskellers are happier with extra functionalities provided by binaryen.
  • Keep the binaryen package in-tree, and maintain it as a "v1-codegen" which can be switched on with a flag. Later when we experiment with post-MVP wasm features, we'll need some extra work to ensure the "v1-codegen" still works to some extent.

Any thoughts on this?

@mboes
Copy link
Member

mboes commented Oct 22, 2018

Let's scratch option (1): it will hardly be discoverable by others who want to revive the binaryen bindings and not start from zero. We can either move these to separate repo or keep it here. Either way, don't you think it will be useful to keep this code path for debugging purposes? If so that could even justify continuing to maintain it, at least while the alternative code path matures.

@TerrorJack
Copy link
Member Author

@mboes Makes sense. I've already enabled the new backend as default while providing a switch to fall back to the binaryen codegen. Will soon land in master.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants