Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return to more structured type rules for block and if #1148

Merged
merged 8 commits into from
Sep 6, 2017

Conversation

kripken
Copy link
Member

@kripken kripken commented Aug 27, 2017

This PR undos some damage from the wasm stack change. When we adapted to that change, we started to allow things like this:

(block
  (unreachable)
  (i32.const 10)
)

The block has a return value flowing out, but is not marked as having a value, and it is valid because that value is in unreachable code. The same thing can happen in an if-else with an unreachable condition and arm:

(if
  (unreachable)
  (i32.const 1)
  (unreachable)
)

This is bad for optimization as it means the optimizer needs to know if we are in unreachable code. For example, if the optimizer removes the unreachable in that block, suddenly it won't be valid any more. The fuzzer found a bunch of bugs on this, and it's better to fix it properly in the type system than to hack each pass so it is aware of this aspect of unreachable code.

Concretely, this PR takes us back to the more structured logic from before that wasm change, and in these two examples both the block and if would be marked (result i32). In other words,

  • If a block has a branch with a value, or a value on the final element, it has that type (and the others must match it or be unreachable). Otherwise, it has type none or unreachable (unreachable if it has an unreachable child).
  • If an if-else has a child with a value, it has that type (and the other child must match it or be unreachable). Otherwise, it has type none or unreachable (unreachable if it has an unreachable condition or all children are unreachable).

(As a followup, perhaps we should also get rid of the branch change that also happened at the same time, where unreachable branches are ignored for type checking. Not sure about that one.)

…en even if it has an unreachable child, keep it with that concrete type. this means we no longe allow the silly case of a block with an unreachable in the middle and a concrete as the final element while the block is unreachable - after this change, the block would have the type of the final element
… a result, even if the if condition is unreachable, to parallel block
@kripken kripken force-pushed the no-concrete-final-if-block-is-not-concrete branch from 100e4dd to b87c7b4 Compare August 27, 2017 16:00
@binji
Copy link
Member

binji commented Aug 27, 2017

It's not clear to me when you say "valid" whether you're referring to valid wasm or valid binaryen ir -- but neither of those examples are valid wasm. The stack is only valid at the end of a block if the contents of the stack match the block's signature. Since the signature in both cases is required to be empty, there can't be an i32 on the top of the stack at the block exit, even if the stack is polymorphic.

@kripken kripken force-pushed the no-concrete-final-if-block-is-not-concrete branch from 725ae8b to 2f33046 Compare August 27, 2017 17:55
@kripken
Copy link
Member Author

kripken commented Aug 27, 2017

Yeah, I should have been clearer. I wasn't sure if those are valid wasm or not - looks like not, based on what you say. But we started to accept them in binaryen as a result of the wasm stack changes. I don't remember the full reasoning back then - maybe we just relaxed validation rules because it seemed simplest - but I think we can do things better with the changes in this PR.

@kripken
Copy link
Member Author

kripken commented Aug 27, 2017

Looking at the spec test issues here more carefully, some just needed to be fixed. They were already fixed upstream, so what might have happened is when we switched to the stack stuff the tests were not yet fixed at that time.

@binji
Copy link
Member

binji commented Aug 27, 2017

Hmm, maybe those tests were from before block signatures were added. Btw, if i32 format isn't allowed anymore, it has to be if (result i32).

@kripken
Copy link
Member Author

kripken commented Aug 28, 2017

Yeah, binaryen still supports if i32 so I did that for consistency with the tests around it. At some point we should upgrade all the tests (or fork them) and remove the old notation support, probably.

@kripken kripken force-pushed the no-concrete-final-if-block-is-not-concrete branch 2 times, most recently from 52912c0 to d1874d5 Compare August 29, 2017 22:13
… taken or not. whether they are dead code should not affect how they influence other types in our IR

disable wasm2wasm tests FIXME
@kripken kripken force-pushed the no-concrete-final-if-block-is-not-concrete branch from d1874d5 to 2b73e69 Compare August 29, 2017 23:07
@kripken
Copy link
Member Author

kripken commented Aug 29, 2017

Added a commit to also clean up the "untaken br" issue mentioned above. That means that we don't ignore unreachable brs in type checking. Consider

(block $b
  (br_if $b (unreachable))
  (unreachable)
)

Then before this change the type of the block would be unreachable (since the br_if is not taken). After this change the type would be none, since has a br referring to it.

This gets rid of a bunch of unnecessary complexity (ignoring untaken branches, handling cases where a branch becomes taken or untaken), and complements the other commits in returning us to a more structured type system.

@kripken
Copy link
Member Author

kripken commented Aug 30, 2017

Thinking some more on this, I think the core issue is the type system changes from #903 etc., when we moved to make a block unreachable if any of its children is unreachable. When we made that change we still allowed the annoying case mentioned at the top of this issue,

(block
  (unreachable)
  (i32.const 10)
)

The block has type unreachable despite having a final element which has a concrete type, which is wacky. We probably should have made that not validate back then, and that's basically all this PR does - see the wasm-validator.cpp changes. All the rest of this PR falls out of that.

The one slightly odd result from this is that the block in the example should have type i32, while if the last element were (nop) then it would have type unreachable. In other words, there is an asymmetry here between all the concrete types and none, as it is only none that will be turned into an unreachable if there is an unreachable child.

I think this makes sense to do despite the asymmetry, because in wasm there is already an asymmetry there, (block (unreachable)) can be optionally given a concrete type, ever since wasm moved to block and if signatures. We have always had some unease around that, as the binaryen optimizer could strip the type of (block (result i32) (unreachable)) if it wanted to, while a none type can never be stripped.

And actually the change in this PR makes things better there too, while we could strip the type in that trivial case, we couldn't for

(block (result i32)
  (unreachable)
  (i32.const 10)
)

In other words, requiring the last element to be valid for the type, as this PR enforces, means we can't change types in silly ways like removing the i32 from that block. Furthermore, this is more efficient: when updating the type of a block, if the last element is concrete, we don't need to scan the other children to see if one is unreachable. Finally, this change makes the code simpler and it seems more resilient to fuzzing.

@dschuff, @jgravelle-google, you suggested the type system change in #903 etc. - thoughts on this change? If you don't have time to do a full review of the code, it would still be helpful to know if you think this direction makes sense or not for the type system.

@dschuff
Copy link
Member

dschuff commented Aug 30, 2017

I don't think I have strong opinions on principle. Making the type of a block match the last element (and especially having the asymmetry with none vs concrete types) does seem more 'wacky' to me on the face of it than making the children of a block symmetric. But I think this decision should be driven by its consequences, especially 1) ease and simplicity/minimality of the conversion to/from wasm, and 2) ease of implementing and reasoning about optimization. (and if there is weirdness, it's a good property that the weirdness directly corresponds to weirdness in wasm itself). IMO You've made a strong enough case here about # 2, what is the impact on conversion to and from wasm?

@kripken
Copy link
Member Author

kripken commented Aug 30, 2017

I think this makes conversion to wasm simpler and better in that we have fewer blocks with type unreachable, which is the tricky one for us since it doesn't exist in wasm. Namely, this PR rules out (block (unreachable) (i32.const 1)) which we need to fix up for wasm by adding another unreachable in and out of the block; after this PR that block would have type i32 anyhow, so it's emitted more efficiently and simply.

(However, we can't get rid of that unreachable-handling code because we do still have unreachable blocks without a concrete final element, like (block .. (br $somewhere)). So this PR lets us use that code less but it's still needed.)

I don't think conversion from wasm is affected, since there are no unreachable blocks like that coming from there. I.e., wasm wouldn't tell us to create stuff like (block (unreachable) (i32.const 1)) anyhow, as @binji pointed out.

…t to anyhow, and it's more complicated and error prone)
@kripken kripken force-pushed the no-concrete-final-if-block-is-not-concrete branch from f23a92d to 1ae7726 Compare August 30, 2017 21:45
Copy link
Member

@dschuff dschuff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the "taken" terminology thing predates this change to no need to block it on that.

@@ -103,11 +103,11 @@ inline std::set<Name> getBranchTargets(Expression* ast) {
// Finds if there are branches targeting a name. Note that since names are
// unique in our IR, we just need to look for the name, and do not need
// to analyze scoping.
// By default we ignore untaken branches. You can set named to
// note those as well, then any named branch is noted, even if untaken
// By default we consider untaken branches (so any named use). You can unset named to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This terminology "untaken" branches is confusing to me, since whether a branch is taken is a dynamic property that changes from run to run. Do you mean unreachable branches?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it should be "unreachable branches" I guess. I'll do that in a followup.

@kripken kripken merged commit c0f21e1 into master Sep 6, 2017
@kripken kripken deleted the no-concrete-final-if-block-is-not-concrete branch September 6, 2017 02:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants