Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem of rethrow instruction #30

Closed
aheejin opened this issue Oct 14, 2017 · 7 comments
Closed

Problem of rethrow instruction #30

aheejin opened this issue Oct 14, 2017 · 7 comments

Comments

@aheejin
Copy link
Member

aheejin commented Oct 14, 2017

Problem

The current rethrow instruction has some problems that make its use very
limited. For example, suppose there are two try/catch blocks and in both catch
they jump to some common code, which is followed by a rethrow instruction.

block $label0
  try
    ...
  catch i
    br $label$0
  end
  ...
  try
    ...
  catch i
    br $label$0
  end
end

some common code
rethrow

This cannot be supported in the current spec because rethrow can occur only in
the scope of a catch block. But this code pattern is very common especially
when there is some cleanup code (calling destructors) to run before rethrowing
an exception up to a caller. If you compile the code below,

MyClass obj;
try {
  foo(); // might throw
} catch (int n) {
  ...
}
...
try {
  foo(); // might throw
} catch (int n) {
  ...
}

The generated code will look like this:

block $label0
  try
    call $foo
  catch i
    br $label$0
  end
  ...
  try
    ...
  catch i
    br $label$0
  end
end

call MyClass's destructor to destruct obj
rethrow

In this case, cleanup action consists of only a single destructor call, but it
can take more code space if there are many objects to destroy, so I think
duplicating this cleanup code into possibly n catch blocks is not a viable
idea. This is a classic case in which we need a rethrow, but it cannot be
executed at the end of this code because it's not in the scope of a catch.

This is one example of code sharing that's very common, but code sharing between
catch blocks can occur in other cases as well. You can use goto within catch
clauses to jump to a common block. Or any middle-IR-level compiler optimization
pass can factor out some common code in catch blocks.

While this problem can be worked around using not a rethrow but just a normal
throw, throw is considered as throwing a completely new exception, and the
VM wouldn't be able to carry an attached backtrace with it, which can be useful
when we later support backtrace debugging in the future. And more importantly,
this problem effectively makes rethrow unusable at all, because the most
common usecase of it is, as illustrated above, when it occurs after some common
cleanup code which is shared between many catch blocks. It can also occur when
there is shared cleanup code between catch i and catch_all clauses, which
will be very common case as well, but I'm actually planning on proposing
something else for catch clauses... but anyway.

Idea?

This is a rough idea and not a complete spec yet. And it's not I'm proposing
this as a single concrete alternative and I appreciate comments and suggestions.

I think it is necessary to make it possible to access some kind of handle to an
exception object outside a catch block. (The reason it is not a i32 value but a
handle is it can be opaque if it is for a foreign exception) There can be
multiple ways to do it. We can make catch instruction to return a handle, save
it to a local or something, and then use it after we exit a catch block. In this
case, rethrow should take an handle as an argument now.

try
  ...
catch i
  set_local 0, handle
end

get_local 0
use handle

In this case, I think when the VM can destroy an exception object is unclear,
and it can be an issue, maybe? Maybe the VM should maintain a map of handle to
an exception object until the program ends.

Or, to make the VM can destroy exception object when they are not necessary
anymore, we can add some reference count to exception objects, and make a way to
capture an exception handle within a catch block. Suppose capture_exception
instruction captures the current exception handle.

try
  ...
catch_all
  set_local 0, capture_exception
end

get_local 0
use handle

The reference count for an exception starts as 1 when any catch block is
entered.

  • It will be decremented when
    • It hits an try_end instruction.
    • When the function execution ends
  • It will be incremented when
    • When capture_exception instruction is executed within a catch block
    • When rethrow instruction is executed on the handle.
      This approach is in a way similar to the newly added library functions to the
      C++11 spec:
      std::current_exception, and std::rethrow_exception.
@eholk
Copy link
Contributor

eholk commented Oct 26, 2017

I like this suggestion, and I think it fits nicely within the existing proposal. Most of the concepts needed to make this happen are already lurking under the covers. This proposal would just bring them to the surface and make them accessible to Wasm. Here's how I would frame it:

We keep all the terminology given here by @rossberg. Then we add a new base type in addition to i32, i64, f32, f64. Let's called it tagged_value.

Note that we do not add any new memory instructions, meaning there is no way to store a tagged_value to memory or load it. This is important because you can build a reinterpret cast by writing to memory and reading it. If we wanted to allow tagged_values in memory, we'd need a type system for linear memory, which we should not do.

Next, we change the typing rule for catch_all to start the body with a tagged_value on the stack. Basically, the VM would push a tagged value representing the exception onto the stack and run the catch_all block. Then, we change rethrow to have type tagged_value -> unreachable.

Here are a couple examples of how this works:

(try
  ...
  (catch_all
    rethrow))

In this case, rethrow picks up the tagged value directly from the stack and throws it again. (There's a question of whether tagged values carry a stack trace, but I'll address this later). I believe this gives us parity with what we have already. In the case of nested tries and catches, you can disambiguate which one is through by dropping values of the stack. For example, consider the following nested try block:

(try
  ....
  (catch_all
    (try
      ...
      (catch_all
        <code goes here>))))

If we filled in <code goes here> with rethrow, we would get the same behavior as (rethrow 0). On the other hand, drop rethrow would have the same behavior as (rethrow 1), because the nested catch_all would have two tagged values on the stack so it would drop one and throw the next one.

However, once we do this, it's pretty straightforward to allow tagged values in local variables:

(try
  ...
  (catch_all
    (set_local 0)
    ...
    (rethrow (get_local 0)))

However, once we have this much, I'd propose some more ambitious changes. Here are some of them, in order of increasing radicalness.

First, adding match. Match would have the following syntax:

(match <tag index>
  <match body>
  (else
    <else body>))

A match expression would have type tagged_value -> t if

  • The tag defined at <tag index> has type [t1 ... tn]
  • <match body> has type [t1 ... tn] -> t
  • <else body> has type tagged_value -> t

Including the tagged value on the stack for the else body allows for easy chaining of matches, such as:

(match 0
  ...
  (else (match 1
    ...
    (else unreachable)))

For code size concerns, we may even want to introduce a new opcode for else-match.

A match would be implemented by checking if the tagged value on top of the stack has the same tag as the tag defined at index <tag index>. If so, it would push the values associated with the tagged value onto the stack and execute <match body>. Otherwise, it would leave the tagged value on the stack and execute <else body>.

So now there's no reason to have a separation between catch and catch_all. Let's just get rid of catch and rename catch_all to catch.

Now the original code,

(try
  ...
  (catch 0
    <catch 0>)
  (catch 1
    <catch 1>)
  (catch_all
    <catch all>))

becomes

(try
  ...
  (catch
    (match 0
      <catch 0>
      (else-match 1
        <catch 1>
        (else
          <catch all>)))))

Next, I'd propose adding another instruction, tag. Tag takes values on the stack and builds a tagged value out of them. For example, suppose the tag defined at index 0 had type [i32 i32], then we could do:

(i32.const 1)
(i32.const 2)
(tag 0)

The typing rule would be that (tag <tag index>) has type [t1 ... tn] -> tagged_value, assuming the tag at <tag index> has type [t1 ... tn].

At this point, we have tagged values that can be created and used completely independently of exceptions, which could have other uses.

Now we can also get rid of the distinction between throw and rethrow. Instead, we will just have throw, with type tagged_value -> unreachable. Whereas before we would have written:

(i32.const 1)
(i32.const 2)
(throw 0)

Now we would write:

(i32.const 1)
(i32.const 2)
(tag 0)
throw

This means we only have one throw instruction, and it can be used both inside and outside of catch blocks.

There was a question in the original post about lifetimes. I think these are actually not a problem. Right now, tagged values can only exist on the stack, meaning we don't have to worry about indefinite extend. Once they go out of scope, they are safe to clean up. These will usually be backed by JS objects anyway, so we may represent tagged values as a handle to the JS object so that the garbage collector keeps it alive.

The more challenging issue is what to do about stack traces. In the existing proposal, I believe exceptions carry a stack trace, because JavaScript needs to be able to show this. We could say that all tagged values carries the stack of when they were created with them, but this seems less than ideal. For example, it may impose constraints on VMs to keep stacks around for no reason, especially if tagged values end up being used for things besides exceptions.

Perhaps a better option is to add another type, exception, which is basically a pair of a tagged value and a stack trace. Here, throw would attach a stack trace to a tagged value and transform it into an exception. We'd add another instruction to get a tagged value from an exception, make catch put the exception on top of the stack rather than the tagged value. We'd want to add rethrow back, which would have type exception -> unreachable. We could still remove the restriction that rethrow can only be in a catch block because the type system gives you no other way to get ahold an exception than by catching it or somehow plumbing around a caught exception.

@eholk
Copy link
Contributor

eholk commented Oct 26, 2017

One potential downside to my proposal is that it might limit our options for adding resumable exceptions in the future. If we want this, it'd probably be could to tweak my proposal to anticipate this. The currently existing exceptions proposal already seems to anticipate resumable exceptions.

@rossberg
Copy link
Member

@eholk, I remember there were a few earlier discussions of this sort of decomposition. It is nice design-wise, but has a number of potential issues:

  1. Decoupling catching from scrutinisation is likely to make handling slower, or at least harder to make fast, because it prevents various optimised strategies for jumping to handlers directly, at least in the general case (I want to avoid the misleading "zero-cost" slogan here).

  2. The ability to store away tagged values or even exception packages would make their lifetimes more dynamic and thus more difficult to manage. As you observe, they may have some interaction with host GC, but we want to avoid a dependency on managed types.

  3. As you also mention, it is not clear how this would scale to resumable exceptions, because then an exception package actually becomes a delimited continuation, whose lifetime and scope is even more tricky, especially if you want to implement it efficiently. I've come to believe that resumption will be a key feature, because it generalises exception handlers to what's commonly called effect handlers today, and which enable expressing all sorts of control abstractions, including lightweight threads, coroutines, generators, async/await, etc. In short they provide a structured way of expressing stack switching. (In this context I've also become less convinced that it is a good idea to regard exceptions merely as tagged values. Resumable exceptions naturally have a return type, which describes the value(s) you pass pack to the throw point.)

@eholk
Copy link
Contributor

eholk commented Oct 26, 2017

Yeah, as I was thinking about this some more, trying to shoehorn general tagged values into the exception system doesn't feel like a good fit. To do resumable exceptions, I think we'd end up with exception declarations again, and then the idea that exceptions can only take tagged values as arguments seems weird. Better to design exceptions in a way that makes sense and if tagged values are useful we could add them as a separate proposal.

I wonder if there's a hybrid that would make sense. For example, we could preserve catch as is today, but then make catch_all leave and exception value on the stack. Ideally this would let us do the optimized strategies for jumping directly to handlers while only paying the cost of reifying an exception in the case of catch_all. Once you have an exception on the wasm stack, we could add match instructions to allow inspecting it in a safe way if needed.

As I've described it, tagged values/exceptions could not be stored to the linear heap, meaning we can manage the lifetimes of exceptions purely using a stack discipline. This does not have to interact with the host GC, and does not depend on managed types. It may even be reasonable to say exceptions cannot be return values or parameters, which then means we only have to worry about them living on one stack frame.

One question is whether first class exceptions (or maybe 1.5 class, since I'm still proposing leaving them on the stack only) would be enough to solve all of @aheejin's problems. If so, I think that gives us more reason to consider this seriously. As it is now, I get the sense that @aheejin will have to end up emulating a more general mechanism, which will almost certainly be slower than anything we are able to build into Wasm directly.

@rossberg
Copy link
Member

Even if you can only assign exceptions to locals you can no longer statically tell their lifetime in general, because assignments may be conditional, and they may create aliases. So I think this would already require some form of reference counting to release associated host resources? Also, our local declarations currently rely on the ability to initialise them with a default value; for exceptions it wouldn't be clear what that means.

Without assignment to locals on the other hand there isn't even a way to reorder operands on the stack, so I think you would have a very hard time generating code for non-trivial cases.

@eholk
Copy link
Contributor

eholk commented Oct 27, 2017

If exceptions can't outlive a call frame (i.e. exceptions are not allowed as return values), then the VM can keep track of what exceptions are in the frame and destroy all of them when the function returns or unwinds. Of course, with loops we could get an unbounded number of exceptions, so we'll still need some mechanism to clean them up sooner. This seems like an issue.

Coming up with a meaningful default value is also indeed problematic. One possibility is to have a "empty exception" which has no data, stack trace, etc. Any attempts to match on it will fail, since it has no defined tag. We could then make throw of an empty exception initialize the exception with the current stack trace, etc. Throwing an initialized exception would be equivalent to rethrow. Overall, trying to use throw to both initialize and throw a new exception, as well as rethrow an existing one, seems weird, so I don't think I'd seriously argue for this option.

It's unfortunate that it's currently hard to use Wasm as a pure stack machine. Hopefully multi-valued blocks will help, but ti seems like we'll still need a way to reorder the stack.

@aheejin
Copy link
Member Author

aheejin commented Dec 2, 2018

With rethrow with a depth argument and the first-class except_ref type, I think we can close this now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants