vm: rework the exception handling (#1061)
## Summary

Implement a different approach to exception handling in the VM, fixing
most of the issues around `raise`, `except`, and `finally`. This makes
the VM's exception handling implementation the most complete out of the
existing backends.

## Details

Exception and `finally` handling was very incomplete, with many
non-trivial cases not working (e.g., `finally` not intercepting `break`s,
the current exception not being reset once handled, etc.).

**New exception handling implementation:**
* instead of the safepoint-based approach (`opcTry`), a separate
  instruction-to-exception-handler lookup table is used
* each call frame uses its own lookup table; setting the lookup table
  is done via the new `SetEh` instruction
* the table stores mappings from instruction position to the position
  of special-purpose exception handling (=EH) instructions
* when an instruction raises, it is looked up in the table. If no
  associated EH instruction is found, the call instruction in the frame
  above is looked up, then the call instruction in the frame above that,
  and so on
* if no EH instruction is found even after unwinding the call stack,
  the exception is treated as unhandled and reported to the supervisor
* if an EH instruction is found, an internal EH thread is spawned for
  the raised exception and the EH instruction(s) are executed
* the EH instructions describe which `except` and `finally` clauses are
  visited and in what order
* entering an `except` or `finally` clause from an EH thread sets the
  current exception (the one returned by `getCurrentException`) to the
  thread's associated exception

This affords a large amount of flexibility with exception handling.
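
The lookup-and-unwind step described in the list above can be sketched roughly as follows. This is a simplified model, not the VM's actual code: the `Mapping` and `Frame` types, their fields, and the `lookup` and `findHandler` procedures are hypothetical names chosen for illustration, and the real table layout may differ.

```nim
# Hypothetical sketch; none of these names are taken from the VM sources.
type
  Mapping = object
    pc: int     # position of an instruction that may raise
    ehPos: int  # position of the associated EH instruction
  Frame = object
    table: seq[Mapping]  # per-frame lookup table (as selected via `SetEh`)
    pc: int              # the raising instruction, or the pending call

proc lookup(table: seq[Mapping], pc: int): int =
  ## Returns the EH instruction position mapped to `pc`, or -1 if none.
  for m in table:
    if m.pc == pc:
      return m.ehPos
  result = -1

proc findHandler(stack: seq[Frame]): tuple[frame, ehPos: int] =
  ## Walks from the innermost frame outwards. For every frame except the
  ## innermost one, `pc` is the call instruction currently being executed.
  for i in countdown(stack.high, 0):
    let pos = lookup(stack[i].table, stack[i].pc)
    if pos != -1:
      return (i, pos)
  # nothing found: the exception would be reported as unhandled
  result = (-1, -1)
```

With a two-frame stack where only the outer frame maps its call instruction to an EH instruction, `findHandler` returns the outer frame's index together with that EH position; with no mapping anywhere, it returns `(-1, -1)`.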

**New `finally` implementation:**
* the `Finally` and `FinallyEnd` opcodes stay, but they now work
  differently
* each finally section is associated with a *control register*
* the control register stores the information necessary for knowing
  what to do when the end of the section (`FinallyEnd`) is reached
* how execution continues at the end of a finally section depends on
  how the `Finally` instruction was reached:
  * if reached by normal control-flow, execution continues at the
    instruction designated by the `Finally` instruction
  * if reached via an `Enter` instruction, execution continues at the
    instruction following the `Enter` instruction
  * if reached from exception handling, the EH thread is resumed
* jumping from outside a finally section to within one is forbidden
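
As a rough illustration of the control register idea, the sketch below models the three ways a finally section can be entered and what `FinallyEnd` would do in each case. The `EntryKind` and `ControlReg` types and all procedure names are assumptions made for this example; the real register encoding is not shown in this commit message.

```nim
# Hypothetical model of a finally section's control register.
type
  EntryKind = enum
    ekNormal  # reached by normal control-flow
    ekEnter   # reached via an `Enter` instruction
    ekExcept  # reached from exception handling
  ControlReg = object
    kind: EntryKind
    target: int  # instruction to continue at; unused for `ekExcept`

proc viaFallthrough(finallyTarget: int): ControlReg =
  ## `finallyTarget` is the instruction designated by `Finally`.
  ControlReg(kind: ekNormal, target: finallyTarget)

proc viaEnter(enterPos: int): ControlReg =
  ## Execution continues right after the `Enter` instruction.
  ControlReg(kind: ekEnter, target: enterPos + 1)

proc viaException(): ControlReg =
  ControlReg(kind: ekExcept, target: -1)

proc finallyEnd(reg: ControlReg): string =
  ## Describes what happens when `FinallyEnd` is reached.
  case reg.kind
  of ekNormal, ekEnter: "continue at " & $reg.target
  of ekExcept: "resume EH thread"
```

Keeping the decision in a per-section register means `FinallyEnd` itself needs no knowledge of how the section was entered.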

`Enter` and `Leave` are two new instructions:
* `Enter` is used for redirecting to finally sections and works as
  described above
* `Leave` is used for terminating open EH threads when exiting an
  `except` clause or when exiting a `finally` clause through
  unstructured control-flow (e.g., `break`)

When a `break` or `return` exits one or more `try` clauses with
attached `finally` clauses, `vmgen` emits an `Enter` instruction
targeting each, prior to the final jump. Similarly, `vmgen` emits a
`Leave` instruction for each `finally` and `except` clause exited
through unstructured control-flow.
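
A simplified sketch of this emission rule is shown below. It is not `vmgen`'s actual logic: the `Scope` type, the `emitExit` procedure, the numeric scope ids, and the textual instruction format (including the `Jmp` mnemonic) are all hypothetical and only serve to show the ordering.

```nim
# Hypothetical sketch of the instructions emitted for a `break`/`return`.
type
  ScopeKind = enum
    skTryWithFinally  # inside the body of a `try` with a `finally`
    skExcept          # inside an `except` clause
    skFinally         # inside a `finally` clause
  Scope = object
    kind: ScopeKind
    id: int  # made-up identifier of the clause

proc emitExit(exited: seq[Scope], target: string): seq[string] =
  ## `exited` lists the scopes left by the jump, innermost first.
  for s in exited:
    case s.kind
    of skTryWithFinally:
      # run the attached finally section before moving on
      result.add("Enter " & $s.id)
    of skExcept, skFinally:
      # terminate the EH thread that is still open for this clause
      result.add("Leave " & $s.id)
  # the final jump to the actual destination
  result.add("Jmp " & target)
```

For a `break` that leaves an `except` clause (id 1) nested in the body of a `try` with a `finally` (id 0), this yields `Leave 1`, `Enter 0`, and then the jump.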

### Considered alternatives

Many of the mentioned changes could also have been implemented on top
of the safepoint mechanism. This, however, would not allow supporting
clean-up-only control-flow (that is, all exception handlers being
skipped and only certain finally sections being executed), which, while
not used at the moment, could become useful in the future for
implementing panics.

### Code generator details

* a `SetEh` instruction is always emitted, even if a code fragment /
  procedure has no instruction-to-EH mappings. This is a temporary
  limitation
* all `IndCall` and `IndCallAsgn` instructions (i.e., calls that
  weren't lowered into dedicated instructions) are treated as raising
* the EH instructions are emitted together with the normal bytecode;
  no separate code generation pass is used

### Future direction

The MIR (and subsequently the `CgNode` IR) is planned to use
goto-based control-flow primitives, similar to the ones that the VM uses.
This means that much of the semantics-related decision-making (e.g.,
where to insert `Leave` instructions) is going to move out of `vmgen`.

### Tests

Multiple tests for things that previously didn't work with the VM are
added. In addition, the `knownIssue` marker is removed from multiple
`exception` tests that now succeed.

---------

Co-authored-by: Saem Ghani <saemghani+github@gmail.com>
zerbina and saem authored Feb 5, 2024
1 parent 093ffd7 commit 752029c
Showing 19 changed files with 870 additions and 255 deletions.
9 changes: 9 additions & 0 deletions compiler/vm/packed_env.nim
@@ -158,6 +158,8 @@ type

    code*: seq[TInstr]
    debug*: seq[uint32] # Packed version of `TCtx.debug`. Indices into `infos`
    ehTable*: seq[HandlerTableEntry]
    ehCode*: seq[EhInstr]

    # rtti related data:
    nimNodes: seq[PackedNodeLite]
@@ -928,6 +930,9 @@ func storeEnv*(enc: var PackedEncoder, dst: var PackedEnv, c: TCtx) =
  mapList(dst.debug, c.debug, d):
    dst.infos.getOrIncl(d).uint32

  dst.ehTable = c.ehTable
  dst.ehCode = c.ehCode

  mapList(dst.files, c.config.m.fileInfos, fi):
    fi.fullPath.string

@@ -960,6 +965,8 @@ proc writeToFile*(p: PackedEnv, file: AbsoluteFile): RodFileError =
  f.storePrim p.entryPoint
  f.storeSeq p.code
  f.storeSeq p.debug
  f.storeSeq p.ehTable
  f.storeSeq p.ehCode

  f.storeSection symsSection
  f.store p.infos
@@ -1002,6 +1009,8 @@ proc readFromFile*(p: var PackedEnv, file: AbsoluteFile): RodFileError =
  f.loadPrim p.entryPoint
  f.loadSeq p.code
  f.loadSeq p.debug
  f.loadSeq p.ehTable
  f.loadSeq p.ehCode

  f.loadSection symsSection
  f.load p.infos