compiler: implement labeled switch/continue #21257

Merged
merged 14 commits into from
Sep 5, 2024


mlugg
Member

@mlugg mlugg commented Aug 30, 2024

Implements #8220.

This is a revival of #19812.

@Luukdegram, you might want to check the Wasm changes -- I rebased your work, but haven't tested it, so the rebase could have gone wrong somewhere, not sure. It's probably fine? (oops, the commit authorship info got lost, sorry!)

@jacobly0, I haven't implemented the new syntax in the x86_64 backend. If you'd like to do this imminently on this PR, feel free; otherwise I'll let you do it on master in your own time.

@Rexicon226, I've regressed some bits of the RISC-V backend, specifically loops and switches containing ranges. I can probably fix them up (loops certainly should be easy), but it might be easier if you can take a look?

@mlugg mlugg requested a review from Snektron as a code owner August 30, 2024 23:43
@Rexicon226
Contributor

@Rexicon226, I've regressed some bits of the RISC-V backend, specifically loops and switches containing ranges. I can probably fix them up (loops certainly should be easy), but it might be easier if you can take a look?

You got it!

@Luukdegram
Member

This is a revival of #19812.

@Luukdegram, you might want to check the Wasm changes -- I rebased your work, but haven't tested it, so the rebase could have gone wrong somewhere, not sure. It's probably fine? (oops, the commit authorship info got lost, sorry!)

Looks good! Thanks for doing the work.

@mlugg mlugg added the release notes This PR should be mentioned in the release notes. label Sep 1, 2024
jacobly0 and others added 8 commits September 1, 2024 18:30
This commit modifies the representation of the AIR `switch_br`
instruction to represent ranges in cases. Previously, Sema emitted
different AIR in the case of a range, where the `else` branch of the
`switch_br` contained a simple `cond_br` for each such case which did a
simple range check (`x >= a and x <= b`). Not only does this add
complexity to Sema, which we would like to minimize, but it also gets in
the way of the implementation of ziglang#8220. That proposal turns certain
`switch` statements into a looping construct, and for optimization
purposes, we want to lower this to AIR fairly directly (i.e. without
involving a `loop` instruction). That means we would ideally like a
single instruction to represent the entire `switch` statement, so that
we can dispatch back to it with a different operand as in ziglang#8220. This is
not really possible to do correctly under the status quo system.
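
For reference, a minimal sketch of the kind of source-level `switch` this affects: one statement mixing scalar items and range items. The `Kind` enum and `classify` function are hypothetical names chosen for illustration, and the sketch only shows the Zig source, not the AIR that Sema emits for it.

```zig
const std = @import("std");

// Hypothetical classification result, for illustration only.
const Kind = enum { space, digit, letter, other };

fn classify(c: u8) Kind {
    return switch (c) {
        // Scalar items only.
        ' ', '\t', '\n' => .space,
        // A single inclusive range.
        '0'...'9' => .digit,
        // Two ranges sharing one case body.
        'a'...'z', 'A'...'Z' => .letter,
        else => .other,
    };
}

test "switch with scalar items and ranges" {
    try std.testing.expectEqual(Kind.space, classify('\t'));
    try std.testing.expectEqual(Kind.digit, classify('7'));
    try std.testing.expectEqual(Kind.letter, classify('Q'));
    try std.testing.expectEqual(Kind.other, classify('#'));
}
```

Note that `a...b` is inclusive at both ends, so the boundary values `'0'` and `'9'` both land in the digit case.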

This commit implements lowering of this new `switch_br` usage in the
LLVM and C backends. The C backend just turns any case containing ranges
entirely into conditionals, as before. The LLVM backend is a little
smarter, and puts scalar items into the `switch` instruction, only using
conditionals for the range cases (which direct to the same bb). All
remaining self-hosted backends are temporarily regressed in the presence
of switch range cases. This functionality will be restored for at least
the x86_64 backend before merge.

This commit introduces a new AIR instruction, `repeat`, which causes
control flow to move back to the start of a given AIR loop. `loop`
instructions will no longer automatically perform this operation after
control flow reaches the end of the body.

The motivation for making this change now was really just consistency
with the upcoming implementation of ziglang#8220: it wouldn't make sense to
have this feature work significantly differently. However, there were
already some TODOs kicking around which wanted this feature. It's useful
for two key reasons:

* It allows loops over AIR instruction bodies to loop precisely until
  they reach a `noreturn` instruction. This allows for tail calling a
  few things, and avoiding a range check on each iteration of a hot
  path, plus gives a nice assertion that validates AIR structure a
  little. This is a very minor benefit, which this commit does apply to
  the LLVM and C backends.

* It should allow for more compact ZIR and AIR to be emitted by having
  AstGen emit `repeat` instructions more often rather than having
  `continue` statements `break` to a `block` which is *followed* by a
  `repeat`. This is done in status quo because `repeat` instructions
  only ever cause the direct parent block to repeat. Now that AIR is
  more flexible, this flexibility can be pretty trivially extended to
  ZIR, and we can then emit better ZIR. This commit does not implement
  this.

Support for this feature is currently regressed on all self-hosted
native backends, including x86_64. This support will be added where
necessary before this branch is merged.

The parse of `fn foo(a: switch (...) { ... })` was previously handled
incorrectly; `a` was treated as both the parameter name and a label.

The same issue exists for `for` and `while` expressions -- they should
be fixed too, and the grammar amended appropriately. This commit does
not do this: it only aims to avoid introducing regressions from labeled
switch syntax.

`.loop` is also a block, so the block_depth must be stored *after* block
creation, ensuring a correct block_depth to jump back to when receiving
`.repeat`.

This also un-regresses `switch_br` which now correctly handles ranges
within cases. It supports it for both jump tables as well as regular
conditional branches.
This does *not* yet implement the new `loop_switch_br` instruction.
@andrewrk andrewrk merged commit 3929cac into ziglang:master Sep 5, 2024
10 checks passed
@andrewrk
Member

andrewrk commented Sep 5, 2024

Brilliant work as usual, @mlugg

Don't forget the release notes writeup while it's fresh in your mind :)

@mlugg
Member Author

mlugg commented Sep 8, 2024

Release notes are below.


Labeled switch

Zig 0.14.0 implements an accepted proposal which allows switch statements to be labeled and to be targeted by continue statements. Such a continue statement takes a single operand (just as break can take an operand to return a value from a block or loop); this value replaces the operand of the original switch expression, and the switch is then evaluated again with it. This construct is semantically equivalent to a switch statement inside a loop, with a variable tracking the switch operand; for instance, the following tests are equivalent:

test "labeled switch" {
    foo: switch (@as(u8, 1)) {
        1 => continue :foo 2,
        2 => continue :foo 3,
        3 => return,
        4 => {},
        // A switch on a u8 must handle every value.
        else => unreachable,
    }
    return error.Unexpected;
}

test "emulate labeled switch" {
    var op: u8 = 1;
    while (true) {
        switch (op) {
            1 => { op = 2; continue; },
            2 => { op = 3; continue; },
            3 => return,
            4 => {},
            // A switch on a u8 must handle every value.
            else => unreachable,
        }
        break;
    }
    return error.Unexpected;
}

These constructs differ in two ways. The most obvious difference is in clarity: the new syntax form is clearer at times, for instance when implementing Finite State Automata where one can write continue :fsa new_state to represent a state transition. However, a key motivation for this language feature lies in its code generation. This is expanded on below.
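
As a rough illustration of the FSA use case, here is a minimal sketch of an automaton that accepts one or more `a` followed by a single `b`. The `State` enum and the `matchesAPlusB` function are hypothetical names invented for this example; each `continue :fsa ...` is one state transition.

```zig
const std = @import("std");

// Hypothetical states for an automaton accepting the pattern `a+b`.
const State = enum { start, seen_a, accept, reject };

fn matchesAPlusB(input: []const u8) bool {
    var i: usize = 0;
    fsa: switch (State.start) {
        .start => {
            // The first byte must be 'a'.
            if (i < input.len and input[i] == 'a') {
                i += 1;
                continue :fsa .seen_a;
            }
            continue :fsa .reject;
        },
        .seen_a => {
            // Running out of input before seeing 'b' is a rejection.
            if (i >= input.len) continue :fsa .reject;
            const c = input[i];
            i += 1;
            if (c == 'a') continue :fsa .seen_a;
            // 'b' is only accepted as the final byte.
            if (c == 'b' and i == input.len) continue :fsa .accept;
            continue :fsa .reject;
        },
        .accept => return true,
        .reject => return false,
    }
}

test "a+b automaton" {
    try std.testing.expect(matchesAPlusB("ab"));
    try std.testing.expect(matchesAPlusB("aaab"));
    try std.testing.expect(!matchesAPlusB("aba"));
    try std.testing.expect(!matchesAPlusB(""));
}
```

Every case body here ends in a `continue` or a `return`, so control never falls out of the bottom of the switch.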

It is also possible to break from a labeled switch. This simply terminates evaluation of the switch expression, causing it to result in the given value, as though the case body were a labeled block. As with blocks, an unlabeled break will never target a switch statement; only a while or for loop.
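
A small sketch of `break` with a value, assuming a hypothetical helper `stepsToZero` that counts how many halvings it takes to reach zero. The `0` case breaks out of the switch with the accumulated count, which becomes the value of the whole switch expression.

```zig
const std = @import("std");

// Hypothetical helper: counts halvings until the operand reaches zero.
fn stepsToZero(start: u8) u32 {
    var steps: u32 = 0;
    const total = sw: switch (start) {
        // Terminates the switch; `steps` becomes the switch's result.
        0 => break :sw steps,
        // Re-dispatches the switch with a new operand.
        1...255 => |n| {
            steps += 1;
            continue :sw n / 2;
        },
    };
    return total;
}

test "break from a labeled switch" {
    try std.testing.expectEqual(@as(u32, 0), stepsToZero(0));
    try std.testing.expectEqual(@as(u32, 4), stepsToZero(8));
}
```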

Unlike a typical switch statement, a labeled switch with one or more continues targeting it is not implicitly evaluated at compile-time (this is similar to how loops behave). However, as with loops, compile-time evaluation can be forced by evaluating such an expression in a comptime context.
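
To illustrate forcing compile-time evaluation, the sketch below calls a function containing a labeled switch from two comptime contexts: a container-level initializer and an explicit `comptime` expression. The function `countdownSum` and the constant names are hypothetical, chosen only for this example.

```zig
const std = @import("std");

// Hypothetical helper: sums start + (start - 1) + ... + 1.
fn countdownSum(start: u8) u32 {
    var sum: u32 = 0;
    sw: switch (start) {
        0 => return sum,
        1...255 => |n| {
            sum += n;
            continue :sw n - 1;
        },
    }
}

// A container-level initializer is already a comptime context.
const sum_at_comptime = countdownSum(4);

test "forcing compile-time evaluation" {
    // An explicit `comptime` expression forces evaluation inside a function.
    const also_comptime = comptime countdownSum(3);
    // Array lengths must be comptime-known, so this only compiles if both
    // calls above were evaluated at compile time.
    const buf: [sum_at_comptime + also_comptime]u8 = undefined;
    try std.testing.expectEqual(@as(usize, 16), buf.len);
}
```

The same function can still be called with a runtime-known argument, in which case the switch is evaluated at runtime as usual.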

Code Generation Properties

This language construct is designed to generate code which helps the CPU predict branches between cases of the switch, allowing for increased performance in very hot loops, particularly those dispatching instructions, evaluating FSAs, or performing similar case-based evaluations. To achieve this, the generated code may differ from what one would intuitively expect.

If the operand to continue is comptime-known, then it can be translated to an unconditional branch to the relevant case. Such a branch is perfectly predicted, and hence typically very fast to execute.

If the operand is runtime-known, then each continue can become a separate dispatch branch (ideally an indirect jump through a shared jump table) back to the same set of potential branch targets. The advantage of this pattern is that it aids the CPU's branch predictor by providing different branch instructions which can be associated with distinct prediction data. For instance, when evaluating an FSA, if case a is very likely to be followed by case b, while case c is very likely to be followed by case d, then the branch predictor can use the separate jumps at the end of each switch case to predict the control flow more accurately. A loop-based lowering, by contrast, causes the state dispatches to be "collapsed" into a single indirect branch or similar, hindering branch prediction.
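
The runtime-known case is typical of instruction dispatch. Below is a minimal sketch of a toy interpreter, assuming a hypothetical `Op` instruction set and a `run` function invented for this example. Each case ends with its own `continue :vm code[ip]`, which is the source pattern that lets a compiler give every case a distinct dispatch branch; whether it actually does so depends on the backend and build mode.

```zig
const std = @import("std");

// Hypothetical toy instruction set, for illustration only.
const Op = enum(u8) { inc, double, halt };

// Assumes `code` is non-empty and ends with `.halt`; otherwise the index
// goes out of bounds (a checked panic in safe build modes).
fn run(code: []const Op) u32 {
    var acc: u32 = 0;
    var ip: usize = 0;
    vm: switch (code[ip]) {
        .inc => {
            acc += 1;
            ip += 1;
            // Runtime-known operand: dispatch to the next instruction.
            continue :vm code[ip];
        },
        .double => {
            acc *= 2;
            ip += 1;
            continue :vm code[ip];
        },
        .halt => return acc,
    }
}

test "toy interpreter dispatch" {
    // 0 -> 1 -> 2 -> 4 -> 5
    const program = [_]Op{ .inc, .double, .double, .inc, .halt };
    try std.testing.expectEqual(@as(u32, 5), run(&program));
}
```

In the equivalent "switch in a loop" form, all three cases would funnel back through one shared dispatch point at the top of the loop.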

This lowering can inflate code size compared to a simple "switch in a loop" lowering, and any Zig implementation is, of course, free to lower this syntax however it wishes provided the language semantics are obeyed. However, the official ZSF compiler implementation will attempt to match the lowering described above, particularly in the ReleaseFast build mode.

@andrewrk
Member

Follow-up issue for docs:

Nice demo of this in action:

LiterallyVoid added a commit to LiterallyVoid/zig that referenced this pull request Sep 11, 2024
This feature was proposed in ziglang#8220, and implemented in ziglang#21257.
mlugg added a commit to mlugg/zig that referenced this pull request Sep 11, 2024
`break`ing from something which isn't a loop should always be opt-in.
This was a bug in ziglang#21257.
mlugg added a commit to mlugg/zig that referenced this pull request Sep 12, 2024
`break`ing from something which isn't a loop should always be opt-in.
This was a bug in ziglang#21257.
mlugg added a commit that referenced this pull request Sep 12, 2024
`break`ing from something which isn't a loop should always be opt-in.
This was a bug in #21257.
andrewrk added a commit that referenced this pull request Sep 13, 2024
Add langref docs for labeled switch

This feature was proposed in #8220, and implemented in #21257.

Co-authored-by: Andrew Kelley <andrew@ziglang.org>
@nm-remarkable

The ziglings project now supports this feature https://codeberg.org/ziglings/exercises/pulls/161

DivergentClouds pushed a commit to DivergentClouds/zig that referenced this pull request Sep 24, 2024
`break`ing from something which isn't a loop should always be opt-in.
This was a bug in ziglang#21257.
DivergentClouds pushed a commit to DivergentClouds/zig that referenced this pull request Sep 24, 2024
Add langref docs for labeled switch

This feature was proposed in ziglang#8220, and implemented in ziglang#21257.

Co-authored-by: Andrew Kelley <andrew@ziglang.org>
Labels
release notes This PR should be mentioned in the release notes.
6 participants