compiler: implement labeled switch/continue #21257

mlugg · 2024-08-30T23:43:42Z

Implements #8220.

This is a revival of #19812.

@Luukdegram, you might want to check the Wasm changes -- I rebased your work, but haven't tested it, so the rebase could have gone wrong somewhere, not sure. It's probably fine? (oops, the commit authorship info got lost, sorry!)

@jacobly0, I haven't implemented the new syntax in the x86_64 backend. If you'd like to do this imminently on this PR, feel free; otherwise I'll let you do it on master in your own time.

@Rexicon226, I've regressed some bits of the RISC-V backend, specifically loops and switches containing ranges. I can probably fix them up (loops certainly should be easy), but it might be easier if you can take a look?

Rexicon226 · 2024-08-30T23:49:40Z

> @Rexicon226, I've regressed some bits of the RISC-V backend, specifically loops and switches containing ranges. I can probably fix them up (loops certainly should be easy), but it might be easier if you can take a look?

You got it!

Luukdegram · 2024-08-31T17:00:33Z

> This is a revival of #19812.
>
> @Luukdegram, you might want to check the Wasm changes -- I rebased your work, but haven't tested it, so the rebase could have gone wrong somewhere, not sure. It's probably fine? (oops, the commit authorship info got lost, sorry!)

Looks good! Thanks for doing the work.

This commit modifies the representation of the AIR `switch_br` instruction to represent ranges in cases. Previously, Sema emitted different AIR in the case of a range, where the `else` branch of the `switch_br` contained a simple `cond_br` for each such case which did a simple range check (`x >= a and x <= b`, since switch ranges are inclusive). Not only does this add complexity to Sema, which we would like to minimize, but it also gets in the way of the implementation of ziglang#8220. That proposal turns certain `switch` statements into a looping construct, and for optimization purposes, we want to lower this to AIR fairly directly (i.e. without involving a `loop` instruction). That means we would ideally like a single instruction to represent the entire `switch` statement, so that we can dispatch back to it with a different operand as in ziglang#8220. This is not really possible to do correctly under the status quo system. This commit implements lowering of this new `switch_br` usage in the LLVM and C backends. The C backend just turns any case containing ranges entirely into conditionals, as before. The LLVM backend is a little smarter, and puts scalar items into the `switch` instruction, only using conditionals for the range cases (which branch to the same basic block). All remaining self-hosted backends are temporarily regressed in the presence of switch range cases. This functionality will be restored for at least the x86_64 backend before merge.
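As a rough illustration of the range semantics involved, here is a small Zig sketch (the functions `classify` and `classifyLowered` are hypothetical names chosen for this example). Zig's `a...b` switch ranges are inclusive, so a case combining a scalar item and a range can be rewritten as an equality test plus a two-sided inclusive comparison, which is roughly the shape of the conditional fallback described above. It is not the actual output of any backend.

```zig
const std = @import("std");

// One case mixes a scalar item (`1`) with an inclusive range (`5...9`).
fn classify(x: u8) u8 {
    return switch (x) {
        1, 5...9 => 1,
        else => 0,
    };
}

// The same case as explicit conditionals: one equality test for the scalar
// item and one two-sided, inclusive comparison for the range.
fn classifyLowered(x: u8) u8 {
    return if (x == 1 or (x >= 5 and x <= 9)) 1 else 0;
}

test "range case matches explicit conditionals for every u8" {
    for (0..256) |i| {
        const x: u8 = @intCast(i);
        try std.testing.expect(classify(x) == classifyLowered(x));
    }
}
```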

This commit introduces a new AIR instruction, `repeat`, which causes control flow to move back to the start of a given AIR loop. `loop` instructions will no longer automatically perform this operation after control flow reaches the end of the body. The motivation for making this change now was really just consistency with the upcoming implementation of ziglang#8220: it wouldn't make sense to have this feature work significantly differently. However, there were already some TODOs kicking around which wanted this feature. It's useful for two key reasons:

* It allows loops over AIR instruction bodies to loop precisely until they reach a `noreturn` instruction. This enables tail calls in a few places, avoids a range check on each iteration of a hot path, and provides a useful assertion that partially validates AIR structure. This is a very minor benefit, which this commit does apply to the LLVM and C backends.
* It should allow for more compact ZIR and AIR to be emitted by having AstGen emit `repeat` instructions more often, rather than having `continue` statements `break` to a `block` which is *followed* by a `repeat`. This is done in status quo because `repeat` instructions only ever cause the direct parent block to repeat. Now that AIR is more flexible, this flexibility can be pretty trivially extended to ZIR, and we can then emit better ZIR. This commit does not implement this.

Support for this feature is currently regressed on all self-hosted native backends, including x86_64. This support will be added where necessary before this branch is merged.
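The first benefit can be illustrated with a toy interpreter. This is a minimal sketch assuming a made-up instruction set (`Inst`, `add`, `repeat_or_exit` and `runLoop` are hypothetical and do not correspond to real AIR): the loop body never repeats implicitly, so the dispatch loop can run until it hits a `noreturn` instruction without checking whether it has reached the end of the body.

```zig
const std = @import("std");

// Hypothetical toy IR (NOT the compiler's real AIR encoding): a loop body is
// a slice of instructions that must end in a `noreturn` instruction.
const Inst = union(enum) {
    /// Add a constant to the accumulator and fall through to the next instruction.
    add: u32,
    /// `noreturn`: explicitly repeat the body if acc < limit, else exit the loop.
    repeat_or_exit: u32,
};

/// Executes a loop body. Nothing repeats implicitly at the end of the body, so
/// there is no `i < body.len` termination check; a malformed body instead trips
/// Zig's bounds checking in safety-checked build modes.
fn runLoop(body: []const Inst) u32 {
    var acc: u32 = 0;
    var i: usize = 0;
    while (true) {
        switch (body[i]) {
            .add => |n| {
                acc += n;
                i += 1;
            },
            .repeat_or_exit => |limit| {
                if (acc >= limit) return acc;
                i = 0; // the explicit "repeat": jump back to the start of the body
            },
        }
    }
}

test "loop repeats until the exit condition holds" {
    const body = [_]Inst{ .{ .add = 3 }, .{ .repeat_or_exit = 10 } };
    try std.testing.expect(runLoop(&body) == 12);
}
```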

The parse of `fn foo(a: switch (...) { ... })` was previously handled incorrectly; `a` was treated as both the parameter name and a label. The same issue exists for `for` and `while` expressions -- they should be fixed too, and the grammar amended appropriately. This commit does not do this: it only aims to avoid introducing regressions from labeled switch syntax.
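For illustration, the following sketch shows the kind of declaration affected: a parameter whose type is given by a `switch` expression. Here `x` must be parsed as the parameter name, not as a label on the `switch`. The function name `bitsOf` is hypothetical, and the example assumes a Zig 0.14-era compiler.

```zig
const std = @import("std");

// `x` is the parameter name; its type is the result of a comptime `switch`.
// With labeled switch syntax, `x: switch` must not be parsed as a label.
fn bitsOf(x: switch (@sizeOf(usize)) { 8 => u64, else => u32 }) u16 {
    return @bitSizeOf(@TypeOf(x));
}

test "bitsOf reports the selected parameter type's width" {
    try std.testing.expect(bitsOf(0) == 64 or bitsOf(0) == 32);
}
```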

`.loop` is also a block, so the block_depth must be stored *after* block creation, ensuring a correct block_depth to jump back to when receiving `.repeat`. This also un-regresses `switch_br`, which now correctly handles ranges within cases, for both jump tables and regular conditional branches.

This does *not* yet implement the new `loop_switch_br` instruction.

Also, don't use the special switch lowering for errors if the switch is labeled; this isn't currently supported. Related: ziglang#20627.

andrewrk · 2024-09-05T01:31:38Z

Brilliant work as usual, @mlugg

Don't forget the release notes writeup while it's fresh in your mind :)

mlugg · 2024-09-08T23:38:15Z

Release notes are below.

Labeled `switch`

Zig 0.14.0 implements an accepted proposal which allows `switch` statements to be labeled, and to be targeted by `continue` statements. Such a `continue` statement takes a single operand (just as `break` can take one to return a value from a block or loop); this value is treated as a replacement operand to the original `switch` expression. This construct is semantically equivalent to a `switch` statement inside of a loop, with a variable tracking the switch operand; for instance, the following tests are equivalent:

```zig
test "labeled switch" {
    foo: switch (@as(u8, 1)) {
        1 => continue :foo 2,
        2 => continue :foo 3,
        3 => return,
        4 => {},
        else => unreachable,
    }
    return error.Unexpected;
}

test "emulate labeled switch" {
    var op: u8 = 1;
    while (true) {
        switch (op) {
            1 => {
                op = 2;
                continue;
            },
            2 => {
                op = 3;
                continue;
            },
            3 => return,
            4 => {},
            else => unreachable,
        }
        break;
    }
    return error.Unexpected;
}
```

These constructs differ in two ways. The most obvious difference is clarity: the new syntax form is at times clearer, for instance when implementing Finite State Automata, where one can write `continue :fsa new_state` to represent a state transition. The second difference, and a key motivation for this language feature, lies in code generation, which is expanded on below.
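As a sketch of the FSA use case, the following hypothetical recognizer (the `State` enum and `isUnsignedInt` function are illustrative names, not part of `std`) accepts strings of one or more ASCII digits. Each state transition is written as `continue :fsa new_state`, and the accepting and rejecting states simply `return`.

```zig
const std = @import("std");

const State = enum { start, digits, done, fail };

/// Accepts one or more ASCII digits and nothing else (illustrative FSA).
fn isUnsignedInt(s: []const u8) bool {
    var i: usize = 0;
    fsa: switch (State.start) {
        .start => {
            // At least one digit is required.
            if (i < s.len and std.ascii.isDigit(s[i])) {
                i += 1;
                continue :fsa .digits;
            }
            continue :fsa .fail;
        },
        .digits => {
            if (i == s.len) continue :fsa .done;
            if (std.ascii.isDigit(s[i])) {
                i += 1;
                continue :fsa .digits;
            }
            continue :fsa .fail;
        },
        .done => return true,
        .fail => return false,
    }
}

test "isUnsignedInt accepts digit strings" {
    try std.testing.expect(isUnsignedInt("12345"));
    try std.testing.expect(!isUnsignedInt("12a"));
}
```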

It is also possible to `break` from a labeled switch. This simply terminates evaluation of the switch expression, causing it to result in the given value, as though the case body were a labeled block. As with blocks, an unlabeled `break` will never target a `switch` statement, only a `while` or `for` loop.
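A minimal sketch of breaking out of a labeled switch with a value, assuming a hypothetical `collatzSteps` helper: the `else` prong keeps dispatching with `continue :sw`, and the `1` prong ends evaluation with `break :sw steps`, which becomes the value of the whole `switch` expression.

```zig
const std = @import("std");

/// Counts Collatz steps from `start` down to 1. Assumes `start >= 1` (0 would
/// loop forever) and that intermediate values fit in a u32.
fn collatzSteps(start: u32) u32 {
    var steps: u32 = 0;
    return sw: switch (start) {
        // `break :sw` ends the switch, yielding `steps` as its value.
        1 => break :sw steps,
        else => |n| {
            steps += 1;
            continue :sw if (n % 2 == 0) n / 2 else 3 * n + 1;
        },
    };
}

test "collatzSteps from 1 takes no steps" {
    try std.testing.expect(collatzSteps(1) == 0);
}
```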

Unlike a typical `switch` statement, a labeled `switch` with one or more `continue`s targeting it is not implicitly evaluated at compile time (this is similar to how loops behave). However, as with loops, compile-time evaluation can be forced by evaluating such an expression in a `comptime` context.
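To illustrate forcing compile-time evaluation, the following sketch (with a hypothetical `sumTo` function) uses a labeled switch as its loop. Called normally it runs at runtime; prefixing the call with `comptime` makes the compiler evaluate it, labeled switch included, during compilation.

```zig
const std = @import("std");

/// Sums n + (n-1) + ... + 1, using a labeled switch as the loop.
fn sumTo(n: u32) u32 {
    var acc: u32 = 0;
    return sw: switch (n) {
        0 => break :sw acc,
        else => |k| {
            acc += k;
            continue :sw k - 1;
        },
    };
}

test "labeled switch forced to run at compile time" {
    // `comptime` forces the whole call, including the labeled switch,
    // to be evaluated during compilation.
    const total = comptime sumTo(10);
    try std.testing.expect(total == 55);
}
```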

Code Generation Properties

This language construct is designed to generate code which aids the CPU in predicting branches between cases of the switch, allowing for increased performance in incredibly hot loops, particularly those dispatching instructions, evaluating FSAs, or performing similar case-based evaluations. To achieve this, the generated code may be different from what one would intuitively expect.

If the operand to `continue` is comptime-known, then it can be translated to an unconditional branch to the relevant case. Such a branch is perfectly predicted, and hence typically very fast to execute.

If the operand is runtime-known, then each `continue` can become a separate conditional or indirect branch (ideally via a shared jump table) back to the same set of potential branch targets. The advantage of this pattern is that it aids the CPU's branch predictor by providing different branch instructions which can be associated with distinct prediction data. For instance, when evaluating an FSA, if case `a` is very likely to be followed by case `b`, while case `c` is very likely to be followed by case `d`, then the branch predictor can use the direct jumps between switch cases to predict the control flow more accurately, whereas a loop-based lowering causes the state dispatches to be "collapsed" into a single indirect branch or similar, hindering branch prediction.
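The dispatch pattern described above is typical of bytecode interpreters. The following is a minimal sketch assuming a made-up accumulator machine (`Op`, `fetch` and `run` are hypothetical): every handler ends with its own `continue :dispatch` on a runtime-known opcode. Whether the compiler actually emits a separate branch per handler depends on the backend and build mode.

```zig
const std = @import("std");

// Hypothetical opcodes for a toy accumulator machine (illustration only).
const Op = enum { inc, dec, dbl, halt };

// Fetch the opcode at `pc`, treating the end of the program as `halt`.
fn fetch(code: []const Op, pc: usize) Op {
    return if (pc < code.len) code[pc] else .halt;
}

fn run(code: []const Op) i64 {
    var acc: i64 = 0;
    var pc: usize = 0;
    dispatch: switch (fetch(code, pc)) {
        .inc => {
            acc += 1;
            pc += 1;
            // Runtime-known operand: the lowering described above can give
            // each handler its own dispatch branch.
            continue :dispatch fetch(code, pc);
        },
        .dec => {
            acc -= 1;
            pc += 1;
            continue :dispatch fetch(code, pc);
        },
        .dbl => {
            acc *= 2;
            pc += 1;
            continue :dispatch fetch(code, pc);
        },
        .halt => return acc,
    }
}

test "run a tiny program" {
    try std.testing.expect(run(&[_]Op{ .inc, .inc, .dbl, .dec }) == 3);
}
```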

This lowering can inflate code size compared to a simple "switch in a loop" lowering, and any Zig implementation is, of course, free to lower this syntax however it wishes provided the language semantics are obeyed. However, the official ZSF compiler implementation will attempt to match the lowering described above, particularly in the ReleaseFast build mode.

andrewrk · 2024-09-10T05:34:00Z

Follow-up issue for docs:

langref docs for labeled switch/continue #21375

Nice demo of this in action:

Faster tokenizer #21367

nm-remarkable · 2024-09-16T16:45:05Z

The ziglings project now supports this feature https://codeberg.org/ziglings/exercises/pulls/161

mlugg requested a review from Snektron as a code owner August 30, 2024 23:43

mlugg force-pushed the computed-goto-3 branch from 4e9102f to edbebbc September 1, 2024 16:38

mlugg added the release notes This PR should be mentioned in the release notes. label Sep 1, 2024

jacobly0 and others added 8 commits September 1, 2024 18:30

Builder: add indirectbr llvm instruction

49ad51b

compiler: implement labeled switch/continue

5e12ca9

std.zig.render: fix switch rendering

3b52e5a

x86_64: un-regress loop and switch_br

fd70d9d

This does *not* yet implement the new `loop_switch_br` instruction.

mlugg force-pushed the computed-goto-3 branch from edbebbc to 0d295d7 September 1, 2024 17:30

mlugg and others added 6 commits September 1, 2024 18:31

AstGen: allow breaking from labeled switch

b7a55cd

Also, don't use the special switch lowering for errors if the switch is labeled; this isn't currently supported. Related: ziglang#20627.

AstGen: error on unused switch label

2b9af9e

x86_64: implement loop_switch_br and switch_dispatch

d5b01df

riscv: implement repeat and the new switch_br

97ed239

riscv: implement switch_dispatch & loop_switch_br

0d295d7

cbe: don't emit 'x = x' in switch dispatch loop

289c704

andrewrk merged commit 3929cac into ziglang:master Sep 5, 2024
10 checks passed

xdBronch mentioned this pull request Sep 9, 2024

Computed goto codegen is inconsistent on aarch64. #14444

Closed

andrewrk mentioned this pull request Sep 10, 2024

langref docs for labeled switch/continue #21375

Closed

LiterallyVoid added a commit to LiterallyVoid/zig that referenced this pull request Sep 11, 2024

Add langref docs for labeled switch

dbc3a8f

This feature was proposed in ziglang#8220, and implemented in ziglang#21257.

LiterallyVoid added a commit to LiterallyVoid/zig that referenced this pull request Sep 11, 2024

Add langref docs for labeled switch

32c9b09

This feature was proposed in ziglang#8220, and implemented in ziglang#21257.

mlugg added a commit to mlugg/zig that referenced this pull request Sep 11, 2024

AstGen: do not allow unlabeled break to exit a labeled switch

2c1c21b

`break`ing from something which isn't a loop should always be opt-in. This was a bug in ziglang#21257.

mlugg mentioned this pull request Sep 11, 2024

AstGen: do not allow unlabeled break to exit a labeled switch #21385

Merged

mlugg added a commit to mlugg/zig that referenced this pull request Sep 12, 2024

AstGen: do not allow unlabeled break to exit a labeled switch

9ce93da

`break`ing from something which isn't a loop should always be opt-in. This was a bug in ziglang#21257.

mlugg added a commit that referenced this pull request Sep 12, 2024

AstGen: do not allow unlabeled break to exit a labeled switch

03c3633

`break`ing from something which isn't a loop should always be opt-in. This was a bug in #21257.

andrewrk added a commit that referenced this pull request Sep 13, 2024

Labeled switch documentation (#21383)

cf69154

Add langref docs for labeled switch This feature was proposed in #8220, and implemented in #21257. Co-authored-by: Andrew Kelley <andrew@ziglang.org>

chrboesch mentioned this pull request Sep 13, 2024

Labled switch pedropark99/zig-book#35

Closed

DivergentClouds pushed a commit to DivergentClouds/zig that referenced this pull request Sep 24, 2024

AstGen: do not allow unlabeled break to exit a labeled switch

f985b9c

`break`ing from something which isn't a loop should always be opt-in. This was a bug in ziglang#21257.

Ev1lT3rm1nal mentioned this pull request Oct 2, 2024

Support new Zig's labeled switch/continue skvadrik/re2c#492

Closed

modulovalue mentioned this pull request Oct 3, 2024

Should we deprecate and remove support for switch case labels? dart-lang/language#3441

Open

richerfu pushed a commit to richerfu/zig that referenced this pull request Oct 28, 2024

AstGen: do not allow unlabeled break to exit a labeled switch

02bd6c3

`break`ing from something which isn't a loop should always be opt-in. This was a bug in ziglang#21257.

RalfJung mentioned this pull request Oct 31, 2024

RFC: Improved State Machine Codegen rust-lang/rfcs#3720

Open

folkertdev mentioned this pull request Nov 4, 2024

Initiative for Rust codegen trifectatechfoundation/trifectatech-website#20

Merged
