stage2: Pop error trace frames for handled errors (#1923) #12837

topolarity · 2022-09-14T05:44:40Z

This implements the (unapproved) error handling behavior described in #1923 (comment).

The result is a basic ownership-style algorithm for errors. Barring any bugs (🤞), it should guarantee that error return frames follow a strict stack discipline, which is what we need to accumulate them correctly in a single per-thread stack.

Error return traces "live" when the error is:

- stored in a `const` variable
- handled in a `catch { ... }` or `else |err| { ... }` block
- propagated to the caller via `return` or `try`
- passed to a function as an argument

Error return traces are killed when:

- the error is stored in a mutable `var`
- control leaves a `catch` or `else |err|` block with a non-error value
- the `const` variable associated with an error trace falls out of scope
- a function receives an error as an argument and returns a non-error

Here's an example that tries to show most of this together:

```zig
const std = @import("std");
const expectError = std.testing.expectError;

// Hypothetical stubs so the example compiles; only their error/non-error results matter.
const cond = false;
fn eat_dinner() !void { return error.NoDinner; }
fn find_recipe() !void { return error.NoRecipe; }
fn cook_recipe() !void {}
fn try_me() error{ IgnorableError, ImportantError, OtherError }!void { return error.ImportantError; }

fn foo() !void {
    eat_dinner() catch {
        // Begin error handling here
        find_recipe() catch {};        // find_recipe() is removed from trace upon leaving this catch block
        if (cond)
            return error.HouseOnFire;  //      error result: Includes eat_dinner() in the error return trace, but not find_recipe()
        try cook_recipe();             //      error result: If this fails, the error return trace includes eat_dinner(), followed by cook_recipe()
    };
    // We made it out of the catch block, so eat_dinner() is removed from the trace
    return error.MyInlawsAreComing;    //      error result: Error trace starts fresh here
}

fn bar() !void {
    // In addition to returning from within the block, you can pass
    // the result to an external return, try, catch, or if:
    return try_me() catch |err| switch (err) {
        error.IgnorableError => {},     // non-error result: Pops try_me() from the error return trace
        error.ImportantError => b: {
            break :b error.FatalError;  //     error result: Includes try_me() in the error return trace
        },
        else => err,                    //     error result: Includes try_me() in the error return trace
    };
}

test {
    try expectError(error.FatalError, bar());         // If this fails, `bar()` appears in the error return trace
    try expectError(error.MyInlawsAreComing, foo());  // If this fails, `foo()` appears in the error return trace (`bar` was already popped)
}
```
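To make the save/restore behind nested `catch` blocks like the ones in `foo()` concrete, here is a toy model of the single per-thread trace. It assumes (hypothetically) that a failing call appends one integer "frame", that the trace index is saved before a handled call, and that it is restored when the `catch` block exits with a non-error. `Trace`, `eatDinner`, `findRecipe` and `handleBoth` are illustrative names, not compiler or std APIs; in the real implementation this bookkeeping is emitted by the compiler rather than written by hand.

```zig
const std = @import("std");

// Hypothetical model of the per-thread error return trace: frames are small
// integers naming the function that returned an error, not real addresses.
const Trace = struct {
    frames: [8]usize = undefined,
    index: usize = 0,

    fn push(self: *Trace, frame: usize) void {
        if (self.index < self.frames.len) self.frames[self.index] = frame;
        self.index += 1;
    }
};

const eat_dinner_frame: usize = 1;
const find_recipe_frame: usize = 2;

fn eatDinner(t: *Trace) error{NoDinner}!void {
    t.push(eat_dinner_frame); // returning an error appends a frame
    return error.NoDinner;
}

fn findRecipe(t: *Trace) error{NoRecipe}!void {
    t.push(find_recipe_frame);
    return error.NoRecipe;
}

/// Models `eat_dinner() catch { find_recipe() catch {}; }` and returns the
/// trace depth observed inside the outer catch block after the inner one exits.
fn handleBoth(t: *Trace) usize {
    var depth_inside: usize = 0;
    const outer = t.index; // saved before the handled call
    eatDinner(t) catch {
        const inner = t.index; // saved before the nested handled call
        findRecipe(t) catch {
            t.index = inner; // non-error exit: pop find_recipe()
        };
        depth_inside = t.index; // only eat_dinner()'s frame is still live
        t.index = outer; // non-error exit: pop eat_dinner()
    };
    return depth_inside;
}

test "handled errors leave the trace where they found it" {
    var t = Trace{};
    try std.testing.expectEqual(@as(usize, 1), handleBoth(&t));
    try std.testing.expectEqual(eat_dinner_frame, t.frames[0]);
    try std.testing.expectEqual(@as(usize, 0), t.index);
}
```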

The main caveat is that traces are lost when storing errors in a mutable `var`. For example:

```zig
fn foo() !void {
    var x = try_me(); // If this were `const` or `return try_me();`, the trace wouldn't be killed here
    // ... fancy logic here
    return x;         // Error trace starts fresh here - try_me() is not included.
}
```

Note: There is room to relax this last condition for var, but only a little bit. We could do the same thing we do for const and keep the error trace alive for the remainder of the block where the assignment happens. Any wider scope will violate the strict stack discipline for error return traces, so it won't work. In the end, I decided the most consistent behavior for the user is just to kill error return traces when assigning to var.

Supersedes #12825. Closes #1923.
(Many thanks @Vexu for that PR btw -- it was very helpful to tighten up some of my handling in AstGen)


andrewrk · 2022-09-18T19:53:20Z

@topolarity could you rebase this against master branch please? My auto-rebase script failed due to conflicts.

topolarity · 2022-09-23T23:20:18Z

With the latest push, examples in the style of #11593 work as expected:

```zig
const expectError = @import("std").testing.expectError;

fn alwaysErrors() !void { return error.BUG_ThisErrorShouldNotAppearInAnyTrace; }
fn foo() !void { return error.Foo; }

test "test expected error" {
    try expectError(error.BUG_ThisErrorShouldNotAppearInAnyTrace, alwaysErrors());
    try expectError(error.BUG_ThisErrorShouldNotAppearInAnyTrace, alwaysErrors());
    try expectError(error.Bar, foo());
}
```

Output with this PR is:

```
Test [1/1] test.test expected error... expected error.Bar, found error.Foo
Test [1/1] test.test expected error... FAIL (TestExpectedError)
./testme.zig:4:18: 0x211d08 in foo (test)
fn foo() !void { return error.Foo; }
                 ^
/home/topolarity/repos/zig/build/stage3/lib/zig/std/testing.zig:37:13: 0x211dcf in expectError__anon_1111 (test)
            return error.TestExpectedError;
            ^
./testme.zig:9:5: 0x212052 in test.test expected error (test)
    try expectError(error.Bar, foo());
    ^
0 passed; 0 skipped; 1 failed.
```

Before this PR, the trace had entries from each `expectError` call. Now, those entries are cleaned up if `expectError` passes.
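A rough model of why those entries now disappear, assuming the call site saves the trace index before evaluating the errorable argument and restores it only when the callee returns a non-error. `Trace`, `alwaysErrors`, `expectErrorModel` and `checkBoom` are hypothetical; in particular, `expectErrorModel` is a stand-in and not the real `std.testing.expectError`.

```zig
const std = @import("std");

// Hypothetical trace model: integers stand in for return addresses.
const Trace = struct {
    frames: [8]usize = undefined,
    index: usize = 0,

    fn push(self: *Trace, frame: usize) void {
        if (self.index < self.frames.len) self.frames[self.index] = frame;
        self.index += 1;
    }
};

fn alwaysErrors(t: *Trace) error{Boom}!void {
    t.push(1); // frame for alwaysErrors()
    return error.Boom;
}

// Stand-in for an expectError-like helper that receives an error as an argument.
fn expectErrorModel(t: *Trace, expected: anyerror, actual: anyerror!void) error{TestExpectedError}!void {
    if (actual) |_| {
        t.push(2); // frame for the helper's own error return
        return error.TestExpectedError;
    } else |err| {
        if (err != expected) {
            t.push(2);
            return error.TestExpectedError;
        }
    }
}

/// Models `try expectError(expected, alwaysErrors());` at a call site.
fn checkBoom(t: *Trace, expected: anyerror) !void {
    const saved = t.index; // saved before the errorable argument is evaluated
    try expectErrorModel(t, expected, alwaysErrors(t)); // on error: frames are kept
    t.index = saved; // non-error result: pop alwaysErrors()'s frame
}

test "passing checks leave no frames, failing ones keep the chain" {
    var t = Trace{};
    try checkBoom(&t, error.Boom);
    try std.testing.expectEqual(@as(usize, 0), t.index);
    try std.testing.expectError(error.TestExpectedError, checkBoom(&t, error.Other));
    try std.testing.expectEqual(@as(usize, 2), t.index);
}
```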

topolarity · 2022-09-26T03:46:30Z

Was able to lift the const restriction. This new behavior no longer requires modifying the test runner 🚀

andrewrk · 2022-10-15T18:11:54Z

Looking forward to this change. After the latest rebase, all CI checks passed. However due to merging of other PRs there are now several conflicts. Would you mind rebasing? (If a rebase is too problematic, a merge of master branch is fine)

topolarity · 2022-10-16T17:07:04Z

Alright, looks like the rebase went well

The rebase did expose an existing bug, #13175, which together with this change causes 1 behavior test regression. I've skipped that test for now.

andrewrk

Thanks for doing this incredible work.

I'm inclined to merge this; however, the additional complications to the compiler internals, as well as the additional runtime performance overhead, are giving me pause.

Before merging, can you share some details on these topics?

- How might this integrate with async functions? In stage1 we combine error return traces when doing await.
- Can you provide any benchmark data about how runtime performance is affected?
- Can you share how this affects the compilation speed of the self-hosted compiler?


topolarity · 2022-10-20T18:42:11Z

Very good questions.

The same things gave me concern - Let's see if I can get a read on the situation for you.

> How might this integrate with async functions? In stage1 we combine error return traces when doing await.

I think the strategy here can stay largely the same. Merging the traces means that it's as if the error were generated "at" the await. For the save/restore logic here, if we treat an errorable await exactly the way that we treat an errorable function call, then popping should continue to work correctly.

> Can you share how this affects the compilation speed of the self-hosted compiler?

This is the most significant difference I'd expect. This does add a significant number of new ZIR instructions:

```
; src/codegen/c.zig (master)
# Source bytes:       158.0126953125KiB
# Tokens:             34052 (166.29296875KiB)
# AST Nodes:          17641 (224.0751953125KiB)
# Total ZIR bytes:    554.3564453125KiB
# Instructions:       34155 (300.1904296875KiB)
# String Table Bytes: 18.626953125KiB
# Extra Data Items:   60282 (235.4765625KiB)

; src/codegen/c.zig (this PR)
# Source bytes:       158.0126953125KiB
# Tokens:             34052 (166.29296875KiB)
# AST Nodes:          17641 (224.0751953125KiB)
# Total ZIR bytes:    585.8505859375KiB
# Instructions:       36505 (320.8447265625KiB)
# String Table Bytes: 18.626953125KiB
# Extra Data Items:   63057 (246.31640625KiB)
```

That's a ~6.9% increase in instructions, and a 5.7% increase in ZIR bytes (sorry to eat up your .try savings 😬)
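The percentages can be re-derived from the raw stats above; this small sketch (the helper name `pctIncrease` is illustrative) should reproduce the ~6.9% and 5.7% figures from the instruction counts and total ZIR KiB.

```zig
const std = @import("std");

// Raw figures for src/codegen/c.zig, copied from the stats above.
const insts_master: f64 = 34155;
const insts_pr: f64 = 36505;
const zir_kib_master: f64 = 554.3564453125;
const zir_kib_pr: f64 = 585.8505859375;

/// Relative increase from `before` to `after`, in percent.
fn pctIncrease(before: f64, after: f64) f64 {
    return (after - before) / before * 100.0;
}

pub fn main() void {
    std.debug.print("instructions: +{d:.1}%\n", .{pctIncrease(insts_master, insts_pr)});
    std.debug.print("ZIR bytes:    +{d:.1}%\n", .{pctIncrease(zir_kib_master, zir_kib_pr)});
}
```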

Here are the build timings:

```
# master (stage3) ReleaseSafe building Debug
54.64s user 2.51s system 101% cpu 56.450 total
54.95s user 2.28s system 101% cpu 56.521 total
55.01s user 2.26s system 101% cpu 56.559 total

# this PR (stage3) ReleaseSafe building Debug
55.86s user 2.44s system 101% cpu 57.544 total
56.30s user 2.31s system 101% cpu 57.894 total
55.33s user 2.36s system 101% cpu 56.963 total

# master (stage3) Debug building Debug
102.98s user 2.70s system 103% cpu 1:42.59 total
103.23s user 3.25s system 102% cpu 1:43.62 total
102.81s user 2.88s system 102% cpu 1:42.73 total

# this PR (stage3) Debug building Debug
103.77s user 2.92s system 102% cpu 1:43.73 total
105.92s user 2.95s system 102% cpu 1:45.86 total
104.37s user 2.74s system 102% cpu 1:43.99 total
```

Each run was done from a completely fresh cache, so these timings include AstGen and build.zig compilation time.
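Averaging the three wall-clock ("total") runs per configuration gives a rough size for the slowdown. By this arithmetic the PR adds about 1.0 s (~1.7%) to the ReleaseSafe self-build and about 1.5 s (~1.5%) to the Debug one; three runs each is a small sample, so these are indicative only. The sketch below just recomputes the means from the numbers quoted above.

```zig
const std = @import("std");

// Wall-clock ("total") seconds from the runs above; 1:42.59 is written as 102.59.
const release_safe_master = [_]f64{ 56.450, 56.521, 56.559 };
const release_safe_pr = [_]f64{ 57.544, 57.894, 56.963 };
const debug_master = [_]f64{ 102.59, 103.62, 102.73 };
const debug_pr = [_]f64{ 103.73, 105.86, 103.99 };

/// Arithmetic mean; counts elements as f64 to avoid int-to-float casts.
fn mean(xs: []const f64) f64 {
    var sum: f64 = 0;
    var n: f64 = 0;
    for (xs) |x| {
        sum += x;
        n += 1;
    }
    return sum / n;
}

pub fn main() void {
    const rs_base = mean(&release_safe_master);
    const dbg_base = mean(&debug_master);
    const rs_delta = mean(&release_safe_pr) - rs_base;
    const dbg_delta = mean(&debug_pr) - dbg_base;
    std.debug.print("ReleaseSafe: +{d:.2}s ({d:.1}%)\n", .{ rs_delta, rs_delta / rs_base * 100 });
    std.debug.print("Debug:       +{d:.2}s ({d:.1}%)\n", .{ dbg_delta, dbg_delta / dbg_base * 100 });
}
```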

> Can you provide any benchmark data about how runtime performance is affected?

The self-compilation timings above should include the runtime performance hit, in addition to the delta from code changes in this PR. I did try to make some more adversarial benchmarks that do a lot of save/restore, but so far none of my attempts have found a delta above the measurement noise.

In general there are two places with extra runtime code:

1. Error handling: trace index save/restore at every block entry/exit for `catch`, `else |err|`, and any block containing `const x = foo();` where `foo()` is errorable
2. Discarded error traces: trace index save/restore wrapping any errorable function call whose trace is "killed" (generally, those that do not go to `try`/`catch`/`return`/`if`-`else |err|`/`const x = ...`)

`try foo()` and `return foo()` require no extra interactions with the trace index (except when a `return` exits an error-handling block as in (1)). Overall, compared to the overhead of error return tracing in general, I think this is probably not a large delta, but let me know if you'd like me to test some existing projects or continue hunting for adversarial cases.
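The save/restore for discarded error traces can be pictured as a wrapper around the call that restores the index whether or not the call failed. This is a hypothetical model (`Trace`, `errorable` and `callDiscardingTrace` are illustrative names), not the code the compiler actually emits; note that the error value itself still reaches the caller, and only its frames are dropped.

```zig
const std = @import("std");

// Hypothetical trace model: integers stand in for return addresses.
const Trace = struct {
    frames: [8]usize = undefined,
    index: usize = 0,

    fn push(self: *Trace, frame: usize) void {
        if (self.index < self.frames.len) self.frames[self.index] = frame;
        self.index += 1;
    }
};

fn errorable(t: *Trace) anyerror!void {
    t.push(1);
    return error.Boom;
}

/// Wraps a call whose trace is "killed" (e.g. its result is stored in a `var`):
/// the index is saved before the call and restored after it, unconditionally.
fn callDiscardingTrace(t: *Trace, comptime f: fn (*Trace) anyerror!void) anyerror!void {
    const saved = t.index;
    const result = f(t);
    t.index = saved;
    return result; // the error value survives; its frames do not
}

test "discarded call leaves no frames behind" {
    var t = Trace{};
    try std.testing.expectError(error.Boom, callDiscardingTrace(&t, errorable));
    try std.testing.expectEqual(@as(usize, 0), t.index);
}
```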

andrewrk · 2022-10-21T03:00:04Z

Thanks for taking these measurements. I'm a little confused by "ReleaseSafe" here - did you hack up the build.zig script? Because it only exposes the -Drelease flag which will select ReleaseFast.

Also, which one of those timings is "wall clock"?

topolarity · 2022-10-21T03:18:39Z

> Thanks for taking these measurements. I'm a little confused by "ReleaseSafe" here - did you hack up the build.zig script? Because it only exposes the -Drelease flag which will select ReleaseFast.

Yeah, that's exactly what I did here

I did a quick sanity check against the usual ReleaseFast too, which clocked in about 2-3 seconds faster iirc. Delta was similar between master and PR, about 1 second elapsed time.

> Also, which one of those timings is "wall clock"?

"Total" should be the wall clock. These are reported by zsh's `time`.

andrewrk

OK, this is good to be merged!

This implements trace "popping" for correctly handled errors within `catch { ... }` and `else { ... }` blocks. When breaking from these blocks with any non-error, we pop the error trace frames corresponding to the operand. When breaking with an error, we preserve the frames so that error traces "chain" together as usual.

```zig
fn foo(cond1: bool, cond2: bool) !void {
    bar() catch {
        if (cond1) {
            // If baz() result is a non-error, pop the error trace frames from bar()
            // If baz() result is an error, leave the bar() frames on the error trace
            return baz();
        } else if (cond2) {
            // If we break/return an error, then leave the error frames from bar() on the error trace
            return error.Foo;
        }
    };
    // An error returned from here does not include bar()'s error frames in the trace
    return error.Bar;
}
```

Notice that if foo() does not return an error, it leaves no extra frames on the error trace.

This is piece (1/3) of ziglang#1923 (comment)

This allows for errors to be "re-thrown" by yielding any error as the result of a catch block. For example:

```zig
fn errorable() !void {
    return error.FallingOutOfPlane;
}

fn foo(have_parachute: bool) !void {
    return errorable() catch b: {
        if (have_parachute) {
            // non-error result: errorable() is popped from the error trace
            return;
        } else {
            // error trace will include the call to errorable()
            break :b error.NoParachute;
        }
    };
}

pub fn main() !void {
    // Anything that returns a non-error does not pollute the error trace.
    try foo(true);

    // This error trace will still include errorable(), whose error was "re-thrown" by foo()
    try foo(false);
}
```

This is piece (2/3) of ziglang#1923 (comment)

In order to enforce a strict stack discipline for error return traces, we cannot track error return traces that are stored in variables:

```zig
const x = errorable(); // errorable()'s error return trace is killed here
// v-- error trace starts here instead
return x catch error.UnknownError;
```

In order to propagate error return traces, function calls need to be passed directly to an error-handling expression (`if`, `catch`, `try` or `return`):

```zig
// When passed directly to `catch`, the return trace is propagated
return errorable() catch error.UnknownError;

// Using a break also works
return blk: {
    // code here
    break :blk errorable();
} catch error.UnknownError;
```

Why do we need this restriction? Without it, multiple errors can co-exist with their own error traces. Handling that situation correctly means either:

a. Dynamically allocating trace memory and tracking lifetimes, OR
b. Allowing the production of one error to interfere with the trace of another (which is the current status quo)

This is piece (3/3) of ziglang#1923 (comment)

This is encoded as a primitive AIR instruction to resolve one corner case: A function may include a `catch { ... }` or `else |err| { ... }` block but not call any errorable fn. In that case, there is no error return trace to save the index of and codegen needs to avoid interacting with the non-existing error trace. By using a primitive AIR op, we can depend on Liveness to mark this unused in this corner case.

This refactor is intended to make it easier to track what kind of operator/expression consumes a result location, without overloading the ResultLoc union for this purpose. This is used in the following commit to keep track of initializer expressions of `const` variables to avoid popping error traces prematurely. Hopefully this will also be useful for implementing RLS temporaries in the future.

This change extends the "lifetime" of the error return trace associated with an error to include the duration of a function call it is passed to. This means that if a function returns an error, its return trace will include the error return trace for any error inputs. This is needed to support `testing.expectError` and similar functions. If a function returns a non-error, we have to clean up any error return traces created by errorable call arguments.

Despite the old doc-comment, this function cannot be valid for all types, since it operates on only a value and Error (Union) types have Value representations that overlap with those of other Types.

This change extends the "lifetime" of the error return trace associated with an error to continue throughout the block of a `const` variable that it is assigned to. This is necessary to support patterns like this one in test_runner.zig:

```zig
const result = foo();
if (result) |_| {
    // ... success logic
} else |err| {
    _ = err; // discard the unused capture
    // `foo()` should be included in the error trace here
    return error.TestFailed;
}
```

To make this happen, the majority of the error return trace popping logic needed to move into Sema, since `const x = foo();` cannot be examined syntactically to determine whether it modifies the error return trace. We also have to make sure not to delete pertinent block information before it makes it to Sema, so that Sema can pop/restore around blocks correctly.

* Why do this only for `const` and not `var`?
  * There is room to relax things for `var`, but only a little bit. We could do the same thing we do for const and keep the error trace alive for the remainder of the block where the *assignment* happens. Any wider scope would violate the stack discipline for traces, so it's not viable. In the end, I decided the most consistent behavior for the user is just to kill all error return traces assigned to a mutable `var`.

Previously, we'd overwrite the errors in a circular buffer. Now that error return traces are intended to follow a stack discipline, we no longer have to support the index rolling over. By treating the trace like a saturating stack, any pop/restore code still behaves correctly past-the-end of the trace. As a bonus, this adds a small blurb to let the user know when the trace saturated and x number of frames were dropped.
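A minimal sketch of the saturating stack described above, assuming a hypothetical `SaturatingTrace` with a 4-entry buffer rather than the real `std.builtin.StackTrace` code: frames past the end are counted but not stored, so a previously saved index can still be restored correctly, and the overflow count is available for the "frames were dropped" message.

```zig
const std = @import("std");

// Hypothetical model of the saturating behaviour (not the actual
// std.builtin.StackTrace code): the index keeps counting past the buffer.
const SaturatingTrace = struct {
    addrs: [4]usize = undefined,
    index: usize = 0,

    fn push(self: *SaturatingTrace, addr: usize) void {
        // The old circular buffer would write to addrs[index % addrs.len] instead.
        if (self.index < self.addrs.len) self.addrs[self.index] = addr;
        self.index += 1;
    }

    /// The frames that were actually recorded.
    fn frames(self: *const SaturatingTrace) []const usize {
        const n = if (self.index < self.addrs.len) self.index else self.addrs.len;
        return self.addrs[0..n];
    }

    /// How many frames did not fit (for a "N frames were dropped" note).
    fn dropped(self: *const SaturatingTrace) usize {
        return if (self.index > self.addrs.len) self.index - self.addrs.len else 0;
    }
};

test "restoring a saved index works even past the end" {
    var t = SaturatingTrace{};
    t.push(1);
    t.push(2);
    const saved = t.index;
    var i: usize = 3;
    while (i <= 6) : (i += 1) t.push(i); // 6 frames into a 4-slot buffer
    try std.testing.expectEqual(@as(usize, 2), t.dropped());
    t.index = saved; // pop back to before the handled errors
    try std.testing.expectEqualSlices(usize, &[_]usize{ 1, 2 }, t.frames());
    try std.testing.expectEqual(@as(usize, 0), t.dropped());
}
```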

This PR (ziglang#12837) in combination with this particular test exposed a pre-existing bug (ziglang#13175). This means that the test for ziglang#13038 has regressed.

Instead of adding 3 fields to every `Block`, this adds just one. The function-level information is saved in the `Sema` struct instead, which is created/copied more rarely.

topolarity · 2022-10-21T19:57:22Z

Alright, final rebase (gods willing 🙏 )

I also added one small final commit that reduces the size impact to Sema.Block.

Let's let CI give us the green light and then bring this in!

andrewrk · 2022-10-22T03:27:15Z

Congrats on the big merge, and thanks for your patience with all the rebases!

Would you be willing to type me up some release notes for this change to demonstrate it?

squeek502 · 2022-10-22T07:23:08Z

Thank you for taking this on @topolarity (and @Vexu for starting it). I think this being addressed will hugely benefit Zig overall, as needing to know that (a) some frames of error return traces needed to be ignored and (b) how to determine which frames should be ignored was both confusing for new Zig programmers and frustrating for experienced Zig programmers.

Also, props on recognizing the need for and implementing a more complete solution than the original proposal in #1923.

topolarity · 2022-10-24T06:58:02Z

> Would you be willing to type me up some release notes for this change to demonstrate it?

@andrewrk Sure thing! Can I get your take on #12990?

I'm OK whether it lands or not but I wanted to ask since it affects the example

> Thank you for taking this on @topolarity (and @Vexu for starting it).

Thank you for the kind words @squeek502 - Always great to know when things are having an impact downstream 🙂

PR ziglang#12837 handled control flow for break and return, but I forgot about `continue`. This is effectively another break, so we just need another `.restore_err_ret_index` ZIR instruction. Resolves ziglang#13618.
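Why `continue` needs the same treatment as `break` can be seen in a toy loop model. `Trace`, `mayFail` and `sumEvens` are hypothetical, and the explicit `t.index = saved` only stands in for what `.restore_err_ret_index` achieves, not for how it is lowered.

```zig
const std = @import("std");

// Hypothetical trace model: integers stand in for return addresses.
const Trace = struct {
    frames: [8]usize = undefined,
    index: usize = 0,

    fn push(self: *Trace, frame: usize) void {
        if (self.index < self.frames.len) self.frames[self.index] = frame;
        self.index += 1;
    }
};

fn mayFail(t: *Trace, n: usize) error{Odd}!usize {
    if (n % 2 == 1) {
        t.push(n); // record the failing call
        return error.Odd;
    }
    return n;
}

/// Models `for (items) |n| sum += mayFail(n) catch continue;`
fn sumEvens(t: *Trace, items: []const usize) usize {
    var sum: usize = 0;
    for (items) |n| {
        const saved = t.index; // saved before the handled call
        const v = mayFail(t, n) catch {
            t.index = saved; // leaving via `continue` is a non-error exit: pop
            continue;
        };
        sum += v;
    }
    return sum;
}

test "catch continue leaves no frames" {
    var t = Trace{};
    try std.testing.expectEqual(@as(usize, 6), sumEvens(&t, &[_]usize{ 1, 2, 3, 4 }));
    try std.testing.expectEqual(@as(usize, 0), t.index);
}
```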


topolarity changed the title ~~stage2: Remove handled error trace frames (#1923)~~ stage2: Pop error trace frames for handled errors (#1923) Sep 14, 2022

topolarity force-pushed the err-ret-trace-improvements-1923 branch 2 times, most recently from 73e5235 to 09842f9 on September 14, 2022 08:15

Vexu reviewed Sep 14, 2022


lib/test_runner.zig (outdated, resolved)

src/Sema.zig (outdated, resolved)

src/Sema.zig (outdated, resolved)

topolarity force-pushed the err-ret-trace-improvements-1923 branch from 09842f9 to 8427412 on September 23, 2022 19:58

topolarity force-pushed the err-ret-trace-improvements-1923 branch 3 times, most recently from 675169d to 3fa635b on September 26, 2022 17:20

topolarity requested a review from kristoff-it as a code owner September 26, 2022 17:20

topolarity force-pushed the err-ret-trace-improvements-1923 branch 2 times, most recently from 1511488 to bb8465a on September 27, 2022 23:42

topolarity mentioned this pull request Sep 27, 2022

stage2: Include catch expressions in error return trace #12990

Closed

topolarity force-pushed the err-ret-trace-improvements-1923 branch 4 times, most recently from f39f0fc to bf9ecd0 on October 16, 2022 05:24

topolarity requested a review from andrewrk October 16, 2022 17:07

andrewrk requested changes Oct 19, 2022


src/AstGen.zig (outdated, resolved)

topolarity force-pushed the err-ret-trace-improvements-1923 branch from 60db2f9 to 1fe1f31 on October 20, 2022 18:56

topolarity requested a review from andrewrk October 20, 2022 18:57

andrewrk approved these changes Oct 21, 2022


stage2: properly reset error return trace index

5316a00

topolarity added 11 commits October 21, 2022 10:43

stage2: Fix usage of getError()

597ead5


stage2: Skip test exposing ziglang#13175

74b9cbd


Change how Block propagates (error return) trace index

c36a2c2


topolarity force-pushed the err-ret-trace-improvements-1923 branch from 1fe1f31 to c36a2c2 on October 21, 2022 19:47

andrewrk merged commit 09236d2 into ziglang:master Oct 22, 2022

topolarity deleted the err-ret-trace-improvements-1923 branch October 24, 2022 06:59

squeek502 mentioned this pull request Nov 21, 2022

catch continue doesn't handle error return trace frame popping correctly #13618

Closed

Vexu mentioned this pull request Jan 17, 2024

astgen: fix error return trace on error union switch #18599

Merged
