stage2: Pop error trace frames for handled errors (#1923) #12837

Merged
merged 12 commits into from
Oct 22, 2022

@topolarity topolarity commented Sep 14, 2022

This implements the (unapproved) error handling behavior described in #1923 (comment).

The result is a basic ownership-style algorithm for errors. Barring any bugs (🤞), it should guarantee that error return frames follow a strict stack discipline, which we need to correctly accumulate them in a single stack-per-thread.

Error return traces "live" when:

  • stored in a const variable
  • handled in a catch { ... } or else |err| { ... } block
  • propagated to caller via return or try
  • passed to a function as an argument

Error return traces are killed when:

  • stored in a mutable var
  • leaving a catch or else |err| block with a non-error value
  • the const variable associated with an error trace falls out of scope
  • a function receives an error as an argument and returns a non-error

Here's an example that tries to show most of this together:

fn foo() !void {
   eat_dinner() catch {
       // Begin error handling here
       find_recipe() catch {};        // find_recipe() is removed from trace upon leaving this catch block
       if (cond)
           return error.HouseOnFire;  //      error result: Includes eat_dinner() in the error return trace, but not find_recipe()
       try cook_recipe();             //      error result: If this fails, the error return trace includes eat_dinner(), followed by cook_recipe()
   };
   // We made it out of another catch block, eat_dinner() is removed from the trace 
   return error.MyInlawsAreComing;    //      error result: Error trace starts fresh here
}

fn bar() !void {
   // In addition to returning from within the block, you can pass
   // the result to an external return, try, catch, or if:
   return try_me() catch |err| switch(err) {
       error.IgnorableError => {},     // non-error result: Pops try_me() from the error return trace
       error.ImportantError => b: {
           break :b error.FatalError;  //     error result: Includes try_me() in the error return trace
       },
       else => err,                    //     error result: Includes try_me() in the error return trace
   };
}
test {
   try expectError(error.FatalError, bar());         // If this fails, `bar()` appears in the error return trace
   try expectError(error.MyInlawsAreComing, foo());  // If this fails, `foo()` appears in the error return trace (`bar` was already popped)
}

The main caveat is that traces are lost when storing errors to a mutable var. For example:

fn foo() !void {
    var x = try_me();   // If this were `const` or `return try_me();`, the trace wouldn't be killed here
    // ... fancy logic here
    return x;           // Error trace starts fresh here - try_me() is not included.
}

Note: There is room to relax this last condition for var, but only a little bit. We could do the same thing we do for const and keep the error trace alive for the remainder of the block where the assignment happens. Any wider scope will violate the strict stack discipline for error return traces, so it won't work. In the end, I decided the most consistent behavior for the user is just to kill error return traces when assigning to var.
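The caveat can be sketched as follows. This is a minimal illustration rather than code from the PR: `tryMe`, `viaVar`, `viaConst`, and `direct` are hypothetical names. All three wrappers return the same error value; under the rules above only the contents of the error return trace should differ, which the test does not observe.

```zig
const std = @import("std");

// Hypothetical errorable function, standing in for try_me() above.
fn tryMe() error{Oops}!void {
    return error.Oops;
}

// Error stored in a mutable `var`: per the rules above, the trace
// collected by tryMe() is killed at the assignment.
fn viaVar() !void {
    var x: error{Oops}!void = {};
    x = tryMe();
    return x; // trace starts fresh here
}

// Error stored in a `const`: the trace stays alive for the rest of the block.
fn viaConst() !void {
    const x = tryMe();
    return x;
}

// Error passed straight to `return`: the trace is propagated.
fn direct() !void {
    return tryMe();
}

test "all three variants return the same error value" {
    try std.testing.expectError(error.Oops, viaVar());
    try std.testing.expectError(error.Oops, viaConst());
    try std.testing.expectError(error.Oops, direct());
}
```

When the trace matters, preferring `const` or a direct `return`/`try` over `var` keeps the frames from `tryMe()` attached to the error.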

Supersedes #12825. Closes #1923.
(Many thanks @Vexu for that PR btw -- it was very helpful to tighten up some of my handling in AstGen)

@topolarity topolarity changed the title stage2: Remove handled error trace frames (#1923) stage2: Pop error trace frames for handled errors (#1923) Sep 14, 2022
@topolarity topolarity force-pushed the err-ret-trace-improvements-1923 branch 2 times, most recently from 73e5235 to 09842f9 Compare September 14, 2022 08:15
@andrewrk
Member

@topolarity could you rebase this against master branch please? My auto-rebase script failed due to conflicts.

@topolarity topolarity force-pushed the err-ret-trace-improvements-1923 branch from 09842f9 to 8427412 Compare September 23, 2022 19:58
@topolarity
Contributor Author

With the latest push, examples in the style of #11593 work as expected:

const expectError = @import("std").testing.expectError;

fn alwaysErrors() !void { return error.BUG_ThisErrorShouldNotAppearInAnyTrace; }
fn foo() !void { return error.Foo; }

test "test expected error" {
    try expectError(error.BUG_ThisErrorShouldNotAppearInAnyTrace, alwaysErrors());
    try expectError(error.BUG_ThisErrorShouldNotAppearInAnyTrace, alwaysErrors());
    try expectError(error.Bar, foo());
}

Output with this PR is:

Test [1/1] test.test expected error... expected error.Bar, found error.Foo
Test [1/1] test.test expected error... FAIL (TestExpectedError)
./testme.zig:4:18: 0x211d08 in foo (test)
fn foo() !void { return error.Foo; }
                 ^
/home/topolarity/repos/zig/build/stage3/lib/zig/std/testing.zig:37:13: 0x211dcf in expectError__anon_1111 (test)
            return error.TestExpectedError;
            ^
./testme.zig:9:5: 0x212052 in test.test expected error (test)
    try expectError(error.Bar, foo());
    ^
0 passed; 0 skipped; 1 failed.

Before, we had entries from each expectError call. Now the entries are cleaned up if expectError passes.

@topolarity
Contributor Author

Was able to lift the const restriction. This new behavior no longer requires modifying the test runner 🚀

@topolarity topolarity force-pushed the err-ret-trace-improvements-1923 branch 3 times, most recently from 675169d to 3fa635b Compare September 26, 2022 17:20
@topolarity topolarity force-pushed the err-ret-trace-improvements-1923 branch 2 times, most recently from 1511488 to bb8465a Compare September 27, 2022 23:42
@andrewrk
Member

andrewrk commented Oct 15, 2022

Looking forward to this change. After the latest rebase, all CI checks passed. However, due to the merging of other PRs, there are now several conflicts. Would you mind rebasing? (If a rebase is too problematic, a merge of the master branch is fine.)

@topolarity topolarity force-pushed the err-ret-trace-improvements-1923 branch 4 times, most recently from f39f0fc to bf9ecd0 Compare October 16, 2022 05:24
@topolarity
Contributor Author

Alright, looks like the rebase went well

The rebase did expose an existing bug, #13175, which together with this change causes one behavior test regression. I've skipped that test for now.

@topolarity topolarity requested a review from andrewrk October 16, 2022 17:07
Member

@andrewrk andrewrk left a comment

Thanks for doing this incredible work.

I'm inclined to merge this, however, the additional complications to the compiler internals, as well as the additional runtime performance overhead are giving me pause.

Before merging, can you share some details on these topics?

  • How might this integrate with async functions? In stage1 we combine error return traces when doing await.
  • Can you provide any benchmark data about how runtime performance is affected?
  • Can you share how this affects the compilation speed of the self-hosted compiler?

@topolarity
Contributor Author

Very good questions.

The same things gave me concern - Let's see if I can get a read on the situation for you.

> How might this integrate with async functions? In stage1 we combine error return traces when doing await.

I think the strategy here can stay largely the same. Merging the traces means that it's as if the error were generated "at" the await. For the save/restore logic here, if we treat an errorable await exactly the way that we treat an errorable function call, then popping should continue to work correctly.

> Can you share how this affects the compilation speed of the self-hosted compiler?

This is the most significant difference I'd expect. This does add a significant number of new ZIR instructions:

; src/codegen/c.zig (master)
# Source bytes:       158.0126953125KiB
# Tokens:             34052 (166.29296875KiB)
# AST Nodes:          17641 (224.0751953125KiB)
# Total ZIR bytes:    554.3564453125KiB
# Instructions:       34155 (300.1904296875KiB)
# String Table Bytes: 18.626953125KiB
# Extra Data Items:   60282 (235.4765625KiB)

; src/codegen/c.zig (this PR)       
# Source bytes:       158.0126953125KiB
# Tokens:             34052 (166.29296875KiB)
# AST Nodes:          17641 (224.0751953125KiB)
# Total ZIR bytes:    585.8505859375KiB
# Instructions:       36505 (320.8447265625KiB)
# String Table Bytes: 18.626953125KiB
# Extra Data Items:   63057 (246.31640625KiB)

That's a ~6.9% increase in instructions, and a 5.7% increase in ZIR bytes (sorry to eat up your .try savings 😬)
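The two percentages can be reproduced from the listings above. The snippet is only an arithmetic check; `percentIncrease` is a hypothetical helper, and the inputs are the instruction counts and total ZIR sizes (in KiB) copied from the two dumps.

```zig
const std = @import("std");

// Hypothetical helper: relative growth from `before` to `after`, in percent.
fn percentIncrease(before: f64, after: f64) f64 {
    return (after - before) / before * 100.0;
}

test "ZIR growth for src/codegen/c.zig" {
    // Instruction counts: 34155 on master, 36505 with this PR.
    const instructions = percentIncrease(34155.0, 36505.0);
    // Total ZIR bytes, in KiB.
    const zir_bytes = percentIncrease(554.3564453125, 585.8505859375);

    // Roughly 6.9% more instructions and 5.7% more ZIR bytes.
    try std.testing.expectApproxEqAbs(@as(f64, 6.88), instructions, 0.01);
    try std.testing.expectApproxEqAbs(@as(f64, 5.68), zir_bytes, 0.01);
}
```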

Here are the build timings:

# master (stage3) ReleaseSafe building Debug
54.64s user 2.51s system 101% cpu 56.450 total
54.95s user 2.28s system 101% cpu 56.521 total
55.01s user 2.26s system 101% cpu 56.559 total

# this PR (stage3) ReleaseSafe building Debug
55.86s user 2.44s system 101% cpu 57.544 total
56.30s user 2.31s system 101% cpu 57.894 total
55.33s user 2.36s system 101% cpu 56.963 total


# master (stage3) Debug building Debug
102.98s user 2.70s system 103% cpu 1:42.59 total
103.23s user 3.25s system 102% cpu 1:43.62 total
102.81s user 2.88s system 102% cpu 1:42.73 total

# this PR (stage3) Debug building Debug
103.77s user 2.92s system 102% cpu 1:43.73 total
105.92s user 2.95s system 102% cpu 1:45.86 total
104.37s user 2.74s system 102% cpu 1:43.99 total

Each run was done from a completely fresh cache, so that this includes AstGen and build.zig compilation time.
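As a rough cross-check, the wall-clock ("total") samples can be averaged per configuration. This is a back-of-the-envelope sketch: `mean3` is a hypothetical helper, and three samples per configuration are too few to say much about noise.

```zig
const std = @import("std");

// Hypothetical helper: arithmetic mean of three samples.
fn mean3(s: [3]f64) f64 {
    return (s[0] + s[1] + s[2]) / 3.0;
}

test "mean wall-clock delta between master and this PR" {
    // "total" column in seconds, copied from the timings above
    // (1:42.59 is written as 102.59, and so on).
    const safe_master = mean3([3]f64{ 56.450, 56.521, 56.559 });
    const safe_pr = mean3([3]f64{ 57.544, 57.894, 56.963 });
    const debug_master = mean3([3]f64{ 102.59, 103.62, 102.73 });
    const debug_pr = mean3([3]f64{ 103.73, 105.86, 103.99 });

    // ReleaseSafe compiler: about 0.96 s slower on average.
    try std.testing.expectApproxEqAbs(@as(f64, 0.957), safe_pr - safe_master, 0.01);
    // Debug compiler: about 1.55 s slower on average.
    try std.testing.expectApproxEqAbs(@as(f64, 1.547), debug_pr - debug_master, 0.01);
}
```

Relative to the master means, that is a slowdown of roughly 1.7% for the ReleaseSafe compiler and 1.5% for the Debug compiler.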

> Can you provide any benchmark data about how runtime performance is affected?

The self-compilation timings above should include the runtime performance hit, in addition to the delta from code changes in this PR. I did try to make some more adversarial benchmarks that would do a lot of save/restore, but my attempts didn't find a delta that rose above the measurement noise yet.

In general there are two places with extra runtime code:

  1. Error handling: Trace index save/restore at every block entry/exit for catch, else |err|, and any blocks containing const x = foo(); where foo() is errorable
  2. Discarded error traces: Trace index save/restore wrapping any errorable function call whose trace is "killed" (generally, those that do not go to try/catch/return/if-else-err/const x = ...)

try foo() and return foo() require no extra interactions with the trace index (except when a return exits an error-handling block as in (1)). Overall, compared to the overhead of error return tracing in general I think this is probably not a large delta, but let me know if you'd like me to test some existing projects or continue hunting for adversarial cases.
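To make the cost concrete, here is a toy model of the save/restore pattern described above. It is a sketch under stated assumptions, not the code the compiler emits: `ToyTrace`, `handled`, and `propagated` are hypothetical names, and the addresses are made up.

```zig
const std = @import("std");

// Toy stand-in for the per-thread error return trace.
const ToyTrace = struct {
    addrs: [8]usize = [_]usize{0} ** 8,
    index: usize = 0,

    // Called when an error is returned: record where it came from.
    fn push(self: *ToyTrace, addr: usize) void {
        if (self.index < self.addrs.len) self.addrs[self.index] = addr;
        self.index += 1;
    }
};

// Models `callee() catch {}`: one extra load on entry, one extra store on exit.
fn handled(trace: *ToyTrace) void {
    const saved = trace.index; // save at block entry
    trace.push(0x1000); // the failing callee records a frame
    trace.index = saved; // leaving with a non-error pops that frame
}

// Models `try callee()`: the frame is kept and no save/restore is needed.
fn propagated(trace: *ToyTrace) void {
    trace.push(0x2000);
}

test "handled errors leave no frames, propagated errors do" {
    var trace = ToyTrace{};
    handled(&trace);
    try std.testing.expectEqual(@as(usize, 0), trace.index);
    propagated(&trace);
    try std.testing.expectEqual(@as(usize, 1), trace.index);
}
```

In this model the added work per error-handling block is a single index read and a single index write, which is consistent with the delta being hard to measure.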

@topolarity topolarity force-pushed the err-ret-trace-improvements-1923 branch from 60db2f9 to 1fe1f31 Compare October 20, 2022 18:56
@topolarity topolarity requested a review from andrewrk October 20, 2022 18:57
@andrewrk
Member

andrewrk commented Oct 21, 2022

Thanks for taking these measurements. I'm a little confused by "ReleaseSafe" here - did you hack up the build.zig script? Because it only exposes the -Drelease flag which will select ReleaseFast.

Also, which one of those timings is "wall clock"?

@topolarity
Contributor Author

> Thanks for taking these measurements. I'm a little confused by "ReleaseSafe" here - did you hack up the build.zig script? Because it only exposes the -Drelease flag which will select ReleaseFast.

Yeah, that's exactly what I did here

I did a quick sanity check against the usual ReleaseFast too, which clocked in about 2-3 seconds faster iirc. Delta was similar between master and PR, about 1 second elapsed time.

> Also, which one of those timings is "wall clock"?

"Total" should be the wall clock. These are reported from zsh's "time"

Member

@andrewrk andrewrk left a comment

OK, this is good to be merged!

This implements trace "popping" for correctly handled errors within
`catch { ... }` and `else { ... }` blocks.

When breaking from these blocks with any non-error, we pop the error
trace frames corresponding to the operand. When breaking with an error,
we preserve the frames so that error traces "chain" together as usual.

```zig
fn foo(cond1: bool, cond2: bool) !void {
    bar() catch {
        if (cond1) {
            // If baz() result is a non-error, pop the error trace frames from bar()
            // If baz() result is an error, leave the bar() frames on the error trace
            return baz();
        } else if (cond2) {
            // If we break/return an error, then leave the error frames from bar() on the error trace
            return error.Foo;
        }
    };

    // An error returned from here does not include bar()'s error frames in the trace
    return error.Bar;
}
```

Notice that if foo() does not return an error, it leaves no extra
frames on the error trace.

This is piece (1/3) of ziglang#1923 (comment)

This allows for errors to be "re-thrown" by yielding any error as the
result of a catch block. For example:

```zig
fn errorable() !void {
    return error.FallingOutOfPlane;
}

fn foo(have_parachute: bool) !void {
    return errorable() catch |err| b: {
        if (!have_parachute) {
            // error trace will include the call to errorable()
            break :b error.NoParachute;
        } else {
            return;
        }
    };
}

pub fn main() !void {
    // Anything that returns a non-error does not pollute the error trace.
    try foo(true);

    // This error trace will still include errorable(), whose error was "re-thrown" by foo()
    try foo(false);
}
```

This is piece (2/3) of ziglang#1923 (comment)

In order to enforce a strict stack discipline for error return traces,
we cannot track error return traces that are stored in variables:

  ```zig
  const x = errorable(); // errorable()'s error return trace is killed here

  // v-- error trace starts here instead
  return x catch error.UnknownError;
  ```

In order to propagate error return traces, function calls need to be passed
directly to an error-handling expression (`if`, `catch`, `try` or `return`):

  ```zig
  // When passed directly to `catch`, the return trace is propagated
  return errorable() catch error.UnknownError;

  // Using a break also works
  return blk: {
      // code here
      break :blk errorable();
  } catch error.UnknownError;
  ```

Why do we need this restriction? Without it, multiple errors can co-exist
with their own error traces. Handling that situation correctly means either:
  a. Dynamically allocating trace memory and tracking lifetimes, OR
  b. Allowing the production of one error to interfere with the trace of another
     (which is the current status quo)

This is piece (3/3) of ziglang#1923 (comment)

This is encoded as a primitive AIR instruction to resolve one corner
case: A function may include a `catch { ... }` or `else |err| { ... }`
block but not call any errorable fn. In that case, there is no error
return trace to save the index of and codegen needs to avoid
interacting with the non-existing error trace.

By using a primitive AIR op, we can depend on Liveness to mark this
unused in this corner case.

This refactor is intended to make it easier to track what kind of
operator/expression consumes a result location, without overloading the
ResultLoc union for this purpose.

This is used in the following commit to keep track of initializer
expressions of `const` variables to avoid popping error traces
prematurely. Hopefully this will also be useful for implementing
RLS temporaries in the future.

This change extends the "lifetime" of the error return trace associated
with an error to include the duration of a function call it is passed
to.

This means that if a function returns an error, its return trace will
include the error return trace for any error inputs. This is needed to
support `testing.expectError` and similar functions.

If a function returns a non-error, we have to clean up any error return
traces created by errorable call arguments.

Despite the old doc-comment, this function cannot be valid for all types
since it operates with only a value and Error (Union) types have
overlapping Value representations with other Types.

This change extends the "lifetime" of the error return trace associated
with an error to continue throughout the block of a `const` variable
that it is assigned to.

This is necessary to support patterns like this one in test_runner.zig:
```zig
const result = foo();
if (result) |_| {
    // ... success logic
} else |err| {
    // `foo()` should be included in the error trace here
    return error.TestFailed;
}
```

To make this happen, the majority of the error return trace popping logic
needed to move into Sema, since `const x = foo();` cannot be examined
syntactically to determine whether it modifies the error return trace. We
also have to make sure not to delete pertinent block information before it
makes it to Sema, so that Sema can pop/restore around blocks correctly.

* Why do this only for `const` and not `var`? *

There is room to relax things for `var`, but only a little bit. We could
do the same thing we do for const and keep the error trace alive for the
remainder of the block where the *assignment* happens. Any wider scope
would violate the stack discipline for traces, so it's not viable.

In the end, I decided the most consistent behavior for the user is just
to kill all error return traces assigned to a mutable `var`.

Previously, we'd overwrite the errors in a circular buffer. Now that
error return traces are intended to follow a stack discipline, we no
longer have to support the index rolling over. By treating the trace
like a saturating stack, any pop/restore code still behaves correctly
past-the-end of the trace.

As a bonus, this adds a small blurb to let the user know when the trace
saturated and x number of frames were dropped.
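
One way to picture the saturating stack is the sketch below. It assumes that the index keeps counting past the end of the buffer while writes beyond capacity are skipped; `SaturatingTrace` and its two-entry capacity are hypothetical and chosen only to keep the example short.

```zig
const std = @import("std");

// Toy trace with a deliberately tiny buffer.
const SaturatingTrace = struct {
    addrs: [2]usize = [_]usize{0} ** 2,
    index: usize = 0,

    // Past capacity the address is dropped, but the index still advances.
    fn push(self: *SaturatingTrace, addr: usize) void {
        if (self.index < self.addrs.len) self.addrs[self.index] = addr;
        self.index += 1;
    }

    // Number of frames that did not fit, as reported to the user.
    fn dropped(self: SaturatingTrace) usize {
        return if (self.index > self.addrs.len) self.index - self.addrs.len else 0;
    }
};

test "restore behaves correctly past the end of the trace" {
    var t = SaturatingTrace{};
    t.push(0xA);
    const saved = t.index; // saved at entry to an error-handling block

    t.push(0xB);
    t.push(0xC); // does not fit
    t.push(0xD); // does not fit
    try std.testing.expectEqual(@as(usize, 2), t.dropped());

    t.index = saved; // the error was handled: pop everything pushed since
    try std.testing.expectEqual(@as(usize, 0), t.dropped());
    try std.testing.expectEqual(@as(usize, 0xA), t.addrs[0]);
}
```

With a circular buffer the same restore could land on an overwritten slot; here the earliest frames are never overwritten, so a restored index always refers to valid entries.
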
This PR (ziglang#12837) in combination with this particular test exposed
a pre-existing bug (ziglang#13175).

This means that the test for ziglang#13038 has regressed.

Instead of adding 3 fields to every `Block`, this adds just one. The
function-level information is saved in the `Sema` struct instead,
which is created/copied more rarely.

@topolarity topolarity force-pushed the err-ret-trace-improvements-1923 branch from 1fe1f31 to c36a2c2 Compare October 21, 2022 19:47
@topolarity
Contributor Author

Alright, final rebase (gods willing 🙏 )

I also added one small final commit that reduces the size impact to Sema.Block.

Let's let CI give us the green light and then bring this in!

@andrewrk andrewrk merged commit 09236d2 into ziglang:master Oct 22, 2022
@andrewrk
Member

Congrats on the big merge, and thanks for your patience with all the rebases!

Would you be willing to type me up some release notes for this change to demonstrate it?

@squeek502
Collaborator

squeek502 commented Oct 22, 2022

Thank you for taking this on @topolarity (and @Vexu for starting it). I think this being addressed will hugely benefit Zig overall, as needing to know that (a) some frames of error return traces needed to be ignored and (b) how to determine which frames should be ignored was both confusing for new Zig programmers and frustrating for experienced Zig programmers.

Also, props on recognizing the need for and implementing a more complete solution than the original proposal in #1923.

@topolarity
Contributor Author

topolarity commented Oct 24, 2022

> Would you be willing to type me up some release notes for this change to demonstrate it?

@andrewrk Sure thing! Can I get your take on #12990?

I'm OK whether it lands or not but I wanted to ask since it affects the example

> Thank you for taking this on @topolarity (and @Vexu for starting it).

Thank you for the kind words @squeek502 - Always great to know when things are having an impact downstream 🙂

@topolarity topolarity deleted the err-ret-trace-improvements-1923 branch October 24, 2022 06:59
topolarity added a commit to topolarity/zig that referenced this pull request Nov 21, 2022
PR ziglang#12837 handled control flow for break and return, but I forgot
about `continue`. This is effectively another break, so we just
need another `.restore_err_ret_index` ZIR instruction.

Resolves ziglang#13618.
Vexu pushed a commit that referenced this pull request Nov 22, 2022
PR #12837 handled control flow for break and return, but I forgot
about `continue`. This is effectively another break, so we just
need another `.restore_err_ret_index` ZIR instruction.

Resolves #13618.
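
For illustration, this is the kind of code the follow-up covers: a `catch` block inside a loop that is left via `continue` rather than `break` or `return`. It is a hypothetical example, not the reproduction from the linked issue; `parseDigit` and `sumDigits` are made-up names.

```zig
const std = @import("std");

// Hypothetical errorable helper.
fn parseDigit(c: u8) error{NotADigit}!u8 {
    if (c < '0' or c > '9') return error.NotADigit;
    return c - '0';
}

// Sums the digits in `text`, skipping every other character.
fn sumDigits(text: []const u8) u32 {
    var sum: u32 = 0;
    for (text) |c| {
        const digit = parseDigit(c) catch {
            // The error is handled here, and the block is exited via `continue`.
            continue;
        };
        sum += digit;
    }
    return sum;
}

test "errors handled via continue" {
    try std.testing.expectEqual(@as(u32, 6), sumDigits("1a2b3"));
}
```

Each skipped character produces a handled error, so with the fix the frames from `parseDigit` should be popped on every `continue` instead of accumulating in the trace.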