Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Special-case switching on error union capture #18173

Merged
merged 16 commits into from
Jan 9, 2024

Conversation

dweiller
Copy link
Contributor

@dweiller dweiller commented Dec 1, 2023

Closes #11957 and as well as the related issues #16770 and #11812.

There are a few things left to do before merging:

  • implement AstGen (and any sema changes needed) for if (x) { ... } else |err| switch (err) { ... } pattern
  • check compile error messages makes sense (e.g. source locations) and consider adding compile error test cases
  • cleanup commit history (drop commit adding -Donly-ast-check build flag?)
  • look for ways to share more logic between normal switch, condbr and error union switch (both in AstGen.zig and Sema.zig)
  • mention the special-cased if and catch patterns in langref (contrast with using a block around the switch)

@dweiller dweiller force-pushed the switch-err-union branch 3 times, most recently from 0f08626 to 0436b3b Compare December 1, 2023 06:07
@dweiller
Copy link
Contributor Author

dweiller commented Dec 2, 2023

Not sure what the errors in the CI are - I can reproduce the segfault in bootstrap locally, but don't know what would be causing it. Perhaps there is some incompatibility with the x86_64 backend.

Edit: The segfault referenced above has been fixed, but there is a new/different segfault happening in the CI that I cannot reproduce.

@dweiller dweiller force-pushed the switch-err-union branch 3 times, most recently from ad85f90 to 815b4d4 Compare December 8, 2023 11:45
@dweiller dweiller marked this pull request as ready for review December 14, 2023 07:35
@matu3ba
Copy link
Contributor

matu3ba commented Dec 15, 2023

Looks like stage2 is broken, probably due to the C backend emitting broken code in readFromPackedMemory and utf8ToUtf16Le. Since this is Linux Release, I'd guess that readFromPackedMemory is broken.
Did you try running the command inside valgrind? You can also play around with setting -Wstringop-overflow=X to see what overflows occur. -Wformat-overflow and -Wformat-truncation also enable those and theres also -Wformat=2 & -Wformat-truncation. See also https://interrupt.memfault.com/blog/best-and-worst-gcc-clang-compiler-flags.

I do see 3 ways forward, since I suspect you checked that the miscompilation is not detectable within standard debugging tools (valgrind, memsanitizer etc):

    1. try to add workarounds as to not make the C backends not breaking readFromPackedMemory with different usage (pattern)
    1. try to find a minimal reproducible to get the C backend to emit broken code for readFromPackedMemory and fix it
    1. add a workaround in the C backend after understanding the broken C code
cc -o zig2 zig2.c compiler_rt.c -std=c99 -O2 -fno-stack-protector -Istage1 -Wl,-z,stack-size=0x10000000 -pthread
zig2.c: In function ‘unicode_utf8ToUtf16Le__91520’:
zig2.c:3263751: warning: assignment to ‘const uint16_t (*)[16]’ {aka ‘const short unsigned int (*)[16]’} from incompatible pointer type ‘const uint16_t *’ {aka ‘const short unsigned int *’} [-Wincompatible-pointer-types]
3263751 |    t26 = t25.ptr;
        | 
In function ‘mem_readPackedIntLittle__anon_210076__210076’,
    inlined from ‘mem_readPackedInt__anon_187698__187698’ at zig2.c:2210564:0,
    inlined from ‘value_Value_readFromPackedMemory__22759’ at zig2.c:1798160:0:
zig2.c:2633608: warning: ‘memcpy’ reading 16 bytes from a region of size 10 [-Wstringop-overread]
2633608 |  memcpy(&t11, &t10, sizeof(zig_u128));
        | 
zig2.c: In function ‘value_Value_readFromPackedMemory__22759’:
zig2.c:2633585: note: source object ‘t10’ of size 10
2633585 |  uint8_t t10[10];
        | 
In function ‘mem_readPackedIntBig__anon_210077__210077’,
    inlined from ‘mem_readPackedInt__anon_187698__187698’ at zig2.c:2210568:0,
    inlined from ‘value_Value_readFromPackedMemory__22759’ at zig2.c:1798160:0:
zig2.c:2633682: warning: ‘memcpy’ reading 16 bytes from a region of size 10 [-Wstringop-overread]
2633682 |  memcpy(&t13, &t12, sizeof(zig_u128));
        | 
zig2.c: In function ‘value_Value_readFromPackedMemory__22759’:
zig2.c:2633644: note: source object ‘t12’ of size 10
2633644 |  uint8_t t12[10];
        | 
zig2.c: In function ‘value_Value_readFromMemory__22758’:
zig2.c:1395844: warning: ‘memcpy’ reading 16 bytes from a region of size 10 [-Wstringop-overread]
1395844 |      memcpy(&t47, &t46, sizeof(zig_u128));
        | 
zig2.c:1395703: note: source object ‘t46’ of size 10
1395703 |  uint8_t t46[10];
        | 
+ ./zig2 build -Dno-lib

Please let me know what you think.

@rootbeer
Copy link
Contributor

Aren't the quoted warnings from cc about reading-from-the-wrong-size all just warnings? I feel like I see those all the time, and have successfully ignored them (they do seem scary though). You can see the same warnings in the successful debug build too: https://github.com/ziglang/zig/actions/runs/7217377559/job/19665091928?pr=18173

The failure in this case seems to be after those:

+ ./zig2 build -Dno-lib
Segmentation fault
Error: Process completed with exit code 139.

@matu3ba
Copy link
Contributor

matu3ba commented Dec 15, 2023

all just warnings

Buffer overreads or overwrites are undefined behavior and, if conforming to the c standard, compilers are free to optimize based on the absence of such things and miscompile your programs.
However I can not tell you, if this definitely happened here, but this UB is a candidate to consider as cause.

@dweiller
Copy link
Contributor Author

dweiller commented Jan 9, 2024

Please let me know what you think.

I think you were probably right - I started investigating and it looked like a bug when readPackedIntLittle was called for a u80 (by readFromMemory for a f80) - for some reason the bitcast was emitting a memcpy for size of a zig_u128 (which was the dest type) rather than the 10 byte buffer for the u80 source. It seems the issue has been resolved by Andrew's build module rework in #18160, though I have no idea why as that PR did not touch src/codegen/c.zig and the logic for the bitcast air instruction in there looked like it would use the correct size previously...

Now that I've rebased to include those changes, hopefully the CI will pass on all the x86_64-linux targets.

Comment on lines +6806 to +6808
// The non-error and error cases are only peers if the error case is just a switch expression;
// the pattern `if (x) {...} else |err| blk: { switch (err) {...} }` does not consider the
// non-error and error case to be peers.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like how this turns a limitation of the compiler into a feature of the language.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just trying to be clear about what the behaviour is so as to not cause confusion, but I can see that this may the source of future problems if peer type resolution changes for these cases. Should I remove the sentence, or add a note saying something along the lines of 'this is the current behaviour and may change'?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's good enough for now. I need to go over the langref anyway; it's collected quite a few different contributors' conflicting ideas about what a langref should be and it needs to be reworked by one person with a vision.

Copy link
Member

@andrewrk andrewrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @dweiller - nice work. The amount of additions and copy-pasted code makes me a bit uneasy, but you made up for it with robust test coverage.

I have a couple review comments but they are insignificant and so I will proceed with the merge regardless. Feel free to ignore them, they can be cleaned up later along with a future enhancement or bug fix.

pub const Bits = packed struct(u32) {
/// If true, one or more prongs have multiple items.
has_multi_cases: bool,
/// If true, there is an else prong. This is mutually exclusive with `has_under`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment looks like it was copy pasted and no longer applies since there is no has_under field.

sema.inst_map.putAssumeCapacity(err_capture_inst, spa.operand);
}
defer if (extra.data.bits.any_uses_err_capture) assert(sema.inst_map.remove(err_capture_inst));
_ = try sema.analyzeSwitchRuntimeBlock(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like this return value is never used by any caller.

Comment on lines +6806 to +6808
// The non-error and error cases are only peers if the error case is just a switch expression;
// the pattern `if (x) {...} else |err| blk: { switch (err) {...} }` does not consider the
// non-error and error case to be peers.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's good enough for now. I need to go over the langref anyway; it's collected quite a few different contributors' conflicting ideas about what a langref should be and it needs to be reworked by one person with a vision.

@andrewrk andrewrk merged commit acca16c into ziglang:master Jan 9, 2024
10 checks passed
@dweiller dweiller deleted the switch-err-union branch January 10, 2024 01:57
@dweiller
Copy link
Contributor Author

Thank you @dweiller - nice work. The amount of additions and copy-pasted code makes me a bit uneasy, but you made up for it with robust test coverage.

I wouldn't be surprise if there is a way to unify the new logic more with the regular switch logic to reduce the amount of duplication. However, there are a number of subtle differences that made it tricky to do without requiring interleaving a bunch of if (this_is_an_error_union_switch) which I think won't make it clearer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

special-case the lowering of ... catch(err) switch(err) and if (...) |p| ... else |err| switch(err)
5 participants