Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

char::encode_utf8 crash on nightly #82833

Closed
davidhewitt opened this issue Mar 6, 2021 · 13 comments · Fixed by #83030
Closed

char::encode_utf8 crash on nightly #82833

davidhewitt opened this issue Mar 6, 2021 · 13 comments · Fixed by #83030
Assignees
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness ICEBreaker-LLVM Bugs identified for the LLVM ICE-breaker group P-critical Critical priority regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@davidhewitt
Copy link
Contributor

On nightly 2021-03-05, with RUSTFLAGS="-Ccodegen-units=1 -Cinline-threshold=0 -Clink-dead-code", I am experiencing undefined behaviour which seems to be based around char::encode_utf8.

I ran this code:

fn make_string(ch: char) -> String {
    let mut bytes = [0u8; 4];
    ch.encode_utf8(&mut bytes).into()
}

fn main() {
    let ch = '😃';
    dbg!(ch);
    let string = make_string(ch);
    dbg!(string);
}

I expected to see the following output:

[src/bin/string_crash.rs:8] ch = '😃'
[src/bin/string_crash.rs:10] string = "😃"

I get the above output on stable 1.50.0, or on the same nightly version if I remove at least one of the three RUSTFLAGS listed above.

With all three flags present, on nightly 2021-03-05 I see the following output:

[src/bin/string_crash.rs:8] ch = '😃'
memory allocation of 140730017967032 bytes failed
Aborted

The exact bytes count varies, so this looks like UB to me.

Meta

rustc +nightly --version --verbose:

rustc 1.52.0-nightly (caca2121f 2021-03-05)
binary: rustc
commit-hash: caca2121ffe4cb47d8ea2d9469c493995f57e0b5
commit-date: 2021-03-05
host: x86_64-unknown-linux-gnu
release: 1.52.0-nightly
LLVM version: 12.0.0

@davidhewitt davidhewitt added the C-bug Category: This is a bug. label Mar 6, 2021
@JohnTitor JohnTitor added I-crash Issue: The compiler crashes (SIGSEGV, SIGABRT, etc). Use I-ICE instead when the compiler panics. regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 6, 2021
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Mar 6, 2021
@JohnTitor
Copy link
Member

Regressed since rustc 1.52.0-nightly (45b3c28 2021-03-04). At a glance, #81451 might be related.

@jyn514 jyn514 added I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness and removed I-crash Issue: The compiler crashes (SIGSEGV, SIGABRT, etc). Use I-ICE instead when the compiler panics. labels Mar 6, 2021
@LeSeulArtichaut LeSeulArtichaut added the E-needs-bisection Call for participation: This issue needs bisection: https://github.com/rust-lang/cargo-bisect-rustc label Mar 6, 2021
@LeSeulArtichaut
Copy link
Contributor

Compiling with nightly-2021-03-05 on x86_64-apple-darwin:

rustc 1.52.0-nightly (caca2121f 2021-03-05)
binary: rustc
commit-hash: caca2121ffe4cb47d8ea2d9469c493995f57e0b5
commit-date: 2021-03-05
host: x86_64-apple-darwin
release: 1.52.0-nightly
LLVM version: 12.0.0

I am not able to reproduce the crash, though the result is still incorrect:

$ RUSTFLAGS="-Ccodegen-units=1 -Cinline-threshold=0 -Clink-dead-code" cargo +nightly run
   Compiling issue_82833 v0.1.0 (/Users/tous/Documents/Dev/rustc_issues/issue_82833)
    Finished dev [unoptimized + debuginfo] target(s) in 46.25s
     Running `target/debug/issue_82833`
[src/main.rs:8] ch = '😃'
[src/main.rs:10] string = "\u{0}\u{0}\u{0}\u{0}"

@LeSeulArtichaut
Copy link
Contributor

#81451 is indeed the culprit:

$ RUSTFLAGS="-Ccodegen-units=1 -Cinline-threshold=0 -Clink-dead-code" cargo +ec7f258d543e1ac7d0b94435972331e85da8c509 run
   Compiling issue_82833 v0.1.0 (/Users/tous/Documents/Dev/rustc_issues/issue_82833)
    Finished dev [unoptimized + debuginfo] target(s) in 5.15s
     Running `target/debug/issue_82833`
[src/main.rs:8] ch = '😃'
[src/main.rs:10] string = "😃"

$ RUSTFLAGS="-Ccodegen-units=1 -Cinline-threshold=0 -Clink-dead-code" cargo +409920873cf8a95739a55dc5fe5adb05e1b4758e run
   Compiling issue_82833 v0.1.0 (/Users/tous/Documents/Dev/rustc_issues/issue_82833)
    Finished dev [unoptimized + debuginfo] target(s) in 16.55s
     Running `target/debug/issue_82833`
[src/main.rs:8] ch = '😃'
[src/main.rs:10] string = "\u{0}\u{0}\u{0}\u{0}"

@LeSeulArtichaut LeSeulArtichaut added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. and removed E-needs-bisection Call for participation: This issue needs bisection: https://github.com/rust-lang/cargo-bisect-rustc labels Mar 6, 2021
@LeSeulArtichaut
Copy link
Contributor

@rustbot ping llvm

@rustbot
Copy link
Collaborator

rustbot commented Mar 6, 2021

Hey LLVM ICE-breakers! This bug has been identified as a good
"LLVM ICE-breaking candidate". In case it's useful, here are some
instructions for tackling these sorts of bugs. Maybe take a look?
Thanks! <3

cc @camelid @comex @cuviper @DutchGhost @hdhoang @henryboisdequin @heyrutvik @higuoxing @JOE1994 @jryans @mmilenko @nagisa @nikic @Noah-Kennedy @SiavoshZarrasvand @spastorino @vertexclique

@rustbot rustbot added the ICEBreaker-LLVM Bugs identified for the LLVM ICE-breaker group label Mar 6, 2021
@nagisa
Copy link
Member

nagisa commented Mar 6, 2021

Through an experiment wherein I link a cdylib produced from

// compile-flags: -Cinline-threshold=0 -Clink-dead-code -g --crate-type=cdylib --emit=link
#[no_mangle]
#[inline(never)]
pub extern "C" fn banana() -> String {
    let mut bytes = [0u8; 4];
    '😃'.encode_utf8(&mut bytes).into()
}

to a test program such as this:

// compile-flags: -lrepro -L.
extern "C" {
    fn banana() -> String;
}

fn main() {
    unsafe {
        assert_eq!(banana(), "😃");
    }
}

I verified that this issue is caused by something in the banana function itself, rather than the test glue in main. Sadly, this is pretty hard to actually root cause because any attempts to remove libcore and libstd out of equation end up fixing this :(

@tmiasko
Copy link
Contributor

tmiasko commented Mar 6, 2021

I had to additionally use -Copt-level=0 -Cdebuginfo=2 to reproduce the issue. The length of slice returned from encode_utf8 is passed incorrectly to into; it is read from a stack slot that is uninitialized.

@LeSeulArtichaut LeSeulArtichaut added P-critical Critical priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Mar 6, 2021
@LeSeulArtichaut
Copy link
Contributor

Assigning P-critical as discussed as part of the Prioritization WG procedure.

@nagisa
Copy link
Member

nagisa commented Mar 6, 2021

To illustrate what @tmiasko is saying:

   0x00007fe22d9b9a34 <+68>:	call   0x7fe22d9b6ee0 <_ZN4core4char7methods15encode_utf8_raw17hc73da9e6e7004300E>
   ; returned slice pointer is in %rax, returned slice length is in %rdx, both correct
   0x00007fe22d9b9a39 <+73>:	mov    %rdx,0x18(%rsp)
   0x00007fe22d9b9a3e <+78>:	mov    %rax,0x38(%rsp)
   0x00007fe22d9b9a43 <+83>:	mov    %rdx,0x40(%rsp)
   0x00007fe22d9b9a48 <+88>:	mov    0x18(%rsp),%rdx
   0x00007fe22d9b9a4d <+93>:	mov    (%rsp),%rsi
   0x00007fe22d9b9a51 <+97>:	mov    0x8(%rsp),%rdi
   ; input function pointer is in %rsi, length in %rdi
   0x00007fe22d9b9a56 <+102>:	call   0x7fe22d9b8380 <_ZN50_$LT$T$u20$as$u20$core..convert..Into$LT$U$GT$$GT$4into17he0dae454987f3de8E>

you can see that the rax and rdx are stashed in some place on the stack, and the arguments are filled/reloaded from another area.

This won't happen when we compile the LLVM IR with llc from LLVM head. I haven't tried with an LLVM build from our fork, but I'm not at all confident it'll help in reducing the problem.

@nikic
Copy link
Contributor

nikic commented Mar 7, 2021

Relevant IR: https://gist.github.com/nikic/9731da9ff13f58b0af9891f89326e8eb Under llc -O0 this gives the suspicious stack manipulations.

And if you drop the br label %bb1 bb1: there is an assertion failure:

llc: /home/nikic/rust/src/llvm-project/llvm/lib/CodeGen/RegAllocFast.cpp:923: void {anonymous}::RegAllocFast::useVirtReg(llvm::MachineInstr&, unsigned int, llvm::Register): Assertion `(!MO.isKill() || LRI->LastUse == &MI) && "Invalid kill flag"' failed.

@nikic
Copy link
Contributor

nikic commented Mar 7, 2021

Reducing the assertion failure gives:

declare { i8*, i64 } @get()

declare void @use(i8*, i64)
      
define void @test(i64* %p) {
  %struct = call { i8*, i64 } @get()
  %struct.1 = extractvalue { i8*, i64 } %struct, 1
  store i64 %struct.1, i64* %p, align 8
  %struct.2 = extractvalue { i8*, i64 } %struct, 1
  call void @use(i8* undef, i64 %struct.2)
  ret void
}   

Running with llc -O0 -verify-machineinstrs we get:

# After Instruction Selection
# Machine code for function test: IsSSA, TracksLiveness
Function Live Ins: $rdi in %0

bb.0 (%ir-block.0):
  liveins: $rdi
  %0:gr64 = COPY $rdi
  %1:gr64 = COPY killed %0:gr64
  ADJCALLSTACKDOWN64 0, 0, 0, implicit-def $rsp, implicit-def $eflags, implicit-def $ssp, implicit $rsp, implicit $ssp
  CALL64pcrel32 target-flags(x86-plt) @get, <regmask $bh $bl $bp $bph $bpl $bx $ebp $ebx $hbp $hbx $rbp $rbx $r12 $r13 $r14 $r15 $r12b $r13b $r14b $r15b $r12bh $r13bh $r14bh $r15bh $r12d $r13d $r14d $r15d $r12w $r13w $r14w $r15w $r12wh and 3 more...>, implicit $rsp, implicit $ssp, implicit-def $rax, implicit-def $rdx
  ADJCALLSTACKUP64 0, 0, implicit-def $rsp, implicit-def $eflags, implicit-def $ssp, implicit $rsp, implicit $ssp
  %7:gr64 = COPY $rax
  %8:gr64 = COPY $rdx
  MOV64mr %1:gr64, 1, $noreg, 0, $noreg, killed %8:gr64 :: (store 8 into %ir.p)
  %2:gr64 = IMPLICIT_DEF
  ADJCALLSTACKDOWN64 0, 0, 0, implicit-def $rsp, implicit-def $eflags, implicit-def $ssp, implicit $rsp, implicit $ssp
  $rdi = COPY %2:gr64
  $rsi = COPY %8:gr64
  CALL64pcrel32 target-flags(x86-plt) @use, <regmask $bh $bl $bp $bph $bpl $bx $ebp $ebx $hbp $hbx $rbp $rbx $r12 $r13 $r14 $r15 $r12b $r13b $r14b $r15b $r12bh $r13bh $r14bh $r15bh $r12d $r13d $r14d $r15d $r12w $r13w $r14w $r15w $r12wh and 3 more...>, implicit $rsp, implicit $ssp, implicit $rdi, implicit $rsi
  ADJCALLSTACKUP64 0, 0, implicit-def $rsp, implicit-def $eflags, implicit-def $ssp, implicit $rsp, implicit $ssp
  RETQ

# End machine code for function test.

*** Bad machine code: Using a killed virtual register ***
- function:    test
- basic block: %bb.0  (0x557dd5832118)
- instruction: $rsi = COPY %8:gr64
- operand 1:   %8:gr64

The kill flag on the MOV64mr is not correct. Possibly FastISel related.

@nikic
Copy link
Contributor

nikic commented Mar 7, 2021

Upstream report: https://bugs.llvm.org/show_bug.cgi?id=49467 Fixing this does seem to fix the original case as well.

@pnkfelix
Copy link
Member

pnkfelix commented Mar 11, 2021

also assigning @nagisa just to doubly-ensure backport (from LLVM upstream to Rust's LLVM fork) for fixing this P-critical bug happens.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness ICEBreaker-LLVM Bugs identified for the LLVM ICE-breaker group P-critical Critical priority regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants