Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade to LLVM 12 caused AArch64 bare metal program run incorrectly #83335

Closed
songzhi opened this issue Mar 21, 2021 · 14 comments · Fixed by #83592
Closed

Upgrade to LLVM 12 caused AArch64 bare metal program run incorrectly #83335

songzhi opened this issue Mar 21, 2021 · 14 comments · Fixed by #83592
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. ICEBreaker-LLVM Bugs identified for the LLVM ICE-breaker group O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@songzhi
Copy link

songzhi commented Mar 21, 2021

Here's the project (as an example).
It run correctly when the following toolchain is used:

[toolchain]
channel = "nightly-2021-03-04"
targets = ["aarch64-unknown-none-softfloat"]

And incorrectly when the following toolchain is used:

[toolchain]
channel = "nightly-2021-03-05"
targets = ["aarch64-unknown-none-softfloat"]

So I think it related to #81451 the upgrade to LLVM 12.
I don't know the exact reason, but back to nightly-2021-03-04 works for me now.

@songzhi songzhi added the C-bug Category: This is a bug. label Mar 21, 2021
@camelid camelid added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Mar 21, 2021
@jyn514 jyn514 added O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. labels Mar 21, 2021
@jyn514
Copy link
Member

jyn514 commented Mar 21, 2021

Can you describe what you mean by "incorrectly "?

@songzhi
Copy link
Author

songzhi commented Mar 21, 2021

Can you describe what you mean by "incorrectly "?

I'm not saying the errors about feature gates that compiler throwing, I fixed that. And it complied successfully, but run incorrectly.
The correct behaviour of the leos-kernel in the project is like this:

000.00011 - Starting LeOS kernel
000.00021 * task 1: enable_preempt -> 0
000.00026 * core task preempt: 0
000.00029 - Starting task 2
000.00033 - Starting task 3
000.00034 - Starting task 4
000.00035 - Task: 1
001.00023 * task 1: disable_preempt -> 1
001.00028 * Switch context PID 1 -> 2
001.00030 * task 2: enable_preempt -> 0
001.00032 - Task: 2
002.00038 * task 2: disable_preempt -> 1
002.00041 * Switch context PID 2 -> 3
002.00043 * task 3: enable_preempt -> 0
002.00044 - Task: 3
...

But when using nightly-2021-03-05 version rustc, the kernel prints nothing. And I believe it's not related to those feature gates or the restriction of naked function, because my own os kernel project without these features has the same situation.

@apiraino
Copy link
Contributor

apiraino commented Mar 21, 2021

@songzhi for us to frame the issue, we need a way to reproduce it, ideally with a small piece of code (possibly without external libraries) that shows the incorrect behaviour you described.
Also, can you report if you can reproduce on other toolchains (example beta and stable)?

@songzhi
Copy link
Author

songzhi commented Mar 21, 2021

I just found why. Here's the minimal reproducible code:

// Linker script
SECTIONS
{
  . = 0xffff000040080000;

  .text : {
    *(.text.boot)
    *(.text .text.* .gnu.linkonce.t*)
    . = ALIGN(4K);
  }

  .rodata : {
    *(.rodata .rodata.* .gnu.linkonce.r*)
    . = ALIGN(4K);
  }

  .data : {
    *(.data .data.* .gnu.linkonce.d*)
    . = ALIGN(4K);
  }

  .stack : {
    . = ALIGN(4K);
    *(.bss.stack)
  }

  .bss : {
    . = ALIGN(32);
    *(.bss .bss.*)
    *(COMMON)
    . = ALIGN(4K);
  }

  /DISCARD/ : { *(.comment) *(.gnu*) *(.note*) *(.eh_frame*) }
}
#![no_std]
#![no_main]
#![feature(global_asm)]
#![feature(asm)]

use core::ptr;

global_asm!(r#"
.section .text.boot
.globl _start

_start:
    # read cpu affinity, start core 0, halt rest
    mrs     x19, mpidr_el1
    and     x19, x19, #3
    # compare and branch if non zero
    cbnz    x19, halt

    adrp    x0, _start
    sub     x0, x0, x19, lsl #16
    mov     sp, x0

    b       main_start

halt:
    # unreachable
    wfe
    b       halt
"#);

const UART_BASE: usize = 0x0900_0000;

#[inline]
fn halt() {
    unsafe { asm!("wfi", options(nomem, nostack)) }
}

#[inline]
fn wfe() {
    unsafe { asm!("wfe", options(nomem, nostack)) }
}

#[panic_handler]
fn panic(_: &core::panic::PanicInfo) -> ! {
    loop {
        wfe();
    }
}

extern "C" {
    fn _start();
}

#[no_mangle]
pub unsafe extern "C" fn main_start() {
    let x = _start as usize;
    // just used to prevent `x` is optimized.
    ptr::write_volatile(UART_BASE as *mut u8, (x % 10 + '0' as usize) as u8);

    halt();
}

Here's the difference of generated assembly code (based on generated ELF file), comparing nightly-2021-03-04 version to nightly-2021-03-05 version.

22,23c22,23
< ffff000040080028:     90000008        adrp    x8, ffff000040080000 <_start>
< ffff00004008002c:     91000108        add     x8, x8, #0x0
---
> ffff000040080028:     b0000008        adrp    x8, ffff000040081000 <etext>
> ffff00004008002c:     f9400108        ldr     x8, [x8]
34a35,45
>       ...
> 
> Disassembly of section .got:
> 
> ffff000040081000 <.got>:
> ffff000040081000:     40080000        .inst   0x40080000 ; undefined
> ffff000040081004:     ffff0000        .inst   0xffff0000 ; undefined
> 
> Disassembly of section .rodata:
> 
> ffff000040081008 <srodata>:

Actually QEMU load the kernel binary at address 0x40080000, so the adrp x8, ffff000040080000 <_start> is actually adrp x8, 40080000, then the _start we got is 0x40080000. But in later version of rustc, the _start is 0xFFFF000040080000. In this specific case, the former value is considered correct.

It's a breaking change, but I don't know if it could be considered as a bug. Maybe this usage could be considered as a undefined behaviour? I don't know what do you think about it.

My own OS kernel still runs incorrectly, I'll keep digging to find out if it related to rustc, if not I'll close this issue.

@songzhi
Copy link
Author

songzhi commented Mar 21, 2021

Just found another, emmm, bug? Here's the code, using qemu-system-aarch64 -M virt -kernel $(kernel) to run it:

#![no_std]
#![no_main]
#![feature(global_asm)]

global_asm!(
    r#"
.section .text.boot
.globl _start

_start:
    adrp    x0, _start
    mov     sp, x0

    b       main_start

.section .data
.align 12
data_block:
    .space 0x1000 // 4K
data_block2:
    .space 0x1000 // 4K
"#
);

#[panic_handler]
fn panic(_: &core::panic::PanicInfo) -> ! {
    loop {}
}

extern "C" {
    fn _start();
    fn data_block();
    fn data_block2();
}

#[no_mangle]
pub unsafe extern "C" fn main_start() {
    use core::ptr::{read_volatile, write_volatile};
    write_volatile(data_block2 as *mut u8, 42);
    loop {
        read_volatile(data_block as *mut u8);
        read_volatile(data_block2 as *mut u8);
    }
}

Generated assembly code difference, comparing nightly-2021-03-04 to nightly-2021-03-05:

7,19c7,27
< 0000000000210158 <_start>:
<   210158:     90000000        adrp    x0, 210000 <_start-0x158>
<   21015c:     9100001f        mov     sp, x0
<   210160:     14000001        b       210164 <main_start>
< 
< 0000000000210164 <main_start>:
<   210164:     d0000088        adrp    x8, 222000 <data_block2>
<   210168:     52800549        mov     w9, #0x2a                       // #42
<   21016c:     39000109        strb    w9, [x8]
<   210170:     b0000089        adrp    x9, 221000 <data_block>
<   210174:     3940013f        ldrb    wzr, [x9]
<   210178:     3940011f        ldrb    wzr, [x8]
<   21017c:     17fffffe        b       210174 <main_start+0x10>
---
> 00000000002101c8 <_start>:
>   2101c8:     90000000        adrp    x0, 210000 <_start-0x1c8>
>   2101cc:     9100001f        mov     sp, x0
>   2101d0:     14000001        b       2101d4 <main_start>
> 
> 00000000002101d4 <main_start>:
>   2101d4:     b0000088        adrp    x8, 221000 <main_start+0x10e2c>
>   2101d8:     f940fd08        ldr     x8, [x8, #504]
>   2101dc:     52800549        mov     w9, #0x2a                       // #42
>   2101e0:     39000109        strb    w9, [x8]
>   2101e4:     90000089        adrp    x9, 220000 <main_start+0xfe2c>
>   2101e8:     f940fd29        ldr     x9, [x9, #504]
>   2101ec:     3940013f        ldrb    wzr, [x9]
>   2101f0:     3940011f        ldrb    wzr, [x8]
>   2101f4:     17fffffe        b       2101ec <main_start+0x18>
> 
> Disassembly of section .got:
> 
> 00000000002201f8 <.got>:
>   2201f8:     00231000        .inst   0x00231000 ; NYI
>   2201fc:     00000000        udf     #0
23c31
< 0000000000221000 <data_block>:
---
> 0000000000231000 <data_block>:
26c34
< 0000000000222000 <data_block2>:
---
> 0000000000232000 <data_block2>:
40c48
<   20: 00210164        .inst   0x00210164 ; NYI
---
>   20: 002101d4        .inst   0x002101d4 ; NYI
42c50
<   28: 0000001c        udf     #28
---
>   28: 00000024        udf     #36
51c59
<    c: 302e3131        adr     x17, 5c631 <_start-0x1b3b27>
---
>    c: 302e3231        adr     x17, 5c651 <_start-0x1b3b77>

The problem is, in the nightly-2021-03-05 version code, the data_block and data_block2 pointer we got are both nullptr.
I don't know why such code is generated, but it seems like a real bug?
Here's another example.

@jyn514
Copy link
Member

jyn514 commented Mar 21, 2021

@rustbot ping llvm

I don't know assembly well enough to see if this is a real bug or not - it would be helpful to figure out what the expected behavior is.

@rustbot rustbot added the ICEBreaker-LLVM Bugs identified for the LLVM ICE-breaker group label Mar 21, 2021
@rustbot
Copy link
Collaborator

rustbot commented Mar 21, 2021

Hey LLVM ICE-breakers! This bug has been identified as a good
"LLVM ICE-breaking candidate". In case it's useful, here are some
instructions for tackling these sorts of bugs. Maybe take a look?
Thanks! <3

cc @camelid @comex @cuviper @DutchGhost @hdhoang @henryboisdequin @heyrutvik @higuoxing @JOE1994 @jryans @mmilenko @nagisa @nikic @Noah-Kennedy @SiavoshZarrasvand @spastorino @vertexclique

@songzhi
Copy link
Author

songzhi commented Mar 21, 2021

@jyn514 The second case is pretty common, maybe we want to initialize the data_block2, since we get a null pointer, it's not possible. In my OS kernel, I allocate and operate some pagetables like this, it's inspired by rCore.

@nagisa
Copy link
Member

nagisa commented Mar 21, 2021

Looks like with the new version the binary generated uses pic relocation model, and attempts to specify different relocation models get ignored.

Here's a more minimal reproducer:

// RUN: --target=aarch64-unknown-none-softfloat -O -Cdebuginfo=0 -Crelocation-model=static
#![no_std]

extern "C" {
    fn _start();
}

#[no_mangle]
pub unsafe extern "C" fn main_start() {
    loop {
        core::ptr::read_volatile(_start as *mut u8);
    }
}

I'm not yet sure if its a genuine bug (probably is) that we ignore the relocation-model, but I think on the user's end this could also be fixed by adjusting their linker script to correctly handle the .got section.

@ojeda
Copy link
Contributor

ojeda commented Mar 23, 2021

We also run into this, in both arm64 and x86_64 (Rust-for-Linux/linux#135).

@nagisa
Copy link
Member

nagisa commented Mar 23, 2021

Upstream issue report: https://bugs.llvm.org/show_bug.cgi?id=49693

@MaskRay
Copy link
Contributor

MaskRay commented Mar 23, 2021

Upstream issue report: https://bugs.llvm.org/show_bug.cgi?id=49693

Not implying dso_local for a declaration in --relocation-model=static is intented. See also https://internals.rust-lang.org/t/is-dso-local-set-in-llvm-ir-output/13725

@nagisa
Copy link
Member

nagisa commented Mar 23, 2021

Cool. I'm not super familiar with the relocation stuff, and reading the referenced blog post didn't really make anything much clearer to me. Admittedly I didn't have much capacity to think things through while reading it, so maybe a weekend will make things somewhat clearer to me ^^.

All that said, it is not obvious to me if we want to slap dso_local on everything with -Crelocation-model=static or not. If there's anybody who has all the relocation particularities internalized (esp. in context of Rust), advice would be appreciated.

@apiraino apiraino added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Mar 31, 2021
@apiraino
Copy link
Contributor

Removing I-prioritize, PR is on the way (Zulip thread)

@rustbot label -I-prioritize

@rustbot rustbot removed the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Mar 31, 2021
bors added a commit to rust-lang-ci/rust that referenced this issue Apr 6, 2021
Set dso_local for hidden, private and local items

This should probably have no real effect in most cases, as e.g. `hidden`
visibility already implies `dso_local` (or at least LLVM IR does not
preserve the `dso_local` setting if the item is already `hidden`), but
it should fix `-Crelocation-model=static` and improve codegen in
executables.

Note that this PR does not exhaustively port the logic in [clang], only the
portion that is necessary to fix a regression from LLVM 12 that relates to
`-Crelocation_model=static`.

Fixes rust-lang#83335

[clang]: https://github.com/llvm/llvm-project/blob/3001d080c813da20b329303bf8f45451480e5905/clang/lib/CodeGen/CodeGenModule.cpp#L945-L1039
@bors bors closed this as completed in 2f000a7 Apr 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. ICEBreaker-LLVM Bugs identified for the LLVM ICE-breaker group O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants