Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Miscompilation: Equal pointers comparing as unequal #107975

Open
JakobDegen opened this issue Feb 13, 2023 · 97 comments
Open

Miscompilation: Equal pointers comparing as unequal #107975

JakobDegen opened this issue Feb 13, 2023 · 97 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. I-miscompile Issue: Correct Rust code lowers to incorrect machine code I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-high High priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-llvm Working group: LLVM backend code generation

Comments

@JakobDegen
Copy link
Contributor

JakobDegen commented Feb 13, 2023

I tried this code:

pub fn foo() -> (usize, usize, bool) {
    let a: *const u8;
    let b: *const u8;
    {
        let v: [u8; 16] = [std::hint::black_box(4); 16];
        a = &(v[0]);
    }
    {
        let v: [u8; 16] = [std::hint::black_box(4); 16];
        b = &(v[0]);
    }
    (a as usize, b as usize, a == b)
}

fn main() {
    println!("{:?}", foo());
}

I expected to see this happen: Either the pointers (when cast to integers) are the same and the comparison is true, or they are not the same and the comparison is false.

Instead, this happened: It printed: (140728325198984, 140728325198984, false)

Upstream LLVM issue

Meta

Reproduced via rustc +nightly -Copt-level=3 test.rs && ./test.

rustc --version --verbose:

rustc 1.69.0-nightly (5b8f28453 2023-02-12)
binary: rustc
commit-hash: 5b8f284536d00ba649ca968584bedab4820d8527
commit-date: 2023-02-12
host: x86_64-unknown-linux-gnu
release: 1.69.0-nightly
LLVM version: 15.0.7

Also reproduces on master.

@rustbot label +I-unsound +T-compiler +A-llvm

@JakobDegen JakobDegen added the C-bug Category: This is a bug. label Feb 13, 2023
@rustbot rustbot added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Feb 13, 2023
@JakobDegen
Copy link
Contributor Author

Godbolt indicates that this is EarlyCSE in LLVM optimizing the pointer comparison to false despite the two stack allocations have non-overlapping lifetime ranges.

@zirconium-n
Copy link
Contributor

This seems related to pointer provenance.

@JakobDegen
Copy link
Contributor Author

JakobDegen commented Feb 13, 2023

The bug itself or the cause of the bug? I don't know what the incorrect reasoning that LLVM is using here is, so I can't comment on the cause of the issue. But both Rust and LLVM clearly define pointer comparison to be address based, which means that I don't think we need to think about provenance in order to conclude that this is a miscompilation

@DoubleHyphen
Copy link

DoubleHyphen commented Feb 13, 2023

This seems related to pointer provenance.

tl;dr even if the docs were changed to take provenance into account when comparing for equality, there would still be a miscompilation, because evaluating the same expression twice leads to different results!

fn main() {
    let a: *const u8;
    let b: *const u8;
    {
        let v: [u8; 16] = [core::hint::black_box(0); 16];
        a = &(v[0]);
    }
    {
        let v: [u8; 16] = [core::hint::black_box(0); 16];
        b = &(v[0]);
    }
    println!("{a:?} == {b:?} evaluates to {}", a==b);
    println!("{a:?} == {b:?} evaluates to {}", a==b);
}
0x7ffd8828d818 == 0x7ffd8828d818 evaluates to false
0x7ffd8828d818 == 0x7ffd8828d818 evaluates to true

Playground link

This was exactly the train of thought that led me to experiment, thereby uncovering the bug: “Can two pointers compare as equal, even when they belong to different allocations?” I experimented, figured out that they cannot, and tried to refactor my println!s, whereupon the program started behaving differently. (In the second linked program, uncommenting line 18 changes the output of line 19.)

PS: In case it matters, my experiments were on the stable channel.

@RalfJung
Copy link
Member

tl;dr even if the docs were changed to take provenance into account when comparing for equality, there would still be a miscompilation, because evaluating the same expression twice leads to different results!

In theory they could define pointer comparison to be non-deterministic, and then this would be okay. For a comparison that involves provenance, that is probably the case.

But to my knowledge that's not how LLVM intends their ptr comparison to work.

That last example is really strange though, why would it optimize only one of them? Also I was unable to reproduce this without printing, which is equally strange...

@DoubleHyphen
Copy link

DoubleHyphen commented Feb 13, 2023

Also I was unable to reproduce this without printing, which is equally strange...

I thought that was queer, so I experimented further.

What I found is as follows:

  • As long as a pointer is never printed using println!, the behaviour remains stable. a==b always evaluates to false, and even if a and b are cast to integers and compared, the comparison still returns false. In other words, the code (continued from above)
    let (a_, b_) = (a as u128, b as u128);
    dbg!(a_ == b_, a_ ^ b_);

outputs

a_ == b_ = false
a_ ^ b_ = 0

to stderr, which –to the best of my knowledge– is not exactly how integers are supposed to work.

  • As soon as either pointer gets printed using println!, a==b starts evaluating to true. Casting them both to u128 before printing them does not cause that; only eg println!("{a:?}") does.

@Imberflur
Copy link

I tried some alternatives to println!, it looks like black_box(format_args!("{:?}", a)) is sufficient for it to start evaluating to true:

// Produces true on following println

black_box(format_args!("{:?}", a));
//black_box(format_args!("{:?}", b));
//format!("{:?}", a);

// Doesn't

//format_args!("{:?}", black_box(a));
//black_box(format_args!("{:?}", black_box(a)));
//format!("{:?}", black_box(a));

@RalfJung
Copy link
Member

The wild thing is that black_box(a) doesn't cause this... so format_args is an even blacker box than black_box, or so? black_box is supposed to be a wildcard, and for all LLVM knows it might be calling format_args...

In fact looks like adding black_box actually enables the optimization?!?

@Imberflur
Copy link

Imberflur commented Feb 13, 2023

Reduced to black_box(&a) (edit: here is a playground link)

@JakobDegen
Copy link
Contributor Author

The wild thing is that black_box(a) doesn't cause this... so format_args is an even blacker box than black_box, or so? black_box is supposed to be a wildcard, and for all LLVM knows it might be calling format_args...

Yeah, this isn't all too surprising, the difference between black_box(a) and black_box(&a) is quite significant

@apiraino
Copy link
Contributor

apiraino commented Feb 14, 2023

Result from my bisection points to a rollup. Out of that rollup likely #102232, I guess?

searched nightlies: from nightly-2022-01-01 to nightly-2023-02-14
regressed nightly: nightly-2022-09-29
searched commit range: 470e518...ce7f0f1
regressed commit: 837bf37

bisected with cargo-bisect-rustc v0.6.5

Host triple: x86_64-unknown-linux-gnu
Reproduce with:

cargo bisect-rustc --start=2022-01-01 --script script.sh 

Also - If I'm correct - this should tag more t-libs than t-compiler

@rustbot label -t-compiler +t-libs +t-libs-api

@rustbot rustbot added T-libs Relevant to the library team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Feb 14, 2023
@apiraino apiraino removed the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Feb 14, 2023
@DoubleHyphen
Copy link

Is it note-worthy that the issue can trigger without using black_box at all?

@RalfJung
Copy link
Member

Yeah, this isn't all too surprising, the difference between black_box(a) and black_box(&a) is quite significant

How's that not surprising?^^ In terms of tings like "exposing the pointer", both should be equivalent...

Is it note-worthy that the issue can trigger without using black_box at all?

In fact even the original examples work without black-box, e.g.

fn main() {
    let a: *const u8;
    let b: *const u8;
    {
        let v: [u8; 16] = [0; 16];
        a = &(v[0]);
    }
    {
        let v: [u8; 16] = [0; 16];
        b = &(v[0]);
    }
    println!("{a:?} == {b:?} evaluates to {}", a==b);
    println!("{a:?} == {b:?} evaluates to {}", a==b);
}

@RalfJung
Copy link
Member

RalfJung commented Feb 14, 2023

Result from my bisection points to a rollup. Out of that rollup likely #102232, I guess?

That just stabilizes black_box without changing its behavior. Presumably you need to add a feature flag to continue bisecting this code further into the past?

Definitely looks like a t-compiler issue to me.

@eggyal
Copy link
Contributor

eggyal commented Feb 14, 2023

Somewhat simpler, still demonstrating misbehaviour, again no black_box. Likewise with usize rather than *const types.

Edit: updated even simpler.

@DoubleHyphen
Copy link

I whipped up a quick example for the Compiler Explorer, and tried it with the 10 newest and the 10 oldest stable versions of the compiler. All of them appear to fold the comparison to xor eax eax, ie all of them arbitrarily decide it's false.

What's even more staggering is that this behaviour persists even if we use strict provenance and expose_addr. I'd been hoping that this would fix the issue, but no.

@pitaj
Copy link
Contributor

pitaj commented Feb 14, 2023

Here's an example without the prints in between, using a bunch of black boxes:

https://godbolt.org/z/cKMra38a8

It appears that copying the value from either integer causes llvm to "realize" that they're actually equal. Before that it assumes they're not equal.

Edit: actually only the one black box is needed (let c = black_box(a)) the others aren't necessary to reproduce

@apiraino
Copy link
Contributor

apiraino commented Feb 14, 2023

Hmm you are all right here, the black_box just confused me. And as @DoubleHyphen points out it's not even a regression, I can reproduce with any version of rustc.

@rustbot label T-compiler -T-libs -T-libs-api P-high

@rustbot rustbot added P-high High priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. and removed T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Feb 14, 2023
@RalfJung
Copy link
Member

Yeah we know what it's triggered by: llvm/llvm-project#45725. It's a very hard to fix bug in LLVM boiling down to different parts of LLVM making contradicting assumptions about the semantics of lifetime intrinsics, fundamentally caused by never properly deciding and documenting the semantics of said intrinsics.

@DoubleHyphen
Copy link

DoubleHyphen commented Oct 19, 2023

/r/rust already managed to turn this into a segfault in safe code: https://www.reddit.com/r/rust/comments/112crfj/comment/j8ml9zh/?utm_source=reddit&utm_medium=web2x&context=3

Inlining the code to avoid indirection/loss, credits to u/duckerude:

use std::cell::RefCell;

fn main() {
    let a = {
        let v = 0u8;
        &v as *const _ as usize
    };
    let b = {
        let v = 0u8;
        &v as *const _ as usize
    };
    let i = a - b;
    let arr = [
        RefCell::new(Some(Box::new(1))),
        RefCell::new(None),
        RefCell::new(None),
    ];
    assert_ne!(i, 0);
    let r = arr[i].borrow();
    let r = r.as_ref().unwrap();
    *arr[0].borrow_mut() = None;
    println!("{}", *r);
}

Out of curiosity, I tried those two examples in the playground again, to see if the behaviour had changed from version to version.

EDIT: Both examples currently fail with Exited with signal 11 (SIGSEGV): segmentation violation, ie pretty much as previously.

Original post preserved for history:

Guess what… no segfault! But it was obviously miscompiling, since the final println! never appeared.

Just to make sure that segfaults were caught, I replaced main with

fn main() {
    unsafe { core::hint::unreachable_unchecked() }
}

and… guess what! No segfault!

…Do those functions actually not segfault, or has the playground just stopped reporting segfaults when they happen?

@wesleywiser wesleywiser added the WG-llvm Working group: LLVM backend code generation label Nov 17, 2023
GrigorenkoPV pushed a commit to GrigorenkoPV/rust that referenced this issue Jun 26, 2024
GrigorenkoPV added a commit to GrigorenkoPV/rust that referenced this issue Jun 26, 2024
GrigorenkoPV added a commit to GrigorenkoPV/rust that referenced this issue Jul 6, 2024
GrigorenkoPV added a commit to GrigorenkoPV/rust that referenced this issue Jul 6, 2024
GrigorenkoPV added a commit to GrigorenkoPV/rust that referenced this issue Jul 6, 2024
bors added a commit to rust-lang-ci/rust that referenced this issue Jul 16, 2024
Add a test for rust-lang#107975

The int is zero. But also not zero. This is so much fun.

This is a part of rust-lang#105107.

Initially I was going to just rebase rust-lang#108445, but quite a few things changed since then:
* The [mcve](rust-lang#105787 (comment)) used for rust-lang#105787 got fixed.[^upd2]
* You can't just `a ?= b` for rust-lang#107975 anymore. Now you have to `a-b ?= 0`. This is what this PR does. As an additional flex, it show that three ways of converting a pointer to its address have this issue:
  1. `as usize`
  2. `.expose_provenance()`
  3. `.addr()`
* rust-lang#108425 simply got fixed. Yay.

As an aside, the naming for `addr_of!` is quite unfortunate in context of provenance APIs. Because `addr_of!` gives you a pointer, but what provenance APIs refer to as "address" is the `usize` value. Oh well.

UPD1: GitHub is incapable of parsing rust-lang#107975 in the PR name, so let's add it here.

[^upd2]: UPD2: [The other mcve](rust-lang#105787 (comment)) does not work anymore either, saying "this behavior recently changed as a result of a bug fix; see rust-lang#56105 for details."
GrigorenkoPV added a commit to GrigorenkoPV/rust that referenced this issue Jul 16, 2024
tgross35 added a commit to tgross35/rust that referenced this issue Jul 17, 2024
Add a test for rust-lang#107975

The int is zero. But also not zero. This is so much fun.

This is a part of rust-lang#105107.

Initially I was going to just rebase rust-lang#108445, but quite a few things changed since then:
* The [mcve](rust-lang#105787 (comment)) used for rust-lang#105787 got fixed.[^upd2]
* You can't just `a ?= b` for rust-lang#107975 anymore. Now you have to `a-b ?= 0`. This is what this PR does. As an additional flex, it show that three ways of converting a pointer to its address have this issue:
  1. `as usize`
  2. `.expose_provenance()`
  3. `.addr()`
* rust-lang#108425 simply got fixed. Yay.

As an aside, the naming for `addr_of!` is quite unfortunate in context of provenance APIs. Because `addr_of!` gives you a pointer, but what provenance APIs refer to as "address" is the `usize` value. Oh well.

UPD1: GitHub is incapable of parsing rust-lang#107975 in the PR name, so let's add it here.

[^upd2]: UPD2: [The other mcve](rust-lang#105787 (comment)) does not work anymore either, saying "this behavior recently changed as a result of a bug fix; see rust-lang#56105 for details."
tgross35 added a commit to tgross35/rust that referenced this issue Jul 17, 2024
Add a test for rust-lang#107975

The int is zero. But also not zero. This is so much fun.

This is a part of rust-lang#105107.

Initially I was going to just rebase rust-lang#108445, but quite a few things changed since then:
* The [mcve](rust-lang#105787 (comment)) used for rust-lang#105787 got fixed.[^upd2]
* You can't just `a ?= b` for rust-lang#107975 anymore. Now you have to `a-b ?= 0`. This is what this PR does. As an additional flex, it show that three ways of converting a pointer to its address have this issue:
  1. `as usize`
  2. `.expose_provenance()`
  3. `.addr()`
* rust-lang#108425 simply got fixed. Yay.

As an aside, the naming for `addr_of!` is quite unfortunate in context of provenance APIs. Because `addr_of!` gives you a pointer, but what provenance APIs refer to as "address" is the `usize` value. Oh well.

UPD1: GitHub is incapable of parsing rust-lang#107975 in the PR name, so let's add it here.

[^upd2]: UPD2: [The other mcve](rust-lang#105787 (comment)) does not work anymore either, saying "this behavior recently changed as a result of a bug fix; see rust-lang#56105 for details."
bors added a commit to rust-lang-ci/rust that referenced this issue Jul 17, 2024
Rollup of 4 pull requests

Successful merges:

 - rust-lang#127003 (Add a test for rust-lang#107975)
 - rust-lang#127763 (Make more Windows functions `#![deny(unsafe_op_in_unsafe_fn)]`)
 - rust-lang#127813 (Prevent double reference in generic futex)
 - rust-lang#127847 (Reviewer on vacation)

r? `@ghost`
`@rustbot` modify labels: rollup
bors added a commit to rust-lang-ci/rust that referenced this issue Jul 17, 2024
Add a test for rust-lang#107975

The int is zero. But also not zero. This is so much fun.

This is a part of rust-lang#105107.

Initially I was going to just rebase rust-lang#108445, but quite a few things changed since then:
* The [mcve](rust-lang#105787 (comment)) used for rust-lang#105787 got fixed.[^upd2]
* You can't just `a ?= b` for rust-lang#107975 anymore. Now you have to `a-b ?= 0`. This is what this PR does. As an additional flex, it show that three ways of converting a pointer to its address have this issue:
  1. `as usize`
  2. `.expose_provenance()`
  3. `.addr()`
* rust-lang#108425 simply got fixed. Yay.

As an aside, the naming for `addr_of!` is quite unfortunate in context of provenance APIs. Because `addr_of!` gives you a pointer, but what provenance APIs refer to as "address" is the `usize` value. Oh well.

UPD1: GitHub is incapable of parsing rust-lang#107975 in the PR name, so let's add it here.

[^upd2]: UPD2: [The other mcve](rust-lang#105787 (comment)) does not work anymore either, saying "this behavior recently changed as a result of a bug fix; see rust-lang#56105 for details."
GrigorenkoPV added a commit to GrigorenkoPV/rust that referenced this issue Jul 17, 2024
bors added a commit to rust-lang-ci/rust that referenced this issue Jul 18, 2024
Add a test for rust-lang#107975

The int is zero. But also not zero. This is so much fun.

This is a part of rust-lang#105107.

Initially I was going to just rebase rust-lang#108445, but quite a few things changed since then:
* The [mcve](rust-lang#105787 (comment)) used for rust-lang#105787 got fixed.[^upd2]
* You can't just `a ?= b` for rust-lang#107975 anymore. Now you have to `a-b ?= 0`. This is what this PR does. As an additional flex, it show that three ways of converting a pointer to its address have this issue:
  1. `as usize`
  2. `.expose_provenance()`
  3. `.addr()`
* rust-lang#108425 simply got fixed. Yay.

As an aside, the naming for `addr_of!` is quite unfortunate in context of provenance APIs. Because `addr_of!` gives you a pointer, but what provenance APIs refer to as "address" is the `usize` value. Oh well.

UPD1: GitHub is incapable of parsing rust-lang#107975 in the PR name, so let's add it here.

[^upd2]: UPD2: [The other mcve](rust-lang#105787 (comment)) does not work anymore either, saying "this behavior recently changed as a result of a bug fix; see rust-lang#56105 for details."
bors added a commit to rust-lang-ci/rust that referenced this issue Jul 18, 2024
Add a test for rust-lang#107975

The int is zero. But also not zero. This is so much fun.

This is a part of rust-lang#105107.

Initially I was going to just rebase rust-lang#108445, but quite a few things changed since then:
* The [mcve](rust-lang#105787 (comment)) used for rust-lang#105787 got fixed.[^upd2]
* You can't just `a ?= b` for rust-lang#107975 anymore. Now you have to `a-b ?= 0`. This is what this PR does. As an additional flex, it show that three ways of converting a pointer to its address have this issue:
  1. `as usize`
  2. `.expose_provenance()`
  3. `.addr()`
* rust-lang#108425 simply got fixed. Yay.

As an aside, the naming for `addr_of!` is quite unfortunate in context of provenance APIs. Because `addr_of!` gives you a pointer, but what provenance APIs refer to as "address" is the `usize` value. Oh well.

UPD1: GitHub is incapable of parsing rust-lang#107975 in the PR name, so let's add it here.

[^upd2]: UPD2: [The other mcve](rust-lang#105787 (comment)) does not work anymore either, saying "this behavior recently changed as a result of a bug fix; see rust-lang#56105 for details."
GrigorenkoPV added a commit to GrigorenkoPV/rust that referenced this issue Jul 18, 2024
GrigorenkoPV added a commit to GrigorenkoPV/rust that referenced this issue Jul 19, 2024
GrigorenkoPV added a commit to GrigorenkoPV/rust that referenced this issue Jul 19, 2024
bors added a commit to rust-lang-ci/rust that referenced this issue Jul 20, 2024
Add a test for rust-lang#107975

The int is zero. But also not zero. This is so much fun.

This is a part of rust-lang#105107.

Initially I was going to just rebase rust-lang#108445, but quite a few things changed since then:
* The [mcve](rust-lang#105787 (comment)) used for rust-lang#105787 got fixed.[^upd2]
* You can't just `a ?= b` for rust-lang#107975 anymore. Now you have to `a-b ?= 0`. This is what this PR does. As an additional flex, it show that three ways of converting a pointer to its address have this issue:
  1. `as usize`
  2. `.expose_provenance()`
  3. `.addr()`
* rust-lang#108425 simply got fixed. Yay.

As an aside, the naming for `addr_of!` is quite unfortunate in context of provenance APIs. Because `addr_of!` gives you a pointer, but what provenance APIs refer to as "address" is the `usize` value. Oh well.

UPD1: GitHub is incapable of parsing rust-lang#107975 in the PR name, so let's add it here.

[^upd2]: UPD2: [The other mcve](rust-lang#105787 (comment)) does not work anymore either, saying "this behavior recently changed as a result of a bug fix; see rust-lang#56105 for details."
@workingjubilee workingjubilee added the I-miscompile Issue: Correct Rust code lowers to incorrect machine code label Oct 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. I-miscompile Issue: Correct Rust code lowers to incorrect machine code I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-high High priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-llvm Working group: LLVM backend code generation
Projects
None yet
Development

No branches or pull requests