Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

slice: avoid calling memcmp with size 0 #113435

Closed
wants to merge 1 commit into from

Conversation

RalfJung
Copy link
Member

@RalfJung RalfJung commented Jul 7, 2023

This is more of an invitation for discussion that "I am sure we need to change this": according to C (C18 §7.1.4), it is UB to call memcmp with an "invalid" pointer. In Rust, we consider ptr::invalid(42) a perfectly valid address for an empty slice of u8. Whether C considers such a pointer (roughly equivalent to (int*)42) to be "invalid" is not entirely clear to me. The section on "Cast operators" in C is very short so I assume int-to-ptr casts are specified (or rather, mostly left unspecified) elsewhere. In the future things might get worse if t-opsem decides that more references are valid for size 0 (including OOB/UAF pointers) -- those are definitely "invalid" in C. But then, it is not clear which part of those rules apply when the function is called from another language.

The safe thing to do is to avoid calling memcmp when the size is 0. However I don't know the cost of this (in terms of performance, in particular).

Cc @rust-lang/libs

@rustbot
Copy link
Collaborator

rustbot commented Jul 7, 2023

r? @scottmcm

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jul 7, 2023
@Amanieu
Copy link
Member

Amanieu commented Jul 7, 2023

I would prefer if we did the same thing we already do for memcpy and just require the C library to provide a reasonable implementation that doesn't do anything when given a length of 0.

@RalfJung
Copy link
Member Author

RalfJung commented Jul 7, 2023

We don't do that for memcpy though, LLVM does. We just call LLVM's memcpy intrinsic (not the C function!) which is documented to be a NOP on size 0. It is up to LLVM to implement those semantics. Other codegen backends might implement our copy_nonoverlapping intrinsics differently.

We can of course start making such an assumption directly from the Rust side as well. Then that should be documented somewhere though. Also, does that mean all Rust code calling memcmp gets to make the same assumption? For Miri's sake I'd rather not have different rules for the standard library and other Rust code.

Cc @rust-lang/opsem

@bjorn3
Copy link
Member

bjorn3 commented Jul 13, 2023

What about adding a new intrinsic for this which a codegen backend may lower to memcmp if it knows it is fine to call it with size 0?

@RalfJung
Copy link
Member Author

Sure, then Miri could apply different rules. However in terms of documenting this, that would just move the responsibility about this assumption from t-libs to t-compiler.

@bjorn3
Copy link
Member

bjorn3 commented Jul 13, 2023

LLVM assigns defined behavior to memcmp with size 0 1. While I can't figure out how to make it insert a memcmp call with size 0, I don't think it is unrealistic for this to happen now or in the future. And when it happens t-compiler still has the burden of this assumption even if t-libs decided to avoid making this assumption in the standard library.

Footnotes

  1. https://github.com/llvm/llvm-project/blob/9418c40af7ec6913112b82d6f1d6e8a8c43af6c0/llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp#L625-L626

@RalfJung
Copy link
Member Author

LLVM assigns defined behavior to memcmp with size 0 1.

Yeah but that's their intrinsic, not ours. ;)

@scottmcm scottmcm added I-libs-nominated Nominated for discussion during a libs team meeting. S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 29, 2023
@scottmcm
Copy link
Member

Yeah but that's their intrinsic, not ours. ;)

Should we add a rustc intrinsic for memcmp, then, like we have https://doc.rust-lang.org/nightly/std/intrinsics/fn.copy_nonoverlapping.html instead of calling memcpy via extern "C"?

That would let the length check be a codegen issue, rather than a library issue.


(I visited this from my review queue, but I tweaked a bunch of labels because it seemed like this wasn't just wanting on code review, but more a policy discussion. Feel free to set it back when it's in a "just confirm the code" state.)

@RalfJung
Copy link
Member Author

That would help in one case: if we don't want Rust code calling extern "C" fn memcmp to also get a guarantee that on size 0 this is never UB.

I don't know which team is the one to make that call... T-lang?

@Amanieu
Copy link
Member

Amanieu commented Aug 2, 2023

We discussed this in the libs meeting today. We don't want to add an unnecessary check for something that will never be a real issue in practice. All real memcmp implementations will not touch any memory or have any UB when passed a size of 0. The only real requirement is that the pointers are non-null, which we already guarantee.

We see 2 potential ways to resolve this:

  • Add an intrinsic for memcmp like we do for memcpy, which has additional requirements on top of C's.
  • Argue that every memory address is a valid zero-sized allocation (a concept which doesn't exist in C) and therefore the pointers passed to memcmp are always valid. No code changes are needed.

@Amanieu Amanieu removed I-libs-nominated Nominated for discussion during a libs team meeting. S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). labels Aug 2, 2023
@scottmcm
Copy link
Member

scottmcm commented Aug 2, 2023

I'll make a PR to make an intrinsic for it. That'll give a nice place to write our expectations for it, and have it up to the backend to guarantee those things as needed.

EDIT: PR #114382

@RalfJung
Copy link
Member Author

RalfJung commented Aug 2, 2023

I was pointed to https://doc.rust-lang.org/nightly/core/index.html#how-to-use-the-core-library as a good place to document this assumption. Do you want to do that in the same PR or should I make a separate PR?

@RalfJung
Copy link
Member Author

RalfJung commented Aug 3, 2023

I opened #114412 for the documentation part. So this PR can be closed now.

@RalfJung RalfJung closed this Aug 3, 2023
@RalfJung RalfJung deleted the memcmp branch August 3, 2023 12:29
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Aug 7, 2023
…r=cjgillot

Add a new `compare_bytes` intrinsic instead of calling `memcmp` directly

As discussed in rust-lang#113435, this lets the backends be the place that can have the "don't call the function if n == 0" logic, if it's needed for the target.  (I didn't actually *add* those checks, though, since as I understood it we didn't actually need them on known targets?)

Doing this also let me make it `const` (unstable), which I don't think `extern "C" fn memcmp` can be.

cc `@RalfJung` `@Amanieu`
bjorn3 pushed a commit to bjorn3/rust that referenced this pull request Aug 9, 2023
…r=cjgillot

Add a new `compare_bytes` intrinsic instead of calling `memcmp` directly

As discussed in rust-lang#113435, this lets the backends be the place that can have the "don't call the function if n == 0" logic, if it's needed for the target.  (I didn't actually *add* those checks, though, since as I understood it we didn't actually need them on known targets?)

Doing this also let me make it `const` (unstable), which I don't think `extern "C" fn memcmp` can be.

cc `@RalfJung` `@Amanieu`
bors added a commit to rust-lang/miri that referenced this pull request Aug 15, 2023
C `mem` function shims: consistently treat "invalid" pointers as UB

Depends on rust-lang/rust#113435.
RalfJung pushed a commit to RalfJung/rust that referenced this pull request Aug 17, 2023
C `mem` function shims: consistently treat "invalid" pointers as UB

Depends on rust-lang#113435.
antoyo pushed a commit to antoyo/rust that referenced this pull request Oct 9, 2023
…r=cjgillot

Add a new `compare_bytes` intrinsic instead of calling `memcmp` directly

As discussed in rust-lang#113435, this lets the backends be the place that can have the "don't call the function if n == 0" logic, if it's needed for the target.  (I didn't actually *add* those checks, though, since as I understood it we didn't actually need them on known targets?)

Doing this also let me make it `const` (unstable), which I don't think `extern "C" fn memcmp` can be.

cc `@RalfJung` `@Amanieu`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants