Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

directly expose copy and copy_nonoverlapping intrinsics #81238

Merged
merged 2 commits into from
Feb 13, 2021

Conversation

RalfJung
Copy link
Member

@RalfJung RalfJung commented Jan 21, 2021

This effectively un-does #57997. That should help with ptr::read codegen in debug builds (and any other of these low-level functions that bottoms out at copy/copy_nonoverlapping), where the wrapper function will not get inlined. See the discussion in #80290 and #81163.

Cc @bjorn3 @therealprof

@rust-highfive
Copy link
Collaborator

r? @cramertj

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 21, 2021
@rust-log-analyzer

This comment has been minimized.

@alecmocatta
Copy link
Contributor

This loses the (currently disabled until they can be constified) debug assertions. What're the upsides to this approach over marking the wrapper functions #[inline(always)], aside from the cg_clif issue mentioned here #81163 (comment)? Is there a known non-negligible build time downside to #[inline(always)] wrapper functions?

@RalfJung
Copy link
Member Author

This is not the first time those debug assertions came under criticism; e.g., that's why they were turned into an abort previously. They might even have contributed significantly to the slowdown that prompted debug-assertions-std to be split from debug-assertions (in config.toml), though I do not think this was ever benchmarked. At the same time, these debug assertions are not actually helping users since their libstd is a release build (even for cargo build --debug).

So together with that const problem and the discussion about function call overhead, I figured maybe those debug assertions are simply not worth the trouble. I don't know what one would have to measure to evaluate if inline(always) is a viable alternative; if someone wants to take over and do those experiments, I'd be happy to go with that approach instead.

@tmiasko
Copy link
Contributor

tmiasko commented Jan 21, 2021

@alecmocatta if you are comparing the removal of the wrappers with an addition of #[inline(always)] to them, there are a few extra costs associated with the latter. First, the wrappers have to be monomorphized and generated in each code generation unit. Then LLVM will have to inline calls to them, and at debuginfo > 0 produce additional debug information describing the inlining that took place. On the other hand, simply removing the wrappers has none of those costs.

If MIR inlining were to be enabled by default, the inlining could take place at MIR level when code is still generic, amortizing some of those costs, but we aren't at this point yet.

@alecmocatta
Copy link
Contributor

alecmocatta commented Jan 21, 2021

@tmiasko Thanks, that's helpful. My thinking is that #[inline(always)] on the wrapper functions might in practise have a statistically insignificant effect on build times vs this PR, ~100% of its codegen win, while also preserving the possibility of the debug assertions being re-enabled when they can be constified.

To be honest I thought MIR inlining was happening for #[inline(always)], but I think there's something I'm missing. What is it that's inlining foo here if you check the LLVM IR? Edit: Ahah! There's a special LLVM pass that's enabled even in debug builds specifically to handle #[inline(always)] functions.

@RalfJung
Copy link
Member Author

RalfJung commented Jan 22, 2021

Another advantage of this PR is that @tmiasko @wesleywiser's copy_nonoverlapping(_, _, 1)-lowering-pass will kick in -- this runs in rustc, pre-inlining.

@tmiasko
Copy link
Contributor

tmiasko commented Jan 24, 2021

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@bors
Copy link
Contributor

bors commented Jan 24, 2021

⌛ Trying commit 18d12ad with merge 375082d2e33538dc4a882ee01db94cd61563372c...

@bors
Copy link
Contributor

bors commented Jan 24, 2021

☀️ Try build successful - checks-actions
Build commit: 375082d2e33538dc4a882ee01db94cd61563372c (375082d2e33538dc4a882ee01db94cd61563372c)

@rust-timer
Copy link
Collaborator

Queued 375082d2e33538dc4a882ee01db94cd61563372c with parent 9a9477f, future comparison URL.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jan 24, 2021
@rust-timer
Copy link
Collaborator

Finished benchmarking try commit (375082d2e33538dc4a882ee01db94cd61563372c): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jan 25, 2021
@bjorn3
Copy link
Member

bjorn3 commented Jan 25, 2021

Huge improvements of up to 2.3% with a few regressions of up to 0.8%.

@camelid camelid added the A-intrinsics Area: Intrinsics label Feb 12, 2021
@RalfJung
Copy link
Member Author

@cramertj any suggestions for how to proceed on this?
Cc'ing some other possible reviewers -- @lcnr @m-ou-se

@m-ou-se m-ou-se assigned m-ou-se and unassigned cramertj Feb 13, 2021
@m-ou-se m-ou-se added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Feb 13, 2021
@m-ou-se
Copy link
Member

m-ou-se commented Feb 13, 2021

Looks good to me.

The only downside is that

// FIXME: Perform these checks only at run time

now disappears without a place to fix it. But since debug assertions in std are pretty much useless, I don't think that matters much. I don't think this FIXME would've been resolved anyway, other than by removing it, which is what this PR does.

@bors r+

@bors
Copy link
Contributor

bors commented Feb 13, 2021

📌 Commit 1a80635 has been approved by m-ou-se

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 13, 2021
@bors
Copy link
Contributor

bors commented Feb 13, 2021

⌛ Testing commit 1a80635 with merge 8e54a21...

@bors
Copy link
Contributor

bors commented Feb 13, 2021

☀️ Test successful - checks-actions
Approved by: m-ou-se
Pushing 8e54a21 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Feb 13, 2021
@bors bors merged commit 8e54a21 into rust-lang:master Feb 13, 2021
@rustbot rustbot added this to the 1.52.0 milestone Feb 13, 2021
@bors bors mentioned this pull request Feb 13, 2021
@RalfJung RalfJung deleted the copy-intrinsics branch February 14, 2021 10:32
flip1995 pushed a commit to flip1995/rust that referenced this pull request Feb 25, 2021
directly expose copy and copy_nonoverlapping intrinsics

This effectively un-does rust-lang#57997. That should help with `ptr::read` codegen in debug builds (and any other of these low-level functions that bottoms out at `copy`/`copy_nonoverlapping`), where the wrapper function will not get inlined. See the discussion in rust-lang#80290 and rust-lang#81163.

Cc `@bjorn3` `@therealprof`
bors added a commit to rust-lang-ci/rust that referenced this pull request May 3, 2021
…imulacrum

[stable] 1.52.0 release

This includes the release notes (rust-lang#84183) as well as cherry-picked commits from:

* [beta] revert PR rust-lang#77885 rust-lang#84710
* [beta] remove assert_matches rust-lang#84759
* Revert PR 81473 to resolve (on beta) issues 81626 and 81658. rust-lang#83171
* [beta] rustdoc revert deref recur rust-lang#84868
*  Fix ICE of for-loop mut borrowck where no suggestions are available rust-lang#83401

Additionally in "fresh work" we're also:

* reverting: directly expose copy and copy_nonoverlapping intrinsics rust-lang#81238 to avoid rust-lang#84297 on 1.52
bors added a commit to rust-lang-ci/rust that referenced this pull request Jun 9, 2021
…Mark-Simulacrum

Make copy/copy_nonoverlapping fn's again

Make copy/copy_nonoverlapping fn's again, rather than intrinsics.

This a short-term change to address issue rust-lang#84297.

It effectively reverts PRs rust-lang#81167 rust-lang#81238 (and part of rust-lang#82967), rust-lang#83091, and parts of rust-lang#79684.
@RalfJung RalfJung mentioned this pull request Jun 14, 2021
3 tasks
flip1995 pushed a commit to flip1995/rust that referenced this pull request Jun 17, 2021
…Mark-Simulacrum

Make copy/copy_nonoverlapping fn's again

Make copy/copy_nonoverlapping fn's again, rather than intrinsics.

This a short-term change to address issue rust-lang#84297.

It effectively reverts PRs rust-lang#81167 rust-lang#81238 (and part of rust-lang#82967), rust-lang#83091, and parts of rust-lang#79684.
bors added a commit to rust-lang-ci/rust that referenced this pull request Aug 8, 2021
Avoid using the `copy_nonoverlapping` wrapper through `mem::replace`.

This is a much simpler way to achieve the pre-rust-lang#86003 behavior of `mem::replace` not needing dynamically-sized `memcpy`s (at least before inlining), than re-doing rust-lang#81238 (which needs rust-lang#86699 or something similar).

I didn't notice it until recently, but `ptr::write` already explicitly avoided using the wrapper, while `ptr::read` just called the wrapper (and was the reason for us observing any behavior change from rust-lang#86003 in Rust-GPU).

<hr/>

The codegen test I've added fails without the change to `core::ptr::read` like this (ignore the `v0` mangling, I was using a worktree with it turned on by default, for this):
```llvm
       13: ; core::intrinsics::copy_nonoverlapping::<u8>
       14: ; Function Attrs: inlinehint nonlazybind uwtable
       15: define internal void `@_RINvNtCscK5tvALCJol_4core10intrinsics19copy_nonoverlappinghECsaS4X3EinRE8_25mem_replace_direct_memcpy(i8*` %src, i8* %dst, i64 %count) unnamed_addr #0 {
       16: start:
       17:  %0 = mul i64 %count, 1
       18:  call void `@llvm.memcpy.p0i8.p0i8.i64(i8*` align 1 %dst, i8* align 1 %src, i64 %0, i1 false)
not:17      !~~~~~~~~~~~~~~~~~~~~~                                                                     error: no match expected
       19:  ret void
       20: }
```
With the `core::ptr::read` change, `core::intrinsics::copy_nonoverlapping` doesn't get instantiated and the test passes.

<hr/>

r? `@m-ou-se` cc `@nagisa` (codegen test) `@oli-obk` / `@RalfJung` (miri diagnostic changes)
@RalfJung
Copy link
Member Author

RalfJung commented Nov 27, 2022

FYI, this got reverted shortly after landing due to being a breaking change, and the PR that would have made this possible got closed due to inactivity (#86699). However, #87827 should still help with code quality for read and functions using that (such as replace). I don't have plans myself to pursue this further. If someone thinks something should still be done here, please open an issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-intrinsics Area: Intrinsics merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.