Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LLVM codegen: Don't emit zero-sized padding for fields #87254

Conversation

hkratz
Copy link
Contributor

@hkratz hkratz commented Jul 18, 2021

Currently padding is emitted before fields of a struct and at the end of the struct regardless of the ABI. Even if no padding is required zero-sized padding fields are emitted. This is not useful and - more importantly - it make it impossible to generate the exact vector types that LLVM expects for certain ARM SIMD intrinsics. This change should unblock the implementation of many ARM intrinsics using the unadjusted ABI, see rust-lang/stdarch#1143 (comment).

This is a proof of concept only because the field lookup now takes O(number of fields) time compared to O(1) before since it recalculates the mapping at every lookup. I would like to find out how big the performance impact actually is before implementing caching or restricting this behavior to the unadjusted ABI.

cc @SparrowLii @bjorn3

(Discussion on internals)

@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @estebank (or someone else) soon.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 18, 2021
@Aaron1011
Copy link
Member

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 18, 2021
@bors
Copy link
Contributor

bors commented Jul 18, 2021

⌛ Trying commit 535bce34741396022cb8e5e0740aaaacb0378f4e with merge 67ef29bb93ef18aa16af35c42098c1987b3b03d2...

@bors
Copy link
Contributor

bors commented Jul 18, 2021

☀️ Try build successful - checks-actions
Build commit: 67ef29bb93ef18aa16af35c42098c1987b3b03d2 (67ef29bb93ef18aa16af35c42098c1987b3b03d2)

@rust-timer
Copy link
Collaborator

Queued 67ef29bb93ef18aa16af35c42098c1987b3b03d2 with parent 1807305, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit (67ef29bb93ef18aa16af35c42098c1987b3b03d2): comparison url.

Summary: This change led to significant mixed results 🤷 in compiler performance.

  • Moderate regression in instruction counts (up to 4.9% on full builds of deep-vector-opt)
  • Moderate improvement in instruction counts (up to -1.2% on incr-unchanged builds of deeply-nested-async-opt)

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf.

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Jul 18, 2021
@hkratz
Copy link
Contributor Author

hkratz commented Jul 18, 2021

I will add a testcase to make sure that this actually makes it possible to fix rust-lang/stdarch#1143 and implement caching of the projections. Once done I will request another perf run.

@bjorn3
Copy link
Member

bjorn3 commented Jul 19, 2021

@bors try @rust-timer queue

per the request on zulip

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 19, 2021
@bors
Copy link
Contributor

bors commented Jul 19, 2021

⌛ Trying commit 8afaf9e3f8f64e9fc9a6e713bca7032df45fb64f with merge 6a60d9e36a40b362f083b88fa742e5a1377ce10c...

@bors
Copy link
Contributor

bors commented Jul 19, 2021

☀️ Try build successful - checks-actions
Build commit: 6a60d9e36a40b362f083b88fa742e5a1377ce10c (6a60d9e36a40b362f083b88fa742e5a1377ce10c)

@rust-timer
Copy link
Collaborator

Queued 6a60d9e36a40b362f083b88fa742e5a1377ce10c with parent 0ecff8c, future comparison URL.

@SparrowLii
Copy link
Member

SparrowLii commented Jul 19, 2021

Thanks a lot for submitting this!

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit (6a60d9e36a40b362f083b88fa742e5a1377ce10c): comparison url.

Summary: This change led to significant mixed results 🤷 in compiler performance.

  • Moderate regression in instruction counts (up to 4.7% on full builds of deep-vector-opt)
  • Moderate improvement in instruction counts (up to -1.0% on full builds of ctfe-stress-4-check)

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf.

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 19, 2021
@kornelski
Copy link
Contributor

kornelski commented Jul 19, 2021

Number of fields is usually small, so try using a smallvec.

Also if a type has no padding at all, could you not put it in the field projection cache and use memory_index as a fallback? I assume many/most types wouldn't have padding, so it could shrink the cache hashmap considerably.

@hkratz
Copy link
Contributor Author

hkratz commented Jul 20, 2021

@kornelski:

  1. As far as I can see smallvec data structures are not used at all in rustc so far, maybe I am missing something. I am also not even sure a smallvec would make much of a difference here for overall perf.

    (Aside: What I would like to look at is a replacement datastructure for Vec<u/isize> optimized for small values, taking up only only 1, 2, 4 bytes per value as long as all values fit (initial value range hinted at creation) reallocating as necessary and that with smallvec optimization as well (e.g. 64 bytes which could be used for 64u8, 32 u16, 16 u32, 8 u64). From first glance quite a few data structures in the compiler could benefit from that.)

  2. Not storing the projection in case that there is no padding sounds like a good idea. Will look into that.

  3. Also I am not sure yet why there is such a relatively large slowdown for this particular testcase. Will do some profiling.

@klensy
Copy link
Contributor

klensy commented Jul 20, 2021

@hkratz You searched bad, even github will find usages of smallvec https://github.com/rust-lang/rust/search?q=smallvec .

@hkratz
Copy link
Contributor Author

hkratz commented Jul 20, 2021

@hkratz You searched bad, even github will find usages of smallvec https://github.com/rust-lang/rust/search?q=smallvec .

No idea how I missed that... Not sure if it will improve performance though either. Will dig some more.

@hkratz
Copy link
Contributor Author

hkratz commented Jul 21, 2021

The additional time for the deep-vector-opt is not directly related to the changed code. Instead the thin-local LTO apparently takes longer for the emitted code without zero-sized padding:

With thin-local LTO (default): 4% more time taken

$ hyperfine -w1 -r3 -s basic "~/rusttest/master/bin/rustc -C opt-level=3 src/main.rs" "~/rusttest/new/bin/rustc -C opt-level=3 src/main.rs"
Benchmark #1: ~/rusttest/master/bin/rustc -C opt-level=3 src/main.rs
  Time (mean ± σ):      5.123 s ±  0.009 s    [User: 5.015 s, System: 0.196 s]
  Range (min … max):    5.113 s …  5.131 s    3 runs

Benchmark #2: ~/rusttest/new/bin/rustc -C opt-level=3 src/main.rs
  Time (mean ± σ):      5.319 s ±  0.007 s    [User: 5.198 s, System: 0.212 s]
  Range (min … max):    5.313 s …  5.328 s    3 runs

Summary
  '~/rusttest/master/bin/rustc -C opt-level=3 src/main.rs' ran
    1.04 ± 0.00 times faster than '~/rusttest/new/bin/rustc -C opt-level=3 src/main.rs'

With thin-local LTO disabled:

$ hyperfine -w1 -r3 -s basic "~/rusttest/master/bin/rustc -C opt-level=3 -C codegen-units=1 src/main.rs" "~/rusttest/new/bin/rustc -C opt-level=3 -C codegen-units=1 src/main.rs"
Benchmark #1: ~/rusttest/master/bin/rustc -C opt-level=3 -C codegen-units=1 src/main.rs
  Time (mean ± σ):      3.841 s ±  0.007 s    [User: 3.721 s, System: 0.128 s]
  Range (min … max):    3.834 s …  3.847 s    3 runs

Benchmark #2: ~/rusttest/new/bin/rustc -C opt-level=3 -C codegen-units=1 src/main.rs
  Time (mean ± σ):      3.847 s ±  0.019 s    [User: 3.722 s, System: 0.133 s]
  Range (min … max):    3.829 s …  3.867 s    3 runs

Summary
  '~/rusttest/master/bin/rustc -C opt-level=3 -C codegen-units=1 src/main.rs' ran
    1.00 ± 0.01 times faster than '~/rusttest/new/bin/rustc -C opt-level=3 -C codegen-units=1 src/main.rs'

I have no idea why that is the case but I don't think that it should block this PR. The generated LL IR code looks as expected.

@hkratz hkratz marked this pull request as ready for review July 21, 2021 10:18
@Mark-Simulacrum
Copy link
Member

@bors retry spurious network

fatal: unable to access 'https://github.com/servo/servo/': GnuTLS recv error (-54): Error in the pull function.

@bors
Copy link
Contributor

bors commented Aug 10, 2021

⌛ Testing commit 02295f4 with merge 14795284b7e3779b7a45eedd6cfb4a7122cf1770...

@rust-log-analyzer
Copy link
Collaborator

The job x86_64-msvc-tools failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
 1 file changed, 1 insertion(+)
fatal: Cannot prompt because user interactivity has been disabled.
remote: fatal error in commit_refs        
To https://github.com/rust-lang-nursery/rust-toolstate
 ! [remote rejected] master -> master (failure)
error: failed to push some refs to 'https://github.com/rust-lang-nursery/rust-toolstate'
Sleeping for 3 seconds before retrying push
 * branch            master     -> FETCH_HEAD
HEAD is now at b95734b (linux CI update)
[master 14ae085] (windows CI update)
 1 file changed, 1 insertion(+)
 1 file changed, 1 insertion(+)
remote: fatal error in commit_refs        
To https://github.com/rust-lang-nursery/rust-toolstate
 ! [remote rejected] master -> master (failure)
error: failed to push some refs to 'https://github.com/rust-lang-nursery/rust-toolstate'
Sleeping for 3 seconds before retrying push
 * branch            master     -> FETCH_HEAD
HEAD is now at b95734b (linux CI update)
[master 80cf565] (windows CI update)
 1 file changed, 1 insertion(+)
 1 file changed, 1 insertion(+)
remote: fatal error in commit_refs        
To https://github.com/rust-lang-nursery/rust-toolstate
 ! [remote rejected] master -> master (failure)
error: failed to push some refs to 'https://github.com/rust-lang-nursery/rust-toolstate'
Sleeping for 3 seconds before retrying push
 * branch            master     -> FETCH_HEAD
HEAD is now at b95734b (linux CI update)
[master 246f0cf] (windows CI update)
 1 file changed, 1 insertion(+)
---
Everything up-to-date
Sleeping for 3 seconds before retrying push
From https://github.com/rust-lang-nursery/rust-toolstate
 * branch            master     -> FETCH_HEAD
HEAD is now at b95734b (linux CI update)
thread 'main' panicked at 'Failed to update toolstate repository with new data', src\bootstrap\toolstate.rs:445:9
Build completed unsuccessfully in 0:03:02

@bors
Copy link
Contributor

bors commented Aug 10, 2021

💔 Test failed - checks-actions

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Aug 10, 2021
@ehuss
Copy link
Contributor

ehuss commented Aug 10, 2021

@bors retry

Failed while pushing toolstate.

https://www.githubstatus.com/incidents/rmfrw9dfbtbp

remote: fatal error in commit_refs        
To https://github.com/rust-lang-nursery/rust-toolstate
 ! [remote rejected] master -> master (failure)
error: failed to push some refs to 'https://github.com/rust-lang-nursery/rust-toolstate'

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 10, 2021
@bors
Copy link
Contributor

bors commented Aug 10, 2021

⌛ Testing commit 02295f4 with merge 97d8fddfa14b660a24ffeed2ab0784aaa379a9ff...

@bors
Copy link
Contributor

bors commented Aug 10, 2021

💥 Test timed out

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Aug 10, 2021
@ehuss
Copy link
Contributor

ehuss commented Aug 10, 2021

@bors retry

Tests finished successfully hours ago, but bors didn't get triggered on the success. Presumably due to GitHub incident.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 10, 2021
@bors
Copy link
Contributor

bors commented Aug 11, 2021

⌛ Testing commit 02295f4 with merge 47b41b7...

@bors
Copy link
Contributor

bors commented Aug 11, 2021

☀️ Test successful - checks-actions
Approved by: eddyb
Pushing 47b41b7 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Aug 11, 2021
@bors bors merged commit 47b41b7 into rust-lang:master Aug 11, 2021
@rustbot rustbot added this to the 1.56.0 milestone Aug 11, 2021
@hkratz hkratz deleted the rustc_codegen_llvm_dont_emit_zero_sized_padding branch August 11, 2021 04:46
@pnkfelix
Copy link
Member

Visiting for perf triage. Seems like the regression to deep-vector-opt is anticipated, and acceptable cost to get this in. Other than that, I don't see anything significant in the perf comparison for when this landed. Marking as perf-regression-triaged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.