LLVM codegen: Don't emit zero-sized padding for fields #87254

hkratz · 2021-07-18T13:18:30Z

Currently padding is emitted before fields of a struct and at the end of the struct regardless of the ABI. Even if no padding is required zero-sized padding fields are emitted. This is not useful and - more importantly - it make it impossible to generate the exact vector types that LLVM expects for certain ARM SIMD intrinsics. This change should unblock the implementation of many ARM intrinsics using the unadjusted ABI, see rust-lang/stdarch#1143 (comment).

This is a proof of concept only because the field lookup now takes O(number of fields) time compared to O(1) before since it recalculates the mapping at every lookup. I would like to find out how big the performance impact actually is before implementing caching or restricting this behavior to the unadjusted ABI.

cc @SparrowLii @bjorn3

(Discussion on internals)

rust-highfive · 2021-07-18T13:18:32Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @estebank (or someone else) soon.

Please see the contribution instructions for more information.

Aaron1011 · 2021-07-18T14:07:14Z

@bors try @rust-timer queue

rust-timer · 2021-07-18T14:07:15Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2021-07-18T14:07:23Z

⌛ Trying commit 535bce34741396022cb8e5e0740aaaacb0378f4e with merge 67ef29bb93ef18aa16af35c42098c1987b3b03d2...

bors · 2021-07-18T14:56:06Z

☀️ Try build successful - checks-actions
Build commit: 67ef29bb93ef18aa16af35c42098c1987b3b03d2 (67ef29bb93ef18aa16af35c42098c1987b3b03d2)

rust-timer · 2021-07-18T14:56:07Z

Queued 67ef29bb93ef18aa16af35c42098c1987b3b03d2 with parent 1807305, future comparison URL.

rust-timer · 2021-07-18T19:07:55Z

Finished benchmarking try commit (67ef29bb93ef18aa16af35c42098c1987b3b03d2): comparison url.

Summary: This change led to significant mixed results 🤷 in compiler performance.

Moderate regression in instruction counts (up to 4.9% on full builds of deep-vector-opt)
Moderate improvement in instruction counts (up to -1.2% on incr-unchanged builds of deeply-nested-async-opt)

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf.

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

hkratz · 2021-07-18T19:23:15Z

I will add a testcase to make sure that this actually makes it possible to fix rust-lang/stdarch#1143 and implement caching of the projections. Once done I will request another perf run.

bjorn3 · 2021-07-19T09:22:40Z

@bors try @rust-timer queue

per the request on zulip

rust-timer · 2021-07-19T09:22:41Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2021-07-19T09:22:50Z

⌛ Trying commit 8afaf9e3f8f64e9fc9a6e713bca7032df45fb64f with merge 6a60d9e36a40b362f083b88fa742e5a1377ce10c...

bors · 2021-07-19T10:23:12Z

☀️ Try build successful - checks-actions
Build commit: 6a60d9e36a40b362f083b88fa742e5a1377ce10c (6a60d9e36a40b362f083b88fa742e5a1377ce10c)

rust-timer · 2021-07-19T10:23:14Z

Queued 6a60d9e36a40b362f083b88fa742e5a1377ce10c with parent 0ecff8c, future comparison URL.

SparrowLii · 2021-07-19T11:24:29Z

Thanks a lot for submitting this!

rust-timer · 2021-07-19T12:51:17Z

Finished benchmarking try commit (6a60d9e36a40b362f083b88fa742e5a1377ce10c): comparison url.

Summary: This change led to significant mixed results 🤷 in compiler performance.

Moderate regression in instruction counts (up to 4.7% on full builds of deep-vector-opt)
Moderate improvement in instruction counts (up to -1.0% on full builds of ctfe-stress-4-check)

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf.

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

kornelski · 2021-07-19T14:30:31Z

Number of fields is usually small, so try using a smallvec.

Also if a type has no padding at all, could you not put it in the field projection cache and use memory_index as a fallback? I assume many/most types wouldn't have padding, so it could shrink the cache hashmap considerably.

hkratz · 2021-07-20T05:12:07Z

@kornelski:

As far as I can see smallvec data structures are not used at all in rustc so far, maybe I am missing something. I am also not even sure a smallvec would make much of a difference here for overall perf.

(Aside: What I would like to look at is a replacement datastructure for Vec<u/isize> optimized for small values, taking up only only 1, 2, 4 bytes per value as long as all values fit (initial value range hinted at creation) reallocating as necessary and that with smallvec optimization as well (e.g. 64 bytes which could be used for 64u8, 32 u16, 16 u32, 8 u64). From first glance quite a few data structures in the compiler could benefit from that.)
Not storing the projection in case that there is no padding sounds like a good idea. Will look into that.
Also I am not sure yet why there is such a relatively large slowdown for this particular testcase. Will do some profiling.

klensy · 2021-07-20T10:48:42Z

@hkratz You searched bad, even github will find usages of smallvec https://github.com/rust-lang/rust/search?q=smallvec .

hkratz · 2021-07-20T11:45:34Z

@hkratz You searched bad, even github will find usages of smallvec https://github.com/rust-lang/rust/search?q=smallvec .

No idea how I missed that... Not sure if it will improve performance though either. Will dig some more.

hkratz · 2021-07-21T10:17:42Z

The additional time for the deep-vector-opt is not directly related to the changed code. Instead the thin-local LTO apparently takes longer for the emitted code without zero-sized padding:

With thin-local LTO (default): 4% more time taken

$ hyperfine -w1 -r3 -s basic "~/rusttest/master/bin/rustc -C opt-level=3 src/main.rs" "~/rusttest/new/bin/rustc -C opt-level=3 src/main.rs"
Benchmark #1: ~/rusttest/master/bin/rustc -C opt-level=3 src/main.rs
  Time (mean ± σ):      5.123 s ±  0.009 s    [User: 5.015 s, System: 0.196 s]
  Range (min … max):    5.113 s …  5.131 s    3 runs

Benchmark #2: ~/rusttest/new/bin/rustc -C opt-level=3 src/main.rs
  Time (mean ± σ):      5.319 s ±  0.007 s    [User: 5.198 s, System: 0.212 s]
  Range (min … max):    5.313 s …  5.328 s    3 runs

Summary
  '~/rusttest/master/bin/rustc -C opt-level=3 src/main.rs' ran
    1.04 ± 0.00 times faster than '~/rusttest/new/bin/rustc -C opt-level=3 src/main.rs'

With thin-local LTO disabled:

$ hyperfine -w1 -r3 -s basic "~/rusttest/master/bin/rustc -C opt-level=3 -C codegen-units=1 src/main.rs" "~/rusttest/new/bin/rustc -C opt-level=3 -C codegen-units=1 src/main.rs"
Benchmark #1: ~/rusttest/master/bin/rustc -C opt-level=3 -C codegen-units=1 src/main.rs
  Time (mean ± σ):      3.841 s ±  0.007 s    [User: 3.721 s, System: 0.128 s]
  Range (min … max):    3.834 s …  3.847 s    3 runs

Benchmark #2: ~/rusttest/new/bin/rustc -C opt-level=3 -C codegen-units=1 src/main.rs
  Time (mean ± σ):      3.847 s ±  0.019 s    [User: 3.722 s, System: 0.133 s]
  Range (min … max):    3.829 s …  3.867 s    3 runs

Summary
  '~/rusttest/master/bin/rustc -C opt-level=3 -C codegen-units=1 src/main.rs' ran
    1.00 ± 0.01 times faster than '~/rusttest/new/bin/rustc -C opt-level=3 -C codegen-units=1 src/main.rs'

I have no idea why that is the case but I don't think that it should block this PR. The generated LL IR code looks as expected.

Mark-Simulacrum · 2021-08-10T14:35:17Z

@bors retry spurious network

fatal: unable to access 'https://github.com/servo/servo/': GnuTLS recv error (-54): Error in the pull function.

bors · 2021-08-10T14:38:44Z

⌛ Testing commit 02295f4 with merge 14795284b7e3779b7a45eedd6cfb4a7122cf1770...

rust-log-analyzer · 2021-08-10T15:54:42Z

The job x86_64-msvc-tools failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

 1 file changed, 1 insertion(+)
fatal: Cannot prompt because user interactivity has been disabled.
remote: fatal error in commit_refs        
To https://github.com/rust-lang-nursery/rust-toolstate
 ! [remote rejected] master -> master (failure)
error: failed to push some refs to 'https://github.com/rust-lang-nursery/rust-toolstate'
Sleeping for 3 seconds before retrying push
 * branch            master     -> FETCH_HEAD
HEAD is now at b95734b (linux CI update)
[master 14ae085] (windows CI update)
 1 file changed, 1 insertion(+)
 1 file changed, 1 insertion(+)
remote: fatal error in commit_refs        
To https://github.com/rust-lang-nursery/rust-toolstate
 ! [remote rejected] master -> master (failure)
error: failed to push some refs to 'https://github.com/rust-lang-nursery/rust-toolstate'
Sleeping for 3 seconds before retrying push
 * branch            master     -> FETCH_HEAD
HEAD is now at b95734b (linux CI update)
[master 80cf565] (windows CI update)
 1 file changed, 1 insertion(+)
 1 file changed, 1 insertion(+)
remote: fatal error in commit_refs        
To https://github.com/rust-lang-nursery/rust-toolstate
 ! [remote rejected] master -> master (failure)
error: failed to push some refs to 'https://github.com/rust-lang-nursery/rust-toolstate'
Sleeping for 3 seconds before retrying push
 * branch            master     -> FETCH_HEAD
HEAD is now at b95734b (linux CI update)
[master 246f0cf] (windows CI update)
 1 file changed, 1 insertion(+)
---
Everything up-to-date
Sleeping for 3 seconds before retrying push
From https://github.com/rust-lang-nursery/rust-toolstate
 * branch            master     -> FETCH_HEAD
HEAD is now at b95734b (linux CI update)
thread 'main' panicked at 'Failed to update toolstate repository with new data', src\bootstrap\toolstate.rs:445:9
Build completed unsuccessfully in 0:03:02

bors · 2021-08-10T16:30:36Z

💔 Test failed - checks-actions

ehuss · 2021-08-10T16:46:59Z

@bors retry

Failed while pushing toolstate.

https://www.githubstatus.com/incidents/rmfrw9dfbtbp

remote: fatal error in commit_refs        
To https://github.com/rust-lang-nursery/rust-toolstate
 ! [remote rejected] master -> master (failure)
error: failed to push some refs to 'https://github.com/rust-lang-nursery/rust-toolstate'

bors · 2021-08-10T18:22:14Z

⌛ Testing commit 02295f4 with merge 97d8fddfa14b660a24ffeed2ab0784aaa379a9ff...

bors · 2021-08-10T22:37:14Z

💥 Test timed out

ehuss · 2021-08-10T22:38:26Z

@bors retry

Tests finished successfully hours ago, but bors didn't get triggered on the success. Presumably due to GitHub incident.

bors · 2021-08-11T01:36:30Z

⌛ Testing commit 02295f4 with merge 47b41b7...

bors · 2021-08-11T04:29:30Z

☀️ Test successful - checks-actions
Approved by: eddyb
Pushing 47b41b7 to master...

pnkfelix · 2021-08-11T15:49:14Z

Visiting for perf triage. Seems like the regression to deep-vector-opt is anticipated, and acceptable cost to get this in. Other than that, I don't see anything significant in the perf comparison for when this landed. Marking as perf-regression-triaged.

rust-highfive assigned estebank Jul 18, 2021

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 18, 2021

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 18, 2021

rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Jul 18, 2021

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 19, 2021

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 19, 2021

hkratz marked this pull request as ready for review July 21, 2021 10:18

bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Aug 10, 2021

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 10, 2021

bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Aug 10, 2021

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 10, 2021

bors added the merged-by-bors This PR was explicitly merged by bors. label Aug 11, 2021

bors merged commit 47b41b7 into rust-lang:master Aug 11, 2021

rustbot added this to the 1.56.0 milestone Aug 11, 2021

hkratz deleted the rustc_codegen_llvm_dont_emit_zero_sized_padding branch August 11, 2021 04:46

pnkfelix added the perf-regression-triaged The performance regression has been triaged. label Aug 11, 2021

hkratz mentioned this pull request Aug 16, 2021

Can't get the correct return type of llvm link rust-lang/stdarch#1143

Closed

marmeladema mentioned this pull request Dec 12, 2021

AVX2 related regression introduced by rust 1.56.0 and onwards #91839

Closed

erikdesjardins mentioned this pull request Feb 25, 2024

Stop using LLVM struct types for alloca, byval, sret, and many GEPs #121577

Closed

erikdesjardins mentioned this pull request Mar 6, 2024

Stop using LLVM struct types for byval/sret #122050

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LLVM codegen: Don't emit zero-sized padding for fields #87254

LLVM codegen: Don't emit zero-sized padding for fields #87254

hkratz commented Jul 18, 2021

rust-highfive commented Jul 18, 2021

Aaron1011 commented Jul 18, 2021

rust-timer commented Jul 18, 2021

bors commented Jul 18, 2021

bors commented Jul 18, 2021

rust-timer commented Jul 18, 2021

rust-timer commented Jul 18, 2021

hkratz commented Jul 18, 2021

bjorn3 commented Jul 19, 2021

rust-timer commented Jul 19, 2021

bors commented Jul 19, 2021

bors commented Jul 19, 2021

rust-timer commented Jul 19, 2021

SparrowLii commented Jul 19, 2021 •

edited

Loading

rust-timer commented Jul 19, 2021

kornelski commented Jul 19, 2021 •

edited

Loading

hkratz commented Jul 20, 2021 •

edited

Loading

klensy commented Jul 20, 2021

hkratz commented Jul 20, 2021

hkratz commented Jul 21, 2021 •

edited

Loading

Mark-Simulacrum commented Aug 10, 2021

bors commented Aug 10, 2021

rust-log-analyzer commented Aug 10, 2021

bors commented Aug 10, 2021

ehuss commented Aug 10, 2021 •

edited

Loading

bors commented Aug 10, 2021

bors commented Aug 10, 2021

ehuss commented Aug 10, 2021

bors commented Aug 11, 2021

bors commented Aug 11, 2021

pnkfelix commented Aug 11, 2021

LLVM codegen: Don't emit zero-sized padding for fields #87254

LLVM codegen: Don't emit zero-sized padding for fields #87254

Conversation

hkratz commented Jul 18, 2021

rust-highfive commented Jul 18, 2021

Aaron1011 commented Jul 18, 2021

rust-timer commented Jul 18, 2021

bors commented Jul 18, 2021

bors commented Jul 18, 2021

rust-timer commented Jul 18, 2021

rust-timer commented Jul 18, 2021

hkratz commented Jul 18, 2021

bjorn3 commented Jul 19, 2021

rust-timer commented Jul 19, 2021

bors commented Jul 19, 2021

bors commented Jul 19, 2021

rust-timer commented Jul 19, 2021

SparrowLii commented Jul 19, 2021 • edited Loading

rust-timer commented Jul 19, 2021

kornelski commented Jul 19, 2021 • edited Loading

hkratz commented Jul 20, 2021 • edited Loading

klensy commented Jul 20, 2021

hkratz commented Jul 20, 2021

hkratz commented Jul 21, 2021 • edited Loading

Mark-Simulacrum commented Aug 10, 2021

bors commented Aug 10, 2021

rust-log-analyzer commented Aug 10, 2021

bors commented Aug 10, 2021

ehuss commented Aug 10, 2021 • edited Loading

bors commented Aug 10, 2021

bors commented Aug 10, 2021

ehuss commented Aug 10, 2021

bors commented Aug 11, 2021

bors commented Aug 11, 2021

pnkfelix commented Aug 11, 2021

SparrowLii commented Jul 19, 2021 •

edited

Loading

kornelski commented Jul 19, 2021 •

edited

Loading

hkratz commented Jul 20, 2021 •

edited

Loading

hkratz commented Jul 21, 2021 •

edited

Loading

ehuss commented Aug 10, 2021 •

edited

Loading