Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce -Zvirtual-function-elimination codegen flag #96285

Merged
merged 8 commits into from
Jun 15, 2022

Conversation

flip1995
Copy link
Member

@flip1995 flip1995 commented Apr 21, 2022

Fixes #68262

This PR adds a codegen flag -Zvirtual-function-elimination to enable the VFE optimization in LLVM. To make this work, additonal information has to be added to vtables (!vcall_visibility metadata and a typeid of the trait). Furthermore, instead of just loading functions, the llvm.type.checked.load intrinsic has to be used to map functions to vtables.

For technical details of the changes, see the commit messages.

I also tested this flag on https://github.com/tock/tock on different boards to verify that this fixes the issue tock/tock#2594. This flag is able to improve the size of the resulting binary by about 8k-9k bytes by removing the unused debug print functions.

Rendered documentation update

@rust-highfive
Copy link
Collaborator

Some changes occured to rustc_codegen_gcc

cc @antoyo

@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Apr 21, 2022
@rust-highfive
Copy link
Collaborator

r? @nagisa

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Apr 21, 2022
@tschuett
Copy link

Did you mean -Z virtual-function-elimination?

@flip1995
Copy link
Member Author

I think this flag fits better as a -C flag because it influences codegen and is not a debugging/nightly only flag. I can change it to a -Z flag, if it should start out as a nightly only flag though.

@Mark-Simulacrum
Copy link
Member

Is there a reason why we shouldn't (or LLVM shouldn't) enable this optimization by default with full LTO? Is there an expectation that it might miscompile something or otherwise cause us unexpected trouble?

let trait_ref_self = trait_ref.with_self_ty(cx.tcx, ty);
let trait_ref_self = cx.tcx.erase_regions(trait_ref_self);
let trait_def_id = trait_ref_self.def_id();
let trait_vis = cx.tcx.visibility(trait_def_id);
Copy link
Contributor

@tmiasko tmiasko Apr 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the closest existing concept similar to vcall_visibility would be reachability, rather than visibility.

Example with an incorrectly optimized vtables showing why visibility is insufficient
trait Foo { fn foo(&self) { println!("foo") } }
trait Bar { fn bar(&self) { println!("bar") } }
trait Baz { fn baz(&self) { println!("baz") } }

impl Foo for usize {}
impl Bar for usize {}
impl Baz for usize {}

pub struct FooBox(Box<dyn Foo>);
pub struct BarBox(Box<dyn Bar>);
pub struct BazBox(Box<dyn Baz>);

pub fn make_foo() -> FooBox { FooBox(Box::new(0)) }
pub fn make_bar() -> BarBox { BarBox(Box::new(0)) }
pub fn make_baz() -> BazBox { BazBox(Box::new(0)) }

#[inline]
pub fn f(a: FooBox) { a.0.foo() }
pub fn g<T>(b: BarBox) { b.0.bar() }
#[inline]
fn h(c: BazBox) { c.0.baz() }
pub const H: fn(BazBox) = h;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out. I don't really have a solution for this right now. I tried looking at the reachability of the trait_ref and the members of the traits with the privacy_access_levels query. But for all of the traits and the functions in the traits (queried with tcx.vtable_entries) this just returns None, meaning it is not reachable IIUC.

I guess I would somehow have to check the reachability for the dyn Trait types. But I couldn't figure out how to do that.

Do you have ideas on how to best address this issue?

@nikic
Copy link
Contributor

nikic commented Apr 21, 2022

I think this flag fits better as a -C flag because it influences codegen and is not a debugging/nightly only flag. I can change it to a -Z flag, if it should start out as a nightly only flag though.

Yes, new flags are always added as unstable -Z flags first, and may later be stabilized as a -C flag. The "debugging options" terminology is an anachronism, the actual distinction between -C and -Z is stability.

module.module_llvm.llmod(),
llvm::LLVMModFlagBehavior::Error,
"LTOPostLink\0".as_ptr().cast(),
1,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be 11?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the HasModuleFlag should be checking for 1, but also adding this flag here seems super sketchy to me. This is a pretty internal thing to LTO AFAICT.

In that context, it would be great if this code had a comment with context as to why this is being done.

Copy link

@tschuett tschuett Apr 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code here is based on the file tschuett linked above. In the LLVM repo that code is responsible to set up LTO. Adding this flag was missing from the Rust code that should emulate the code in LTOCodeGenerator.cpp.

The 11 argument in the HasModuleFlag is the size of the string passed to this function, not the value of the flag. The 1 argument in the AddModuleFlag function is the value of the module flag. LLVM doesn't need a string size here, because it expects a \0 terminated string in this function.

compiler/rustc_session/src/session.rs Outdated Show resolved Hide resolved
module.module_llvm.llmod(),
llvm::LLVMModFlagBehavior::Error,
"LTOPostLink\0".as_ptr().cast(),
1,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the HasModuleFlag should be checking for 1, but also adding this flag here seems super sketchy to me. This is a pretty internal thing to LTO AFAICT.

In that context, it would be great if this code had a comment with context as to why this is being done.

@flip1995
Copy link
Member Author

Thanks for all the feedback. I think I addressed all the comments (I will clean up / squash the commits before this PR gets merged), except for the reachability issue, which needs a little bit more work/thinking on how to address this.

Is there a reason why we shouldn't (or LLVM shouldn't) enable this optimization by default with full LTO? Is there an expectation that it might miscompile something or otherwise cause us unexpected trouble?

Clang/LLVM doesn't set this flag by default, so I figured rustc shouldn't set this flag by default either. I don't know if there are any known issues with this optimization in LLVM

@nagisa
Copy link
Member

nagisa commented May 1, 2022

This seems reasonable to me for a first shot at the functionality. Since it is unstable, I'm not super concerned about this being partially broken in some ways as e.g. pointed out by @tmiasko above, although we should definitely make sure the information about this limitation/bug is preserved in e.g. the tracking issue.

Another thing I'm not seeing is documentation for this feature the unstable book. Please write something about this feature here, and especially please note the known limitations and/or issues as well.

@flip1995
Copy link
Member Author

flip1995 commented May 2, 2022

I added the documentation, cleaned up the commits and rebased on top of latest master.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@insilications
Copy link

Very nice work.

@nagisa
Copy link
Member

nagisa commented May 10, 2022

@bors r+ Thanks!

@bors
Copy link
Contributor

bors commented May 10, 2022

📌 Commit 798a2d927f3796c0b21fc6ee2707ce21833bfbd0 has been approved by nagisa

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 10, 2022
@bors
Copy link
Contributor

bors commented May 11, 2022

⌛ Testing commit 798a2d927f3796c0b21fc6ee2707ce21833bfbd0 with merge 2c9812125037f96b4ba035183f237dfacc0f8d4a...

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Jun 7, 2022
@rust-log-analyzer

This comment has been minimized.

Adds the virtual-function-elimination unstable compiler flag and a check
that this flag is only used in combination with -Clto. LLVM can only
apply this optimization with fat LTO.
To apply the optimization the `Virtual Function Elim` module flag has to
be set. To apply this optimization post-link the `LTOPostLink` module
flag has to be set.
This function computes a Itanium-like typeid for a trait_ref. This is
required for the VFE optimization in LLVM. It is used to map
`llvm.type.checked.load` invocations, that is loading the function from
a vtable, to the vtables this function could be from.

It is important to note that `typeid`s are not unique. So multiple
vtables of the same trait can share `typeid`s.
Add the intrinsic

declare {i8*, i1} @llvm.type.checked.load(i8* %ptr, i32 %offset, metadata %type)

This is used in the VFE optimization when lowering loading functions
from vtables to LLVM IR. The `metadata` is used to map the function to
all vtables this function could belong to. This ensures that functions
from vtables that might be used somewhere won't get removed.
This adds the typeid and `vcall_visibility` metadata to vtables when the
-Cvirtual-function-elimination flag is set.

The typeid is generated in the same way as for the
`llvm.type.checked.load` intrinsic from the trait_ref.

The offset that is added to the typeid is always 0. This is because LLVM
assumes that vtables are constructed according to the definition in the
Itanium ABI. This includes an "address point" of the vtable. In C++ this
is the offset in the vtable where information for RTTI is placed. Since
there is no RTTI information in Rust's vtables, this "address point" is
always 0. This "address point" in combination with the offset passed to
the `llvm.type.checked.load` intrinsic determines the final function
that should be loaded from the vtable in the
`WholeProgramDevirtualization` pass in LLVM. That's why the
`llvm.type.checked.load` intrinsics are generated with the typeid of the
trait, rather than with that of the function that is called. This
matches what `clang` does for C++.

The vcall_visibility metadata depends on three factors:

1. LTO level: Currently this is always fat LTO, because LLVM only
   supports this optimization with fat LTO.
2. Visibility of the trait: If the trait is publicly visible, VFE
   can only act on its vtables after linking.
3. Number of CGUs: if there is more than one CGU, also vtables with
   restricted visibility could be seen outside of the CGU, so VFE can
   only act on them after linking.

To reflect this, there are three visibility levels: Public, LinkageUnit,
and TranslationUnit.
The offset in the llvm.type.checked.load intrinsic differs on 32 bit platforms
@flip1995
Copy link
Member Author

I updated the 32-bit test with:

diff --git a/src/test/codegen/virtual-function-elimination-32bit.rs b/src/test/codegen/virtual-function-elimination-32bit.rs
index dd200278bb1..6f963363a99 100644
--- a/src/test/codegen/virtual-function-elimination-32bit.rs
+++ b/src/test/codegen/virtual-function-elimination-32bit.rs
@@ -31,5 +31,5 @@ pub fn main() {
     taking_t(&s);
 }

-// CHECK: ![[TYPE0]] = !{i64 0, !"[[MANGLED_TYPE0]]"}
+// CHECK: ![[TYPE0]] = !{i32 0, !"[[MANGLED_TYPE0]]"}
 // CHECK: ![[VCALL_VIS0]] = !{i64 2}

and rebased on latest master (to test locally again).

No other changes.

@bors r=nagisa

@bors
Copy link
Contributor

bors commented Jun 14, 2022

📌 Commit 195f208 has been approved by nagisa

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 14, 2022
@bors
Copy link
Contributor

bors commented Jun 14, 2022

⌛ Testing commit 195f208 with merge 2d1e075...

@bors
Copy link
Contributor

bors commented Jun 15, 2022

☀️ Test successful - checks-actions
Approved by: nagisa
Pushing 2d1e075 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Jun 15, 2022
@bors bors merged commit 2d1e075 into rust-lang:master Jun 15, 2022
@rustbot rustbot added this to the 1.63.0 milestone Jun 15, 2022
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (2d1e075): comparison url.

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results
  • Primary benchmarks: mixed results
  • Secondary benchmarks: mixed results
mean1 max count2
Regressions 😿
(primary)
5.2% 5.2% 1
Regressions 😿
(secondary)
4.5% 4.5% 1
Improvements 🎉
(primary)
-2.0% -2.0% 1
Improvements 🎉
(secondary)
-2.4% -2.4% 1
All 😿🎉 (primary) 1.6% 5.2% 2

Cycles

This benchmark run did not return any relevant results for this metric.

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

@rustbot label: -perf-regression

Footnotes

  1. the arithmetic mean of the percent change

  2. number of relevant changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support whole program devirtualization