
Excessive drop glue duplication in vtables can blow up compile times #88438

Open
jonas-schievink opened this issue Aug 28, 2021 · 3 comments
Labels
A-codegen Area: Code generation A-destructors Area: Destructors (`Drop`, …) I-compiletime Issue: Problems and improvements with respect to compile times. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@jonas-schievink
Contributor

Currently, it seems that every codegen unit (CGU) containing a &ConcreteType -> &dyn Trait cast instantiates the corresponding vtable and the drop glue of ConcreteType. This can result in massive LLVM IR bloat, which then slows down compile times.

We have observed this in rust-analyzer (rust-lang/rust-analyzer#10065, also Zulip discussion here), where the drop glue of a single type RootDatabase is responsible for over 40% of LLVM IR in some downstream crates.
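A minimal single-crate sketch of why the cast pulls in drop glue (it does not reproduce the compile-time bloat itself). A Rust vtable has a drop-in-place entry, so the coercion to &dyn Trait references ConcreteType's drop glue even when nothing is dropped at that point, and dropping a Box<dyn Trait> can only reach the destructor through that entry. ConcreteType and Trait are the placeholder names from the issue; the DROPS counter and the as_trait/drop_boxed helpers are hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical counter: how many times the concrete destructor has run.
static DROPS: AtomicUsize = AtomicUsize::new(0);

pub trait Trait {
    fn name(&self) -> &'static str;
}

// Hypothetical stand-in for the issue's `ConcreteType`, with a non-trivial field.
pub struct ConcreteType {
    pub _data: Vec<String>,
}

impl Drop for ConcreteType {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

impl Trait for ConcreteType {
    fn name(&self) -> &'static str {
        "ConcreteType"
    }
}

// The unsizing coercion `&ConcreteType -> &dyn Trait` needs the vtable for
// (ConcreteType, Trait). That vtable includes a drop-in-place entry, so it
// references ConcreteType's drop glue even though nothing is dropped here.
pub fn as_trait(value: &ConcreteType) -> &dyn Trait {
    value
}

// Dropping through a trait object reaches the destructor only via the
// vtable's drop entry, which is why every vtable carries drop glue.
pub fn drop_boxed(value: Box<dyn Trait>) {
    drop(value);
}
```

Usage: as_trait(&c).name() returns "ConcreteType", and drop_boxed(Box::new(c)) runs ConcreteType::drop exactly once through the vtable.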

@jonas-schievink jonas-schievink added A-codegen Area: Code generation A-destructors Area: Destructors (`Drop`, …) I-compiletime Issue: Problems and improvements with respect to compile times. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Aug 28, 2021
matklad added a commit to matklad/repros that referenced this issue Aug 30, 2021
@matklad
Member

matklad commented Aug 30, 2021

Further investigation shows that the trait objects are an accidental detail here.

I've put a repro for the problem here:

https://github.com/matklad/repros/tree/f6ce0cb8cfff123ca95d3caa3feb874328e39fe5/drop-monomorphisation

It contains three crates.

a defines a generic type:

```rust
#![no_std]

use core::fmt;

pub struct MyArc<T> {
    _value: T,
}

impl<T> Drop for MyArc<T> {
    fn drop(&mut self) {
        self.drop_slow()
    }
}

impl<T> MyArc<T> {
    #[inline(never)]
    fn drop_slow(&mut self) {
        unsafe {
            print(format_args!("dropping {:?}", core::any::type_name::<T>()));
        }
    }
}

#[allow(improper_ctypes)]
extern "C" {
    fn print(_: fmt::Arguments);
}
```

b defines a concrete type using some specific instantiation of a's type:

```rust
#![no_std]

pub struct S {
    _inner: a::MyArc<i32>,
}
```

c uses the type from b:

```rust
#![no_std]

pub extern "Rust" fn f(_: b::S) {
}
```

If I compile the crates as

```sh
#!/bin/sh
rustc -C opt-level=3 --crate-type rlib a.rs
rustc -C opt-level=3 --crate-type rlib --extern a=liba.rlib b.rs
rustc -C opt-level=3 --crate-type rlib --extern b=libb.rlib -L dependency=. \
  -Zsymbol-mangling-version=v0 --emit=llvm-ir -Cno-prepopulate-passes -Cpasses=name-anon-globals \
  c.rs
```

I will see that the LLVM IR for c contains the MyArc::<i32>::drop_slow monomorphisation. I would expect that to happen in b, because that's the place where we instantiate the generic code.

I can get the desired behavior by using Jonas' Gambit in crate b:

```rust
use core::mem::ManuallyDrop;

pub struct S {
    inner: ManuallyDrop<a::MyArc<i32>>,
}

impl Drop for S {
    fn drop(&mut self) {
        unsafe {
            ManuallyDrop::drop(&mut self.inner)
        }
    }
}
```
```console
$ exa -l *.ll
.rw-r--r-- 1.5k matklad 30 Aug 13:13 c.ll.after
.rw-r--r--  14k matklad 30 Aug 13:14 c.ll.before
```
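A single-file sketch of the same gambit, useful mainly to check that it still drops the inner value exactly once. With the generic field wrapped in ManuallyDrop, S's compiler-generated drop glue only calls the hand-written, non-generic S::drop (dropping the ManuallyDrop field itself is a no-op). In the multi-crate repro, that non-generic function is compiled in b, which, per the measurements above, is what keeps MyArc<i32>'s destructor out of c. Everything lives in one crate here, so this does not show the IR savings; the SLOW_DROPS counter and the new constructors are hypothetical, and the counter replaces the repro's extern "C" print.

```rust
use std::mem::ManuallyDrop;
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical counter standing in for the repro's `drop_slow` side effect.
static SLOW_DROPS: AtomicUsize = AtomicUsize::new(0);

// Same shape as crate a's `MyArc<T>`, minus the extern "C" print.
pub struct MyArc<T> {
    _value: T,
}

impl<T> MyArc<T> {
    pub fn new(value: T) -> Self {
        MyArc { _value: value }
    }
}

impl<T> Drop for MyArc<T> {
    fn drop(&mut self) {
        SLOW_DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

// Crate b's `S` with the gambit applied: the compiler no longer drops
// `inner` as part of S's field drops, because ManuallyDrop suppresses that.
pub struct S {
    inner: ManuallyDrop<MyArc<i32>>,
}

impl S {
    pub fn new(value: i32) -> Self {
        S { inner: ManuallyDrop::new(MyArc::new(value)) }
    }
}

// Non-generic destructor: the only place that drops the generic MyArc<i32>.
impl Drop for S {
    fn drop(&mut self) {
        // SAFETY: `inner` is dropped exactly once, here, and never used again.
        unsafe { ManuallyDrop::drop(&mut self.inner) }
    }
}
```

The design trade-off is the unsafe block: the ManuallyDrop::drop call must run exactly once, so every new field added to S that needs the same treatment has to be dropped by hand as well.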

An important detail here is that there may be many downstream c crates, so the bloat is multiplicative rather than additive.
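As a rough back-of-the-envelope model (an assumption, not a measurement from this thread): if each of N downstream crates carries its own copy of drop glue costing roughly G of LLVM IR, the total grows linearly in N instead of being paid once in b.

```math
\text{IR}_{\text{dup}}(N) \approx N \cdot G
\qquad \text{vs.} \qquad
\text{IR}_{\text{shared}}(N) \approx G
```

Taking the roughly 12.5k difference between c.ll.before (14k) and c.ll.after (1.5k) as a crude per-crate G, ten such downstream crates would carry on the order of 125k of extra IR.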

To come back to the original issue, this hits us hard in rust-analyzer in two ways.

First, we have this RootDatabase type defined in the mid layer of rust-analyzer. RootDatabase stores all of the state: internally, it's dozens of hashmaps storing complex, heavily Arc-ed types. Its drop glue was monomorphised into every downstream crate. In the case of the fairly non-trivial ide_ssr crate, that glue accounted for 40% of the LLVM IR generated. Switching to ManuallyDrop was a fairly small change that fixed it.

Second, we use chalk, which contains a lot of very generic and very recursive data structures. We wrap all of this in a non-generic Type, which suffers from the same issue as RootDatabase. However, just adding ManuallyDrop here doesn't help. The culprits are methods like autoderef, which return an impl Iterator whose internal state closes over chalk types. We can't add ManuallyDrop to that iterator type in a nice way, so it causes a similar issue.
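A minimal sketch of the shape of the problem, assuming a simplified model: ChalkTy is a hypothetical stand-in for chalk's generic, recursive types, and Type and autoderef only mimic the roles described above (this is not rust-analyzer's actual code). The returned iterator is an anonymous type whose state holds chalk data, so there is no nameable type to give a ManuallyDrop field and a hand-written Drop impl.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a heavily generic, recursive chalk type.
#[derive(Debug)]
pub struct ChalkTy<I> {
    pub interner: I,
    pub inner: Option<Arc<ChalkTy<I>>>,
}

/// Hypothetical non-generic wrapper, playing the role of the `Type` above.
#[derive(Clone, Debug)]
pub struct Type {
    ty: Arc<ChalkTy<()>>,
}

impl Type {
    pub fn new(ty: Arc<ChalkTy<()>>) -> Self {
        Type { ty }
    }

    /// Hypothetical autoderef-like walk over the chain of inner types.
    /// The returned iterator is an anonymous type whose state holds an
    /// Arc<ChalkTy<()>>, so dropping it needs drop glue for that captured
    /// state, and there is no named type to apply the ManuallyDrop trick to.
    pub fn autoderef(&self) -> impl Iterator<Item = Type> {
        std::iter::successors(Some(self.ty.clone()), |t| t.inner.clone()).map(Type::new)
    }
}
```

Usage: for a three-level chain of ChalkTy values, autoderef() yields three Type values, and each partially consumed iterator a caller drops brings the captured chalk state's drop glue with it.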

@csmoe
Member

csmoe commented Oct 9, 2021

Seems to be a duplicate of #84175.

I forced drop glue to be generated locally in #89660, as @Kobzol suggested, then re-benchmarked @matklad's repro with -Zshare-generics:

```sh
#!/bin/sh
rustc +dev -C opt-level=3 --crate-type rlib a.rs -Zshare-generics
rustc +dev -C opt-level=3 --crate-type rlib --extern a=liba.rlib b.rs -Zshare-generics
rustc +dev -C opt-level=3 --crate-type rlib --extern b=libb.rlib -L dependency=. \
  -Zsymbol-mangling-version=v0 -Zshare-generics --emit=llvm-ir -Cno-prepopulate-passes -Cpasses=name-anon-globals \
  c.rs
```

(dev is my local build of #89660 rustc)

```console
-rw-r--r--  1 mac  staff   1999 Oct  9 13:01 c.ll.drop_glue_locally
-rw-r--r--  1 mac  staff  16279 Oct  9 12:59 c.ll.drop_glue_no_locally
```

We got nearly the same result as with the ManuallyDrop hack, but I don't know why the perf run regressed so much :(

@matklad
Member

matklad commented Oct 19, 2021

#64140 is related (maybe exactly the same issue)


3 participants