Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ThinLTO exposes too many symbols #46374

Closed
alexcrichton opened this issue Nov 29, 2017 · 1 comment
Closed

ThinLTO exposes too many symbols #46374

alexcrichton opened this issue Nov 29, 2017 · 1 comment

Comments

@alexcrichton
Copy link
Member

For example given this code:

pub mod a {                 
    static mut A: i32 = 0;  
    pub fn foo() -> i32 {   
        unsafe { A }        
    }                       
                            
    pub fn set_foo(a: i32) {
        unsafe { A = a; }   
    }                       
}                           
                            
pub mod b {                 
    pub fn bar() {}         
}                           

Let's compile without ThinLTO and take a look at the symbols:

$ rustc +nightly -C codegen-units=8 foo.rs --crate-type lib
$ readelf -s libfoo.rlib
File: libfoo.rlib(foo.foo0.rcgu.o)

Symbol table '.symtab' contains 8 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS foo0-8787f43e282added3762
     2: 0000000000000000     4 OBJECT  LOCAL  DEFAULT    7 _ZN3foo1a1A17h6d4d0e10f3b
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    3 
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    5 
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT    7 
     6: 0000000000000000     7 FUNC    GLOBAL DEFAULT    3 _ZN3foo1a3foo17hb1e13e720
     7: 0000000000000000     7 FUNC    GLOBAL DEFAULT    5 _ZN3foo1a7set_foo17h5e0b4

File: libfoo.rlib(foo.foo1.rcgu.o)

Symbol table '.symtab' contains 4 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS foo1-8787f43e282added3762
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    3 
     3: 0000000000000000     1 FUNC    GLOBAL DEFAULT    3 _ZN3foo1b3bar17h8ae358bf9

File: libfoo.rlib(rust.metadata.bin)
readelf: Error: Not an ELF file - it has the wrong magic bytes at the start

File: libfoo.rlib(foo.foo0.rcgu.bytecode.encoded)
readelf: Error: Not an ELF file - it has the wrong magic bytes at the start

File: libfoo.rlib(foo.foo1.rcgu.bytecode.encoded)
readelf: Error: Not an ELF file - it has the wrong magic bytes at the start

Here we're particularly interested in ZN3foo1a1A17h6d4d0e10f3b, the a::A constant from above. As expected this is LOCAL and DEFAULT, notably not exported from the object file as it's still internalized.

Let's instead compile with ThinLTO:

$ rustc +nightly -C codegen-units=8 foo.rs --crate-type lib -Z thinlto
$ readelf -s libfoo.rlib
readelf: Error: Not an ELF file - it has the wrong magic bytes at the start
readelf: Error: Not an ELF file - it has the wrong magic bytes at the start
readelf: Error: Not an ELF file - it has the wrong magic bytes at the start

File: libfoo.rlib(foo.foo0-8787f43e282added376259c1adb08b80.rs.rcgu.o)

Symbol table '.symtab' contains 7 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS foo0-8787f43e282added3762
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    3 
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    5 
     4: 0000000000000000     4 OBJECT  GLOBAL HIDDEN     7 _ZN3foo1a1A17h6d4d0e10f3b
     5: 0000000000000000     7 FUNC    GLOBAL DEFAULT    3 _ZN3foo1a3foo17hb1e13e720
     6: 0000000000000000     7 FUNC    GLOBAL DEFAULT    5 _ZN3foo1a7set_foo17h5e0b4

File: libfoo.rlib(foo.foo1-8787f43e282added376259c1adb08b80.rs.rcgu.o)

Symbol table '.symtab' contains 4 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS foo1-8787f43e282added3762
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    3 
     3: 0000000000000000     1 FUNC    GLOBAL DEFAULT    3 _ZN3foo1b3bar17h8ae358bf9

File: libfoo.rlib(rust.metadata.bin)

File: libfoo.rlib(foo.foo0-8787f43e282added376259c1adb08b80.rs.rcgu.bytecode.encoded)

File: libfoo.rlib(foo.foo1-8787f43e282added376259c1adb08b80.rs.rcgu.bytecode.encoded)

Oh dear! Our _ZN3foo1a1A17h6d4d0e10f3b symbol has been promoted to GLOBAL and HIDDEN. That's not good! This symbol, a::A, should stay internalized even when compiling with ThinLTO

@alexcrichton
Copy link
Member Author

I've been trying to figure out what's going on here but I'm sort of quickly coming to the end of my rope. A few points of note though...

Right now ThinLTO works by needing complete and total knowledge (AFAIK) of exported and internal symbols. Thankfully we as the compiler actually have this information so we're just handing it down to ThinLTO. Notably, we're creating our ThinLTO data with a list of symbols to preserve.

Currently, however, we're recursively walking all symbols and creating a very large GUIDPreservedSymbols set. This set is the transitive addition of each symbol followed by each symbol it then in turn references. This is not done in LLVM's source but it something I originally implemented to get the link errors to go away.

The GUIDPreservedSymbols variable is used for two purposes. First it's used to create a DeadSymbols set but it turns out that LLVM's implementation of computeDeadSymbols already has this recursive nature. Notably the DeadSymbols set is the same regardless of whether GUIDPreservedSymbols has only the symbols we passed in or all the transitive symbols they need as well. So things are looking good! This custom logic for "spider all the transitive deps" is not necessary for the first use case of GUIDPreservedSymbols.

The next use of GUIDPreservedSymbols comes later when we call the thinLTOInternalizeAndPromoteInIndex function. This function will decide, for the entire set of ThinLTO units, what functions are being exported linkage-wise and which are not. Namely the isExported function decides whether a symbol is exported or not (clever name eh?)

So it turns out that one clause of this isExported closure is to just test for membership in GUIDPreservedSymbols. Because our behavior today is to put everything into this set then this is almost always returning true! As a result we get the behavior in the OP of the issue, where internal statics are being promoted to being exported unnecessarily.


So alright this GUIDPreservedSymbols is tricky business. I've discovered that for the usage in computeDeadSymbols it's not necessary to include all transitive dependencies. For isExported, however, if we don't include all transitive dependencies then we get tons of link errors unfortunately. I have yet to determine if these link errors are a bug in LLVM or if they're a bug in how we're using ThinLTO. Unfortunately there is not an easy fix for this issue right now as far as I can tell.

I'm going to try to work with @dcci to see if he can help me drill into this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant