Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TLSLD reloc refers external symbol guard variable #145

Closed
erijo opened this issue Dec 16, 2021 · 21 comments
Closed

TLSLD reloc refers external symbol guard variable #145

erijo opened this issue Dec 16, 2021 · 21 comments
Assignees

Comments

@erijo
Copy link

erijo commented Dec 16, 2021

I have a static library that contains an object built from code that looks like this:

std::string Utilities::GetStackTrace()
{
  return to_string(boost::stacktrace::stacktrace());
}

When I try to link with this library using mold I get the following error:

(.text._ZN5boost10stacktrace6detail9to_stringB5cxx11EPKNS0_5frameEm): TLSLD reloc refers external symbol guard variable for boost::stacktrace::detail::construct_state(boost::stacktrace::detail::program_location const&)::state
@rui314 rui314 self-assigned this Dec 16, 2021
@rui314
Copy link
Owner

rui314 commented Dec 16, 2021

Which are you linking, a share object file or an executable?

@erijo
Copy link
Author

erijo commented Dec 16, 2021

I'm linking a shared library with the static one containing the to_string() call.

@erijo
Copy link
Author

erijo commented Dec 16, 2021

Here's a simple test to reproduce the error:
mold-test.tar.gz

@rui314
Copy link
Owner

rui314 commented Dec 16, 2021

The direct cause of the error is that there's a relocation of type TLSLD which refers an exported symbol (a symbol defined in a shared object file). What is odd is that when you pass -fPIC to a compiler, the compiler isn't expected to create a relocation of TLSLD type but instead create a more generic TLSGD type relocation. I don't know if it's a g++ 7's bug, but it could be.

Can you share object files of your test case?

Does this link fine with other linkers?

@erijo
Copy link
Author

erijo commented Dec 16, 2021

Yes, it seems to only happen with gcc 7.

Here is the result from building both with gcc 7 (CXX=g++-7 BOOST_DIR=../boost_1_69_0/ ~/source/mold/mold --run make libfoobar.so) and with gcc 9 (CXX=g++-9 ...):

If I remove mold --run then it works with both gcc 7 and 9.

@rui314
Copy link
Owner

rui314 commented Dec 16, 2021

It looks like the object file compiled with gcc 7 contains a symbol with the GNU_UNIQUE binding. I believe it's a deprecated feature, but I probably should support it for backward compatibility.

Do you mind if I ask you to add -fno-gnu-unique to CXXFLAGS to see if it links fine?

@erijo
Copy link
Author

erijo commented Dec 16, 2021

It fails also with that flag: mold-test-no-unique.tar.gz

@Alcaro
Copy link

Alcaro commented Dec 16, 2021

https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html

-fno-gnu-unique
On systems with recent GNU assembler and C library, the C++ compiler uses the STB_GNU_UNIQUE binding to make sure that definitions of template static data members and static local variables in inline functions are unique even in the presence of RTLD_LOCAL; this is necessary to avoid problems with a library used by two different RTLD_LOCAL plugins depending on a definition in one of them and therefore disagreeing with the other one about the binding of the symbol. But this causes dlclose to be ignored for affected DSOs; if your program relies on reinitialization of a DSO via dlclose and dlopen, you can use -fno-gnu-unique.

Doesn't look deprecated to me.

dlclose is indeed difficult.

@rui314
Copy link
Owner

rui314 commented Dec 17, 2021

I believe this is a gcc 7's bug. Here is what I found:

  • stacktrace.o contains a global STB_GNU_UNIQUE symbol and a TLSLD relocation referring that symbol as you can see below.
$ readelf --symbols --relocs stacktrace.o|grep _ZGVZN5boost10stacktrace6detail15construct_stateERKNS1_16program_locationEE5state
00000000000000ad  0000004900000014 R_X86_64_TLSLD         0000000000000000 _ZGVZN5boost10stacktrace6detail15construct_stateERKNS1_16program_locationEE5state - 4
00000000000000b8  0000004900000015 R_X86_64_DTPOFF32      0000000000000000 _ZGVZN5boost10stacktrace6detail15construct_stateERKNS1_16program_locationEE5state + 0
0000000000000a82  0000004900000015 R_X86_64_DTPOFF32      0000000000000000 _ZGVZN5boost10stacktrace6detail15construct_stateERKNS1_16program_locationEE5state + 0
    73: 0000000000000000     8 TLS     UNIQUE DEFAULT   46 _ZGVZN5boost10stacktrace6detail15construct_stateERKNS1_16program_locationEE5state
  • A symbol with STB_GNU_UNIQUE binding is a global symbol. It is actually more global than normal global symbols, as such symbols are uniquified by name even if they are in dlopen'ed DSOs loaded with the RTLD_LOCAL attribute.

  • TLSLD relocation is a special kind of relocation for a thread-local variable. A compiler is allowed to generate such relocation if and only if it knows the resulting symbol is private to an ELF module (i.e. not imported from other shared object at runtime).

  • However, a symbol with STB_GNU_UNIQUE is not guaranteed to be resolved to the symbol within the same ELF module, because they are uniquified at runtime by name.

Therefore, it's likely a bug in gcc 7 that it generates TLSLD relocations against STB_GNU_UNIQUE symbols. I think it should have generated TLSGD relocations instead. Indeed, the most recent version of gcc does that.

I believe mold is doing the right thing -- finding an error in an input file and report it to the user.

I think we have two choices:

  1. Do nothing and let users fix their compiler, or
  2. instead of reporting an error, create a wrong output file.

I personally prefer 1, but it looks like what other linkers are doing is 2. So, from a practical point of view, I'll make mold to ignore this error.

rui314 added a commit that referenced this issue Dec 17, 2021
This is a workaround for #145
@rui314
Copy link
Owner

rui314 commented Dec 17, 2021

@erijo I submitted 1baac27 as a workaround. Can you build mold and try to link your program again?

@erijo
Copy link
Author

erijo commented Dec 17, 2021

Now I get another error:

mold: libstacktrace.a(stacktrace.o):(.text._ZN5boost10stacktrace6detail9to_stringB5cxx11EPKNS0_5frameEm): TLSLD reloc refers external symbol: guard variable for boost::stacktrace::detail::construct_state(boost::stacktrace::detail::program_location const&)::state
mold: libstacktrace.a(stacktrace.o):(.text._ZN5boost10stacktrace6detail9to_stringB5cxx11EPKNS0_5frameEm): DTPOFF reloc refers external symbol: guard variable for boost::stacktrace::detail::construct_state(boost::stacktrace::detail::program_location const&)::state
collect2: error: ld returned 1 exit status

@rui314
Copy link
Owner

rui314 commented Dec 17, 2021

How about applying this patch?

diff --git a/elf/arch-x86-64.cc b/elf/arch-x86-64.cc
index f4808c5c..437be5c3 100644
--- a/elf/arch-x86-64.cc
+++ b/elf/arch-x86-64.cc
@@ -645,11 +645,11 @@ void InputSection<X86_64>::scan_relocations(Context<X86_64> &ctx) {
         sym.flags |= NEEDS_TLSLD;
       break;
     case R_X86_64_DTPOFF32:
     case R_X86_64_DTPOFF64:
       if (sym.is_imported)
-        Fatal(ctx) << *this << ": DTPOFF reloc refers external symbol: " << sym;
+        Warn(ctx) << *this << ": DTPOFF reloc refers external symbol: " << sym;
       break;
     case R_X86_64_GOTTPOFF: {
       ctx.has_gottp_rel = true;

       bool do_relax = ctx.arg.relax && !ctx.arg.shared &&

@erijo
Copy link
Author

erijo commented Dec 17, 2021

With that patch I get this:

mold: libstacktrace.a(stacktrace.o):(.text._ZN5boost10stacktrace6detail9to_stringB5cxx11EPKNS0_5frameEm): TLSLD reloc refers external symbol: guard variable for boost::stacktrace::detail::construct_state(boost::stacktrace::detail::program_location const&)::state
mold: libstacktrace.a(stacktrace.o):(.text._ZN5boost10stacktrace6detail9to_stringB5cxx11EPKNS0_5frameEm): DTPOFF reloc refers external symbol: guard variable for boost::stacktrace::detail::construct_state(boost::stacktrace::detail::program_location const&)::state
mold: libstacktrace.a(stacktrace.o):(.text._ZN5boost10stacktrace6detail9to_stringB5cxx11EPKNS0_5frameEm): DTPOFF reloc refers external symbol: boost::stacktrace::detail::construct_state(boost::stacktrace::detail::program_location const&)::state
mold: libstacktrace.a(stacktrace.o):(.text._ZN5boost10stacktrace6detail9to_stringB5cxx11EPKNS0_5frameEm): DTPOFF reloc refers external symbol: guard variable for boost::stacktrace::detail::construct_state(boost::stacktrace::detail::program_location const&)::state
mold: libstacktrace.a(stacktrace.o):(.text._ZN5boost10stacktrace6detail9to_stringB5cxx11EPKNS0_5frameEm): DTPOFF reloc refers external symbol: boost::stacktrace::detail::construct_state(boost::stacktrace::detail::program_location const&)::state

Now I get the shared library. But is it possible to hide the warnings? Even if they are correct, there's nothing I can do about them (except switching compiler, right?) so from my POV they are noise that I would like to avoid introducing in my build output.

@rui314
Copy link
Owner

rui314 commented Dec 17, 2021

Yeah, maybe we can hide the warnings, but first of all, did the produced binary work?

@erijo
Copy link
Author

erijo commented Dec 17, 2021

Yes. I've tried building my real project using mold and it works.

@rui314
Copy link
Owner

rui314 commented Dec 17, 2021

It's probably fine to remove these error checks, but let me think a bit more about it to figure out if there's a better way than simply removing a sanity check. Even if a generated binary works fine in most cases, a binary that would trigger this error message is corrupted and possibly crash at runtime. That's probably already the case with other linkers, because other linkers don't check this error conditions, though...

I'll update this bug again.

rui314 added a commit that referenced this issue Dec 20, 2021
This reverts commit 1baac27.

In order to link gcc 7's output, relaxing this error check wasn't
enough as reported on #145.
We need to remove more error checks or keep them as-is.
For now, I'll restore all the error checks, but before making another
release, I may remove all error checks to accept gcc 7's buggy
output files.
@rui314
Copy link
Owner

rui314 commented Dec 23, 2021

@MaskRay may chime in. So, it looks like there are object files in the wild that contain TLSLD relocations against non-local symbols. Because existing linkers don't check the condition, it doesn't cause any error but instead results in silently creating a corrupted binaries. Is this also the case for lld? IIRC, lld doesn't also have a check if TLSLD relocation refers a local symbol.

And if that's the case, what is your opinion on how to handle the situation? I think we can't add a strict error check as it breaks (bug-)compatibility with gcc 7.

@MaskRay
Copy link

MaskRay commented Dec 23, 2021

Therefore, it's likely a bug in gcc 7 that it generates TLSLD relocations against STB_GNU_UNIQUE symbols.

This is suspicious, but I do not call it a bug.
It is likely out of the scope of the orignal design of STB_GNU_UNIQUE in glibc in 2009.
If newer GCC don't emit the fishy assembly, then the GCC 7 behavior may indeed be a bug.

For a STT_TLS symbol, the glibc ld.so behavior may be similar to regular STB_GLOBAL, or it may have the STT_OBJECT STB_GNU_UNIQUE.
Someone interested can test it.

I think GNU ld, gold, and ld.lld just handle STB_GNU_UNIQUE like the normal STB_GLOBAL, and change the output binding.

% cat a.s
.type foo, @gnu_unique_object
foo:

.section .tbss,"awT",@nobits
.type bar, @gnu_unique_object
bar:
% as a.s -o a.o
% readelf -Ws a.o

Symbol table '.symtab' contains 3 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 OBJECT  UNIQUE DEFAULT    2 foo
     2: 0000000000000000     0 TLS     UNIQUE DEFAULT    3 bar

STB_GNU_UNIQUE is questionable and gets many complaints (https://news.ycombinator.com/item?id=21555752 https://www.openwall.com/lists/musl/2015/08/31/3 MaskRay/ccls#363 (comment) pytorch/pytorch#54245).
Most GCC packages use the default --enable-gnu-unique-object. Interested folks can ask distributions and GCC upstream to default to --disable-gnu-unique-object.
For Clang, I have stated elsewhere that I will object if anyone wants to make it output STB_GNU_UNIQUE assembly.

@rui314
Copy link
Owner

rui314 commented Dec 24, 2021

Recent versions of GCC always emit TLSGD relocations against such symbols.

So, here is my understanding:

  1. TLSLD relocs should not refer a symbol that could be resolved to a symbol in a different ELF module
  2. Multiple STB_GNU_UNIQUE symbols can be merged by the dynamic loader at runtime, so a dynamic symbol with STB_GNU_UNIQUE is not guaranteed to be resolved within the same ELF module

1 and 2 contradict, no?

@MaskRay
Copy link

MaskRay commented Dec 24, 2021

If the suspicious STT_TLS/STB_GNU_UNIQUE symbol is referenced by a TLSLD relocation, I agree this is a compiler bug.

TLSLD relocs should not refer a symbol that could be resolved to a symbol in a different ELF module

Right.

@rui314
Copy link
Owner

rui314 commented Dec 24, 2021

It looks like no one except mold has code to verify that TLSLD is actually referring a non-imported symbol. Ideally, we should add this check to lld and other linkers because it can catch a compiler bug that is otherwise hard to find. However, it's probably too late.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants