Mesa linked with mold creates symbols with wrong visibility #197

Closed
icecream95 opened this issue Dec 26, 2021 · 9 comments

Comments

@icecream95

When Mesa is compiled with glvnd enabled using mold, programs crash on startup with errors like this:

error: symbol lookup error: undefined symbol: eglGetProcAddress (fatal)

When linked with ld.bfd, readelf shows an entry like this for the function in libEGL_mesa.so:

  2433: 0000000000014278   264 FUNC    LOCAL  DEFAULT   11 eglGetProcAddress

ld.lld gives something similar:

   628: 00000000000348b8   264 FUNC    LOCAL  HIDDEN    12 eglGetProcAddress

But mold creates a symbol with GLOBAL binding, which I think causes the bug because it overrides the eglGetProcAddress function in glvnd.

  2356: 0000000000026e38   264 FUNC    GLOBAL HIDDEN    16 eglGetProcAddress

The original object file lists the function as GLOBAL and HIDDEN:

   831: 0000000000000000   264 FUNC    GLOBAL HIDDEN   268 eglGetProcAddress

elflink.c in binutils mentions:

/* XXX: The ABI draft says the linker must turn hidden and
   internal symbols into STB_LOCAL symbols when producing the
   DSO. However, if ld.so honors st_other in the dynamic table,
   this would not be necessary.  */

It seems that the BFD linker does this whenever it is passed the -shared option.

@rui314
Owner

rui314 commented Dec 26, 2021

I knew about this difference but didn't recognize it as an issue, because I thought the dynamic linker would ignore HIDDEN symbols.

Thank you for the excerpt from binutils. It looks like my understanding was wrong. I'll investigate a bit more and make a fix.

@rui314
Owner

rui314 commented Dec 26, 2021

Excerpt from the spec (link):

STV_HIDDEN

A symbol defined in the current component is hidden if its name is not visible to other components. Such a symbol is necessarily protected. This attribute is used to control the external interface of a component. An object named by such a symbol may still be referenced from another component if its address is passed outside.

A hidden symbol contained in a relocatable object is either removed or converted to STB_LOCAL binding by the link-editor when the relocatable object is included in an executable file or shared object.
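The rule quoted above can be sketched in a few lines of C. This is only an illustration of the spec text, not code from mold or binutils: the function names `output_binding` and `check_issue_example` are hypothetical, and the constants are defined locally with the same values as in `<elf.h>` so the sketch does not depend on that header. STV_INTERNAL is treated like STV_HIDDEN, following the binutils comment quoted earlier in the thread.

```c
#include <assert.h>

/* Same numeric values as the ELF gABI constants in <elf.h>. */
enum { STB_LOCAL = 0, STB_GLOBAL = 1, STB_WEAK = 2 };
enum { STV_DEFAULT = 0, STV_INTERNAL = 1, STV_HIDDEN = 2, STV_PROTECTED = 3 };
enum { STT_FUNC = 2 };

/* st_info packs the binding in the high nibble and the type in the low nibble. */
#define ST_INFO(bind, type)   ((unsigned char)(((bind) << 4) | ((type) & 0xf)))
#define ST_BIND(info)         ((unsigned char)(info) >> 4)
/* st_other keeps the visibility in its low two bits. */
#define ST_VISIBILITY(other)  ((other) & 0x3)

/* Binding a symbol should carry in a linked executable or shared object,
 * given the st_info and st_other bytes from the relocatable object.
 * (The spec also allows the link-editor to drop the symbol entirely.) */
static unsigned char output_binding(unsigned char st_info, unsigned char st_other)
{
    unsigned char vis = ST_VISIBILITY(st_other);

    if (vis == STV_HIDDEN || vis == STV_INTERNAL)
        return STB_LOCAL;
    return ST_BIND(st_info);
}

/* The eglGetProcAddress entry from the object file is FUNC GLOBAL HIDDEN,
 * so it is expected to become LOCAL in libEGL_mesa.so. */
static void check_issue_example(void)
{
    assert(output_binding(ST_INFO(STB_GLOBAL, STT_FUNC), STV_HIDDEN) == STB_LOCAL);
}
```

Under this rule the ld.bfd and ld.lld entries shown earlier (LOCAL binding) match the spec, while the GLOBAL HIDDEN entry in mold's output does not.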

@rui314
Owner

rui314 commented Dec 26, 2021

@icecream95 Can you share your libEGL_mesa.so built by mold and ld.lld?

I don't think mold puts a hidden symbol into .dynsym. It may put a hidden symbol into .symtab as GLOBAL, but the dynamic linker doesn't use that table for dynamic linking, so it shouldn't cause a problem. So I wonder what is wrong with mold's output.

@icecream95
Author

I'm doing some more debugging, and it appears that the actual issue is related to TLS, and that the symbol tables weren't causing the issues.

@icecream95
Author

Mesa uses this variable to store the current EGL context:

static __thread __attribute__((tls_model("initial-exec"))) const _EGLThreadInfo *_egl_TLS;

Using watchpoints, I found that when libEGL is linked with mold it has the same address as this variable declared in the ApiTrace eglretrace binary:

__thread Context *currentContextPtr;

@icecream95
Author

icecream95 commented Dec 26, 2021

Here is a small example that reproduces the issue:

bin.c:

#include <stdio.h>

void lib_set(int);

__thread int bin;

int main(void)
{
    bin = 1;
    lib_set(2);
    printf("%i\n", bin);
}

lib.c:

static __thread __attribute__((tls_model("initial-exec"))) int lib;

void lib_set(int x)
{
    lib = x;
}

$ gcc -shared lib.c -o liblib.so; gcc bin.c -o bin -L. -llib; LD_LIBRARY_PATH=. ./bin
1
$ mold --run gcc -shared lib.c -o liblib.so; gcc bin.c -o bin -L. -llib; LD_LIBRARY_PATH=. ./bin
2

When linked with ld.bfd, the shared library has an R_AARCH64_TLS_TPREL64 relocation, but this relocation is missing from mold's output.

EDIT: Added initial-exec back, otherwise it is broken even without mold
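
Why a missing TPREL relocation would make `lib` alias `bin` can be illustrated with a simplified model. This is a sketch under stated assumptions, not a description of mold's or glibc's actual behaviour: it assumes the static TLS area places the executable's block at offset 0 with the library's block right after it, that an unrelocated offset slot stays at 0, and it ignores the thread control block that real layouts reserve. All names (`tls_model`, `apply_tprel`, `model_lib_set`, `run_repro`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified model of one thread's static TLS area: the executable's
 * block comes first, followed by the library's block. */
enum { EXE_TLS_SIZE = sizeof(int), LIB_TLS_SIZE = sizeof(int) };

struct tls_model {
    unsigned char area[EXE_TLS_SIZE + LIB_TLS_SIZE];
    size_t lib_offset_slot; /* offset the library reads before touching `lib` */
};

/* What the dynamic loader is expected to do for a TPREL-style relocation:
 * store the library variable's offset within the static TLS area. */
static void apply_tprel(struct tls_model *m)
{
    assert(m != NULL);
    m->lib_offset_slot = EXE_TLS_SIZE;
}

/* Models lib_set(): an initial-exec access is "thread pointer + offset". */
static void model_lib_set(struct tls_model *m, int x)
{
    assert(m != NULL);
    memcpy(m->area + m->lib_offset_slot, &x, sizeof x);
}

/* Models the repro: bin = 1; lib_set(2); then returns the value of bin. */
static int run_repro(int relocation_emitted)
{
    struct tls_model m;
    int bin = 1;

    memset(&m, 0, sizeof m);            /* assumption: unrelocated slot is 0 */
    if (relocation_emitted)
        apply_tprel(&m);

    memcpy(m.area, &bin, sizeof bin);   /* the executable's `bin` sits at offset 0 */
    model_lib_set(&m, 2);
    memcpy(&bin, m.area, sizeof bin);
    return bin;
}
```

In this model `run_repro(1)` returns 1 and `run_repro(0)` returns 2, which mirrors the two outputs in the transcript above.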

@rui314
Owner

rui314 commented Dec 26, 2021

@icecream95 Thank you for sharing a small test case! I can reproduce it on my machine. Looking...

@rui314 rui314 closed this as completed in d116113 Dec 26, 2021
@rui314
Owner

rui314 commented Dec 26, 2021

Fixed the issue in the above patch.

@icecream95
Author

Thank you, everything works now!

This issue was closed.