
Upgrading libepoxy from 1.5.5 to 1.5.7 results in Xorg crashing on boot #252

Closed
Ernest1338 opened this issue May 13, 2021 · 29 comments

Comments

@Ernest1338

Ernest1338 commented May 13, 2021

Hello,

Basic info:

  • OS: Manjaro Linux KDE
  • Kernel: 5.11.18-1-MANJARO
  • GPU: Radeon R9 380 with AMDGPU drivers

I encountered this issue after a recent Manjaro update which, among other things, updated libepoxy from 1.5.5 to 1.5.7. As I later found, that was the package causing Xorg to crash on boot.
Using git bisect, I managed to find the problematic commit: dbfa4b2

This is the Xorg log after it crashed:

[  5840.928] (WW) Failed to open protocol names file lib/xorg/protocol.txt
[  5840.929] 
X.Org X Server 1.20.11
... a lot more stuff (I can post if needed)
[  5841.290] (II) Initializing extension XFree86-DRI
[  5841.290] (II) Initializing extension DRI2
[  5841.291] (II) AMDGPU(0): Setting screen physical size to 1377 x 285
[  5841.311] (EE) 
[  5841.311] (EE) Backtrace:
[  5841.311] (EE) 0: /usr/lib/Xorg (xorg_backtrace+0x53) [0x55f225222fd3]
[  5841.311] (EE) 1: /usr/lib/Xorg (0x55f2250dc000+0x151df5) [0x55f22522ddf5]
[  5841.311] (EE) 2: /usr/lib/libc.so.6 (0x7f2164329000+0x3cf80) [0x7f2164365f80]
[  5841.311] (EE) 3: /usr/lib/libc.so.6 (gsignal+0x145) [0x7f2164365ef5]
[  5841.311] (EE) 4: /usr/lib/libc.so.6 (abort+0x116) [0x7f216434f862]
[  5841.311] (EE) 5: /usr/lib/libc.so.6 (0x7f2164329000+0x26747) [0x7f216434f747]
[  5841.311] (EE) 6: /usr/lib/libc.so.6 (0x7f2164329000+0x35646) [0x7f216435e646]
[  5841.311] (EE) 7: /usr/lib/xorg/modules/drivers/amdgpu_drv.so (0x7f216485f000+0x8fd3) [0x7f2164867fd3]
[  5841.311] (EE) 8: /usr/lib/xorg/modules/drivers/amdgpu_drv.so (0x7f216485f000+0x933a) [0x7f216486833a]
[  5841.311] (EE) 9: /usr/lib/xorg/modules/drivers/amdgpu_drv.so (0x7f216485f000+0x1534d) [0x7f216487434d]
[  5841.311] (EE) 10: /usr/lib/xorg/modules/drivers/amdgpu_drv.so (0x7f216485f000+0x1746a) [0x7f216487646a]
[  5841.311] (EE) 11: /usr/lib/xorg/modules/drivers/amdgpu_drv.so (0x7f216485f000+0x19127) [0x7f2164878127]
[  5841.311] (EE) 12: /usr/lib/Xorg (MapWindow+0x251) [0x55f22517f871]
[  5841.311] (EE) 13: /usr/lib/Xorg (0x55f2250dc000+0x39619) [0x55f225115619]
[  5841.311] (EE) 14: /usr/lib/libc.so.6 (__libc_start_main+0xd5) [0x7f2164350b25]
[  5841.311] (EE) 15: /usr/lib/Xorg (_start+0x2e) [0x55f2251165de]
[  5841.311] (EE) 
[  5841.311] (EE) 
Fatal server error:
[  5841.311] (EE) Caught signal 6 (Aborted). Server aborting
[  5841.311] (EE) 
[  5841.311] (EE) 
Please consult the The X.Org Foundation support 
         at http://wiki.x.org
 for help. 
[  5841.311] (EE) Please also check the log file at "/home/my_username/.local/share/xorg/Xorg.1.log" for additional information.
[  5841.311] (EE) 
[  5841.327] (EE) Server terminated with error (1). Closing log file.

Link to the Manjaro forum discussing the issue: https://forum.manjaro.org/t/upgrading-libepoxy-from-1-5-5-to-1-5-7-results-in-xorg-crashing-on-boot/66195

@ebassi
Collaborator

ebassi commented May 13, 2021

I guess we're going to need the opinion of @nwnk and @ya-isakov on this.

I can do a revert and publish 1.5.8, but first I'd like some confirmation from other distributions using epoxy 1.5.7, to understand if this is a specific regression or a build issue somewhere else.

@ya-isakov
Contributor

Hmm, in the log posted on the Manjaro forum I can see this strange line:
[ 5840.986] (EE) AMDGPU(0): glGetString() returned NULL, your GL is broken

@ya-isakov
Contributor

glGetString is handled by epoxy_get_bootstrap_proc_address, which in most cases returns the result of epoxy_gl_dlsym. That, in turn, loads the GL library (libOpenGL.so if available, otherwise libGL.so) and tries to dlsym glGetString from it. On error there should be a "glGetString not found" message, which I can't see in the logs... so, honestly, I do not understand what could have gone wrong here.

@Ernest1338, could you please check whether you have libOpenGL.so.0 on your system, and maybe temporarily rename it to something else and try to run Xorg with libepoxy-1.5.7?
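
The lookup order described here can be sketched roughly as follows. This is a Python model, not libepoxy's C code: `resolve_gl_symbol` and its injectable `loader` parameter are hypothetical names, and the sonames `libOpenGL.so.0` and `libGL.so.1` are assumptions about a typical glvnd install.

```python
import ctypes

# Candidate libraries in the order described above: the glvnd
# libOpenGL first, then the legacy libGL. The sonames are assumptions.
CANDIDATES = ("libOpenGL.so.0", "libGL.so.1")


def resolve_gl_symbol(name, candidates=CANDIDATES, loader=ctypes.CDLL):
    """Return (library_name, symbol) from the first library that loads.

    Hypothetical helper: it only mirrors the "try libOpenGL, else libGL,
    then dlsym" order, and raises if the symbol cannot be found.
    """
    for lib_name in candidates:
        try:
            lib = loader(lib_name)
        except OSError:
            continue  # library not installed, try the next candidate
        try:
            return lib_name, getattr(lib, name)
        except AttributeError:
            raise LookupError(f"{name} not found in {lib_name}")
    raise LookupError(f"no GL library could be loaded for {name}")
```

Note that successfully resolving the symbol says nothing about what the function returns when it is later called; a lookup that succeeds is consistent with the log showing no "not found" message.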

@ya-isakov
Contributor

ya-isakov commented May 13, 2021

@Ernest1338 Also, it seems that if Xorg cannot load Glamor it falls back to ShadowFB, which could prevent us from seeing a proper trace. Could you please also try running Xorg with libepoxy-1.5.7, with libOpenGL.so.0 present (if it is on the system) and ShadowFB disabled in the Xorg config? I think the proper way is to put

Option "ShadowFB" "off"

somewhere in the Xorg config. Please post a full log with this option disabled, but first make sure that there is no

amdgpu_glamor_pre_init returned FALSE, using ShadowFB

in the log anymore.
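
A quick way to run that check on a saved log is sketched below. It is only an illustration: the helper names `glamor_fell_back` and `error_lines` are hypothetical, and the code assumes nothing about the log beyond the `(EE)` marker and the fallback message quoted above.

```python
FALLBACK_LINE = "amdgpu_glamor_pre_init returned FALSE, using ShadowFB"


def glamor_fell_back(log_text):
    """Return True if the log shows the driver fell back to ShadowFB."""
    return any(FALLBACK_LINE in line for line in log_text.splitlines())


def error_lines(log_text):
    """Return the non-empty (EE) messages, without the timestamp prefix."""
    errors = []
    for line in log_text.splitlines():
        _, marker, message = line.partition("(EE)")
        if marker and message.strip():
            errors.append(message.strip())
    return errors
```

Both functions take the log as a string, so they can be fed the contents of whichever Xorg log file the server reports at the end of its output.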

@ya-isakov
Contributor

And yes, we need more testing on other distros, as Manjaro is based on Arch Linux, but I cannot find any bug report on the Arch bug tracker.

@Ernest1338
Author

So I checked and libOpenGL.so.0 is present both in /usr/lib32/ and /usr/lib/.
I'm gonna rename it and try running Xorg in a minute.

@ya-isakov
Contributor

@Ernest1338 Actually, I might have a wild idea... Who is the vendor of your libGL.so: libglvnd or AMD? I'm just thinking that maybe libOpenGL.so, which is basically a glvnd wrapper, could not find a proper vendor implementation for AMD, while libGL.so is an AMD-provided library. That would explain why it works with older versions of libepoxy, as older versions always used libGL.so for everything.

@Ernest1338
Author

I renamed libOpenGL.so.0, updated libepoxy and rebooted. Xorg still crashed.
Then I checked and libOpenGL.so.0 was back, so it got recreated somewhere during startup?

I attached the log file but I don't think it changed.
Xorg.0.log

@ya-isakov
Contributor

Yeah, it's still the same. I think my idea about glvnd not being able to find any vendor could be the only cause. Could you please check which packages your libOpenGL.so.0 and libGL.so come from?

@Ernest1338
Author

Sorry, but I've never done this before. Which command should I use to check that?

@ya-isakov
Contributor

Sorry, I have no idea how to do it in Manjaro.

@Ernest1338
Author

Ok, so I used the pkgfile tool and I think that worked.
This is the output:

  • libOpenGL.so.0:
      extra/libglvnd
      multilib/lib32-libglvnd
  • libGL.so:
      extra/libglvnd
      community/cuda-tools
      community/teamspeak3
      multilib/lib32-libglvnd
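
pkgfile searches the repository file database, so it lists every package that ships a file with that name, installed or not. To see which installed package owns one specific path, `pacman -Qo <file>` can be used instead. A minimal sketch wrapping it, assuming pacman is on PATH; `owner_command` and `owning_package` are hypothetical helper names:

```python
import shutil
import subprocess


def owner_command(path):
    """Build the pacman query that reports which package owns `path`."""
    return ["pacman", "-Qo", path]


def owning_package(path):
    """Run the query; return pacman's output, or None if pacman is missing."""
    if shutil.which("pacman") is None:
        return None  # not an Arch-based system
    result = subprocess.run(owner_command(path), capture_output=True, text=True)
    return result.stdout.strip() or result.stderr.strip()
```

For example, `owning_package("/usr/lib/libOpenGL.so.0")` would query the copy in /usr/lib mentioned earlier in the thread.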

@ya-isakov
Contributor

So, in conclusion: from the logs, it seems that Xorg cannot load glamoregl because GL_VENDOR is empty. I thought something had gone wrong with loading the glGetString symbol, but no, the function itself returns NULL. It seems that both libGL.so and libOpenGL.so are provided by libglvnd, and if libGL.so works (and it does, as libepoxy 1.5.5 uses it instead of libOpenGL.so), then I have run out of ideas as to why GL_VENDOR is empty.

@ya-isakov
Contributor

It could be a libglvnd bug, as libepoxy's task is to load libraries and dispatch GL calls to them. As there are no messages in Xorg.log about errors in library loading, it could be some incompatibility between libOpenGL.so and Xorg. Could you please report this bug to the libglvnd repo and see if they can find the cause?

@Ernest1338
Author

Sure, I'll do that then. I'm not sure what title I should use for the bug report, maybe something like: "Possible incompatibility between libOpenGL.so and Xorg causing Xorg to crash on boot", or maybe you can suggest something better?

@ya-isakov
Contributor

ya-isakov commented May 13, 2021

Yeah, that's fine, but please include this line from your log:

[ 5840.986] (EE) AMDGPU(0): glGetString() returned NULL, your GL is broken

This is my main reason to suspect libglvnd: libepoxy loads glGetString from libOpenGL.so (in 1.5.7) and from libGL.so (in 1.5.5). If this function returns NULL, the cause could be in libOpenGL.so. And as you checked, both libraries are provided by libglvnd.

P.S. Oh, and BTW, there is no need to reboot when you have no Xorg server running and have just renamed libOpenGL.so.0. You can run Xorg immediately after renaming the library, just to check whether the problem is in libOpenGL.so.0.

@Ernest1338
Author

As you can see I made an issue as you said.
Tell me if I should edit something.
Also, can you help answer the question posted by aaronp24?

@ya-isakov
Contributor

Yeah, adding info there, thank you :)

@ya-isakov
Contributor

ya-isakov commented May 13, 2021

@nwnk
If libglvnd is returning NULL because there is no current context, could it be caused by this:

  1. Xorg loads libGL.so for the GLX extension
  2. Xorg uses libepoxy for EGL init and creates a context, which is set in the loaded libGL.so
  3. Xorg uses libepoxy for glGetString, which loads libOpenGL.so, essentially losing the context that was set in libGL.so?
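
The sequence in this hypothesis can be pictured with a toy model. Everything in it is hypothetical: `ToyGLLibrary` is an illustrative class, not a real API, and whether the real libraries keep separate current-context state like this is exactly the open question, not something the sketch demonstrates.

```python
class ToyGLLibrary:
    """Toy stand-in for one loaded GL library with its own current context."""

    def __init__(self, name):
        self.name = name
        self.current_context = None

    def make_current(self, context):
        self.current_context = context

    def gl_get_string(self, key):
        # Without a current context the call has nothing to query;
        # None models the NULL return seen in the log.
        if self.current_context is None:
            return None
        return self.current_context.get(key)


libgl = ToyGLLibrary("libGL.so")
libopengl = ToyGLLibrary("libOpenGL.so")

# Step 2: the context is made current through the first library.
libgl.make_current({"GL_VENDOR": "example vendor"})

# Step 3: the query goes through the second library instead.
vendor_via_libgl = libgl.gl_get_string("GL_VENDOR")
vendor_via_libopengl = libopengl.gl_get_string("GL_VENDOR")
```

In this model the query through the second library comes back empty only because the two objects share no state, which is the assumption the hypothesis rests on.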

P.S. I do not really understand how libEGL loads libGL/libEGL for creating context in them...

@philmmanjaro

@ya-isakov
Contributor

@philmmanjaro if it's a context problem, then libglvnd is probably not the cause; the context could be lost in libepoxy

@Ernest1338
Author

I don't know if that helps, but upgrading lib32-libepoxy to 1.5.7 works just fine.

@philmmanjaro

Maybe there was a mismatch at some point. However with today's stable snap we have now the following packages:

libepoxy repository : extra
Stable : 1.5.7-1
Testing : 1.5.7-1
Unstable : 1.5.7-1

lib32-libepoxy repository : multilib
Stable : 1.5.7-1
Testing : 1.5.7-1
Unstable : 1.5.7-1

@Ernest1338
Author

I installed today's update, but the issue still stands. (lib32-libepoxy is at 1.5.7)

@ebassi
Collaborator

ebassi commented May 21, 2021

the context could be lost in libepoxy

Epoxy doesn't have a context: it's entirely up to the caller to have one and make it current.

@ebassi
Collaborator

ebassi commented May 21, 2021

Considering that #253 is happening in Fedora, I'm tempted to just revert the commit, at this point.

ebassi added a commit that referenced this issue May 21, 2021
Commit dbfa4b2 has introduced a string of regressions in the X server
and KWin.

This reverts commit dbfa4b2.

See: #252

@evelikov

Note the offending commit also breaks apitrace/gtk4 - see #240

@ya-isakov
Contributor

I think that yes, it should be reverted. I just do not understand how using a different library (libOpenGL.so vs libGL.so) broke the OpenGL context if libepoxy is not doing anything with it.

@ebassi
Collaborator

ebassi commented May 21, 2021

Okay, let's close this.
