Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hack to fix singleplayer crashing when toggling fullscreen. #1037

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

namtsui
Copy link

@namtsui namtsui commented Apr 23, 2020

glConfig is a global variable visible to tr_{model,init}.cpp and not
visible to cl_main.cpp, which has its own global cls.glconfig. When
RE_BeginRegistration is being called while the video subsystem is
restarting, explicitly memset glConfig to 0. InitOpenGL depends on
glConfig.vidWidth == 0.

In the codemp/ version, Sys_UnloadDll (dlclose) is sufficient to
remove glConfig, which gets reset to 0 later on. In the code/ version,
even after Sys_UnloadDll (dlclose), the old glConfig and vidWidth
remain, so InitOpenGL fails.

@namtsui
Copy link
Author

namtsui commented Apr 23, 2020

Fixes #1036

Copy link
Member

@xycaleth xycaleth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’d rather we don’t introduce more hacks. Why do we need to memset on restart? What’s the actual cause of the crash?

@namtsui
Copy link
Author

namtsui commented Apr 23, 2020

if ( glConfig.vidWidth == 0 )

This line falls through to GL_SetDefaultState() because glConfig.vidWidth is never reset to 0.

This is the message printed upon crashing:
IN_Init called before SDL_Init( SDL_INIT_VIDEO )

It seems like the codebase relies on dlclose(rdsp-vanilla.so) and dlopen to reset glConfig.vidWidth to 0.

Sys_UnloadDll (rendererLib);

Setting a breakpoint here on the equivalent multiplayer file shows that glConfig, correctly, goes away after dlclose().

multiplayer:

(gdb) run
Starting program: /usr/local/share/JediAcademy/openjk 
[New thread 332429]

Thread 1 hit Breakpoint 2, CL_ShutdownRef (restarting=qtrue) at /usr/ports/pobj/openjk-0.0.0.20191129/OpenJK-eed60925ad1b0d513d3747264f3bf98615fa4b2a/codemp/client/cl_main.cpp:2276
2276			Sys_UnloadDll (rendererLib);
(gdb) print glConfig
$1 = {renderer_string = 0xc144e084c7c '\337' <repeats 199 times>, <incomplete sequence \337>..., vendor_string = 0xc14d2a48a7d "X.Org", version_string = 0xc14072b9e80 "3.1 Mesa 19.2.8", extensions_string = 0xc14427ea3f0 "GL_ARB_multisample GL_EXT_abgr GL_EXT_bgra GL_EXT_blend_color GL_EXT_blend_minmax GL_EXT_blend_subtract GL_EXT_copy_texture GL_EXT_subtexture GL_EXT_texture_object GL_EXT_vertex_array GL_EXT_compiled_"..., maxTextureSize = 16384, maxActiveTextures = 8, maxTextureFilterAnisotropy = 16, colorBits = 24, depthBits = 24, stencilBits = 8, deviceSupportsGamma = qtrue, textureCompression = TC_S3TC_DXT, textureEnvAddAvailable = qtrue, clampToEdgeAvailable = qtrue, vidWidth = 1024, vidHeight = 768, displayFrequency = 0, isFullscreen = qtrue, stereoEnabled = qfalse}

next in emacs gdb

(gdb) print glConfig
No symbol "glConfig" in current context.

single player shows glConfig is still around after dlclose():

(gdb) run
Starting program: /usr/local/share/JediAcademy/openjk_sp 
[New thread 305138]

Thread 1 hit Breakpoint 1, CL_ShutdownRef (restarting=qtrue) at /usr/ports/pobj/openjk-0.0.0.20191129/OpenJK-eed60925ad1b0d513d3747264f3bf98615fa4b2a/code/client/cl_main.cpp:904
904			Sys_UnloadDll (rendererLib);
(gdb) print glConfig
$1 = {renderer_string = 0x46e18a8147c '\337' <repeats 199 times>, <incomplete sequence \337>..., vendor_string = 0x46de9b38a7d "X.Org", version_string = 0x46d827ae580 "3.1 Mesa 19.2.8", extensions_string = 0x46d98f45000 "GL_ARB_multisample GL_EXT_abgr GL_EXT_bgra GL_EXT_blend_color GL_EXT_blend_minmax GL_EXT_blend_subtract GL_EXT_copy_texture GL_EXT_subtexture GL_EXT_texture_object GL_EXT_vertex_array GL_EXT_compiled_"..., maxTextureSize = 16384, maxActiveTextures = 8, maxTextureFilterAnisotropy = 16, colorBits = 24, depthBits = 24, stencilBits = 8, deviceSupportsGamma = qtrue, textureCompression = TC_S3TC_DXT, textureEnvAddAvailable = qtrue, textureFilterAnisotropicAvailable = qfalse, clampToEdgeAvailable = qtrue, vidWidth = 1024, vidHeight = 768, displayFrequency = 0, doStencilShadowsInOneDrawcall = qtrue, isFullscreen = qtrue, stereoEnabled = qfalse, isRestarting = qfalse}

next in emacs gdb

(gdb) print glConfig
$2 = {renderer_string = 0x46e18a8147c '\337' <repeats 199 times>, <incomplete sequence \337>..., vendor_string = 0x46de9b38a7d "X.Org", version_string = 0x46d827ae580 "3.1 Mesa 19.2.8", extensions_string = 0x46d98f45000 "GL_ARB_multisample GL_EXT_abgr GL_EXT_bgra GL_EXT_blend_color GL_EXT_blend_minmax GL_EXT_blend_subtract GL_EXT_copy_texture GL_EXT_subtexture GL_EXT_texture_object GL_EXT_vertex_array GL_EXT_compiled_"..., maxTextureSize = 16384, maxActiveTextures = 8, maxTextureFilterAnisotropy = 16, colorBits = 24, depthBits = 24, stencilBits = 8, deviceSupportsGamma = qtrue, textureCompression = TC_S3TC_DXT, textureEnvAddAvailable = qtrue, textureFilterAnisotropicAvailable = qfalse, clampToEdgeAvailable = qtrue, vidWidth = 1024, vidHeight = 768, displayFrequency = 0, doStencilShadowsInOneDrawcall = qtrue, isFullscreen = qtrue, stereoEnabled = qfalse, isRestarting = qfalse}

@namtsui
Copy link
Author

namtsui commented Apr 23, 2020

My guess as to a real fix might be to make the singleplayer codebase more similar to multiplayer codebase. One of the biggest differences I see is singleplayer's cl_main:

singleplayer:

refexport_t re;

multiplayer:

refexport_t *re = NULL;

I tried to refactor to make singleplayer define re as a pointer, but this might be a red herring. It did not fix the problem.

from dlclose(3):

If multiple calls to dlopen() have been done on this object or the object is a dependency of another object then the object is removed when its reference count drops to zero.

I only see a single call to dlopen so maybe some dependency is keeping the reference count > 0? Is is true that the codebase relies on dlclose and dlopen to reset glConfig to 0?

Somewhat related is this suggestion:

If your library is intended to reset its state on close/reopen, aiui you should do so explicitly.

Source: android/ndk#887 (comment)

It is strange that multiplayer fullscreening works but singleplayer does not, since multiplayer does not have my hack.

@namtsui
Copy link
Author

namtsui commented Apr 23, 2020

Useful breakpoint set here for multiplayer after toggling fullscreen. It shows how glConfig is a global that should be initialized to 0 at some point. I can't really pinpoint the exact point, but maybe after the dlclose -> dlopen sequence?

windowDesc_t windowDesc = { GRAPHICS_API_OPENGL };

--- Common Initialization Complete ---
SDL Version Compiled: 2.0.12
SDL Version Linked: 2.0.12
Opening IP socket: localhost:29070
tty]warning: Temporarily disabling breakpoints for unloaded shared library "/usr/local/share/JediAcademy/rd-vanilla.so"
----- Initializing Renderer ----
Trying to load "rd-vanilla.so" from "/usr/local/share/JediAcademy"...
tty][New thread 590553]

Thread 1 hit Breakpoint 1, InitOpenGL ()
    at /usr/ports/pobj/openjk-0.0.0.20191129/OpenJK-eed60925ad1b0d513d3747264f3bf98615fa4b2a/codemp/rd-vanilla/tr_init.cpp:796
796                     windowDesc_t windowDesc = { GRAPHICS_API_OPENGL };
(gdb) print glConfig
$1 = {renderer_string = 0x0, vendor_string = 0x0, version_string = 0x0, 
  extensions_string = 0x0, maxTextureSize = 0, maxActiveTextures = 0, 
  maxTextureFilterAnisotropy = 0, colorBits = 0, depthBits = 0, 
  stencilBits = 0, deviceSupportsGamma = qfalse, textureCompression = TC_NONE, 
  textureEnvAddAvailable = qfalse, clampToEdgeAvailable = qfalse, 
  vidWidth = 0, vidHeight = 0, displayFrequency = 0, isFullscreen = qfalse, 
  stereoEnabled = qfalse}
(gdb) bt
#0  InitOpenGL ()
    at /usr/ports/pobj/openjk-0.0.0.20191129/OpenJK-eed60925ad1b0d513d3747264f3bf98615fa4b2a/codemp/rd-vanilla/tr_init.cpp:796
#1  0x00000fd80c89464f in R_Init ()
    at /usr/ports/pobj/openjk-0.0.0.20191129/OpenJK-eed60925ad1b0d513d3747264f3bf98615fa4b2a/codemp/rd-vanilla/tr_init.cpp:1782
#2  0x00000fd80c8a162b in RE_BeginRegistration (
    glconfigOut=0xfd51b2a0c08 <cls+405592>)
    at /usr/ports/pobj/openjk-0.0.0.20191129/OpenJK-eed60925ad1b0d513d3747264f3bf98615fa4b2a/codemp/rd-vanilla/tr_model.cpp:1630
#3  0x00000fd51a55d2e9 in CL_InitRenderer ()
    at /usr/ports/pobj/openjk-0.0.0.20191129/OpenJK-eed60925ad1b0d513d3747264f3bf98615fa4b2a/codemp/client/cl_main.cpp:2288
#4  0x00000fd51a55817d in CL_StartHunkUsers ()
    at /usr/ports/pobj/openjk-0.0.0.20191129/OpenJK-eed60925ad1b0d513d3747264f3bf98615fa4b2a/codemp/client/cl_main.cpp:2318
#5  0x00000fd51a55953e in CL_Vid_Restart_f ()
    at /usr/ports/pobj/openjk-0.0.0.20191129/OpenJK-eed60925ad1b0d513d3747264f3bf98615fa4b2a/codemp/client/cl_main.cpp:1227
#6  0x00000fd51a473e01 in Cmd_ExecuteString (text=0x7f7fffff6e80 "vid_restart")
    at /usr/ports/pobj/openjk-0.0.0.20191129/OpenJK-eed60925ad1b0d513d3747264f3bf98615fa4b2a/codemp/qcommon/cmd.cpp:846
#7  0x00000fd51a47420c in Cbuf_Execute ()
    at /usr/ports/pobj/openjk-0.0.0.20191129/OpenJK-eed60925ad1b0d513d3747264f3bf98615fa4b2a/codemp/qcommon/cmd.cpp:253
#8  0x00000fd51a47cb0d in Com_Frame ()
    at /usr/ports/pobj/openjk-0.0.0.20191129/OpenJK-eed60925ad1b0d513d3747264f3bf98615fa4b2a/codemp/qcommon/common.cpp:1533
#9  0x00000fd51a5d32b7 in main (argc=1, argv=0x7f7fffff7848)
    at /usr/ports/pobj/openjk-0.0.0.20191129/OpenJK-eed60925ad1b0d513d3747264f3bf98615fa4b2a/shared/sys/sys_main.cpp:812

@xycaleth
Copy link
Member

I did some digging around and found that dlclose doesn't necessarily unload the shared library immediately. In some cases, the library is intentionally kept around. As a result, when the library is loaded up again none of the global variables get initialised.

There is apparently a way of avoiding this by compiling with -fno-gnu-unique - I think this works in gcc as well as clang. Can you give that a try and see if it fixes the issue? I don't have access to a Linux machine to test the change out unfortunately.

@namtsui
Copy link
Author

namtsui commented Apr 27, 2020

OpenBSD port uses clang, but clang didn't compile complaining about not recognizing -fno-gnu-unique. I switched the port to use gcc and got it to compile, but the crash remains with IN_Init called before SDL_Init( SDL_INIT_VIDEO ).

Full build log where cc -> egcc (gcc 8.3.0) https://gist.github.com/namtsui/f60b0d6fe9d334c3b6b30d1541645058

@namtsui
Copy link
Author

namtsui commented Apr 28, 2020

I force pushed a more concise one-liner hack for consideration. It uses the restarting argument. The logic is that in CL_Vid_Restart_f:

  1. CL_ShutdownRef() calls re.Shutdown, which zeroes out glConfig.
  2. CL_ShutdownRef() then calls dlclose(), which doesn't work in this case.
  3. CL_InitRef() calls dlopen()
  4. CL_StartHunkUsers() eventually calls InitOpenGL with a properly zeroed out glConfig regardless of dlclose()'s behavior.

@namtsui
Copy link
Author

namtsui commented Apr 28, 2020

As an aside, I had to comment out printing out the renderer_string because I was getting segfaults where renderer_string contains garbage or memory I could not access. This happens when toggling fullscreen. I'm not sure why nor if this is related to the original issue.

$OpenBSD$

Hack to fix singleplayer crashing when toggling fullscreen.

dlclose doesn't necessarily unload the shared library immediately. In some
cases, the library is intentionally kept around.  As a result, when the library
is loaded up again none of the global variables get initialised.

Index: code/rd-vanilla/tr_init.cpp
--- code/rd-vanilla/tr_init.cpp.orig
+++ code/rd-vanilla/tr_init.cpp
@@ -1253,7 +1253,6 @@ void GfxInfo_f( void )
 	int noborder = ri.Cvar_VariableIntegerValue("r_noborder");
 
 	ri.Printf( PRINT_ALL, "\nGL_VENDOR: %s\n", glConfig.vendor_string );
-	ri.Printf( PRINT_ALL, "GL_RENDERER: %s\n", glConfig.renderer_string );
 	ri.Printf( PRINT_ALL, "GL_VERSION: %s\n", glConfig.version_string );
 	R_PrintLongString(glConfig.extensions_string);
 	Com_Printf ("\n");
@@ -1834,6 +1833,7 @@ void RE_Shutdown( qboolean destroyWindow, qboolean res
 			if ( restarting )
 			{
 				SaveGhoul2InfoArray();
+				memset(&glConfig, 0, sizeof(glConfig));
 			}
 		}
 	}

@namtsui
Copy link
Author

namtsui commented Apr 29, 2020

Here is an update to #1037 (comment) which was slightly inaccurate. glConfig.renderer_string only needed to be commented out when I was initially investigating fullscreen segfaulting. With the one-liner hack, renderer_string properly prints out as GL_RENDERER: AMD BARTS (DRM 2.50.0 / 6.7, LLVM 8.0.1). It is no longer necessary to comment that line out.

@ensiform
Copy link
Member

ensiform commented Apr 29, 2020

I think in both SP and MP we should have these conditions

if destroyWindow is true: and after the window shutdown func pointer

memset glconfig & glconfigext (might be MP exclusive)
memset glstate

Is there ever a time where restarting happens without destroy window? Changing from full to not full and vice versa will require a destroy window afaik.

For reference see: https://github.com/etfdevs/ETe/blob/master/src/renderer/tr_init.c#L2003-L2005

cc @xycaleth

possibly even clear the window global variable that was created by xyc too.

@namtsui
Copy link
Author

namtsui commented Apr 29, 2020

SP and MP fullscreen toggling both work in my testing with the changes that you proposed.

I did get an unrelated segfault that I can't seem to reproduce and was not running gdb for. In multiplayer I created a game on Vjun Sentinel, got the Merr-Sonn PLX-2M behind the lifts near the starting spawn at lifts, and immediately segfaulted after firing it. However, now I can fire without segfaulting. I'll open an issue if I see it again.

Reported here: #1038

Is there ever a time where restarting happens without destroy window? Changing from full to not full and vice versa will require a destroy window afaik.

I observed the following scenarios for destroyWindow and restarting:

  • true and true: vid_restart in console, changing resolutions, toggling fullscreen on/off
  • false and false: starting a new game during the loading screen. loading a savefile during the loading screen.
  • true and false: exiting game by clicking exit

@ensiform
Copy link
Member

So basically it cannot be false, true. Which is ideal and why the new fix should probably be accepted I guess l.

Fixes JACoders#1036

Unbreaks crash when toggling fullscreen in the singleplayer binary on
OpenBSD. openjk (multiplayer binary) uses -lGL and -lGLU but openjk_sp
did not.
@namtsui
Copy link
Author

namtsui commented May 6, 2021

@xycaleth @ensiform This is ready for review with a proper fix.

I have output from both LD_DEBUG=1 openjk_sp and openjk on OpenBSD at this link.
https://gist.github.com/namtsui/3f227e9f9e6e16c804e76efecf0f2a46

I noticed that openjk (multiplayer binary) has a direct dependency on libGL.so and libGLU. This was not the case for openjk_sp, where libGL.so and libGLU.so first gets loaded by rdsp-vanilla.so. This explains (in a handwavy way) the strange behavior for dlopen and dlclose and its interaction with ld.so(1) on OpenBSD.

linking dep /usr/X11R6/lib/libGL.so.17.1 as child of ./openjk
linking dep /usr/X11R6/lib/libGLU.so.9.0 as child of ./openjk
...
linking dep /usr/X11R6/lib/libGL.so.17.1 as child of ./rd-vanilla.so
linking dep /usr/X11R6/lib/libGLU.so.9.0 as child of ./rd-vanilla.so
linking dep /usr/X11R6/lib/libGL.so.17.1 as child of ./rdsp-vanilla.so
linking dep /usr/X11R6/lib/libGLU.so.9.0 as child of ./rdsp-vanilla.so

Adding -lGL and -lGLU to cmake, as done with the multiplayer binary, resolves the issue. Toggling singleplayer fullscreen no longer crashes on OpenBSD.

bob-beck pushed a commit to openbsd/ports that referenced this pull request May 7, 2021
Unbreaks crash when toggling fullscreen in openjk_sp (singleplayer
binary). openjk (multiplayer binary) is linked with -lGL -lGLU, but
openjk_sp lacks these flags.

see: JACoders/OpenJK#1037

ok bcallah@ and tobhe@
epsilon-0 pushed a commit to epsilon-0/openbsd-ports that referenced this pull request May 10, 2021
Unbreaks crash when toggling fullscreen in openjk_sp (singleplayer
binary). openjk (multiplayer binary) is linked with -lGL -lGLU, but
openjk_sp lacks these flags.

see: JACoders/OpenJK#1037

ok bcallah@ and tobhe@
epsilon-0 pushed a commit to epsilon-0/openbsd-ports that referenced this pull request May 10, 2021
Unbreaks crash when toggling fullscreen in openjk_sp (singleplayer
binary). openjk (multiplayer binary) is linked with -lGL -lGLU, but
openjk_sp lacks these flags.

see: JACoders/OpenJK#1037

ok bcallah@ and tobhe@
@Razish Razish requested a review from a team October 11, 2023 01:32
@namtsui
Copy link
Author

namtsui commented Oct 26, 2023

thanks for reviewing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants