Crash during loading with multiple WorkerThreads enabled on Intel(R) Core(TM) i3-2100 CPU @ 3.10GHz #5387

Closed
pirogronian opened this issue Jun 25, 2022 · 19 comments · Fixed by #5390

Comments

@pirogronian

pirogronian commented Jun 25, 2022

Observed behaviour

Crash during loading, always in the same place and manner.

Console log with gdb session (last lines):

Loading [92%]: PostLoad took 0.03ms
Pioneer loading took 6150.41ms
Creating new galaxy generator 'legacy' version 1
Clearing and re-using previous Galaxy object
StarSystemCache: misses: 0, slave hits: 0, master hits: 0
SectorCache: misses: 101, slave hits: 0, master hits: 1
Created shader starfield (address=0x555556ca8e30)
Stars picked from galaxy: 103983
Generating 21017 random stars
/usr/include/c++/12.1.0/bits/stl_vector.h:1123: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](size_type) [with _Tp = vector3<float>; _Alloc = std::allocator<vector3<float> >; reference = vector3<float>&; size_type = long unsigned int]: Assertion '__n < this->size()' failed.

Thread 1 "pioneer" received signal SIGABRT, Aborted.
0x00007ffff6e8e36c in ?? () from /usr/lib/libc.so.6

Gdb backtrace:

(gdb) bt
#0  0x00007ffff6e8e36c in ?? () from /usr/lib/libc.so.6
#1  0x00007ffff6e3e838 in raise () from /usr/lib/libc.so.6
#2  0x00007ffff6e28535 in abort () from /usr/lib/libc.so.6
#3  0x00007ffff72d2002 in std::__glibcxx_assert_fail (file=<optimized out>, line=<optimized out>, function=<optimized out>, condition=<optimized out>)
    at /usr/src/debug/gcc/libstdc++-v3/src/c++11/debug.cc:60
#4  0x00005555555f467e in ?? ()
#5  0x00005555556f6b21 in ?? ()
#6  0x00005555556f6f35 in ?? ()
#7  0x00005555556f7fba in ?? ()
#8  0x000055555572b086 in ?? ()
#9  0x000055555572b259 in ?? ()
#10 0x0000555555600ab9 in ?? ()
#11 0x00005555555bdecc in ?? ()
#12 0x00005555555be069 in ?? ()
#13 0x00005555555a8cb6 in ?? ()
#14 0x00007ffff6e29290 in ?? () from /usr/lib/libc.so.6
#15 0x00007ffff6e2934a in __libc_start_main () from /usr/lib/libc.so.6
#16 0x00005555555a8065 in ?? ()

I know, it's not very helpful, despite my adding "-D CMAKE_BUILD_TYPE=Debug" and "options=(debug !strip)" to the PKGBUILD file. Maybe I should recompile everything again, but I'm afraid it would take a long time again. Well, I'll try it, but meanwhile can anybody already look at it?

Expected behaviour

Normal loading and work, as before about two or three weeks ago.

Steps to reproduce

My pioneer version (and OS):
pioneer-git 20220203.r74.gfeb4169a0-1 (from there: https://aur.archlinux.org/packages/pioneer-git)
OS: Arch Linux, up to date.

EDIT: Impaktor added code formatting to not reference issues 1-16

@impaktor
Member

If you're going to recompile, I'd recommend using our latest master, since it contains bug fixes, and new features (and more relevant bugs to discover :) )

Hmm, looking at the PKGBUILD file, I don't see how it's getting the 2022-02 version; to me it looks like it's getting the latest master? (I could very likely be wrong.) If you still have the git repo that pacman downloaded, you can always do a "git log" in it and see what the latest commit is.

Has Pioneer worked on your machine before?

@pirogronian
Author

pirogronian commented Jun 25, 2022

I'd recommend using our latest master
Hmm, looking at the PKGBUILD file, I don't see how it's getting the 2022-02 version; to me it looks like it's getting the latest master?

Yes, despite the first version string, it updates to the latest master version at every package rebuild. I don't know how either :-)

Has Pioneer worked on your machine before?

Yes, I believe it was version from Jun 8, 2022 or slightly earlier.

@impaktor
Member

it updates to the latest master version at every package rebuild.

Indeed, that makes sense. Would be interesting if you could find out which commit your release is on.

I don't know how either :-)

I believe it's L19 in the PKGBUILD file, which then updates L4, the value of the pkgver variable, with the output from the function of the same name.

@pirogronian
Author

Indeed, that makes sense. Would be interesting if you could find out which commit your release is on.

As I stated, it's probably the latest commit, as I updated my package just two days ago and the latest commit on master is 12 days old. And here is proof; I just extracted the git log from the package source directory:

commit feb4169 (HEAD -> master, origin/master, origin/HEAD)
Author: Pioneer Transifex pioneer-transifex@pioneerspacesim.net
Date: Tue Jun 14 03:01:44 2022 +0200

auto-commit: translation updates

@sturnclaw
Member

I will need a backtrace with symbols present to debug this particular error - I'd recommend uploading your output.txt as well, as some log messages are intentionally not logged to the console due to being too verbose (though I don't think it'll make a significant difference).

@pirogronian
Author

pirogronian commented Jun 26, 2022

I will need a backtrace with symbols present to debug this particular error

Ok, finally I got this:

#0  0x00007ffff6e8e36c in ?? () from /usr/lib/libc.so.6
#1  0x00007ffff6e3e838 in raise () from /usr/lib/libc.so.6
#2  0x00007ffff6e28535 in abort () from /usr/lib/libc.so.6
#3  0x00007ffff72d2002 in std::__glibcxx_assert_fail (file=file@entry=0x555555920340 "/usr/include/c++/12.1.0/bits/stl_vector.h", line=line@entry=1123, 
    function=function@entry=0x555555929ad0 "std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](size_type) [with _Tp = vector3<float>; _Alloc = std::allocator<vector3<float> >; reference = vector3<float>&; size_type = long "..., condition=condition@entry=0x55555592021c "__n < this->size()")
    at /usr/src/debug/gcc/libstdc++-v3/src/c++11/debug.cc:60
#4  0x00005555555f467e in std::vector<vector3<float>, std::allocator<vector3<float> > >::operator[] (this=this@entry=0x7fffffffdfe0, __n=__n@entry=103983)
    at /usr/include/c++/12.1.0/bits/stl_vector.h:1123
#5  0x00005555556f6b21 in Background::Starfield::Fill (this=this@entry=0x555557c831e0, rand=..., systemPath=systemPath@entry=0x7fffffffe190, galaxy=...)
    at /usr/src/debug/pioneer-git/src/Background.cpp:510
#6  0x00005555556f6f35 in Background::Starfield::Starfield (this=0x555557c831e0, renderer=<optimized out>, rand=..., systemPath=0x7fffffffe190, galaxy=...)
    at /usr/src/debug/pioneer-git/src/Background.cpp:204
#7  0x00005555556f7fba in Background::Container::Container (this=this@entry=0x555557c831a0, renderer=renderer@entry=0x555555ca79e0, rand=..., 
    space=space@entry=0x0, galaxy=..., systemPath=systemPath@entry=0x7fffffffe190) at /usr/src/debug/pioneer-git/src/Background.cpp:659
#8  0x000055555572b086 in Intro::RefreshBackground (this=this@entry=0x5555572b25c0, r=r@entry=0x555555ca79e0) at /usr/src/debug/pioneer-git/src/Intro.cpp:99
#9  0x000055555572b259 in Intro::Intro (this=this@entry=0x5555572b25c0, r=0x555555ca79e0, width=<optimized out>, height=height@entry=720)
    at /usr/src/debug/pioneer-git/src/Intro.cpp:38
#10 0x0000555555600ab9 in MainMenu::Start (this=0x555555b5b8e0) at /usr/src/debug/pioneer-git/src/Pi.cpp:653
#11 0x00005555555bdecc in Application::StartLifecycle (this=this@entry=0x555555b5b4a0) at /usr/src/debug/pioneer-git/src/core/Application.cpp:114
#12 0x00005555555be069 in Application::Run (this=0x555555b5b4a0) at /usr/src/debug/pioneer-git/src/core/Application.cpp:178
#13 0x00005555555a8cb6 in main (argc=1, argv=0x7fffffffe6f8) at /usr/src/debug/pioneer-git/src/Pi.h:146

I'd recommend uploading your output.txt as well, as some log messages are intentionally not logged to the console due to being too verbose

Also:

output.txt

During compilation I got a strange "compiler internal error", and it was similar to those I encountered a few weeks ago under similar conditions, so I was afraid that it's my RAM or other hardware that is failing. But after a final, successful recompilation the main issue remained the same. So, it seems to be "stable".

@impaktor
Member

@pirogronian thanks for bt & bug report! I've taken the liberty to edit it as I did with the first, by adding code block, such that the numbers don't reference issue 1-13. Click "edit" on your post and you can see what needed to be added to code-format it. (Adding: ```)

@pirogronian
Author

pirogronian commented Jun 26, 2022

I ran memtest and found several RAM errors at addresses above 6113M. So I disabled that range with mem=6113M and ran the compilation again. Let's check it again. I hope my PC will not overheat... 😄
Edit: Done. The same. However, I'm not sure whether the bad address range was properly ruled out.

Click "edit" on your post and you can see what needed to be added to code-format it. (Adding: ```)

Thanks a lot! I could not figure out how to do this... The Code button inserts only a single `

@sturnclaw
Member

Hmm... as far as I can tell, I want to consider this as an issue with memory corruption - the program fails immediately after filling the array with "real stars" when it attempts to write to the remaining space pre-reserved in the array. I've confirmed on my end that the same configuration with respect to exact number of stars being generated and number of worker threads has no issues at all, so I think this may be a problem with your memory setup.

The code writes to the exact same array across multiple worker threads, so it makes very little sense why it would work properly during the parallel phase, but fail immediately upon returning to the single-threaded phase and indexing through the stack-allocated array handle.

@pirogronian
Author

pirogronian commented Jun 28, 2022

Hmm... as far as I can tell, I want to consider this as an issue with memory corruption - the program fails immediately after filling the array with "real stars" when it attempts to write to the remaining space pre-reserved in the array. I've confirmed on my end that the same configuration with respect to exact number of stars being generated and number of worker threads has no issues at all, so I think this may be a problem with your memory setup.

The code writes to the exact same array across multiple worker threads, so it makes very little sense why it would work properly during the parallel phase, but fail immediately upon returning to the single-threaded phase and indexing through the stack-allocated array handle.

I'm also worried that this is about my RAM, but I can't figure out why the issue is so predictable and persistent between runs and even builds. RAM corruption should lead to more random issues, as memory management is dynamic. I have now replaced the newer, corrupted chip with the previous one, which seems to be OK, and I'm recompiling pioneer again (along with a new kernel with memtest support). For now, the issue persists, so it's almost certainly not a dynamic runtime problem.

Edit: I just thought of another option: remembering the fairly recent problems with gcc's "internal compiler error" messages, I checked and found that the gcc package was updated more than a month ago. Maybe something is broken with its installation or in other devel packages. I'm going to reinstall it now and recompile pioneer once again...

Update: Reinstalled group base-devel, gcc-libs and glibc. System rebooted. Issue persisted. Recompiling pioneer once again...
Update2: Issue persisted.

I decided to use another PC with Arch Linux for reference. But it will take some time...
Update: done. The system is not fully up to date and I have to use software rendering (the hardware is too old). However, the game runs without the issue. Probably something is broken in my system. Now I have to guess what exactly... 😢

Update: I forcefully reinstalled my whole system. Issue persisted. Then I recompiled pioneer from fresh sources. Issue persisted.
Update: I used the package built on this older PC to install pioneer on my PC, and the issue persisted. So, after all, I confirmed this is not a compilation-related problem.

Update: Finally... Yesterday I started to suspect it could be something with the CPU, specifically with multithreading. So I edited config.ini, changing WorkerThreads=0 to WorkerThreads=1. And then... the game started without the issue. So now the question is: is it a problem with my CPU, or just a hard-to-find bug that only my system revealed?
Anyway, I attach the content of my /proc/cpuinfo:
cpuinfo.txt

Just for sure, I tested also WorkerThreads=2. The issue emerged again, of course...
Anyway, thanks to developers for this config option.

@pirogronian pirogronian changed the title Crash during loading, Arch Linux, pioneer-git 20220203.r74.gfeb4169a0-1 Crash during loading with multiple WorkerThreads enabled, Arch Linux, pioneer-git 20220203.r74.gfeb4169a0-1 Jun 30, 2022
@pirogronian pirogronian changed the title Crash during loading with multiple WorkerThreads enabled, Arch Linux, pioneer-git 20220203.r74.gfeb4169a0-1 Crash during loading with multiple WorkerThreads enabled on Intel(R) Core(TM) i3-2100 CPU @ 3.10GHz Jun 30, 2022
@manolollr

Hi! I'm the maintainer of the PKGBUILD file. I'm glad to see it is used by someone. I'm doing it as a hobby and to learn the Arch Linux packaging system, but I try to make it the best quality I can.

I'm having the same issue. I have an AMD Ryzen 5 2600 (12) @ 3.400GHz CPU and 8 GiB RAM. My memory is not damaged.

Looking at the error I can see the line: /usr/include/c++/12.1.0/bits/stl_vector.h ......
Using "pacman -Qo /usr/include/c++/12.1.0/bits/stl_vector.h" I can see this file belongs to package gcc.

The version of GCC being used is 12. I've tried to compile using version 11, but I still have the same issue.

I haven't experienced this issue before. It's the first time. As @pirogronian indicates, using WorkerThreads=1 it works. It seems to be an issue related to multithreading. Perhaps a compile option is needed to indicate the CPU type?

@manolollr

I've compiled commit 291a495, just before commit "Web-eWorks/multithread-improvements", and I can load the game without problem, but I can't save the game. When I save the game, this error appears:

/usr/include/c++/12.1.0/bits/stl_vector.h:1142: std::vector<_Tp, _Alloc>::const_reference std::vector<_Tp, _Alloc>::operator[](size_type) const [with _Tp = unsigned int; _Alloc = std::allocator<unsigned int>; const_reference = const unsigned int&; size_type = long unsigned int]: Assertion '__n < this->size()' failed.

And when compiling, these warnings appear:
[ 55%] Building CXX object CMakeFiles/pioneer-lib.dir/src/pigui/PiGuiView.cpp.o
In file included from /home/archilolo/builds/pioneer-git/src/pioneer-git/src/lua/LuaMetaType.h:10,
from /home/archilolo/builds/pioneer-git/src/pioneer-git/src/pigui/LuaRadar.cpp:7:
/home/archilolo/builds/pioneer-git/src/pioneer-git/src/lua/LuaTable.h:189:37: warning: ‘template<class _Category, class _Tp, class _Distance, class _Pointer, class _Reference> struct std::iterator’ is deprecated [-Wdeprecated-declarations]
189 | class VecIter : public std::iterator<std::input_iterator_tag, Value> {
| ^~~~~~~~
In file included from /usr/include/c++/12.1.0/bits/stl_algobase.h:65,
from /usr/include/c++/12.1.0/list:60,
from /usr/include/sigc++-2.0/sigc++/signal.h:7,
from /usr/include/sigc++-2.0/sigc++/sigc++.h:123,
from /home/archilolo/builds/pioneer-git/src/pioneer-git/src/DeleteEmitter.h:12,
from /home/archilolo/builds/pioneer-git/src/pioneer-git/src/lua/LuaObject.h:7,
from /home/archilolo/builds/pioneer-git/src/pioneer-git/src/pigui/LuaPiGui.h:7,
from /home/archilolo/builds/pioneer-git/src/pioneer-git/src/pigui/LuaRadar.cpp:4:
/usr/include/c++/12.1.0/bits/stl_iterator_base_types.h:127:34: note: declared here
127 | struct _GLIBCXX17_DEPRECATED iterator

It seems Arch Linux uses very new versions of programs and libraries, and perhaps pioneer is using an iterator that is now considered deprecated.

@sturnclaw
Member

I'm having the same issue. I have an AMD Ryzen 5 2600 (12) @ 3.400GHz CPU and 8 GiB RAM. My memory is not damaged.

Good to know! I wasn't expecting it to be actually related to corrupted memory, but unfortunately system errors of that sort mean there's very little actionable information that we can use from the bug report as nothing about the system can be considered in a consistent, stable state.

I haven't experienced this issue before. It's the first time. As @pirogronian indicates, using WorkerThreads=1 it works. It seems to be an issue related to multithreading.

Interestingly, this doesn't actually disable multithreading at all; it just reduces the number of threads involved to a 'core' thread and a 'worker' thread, which both still work to fill the vector in question. It's not related to 'cpu type', as a Core i3-2100 is a very different CPU from a Ryzen 5 2600.

...and looking at the code I just realized what the actual problem is. Try adding:

stars.pos.resize(NUM_BG_STARS);
stars.color.resize(NUM_BG_STARS);
stars.brightness.resize(NUM_BG_STARS);

to src/Background.cpp on line 489 (right after Output("Stars picked from galaxy [...]) and recompiling. If my guess is correct, this is just a subtle refactoring error that wasn't triggering asserts in my build configuration for some reason.
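
For readers following along, here is a minimal sketch of why the resize() calls matter. It assumes the arrays held only the stars picked from the galaxy when the second phase began, which matches the backtrace above (operator[] was called with __n=103983, the same number as "Stars picked from galaxy"). The function name fillRemaining and the placeholder values are hypothetical and are not Pioneer's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: `brightness` already holds the "picked" stars, and the
// remaining slots up to `total` are filled with generated values.
void fillRemaining(std::vector<float> &brightness, std::size_t total)
{
    const std::size_t picked = brightness.size();
    assert(picked <= total);

    // Without this call, brightness[picked] is already one past the end.
    // reserve() would not help either: it grows capacity(), not size().
    brightness.resize(total);

    for (std::size_t i = picked; i < total; ++i)
        brightness[i] = 0.5f; // placeholder for a randomly generated star
}
```

Calling fillRemaining on a vector of 3 elements with total = 5 leaves the first 3 values untouched and makes indices 3 and 4 valid to write. Indexing past size() is undefined behaviour in any build; it only becomes a visible abort when the library's assertions are enabled.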

/home/archilolo/builds/pioneer-git/src/pioneer-git/src/lua/LuaTable.h:189:37: warning: ‘template<class _Category, class _Tp, class _Distance, class _Pointer, class _Reference> struct std::iterator’ is deprecated [-Wdeprecated-declarations]

This is a completely unrelated warning (the code it's warning about is not even in scope when the error is triggered) and is not relevant to the problem at hand. Inheriting from std::iterator will remain deprecated-but-allowed for a very long time.
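
As a side note on that warning, the usual way to silence it is to declare the five member type aliases directly instead of inheriting from std::iterator. The sketch below is an illustration only: CountingIter and sumRange are hypothetical names and have nothing to do with Pioneer's VecIter.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>

// Hypothetical input iterator that counts upward from a start value.
class CountingIter {
public:
    // These five aliases are what inheriting from std::iterator used to supply.
    using iterator_category = std::input_iterator_tag;
    using value_type = int;
    using difference_type = std::ptrdiff_t;
    using pointer = const int *;
    using reference = const int &;

    explicit CountingIter(int start) : m_value(start) {}

    reference operator*() const { return m_value; }
    pointer operator->() const { return &m_value; }
    CountingIter &operator++() { ++m_value; return *this; }
    CountingIter operator++(int) { CountingIter tmp = *this; ++m_value; return tmp; }
    bool operator==(const CountingIter &o) const { return m_value == o.m_value; }
    bool operator!=(const CountingIter &o) const { return m_value != o.m_value; }

private:
    int m_value;
};

// Sums the half-open range [first, last) through the iterator.
int sumRange(int first, int last)
{
    assert(first <= last);
    return std::accumulate(CountingIter(first), CountingIter(last), 0);
}
```

Standard algorithms only look at the member aliases, so the iterator behaves the same as one that inherits from std::iterator, without the deprecation warning.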

@manolollr

I've compiled using Debian 11 (Bullseye) and it works without problem, on the same computer. This version of Debian uses GCC 10. Perhaps the issue is with GCC 11 and GCC 12.

@sturnclaw
Member

The code in question (before or after the modification above) compiles and runs fine on GCC 12 for me (Solus Linux). It's much more likely to be a build-flags issue than a compiler-specific issue.

@manolollr

manolollr commented Jul 2, 2022

I've compiled manually in Arch Linux, with GCC 12 and it works without problem.

So the game only fails when it is packaged, perhaps there is a build flag causing the error. These are the build flags I have configured for packaging:

#########################################################################
# ARCHITECTURE, COMPILE FLAGS
#########################################################################
#
CARCH="x86_64"
CHOST="x86_64-pc-linux-gnu"

#-- Compiler and Linker Flags
#CPPFLAGS=""
CFLAGS="-march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions \
        -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security \
        -fstack-clash-protection -fcf-protection"
CXXFLAGS="$CFLAGS -Wp,-D_GLIBCXX_ASSERTIONS"
LDFLAGS="-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now"
LTOFLAGS="-flto=auto"
#RUSTFLAGS="-C opt-level=2"
#-- Make Flags: change this for DistCC/SMP systems
MAKEFLAGS="-j12"
#-- Debugging flags
DEBUG_CFLAGS="-g"
DEBUG_CXXFLAGS="$DEBUG_CFLAGS"
#DEBUG_RUSTFLAGS="-C debuginfo=2"

@Web-eWorks I haven't tested your patch yet, I will try tomorrow. Perhaps it will fix the problem. Thank you!

@manolollr

Finally, I have tested your fix (don't leave for tomorrow what you can do today).

It works perfectly!!! Now I can have the game packaged again.

Thank you

@manolollr

Sorry, I celebrated too soon. The game loads and this issue is solved, but there are still some random crashes related to the same file: /usr/include/c++/12.1.0/bits/stl_vector.h. What should I do? Open a new issue, or will you reopen this one?

The manually compiled game works without problem, but the game packaged with the Arch Linux packager crashes in these situations:

  • Can't start a new game, at Mars, New Hope or Barnard.
  • I can load a game previously saved with the manually compiled version, but the game crashes when the ship takes off, regardless of whether it is on Mars or another planet or moon.

It seems there is a strict memory protection build flag when you package the game in Arch Linux. The flags used are mentioned in this comment. Which flag can be causing the crash? -fstack-clash-protection? -D_GLIBCXX_ASSERTIONS? These are default flags, meaning they are build flags that Arch Linux developers consider adequate.

Well, for the moment we can compile the game manually following COMPILING.txt guide for Linux.

@sturnclaw
Member

-D_GLIBCXX_ASSERTIONS is the flag in question, which enables somewhat heavy-handed assertions and validations above and beyond what normal debug builds enable. This has a non-trivial performance impact especially for realtime applications like games, though it's good that you've brought this to our attention because that's not been part of our debug testing suite.

If I have some time, I'll look into those other crashes you've mentioned; for now I'd recommend manually compiling Pioneer (without said flag) or temporarily/permanently removing the flag given that it does have a runtime performance cost.
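
To illustrate what the flag changes, here is a small sketch (not Pioneer code; the names kLibstdcxxAssertions, tryRead and demo are made up for this example). It detects at compile time whether the macro was defined, and shows a checked read through std::vector::at(), which reports an out-of-range index the same way whether or not the flag is set.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// True when this translation unit was compiled with -D_GLIBCXX_ASSERTIONS.
#ifdef _GLIBCXX_ASSERTIONS
constexpr bool kLibstdcxxAssertions = true;
#else
constexpr bool kLibstdcxxAssertions = false;
#endif

// Checked read: at() throws std::out_of_range for a bad index in every build,
// whereas operator[] only aborts when the library assertions are enabled.
bool tryRead(const std::vector<unsigned int> &v, std::size_t i, unsigned int &out)
{
    try {
        out = v.at(i);
        return true;
    } catch (const std::out_of_range &) {
        return false;
    }
}

// Small usage example covering both outcomes.
void demo()
{
    std::vector<unsigned int> v{1, 2, 3};
    unsigned int out = 0;
    assert(tryRead(v, 1, out) && out == 2);
    assert(!tryRead(v, 3, out)); // index 3 is one past the end
}
```

A packaged build that aborts in stl_vector.h while a manual build runs fine is consistent with the same out-of-range access happening in both, with only the packaged build checking for it.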

yump added a commit to yump/pioneer that referenced this issue Mar 21, 2023
Flatpak and Fedora packages are built with this flag, and it was
causing a crash on save. Also encountered by manolollr in pioneerspacesim#5387.
yump added a commit to yump/pioneer that referenced this issue Mar 21, 2023
Flatpak and Fedora packages are built with this flag, and it was causing
a crash on save. Fixes pioneerspacesim#5570. Also encountered by manolollr in pioneerspacesim#5387.
sturnclaw pushed a commit that referenced this issue Mar 23, 2023
Flatpak and Fedora packages are built with this flag, and it was causing
a crash on save. Fixes #5570. Also encountered by manolollr in #5387.
impaktor pushed a commit to impaktor/pioneer that referenced this issue Mar 24, 2023
Flatpak and Fedora packages are built with this flag, and it was causing
a crash on save. Fixes pioneerspacesim#5570. Also encountered by manolollr in pioneerspacesim#5387.
4 participants