How to analyze a Linux "core" file (i.e. crash dump) #199

Closed
Fish-Git opened this issue Apr 14, 2019 · 5 comments
Labels
Discussion: Developers are invited to discuss a design change or solution to a coding problem.
L: Linux only issue, such as with tuntap networking that doesn't occur on Windows.

Comments

@Fish-Git
Member

Fish-Git commented Apr 14, 2019

How to get a core dump

    ulimit -a             (lists all limits)
    ulimit -c             (core dump size)
    ulimit -c unlimited   (enable core dumps)

ulimit is a shell builtin, so it only affects the current shell and the processes started from that shell. To make the limit permanent for a user, edit the file /etc/security/limits.conf and then log in again (the file is applied by pam_limits at session start, so a reboot also works but should not be required). The examples in the limits.conf manpage are fairly good. You just need to add something like:

    # gedit /etc/security/limits.conf

        . . .

        myusrid - core unlimited
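
A quick way to confirm that the setting actually took effect is to ask the shell what it currently allows. The following is only a minimal sketch: the function name core_limit_status is made up for illustration, and it assumes a shell whose ulimit builtin accepts -c, -S and -H (bash and dash both do).

```shell
# Report the core-file size limit of the current shell.
# core_limit_status is a hypothetical helper name, not part of any tool.
core_limit_status() {
    limit=$(ulimit -c)
    case "$limit" in
        0)         echo "disabled" ;;
        unlimited) echo "unlimited" ;;
        *)         echo "limited to $limit blocks" ;;
    esac
}

# Try to raise the soft limit as far as the hard limit allows.
# If the hard limit is 0, core dumps stay disabled for this shell.
ulimit -S -c "$(ulimit -H -c)" 2>/dev/null || true
core_limit_status
```

If this still reports "disabled" after logging in again, the hard limit is probably being set to 0 somewhere else, and the limits.conf entry is the place to look first.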

 

Where are "core" files written?

If you run Hercules as root, they appear to be placed in the current directory as a hidden file (because the owner is root):

  $ ls -al ~/hercules/hercules-0/core*

  -rw-------. 1 root root 104226816 Apr  6 01:24 /home/fish/hercules/hercules-0/core.56278

If you run Hercules as a regular user I'm guessing they'll be placed in the current directory as a regular file.
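
Rather than guessing, you can ask the kernel where it will put the file. On Linux the location is controlled by /proc/sys/kernel/core_pattern (see the core(5) manpage): a value starting with | is piped to a helper program, a value starting with / is an absolute path, and anything else is created relative to the crashing process's current directory. A small sketch of that check follows; describe_core_pattern is a hypothetical helper name.

```shell
# Classify a core_pattern value according to the rules in core(5).
# describe_core_pattern is a hypothetical helper name.
describe_core_pattern() {
    case "$1" in
        '|'*) echo "pipe" ;;          # handed to a helper program
        /*)   echo "absolute" ;;      # written to a fixed path
        *)    echo "cwd-relative" ;;  # written in the process's current directory
    esac
}

# Show what this system is configured to do, if the file is readable.
if [ -r /proc/sys/kernel/core_pattern ]; then
    describe_core_pattern "$(cat /proc/sys/kernel/core_pattern)"
fi
```

A result of "pipe" usually means a crash handler is collecting the dumps, so nothing will appear in the current directory.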

 

Obtaining a backtrace from a "core" file

You can open a core file with gdb like this:

    $ gdb  /path/to/executable  /path/to/core/file

which should then display exactly where the crash occurred and in which thread.

For the Hercules case, since it uses libtool, the path to the executable is typically something like .libs/lt-hercules.

To see what each thread was doing when the crash occurred (including the one that crashed), use the command:

    (gdb) thread apply all bt

which should then display a full backtrace for each thread, identifying the source file and line number of each of the thread's function calls.
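
If all you want is the backtraces (for example to attach them to a bug report), the same command can be run non-interactively with gdb's -batch and -ex options. This is a sketch under assumptions: core_backtrace and the default output name backtrace.txt are invented for illustration.

```shell
# Dump a backtrace of every thread from a core file into a text file.
# Usage: core_backtrace EXECUTABLE COREFILE [OUTPUT]
# core_backtrace and backtrace.txt are hypothetical names.
core_backtrace() {
    exe=$1
    core=$2
    out=${3:-backtrace.txt}
    [ -r "$exe" ]  || { echo "cannot read executable: $exe" >&2; return 1; }
    [ -r "$core" ] || { echo "cannot read core file: $core" >&2; return 1; }
    command -v gdb >/dev/null 2>&1 || { echo "gdb not found" >&2; return 127; }
    # -batch exits when the commands are done; -ex runs one gdb command.
    gdb -batch -ex "thread apply all bt" "$exe" "$core" > "$out" 2>&1
}

# Example (Hercules built with libtool):
#   core_backtrace .libs/lt-hercules ./core.56278
```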

If you want to save the output of your gdb session to a log file, issue the following command as your very first gdb command (e.g. before your backtrace command):

    (gdb) set logging on
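
The logging setup can also be kept in a small gdb command file so that you don't have to remember to type it first. The sketch below writes such a file; the names write_bt_script, bt.gdb and gdb-session.log are made up for illustration. As far as I know, newer gdb releases prefer the spelling "set logging enabled on" and keep "set logging on" only as a deprecated alias, so adjust the second line if your gdb complains.

```shell
# Write a gdb command file that logs the session and backtraces all threads.
# write_bt_script, bt.gdb and gdb-session.log are hypothetical names.
write_bt_script() {
    script=${1:-bt.gdb}
    cat > "$script" <<'EOF'
set logging file gdb-session.log
set logging on
thread apply all bt
EOF
}

write_bt_script bt.gdb
# Then, with a real executable and core file:
#   gdb -batch -x bt.gdb /path/to/executable /path/to/core/file
```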

For more information regarding gdb's logging capabilities, please see the "Logging Output" section of the gdb manual.

 

Obtaining a backtrace by running Hercules directly under gdb itself

If the crash is reproducible, you can start Hercules directly from the gdb debugger as follows:

  $ gdb .libs/lt-hercules                        (program to be debugged)
  (gdb) run -f myhercconfig -o myherclogfile     (command line arguments)

  <segfault happens here>

  (gdb) backtrace

  <offending code is shown here>

To avoid signal noise (e.g. if gdb breaks on a SIGUSR2 event):

    Thread 10 "LCS_PortThread" received signal SIGUSR2, User defined signal 2.
    [Switching to Thread 0x7fffd77ca700 (LWP 28243)]
    0x00007ffff5be3f2c in close () from /lib64/libpthread.so.0

It appears you can do either:

  1. Press c to continue whenever the SIGUSR2 break occurs.
  2. Enter the gdb command handle SIGUSR2 noprint nostop when gdb is first started.
  3. Both.

Ref: "Avoiding gdb signal noise."
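
Option 2 can be baked into the command line so that it is never forgotten. The following is a rough sketch, assuming gdb's -ex and --args options; the function name gdb_quiet_usr2 is invented for illustration.

```shell
# Start a program under gdb with SIGUSR2 silenced, then run it right away.
# Usage: gdb_quiet_usr2 PROGRAM [ARGS...]
# gdb_quiet_usr2 is a hypothetical helper name.
gdb_quiet_usr2() {
    prog=$1
    [ -x "$prog" ] || { echo "not an executable: $prog" >&2; return 1; }
    command -v gdb >/dev/null 2>&1 || { echo "gdb not found" >&2; return 127; }
    shift
    # Everything after --args is the program and its own arguments.
    gdb -ex "handle SIGUSR2 noprint nostop" -ex run --args "$prog" "$@"
}

# Example:
#   gdb_quiet_usr2 .libs/lt-hercules -f myhercconfig -o myherclogfile
```

The signal is still delivered to Hercules; gdb just stops interrupting the session to announce it.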

@Fish-Git Fish-Git added HELP! Help is needed from someone more experienced or I'm simply overloaded with too much work right now! Discussion Developers are invited to discuss a design change or solution to a coding problem. L Linux only issue, such as with tuntap networking that doesn't occur on Windows. labels Apr 14, 2019
@mhoes

mhoes commented Apr 17, 2019

[fish@centos-64 ~]$ gdb -c core.56320

I may be mistaken, but it appears from the above snippet that you are only telling gdb where the core file is, but not where the executable is that produced the coredump.

What you need to do is this:

gdb .libs/lt-hercules ./core.dump

Which in my example '$test crash' produces this: (which seems to be what you are looking for)

$ gdb .libs/lt-hercules ./core.dump
GNU gdb (GDB) Fedora 8.2-6.fc29
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from .libs/lt-hercules...done.

warning: core file may not match specified executable file.
[New LWP 62520]
[New LWP 62539]
[New LWP 62541]
[New LWP 62540]
[New LWP 62538]

warning: Loadable section ".note.gnu.property" outside of ELF segments
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

warning: Loadable section ".note.gnu.property" outside of ELF segments

warning: Loadable section ".note.gnu.property" outside of ELF segments

warning: Loadable section ".note.gnu.property" outside of ELF segments
Core was generated by `/home/maarten/src/sdl-hercules-390-topdir/in-tree-build/hyperion/.libs/lt-hercu'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  test_cmd (argc=<optimized out>, argv=<optimized out>, cmdline=<optimized out>) at hsccmd.c:141
141             if      (CMD( argv[1], CRASH,   5 )) CRASH();
[Current thread is 1 (Thread 0x7f193626c740 (LWP 62520))]
Missing separate debuginfos, use: dnf debuginfo-install zlib-1.2.11-14.fc29.x86_64
(gdb) thread apply all bt

Thread 5 (Thread 0x7f193616a700 (LWP 62538)):
#0  __libc_read (nbytes=1042417, buf=0x7f193616c81f, fd=3) at ../sysdeps/unix/sysv/linux/read.c:26
#1  __libc_read (fd=3, buf=0x7f193616c81f, nbytes=1042417) at ../sysdeps/unix/sysv/linux/read.c:24
#2  0x00007f193668cc66 in logger_thread (arg=<optimized out>) at logger.c:365
#3  0x00007f193668a8a5 in hthread_func (arg2=0x20653d0) at hthreads.c:798
#4  0x00007f193643f58e in start_thread (arg=<optimized out>) at pthread_create.c:486
#5  0x00007f193636c683 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 4 (Thread 0x7f1935d21700 (LWP 62540)):
#0  0x00007f19363387f8 in __GI___nanosleep (requested_time=requested_time@entry=0x7f1935d20e10, remaining=remaining@entry=0x0) at ../sysdeps/unix/sysv/linux/nanosleep.c:28
#1  0x00007f1936364568 in usleep (useconds=<optimized out>) at ../sysdeps/posix/usleep.c:32
#2  0x00007f1936ad2e74 in timer_thread (argp=<optimized out>) at timer.c:277
#3  0x00007f193668a8a5 in hthread_func (arg2=0x207a870) at hthreads.c:798
#4  0x00007f193643f58e in start_thread (arg=<optimized out>) at pthread_create.c:486
#5  0x00007f193636c683 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 3 (Thread 0x7f1935ffd700 (LWP 62541)):
#0  0x00007f1936363df1 in __pselect (nfds=nfds@entry=11, readfds=readfds@entry=0x7f1935ffcdf0, writefds=writefds@entry=0x0, exceptfds=exceptfds@entry=0x0, timeout=<optimized out>, timeout@entry=0x7f193600d1b0 <tv_100ms>, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/pselect.c:69
#1  0x00007f193600494f in console_connection_handler (arg=<optimized out>) at console.c:3428
#2  0x00007f193668a8a5 in hthread_func (arg2=0x2094c90) at hthreads.c:798
#3  0x00007f193643f58e in start_thread (arg=<optimized out>) at pthread_create.c:486
#4  0x00007f193636c683 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7f1935e22700 (LWP 62539)):
#0  futex_wait_cancelable (private=0, expected=0, futex_word=0x7f19280029c0) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x2060268, cond=0x7f1928002998) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=cond@entry=0x7f1928002998, mutex=mutex@entry=0x2060268) at pthread_cond_wait.c:655
#3  0x00007f193668bc76 in hthread_wait_condition (plc=plc@entry=0x7f1928002998, plk=plk@entry=0x7f193666aef0 <sysblk+3440>, location=location@entry=0x7f1936b257f3 "cpu.c:1513") at hthreads.c:684
#4  0x00007f19368a489e in CPU_Wait (regs=regs@entry=0x7f1928002000) at cpu.c:1513
#5  0x00007f19368ad198 in z900_process_interrupt (regs=regs@entry=0x7f1928002000) at cpu.c:1639
#6  0x00007f19368af260 in z900_run_cpu (cpu=<optimized out>, oldregs=<optimized out>) at cpu.c:1833
#7  0x00007f19368a4232 in cpu_thread (ptr=<optimized out>) at cpu.c:1307
#8  0x00007f193668a8a5 in hthread_func (arg2=0x207a870) at hthreads.c:798
#9  0x00007f193643f58e in start_thread (arg=<optimized out>) at pthread_create.c:486
#10 0x00007f193636c683 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7f193626c740 (LWP 62520)):
#0  test_cmd (argc=<optimized out>, argv=<optimized out>, cmdline=<optimized out>) at hsccmd.c:141
#1  0x00007f193685298e in CallHercCmd (argc=2, argv=argv@entry=0x7ffc89d87740, cmdline=cmdline@entry=0x20853b0 "$test crash") at cmdtab.c:361
#2  0x00007f1936852a1f in DoCallHercCmdLine (pszCmdLine=pszCmdLine@entry=0x7ffc89d89780 "$test", internal=internal@entry=0 '\000') at cmdtab.c:416
#3  0x00007f1936852ba7 in HercCmdLine (pszCmdLine=pszCmdLine@entry=0x7ffc89d89780 "$test") at cmdtab.c:451
#4  0x00007f1936852c7f in the_real_panel_command (cmdline=<optimized out>) at cmdtab.c:821
#5  0x00007f1936a90976 in do_panel_command (cmd=cmd@entry=0x7f1936e82980 <cmdline>) at panel.c:367
#6  0x00007f1936a995ae in the_real_panel_display () at panel.c:2614
#7  0x00007f1936a762f8 in impl (argc=<optimized out>, argc@entry=1, argv=argv@entry=0x7ffc89d92bd8) at impl.c:1131
#8  0x0000000000401187 in main (ac=1, av=0x7ffc89d92bd8) at bootstrap.c:133
(gdb)

@Fish-Git
Member Author

I may be mistaken, but it appears from the above snippet that you are only telling gdb where the core file is, but not where the executable is that produced the coredump.

What you need to do is this:

gdb .libs/lt-hercules ./core.dump

Which in my example '$test crash' produces this: (which seems to be what you are looking for)

$ gdb .libs/lt-hercules ./core.dump
GNU gdb (GDB) Fedora 8.2-6.fc29
Copyright (C) 2018 Free Software Foundation, Inc.

[...]

Core was generated by `/home/maarten/src/sdl-hercules-390-topdir/in-tree-build/hyperion/.libs/lt-hercu'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  test_cmd (argc=<optimized out>, argv=<optimized out>, cmdline=<optimized out>) at hsccmd.c:141
141             if      (CMD( argv[1], CRASH,   5 )) CRASH();
[Current thread is 1 (Thread 0x7f193626c740 (LWP 62520))]
Missing separate debuginfos, use: dnf debuginfo-install zlib-1.2.11-14.fc29.x86_64
(gdb) thread apply all bt

[...]

Thread 1 (Thread 0x7f193626c740 (LWP 62520)):
#0  test_cmd (argc=<optimized out>, argv=<optimized out>, cmdline=<optimized out>) at hsccmd.c:141
#1  0x00007f193685298e in CallHercCmd (argc=2, argv=argv@entry=0x7ffc89d87740, cmdline=cmdline@entry=0x20853b0 "$test crash") at cmdtab.c:361
#2  0x00007f1936852a1f in DoCallHercCmdLine (pszCmdLine=pszCmdLine@entry=0x7ffc89d89780 "$test", internal=internal@entry=0 '\000') at cmdtab.c:416
#3  0x00007f1936852ba7 in HercCmdLine (pszCmdLine=pszCmdLine@entry=0x7ffc89d89780 "$test") at cmdtab.c:451
#4  0x00007f1936852c7f in the_real_panel_command (cmdline=<optimized out>) at cmdtab.c:821
#5  0x00007f1936a90976 in do_panel_command (cmd=cmd@entry=0x7f1936e82980 <cmdline>) at panel.c:367
#6  0x00007f1936a995ae in the_real_panel_display () at panel.c:2614
#7  0x00007f1936a762f8 in impl (argc=<optimized out>, argc@entry=1, argv=argv@entry=0x7ffc89d92bd8) at impl.c:1131
#8  0x0000000000401187 in main (ac=1, av=0x7ffc89d92bd8) at bootstrap.c:133
(gdb)

 
THANK YOU, mhoes! :)))

Yes, that was the missing piece of the puzzle. It is working fine now.

I've updated my opening comments with the missing information.

Thanks again!   :)

@Fish-Git Fish-Git removed the HELP! Help is needed from someone more experienced or I'm simply overloaded with too much work right now! label Apr 17, 2019
@Fish-Git Fish-Git changed the title HELP! How to analyze a Linux "core" file? (crash dump) How to analyze a Linux "core" file (i.e. crash dump) Apr 17, 2019
@mhoes

mhoes commented Apr 29, 2019

Minor nitpick: you stated

    ulimit -c unlimited   (enable core dumps)

which strictly speaking isn't completely correct. It does enable core dumps, but more specifically ulimit -c size sets the maximum allowed size of a core dump file. Any size may be specified (in bash the value is counted in blocks of 1024 bytes, not in bytes), although it is effectively also capped by the maximum size allowed for any file (ulimit -f size). Setting it to 'unlimited' allows a core dump of any size, while setting it to '0' sets the maximum size to zero bytes and thereby effectively disables core dump creation completely.
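
To put a number on the size limit: the bash manual describes the ulimit -c value as a count of 1024-byte increments (512-byte in POSIX mode), so the value has to be multiplied to get bytes. A tiny sketch of that arithmetic, assuming the 1024-byte unit; core_blocks_to_bytes is a hypothetical helper name.

```shell
# Convert a "ulimit -c" value to bytes, assuming 1024-byte blocks
# (bash outside POSIX mode). core_blocks_to_bytes is a hypothetical name.
core_blocks_to_bytes() {
    case "$1" in
        unlimited) echo "unlimited" ;;
        *)         echo $(( $1 * 1024 )) ;;
    esac
}

# Show the current shell's limit in bytes.
core_blocks_to_bytes "$(ulimit -c)"
```

For example, a limit of 4 blocks works out to 4096 bytes, which is far too small for a Hercules core file.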

@mhoes

mhoes commented Apr 29, 2019

Second nitpick(s) :

Where are "core" files written?

Traditionally (and regardless of the user producing the coredump), a core dump is written to the current working directory of the process, which is not necessarily the working directory the process was started in.

On more modern systemd-enabled Linux systems, it depends on what is specified in systemd's coredump.conf (the systemd default location is /var/lib/systemd/coredump/). I have no idea what non-systemd Linux does these days.

If you run Hercules as root, they appear to be placed in the current directory as a hidden file (because the owner is root):

Traditionally, the words "hidden file" mean something specific in Unix-speak, namely any file that starts with a . (dot). This is because 'ls' will not list such files by default (you need to do 'ls -a' for that), making them somewhat "hidden" from view. It has nothing to do with the owner or permissions of the file however.

@Fish-Git Fish-Git reopened this Mar 9, 2021
@Fish-Git
Member Author

I see no reason to keep this issue open.
