Motivation
==========
People may forget to recheck that kdump still works. As a result, no
vmcore may be generated after a real system crash, which defeats the
purpose of kdump.
It is highly recommended to recheck kdump after any system modification,
such as:
a. after kernel patching or a full yum update, which might break something
kdump depends on, for example by introducing a new bug.
b. after any hardware-level change, such as to storage or networking, or a
firmware upgrade.
c. after deploying a new application, for example one that involves
3rd-party modules.
Although these checks are beyond the scope of kdump itself, a simple
vmcore creation status notification is good to have for now.
Design
======
On (re)start, kdump currently checks whether any related files,
filesystems or drivers have been modified before determining whether the
initrd should be rebuilt. A rebuild indicates such a modification, which
means kdump needs to be rechecked. A rebuild therefore clears the vmcore
creation status recorded in $VMCORE_CREATION_STATUS.
The vmcore creation check happens at "kdumpctl (re)start/status" and
reports the creation success/fail status to users. A "success" status
indicates that a vmcore has previously been generated successfully in the
current env, so a vmcore is more likely to be generated when a real crash
happens. A "fail" status indicates that either no vmcore has been
generated, or a vmcore creation has failed, in the current env. In that
case, users should check the 2nd kernel log or kexec-dmesg.log for the
reason of the failure.
$VMCORE_CREATION_STATUS records the vmcore creation status of the current
env. Its format looks like:
success 1718682002
This means that a vmcore was generated successfully at this timestamp for
the current env.
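As an illustration only, a status file in this format could be read and
reported roughly as follows. This is a minimal sketch and not the code in
this patch: the function name report_vmcore_creation_status is
hypothetical, the "fail <timestamp>" line layout and the wording of the
failure and fallback messages are assumptions, and "date -d" requires GNU
date.

```shell
#!/bin/bash
# Hypothetical helper for illustration; not taken from kdumpctl.
# $1: path to the status file, e.g. the value of $VMCORE_CREATION_STATUS.
report_vmcore_creation_status() {
    local status_file=$1 status timestamp

    # A missing or empty file means no test result has been recorded,
    # for example because an initrd rebuild cleared it.
    if [[ ! -s $status_file ]]; then
        echo "Notice: No vmcore creation test performed!"
        return 0
    fi

    # The file holds one line: "<status> <unix timestamp>".
    read -r status timestamp < "$status_file"

    case $status in
    success)
        echo "Notice: Last successful vmcore creation on $(date -d "@$timestamp")"
        ;;
    fail)
        # Assumed layout and wording for the failure case.
        echo "Warning: Last vmcore creation failed on $(date -d "@$timestamp")"
        ;;
    *)
        echo "Warning: Unrecognized status in $status_file"
        ;;
    esac
}
```

Calling the function with the path of the status file prints a single
notice line, similar to the ones shown in the Usage section below.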
Usage
=====
[root@localhost ~]# kdumpctl restart
kdump: kexec: unloaded kdump kernel
kdump: Stopping kdump: [OK]
kdump: kexec: loaded kdump kernel
kdump: Starting kdump: [OK]
kdump: Notice: No vmcore creation test performed!
[root@localhost ~]# kdumpctl test
[root@localhost ~]# kdumpctl status
kdump: Kdump is operational
kdump: Notice: Last successful vmcore creation on Tue Jun 18 16:39:10 CST 2024
[root@localhost ~]# kdumpctl restart
kdump: kexec: unloaded kdump kernel
kdump: Stopping kdump: [OK]
kdump: kexec: loaded kdump kernel
kdump: Starting kdump: [OK]
kdump: Notice: Last successful vmcore creation on Tue Jun 18 16:39:10 CST 2024
The notification for kdumpctl (re)start/status can be disabled by
setting VMCORE_CREATION_NOTIFICATION in /etc/sysconfig/kdump.
===
v3 -> v2:
Always mount
$VMCORE_CREATION_STATUS(/var/crash/vmcore-creation.status)'s device for
2nd kernel, in case /var is on a separate device from the rootfs's device.
v4 -> v3:
Add "kdumpctl test" as the entry point for performing the kdump test.
v5 -> v4:
Fix the mounting failure issue in fadump.
v6 -> v5:
Add new argument as customized mount point for add_mount/to_mount.
v7 -> v6:
a. Code refactoring based on Philipp's suggestion.
b. Only mount $VMCORE_CREATION_STATUS(/var/crash/vmcore-creation.status)'s
device when needed.
c. Add "--force" option for "kdumpctl test", to support automated test
scripts that QE may run.
d. Add check in "kdumpctl test" that $VMCORE_CREATION_STATUS can only be on
a local drive.
v8 -> v7:
a. Rebased the patch on top of upstream commit e2b8463.
b. Code refactoring based on Philipp's suggestion.
c. Updated the "test" entry of kdumpctl.8.
Signed-off-by: Tao Liu <ltao@redhat.com>