[kdump] Fix kdump error message when a reboot is issued #7985

rajendra-dendukuri · 2021-06-25T19:23:52Z

Why I did it

The following error message is seen when a reboot is issued:

[ 342.439096] kdump-tools[13655]: /etc/init.d/kdump-tools: 117: /etc/default/kdump-tools: KDUMP_CMDLINE_APPEND+= panic=10 debug hpet=disable pcie_port=compat pci=nommconf sonic_platform=x86_64-accton_as7326_56x-r0: not found

How I did it

dash does not support the += operator for appending to a variable's value.

How to verify it

Use KDUMP_CMDLINE_APPEND="${KDUMP_CMDLINE_APPEND} " instead
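As a minimal sketch of the portable form (the starting value below is shortened for illustration and is not the full string from the config file), the append can be written so that it behaves the same under dash and bash:

```shell
#!/bin/sh
# Illustrative, shortened starting value (not the full string from kdump-tools).
KDUMP_CMDLINE_APPEND="irqpoll nr_cpus=1"

# POSIX parameter expansion: rebuild the value instead of using the bash-only "+=".
KDUMP_CMDLINE_APPEND="${KDUMP_CMDLINE_APPEND} panic=10 debug"

echo "$KDUMP_CMDLINE_APPEND"
```

Under dash, a line of the form NAME+="value" is not parsed as an assignment, so the shell tries to run it as a command, which is what produces the "not found" message quoted above.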

Which release branch to backport (provide reason below if selected)

201811
201911
202006
202012

Description for the changelog

Fix kdump error message when a reboot is issued

A picture of a cute animal (not mandatory but encouraged)

yozhao101 · 2021-06-27T03:44:39Z

files/image_config/kdump/kdump-tools

@@ -10,7 +10,7 @@ KDUMP_CMDLINE_APPEND="irqpoll nr_cpus=1 nousb systemd.unit=kdump-tools.service a
 # Disable advanced pcie features
 # Disable high precision event timer as on some platforms it is interfering with the kdump operation
 # Pass platform identifier string as part of crash kernel command line to be used by the reboot script during kdump
-KDUMP_CMDLINE_APPEND+=" panic=10 debug hpet=disable pcie_port=compat pci=nommconf sonic_platform=__PLATFORM__"
+KDUMP_CMDLINE_APPEND="${KDUMP_CMDLINE_APPEND} panic=10 debug hpet=disable pcie_port=compat pci=nommconf sonic_platform=__PLATFORM__"


Several questions: panic=10 is used to reboot the crash kernel on panic, right? Does that mean the device will be rebooted if the crash kernel panics? If the crash kernel panics, will the core dump file still be generated?
If the device is rebooted, will the production kernel or the crash kernel be loaded?

I will try to answer all the questions.

panic=10 is used to reboot the crash kernel on panic, right?

Yes. The panic=10 mentioned here is a command line argument of the crash kernel.

Does that mean the device will be rebooted if the crash kernel panics?

Yes, if the crash kernel crashes during boot-up, during vmcore collection, or during its reboot.

If the crash kernel panics, will the core dump file still be generated?

That depends on the point at which the crash kernel panicked.

If the device is rebooted, will the production kernel or the crash kernel be loaded?

If the crash kernel reboots or crashes, the production kernel will be loaded.
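For reference, a positive panic=<n> value asks the kernel to reboot n seconds after a panic. The sketch below shows one way to pull that value out of a command line string; get_panic_timeout is a hypothetical helper written for illustration, not something provided by kdump-tools, and the sample command line is shortened.

```shell
#!/bin/sh
# get_panic_timeout is a hypothetical helper, not part of kdump-tools.
# It prints the value of the last panic=<n> argument found in the given
# command line string, or an empty line if the argument is absent.
get_panic_timeout() {
    timeout=""
    for arg in $1; do
        case "$arg" in
            panic=*) timeout="${arg#panic=}" ;;
        esac
    done
    printf '%s\n' "$timeout"
}

# Example with a shortened, illustrative command line.
get_panic_timeout "irqpoll nr_cpus=1 panic=10 debug"
```

On a running crash kernel, the same helper could be fed the live command line with get_panic_timeout "$(cat /proc/cmdline)".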

I verified that the change correctly appends the extra arguments to KDUMP_CMDLINE_APPEND.

Thanks so much for your answers! @rajendra-dendukuri.

Can you also share a link or document that explains the meaning of panic=x?

@yozhao101 The panic argument is described here:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/admin-guide/kernel-parameters.txt?h=v4.19.195

kdump is described here

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/kdump/kdump.txt?h=v4.19.195

Yes, it is set in /etc/sysctl.conf on the filesystem. But since it is critical that the crash kernel always reboots on panic, we set it explicitly in kdump-tools.
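The panic= command line argument and the kernel.panic sysctl control the same kernel timeout, so the effective value can be inspected at runtime. A minimal sketch, assuming the standard procfs location; read_panic_sysctl is a hypothetical helper introduced only for this example:

```shell
#!/bin/sh
# read_panic_sysctl is a hypothetical helper. The optional argument lets the
# file be overridden (for example in a test); the default is the standard
# procfs location of the kernel.panic sysctl.
read_panic_sysctl() {
    panic_file="${1:-/proc/sys/kernel/panic}"
    if [ -r "$panic_file" ]; then
        cat "$panic_file"
    else
        echo "unknown"
    fi
}

echo "kernel.panic = $(read_panic_sysctl)"
```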

For the question:

If the crash kernel panics, will the core dump file still be generated?

I think the question should be reworded as follows:

If the crash kernel panics, will a core dump file of the crash kernel itself be generated?

I think the only purpose of the crash kernel (capture kernel) is to save the kernel core dump file and the kernel log file from /proc/vmcore to local disk or a remote server.

From the kdump script kdump-tools, we can see that if the kernel core file /proc/vmcore was generated, the crash kernel tries to dump the kernel core file and the kernel log file by invoking a function in another kdump script, kdump-config. Whether or not the dump commands succeed, the device is rebooted into the production kernel by calling reboot -f.
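The flow described in this comment can be sketched roughly as follows. This is only an illustration of the described behavior: the function names and the VMCORE variable are hypothetical stand-ins, the reboot is replaced by an echo so the sketch is safe to run, and the real kdump-tools and kdump-config scripts may differ.

```shell
#!/bin/sh
# Sketch only: the function names and the VMCORE variable are hypothetical
# stand-ins, not the real kdump-tools / kdump-config code.
VMCORE="${VMCORE:-/proc/vmcore}"

save_dump() {
    # Stand-in for the dump step; a real script would copy data out of $VMCORE.
    echo "saving dump from $VMCORE"
}

do_reboot() {
    # Stand-in for "reboot -f", so that the sketch is safe to run.
    echo "rebooting into production kernel"
}

capture_and_reboot() {
    if [ -e "$VMCORE" ]; then
        # A failed dump is reported but does not prevent the reboot.
        save_dump || echo "dump failed" >&2
        do_reboot
    else
        # Assumption for the sketch: without a vmcore there is nothing to save.
        echo "no vmcore present; nothing to capture"
    fi
}

capture_and_reboot
```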

But if the crash kernel crashes during the dump operation, my thinking is that the kernel core file /proc/vmcore can still be generated, and the device can be rebooted into the production kernel if and only if the crash kernel is loaded again and has a chance to finish dumping the core file.

We may end up in a continuous loop trying to recover from a failed state. It is safe to reboot into the production kernel rather than retry the crash kernel, which has failed. For example, if there is an issue with hard disk access, the crash kernel may not be able to write to the device until a reboot has happened. The crash kernel is kexec'ed, so there is a chance that it cannot bring the system to a reliable state. Kdump is best effort.

Agreed.

Currently we reuse the production kernel as the crash kernel, right?

Yes. The production kernel is used as the crash kernel.

[kdump] Fix kdump error message when a reboot is issued

2472da6

rajendra-dendukuri requested a review from lguohan as a code owner June 25, 2021 19:23

yozhao101 reviewed Jun 27, 2021

yozhao101 approved these changes Jun 29, 2021

lguohan merged commit f4b0c8f into sonic-net:master Jul 1, 2021

lguohan added Bug 🐛 Request for 202012 Branch labels Jul 1, 2021

qiluo-msft added the Included in 202012 Branch label Jul 7, 2021

praveen-li pushed a commit to praveen-li/sonic-buildimage that referenced this pull request Feb 15, 2022

Fix kdump error message when a reboot is issued (sonic-net#7985)

12caf8a

rajendra-dendukuri mentioned this pull request Aug 12, 2022

[kdump] Keep kdump in disabled mode and let SONiC configuration enabl… #11724
