[kdump] Fix kdump error message when a reboot is issued #7985
@@ -10,7 +10,7 @@ KDUMP_CMDLINE_APPEND="irqpoll nr_cpus=1 nousb systemd.unit=kdump-tools.service a
 # Disable advanced pcie features
 # Disable high precision event timer as on some platforms it is interfering with the kdump operation
 # Pass platform identifier string as part of crash kernel command line to be used by the reboot script during kdump
-KDUMP_CMDLINE_APPEND+=" panic=10 debug hpet=disable pcie_port=compat pci=nommconf sonic_platform=__PLATFORM__"
+KDUMP_CMDLINE_APPEND="${KDUMP_CMDLINE_APPEND} panic=10 debug hpet=disable pcie_port=compat pci=nommconf sonic_platform=__PLATFORM__"
Several questions: panic=10 is used to reboot the crash kernel on panic, right? Does that mean the device will be rebooted if the crash kernel panics? If the crash kernel panics, will the core dump file be generated?
If the device is rebooted, will the production kernel or the crash kernel be loaded?
I will try to answer all the questions.
panic=10 is used to reboot the crash kernel on panic, right?
Yes. The panic=10 mentioned here is a command-line argument of the crash kernel.
Does it mean the device will be rebooted if the crash kernel panics?
Yes, if the crash kernel crashes during boot-up, during vmcore collection, or during its reboot.
If the crash kernel panics, will the core dump file be generated?
It depends on the point at which the crash kernel panicked.
If the device is rebooted, will the production kernel or the crash kernel be loaded?
If the crash kernel reboots or crashes, the production kernel will be loaded.
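As a rough illustration of the panic=10 setting discussed above, the sketch below extracts the panic timeout from a kernel command-line string. In the kernel's documented semantics, a positive value is the number of seconds to wait before rebooting after a panic, 0 means wait forever, and a negative value means reboot immediately. The helper name get_panic_timeout and the sample command line are assumptions made for this example; on a live system the string could be read from /proc/cmdline, and the effective value is also visible in /proc/sys/kernel/panic.

```shell
#!/bin/sh
# Hypothetical helper (not part of kdump-tools): print the value of the
# last panic=N argument found in a kernel command-line string.
get_panic_timeout() {
    result=""
    # Intentionally unquoted so the string is split on whitespace.
    for arg in $1; do
        case "$arg" in
            panic=*) result="${arg#panic=}" ;;
        esac
    done
    printf '%s\n' "$result"
}

# Sample command line, shortened from the arguments in this PR.
sample_cmdline="irqpoll nr_cpus=1 nousb panic=10 debug hpet=disable"
panic_timeout=$(get_panic_timeout "$sample_cmdline")
echo "panic timeout: ${panic_timeout}"
```

On a running crash kernel, `get_panic_timeout "$(cat /proc/cmdline)"` would be the equivalent check.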
I verified that the change to append extra arguments to KDUMP_CMDLINE_APPEND works.
Thanks so much for your answers, @rajendra-dendukuri!
Could you also share a link or docs that explain the meaning of panic=x, please?
@yozhao101 panic argument is described below.
kdump is described here
Yes, it is set in /etc/sysctl.conf on the filesystem. But since it is critical that the crash kernel always reboots on panic, we set it explicitly in kdump-tools.
For the question:
If crash kernel was panicked, whether the core dump file will be generated?
I think the question should be reworded as follows:
If the crash kernel panics, will a core dump file of the crash kernel be generated?
I think the only purpose of the crash kernel (capture kernel) is to save the kernel core dump file and the kernel log file from /proc/vmcore to the local disk or a remote server.
From the kdump script kdump-tools, we can see that if the kernel core file /proc/vmcore was generated, the crash kernel will try to dump the kernel core file and the kernel log file by invoking a function in another kdump script, kdump-config. Whether or not the dump commands succeed, the device will be rebooted into the production kernel by calling the command reboot -f.
But if the crash kernel crashes during the dump operation, my thinking is that the kernel core file /proc/vmcore can still be generated, and the device can be rebooted into the production kernel if and only if the crash kernel is loaded again and has a chance to finish dumping the core file.
We may end up in a continuous loop trying to recover from a failed state. It is safe to reboot into the production kernel rather than retry the crash kernel that has failed. For example, if there is an issue with hard disk access, the crash kernel may not be able to write to the device until a reboot has happened. The crash kernel is kexec'ed, so there is a chance that it may not be able to bring the system to a reliable state. Kdump is best effort.
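The capture flow described above (check for the vmcore, try to save it, then reboot regardless of the result) can be sketched roughly as follows. This is a simplified illustration, not the actual kdump-tools source: the function name capture_then_reboot and its arguments are hypothetical, and the dry run uses harmless stand-ins instead of /proc/vmcore and reboot -f.

```shell
#!/bin/sh
# Hypothetical sketch of the flow described in this thread; the names are
# illustrative and do not come from the kdump-tools scripts.
capture_then_reboot() {
    vmcore="$1"      # path to check; /proc/vmcore in a real crash kernel
    save_cmd="$2"    # command that saves the core dump and kernel log
    reboot_cmd="$3"  # command that reboots, e.g. "reboot -f"

    # The vmcore only exists when running as the crash (capture) kernel.
    if [ -e "$vmcore" ]; then
        if ! $save_cmd; then
            echo "dump failed, rebooting anyway" >&2
        fi
        # Reboot whether or not the dump succeeded.
        $reboot_cmd
    fi
}

# Dry run: a temporary file stands in for the vmcore, "false" simulates a
# failed dump, and "echo" stands in for the real reboot command.
fake_vmcore=$(mktemp)
dry_run_output=$(capture_then_reboot "$fake_vmcore" "false" "echo reboot -f")
rm -f "$fake_vmcore"
echo "$dry_run_output"
```

Even with the simulated dump failure, the reboot stand-in still runs, which matches the best-effort behavior described in the comment.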
Agreed.
Yes, it is set in /etc/sysctl.conf on the filesystem. But since it is critical that the crash kernel always reboots on panic, we set it explicitly in kdump-tools.
Currently we reuse the production kernel as the crash kernel, right?
Yes. The production kernel is used as the crash kernel.
dash doesn't support the += operation to append to a variable's value. Use KDUMP_CMDLINE_APPEND="${KDUMP_CMDLINE_APPEND} " instead.
The below error message is seen when a reboot is issued.
[ 342.439096] kdump-tools[13655]: /etc/init.d/kdump-tools: 117: /etc/default/kdump-tools: KDUMP_CMDLINE_APPEND+= panic=10 debug hpet=disable pcie_port=compat pci=nommconf sonic_platform=x86_64-accton_as7326_56x-r0: not found
Why I did it
The below error message is seen when a reboot is issued.
[ 342.439096] kdump-tools[13655]: /etc/init.d/kdump-tools: 117: /etc/default/kdump-tools: KDUMP_CMDLINE_APPEND+= panic=10 debug hpet=disable pcie_port=compat pci=nommconf sonic_platform=x86_64-accton_as7326_56x-r0: not found
How I did it
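dash doesn't support the += operator for appending to a variable's value, so it treats the whole KDUMP_CMDLINE_APPEND+=... line as a command name and reports "not found".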
How to verify it
Use KDUMP_CMDLINE_APPEND="${KDUMP_CMDLINE_APPEND} " instead
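A minimal sketch of the portable append form, assuming shortened argument lists for readability (the real values are in the diff above). Self-referencing assignment works in any POSIX sh, including dash, whereas += is an extension available in shells such as bash.

```shell
#!/bin/sh
# Initial value (shortened for this example).
KDUMP_CMDLINE_APPEND="irqpoll nr_cpus=1 nousb"

# Portable append: expand the current value and add the new arguments.
# Avoid KDUMP_CMDLINE_APPEND+=" ..." here, since dash does not support it.
KDUMP_CMDLINE_APPEND="${KDUMP_CMDLINE_APPEND} panic=10 debug hpet=disable"

printf '%s\n' "$KDUMP_CMDLINE_APPEND"
```

Because /etc/default/kdump-tools is sourced by the init script rather than executed by bash, keeping it to POSIX syntax avoids depending on which shell /bin/sh points to.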
Which release branch to backport (provide reason below if selected)
Description for the changelog
Fix kdump error message when a reboot is issued