-
Notifications
You must be signed in to change notification settings - Fork 669
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kdump support #729
kdump support #729
Conversation
retest this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As comments.
Also, I'm wondering whether we should move the kdump group under a "debug" group, where we can place all debug-related functionality that most network engineers would not use. (e.g., config debug kdump ...
, and show debug kdump ...
config/main.py
Outdated
run_command("sonic-kdump-config --disable") | ||
|
||
@kdump.command() | ||
@click.option('-y', '--yes', is_flag=True, help="Force rebooting after enabling kdump") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comments need to be modified.
scripts/sonic-kdump-config
Outdated
# @param verbose If True, the function will display a few additinal information | ||
# @param no If True, will not display reboot needed message | ||
# @return True is the grub configuration has changed, and False if it has not | ||
def cmd_kdump_enable(verbose, no): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could the parsing of grub.cfg and the logic be separated into different functions?
The idea would be facilitating the integration of other bootloaders than grub by adding an abstraction layer.
I would propose something like the following in the case of cmd_kdump_enable. Some other places in this file would need some changes too. If a cmd_function refers to grub rather than a function of bootloader then it's lacking genericity.
bootloader = get_bootloader() # this would return Grub() or Aboot() based on autodetection
if bootloader is None:
print("Feature not supported on this platform")
return False
image = get_current_image()
old_crashkernel = bootloader.get_crash_kernel( image )
new_crashkernel = "crashkernel: ..."
if old_crashkernel != new_crashkernel:
bootloader.set_crash_kernel( image, new_crashkernel )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@olivier-singla , since the sonic support multiple boot loader, not just grub. We have aboot, now we are working on u-boot. The concern here is that current code does not provide good abstraction for the community to add support for other types of boot load. Can you simplify the code logic and define the api. So that others can provide their own implementation for their own bootloader?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need better abstraction so that the community can add support for different types of boot loader.
if os.path.exists(grub_cfg): | ||
return kdump_disable_grub(verbose, kdump_enabled, memory, num_dumps) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better than the previous version but I still feel some abstraction is lacking.
Basically with this design for every operation each bootloader has to be added inside a if/elif/elif. This is redundant and unnecessary.
Both the kdump_disable_grub
and kdump_config_next_grub
functions have a mix of common logic tied with grub specific code. Adding a new bootloader to this file would likely end up with a lot of copy pasting.
Could we instead just rely on a get_bootloader()
function? This function would hold the if/elif/elif that returns the Bootloader instance or raise UnknownBootloaderError.
Each bootloader would then override the method of the base class Bootloader to add support.
Adding a new Bootloader would now just be about adding a new class definition and changing 2 lines in
get_bootloader` which is a much better design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@olivier-singla , can you address them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure more abstraction will make things easier or simpler when adding support for another boot loader. There is little code duplicated between kdump_enable_grub
, kdump_config_next_grub
and cmd_kdump_disable_grub
. grub and u-boot (for instance) use a different mechanism: grub use a flat text file, where u-boot use environment variables.
as possible to understand and identify the root cause of the crash. Currently, the kernel does not provide much information, which make kernel crash investigation difficult and time consuming. Fortunately, there is a way in the kernel to provide more information in the case of a kernel crash. kdump is a feature of the Linux kernel that creates crash dumps in the event of a kernel crash. This PR will add kernel kdump support. An extension to the CLI utilities config and show is provided to configure and manage kdump: - enable / disable kdump functionality - configure kdump - view kernel crash logs
- remove some duplicated code - when running kdump configuration command on Arista switch, display a message "Feature not supported on this platform"
Removed --no and --yes option on kdump enable and disable operations.
Restructured the code so it will be easier to support other bootloader than grub (such as u-boot on arm64 platform).
retest please |
retest this please |
VMCORE_FILE=/proc/vmcore | ||
if [ -e $VMCORE_FILE -a -s $VMCORE_FILE ]; then | ||
debug "We have a /proc/vmcore, then we just kdump'ed" | ||
/sbin/reboot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible that to call /sbin/reboot
ahead of echoing user request reboot
to /host/reboot-cause/reboot-cause.txt fails the reboot cause?
@jleveque Could you please take a look?
e98a7af95a9767093904d9e8fd320067163d5f87 (HEAD -> 201911, origin/201911) [syncd] Translate removed RIDs in fdb notification (sonic-net#729) 3ceeae5371eee5b69064fa1af88f51e27caa2d36 [syncd] Process all cases fdb flush notification (sonic-net#726) 115ba0783edf85658fd0329eb23796d758c309f5 fix compile error when compiling with g++-4.8.4 (sonic-net#718) a67f94d3d91325516069ef8c0d99bdec30bafbce Fix typo at SAI_ATTR_VALUE_TYPE_ACL_FIELD_DATA_UINT32 (sonic-net#662) Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
In the event of a kernel crash, we need to gather as much information as possible to understand and identify the root cause of the crash. Currently, the kernel does not provide much information, which make kernel crash investigation difficult and time consuming.
Fortunately, there is a way in the kernel to provide more information in the case of a kernel crash. kdump is a feature of the Linux kernel that creates crash dumps in the event of a kernel crash. This PR will add kernel kdump support. Please note that there is another PR in sonic-utilities which is also needed:
sonic-net/sonic-buildimage#3722
An extension to the CLI utilities config and show is provided to configure and manage kdump:
allocated for capture kernel)
There is a design document which describes this kdump implementation:
sonic-net/SONiC#510
- What I did
Added kdump support
- How I did it
Please read HLD:
sonic-net/SONiC#510
- How to verify it
config kdump enable
config save
reboot
echo 1 > /proc/sys/kernel/sysrq ; echo c > /proc/sysrq-trigger
show kdump
show kdump log 1 20