Kdump support #1596
Force-pushed from 7c27b68 to 84d15d2.
Force-pushed from 84d15d2 to 1c2a6ab.
sources/prairie-dog/src/main.rs
Outdated
// Load the panic kernel from `BOOT_MOUNT_PATH`, letting kexec decide which syscall
// it should use (KEXEC_LOAD or KEXEC_FILE_LOAD).
Can we force it to always use `KEXEC_FILE_LOAD`?
Yes, we can, but for some reason the 5.4 kernel fails to use the `kexec_file_load` syscall and returns an `ENOTSUP` error. The problem is in the kernel rather than in the `kexec` library, so I need to debug the syscall to find out what is returning the error. I'm already working with the kernel folks on this.
In the 5.10 kernel, `kexec` always uses `kexec_file_load`, so I think it is OK to use `-a` for now and let `kexec` decide which syscall to use, since we are only enabling kdump in variants that use the 5.10 kernel.
Wouldn't we expect to enable this in other variants relatively soon? I imagine people building their own images/variants might want to enable it as well. Wouldn't want a nasty surprise.
Even if we enable this for other variants, and people build their own images/variants, they will still be limited to the `kexec_load` syscall on the aarch64 5.4 kernel variants, since `kexec_file_load` was implemented for aarch64 after the 5.4 release.
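For illustration, here is a minimal sketch of how the syscall choice discussed in this thread maps onto `kexec` command-line flags. It assumes the kexec-tools options `-p` (load a panic kernel), `-a` (try `kexec_file_load`, fall back to `kexec_load`), `-s` (only `kexec_file_load`) and `-c` (only `kexec_load`). The `KexecSyscall` enum and `kexec_load_args` function are hypothetical names, not the code in this PR.

```rust
/// Which syscall `kexec` should use to load the crash kernel.
/// Hypothetical type, used only for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KexecSyscall {
    /// `-a`: try kexec_file_load first, fall back to kexec_load.
    Auto,
    /// `-s`: use kexec_file_load only.
    FileLoad,
    /// `-c`: use kexec_load only.
    Load,
}

/// Build the argument list for loading a panic kernel with `kexec`.
/// `kernel` is the path to the kernel image (an assumption of this sketch).
pub fn kexec_load_args(syscall: KexecSyscall, kernel: &str) -> Vec<String> {
    let syscall_flag = match syscall {
        KexecSyscall::Auto => "-a",
        KexecSyscall::FileLoad => "-s",
        KexecSyscall::Load => "-c",
    };
    vec![
        // `-p` loads the kernel so that it runs on panic.
        "-p".to_string(),
        syscall_flag.to_string(),
        kernel.to_string(),
    ]
}
```

The resulting vector could be passed to `std::process::Command::new("kexec").args(...)`. Once `kexec_file_load` works on every supported kernel, switching from `KexecSyscall::Auto` to `KexecSyscall::FileLoad` would force the syscall as requested above.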
Force-pushed from 1c2a6ab to cded56a.
Force-pushed from cded56a to dd02418.
packages/makedumpfile/0000-fix-makefile-to-allow-cross-compilation.patch
Outdated
sources/prairiedog/src/main.rs
Outdated
// Load the panic kernel from `BOOT_MOUNT_PATH`, letting kexec decide which syscall
// it should use (KEXEC_LOAD or KEXEC_FILE_LOAD). We will use this setting until
// we figure out why kexec doesn't recognize the 5.4 kernel image as a valid file
// to be used with KEXEC_FILE_LOAD.
If we're planning to merge this prior to a fix, let's open an issue to track it.
I was able to port the kernel patch that makes the `kexec_file_load` syscall work in the x86_64 5.4 kernel. However, I'm still waiting for the kernel folks to help me port the syscall for aarch64, since it was introduced for aarch64 in a later kernel version.
I talked to @tjkirch, since he expressed concerns about using different syscalls depending on the kernel version/architecture. We agreed that we can proceed with my PR provided that we document in a GH issue that we will force the `kexec_file_load` syscall once the kernel folks add support for it in the 5.4 kernel.
I think we should document the restrictions in the same place we document the feature, not (only) in a GH issue.
Force-pushed from dd02418 to 3883f7f.
This commit adds kexec-tools to all variants
packages/makedumpfile/0000-fix-strip-invocation-for-TARGET-env-variable.patch
Outdated
This commit adds libelf to all variants
Force-pushed from 3883f7f to b78d624.
This commit adds makedumpfile to all variants
This commit moves some of the systemd mount units from `local-fs` to `preconfigured`. This reduces the overhead of mounting unnecessary mount points while the crash kernel runs.
Force-pushed from b78d624 to 6642d58.
Force-pushed from 6642d58 to 441d77c, including fixes in the README.
README.md
Outdated
There area few important caveats about the provided kdump support:

* Currently, only vmware variants have kdump support enabled
* The system kernel will reserve 256M for the crash kernel, only when the host has at least 2GB of memory; the reserved space won't be available for processes running in the host
Nit. I would change 256M to 256MB to be consistent with "at least 2GB of memory".
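To make the caveat concrete, here is a rough sketch of how a range-based `crashkernel` value resolves to a reservation for a given amount of RAM. The value `2G-:256M` is an assumption inferred from the README text above (256M reserved only on hosts with at least 2GB), not necessarily the exact parameter used by Bottlerocket. The helpers `parse_size` and `reserved_bytes` are hypothetical, and only the `<start>-[<end>]:<size>` range form with an optional `@offset` is handled.

```rust
/// Parse a size such as `256M` or `2G` into bytes.
fn parse_size(s: &str) -> Option<u64> {
    let s = s.trim();
    let (digits, multiplier) = match s.chars().last()? {
        'K' | 'k' => (&s[..s.len() - 1], 1u64 << 10),
        'M' | 'm' => (&s[..s.len() - 1], 1u64 << 20),
        'G' | 'g' => (&s[..s.len() - 1], 1u64 << 30),
        _ => (s, 1),
    };
    digits.parse::<u64>().ok()?.checked_mul(multiplier)
}

/// Return how many bytes a range-based `crashkernel` value reserves on a
/// host with `ram` bytes of memory. Returns `Some(0)` when no range matches
/// and `None` when the value cannot be parsed by this sketch.
pub fn reserved_bytes(value: &str, ram: u64) -> Option<u64> {
    // Drop an optional `@offset` suffix; it does not affect the size.
    let ranges = value.split('@').next()?;
    for entry in ranges.split(',') {
        let (range, size) = entry.split_once(':')?;
        let (start, end) = range.split_once('-')?;
        let start = parse_size(start)?;
        // An empty end means the range is open-ended.
        let end = if end.is_empty() { u64::MAX } else { parse_size(end)? };
        // Ranges are treated as start-inclusive and end-exclusive.
        if ram >= start && ram < end {
            return parse_size(size);
        }
    }
    Some(0)
}
```

With the assumed value, `reserved_bytes("2G-:256M", ram)` yields 256 MiB for a 4 GiB host and 0 for a 1 GiB host, which matches the behavior the caveat describes.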
Force-pushed from 441d77c to 5721535.
README.md
Outdated
Bottlerocket provides support to collect kernel crash dumps whenever the system kernel panics.
Once this happens, both the dmesg log and vmcore dump are stored at `/var/log/kdump`, and the system reboots.

There area few important caveats about the provided kdump support:
Suggested change:
- There area few important caveats about the provided kdump support:
+ There are a few important caveats about the provided kdump support:
Kdump is a Linux feature that boots a crash kernel whenever the system kernel panics. The crash kernel is loaded into a reserved region of memory whose size is determined by the `crashkernel` kernel parameter. In Bottlerocket, this parameter is set such that no memory is reserved if the host has less than 2GB of memory.

For Bottlerocket, the crash kernel is loaded from the current active boot partition. The `configure-boot-mount.service` systemd unit determines which boot partition is currently active and mounts it at `/boot`. This mount is read-only and uses private propagation, so new mount namespaces won't have access to it. As part of this change, SELinux labels are added to the boot partition when it is created by the `rpm2img` tool.

The `load-crash-kernel.service` systemd unit loads the crash kernel only if memory was reserved for it and the `kexec.kexec_load_disable` setting is `0`. The unit exits gracefully if no memory was reserved for the crash kernel. For the moment, only the aws-dev and vmware variants use that kernel parameter.

The `kexec.kexec_load_disable` setting used to be set in the `sysctl.conf` configuration file. With this change, the setting is set by the `disable-kexec-load.service` systemd unit. This unit runs after `load-crash-kernel.service`, even if the latter wasn't executed or exited with a non-zero code.

The `capture-kernel-dump.service` systemd unit is set as the target when the crash kernel is executed. It captures both the dmesg logs and the kdump-compressed dump, excluding:
* Pages filled with zero
* Non-private cache pages
* All cache pages
* User process data pages
* Free pages

All the files generated by the `capture-kernel-dump.service` unit are stored at `/var/log/kdump`, so the unit has a strong dependency on the following units to set up the persistent partition:
* local-fs.target
* systemd-sysusers.service
* systemd-udevd.service
* systemd-udev-trigger.service
* systemd-tmpfiles-setup.service
* systemd-tmpfiles-setup-dev.service

Since `local-fs.target` is a dependency of `capture-kernel-dump.service`, systemd will attempt to load all the mount units. To prevent this, the mount units will only be loaded during the execution of the `preconfigured` target.

No API is provided to enable/disable the dump collection, since the memory is already reserved and would be wasted if nothing used it. Dynamically changing the `crashkernel` command-line parameter isn't an option, since we will provide support for secure boot in the future.
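The two conditions that gate `load-crash-kernel.service` can be sketched as a small check. This is an illustration under assumptions, not the unit's actual implementation. It assumes the reserved size is exposed in `/sys/kernel/kexec_crash_size` and that the setting described above corresponds to the kernel's `kexec_load_disabled` sysctl at `/proc/sys/kernel/kexec_load_disabled`. The function names are hypothetical.

```rust
use std::fs;
use std::io;

/// Decide whether the crash kernel should be loaded, given the raw contents
/// of the reserved-size file and the kexec-disable sysctl file.
pub fn should_load(crash_size: &str, load_disabled: &str) -> bool {
    // Treat unparsable content as "nothing reserved".
    let reserved = crash_size.trim().parse::<u64>().unwrap_or(0);
    // Anything other than "0" is treated as disabled.
    let disabled = load_disabled.trim() != "0";
    reserved > 0 && !disabled
}

/// Read both values from the running system (paths are assumptions of this
/// sketch) and apply `should_load`.
pub fn should_load_from_system() -> io::Result<bool> {
    let size = fs::read_to_string("/sys/kernel/kexec_crash_size")?;
    let disabled = fs::read_to_string("/proc/sys/kernel/kexec_load_disabled")?;
    Ok(should_load(&size, &disabled))
}
```

Keeping the decision in a pure function makes the "exit gracefully when nothing was reserved" path easy to test without touching `/sys` or `/proc`.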
Force-pushed from 5721535 to 9e4c2cc, including a nit fix.
Issue number:
#1413
Description of changes:
Kdump is a Linux feature that boots a crash kernel whenever the system kernel panics. The crash kernel is loaded into a reserved region of memory whose size is determined by the `crashkernel` kernel parameter. In Bottlerocket, this parameter is set such that no memory is reserved if the host has less than 2GB of memory.

For Bottlerocket, the crash kernel is loaded from the current active boot partition. The `configure-boot-mount.service` systemd unit determines which boot partition is currently active and mounts it at `/boot`. This mount is read-only and uses private propagation, so new mount namespaces won't have access to it. As part of this change, SELinux labels are added to the boot partition when it is created by the `rpm2img` tool.

The `load-crash-kernel.service` systemd unit loads the crash kernel only if memory was reserved for it and the `kexec.kexec_load_disable` setting is `0`. The unit exits gracefully if no memory was reserved for the crash kernel. For the moment, only the aws-dev and vmware variants use that kernel parameter.

The `kexec.kexec_load_disable` setting used to be set in the `sysctl.conf` configuration file. With this change, the setting is set by the `disable-kexec-load.service` systemd unit. This unit runs after `load-crash-kernel.service`, even if the latter wasn't executed or exited with a non-zero code.

The `capture-kernel-dump.service` systemd unit is set as the target when the crash kernel is executed. It captures both the dmesg logs and the kdump-compressed dump, excluding:
* Pages filled with zero
* Non-private cache pages
* All cache pages
* User process data pages
* Free pages
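As a sketch of what this exclusion list amounts to, the five page types correspond to the bit flags of makedumpfile's dump level (the `-d` option), and excluding all of them gives level 31. The constant names and the `dump_level` helper below are hypothetical and only mirror the list above.

```rust
/// Bit flags for the page types that makedumpfile can exclude.
/// Names are illustrative; values follow the makedumpfile dump-level bits.
pub const ZERO_PAGES: u8 = 1;
pub const NON_PRIVATE_CACHE_PAGES: u8 = 2;
pub const ALL_CACHE_PAGES: u8 = 4;
pub const USER_DATA_PAGES: u8 = 8;
pub const FREE_PAGES: u8 = 16;

/// Combine the excluded page types into a single dump level.
pub fn dump_level(excluded: &[u8]) -> u8 {
    excluded.iter().fold(0, |level, bit| level | bit)
}
```

Passing all five flags to `dump_level` produces the value that would be given to `makedumpfile -d`, which keeps the vmcore small by dropping everything in the list.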
All the files generated by the `capture-kernel-dump.service` unit are stored at `/var/log/kdump`, so the unit has a strong dependency on the following units to set up the persistent partition:
* local-fs.target
* systemd-sysusers.service
* systemd-udevd.service
* systemd-udev-trigger.service
* systemd-tmpfiles-setup.service
* systemd-tmpfiles-setup-dev.service

Since `local-fs.target` is a dependency of `capture-kernel-dump.service`, systemd will attempt to load all the mount units. To prevent this, the mount units will only be loaded during the execution of the `preconfigured` target.

No API is provided to enable/disable the dump collection, since the memory is already reserved and would be wasted if nothing used it. Dynamically changing the `crashkernel` command-line parameter isn't an option, since we will provide support for secure boot in the future.

Testing done:
aws-dev x86_64/aarch64, vmware-dev/vmware-k8s-1.20 x86_64:
* `systemctl status` didn't show failed units
* `echo c > /proc/sysrq-trigger`, and verified that the dumps/logs were generated

k8s variant x86_64:
* `256MB` of memory are enough to collect the kernel dumps, regardless of the size of the host and its workload. I used a custom build for the aws-k8s-1.20 variant for this test and an `m5.8xlarge` EC2 instance with 110 pods running that only load random data and keep it there forever.

aws-ecs-1, aws-k8s-1.19 x86_64:
* `systemctl status` didn't show failed units
* `kexec.service` wasn't executed:

Custom aws-k8s-1.19 x86_64 build with `crashkernel`, to validate the 5.4 kernel behavior:
* `systemctl status` didn't show failed units
* `echo c > /proc/sysrq-trigger`, and verified that the dumps/logs were generated

Terms of contribution:
By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.