Kdump support #1596

arnaldo2792 · 2021-05-22T03:01:56Z

Issue number:
#1413

Description of changes:

3883f7f7 os: Add kdump support
842604d7 kernel-5.4: add patches required for kdump support
c78ec8a2 systemd: move systemd mounts to preconfigured.target
1e0f6859 packages: add makedumpfile
1466a49f packages: add libelf
bcc5c903 packages: add kexec-tools

Kdump is a Linux feature that allows the system to boot into a crash kernel whenever the running kernel panics. The crash kernel is loaded into a region of memory reserved through the crashkernel kernel parameter. In Bottlerocket, this parameter is set so that no memory is reserved if the host has less than 2GB of memory.
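The following is a minimal sketch of how the kernel's `crashkernel=<range>:<size>[,...]` range syntax maps host memory to a reservation. It assumes Bottlerocket uses a value equivalent to `2G-:256M`; the 256M and 2GB figures come from the README caveats later in this PR, but the exact string is an assumption. `parse_size` and `reserved_for` are hypothetical helpers, not Bottlerocket code.

```rust
/// Parse a memparse-style size such as "256M" or "2G" into bytes.
/// Only the K/M/G suffixes (base 1024) are handled in this sketch.
fn parse_size(s: &str) -> Option<u64> {
    let s = s.trim();
    let (digits, multiplier) = match s.chars().last()? {
        'K' | 'k' => (&s[..s.len() - 1], 1u64 << 10),
        'M' | 'm' => (&s[..s.len() - 1], 1u64 << 20),
        'G' | 'g' => (&s[..s.len() - 1], 1u64 << 30),
        _ => (s, 1u64),
    };
    digits.parse::<u64>().ok()?.checked_mul(multiplier)
}

/// Return how many bytes `crashkernel=<range>:<size>[,<range>:<size>...]`
/// reserves on a host with `ram` bytes. Each range is `start-[end]`, with
/// `start` inclusive and `end` exclusive; an empty `end` has no upper bound.
/// The optional `@offset` suffix is not handled here.
fn reserved_for(spec: &str, ram: u64) -> Option<u64> {
    for entry in spec.split(',') {
        let (range, size) = entry.split_once(':')?;
        let (start, end) = range.split_once('-')?;
        let start = parse_size(start)?;
        let end = if end.is_empty() { u64::MAX } else { parse_size(end)? };
        if ram >= start && ram < end {
            return parse_size(size);
        }
    }
    // No range matched: nothing is reserved.
    Some(0)
}

fn main() {
    // Hypothetical value mirroring the README: 256M once the host has 2GB or more.
    let spec = "2G-:256M";
    for gib in [1u64, 2, 8, 128] {
        let reserved = reserved_for(spec, gib << 30).unwrap_or(0);
        println!("{} GiB of RAM -> {} MiB reserved", gib, reserved >> 20);
    }
}
```

Because the first range starts at 2G, a 1 GiB host matches nothing and reserves 0 bytes, which is the "no memory reserved" case the load unit has to tolerate.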

For Bottlerocket, the crash kernel is loaded from the currently active boot partition. The configure-boot-mount.service systemd unit determines which boot partition is currently active and mounts it at /boot. This mount is read-only and uses private propagation, so new mount namespaces won't have access to it. As part of this change, the rpm2img tool adds SELinux labels to the boot partition when it creates it.
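A rough sketch of the two mount steps described above, expressed as `mount(8)` invocations: a read-only mount at /boot, followed by `--make-private` so that mount and unmount events under /boot do not propagate to peer mounts. The device path and the `boot_mount_commands` helper are hypothetical; how the real unit picks the active partition is not shown.

```rust
use std::process::Command;

/// Build the argument lists for mounting `device` read-only at `target`
/// and then switching that mount to private propagation.
fn boot_mount_commands(device: &str, target: &str) -> Vec<Vec<String>> {
    vec![
        // Mount the active boot partition read-only.
        vec![
            "mount".to_string(),
            "-o".to_string(),
            "ro".to_string(),
            device.to_string(),
            target.to_string(),
        ],
        // Give the new mount private propagation.
        vec!["mount".to_string(), "--make-private".to_string(), target.to_string()],
    ]
}

fn main() {
    // Hypothetical device path; the real unit first works out which boot partition is active.
    let device = "/dev/disk/by-partlabel/BOTTLEROCKET-BOOT-A";
    for argv in boot_mount_commands(device, "/boot") {
        // Build but do not spawn the command, so the sketch is safe to run anywhere.
        let mut cmd = Command::new(&argv[0]);
        cmd.args(&argv[1..]);
        println!("{:?}", cmd);
    }
}
```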

The load-crash-kernel.service systemd unit loads the crash kernel only if memory was reserved for it and the kernel.kexec_load_disabled sysctl is 0. The unit exits gracefully if no memory was reserved for the crash kernel. For the moment, only the aws-dev and vmware variants set the crashkernel parameter.
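The two preconditions above can be sketched as a pure decision function. The kernel exposes the size of the crashkernel reservation in bytes at `/sys/kernel/kexec_crash_size`, and the kexec toggle at `/proc/sys/kernel/kexec_load_disabled` (the `kernel.kexec_load_disabled` sysctl). The `Decision` enum and `decide` function are hypothetical illustrations, not the unit's actual implementation.

```rust
use std::fs;
use std::num::ParseIntError;

/// What a crash-kernel loader should do after checking its preconditions.
#[derive(Debug, PartialEq)]
enum Decision {
    Load,
    SkipNoReservation,
    SkipKexecDisabled,
}

/// Decide from the contents of /sys/kernel/kexec_crash_size and
/// /proc/sys/kernel/kexec_load_disabled.
fn decide(crash_size: &str, load_disabled: &str) -> Result<Decision, ParseIntError> {
    // No reservation: nothing to load into, so exit successfully.
    if crash_size.trim().parse::<u64>()? == 0 {
        return Ok(Decision::SkipNoReservation);
    }
    // kexec loading is already disabled: a load would be rejected.
    if load_disabled.trim().parse::<u8>()? != 0 {
        return Ok(Decision::SkipKexecDisabled);
    }
    Ok(Decision::Load)
}

fn main() {
    // Missing files read as "" and surface as a parse error below.
    let read = |path: &str| fs::read_to_string(path).unwrap_or_default();
    let crash_size = read("/sys/kernel/kexec_crash_size");
    let disabled = read("/proc/sys/kernel/kexec_load_disabled");
    match decide(&crash_size, &disabled) {
        Ok(Decision::Load) => println!("memory reserved and kexec enabled: load the crash kernel"),
        Ok(Decision::SkipNoReservation) => println!("no memory reserved: exit successfully"),
        Ok(Decision::SkipKexecDisabled) => println!("kexec loading already disabled"),
        Err(e) => eprintln!("could not read the kexec state: {}", e),
    }
}
```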

The kernel.kexec_load_disabled sysctl used to be set in the sysctl.conf configuration file. With this change, it is set by the disable-kexec-load.service systemd unit instead. This unit runs after load-crash-kernel.service, even if the latter was skipped or exited with a non-zero code.
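The ordering matters because `kernel.kexec_load_disabled` is a one-way toggle: once it is set to 1, both kexec_load and kexec_file_load stay disabled until the next boot, but a crash kernel loaded before that point remains usable. A minimal sketch of the disabling step, assuming it is a plain write to procfs; `disable_kexec_load` is a hypothetical helper, not the unit's actual command.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// procfs path of the `kernel.kexec_load_disabled` sysctl.
const KEXEC_LOAD_DISABLED: &str = "/proc/sys/kernel/kexec_load_disabled";

/// Write "1" to the sysctl at `path`. On the real sysctl this is one-way:
/// kexec_load and kexec_file_load stay disabled until the next boot,
/// while a crash kernel loaded earlier stays in place.
fn disable_kexec_load(path: &Path) -> io::Result<()> {
    fs::write(path, "1\n")
}

fn main() {
    // Writing the real sysctl needs root and cannot be undone before reboot,
    // so this sketch only writes when a path is passed explicitly.
    match std::env::args().nth(1) {
        Some(path) => match disable_kexec_load(Path::new(&path)) {
            Ok(()) => println!("wrote 1 to {}", path),
            Err(e) => eprintln!("could not write {}: {}", path, e),
        },
        None => println!("usage: pass {} to disable kexec loading", KEXEC_LOAD_DISABLED),
    }
}
```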

The capture-kernel-dump.service systemd unit is set as the target when the crash kernel is executed. It captures both the dmesg logs and the kdump-compressed dump excluding:
* Pages filled with zero
* Non-private cache pages
* All cache pages
* User process data pages
* Free pages
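The exclusions listed above line up with makedumpfile's dump-level bits (1 zero pages, 2 cache pages without private, 4 cache pages with private, 8 user process data, 16 free pages), which OR together to dump level 31. The sketch below builds plausible argument lists for the two captures under that assumption. `--dump-dmesg` and `-d` are documented makedumpfile options; the choice of `-c` (zlib compression) and the helper names are assumptions, and the file names follow the listing in the testing notes.

```rust
/// makedumpfile dump-level bits; each set bit excludes one class of pages.
const ZERO_PAGES: u8 = 1;
const CACHE_WITHOUT_PRIVATE: u8 = 2;
const CACHE_WITH_PRIVATE: u8 = 4;
const USER_DATA: u8 = 8;
const FREE_PAGES: u8 = 16;

/// Zero pages, non-private cache, all cache (non-private plus private),
/// user process data, and free pages.
fn dump_level() -> u8 {
    ZERO_PAGES | CACHE_WITHOUT_PRIVATE | CACHE_WITH_PRIVATE | USER_DATA | FREE_PAGES
}

/// Hypothetical argument lists for the dmesg capture and the vmcore dump.
fn capture_args(vmcore: &str, out_dir: &str) -> (Vec<String>, Vec<String>) {
    let dmesg = vec![
        "--dump-dmesg".to_string(),
        vmcore.to_string(),
        format!("{}/dmesg.dump", out_dir),
    ];
    let dump = vec![
        "-c".to_string(), // assumption: zlib per-page compression
        "-d".to_string(),
        dump_level().to_string(),
        vmcore.to_string(),
        format!("{}/kdump.dump", out_dir),
    ];
    (dmesg, dump)
}

fn main() {
    let (dmesg, dump) = capture_args("/proc/vmcore", "/var/log/kdump");
    println!("makedumpfile {}", dmesg.join(" "));
    println!("makedumpfile {}", dump.join(" "));
}
```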

All the files generated by the capture-kernel-dump.service unit are stored at /var/log/kdump, so the unit has a strong dependency on the following units, which set up the persistent partition:

* local-fs.target
* systemd-sysusers.service
* systemd-udevd.service
* systemd-udev-trigger.service
* systemd-tmpfiles-setup.service
* systemd-tmpfiles-setup-dev.service

Since local-fs.target is a dependency of capture-kernel-dump.service, systemd will attempt to load all the mount units. To prevent this, the mount units will only be loaded during the execution of the preconfigured target.

No API is provided to enable or disable dump collection, since the memory is reserved regardless and would be wasted if nothing used it. Dynamically changing the crashkernel command-line parameter isn't an option, since we plan to support secure boot in the future.

Testing done:
aws-dev x86_64/aarch64, vmware-dev/vmware-k8s-1.20 x86_64:

systemctl status didn't show failed units
Crashed the kernel with echo c > /proc/sysrq-trigger, and verified that the dumps/logs were generated
Verified that existing dump files were deleted
Verified that the correct active boot partition was mounted

k8s variant x86_64:

I ran a stress test to verify that 256MB of memory is enough to collect the kernel dumps, regardless of the host's size and workload. For this test I used a custom build of the aws-k8s-1.20 variant on an m5.8xlarge EC2 instance running 110 pods that only load random data into memory and hold it indefinitely.
During the same test, I also maxed out the number of volumes that can be attached to the instance, since udev was causing OOM errors before I limited the number of child processes it can spawn. I verified that the dumps were generated properly during these tests:

[ec2-user@ip-192-168-0-174 log]$ free -h
              total        used        free      shared  buff/cache   available
Mem:           122G        112G        7.2G         18M        2.8G        8.9G
Swap:            0B          0B          0B

-rw-------. 1 root root  59K May 21 00:14 dmesg.dump
-rw-------. 1 root root 790M May 21 00:14 kdump.dump
-rw-r--r--. 1 root root   77 May 21 00:14 prairie-dog.log

aws-ecs-1, aws-k8s-1.19 x86_64:

systemctl status didn't show failed units
Ran an nginx task/pod
Validated that kexec.service wasn't executed:

● kexec.service - Load crash kernel
     Loaded: loaded (/x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/systemd/system/kexec.service; enabled; vendor preset: enabled)
     Active: inactive (dead)
  Condition: start condition failed at Tue 2021-06-01 21:38:07 UTC; 17min ago
             └─ ConditionKernelCommandLine=crashkernel was not met

Jun 01 21:38:04 localhost systemd[1]: Condition check resulted in Load crash kernel being skipped.
Jun 01 21:38:07 ip-192-168-72-37.us-west-2.compute.internal systemd[1]: Condition check resulted in Load crash kernel being skipped.
Jun 01 21:38:07 ip-192-168-72-37.us-west-2.compute.internal systemd[1]: Condition check resulted in Load crash kernel being skipped.

Custom aws-k8s-1.19 x86_64 build with crashkernel, to validate the 5.4 kernel behavior:

systemctl status didn't show failed units
Ran an nginx pod
Crashed the kernel with echo c > /proc/sysrq-trigger, and verified that the dumps/logs were generated

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

arnaldo2792 · 2021-05-24T19:09:54Z

Update the URL for libelf's spec file

arnaldo2792 · 2021-05-25T16:39:24Z

Remove leftover systemd service

bcressey · 2021-05-26T17:48:53Z

sources/prairie-dog/src/main.rs

+    // Load the panic kernel from `BOOT_MOUNT_PATH`, letting kexec decide which syscall
+    // it should use (KEXEC_LOAD or KEXEC_FILE_LOAD).


Can we force it to always use KEXEC_FILE_LOAD?

Yes, we can, but for some reason the kexec_file_load syscall fails on the 5.4 kernel with an ENOSUP error. The problem is not in the kexec library but in the kernel, so I need to debug the syscall to find out what's returning the error. I'm already working with the kernel folks on this.

In the 5.10 kernel, kexec always uses kexec_file_load, so I think it is OK to use -a for now and let kexec decide which syscall to use, since we are only enabling kdump in variants that use the 5.10 kernel.

Wouldn't we expect to enable this in other variants relatively soon? I imagine people building their own images/variants might want to enable it as well. Wouldn't want a nasty surprise.

Even if we enable this for other variants, and people build their own images/variants, they will still be locked into the kexec_load syscall on the aarch64 5.4 kernel variants, since that syscall was implemented for aarch64 in a kernel version after 5.4.
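To make the `-a` versus forced-`kexec_file_load` trade-off concrete, here is a hedged sketch of the two kexec-tools invocations. `-p` (`--load-panic`), `-a` (`--kexec-syscall-auto`, try kexec_file_load and fall back to kexec_load), `-s` (`--kexec-file-syscall`, kexec_file_load only) and `--append` are documented kexec-tools options; the kernel path, the command line, and the `kexec_args` helper are hypothetical. The final force push in this PR switched prairiedog to forcing kexec_file_load, which corresponds to the `-s` variant.

```rust
/// How kexec should choose between the two load syscalls.
#[derive(Clone, Copy)]
enum Syscall {
    /// `-a`: try kexec_file_load first, fall back to kexec_load.
    Auto,
    /// `-s`: use kexec_file_load only.
    FileLoad,
}

/// Build a `kexec` argument list that loads `kernel` as the panic (crash) kernel.
fn kexec_args(kernel: &str, cmdline: &str, syscall: Syscall) -> Vec<String> {
    let syscall_flag = match syscall {
        Syscall::Auto => "-a",
        Syscall::FileLoad => "-s",
    };
    vec![
        "-p".to_string(), // --load-panic: load into the crashkernel reservation
        syscall_flag.to_string(),
        format!("--append={}", cmdline),
        kernel.to_string(),
    ]
}

fn main() {
    // Hypothetical kernel path and command line, for illustration only.
    let kernel = "/boot/vmlinuz";
    let cmdline = "irqpoll nr_cpus=1 reset_devices";
    for syscall in [Syscall::Auto, Syscall::FileLoad] {
        println!("kexec {:?}", kexec_args(kernel, cmdline, syscall));
    }
}
```

With `-s`, a kernel that cannot service kexec_file_load makes the load fail outright instead of silently falling back, which is the behavior the reviewers preferred once the 5.4 patches landed.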

arnaldo2792 · 2021-06-01T23:32:24Z

Changes in force push:

Update kexec-tools, makedumpfile, and libelf to latest release
Rename prairie-dog to prairiedog
Remove conditionals in os.spec, and build all the added packages regardless of the variant
Add the disable-kexec-load.service systemd unit to always disable kexec* syscalls regardless of the state of kexec.service
Only execute kexec.service if there is memory reserved for the crash kernel
Update cmdline parameters for crash kernel, and use the same parameters as Amazon Linux 2
Add irqpoll to the cmdline for the crash kernel, depending on the target architecture
Move unnecessary mounts from local-fs.target to preconfigured.target

arnaldo2792 · 2021-06-02T21:35:34Z

Force push updates:

Gracefully exit while loading the crash kernel when no memory was reserved for the crash kernel

bcressey · 2021-06-09T16:54:00Z

sources/prairiedog/src/main.rs

+    // Load the panic kernel from `BOOT_MOUNT_PATH`, letting kexec decide which syscall
+    // it should use (KEXEC_LOAD or KEXEC_FILE_LOAD). We will use this setting until
+    // we figure out why kexec doesn't recognize the 5.4 kernel image as a valid file
+    // to be used with KEXEC_FILE_LOAD.


If we're planning to merge this prior to a fix, let's open an issue to track it.

I managed to port the kernel patch that makes the kexec_file_load syscall work in the x86_64 5.4 kernel. However, I'm still waiting for the kernel folks to help me port the syscall for aarch64, since the syscall was introduced for aarch64 in a later kernel version.

I talked to @tjkirch since he expressed concerns about using different syscalls depending on the kernel version/architecture. We agreed that we can proceed with my PR provided that we document in a GH issue that we will force the kexec_file_load syscall once the kernel folks add support for it in the 5.4 kernel.

I think we should document the restrictions in the same place we document the feature, not (only) in a GH issue.

arnaldo2792 · 2021-06-09T23:21:50Z

Forced push includes:

5.4 kernel patch to use kexec_file_load with the 5.4 kernel
makedumpfile patch renamed
Renamed kexec.service, kdump.service and configure-boot-mount.service to load-crash-kernel.service, capture-kernel-dump.service, and prepare-boot.service
Renamed prairiedog command to prepare-boot.service
Moved all systemd services created in this PR to the release package
Label boot partition in rpm2img
Narrowed down the libraries included in the libelf-devel package

This commit adds kexec-tools to all variants

This commit adds libelf to all variants

arnaldo2792 · 2021-06-10T00:48:19Z

Forced push includes:

Remove unnecessary patch in the 5.4 kernel that caused failures in the aarch64 variants' build
Update commit messages

This commit adds makedumpfile to all variants

This commit moves some of the systemd mount units to be part of `preconfigured` instead of `local-fs`. This reduces the overhead of mounting unnecessary mount points while the crash kernel runs.

arnaldo2792 · 2021-06-10T02:31:01Z

Forced push includes:

Added documentation to README
Fixed makedumpfile patch
Addressed suggested nit changes

arnaldo2792 · 2021-06-10T23:15:50Z

Forced push includes fixes to the README

jpculp · 2021-06-11T22:15:01Z

README.md

+There area few important caveats about the provided kdump support:
+
+* Currently, only vmware variants have kdump support enabled
+* The system kernel will reserve 256M for the crash kernel, only when the host has at least 2GB of memory; the reserved space won't be available for processes running in the host


Nit. I would change 256M to 256MB to be consistent with "at least 2GB of memory".

arnaldo2792 · 2021-06-14T21:05:50Z

Forced push includes:

Patches to make the kexec_file_load syscall work in arm64
Fixed nit in README
Force the kexec_file_load syscall in prairiedog

bcressey · 2021-06-14T21:29:36Z

README.md

+Bottlerocket provides support to collect kernel crash dumps whenever the system kernel panics.
+Once this happens, both the dmesg log and vmcore dump are stored at `/var/log/kdump`, and the system reboots.
+
+There area few important caveats about the provided kdump support:


Suggested change

There area few important caveats about the provided kdump support:

There are a few important caveats about the provided kdump support:

arnaldo2792 · 2021-06-14T21:51:48Z

Forced push includes nit fix

arnaldo2792 requested review from bcressey and tjkirch May 22, 2021 03:02

arnaldo2792 force-pushed the kdump-support branch from 7c27b68 to 84d15d2 May 24, 2021 19:09

arnaldo2792 force-pushed the kdump-support branch from 84d15d2 to 1c2a6ab May 25, 2021 16:38

bcressey reviewed May 26, 2021

arnaldo2792 force-pushed the kdump-support branch from 1c2a6ab to cded56a June 1, 2021 23:26

arnaldo2792 force-pushed the kdump-support branch from cded56a to dd02418 June 2, 2021 21:33

arnaldo2792 marked this pull request as ready for review June 2, 2021 23:12

arnaldo2792 requested a review from bcressey June 3, 2021 00:24

arnaldo2792 linked an issue Jun 4, 2021 that may be closed by this pull request

Add kdump support #1413

Closed

bcressey reviewed Jun 9, 2021

arnaldo2792 force-pushed the kdump-support branch from dd02418 to 3883f7f June 9, 2021 23:16

arnaldo2792 requested a review from bcressey June 9, 2021 23:38

packages: add kexec-tools

e0662ae

This commit adds kexec-tools to all variants

bcressey approved these changes Jun 10, 2021

packages/makedumpfile/0000-fix-strip-invocation-for-TARGET-env-variable.patch (outdated, resolved)

packages/os/kdump-tmpfiles.conf (outdated, resolved)

packages/release/release.spec (resolved)

packages: add libelf

2681050

This commit adds libelf to all variants

bcressey self-requested a review June 10, 2021 00:21

arnaldo2792 force-pushed the kdump-support branch from 3883f7f to b78d624 June 10, 2021 00:42

arnaldo2792 added 2 commits June 10, 2021 01:37

packages: add makedumpfile

072a9d1

This commit adds makedumpfile to all variants

systemd: move systemd mounts to preconfigured.target

322d22e

This commit moves some of the systemd mount units to be part of `preconfigured` instead of `local-fs`. This reduces the overhead of mounting unnecessary mount points while the crash kernel runs.

arnaldo2792 force-pushed the kdump-support branch from b78d624 to 6642d58 June 10, 2021 02:29

bcressey reviewed Jun 10, 2021

README.md (3 outdated, resolved threads)

arnaldo2792 force-pushed the kdump-support branch from 6642d58 to 441d77c June 10, 2021 23:15

bcressey approved these changes Jun 11, 2021

arnaldo2792 requested review from webern, jahkeup and jpculp June 11, 2021 20:53

jpculp reviewed Jun 11, 2021

jpculp approved these changes Jun 14, 2021

kernel-5.4: add patch required for kdump support

4202f11

arnaldo2792 force-pushed the kdump-support branch from 441d77c to 5721535 June 14, 2021 21:03

arnaldo2792 requested review from bcressey and jpculp June 14, 2021 21:18

bcressey approved these changes Jun 14, 2021

arnaldo2792 force-pushed the kdump-support branch from 5721535 to 9e4c2cc June 14, 2021 21:50

jpculp approved these changes Jun 14, 2021

arnaldo2792 merged commit 36f3720 into bottlerocket-os:develop Jun 14, 2021

arnaldo2792 deleted the kdump-support branch June 16, 2021 22:48
