kdump.crash test failing on aarch64 with kernel 6.2+ #1430

Closed
dustymabe opened this issue Feb 23, 2023 · 12 comments · Fixed by coreos/fedora-coreos-config#2732

dustymabe commented Feb 23, 2023

The kdump.service is failing in the ext.config.kdump.crash test with:

=== RUN   ext.config.kdump.crash
2023-02-23T14:22:45Z platform: some systemd units failed: [kdump.service]
systemctl status kola-runext.service:
× kola-runext.service
     Loaded: loaded (/etc/systemd/system/kola-runext.service; static)
     Active: failed (Result: exit-code) since Thu 2023-02-23 14:22:48 UTC; 1s ago
   Duration: 39ms
    Process: 5655 ExecStart=/usr/local/bin/kola-runext-test.sh (code=exited, status=1/FAILURE)
   Main PID: 5655 (code=exited, status=1/FAILURE)
        CPU: 17ms

Feb 23 14:22:48 qemu0 kola-runext-test.sh[5655]: + . /var/opt/kola/extdata/commonlib.sh
Feb 23 14:22:48 qemu0 systemd[1]: Started kola-runext.service.
Feb 23 14:22:48 qemu0 kola-runext-test.sh[5655]: ++ cmdline=($(< /proc/cmdline))
Feb 23 14:22:48 qemu0 kola-runext-test.sh[5655]: + case "${AUTOPKGTEST_REBOOT_MARK:-}" in
Feb 23 14:22:48 qemu0 kola-runext-test.sh[5655]: + is_service_active kdump.service
Feb 23 14:22:48 qemu0 kola-runext-test.sh[5655]: + local service=kdump.service
Feb 23 14:22:48 qemu0 kola-runext-test.sh[5655]: + for x in {0..20}
Feb 23 14:22:48 qemu0 kola-runext-test.sh[5656]: ++ systemctl is-active kdump.service
Feb 23 14:22:48 qemu0 kola-runext-test.sh[5655]: + '[' failed '!=' activating ']'
Feb 23 14:22:48 qemu0 kola-runext-test.sh[5655]: + break
Feb 23 14:22:48 qemu0 kola-runext-test.sh[5655]: + systemctl is-active kdump.service
Feb 23 14:22:48 qemu0 kola-runext-test.sh[5657]: failed
Feb 23 14:22:48 qemu0 kola-runext-test.sh[5655]: + fatal 'kdump.service failed to start'
Feb 23 14:22:48 qemu0 kola-runext-test.sh[5655]: + echo 'kdump.service failed to start'
Feb 23 14:22:48 qemu0 kola-runext-test.sh[5655]: kdump.service failed to start
Feb 23 14:22:48 qemu0 kola-runext-test.sh[5655]: + exit 1
Feb 23 14:22:48 qemu0 systemd[1]: kola-runext.service: Main process exited, code=exited, status=1/FAILURE
Feb 23 14:22:48 qemu0 systemd[1]: kola-runext.service: Failed with result 'exit-code'.
--- FAIL: ext.config.kdump.crash (85.49s)
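
Reading the xtrace above, `is_service_active` appears to poll `systemctl is-active` up to 21 times and to stop as soon as the unit leaves the `activating` state. A minimal Python sketch of that polling pattern follows. The function names, the one-second delay, and the injectable `probe` parameter are assumptions made for illustration; none of them are taken from `commonlib.sh`.

```python
import subprocess
import time
from typing import Callable


def systemctl_is_active(service: str) -> str:
    """Return the state string printed by `systemctl is-active <service>`."""
    result = subprocess.run(
        ["systemctl", "is-active", service],
        capture_output=True,
        text=True,
        check=False,  # is-active exits non-zero for inactive/failed units
    )
    return result.stdout.strip()


def wait_until_settled(
    service: str,
    probe: Callable[[str], str] = systemctl_is_active,
    attempts: int = 21,   # mirrors `for x in {0..20}` in the trace
    delay: float = 1.0,   # assumed; the trace does not show the sleep interval
) -> str:
    """Poll until the unit is no longer 'activating', then return its state."""
    state = probe(service)
    for _ in range(attempts - 1):
        if state != "activating":
            break
        time.sleep(delay)
        state = probe(service)
    return state
```

In the failing run the very first probe already returned `failed`, so the loop exited immediately and the test reported `kdump.service failed to start`.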

This test sets the crashkernel=512M kernel parameter.
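
To confirm that the reservation was actually requested on a given boot, the value can be read back from the kernel command line. The sketch below is a hypothetical helper that handles only the simple `crashkernel=<size>[KMG]` form used by this test; the range and `@offset` syntaxes that the kernel also accepts are deliberately not covered.

```python
import re
from typing import Optional

_UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def crashkernel_bytes(cmdline: str) -> Optional[int]:
    """Return the simple crashkernel= reservation in bytes, or None."""
    for token in cmdline.split():
        if not token.startswith("crashkernel="):
            continue
        value = token.split("=", 1)[1]
        match = re.fullmatch(r"(\d+)([KMG]?)", value, flags=re.IGNORECASE)
        if match is None:
            return None  # a more complex form that this sketch does not parse
        return int(match.group(1)) * _UNITS[match.group(2).upper()]
    return None


if __name__ == "__main__":
    # Reading /proc/cmdline assumes a Linux host.
    with open("/proc/cmdline", encoding="utf-8") as fh:
        print(crashkernel_bytes(fh.read()))
```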

The failure in the logs looks like:

Feb 23 19:45:50 cosa-devsh dracut[1267]: *** Hardlinking files done ***
Feb 23 19:45:50 cosa-devsh dracut[1267]: *** Store current command line parameters ***
Feb 23 19:45:50 cosa-devsh dracut[1267]: Stored kernel commandline:
Feb 23 19:45:50 cosa-devsh dracut[1267]: No dracut internal kernel commandline stored in the initramfs
Feb 23 19:45:50 cosa-devsh dracut[1267]: *** Install squash loader ***
Feb 23 19:45:51 cosa-devsh dracut[1267]: *** Squashing the files inside the initramfs ***
Feb 23 19:45:52 cosa-devsh dracut[1267]: *** Squashing the files inside the initramfs done ***
Feb 23 19:45:52 cosa-devsh dracut[1267]: *** Creating image file '/var/lib/kdump/initramfs-6.2.0-0.rc8.20230217gitec35307e18ba.60.fc39.aarch64kdump.img' ***
Feb 23 19:45:52 cosa-devsh dracut[1267]: *** Creating initramfs image file '/var/lib/kdump/initramfs-6.2.0-0.rc8.20230217gitec35307e18ba.60.fc39.aarch64kdump.img' done ***
Feb 23 19:45:52 cosa-devsh kdumpctl[871]: kdump: kexec: failed to load kdump kernel
Feb 23 19:45:52 cosa-devsh kdumpctl[871]: kdump: Starting kdump: [FAILED]
Feb 23 19:45:52 cosa-devsh systemd[1]: kdump.service: Main process exited, code=exited, status=1/FAILURE
Feb 23 19:45:52 cosa-devsh systemd[1]: kdump.service: Failed with result 'exit-code'.
Feb 23 19:45:52 cosa-devsh systemd[1]: Failed to start kdump.service - Crash recovery kernel arming.
Feb 23 19:45:52 cosa-devsh systemd[1]: kdump.service: Consumed 24.003s CPU time.

Of the most recent kernels this appears to be the pass/fail summary:

@dustymabe dustymabe changed the title kdump.crash test failing on aarch64 with kernel 6.2 kernel-6.2.0-0.rc8.20230217gitec35307e18ba.60.fc39 and later kdump.crash test failing on aarch64 with kernel 6.2+ Feb 23, 2023
dustymabe added a commit to dustymabe/fedora-coreos-config that referenced this issue Feb 23, 2023
dustymabe added a commit to coreos/fedora-coreos-config that referenced this issue Feb 23, 2023
@dustymabe

Some contents of the /var/log/kdump.log on one of these machines:

+ 2023-02-23 19:54:59 /usr/bin/kdumpctl@665: /sbin/kexec -s -d -p '--command-line=BOOT_IMAGE=(hd0,gpt3)/ostree/fedora-coreos-d1d363af210b0f07906533c7ff8998c775c8408d1df6898b7d5c5e4477823301/vmlinuz-6.2.0-0.rc8.20230217gitec35307e18ba.60.fc39.aarch64 mitigations=auto,nosmt ignition.platform.id=qemu rw rootflags=prjquota boot=UUID=8166125f-f289-4275-b441-c2886518f04c ostree=/ostree/boot.1/fedora-coreos/d1d363af210b0f07906533c7ff8998c775c8408d1df6898b7d5c5e4477823301/0 nr_cpus=1 reset_devices cgroup_disable=memory udev.children-max=2 panic=10 swiotlb=noforce novmcoredd cma=0 hugetlb_cma=0' --initrd=/var/lib/kdump/initramfs-6.2.0-0.rc8.20230217gitec35307e18ba.60.fc39.aarch64kdump.img /boot/ostree/fedora-coreos-d1d363af210b0f07906533c7ff8998c775c8408d1df6898b7d5c5e4477823301/vmlinuz-6.2.0-0.rc8.20230217gitec35307e18ba.60.fc39.aarch64
kernel symbol _text vaddr = ffffd8935f9a0000
kernel symbol _stext vaddr = ffffd8935f9b0000
kernel symbol __init_begin vaddr = ffffd893619b0000
arch_process_options:178: command_line: BOOT_IMAGE=(hd0,gpt3)/ostree/fedora-coreos-d1d363af210b0f07906533c7ff8998c775c8408d1df6898b7d5c5e4477823301/vmlinuz-6.2.0-0.rc8.20230217gitec35307e18ba.60.fc39.aarch64 mitigations=auto,nosmt ignition.platform.id=qemu rw rootflags=prjquota boot=UUID=8166125f-f289-4275-b441-c2886518f04c ostree=/ostree/boot.1/fedora-coreos/d1d363af210b0f07906533c7ff8998c775c8408d1df6898b7d5c5e4477823301/0 nr_cpus=1 reset_devices cgroup_disable=memory udev.children-max=2 panic=10 swiotlb=noforce novmcoredd cma=0 hugetlb_cma=0
arch_process_options:180: initrd: /var/lib/kdump/initramfs-6.2.0-0.rc8.20230217gitec35307e18ba.60.fc39.aarch64kdump.img
arch_process_options:182: dtb: (null)
arch_process_options:185: console: (null)
Try gzip decompression.
Try LZMA decompression.
elf_arm64_probe: Not an ELF executable.
image_arm64_probe: Bad arm64 image header.
Cannot open `MZ': (null)
zImage_arm64_probe: Not an zImage file (Image.gz).
Cannot determine the file type of /boot/ostree/fedora-coreos-d1d363af210b0f07906533c7ff8998c775c8408d1df6898b7d5c5e4477823301/vmlinuz-6.2.0-0.rc8.20230217gitec35307e18ba.60.fc39.aarch64
+ 2023-02-23 19:54:59 /usr/bin/kdumpctl@669: ret=255
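
The probe messages above show kexec trying several image formats in turn and rejecting each. A quick way to see what the `vmlinuz` file looks like is to check its leading magic bytes. The sketch below is a rough diagnostic aid, not a reimplementation of the kexec-tools probe logic. The `MZ` in the log hints that the file starts with a PE/COFF header; treating `MZ` followed by `zimg` as an EFI zboot image is an assumption about this kernel build that the log itself does not confirm.

```python
def classify_kernel_image(header: bytes) -> str:
    """Guess the container format from the first 64 bytes of a kernel image."""
    if header[:2] == b"\x1f\x8b":
        return "gzip"
    if header[:4] == b"\x7fELF":
        return "elf"
    # Assumed EFI zboot layout: "MZ" at offset 0 and "zimg" at offset 4.
    if header[:2] == b"MZ" and header[4:8] == b"zimg":
        return "efi-zboot"
    # The arm64 Image header carries the magic "ARM\x64" at offset 56.
    if header[56:60] == b"ARM\x64":
        return "arm64-image"
    if header[:2] == b"MZ":
        return "pe"
    return "unknown"


def classify_kernel_file(path: str) -> str:
    """Read the first 64 bytes of `path` and classify them."""
    with open(path, "rb") as fh:
        return classify_kernel_image(fh.read(64))
```

Running `classify_kernel_file()` against the `vmlinuz` path from the log, on an affected machine, would show which of these headers the image carries.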

I asked around internally at RH and was pointed to an existing issue to follow: https://gitlab.com/redhat/centos-stream/tests/kernel/kernel-tests/-/issues/1580

@dustymabe

This is still a problem. I asked again internally to see if there is any progress on resolution.

For now I'll bump the snooze.

@dustymabe

Note that the 6.2 kernel will probably ship to our production streams before this issue is fixed. I don't think it is a blocker, but I will raise it at the community meeting today to see if anyone has other opinions.

@dustymabe dustymabe added the meeting topics for meetings label Mar 22, 2023
dustymabe added a commit to coreos/fedora-coreos-config that referenced this issue Mar 22, 2023
@dustymabe

We discussed this in the community meeting today.

13:07:17* dustymabe | #agreed The 6.2 kernel will introduce a regression for kdump on
                    | aarch64. While bad, we don't think this should block shipping
                    | kernel 6.2.

@dustymabe dustymabe removed the meeting topics for meetings label Mar 22, 2023
dustymabe added a commit to dustymabe/fedora-coreos-config that referenced this issue Mar 23, 2023
The 6.2 kernel is now heading towards our production streams so
the failure will propagate there.

See coreos/fedora-coreos-tracker#1430
dustymabe added a commit to coreos/fedora-coreos-config that referenced this issue Mar 23, 2023
marmijo added a commit to marmijo/fedora-coreos-config that referenced this issue Apr 12, 2023
dustymabe pushed a commit to coreos/fedora-coreos-config that referenced this issue Apr 12, 2023
coreosbot-releng pushed a commit to coreosbot-releng/os that referenced this issue Apr 13, 2023
Dusty Mabe (2):
      tests/kola: upgrade.extended: get info about booted deployment
      manifests/fedora-coreos-base: surgically remove qcom dtb files on aarch64

Jonathan Lebon (9):
      40ignition-ostree: run ignition-ostree-growfs before sysroot mount
      40ignition-ostree: factor out zram-related functions
      40ignition-ostree: skip udev hack if Ignition did not reprovision rootfs
      40ignition-ostree: add autosave-xfs transposefs unit
      tests/kola: add non-exclusive check for growfs
      tests/kola: move LUKS checks to shared file
      tests/kola: add autosave-xfs tests
      tests/kola: bump `minDisk` in autosave-xfs tests
      40ignition-ostree: give filesystem type when mounting zram-based XFS

Michael Armijo (2):
      denylist: snooze autosave-xfs and luks.autosave-xfs for aarch64 and ppc64le These tests are failing and blocking fcos pipeline multi-arch builds. See: coreos/fedora-coreos-tracker#1458
      denylist: bump snooze for ext.config.kdump.crash on aarch64 This is still causing issues. See: coreos/fedora-coreos-tracker#1430

Renata Ravanelli (4):
      overlay.d: create new 30gcp-udev-rules overlay
      overlay.d: Add/Update udev rules for GCP
      Add 30gcp-udev-rules overlay to the manifest
      overlay.d: Add 30gcp-udev-rules dracut module
coreosbot-releng pushed a commit to coreosbot-releng/os that referenced this issue Apr 14, 2023
c4rt0 pushed a commit to c4rt0/fedora-coreos-config that referenced this issue May 17, 2023
@dustymabe

This is still an issue. I've asked the kdump developers whether there is a specific upstream tracker for this.

@dustymabe

Upstream patch set to try to address this problem: https://www.spinics.net/lists/kexec/msg31553.html

Hopefully landing upstream (and then in Fedora) soon.

marmijo added a commit to coreos/fedora-coreos-config that referenced this issue Aug 2, 2023
@dustymabe

AFAICT this hasn't landed yet. Watching https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/log/?h=main

@dustymabe

This landed in kexec-tools upstream with the 5 commits ending with https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/commit/?h=main&id=f67c4146d7b52bfc95f0d21353f2112c6ab3570d

@dustymabe dustymabe added the status/pending-upstream-release Fixed upstream. Waiting on an upstream component source code release. label Aug 11, 2023
gursewak1997 added a commit to gursewak1997/fedora-coreos-config that referenced this issue Aug 16, 2023
dustymabe pushed a commit to coreos/fedora-coreos-config that referenced this issue Aug 16, 2023
@dustymabe

I believe this will be in the kexec-tools 2.0.27 release, so we should be able to test it once that lands in an RPM in Fedora.
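
Once a build is available, a numeric version comparison gives a first hint as to whether a machine has a new enough kexec-tools. The sketch below is a hypothetical check on the upstream version string only. It assumes a plain dotted version such as `2.0.27` with any RPM release suffix (for example `-1.fc39`) already stripped, and it cannot account for distribution backports.

```python
from typing import Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn a dotted version like '2.0.27' into (2, 0, 27) for comparison."""
    return tuple(int(part) for part in version.split("."))


def has_expected_fix(installed: str, fixed_in: str = "2.0.27") -> bool:
    """True if `installed` is at least the release expected to carry the fix."""
    return version_tuple(installed) >= version_tuple(fixed_in)
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as `"2.0.100" < "2.0.27"`.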

@dustymabe

kexec-tools 2.0.27 has landed in rawhide, but not anywhere else yet. However, the test still isn't passing because of an SELinux issue: #1560

mogeko added a commit to mogeko/nas-config that referenced this issue Sep 26, 2023
kexec doesn't work yet on 6.2+ kernels
See: coreos/fedora-coreos-tracker#1430
HuijingHei pushed a commit to HuijingHei/fedora-coreos-config that referenced this issue Oct 10, 2023
The 6.2 kernel is now heading towards our production streams so
the failure will propagate there.

See coreos/fedora-coreos-tracker#1430
@dustymabe

> The kexec-tools 2.0.27 landed in rawhide but not anywhere else yet. However the test still isn't passing because of an SELinux issue: #1560

kexec-tools 2.0.27 is now in F39 (next/next-devel) and the SELinux issue has been fixed. The test is passing on aarch64 in F39 and rawhide.

We can close this out once the other production streams have moved to F39 with kexec-tools 2.0.27.

@dustymabe

testing-devel has been moved over to F39, and the other streams will follow from that.

PR to drop the denylist entry in coreos/fedora-coreos-config#2732

@dustymabe dustymabe removed the status/pending-upstream-release Fixed upstream. Waiting on an upstream component source code release. label Jan 17, 2024