[rawhide]: In ppc64le, kdump fails to generate crash dump file after kernel crash #1523

Closed
gursewak1997 opened this issue Jul 5, 2023 · 6 comments · Fixed by coreos/fedora-coreos-config#2531

@gursewak1997
Member

gursewak1997 commented Jul 5, 2023

Describe the bug

The kdump.crash test fails to generate a vmcore file after the crash is triggered on ppc64le in the latest rawhide builds.

Jul 04 14:28:32 qemu0 kola-runext-test.sh[5215]: + echo 'Triggering sysrq'
Jul 04 14:28:32 qemu0 kola-runext-test.sh[5215]: Triggering sysrq
Jul 04 14:28:32 qemu0 kola-runext-test.sh[5215]: + sync
-- Boot 2f52d8f955e94b93ae18537b4a89f764 --
Jul 04 14:29:13 qemu0 kola-runext-test.sh[4612]: + . /var/opt/kola/extdata/commonlib.sh
Jul 04 14:29:13 qemu0 kola-runext-test.sh[4612]: ++ cmdline=($(< /proc/cmdline))
Jul 04 14:29:13 qemu0 kola-runext-test.sh[4612]: + case "${AUTOPKGTEST_REBOOT_MARK:-}" in
Jul 04 14:29:13 qemu0 kola-runext-test.sh[4631]: ++ find /var/crash -type f -name vmcore
Jul 04 14:29:13 qemu0 kola-runext-test.sh[4612]: + kcore=
Jul 04 14:29:13 qemu0 kola-runext-test.sh[4612]: + test -z ''
Jul 04 14:29:13 qemu0 kola-runext-test.sh[4612]: + fatal 'No kcore found in /var/crash'
Jul 04 14:29:13 qemu0 kola-runext-test.sh[4612]: + echo 'No kcore found in /var/crash'
Jul 04 14:29:13 qemu0 kola-runext-test.sh[4612]: No kcore found in /var/crash
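For context, the trace above suggests a simple post-reboot check: search /var/crash for a file named vmcore and fail if none exists. The following is a minimal sketch of that logic, assuming bash and a find that supports -quit (GNU findutils does). The function name find_vmcore is hypothetical, and this is not the actual kdump.crash test source.

```shell
#!/bin/bash
# Hypothetical sketch of the post-reboot check implied by the kola trace.
# find_vmcore is an illustrative name, not the real kdump.crash test code.

# Print the first vmcore found under DIR (default: /var/crash).
# If none exists, print the same message the test logs and return 1.
find_vmcore() {
    local dir="${1:-/var/crash}"
    local kcore
    # kdump saves each dump in a per-crash subdirectory
    # (e.g. 127.0.0.1-2023-07-04-14:50:03/), so search recursively
    # and stop at the first match. Errors such as a missing DIR are
    # silenced; they simply result in an empty kcore.
    kcore=$(find "$dir" -type f -name vmcore -print -quit 2>/dev/null)
    if test -z "$kcore"; then
        echo "No kcore found in $dir" >&2
        return 1
    fi
    echo "$kcore"
}
```

In the failing run, kdump-capture.service could not open /proc/vmcore, so only vmcore-dmesg and kexec-dmesg artifacts were attempted and no vmcore was written; a check like this would then return 1.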

Rawhide failed build
console.txt
Possible package transition that caused it:
kernel 6.5.0-0.rc0.20230630gite55e5df193d2.5.fc39 -> 6.5.0-0.rc0.20230703gita901a3568fd2.8.fc39

Denylisting the test on ppc64le for now because we still need to keep running tests using the latest kernel.

Expected behavior

There should be a vmcore file under /var/crash/.

Actual behavior

No kcore found in /var/crash
From console.log

[    6.385256] systemd[1]: Finished dracut-pre-pivot.service - dracut pre-pivot and cleanup hook.
[    6.407941] systemd[1]: Starting kdump-capture.service - Kdump Vmcore Save Service...
[    6.477560] kdump.sh[422]: kdump: saving to /sysroot/ostree/deploy/fedora-coreos/var/crash/127.0.0.1-2023-07-04-14:50:03/
[    6.521397] kdump.sh[422]: kdump: saving vmcore-dmesg.txt to /sysroot/ostree/deploy/fedora-coreos/var/crash/127.0.0.1-2023-07-04-14:50:03/
[    6.523297] kdump.sh[467]: Cannot open /proc/vmcore: No such file or directory
[    6.525103] kdump.sh[422]: kdump: saving vmcore-dmesg.txt failed
[    6.525527] kdump.sh[422]: kdump: saving vmcore
[    6.542845] kdump.sh[469]: open_dump_memory: Can't open the dump memory(/proc/vmcore). No such file or directory
[    6.545123] kdump.sh[469]: makedumpfile Failed.
[    6.545670] kdump.sh[422]: kdump: saving vmcore failed, exitcode:1
[    6.546082] kdump.sh[422]: kdump: saving vmcore failed
[    6.560905] kdump.sh[422]: kdump: saving the /run/initramfs/kexec-dmesg.log to /sysroot/ostree/deploy/fedora-coreos/var/crash/127.0.0.1-2023-07-04-14:50:03///
[    6.566762] systemd[1]: kdump-capture.service: Main process exited, code=exited, status=1/FAILURE
[    6.567049] systemd[1]: kdump-capture.service: Failed with result 'exit-code'.
[    6.567293] systemd[1]: Failed to start kdump-capture.service - Kdump Vmcore Save Service.

System details

[rawhide][ppc64le] ⚡ 39.20230704.91.0

@gursewak1997
Member Author

marmijo added a commit to marmijo/fedora-coreos-config that referenced this issue Jul 24, 2023
aaradhak pushed a commit to coreos/fedora-coreos-config that referenced this issue Jul 24, 2023
@dustymabe
Member

The problematic commit was reverted in 106ea7ffd, which is in v6.5-rc3. kernel-6.5.0-0.rc3.20230727git0a8db05b571a.26.fc39 is in rawhide now, so we should be able to unpin this.

gursewak1997 added a commit to gursewak1997/fedora-coreos-config that referenced this issue Jul 29, 2023
dustymabe pushed a commit to coreos/fedora-coreos-config that referenced this issue Jul 30, 2023
coreosbot-releng pushed a commit to coreosbot-releng/os that referenced this issue Jul 31, 2023
Dusty Mabe (1):
      tests/kola: ignore newlines in default network behavior change test

Michael Armijo (2):
      overrides: pin kernel-6.3.12-200.fc38
      Revert "denylist: snooze root-reprovision on testing-devel"

gursewak1997 (1):
      denylist: drop kdump.crash for ppc64le rawhide Closes: coreos/fedora-coreos-tracker#1523
coreosbot-releng pushed a commit to coreosbot-releng/os that referenced this issue Aug 1, 2023
Dusty Mabe (1):
      tests/kola: ignore newlines in default network behavior change test

Luke Yang (1):
      tests/kola: add space after ! in YAML architecture field

Michael Armijo (2):
      overrides: pin kernel-6.3.12-200.fc38
      Revert "denylist: snooze root-reprovision on testing-devel"

gursewak1997 (1):
      denylist: drop kdump.crash for ppc64le rawhide Closes: coreos/fedora-coreos-tracker#1523
coreosbot-releng pushed a commit to coreosbot-releng/os that referenced this issue Aug 2, 2023
Dusty Mabe (1):
      tests/kola: ignore newlines in default network behavior change test

Joseph Marrero (1):
      ignition-ostree: make sure we don't mount /sysroot before transposefs

Luke Yang (1):
      tests/kola: add space after ! in YAML architecture field

Michael Armijo (2):
      overrides: pin kernel-6.3.12-200.fc38
      Revert "denylist: snooze root-reprovision on testing-devel"

gursewak1997 (1):
      denylist: drop kdump.crash for ppc64le rawhide Closes: coreos/fedora-coreos-tracker#1523
coreosbot-releng pushed a commit to coreosbot-releng/os that referenced this issue Aug 2, 2023
Dusty Mabe (1):
      tests/kola: ignore newlines in default network behavior change test

Joseph Marrero (2):
      ignition-ostree: make sure we don't mount /sysroot before transposefs
      ignition-ostree: remove not needed Before= checks

Luke Yang (1):
      tests/kola: add space after ! in YAML architecture field

Michael Armijo (3):
      overrides: pin kernel-6.3.12-200.fc38
      Revert "denylist: snooze root-reprovision on testing-devel"
      denylist: bump snooze for ext.config.kdump.crash on aarch64

gursewak1997 (1):
      denylist: drop kdump.crash for ppc64le rawhide Closes: coreos/fedora-coreos-tracker#1523
coreosbot-releng pushed a commit to coreosbot-releng/os that referenced this issue Aug 3, 2023
Dusty Mabe (1):
      tests/kola: ignore newlines in default network behavior change test

Joseph Marrero (2):
      ignition-ostree: make sure we don't mount /sysroot before transposefs
      ignition-ostree: remove not needed Before= checks

Luke Yang (1):
      tests/kola: add space after ! in YAML architecture field

Michael Armijo (3):
      overrides: pin kernel-6.3.12-200.fc38
      Revert "denylist: snooze root-reprovision on testing-devel"
      denylist: bump snooze for ext.config.kdump.crash on aarch64

gursewak1997 (1):
      denylist: drop kdump.crash for ppc64le rawhide Closes: coreos/fedora-coreos-tracker#1523
@gursewak1997
Member Author

We are seeing this issue again. We weren't running the kdump.crash test in FCOS-Rawhide because of another selinux-policy-related issue, so we didn't catch this early. The kernel version transition that appears to have caused it is kernel-6.6.0-0.rc0.20230829git1c59d383390f.59.fc40 -> kernel-doc-6.6.0-0.rc0.20230830git6c1b980a7e79.1.fc40.

In short, the kdump.crash test in Rawhide:
Passes with kernel-6.6.0-0.rc0.20230829git1c59d383390f.59.fc40
Fails with kernel-doc-6.6.0-0.rc0.20230830git6c1b980a7e79.1.fc40

Updated the BZ as well: https://bugzilla.redhat.com/show_bug.cgi?id=2222526#c10

@dustymabe
Member

@gursewak1997 I think this is probably a new regression. Can you open a new bug with the new information?

@sharkcz

sharkcz commented Oct 3, 2023

rc0 snapshots mean a high rate of changes during the merge window, so breakages are possible. By rc2, I would expect most of the fallout from the merge window to be resolved. How does a more recent kernel build behave (rc4 has been released)?

@dustymabe
Copy link
Member

> @gursewak1997 I think this is probably a new regression. Can you open a new bug with the new information?

He opened #1588

HuijingHei pushed a commit to HuijingHei/fedora-coreos-config that referenced this issue Oct 10, 2023
3 participants