Drop ppc64le PXE GRUB workaround #3370

Closed
aaradhak opened this issue Feb 22, 2023 · 17 comments

@aaradhak
Member

aaradhak commented Feb 22, 2023

Bug Report

The ppc64le pipeline runs build-684 and build-1183 both failed in the pxe-install and pxe-install-4k kola tests.

The failures are caused by a GRUB2 regression that entered cosa. They first appeared after GRUB2 was updated from grub2-2.06-75.fc37 to grub2-2.06-84.fc37.

pxe-install failure:
[2023-02-22T19:30:50.920Z] FAIL: pxe-install ( + metal) (10m0.005s)
[2023-02-22T19:30:50.920Z]     timed out after 10m0s
[2023-02-22T19:30:50.920Z] Error: timed out after 10m0s
[2023-02-22T19:30:50.920Z] 2023-02-22T19:31:01Z cli: timed out after 10m0s
[2023-02-22T19:30:50.920Z] error: failed to execute cmd-kola: exit status 1
script returned exit code 1


pxe-install 4k failure:
[2023-02-22T19:30:50.920Z] FAIL: pxe-install ( + metal4k) (10m0.002s)
[2023-02-22T19:30:50.920Z]     timed out after 10m0s
[2023-02-22T19:30:50.920Z] Error: timed out after 10m0s
[2023-02-22T19:30:50.920Z] 2023-02-22T19:31:01Z cli: timed out after 10m0s
[2023-02-22T19:30:50.920Z] error: failed to execute cmd-kola: exit status 1
script returned exit code 1

Log:
console.txt

The relevant error message from the console looks like:

  Welcome to Open Firmware

  Copyright (c) 2004, 2017 IBM Corporation All rights reserved.
  This program and the accompanying materials are made available
  under the terms of the BSD License available at
  http://www.opensource.org/licenses/bsd-license.php


Trying to load:  from: /pci@800000020000000/scsi@3 ... 
E3404: Not a bootable device!
Trying to load:  from: /pci@800000020000000/ethernet@2 ... 
 Initializing NIC
  Reading MAC address from device: 52:54:00:12:34:56
  Requesting information via DHCP: done
  Using IPv4 address: 192.168.76.9
  Requesting file "/boot/grub2/powerpc-ieee1275/core.elf" via TFTP from 192.168.76.2
  Receiving data:  318 KBytes
  TFTP: Received /boot/grub2/powerpc-ieee1275/core.elf (318 KBytes)
  Successfully loaded
Welcome to GRUB!

error: ../../grub-core/script/function.c:119:can't find command `source'.
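
Because kola only reports a 10-minute timeout, the GRUB error above is easy to miss. A minimal shell sketch for checking a saved console log for this signature could look like the following. The function name `has_grub_source_error` and the `console.txt` path are illustrative assumptions, not part of kola or cosa.

```shell
# Hypothetical helper (not part of kola or cosa): succeed if a console log
# contains the GRUB "can't find command `source'" error quoted in this issue.
has_grub_source_error() {
    local logfile="$1"
    # -F: match a fixed string, since the message contains a backtick and a quote.
    # -q: print nothing; only the exit status matters.
    grep -Fq "can't find command \`source'" "$logfile"
}

# Usage (the log path is an assumption):
# if has_grub_source_error console.txt; then
#     echo "GRUB failed to find the 'source' command"
# fi
```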

@dustymabe
Member

Let's freeze grub in COSA for now: #3371

@jlebon
Member

jlebon commented Nov 14, 2023

Note we're planning to drop the GRUB pin in #3653. We should verify if this is still an issue in f39 (likely, since the RHBZ is still open), and if so, then just denylist the PXE tests on ppc64le for now. (Denylist is a bit funny, since it's about the binaries in cosa, not in the host, but it's much better at keeping track of outstanding issues than having it commented out in testiso.go.)

@dustymabe
Member

We can just do a cosa build from that branch and then use it as the COREOS_ASSEMBLER_IMAGE for a bump-lockfile job to see if there would be any fallout from it.

@jlebon
Member

jlebon commented Nov 17, 2023

This is indeed still an issue. Denylisted in coreos/fedora-coreos-config#2733 and openshift/os#1397.

@jlebon
Member

jlebon commented Nov 23, 2023

We added a workaround in #3661 which now allows us to re-enable the tests in openshift/os#1399 and coreos/fedora-coreos-config#2742.

Let's keep this bug open to drop the workaround once GRUB is properly fixed.

jlebon changed the title from "pxe install & pxe install 4k tests are failing due to the new GRUB version" to "Drop ppc64le PXE GRUB workaround" on Nov 23, 2023
@jlebon
Member

jlebon commented Jan 19, 2024

https://bodhi.fedoraproject.org/updates/FEDORA-2024-53d986312e is marked as fixing https://bugzilla.redhat.com/show_bug.cgi?id=2173015. Once that hits stable and enters cosa, we should be able to drop the workaround and verify it still works.

@aaradhak
Member Author

@dustymabe I am getting the error below when trying to run the pxe-install test. I created a [4.16-9.4][ppc64le] debug pod, removed the workaround code in cosa, built a custom cosa on ppc64le, pushed it to quay, and then started another debug pod with the custom cosa. I also removed pxe-install from the kola denylist, but I still see the error below.

[coreos-assembler]$ kola testiso -S pxe-install --output-dir tmp/kola-metal
Error: harness: no tests to run
2024-02-23T03:19:48Z cli: harness: no tests to run

@dustymabe
Member

Try just running all the tests; the pxe tests should get run too:

kola testiso -S

@aaradhak
Member Author

I tried that too. It failed with the error "build rhcos is missing live artifacts":

[coreos-assembler]$ kola testiso -S 
__  Snoozing kola test pattern "iso-live-login.uefi-secure" until Feb 26 2024
  __ https://github.com/openshift/os/issues/1237
__  Snoozing kola test pattern "iso-as-disk.uefi-secure" until Feb 26 2024
  __ https://github.com/openshift/os/issues/1237
__  Snoozing kola test pattern "basic" until Feb 26 2024
  __ https://github.com/openshift/os/issues/1237
__  Snoozing kola test pattern "ext.config.rpm-ostree.replace-rt-kernel" until Feb 26 2024
  __ https://github.com/openshift/os/issues/1383
__  Snoozing kola test pattern "ext.config.shared.content-origins" until Feb 26 2024
  __ https://github.com/openshift/os/issues/1387#issuecomment-1769313807
__  Snoozing kola test pattern "ext.config.version.rhel-matches-rhcos-build" until Feb 26 2024
  __ https://github.com/openshift/os/issues/1387#issuecomment-1769313807
__ Snooze for kola test pattern "coreos.ignition.failure" expired on Feb 05 2024
__   Will warn on failure for kola test pattern "coreos.ignition.failure":
  __ https://github.com/coreos/coreos-assembler/issues/3670
__   Skipping kola test pattern "iso-offline-install-iscsi.bios":
  __ https://github.com/coreos/fedora-coreos-tracker/issues/1638
Ignoring verification of signature on metal image
Error: build rhcos is missing live artifacts
2024-02-23T04:16:44Z cli: build rhcos is missing live artifacts

@dustymabe
Member

yes. You have to have something to test :)

cosa fetch && cosa build && cosa buildextend-metal && cosa buildextend-metal4k && cosa buildextend-live && cosa kola testiso -S

@aaradhak
Member Author

I ran the cosa build and cosa buildextend commands. After that, cosa kola testiso still fails with the error "build rhcos is missing live artifacts".

@jlebon
Member

jlebon commented Feb 23, 2024

What does cosa list say?
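
One way to answer that question mechanically is to check the Artifacts line of the newest build in the `cosa list` output. The sketch below is only an illustration: the helper name `latest_build_has_live_artifacts` is hypothetical, the list of required artifacts is an assumption taken from the successful build listing later in this thread, and it assumes the newest build is printed first, as it is in that listing.

```shell
# Hypothetical helper: read `cosa list` output on stdin and succeed only if
# the first build listed has every live artifact.
# Assumption: the required names are those shown for the passing build in
# this thread (live-iso, live-kernel, live-initramfs, live-rootfs).
latest_build_has_live_artifacts() {
    local artifacts required
    # Keep only the first "Artifacts:" line and strip the label.
    artifacts=$(grep -m1 'Artifacts:' | sed 's/.*Artifacts://')
    for required in live-iso live-kernel live-initramfs live-rootfs; do
        # Pad with spaces so only whole artifact names can match.
        case " $artifacts " in
            *" $required "*) ;;
            *) echo "missing artifact: $required" >&2; return 1 ;;
        esac
    done
}

# Usage, assuming cosa is on PATH:
# cosa list | latest_build_has_live_artifacts || cosa buildextend-live
```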

@aaradhak
Member Author

aaradhak commented Feb 27, 2024

I figured out why: I think it's because of a test naming change. When the issue was reported, the test was named pxe-install; it is now named pxe-online-install. I was still trying to run pxe-install.
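
The effect of the rename can be illustrated with ordinary shell glob matching. This is a sketch only: kola does its own pattern matching, and the assumption here is that it behaves like a shell glob for these simple patterns. The helper name `name_matches` is hypothetical; the test names are the ones reported in this thread.

```shell
# Hypothetical helper: succeed if a test name matches a shell glob pattern.
name_matches() {
    # $1 = pattern, $2 = test name. Leaving $1 unquoted in the case arm
    # makes the shell treat it as a glob rather than a literal string.
    case "$2" in
        $1) return 0 ;;
        *)  return 1 ;;
    esac
}

# Compare the old test name and a wildcard against the current test names.
for pattern in 'pxe-install' 'pxe-*'; do
    for name in pxe-online-install.ppcfw pxe-offline-install.4k.ppcfw; do
        if name_matches "$pattern" "$name"; then
            echo "$pattern matches $name"
        else
            echo "$pattern does not match $name"
        fi
    done
done
```

Here the old name pxe-install matches neither current test, while pxe-* matches both. When passing a wildcard on the command line, quoting it (for example 'pxe-*') keeps the shell from expanding it against files in the current directory before kola sees it.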

@aaradhak
Member Author

I tested the pxe-* tests with the workaround removed, and they pass:

[coreos-assembler]$ kola testiso -S pxe-*
__  Snoozing kola test pattern "iso-live-login.uefi-secure" until Mar 18 2024
  __ https://github.com/openshift/os/issues/1237
__  Snoozing kola test pattern "iso-as-disk.uefi-secure" until Mar 18 2024
  __ https://github.com/openshift/os/issues/1237
__  Snoozing kola test pattern "ext.config.rpm-ostree.replace-rt-kernel" until Mar 18 2024
  __ https://github.com/openshift/os/issues/1383
__  Snoozing kola test pattern "ext.config.shared.content-origins" until Mar 18 2024
  __ https://github.com/openshift/os/issues/1387#issuecomment-1769313807
__  Snoozing kola test pattern "ext.config.version.rhel-matches-rhcos-build" until Mar 18 2024
  __ https://github.com/openshift/os/issues/1387#issuecomment-1769313807
__ Snooze for kola test pattern "coreos.ignition.failure" expired on Feb 05 2024
__   Will warn on failure for kola test pattern "coreos.ignition.failure":
  __ https://github.com/coreos/coreos-assembler/issues/3670
__   Skipping kola test pattern "iso-offline-install-iscsi.bios":
  __ https://github.com/coreos/fedora-coreos-tracker/issues/1638
Ignoring verification of signature on metal image
Running test: pxe-online-install.ppcfw
PASS: pxe-online-install.ppcfw (1m55.557s)
Running test: pxe-offline-install.4k.ppcfw
PASS: pxe-offline-install.4k.ppcfw (2m43.648s)
[coreos-assembler]$ 
[coreos-assembler]$ 
[coreos-assembler]$ 
[coreos-assembler]$ cosa list
416.94.202402271611-0
   Timestamp: 2024-02-27T16:20:27Z (1:30:29 ago)
   Artifacts: ostree oci-manifest qemu metal metal4k live-iso live-kernel live-initramfs live-rootfs
      Config: release-4.16 (1bba52e7ebe3) (dirty)

416.94.202402262104-0
   Timestamp: 2024-02-26T21:11:08Z (20:39:48 ago)
   Artifacts: ostree oci-manifest qemu metal metal4k live-iso live-kernel live-initramfs live-rootfs
      Config: release-4.16 (1bba52e7ebe3)

@jlebon
Member

jlebon commented Feb 27, 2024

@aaradhak Great! Want to open a PR to cosa that reverts #3661?

aaradhak added a commit to aaradhak/coreos-assembler that referenced this issue Feb 27, 2024
grub2-2.06-116.fc39 update is now available and it fixes the issue.
The pxe-online-install.ppcfw & pxe-offline-install.4k.ppcfw tests
pass without the workaround.

Ref: coreos#3370
aaradhak added a commit that referenced this issue Feb 29, 2024
grub2-2.06-116.fc39 update is now available and it fixes the issue.
The pxe-online-install.ppcfw & pxe-offline-install.4k.ppcfw tests
pass without the workaround.

Ref: #3370
@jlebon
Member

jlebon commented Feb 29, 2024

Done in #3745.

jlebon closed this as completed on Feb 29, 2024