This repository has been archived by the owner on Jun 11, 2024. It is now read-only.

Anaconda installation failed since centos-bootc-dev stream9.20240216.0 image #30

Closed
henrywang opened this issue Feb 20, 2024 · 16 comments

Comments

@henrywang

Since centos-bootc-dev image version stream9.20240216.0, all Anaconda installation tests have failed. The system enters emergency mode on the first reboot after the Anaconda installation.
The issue might be related to PR #27.
Screenshot from 2024-02-20 09-43-32
Screenshot from 2024-02-20 10-58-19

@cgwalters
Member

Hmm, I'm not reproducing this in a quick test here. What's the Anaconda version in use?

The real failure BTW is above those lines.

@henrywang
Author

The anaconda version is anaconda-34.25.4.6-1.el9.x86_64.rpm.

I did some brief research:

  1. According to the daily CI results, the same test passed on Feb 16 but failed on Feb 17 and later. The centos-bootc-dev images on those two days were built from the same CS9 repo.
  2. There were two changes on Feb 17: the merge of "README.md: Add links to exact images" (#28) and the merge of "Enable composefs + transient root" (#27).
  3. The centos-bootc image does not have this issue.

So I suspect the composefs + transient change.

@cgwalters
Member

If you can get the full journal logs, that'd be really helpful.

For reference, I tested with this kickstart:

text
network --bootproto=dhcp --device=link --activate
# Basic partitioning
clearpart --all --initlabel --disklabel=gpt
reqpart --add-boot
part / --grow --fstype xfs

ostreecontainer --url quay.io/centos-bootc/centos-bootc-dev:stream9 --no-signature-verification

firewall --disabled
services --enabled=sshd

# Only inject a SSH key for root
rootpw --iscrypted locked
# Add your example SSH key here!
sshkey --username root "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOQkQHeKan3X+g1jILw4a3KtcfEIED0kByKGWookU7ev walters@verbum.org"
reboot

Using RHEL-9.4.0-20240226.21-x86_64-boot.iso and things worked. (I ran into rhinstaller/anaconda#5399 (comment) when trying out the c9s ISO...are our other tests not hitting that?)

You linked to an Anaconda RPM, but presumably you're really testing with an Anaconda PXE/ISO setup from the corresponding compose, right?

@cgwalters
Member

> So I suspect the composefs + transient change.

For reference based on the evidence I'd agree, and it seems quite possible that there's a bug here. I consider resolving this issue a blocker for merging CentOS/centos-bootc#356 so let's get to the bottom of it.

Also, was this an automated test failure? Can you link the job if so? Also if so, let's ensure that we're gathering full information from Anaconda.

@henrywang
Author

This is the kickstart file the test is using: https://github.com/virt-s1/bootc-workflow-test/blob/main/anaconda.sh#L119

> Using RHEL-9.4.0-20240226.21-x86_64-boot.iso and things worked. (I ran into rhinstaller/anaconda#5399 (comment) when trying out the c9s ISO...are our other tests not hitting that?)

RHEL 9.4 had the same issue until anaconda-34.25.4.5-1.el9 was included.

> You linked to an Anaconda RPM, but presumably you're really testing with an Anaconda PXE/ISO setup from the corresponding compose, right?

Yes. The test fetches the compose ID from the image and uses that compose in the test. https://github.com/virt-s1/bootc-workflow-test/blob/0be2d9517772485ddccf3a4a90af20d19b0c5b87/anaconda.sh#L58

@henrywang
Author

> Also, was this an automated test failure? Can you link the job if so? Also if so, let's ensure that we're gathering full information from Anaconda.

Yes, it's an automated test failure; the log is here. The installation itself succeeds, but the system fails at first boot. The VM console log can be found here.

I'll add kernel arguments to access the emergency shell from the console and get the full journal log.
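
As a rough sketch of what collecting that log could look like once a shell is available on the console: this assumes standard systemd tooling (journalctl, systemctl) on the target system. The function name and the default output path are hypothetical, and the specific kernel arguments used by the test are not shown in this thread.

```shell
#!/bin/sh
# Hypothetical helper, meant to be run from the emergency shell of the
# installed system. The default output path is an assumption: /run is used
# because the root filesystem may not be writable in emergency mode.
collect_boot_journal() {
    out="${1:-/run/boot-journal.txt}"
    # Full journal for the current boot, without a pager.
    journalctl -b --no-pager > "$out"
    # Units that failed during this boot, appended for quick triage.
    systemctl --failed --no-pager >> "$out"
    echo "wrote $out"
}
```

Calling `collect_boot_journal` with no argument writes to the default path; the resulting file can then be copied off the VM or dumped to the serial console.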

@cgwalters
Member

OK yeah, we need the journal from the target system indeed. Hmm, it'd also help a bit to stash the generated qcow2 as an artifact or so too.

I should be able to reproduce this using https://github.com/virt-s1/bootc-workflow-test/tree/main?tab=readme-ov-file#run-centos-stream-test right?

@henrywang
Author

henrywang commented Feb 27, 2024

This is the command line to run the anaconda test:
QUAY_USERNAME=<username> QUAY_PASSWORD=<password> QUAY_SECRET=<secret in auth.json> IMAGE_NAME=centos-bootc-dev TEST_OS=centos-stream-9 CERT_URL=<cert url host name> FIRMWARE=bios PARTITION=standard ./anaconda.sh

I'm reproducing this issue now. I'll keep you updated.

@henrywang
Author

henrywang commented Feb 27, 2024

@cgwalters I think I found the root cause. All of the failed tests used ostree-2024.2-2, which does not include ostreedev/ostree#3173.
The latest CS9 compose, CentOS-Stream-9-20240226.d.0, includes ostree-2024.4-2, and our CI is now running against this new ostree. Let's see what happens.
So not only the container image but also Anaconda needs ostree-2024.4-2. That explains why the bootc install test did not hit this issue after the upgrade to composefs + transient root.
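
The requirement above (both the installer environment and the container image need a new enough ostree) can be sketched as a small pre-flight check. This is only an illustration: `version_ge`, `REQUIRED_OSTREE`, and the variable names are hypothetical. The threshold is simply the first version reported working in this thread, and the two version values are hard-coded placeholders for what might be obtained from `rpm -q ostree` in each environment.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 is >= version $2.
# Relies on GNU `sort -V` (version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Assumed threshold: the first ostree version reported working in this thread.
REQUIRED_OSTREE="2024.4"

# Placeholder values taken from the thread; in practice these would be
# queried from the installer compose and from the container image.
installer_ostree="2024.2"
image_ostree="2024.4"

for pair in "installer:$installer_ostree" "image:$image_ostree"; do
    name="${pair%%:*}"
    ver="${pair#*:}"
    if version_ge "$ver" "$REQUIRED_OSTREE"; then
        echo "$name ostree $ver: ok"
    else
        echo "$name ostree $ver: too old for composefs + transient root"
    fi
done
```

Checking both sides separately matters here because an up-to-date image installed by an installer with an older ostree is exactly the combination that failed.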

@henrywang
Author

henrywang commented Feb 27, 2024

More results:

  1. The new CS9 compose CentOS-Stream-9-20240226.d.0, which includes ostree-2024.4-2, does not have this issue.
  2. I checked RHEL-9.4.0-20240226.21-x86_64-boot.iso; it includes ostree-2024.4-2.

However, the upgrade failure has appeared again: error: Bootloader write config: grub2-mkconfig: Child process exited with code 1. This error should already have been fixed by ostreedev/ostree#3150.

@cgwalters
Member

> So not only the container image but also Anaconda needs ostree-2024.4-2.

Right, yes.

> However, the upgrade failure has appeared again: error: Bootloader write config: grub2-mkconfig: Child process exited with code 1. This error should already have been fixed by ostreedev/ostree#3150.

What are we upgrading from and to in this scenario?

@henrywang
Author

> What are we upgrading from and to in this scenario?

The upgrade image has wget added on top. https://github.com/virt-s1/bootc-workflow-test/blob/5f539c3bfee1b330e7ff80d552368959a5a93a8e/anaconda.sh#L336

@cgwalters
Member

I've reproduced the failure... I should have been testing upgrades via Anaconda. A core problem here is that we aren't shipping ostree-grub2, so ostreedev/ostree@c281da8 isn't relevant.

@cgwalters
Member

ostreedev/ostree#3205 should fix this

@cgwalters
Member

The latest quay.io/centos-bootc/centos-bootc-dev:stream9 (sha256:cef521bb29d819325d39491eb91752fc35aa3d196437f06f108409ba5d28b83f) has the changes from that PR, let's see if it fixes things in CI?

@henrywang
Author

henrywang commented Feb 28, 2024

Verified on ostree-2024.4.13.gf1e663bd and bootc-202402272250.g34dd356387 with centos-bootc-dev sha256:6cfe0b00d70e9adf595ca3d10b6c959a55dd0995948fdea8e594a915381ae154

All anaconda (x86_64) tests passed on CI: virt-s1/bootc-workflow-report#95
