Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configure the console by default only on particular platforms #2400

Merged
merged 4 commits into from
Jun 3, 2022
Merged

Configure the console by default only on particular platforms #2400

merged 4 commits into from
Jun 3, 2022

Conversation

bgilbert
Copy link
Contributor

@bgilbert bgilbert commented Sep 1, 2021

Stop specifying console= kernel arguments by default, except on specific arch/platform pairs where we know we need them. Do the same with console configuration in grub.cfg. More info in coreos/fedora-coreos-tracker#567.

Read platforms.yaml from the config repo (coreos/fedora-coreos-config#1181) to determine what console settings should be applied for each arch/platform. Convert the table for the current arch to JSON and copy it to /boot/coreos/platforms.json in the image, since coreos-installer needs the same data when overriding the platform ID at install time (coreos/coreos-installer#605). If platforms.yaml is missing, continue applying the previous defaults.

We do this in two places: create_disk.sh configures console settings for metal and qemu images from platforms.yaml. For other platforms, gf-set-platform (previously gf-platformid) reads platforms.json from the image, undoes any qemu-specific settings, and applies settings for the target platform.

cc @coreos/multi-arch

@openshift-ci
Copy link

openshift-ci bot commented Sep 1, 2021

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@bgilbert
Copy link
Contributor Author

bgilbert commented Sep 1, 2021

/test all

1 similar comment
@bgilbert
Copy link
Contributor Author

bgilbert commented Sep 1, 2021

/test all

@cgwalters
Copy link
Member

cgwalters commented Sep 1, 2021

As a result of this change, we need a way to continue getting a serial console in kola run and kola qemuexec

We already have kola qemuexec --kargs which works by using libguestfs to edit a copy of the disk image before booting.

(EDIT: nevermind, I see you covered this)

@cgwalters
Copy link
Member

/test all

That only triggers Prow, not Jenkins. I think CCI was restarted, and lost state. Going to try the close/reopen.

@cgwalters cgwalters closed this Sep 1, 2021
@cgwalters cgwalters reopened this Sep 1, 2021
@dustymabe dustymabe closed this Sep 1, 2021
@dustymabe dustymabe reopened this Sep 1, 2021
@dustymabe
Copy link
Member

webhooks are failing to be delivered.. investigating

cgwalters
cgwalters previously approved these changes Sep 1, 2021
Copy link
Member

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM

src/gf-platformid Outdated Show resolved Hide resolved
@dustymabe dustymabe closed this Sep 1, 2021
@dustymabe dustymabe reopened this Sep 1, 2021
@dustymabe dustymabe closed this Sep 1, 2021
@dustymabe dustymabe reopened this Sep 1, 2021
Copy link
Member

@dustymabe dustymabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - some questions/comments.

src/gf-platformid Outdated Show resolved Hide resolved
src/grub.cfg Outdated Show resolved Hide resolved
@bgilbert
Copy link
Contributor Author

bgilbert commented Sep 2, 2021

The legacy path wasn't properly setting kargs or GRUB commands in the metal or qemu images, since those don't go through gf-platformid. Refactored to fix that, and added a per-arch disable knob to aid migration per coreos/fedora-coreos-config#1181 (comment).

Tested the contents of grub.cfg and kargs, and the existence of platforms.json, in these contexts:

  • No platforms.yaml (qemu, metal, aws)
  • Arch with legacy: true in platforms.yaml (qemu, metal, aws)
  • Arch missing from platforms.yaml (qemu, metal, aws)
  • Platform missing from platforms.yaml (aws)
  • Platform with kargs + GRUB commands in platforms.yaml (aws)
  • Platform with kargs only in platforms.yaml (aws)
  • Platform with GRUB commands only in platforms.yaml (aws)

Ready for review.

@bgilbert bgilbert marked this pull request as ready for review September 2, 2021 03:48
@dustymabe
Copy link
Member

mostly LGTM - will try to test later today

dustymabe
dustymabe previously approved these changes Sep 2, 2021
Copy link
Member

@dustymabe dustymabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - will do more testing when this hits the pipeline

src/create_disk.sh Outdated Show resolved Hide resolved
@cgwalters
Copy link
Member

This is awesome work by the way!

As a result of this change, we need a way to continue getting a serial console in kola run and kola qemuexec

You touched on this but the way I think of this, the problem really parallels the bare metal case of initial provisioning and wanting debug output, particularly for the Ignition run. As well as for the initial ISO boot. Or I guess, any time Ignition runs.

We could embed the location of the BLS config near the partition table and use qemu-img map --output=json plus that custom header to inject kargs directly into the qcow2 file, similar to how we inject kargs into the ISO.

Yeah, that's quite doable.

But a downside of this is that if we e.g. land that code in coreos-assembler, it won't be useful for everyone using e.g. virt-install type flows. Or at least, it'd need to be replicated there.

It's easy to think of qemu platform as not a production thing, but it's really immensely useful for both us and our users to e.g. validate Ignition configs before deploying them on a real platform. Just a few days ago I played "FCOS user" and validated in qemu with cosa run an Ignition config I was creating for a little home server device.

This relates to coreos/fedora-coreos-tracker#235 - basically us extracting the cosa run and all the sugar around that out into a separate project.

@cgwalters
Copy link
Member

One alternative path we could investigate for bare metal is in some cases, making it easy to "console everywhere we can" but then turn it off once the server is out of provisioning phase. I commented on this elsewhere but in OCP we recently had a large customer escalation that was greatly exacerbated by printk performance issues on serial console.

So after this lands it'd be good to document that flow of adding the console kargs, but then removing them once Ignition is done.

@bgilbert bgilbert marked this pull request as draft September 3, 2021 17:36
dustymabe added a commit to dustymabe/fedora-coreos-pipeline that referenced this pull request Sep 5, 2021
This is a workaround to get console=ttyS0,115200n8 into the
aarch64 AWS image. It does so by applying the following patch
to gf-platformid:

```diff
diff --git a/usr/lib/coreos-assembler/gf-platformid b/usr/lib/coreos-assembler/gf-platformid
index 2912b322c..36d089651 100755
--- a/usr/lib/coreos-assembler/gf-platformid
+++ b/usr/lib/coreos-assembler/gf-platformid
@@ -46,7 +46,11 @@ blscfg_path=$(coreos_gf glob-expand /boot/loader/entries/ostree-*.conf)
 coreos_gf download "${blscfg_path}" "${tmpd}"/bls.conf
 # Remove any platformid currently there
 sed -i -e 's, ignition.platform.id=[a-zA-Z0-9]*,,g' "${tmpd}"/bls.conf
-sed -i -e 's,^\(options .*\),\1 ignition.platform.id='"${platformid}"',' "${tmpd}"/bls.conf
+if [ "${platformid}" == 'aws' ]; then
+    sed -i -e 's|^\(options .*\)|\1 ignition.platform.id='"${platformid}"' console=ttyS0,115200n8|' "${tmpd}"/bls.conf
+else
+    sed -i -e 's,^\(options .*\),\1 ignition.platform.id='"${platformid}"',' "${tmpd}"/bls.conf
+fi
 coreos_gf upload "${tmpd}"/bls.conf "${blscfg_path}"

 if [ "$basearch" = "s390x" ] ; then
```

Once coreos/fedora-coreos-config#1181 and
coreos/coreos-assembler#2400 land then we
won't need this any longer.

This implements a fix for coreos/fedora-coreos-tracker#920
Fixes: c9036fa ("Serialize `grub-script` literally into image.json")
Soon it will both set the platform ID and configure the console.
dustymabe
dustymabe previously approved these changes May 28, 2022
Copy link
Member

@dustymabe dustymabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

dustymabe added a commit to dustymabe/fedora-coreos-pipeline that referenced this pull request May 31, 2022
This is an update to the serial console hack added in xyz.
This needed updating because of [1] that will update and
rename gf-platformid to gf-set-platform. Eventually once
platforms.yaml exists everywhere we will finally be able
to drop this patch.

[1] coreos/coreos-assembler#2400

The decoded version of this patch looks like:

```
diff --git a/src/gf-set-platform b/src/gf-set-platform
index 3b1c5ae31..df5e0f9d7 100755
--- a/src/gf-set-platform
+++ b/src/gf-set-platform
@@ -59,7 +59,13 @@ blscfg_path=$(coreos_gf glob-expand /boot/loader/entries/ostree-*.conf)
 coreos_gf download "${blscfg_path}" "${tmpd}"/bls.conf
 # Remove any platformid currently there
 sed -i -e 's, ignition.platform.id=[a-zA-Z0-9]*,,g' "${tmpd}"/bls.conf
-sed -i -e '/^options / s,$, ignition.platform.id='"${platformid}"',' "${tmpd}"/bls.conf
+if [ "$(coreos_gf exists /boot/coreos/platforms.json)" != "true" -a "${platformid}" == 'aws' ]; then
+    # Our platform is AWS and we still need the console=ttyS0 hack for the legacy
+    # (no platforms.yaml) path.
+    sed -i -e 's|^\(options .*\)|\1 ignition.platform.id='"${platformid}"' console=ttyS0,115200n8|' "${tmpd}"/bls.conf
+else
+    sed -i -e '/^options / s,$, ignition.platform.id='"${platformid}"',' "${tmpd}"/bls.conf
+fi
 if [ -n "$remove_kargs" ]; then
     # Remove existing qemu-specific kargs
     sed -i -e '/^options / s@ '"${remove_kargs}"'@@' "${tmpd}"/bls.conf

```
dustymabe added a commit to dustymabe/fedora-coreos-pipeline that referenced this pull request May 31, 2022
This is an update to the serial console hack added in ddd9da9.
This needed updating because of [1] that will update and
rename gf-platformid to gf-set-platform. Eventually once
platforms.yaml exists everywhere we will finally be able
to drop this patch.

[1] coreos/coreos-assembler#2400

The decoded version of this patch looks like:

```diff
diff --git a/src/gf-set-platform b/src/gf-set-platform
index 3b1c5ae31..df5e0f9d7 100755
--- a/src/gf-set-platform
+++ b/src/gf-set-platform
@@ -59,7 +59,13 @@ blscfg_path=$(coreos_gf glob-expand /boot/loader/entries/ostree-*.conf)
 coreos_gf download "${blscfg_path}" "${tmpd}"/bls.conf
 # Remove any platformid currently there
 sed -i -e 's, ignition.platform.id=[a-zA-Z0-9]*,,g' "${tmpd}"/bls.conf
-sed -i -e '/^options / s,$, ignition.platform.id='"${platformid}"',' "${tmpd}"/bls.conf
+if [ "$(coreos_gf exists /boot/coreos/platforms.json)" != "true" -a "${platformid}" == 'aws' ]; then
+    # Our platform is AWS and we still need the console=ttyS0 hack for the legacy
+    # (no platforms.yaml) path.
+    sed -i -e 's|^\(options .*\)|\1 ignition.platform.id='"${platformid}"' console=ttyS0,115200n8|' "${tmpd}"/bls.conf
+else
+    sed -i -e '/^options / s,$, ignition.platform.id='"${platformid}"',' "${tmpd}"/bls.conf
+fi
 if [ -n "$remove_kargs" ]; then
     # Remove existing qemu-specific kargs
     sed -i -e '/^options / s@ '"${remove_kargs}"'@@' "${tmpd}"/bls.conf

```
src/create_disk.sh Outdated Show resolved Hide resolved
src/create_disk.sh Show resolved Hide resolved
src/cmd-buildextend-metal Show resolved Hide resolved
src/create_disk.sh Outdated Show resolved Hide resolved
src/create_disk.sh Show resolved Hide resolved
@cgwalters
Copy link
Member

A lot going on here, and RHCOS stuff is in bugfix mode for OCP 4.11 freeze...I think we should cut a rhcos-4.11 branch here now (potentially before #2811 too) and start using it now to de-risk landing this a bit more?

@cgwalters
Copy link
Member

OK actually, no reason to block on 4.11 branching, I just pushed https://github.com/coreos/coreos-assembler/tree/rhcos-4.11

@cgwalters
Copy link
Member

openshift/release#29057

mantle/platforms.md also has a bunch, but that's more than I want to
unwind right now.
Stop specifying console= kernel arguments by default, except on specific
arch/platform pairs where we know we need them.  Do the same with console
configuration in grub.cfg.  More info is in
coreos/fedora-coreos-tracker#567.

Read platforms.yaml from the config repo to determine what console
settings should be applied for each arch/platform.  Convert the table for
the current arch to JSON and copy it to /boot/coreos/platforms.json in
the image, since coreos-installer needs the same data when overriding the
platform ID at install time.  If platforms.yaml is missing, continue to
apply the previous defaults.

We do this in two places: create_disk.sh configures console settings for
metal and qemu images from platforms.yaml.  For other platforms,
gf-set-platform reads platforms.json from the image, undoes any
qemu-specific settings, and applies settings for the target platform.
@bgilbert
Copy link
Contributor Author

bgilbert commented Jun 1, 2022

Updated!

src/create_disk.sh Show resolved Hide resolved
src/create_disk.sh Show resolved Hide resolved
@dustymabe
Copy link
Member

I'm good with this. If we're close to being final I can update my patch if needed and test coreos/fedora-coreos-pipeline#533

@bgilbert
Copy link
Contributor Author

bgilbert commented Jun 2, 2022

@dustymabe, yup, I think we're good to go!

@dustymabe
Copy link
Member

Local testing testing of coreos/fedora-coreos-pipeline#533 went well. I think this is now unblocked.

@newkit
Copy link

newkit commented Jun 28, 2022

Hi, is this planned to be part of an RHCOS release?

@bgilbert
Copy link
Contributor Author

@newkit Yes, that's the plan.

@newkit
Copy link

newkit commented Jun 29, 2022

@bgilbert Do you have any timeline when that’s going to happen? Would it help if I‘d create a BZ?

@bgilbert
Copy link
Contributor Author

There's no public commitment, but I'd expect it'd land soon after the functionality lands in FCOS. Note that this PR is only part of it; the overall FCOS tracking bug is coreos/fedora-coreos-tracker#567.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants