Add support for running with SEV on GCP #1202

Closed
marc-orr opened this issue May 18, 2022 · 11 comments

Comments

@marc-orr

Describe the bug
Fedora CoreOS running as a guest on GCE with SEV enabled encounters a page table error during boot. After the error, the guest continues running, but we cannot SSH into it. Specifically, the fedora-coreos-35-20220424-3-0-gcp-x86-64 image, which has a 5.17-based kernel, encounters the issue. Previous images, with a 5.16-based kernel, seem to work OK.

Reproduction steps
Steps to reproduce the behavior:

  1. Create copy of official fedora-coreos-cloud image in version fedora-coreos-35-20220424-3-0-gcp-x86-64 (see https://getfedora.org/en/coreos?stream=stable) with added GVNIC,SEV_CAPABLE guest-os-features.

$ gcloud compute images create fedora-coreos-35-20220424-3-0-gcp-x86-64-sev --source-image fedora-coreos-35-20220424-3-0-gcp-x86-64 --source-image-project=fedora-coreos-cloud --guest-os-features=GVNIC,SEV_CAPABLE,VIRTIO_SCSI_MULTIQUEUE,UEFI_COMPATIBLE --project=$PROJECT

  2. Create instance with confidential compute feature enabled.

$ gcloud compute instances create \
    coreos-test-5-17-sev --project=$PROJECT \
    --zone=us-west1-c --machine-type=n2d-standard-2 \
    --network-interface=network-tier=PREMIUM,subnet=default \
    --maintenance-policy=TERMINATE --provisioning-model=STANDARD \
    --service-account=$SERVICE_ACCOUNT \
    --scopes=https://www.googleapis.com/auth/devstorage.read_only,https://www.googleapis.com/auth/logging.write,https://www.googleapis.com/auth/monitoring.write,https://www.googleapis.com/auth/servicecontrol,https://www.googleapis.com/auth/service.management.readonly,https://www.googleapis.com/auth/trace.append \
    --create-disk=auto-delete=yes,boot=yes,device-name=coreos-test-5-17-sev,image=projects/$PROJECT/global/images/fedora-coreos-35-20220424-3-0-gcp-x86-64-sev,mode=rw,size=10,type=projects/$PROJECT/zones/us-west1-c/diskTypes/pd-balanced \
    --no-shielded-secure-boot --shielded-vtpm --shielded-integrity-monitoring --reservation-affinity=any --confidential-compute

  3. Wait a short while, then observe serial console output (VM typically encounters page table error and then becomes unresponsive to SSH).

$ gcloud compute --project=$PROJECT instances get-serial-port-output coreos-test-5-17-sev --zone=us-west1-c --port=1
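
For repeated runs, the check in step 3 can be scripted. The following is a minimal sketch and not part of the original report: `has_page_table_error` is a hypothetical helper name, and its two patterns are taken from the example splat in this issue, so other failure modes would go undetected.

```shell
# Hypothetical helper (not from the original report): reads a serial console
# log on stdin and succeeds if it contains a page table error like the one
# reported in this issue.
has_page_table_error() {
    grep -Eq 'Corrupted page table|Bad pagetable'
}

# Example use, assuming gcloud is configured and $PROJECT is set:
#   gcloud compute --project="$PROJECT" instances get-serial-port-output \
#       coreos-test-5-17-sev --zone=us-west1-c --port=1 \
#     | has_page_table_error && echo "page table error seen"
```

A clean result here only means these two strings were absent; it does not confirm that the VM is reachable over SSH.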

Expected behavior
The VM should boot. The guest console logs should be clean of any kernel warnings/splats/page faults. We should be able to SSH into the VM after it's booted.

Actual behavior
We get a page table error (the details vary), and then we cannot SSH into the VM. Here is an example:

[ 118.325391] systemd-udevd: Corrupted page table at address 7ffd7981ec28
[ 118.332148] PGD 800110b9f067 P4D 800110b9f067 PUD 800110b9e067 PMD 800110b9d067 PTE 8000800106eca845
[ 118.341428] Bad pagetable: 000d [#1] PREEMPT SMP NOPTI
[ 118.346808] CPU: 0 PID: 1137 Comm: systemd-udevd Not tainted 5.17.4-200.fc35.x86_64 #1
[ 118.354961] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 118.364300] RIP: 0033:0x7f1cca0705b3
[ 118.367996] Code: 04 25 10 00 00 00 be 18 00 00 00 48 8d b8 e0 02 00 00 48 89 b8 d8 02 00 00 48 89 b8 e0 02 00 00 b8 11 01 00 00 0f 05 44 89 c0 0f 1f 40 00 48 8b 15 39 b8 0d 00 f7 d8 41 b8 ff ff ff ff 64 89
[ 118.387148] RSP: 002b:00007ffd7981ec28 EFLAGS: 00010246
[ 118.392583] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 00007f1cca0705b0
[ 118.399856] RDX: 0000000000000000 RSI: 0000000000000018 RDI: 00007f1cc94cae20
[ 118.407129] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000562c76e4dc58
[ 118.407132] R10: 00007f1cc94cae10 R11: 0000000000000246 R12: 0000000000000000
[ 118.407134] R13: 00007ffd7981ede0 R14: 0000000000000000 R15: 00007ffd7981ed60
[ 118 Starting. ^[[0;1;39mLoad 407K1ernel Module co36n]figfs^[[0m... FS: 00007f1cc94cab40 GS: 00
00000000000000
[ 118.407137] Modules linked in: drm ip_tables vfat fat xfs rfkill dm_multipath nvme crct10dif_pclmul gve crc32_pclmul crc32c_intel nvme_core ghash_clmulni_intel serio_raw ipmi_devintf ipmi_msghandler fuse
[ 118.458861] ---[ end trace 0000000000000000 ]---
[ 118.458864] RIP: 0033:0x7f1cca0705b3
[ 118.467572] RSP: 002b:00007ffd7981ec28 EFLAGS: 00010246
[ 118.473175] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 00007f1cca0705b0
[ 118.480438] RDX: 0000000000000000 RSI: 0000000000000018 RDI: 00007f1cc94cae20
[ 118.487719] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000562c76e4dc58
[ 118.495072] R10: 00007f1cc94cae10 R11: 0000000000000246 R12: 0000000000000000
[ 118.502582] R13: 00007ffd7981ede0 R14: 0000000000000000 R15: 00007ffd7981ed60
[ 118.509831] FS: 00007f1cc94cab40(0000) GS:ffff8a4cf7c00000(0000) knlGS:0000000000000000
[^[[0;32m OK ^[[[ 118.518499] CS: 0033 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 118.525754] CR2: 00007ffd7981ec28 CR3: 000080010e264000 CR4: 0000000000350ef0
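
Since the details vary between boots, it can help to reduce each captured log to the faulting process and kernel release. This is a rough sketch under stated assumptions: `oops_summary` is a hypothetical helper name, and the pattern assumes a "CPU:" line in the same format as the one above (an untainted kernel and a process name without spaces).

```shell
# Hypothetical helper (not from the original report): prints
# "<process name> <kernel release>" for each oops "CPU:" line on stdin.
# Assumes the "Comm: <name> Not tainted <release> ..." layout seen in this
# issue; lines from a tainted kernel ("Tainted: ...") are not matched.
oops_summary() {
    sed -n 's/.*Comm: \([^ ]*\) .*tainted \([^ ]*\) .*/\1 \2/p'
}

# Example use on a saved serial log (file name is illustrative):
#   oops_summary < serial-port-1.log
```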

System details

  • GCP
  • fedora-coreos-35-20220424-3-0-gcp-x86-64

Ignition config
I don't think we're using an Ignition config. Instead, see the reproduction steps above.

Additional information
Fedora CoreOS is not listed as supporting SEV. However, it has a fairly new kernel that should allow SEV to work, and we have a customer who is trying to use it. They were able to use the previous 5.16-based kernel successfully and only encountered this issue when upgrading to the fedora-coreos-35-20220424-3-0-gcp-x86-64 image with the new 5.17-based kernel.

@bgilbert
Contributor

Since we don't mark the GCE images as SEV-capable, I don't think this is a user-facing bug. It'd be nice to fix it and add the feature flag to the official images, though.

@travier changed the title from "CoreOS fails to boot on GCP @ version 35 with SEV enabled" to "Add support for running with SEV on GCP" on May 23, 2022
@travier
Member

travier commented May 23, 2022

You should probably report that for the Fedora kernel package on Bugzilla. Please do link the bug here.

@cverna added the "jira" label (for syncing to jira) on Nov 24, 2022
@jlebon
Member

jlebon commented Nov 28, 2022

coreos/coreos-assembler#3243 will add the SEV_CAPABLE feature.

@dustymabe added the following labels on Nov 30, 2022: status/pending-testing-release (Fixed upstream. Waiting on a testing release.), status/pending-next-release (Fixed upstream. Waiting on a next release.), status/pending-stable-release (Fixed upstream and in testing. Waiting on stable release.)
@dustymabe
Member

The fix for this went into next stream release 37.20221211.1.0. Please try out the new release and report issues.

@dustymabe
Member

The fix for this went into testing stream release 37.20221211.2.0. Please try out the new release and report issues.

@dustymabe
Member

The fix for this went into stable stream release 37.20221127.3.0.

@dustymabe removed the status/pending-testing-release, status/pending-stable-release, and status/pending-next-release labels on Dec 15, 2022
@deeglaze

@dustymabe great to see this applied. I went to create an instance and can see it booting in the serial console, but I am unable to SSH into the instance in Pantheon. Do y'all have a different, publicly documented way of connecting to the Fedora images that I could check out?

@malt3

malt3 commented Feb 15, 2023

I'm not affiliated with Red Hat, but when I played with CoreOS in the cloud, I just created a Butane config file and transpiled it to Ignition:
https://coreos.github.io/butane/getting-started/

This document shows how to inject the ignition file and boot a GCP instance:

https://docs.fedoraproject.org/en-US/fedora-coreos/provisioning-gcp/
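
As a rough illustration of that workflow (a sketch, not taken from the thread): the helper below writes a minimal Butane config that adds one SSH key for the `core` user. The name `write_butane_config`, the file names, `$VM_NAME`, the spec version `1.4.0`, and the choice of the `fedora-coreos-stable` image family are all assumptions for the example; the linked documents are the authoritative source for the `butane` and `gcloud` invocations shown in the comments.

```shell
# Hypothetical helper: writes a minimal Butane config to the path in $2 that
# authorizes the SSH public key string in $1 for the default "core" user.
# The key is emitted as a plain YAML scalar, so it must not contain ": " or " #".
write_butane_config() {
    cat > "$2" <<EOF
variant: fcos
version: 1.4.0
passwd:
  users:
    - name: core
      ssh_authorized_keys:
        - $1
EOF
}

# Example use, assuming butane and gcloud are installed and $PROJECT and
# $VM_NAME are set (machine type and zone copied from the reproducer above):
#   write_butane_config "$(cat ~/.ssh/id_ed25519.pub)" config.bu
#   butane --pretty --strict config.bu > config.ign
#   gcloud compute instances create "$VM_NAME" --project="$PROJECT" \
#       --zone=us-west1-c --machine-type=n2d-standard-2 \
#       --image-project=fedora-coreos-cloud --image-family=fedora-coreos-stable \
#       --metadata-from-file=user-data=config.ign \
#       --confidential-compute --maintenance-policy=TERMINATE
```

With a key injected this way, the instance would be reached with a plain `ssh core@<external IP>` rather than through the console's SSH button.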

@bgilbert
Contributor

Discussion continues in #1419.

@dustymabe
Member

We're adding an automated test for this in:

HuijingHei added a commit to HuijingHei/fedora-coreos-pipeline that referenced this issue Jun 6, 2023
dustymabe pushed a commit to coreos/fedora-coreos-pipeline that referenced this issue Jun 6, 2023
Adam0Brien pushed a commit to Adam0Brien/fedora-coreos-pipeline that referenced this issue Jul 6, 2023