kola: add IBM CEX device test for the s390x build #3828

Merged (1 commit) on Sep 16, 2024

Contributor

@madhu-pillai madhu-pillai commented Jul 3, 2024

This kola test is crucial for verifying the security of CEX
hardware-based LUKS encryption on root volume. It guarantees that the
encrypted device employs protected keys to encrypt and decrypt the
volume.

This is essentially testing the enablement done in
coreos/ignition#1820.

To run this, it needs to be on a system with a CEX device with
passthrough enabled and the device's UUID exposed via KOLA_CEX_UUID. See
also coreos/fedora-coreos-pipeline#1010.

Co-authored-by: Jonathan Lebon <jonathan@jlebon.com>

@@ -58,6 +58,7 @@ import (
var (
// ErrInitramfsEmergency is the marker error returned upon node blocking in emergency mode in initramfs.
ErrInitramfsEmergency = errors.New("entered emergency.target in initramfs")
cex_uuid = "68cd2d83-3eef-4e45-b22c-534f90b16cb9"
Member

Where is this UUID from?

Contributor Author

The UUID is generated from my test coreos KVM lpar.

Contributor Author

It is the same UUID we use in the Butane file in fedora-coreos-pipeline.

Member

OK. Let's not encode the UUID into kola though. It should be passed out-of-band instead. One way is using pipeline environment vars. It's not great, but at least it's colocated with where the UUID is defined in that repo.

Another approach is to have the builder provisioning script drop a file... somewhere... that kola remote-session create then bind-mounts into the remote container. But yuck, it'd have to be s390x specific or we'd have to write that file on all arches, even if they're empty there. That's similar to what we do for secex. I've never been completely satisfied with how that worked out. Maybe for now, let's just use the env var approach and revisit this together with the secex trick in a follow-up.

Contributor Author

Will do that.

Contributor Author

Hi @jlebon,

I've added AddCexDevice in mantle/platform/qemu.go as below, then called it from the kola qemuexec command. How should it read the env var?

// supports IBM Cex based LUKS encryption if it is s390x host (zKVM/LPAR)
func (builder *QemuBuilder) AddCexDevice(cex_uuid string) error {
	builder.Append("-device", fmt.Sprintf("vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/%s", cex_uuid))
	return nil
}

Member

You can use os.Getenv().
The function can error out if the env var is not defined.
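
A minimal sketch of this suggestion, assuming the builder only needs the `-device` argument shown earlier in this thread. The helper name `cexDeviceArgs` and the injected `getenv` parameter are hypothetical, chosen so the logic can be exercised without touching the real environment; the actual `AddCexDevice` method in mantle may be structured differently.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// cexUUIDEnvVar is the environment variable named in this PR's description.
const cexUUIDEnvVar = "KOLA_CEX_UUID"

// cexDeviceArgs is a hypothetical helper. It returns the QEMU arguments that
// attach the mediated CEX device, or an error when the UUID is not defined.
// getenv is injected for testability; pass os.Getenv in normal use.
func cexDeviceArgs(getenv func(string) string) ([]string, error) {
	uuid := getenv(cexUUIDEnvVar)
	if uuid == "" {
		return nil, errors.New(cexUUIDEnvVar + " is not set")
	}
	return []string{
		"-device",
		fmt.Sprintf("vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/%s", uuid),
	}, nil
}

func main() {
	args, err := cexDeviceArgs(os.Getenv)
	if err != nil {
		fmt.Println("no CEX device attached:", err)
		return
	}
	fmt.Println(args)
}
```

Reading the variable at the point of use keeps the UUID out of the kola source, which is the concern raised above.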

Contributor Author

Hi @jlebon,

Just to clarify my doubts: why can't we hard-code the UUID in the AddCexDevice function? If I understand correctly, whether we pass it through an env var or any other method, qemuexec will still use the same mediated device to create the VM, and hard-coding avoids the overhead of an env var or file. The Butane create-cex-device.sh script always runs to make sure the UUID is present; if it is not, the script reconfigures the mediated device on the builder LPAR and returns the UUID before qemuexec uses it to create the VM.

Comment on lines 224 to 247
if coreosarch.CurrentRpmArch() == "s390x" {
rootPart = "/dev/disk/by-id/virtio-primary-disk-part4"
}
Member

Should be able to drop this. I think this is an old workaround for partlabel issues on s390x.

Contributor Author

Will correct it.

Member

Why are we modifying this file?

Contributor Author

Will remove it.


// LUKSSanityCEXTest verifies that the rootfs is encrypted with Cex based LUKS
func LUKSSanityCEXTest(c cluster.TestCluster, m platform.Machine, cex bool, rootPart string) {
if cex {
Member

Always true

Contributor Author

Will correct it.


// supports IBM Cex based LUKS encryption if it is s390x host (zKVM/LPAR)
func (builder *QemuBuilder) AddCexDevice() error {
if builder.architecture != "s390x" {
Member

I don't think we need to guard on this. It should only be called on s390x.

Contributor Author

Will correct it.

Distros: []string{"rhcos"},
Platforms: []string{"qemu"},
ExcludeArchitectures: []string{"aarch64", "ppc64le", "x86_64"},
Tags: []string{"luks", "cex", kola.NeedsInternetTag, "reprovision"},
Member

I don't think this test needs Internet access, right?

Contributor Author

Yes, Internet access is not required.

@madhu-pillai
Contributor Author

Hi @jlebon,

I've made the changes as suggested. Please review.

c.Fatalf("Failed to reboot the machine: %v", err)
}
luksDump = c.MustSSH(m, "sudo cryptsetup luksDump "+rootPart)
mustMatch(c, "Cipher: paes-*", luksDump)
Member

We might as well test here also that ignition-ostree-growfs was successful. It'd probably work to just call TestRHCOSGrowfs().

And oh wow, I just realized that coreos/fedora-coreos-config#2986 is still not merged. Let's get that in.
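
On the cipher check in the snippet under review: the pattern `paes-*` reads like a glob. Assuming `mustMatch` treats its pattern as a Go regular expression (an assumption, since the helper is not shown in this thread), `-*` means "zero or more hyphens". The sketch below contrasts that loose pattern with an anchored one. The function names and the sample dump text are illustrative only, not real `cryptsetup luksDump` output.

```go
package main

import (
	"fmt"
	"regexp"
)

// loosePattern is the pattern from the snippet, read as a regular expression:
// "cipher: paes" followed by zero or more '-' characters.
var loosePattern = regexp.MustCompile(`cipher: paes-*`)

// strictPattern anchors the match to a line that starts (after optional
// indentation) with "cipher:" and whose value begins with "paes-".
var strictPattern = regexp.MustCompile(`(?m)^\s*cipher:\s*paes-`)

// hasPAESCipher reports whether the dump contains a paes-* cipher line.
func hasPAESCipher(luksDump string) bool {
	return strictPattern.MatchString(luksDump)
}

func main() {
	sample := "Data segments:\n  0: crypt\n\tcipher: paes-xts-plain64\n" // illustrative text
	fmt.Println(hasPAESCipher(sample))
	fmt.Println(hasPAESCipher("\tcipher: aes-xts-plain64\n"))
	// The loose pattern also accepts a value with no hyphen at all.
	fmt.Println(loosePattern.MatchString("cipher: paesfoo"))
}
```

Both patterns accept a genuine `paes-xts-plain64` line; the anchored form is just less likely to match by accident.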

Contributor Author

Will do

@madhu-pillai force-pushed the test_kola_cex branch 2 times, most recently from edad2cd to a012744, on July 15, 2024 05:06
@madhu-pillai
Contributor Author

Hi @jlebon,
I've added a check in runCexTest in luks.go: if KOLA_CEX_UUID is empty, the test calls c.Skip(). AddCexDevice() keeps its own check so that an unset env var is still caught when running kola qemuexec --qemu-cex.
Please review.

Is there any way I can test coreos-assembler manually on my zKVM machine without the CI pipeline?

jlebon added a commit to jlebon/fedora-coreos-pipeline that referenced this pull request Jul 15, 2024
Add a new systemd unit that sets up a CEX device for use by kola tests.
The device is associated with a UUID. This UUID is passed to kola via
an environment variable for now.

See also: coreos/coreos-assembler#3828

Co-authored-by: Jonathan Lebon <jonathan@jlebon.com>
jlebon added a commit to coreos/fedora-coreos-pipeline that referenced this pull request Jul 15, 2024
Member

@jlebon left a comment

> Is there any way I can test coreos-assembler manually on my zKVM machine without the CI pipeline?

On a machine with a CEX device, in a cosa container, you should be able to run KOLA_CEX_UUID=... kola run luks.cex to test this.

Name: `luks.cex`,
Description: "Verify that CEX-based rootfs encryption works.",
Flags: []register.Flag{},
Distros: []string{"rhcos"},
Member

This should in theory work on FCOS, it's just that we don't have a CEX device on the FCOS builder currently, right? So I think it'd be more accurate to not limit to RHCOS here, and just let the test be skipped on FCOS.

Contributor Author

Do you mean something like
ExcludeDistros: []string{"fcos"} instead of Distros: []string{"rhcos"}?

Member

I mean to just remove Distros entirely. The test should be skipped when running on the FCOS s390x builder (where we don't have a CEX device).

}
]
}
}`, true))
Member

Minor: there's no point in making this an Sprintf if the formatting arguments are not actually variable.
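
To illustrate the point, here is a small sketch. The JSON fragment is a made-up stand-in, not the Ignition config from this PR. Formatting a constant into a template produces the same string as writing the literal directly, so the `Sprintf` call adds nothing.

```go
package main

import "fmt"

// withSprintf formats a constant value into a template, as the reviewed code did.
func withSprintf() string {
	return fmt.Sprintf(`{"cex": {"enabled": %v}}`, true)
}

// asLiteral writes the same string directly, with no formatting step.
func asLiteral() string {
	return `{"cex": {"enabled": true}}`
}

func main() {
	// The two strings are identical, so the literal form is preferable.
	fmt.Println(withSprintf() == asLiteral())
}
```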

Contributor Author

Will do that.

@madhu-pillai
Contributor Author

> Is there any way I can test coreos-assembler manually on my zKVM machine without the CI pipeline?

> On a machine with a CEX device, in a cosa container, you should be able to run KOLA_CEX_UUID=... kola run luks.cex to test this.

Hi @jlebon,
I have a setup similar to the builder LPAR.
Here is what I did:

  1. Set ignition.version to v3_5_0-experimental in mantle/kola/tests/luks.go.
  2. Copied the Ignition content to coreos-assembler/vendor/github/coreos/ignition/v2.
  3. Ran podman build -t localhost/coreos-assembler .
  4. Set COREOS_ASSEMBLER_CONTAINER=localhost/coreos-assembler.
  5. Built RHCOS 4.17 on a machine where I have Red Hat VPN access, then tarred the build and copied it to my LPAR, where I ran the following.

After exporting KOLA_CEX_UUID, I ran kola run luks.cex in the coreos-assembler shell, but it fails. In console.txt, createLuks fails because no APQNs are available. When I look at the debug output, qemu-system-s390x does not have the argument for KOLA_CEX_UUID attached.

[coreos-assembler]$ kola run luks.cex  --debug
2024-07-16T14:48:17Z cli: Started logging at level DEBUG
2024-07-16T14:48:17Z cli: Started logging at level DEBUG
2024-07-16T14:48:17Z kola: Found kola-denylist.yaml. Processing listed denials.
2024-07-16T14:48:17Z kola: Parsed kola-denylist.yaml
2024-07-16T14:48:17Z kola: Denylist: Skipping tests for stream: '', osversion: 'rhel-9.4', arch: 's390x'
2024-07-16T14:48:17Z kola: Processing denial patterns from yaml...
⏭️  Skipping kola test pattern "ext.config.version.rhaos-pkgs-match-openshift":
  👉 https://issues.redhat.com/browse/RHEL-35883
=== RUN   luks.cex
2024-07-16T14:48:18Z platform: Started qemu (230) with args: [qemu-system-s390x -machine s390-ccw-virtio,memory-backend=mem,accel=kvm -cpu host -object memory-backend-memfd,id=mem,size=8192M,share=on -m 8192 -smp 1 -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-ccw,rng=rng0 -uuid a076a2f6-5406-4876-a5a2-02cf3f9251a1 -nographic -nodefaults -boot order=c,strict=on -add-fd fd=3,set=1 -add-fd fd=4,set=2 -device virtio-blk-ccw,drive=disk-1,serial=primary-disk,bootindex=1 -drive if=none,id=disk-1,file=/dev/fdset/1,auto-read-only=off,cache=unsafe -drive if=none,id=ignition,format=raw,file=tmp/kola/qemu-2024-07-16-1448-208/luks.cex/a076a2f6-5406-4876-a5a2-02cf3f9251a1/ignition.json,readonly=on -device virtio-blk,serial=ignition,drive=ignition -netdev user,id=eth0,hostfwd=tcp:127.0.0.1:46177-:22,hostname=qemu0,restrict=on -device virtio-net-ccw,netdev=eth0 -chardev socket,id=qemu-qmp,path=/var/tmp/mantle-qemu3710671857/qmp-1721141298361437627.sock,server=on,wait=off -mon chardev=qemu-qmp,mode=control -device virtio-serial -chardev file,id=virtioserial1,path=/dev/fdset/2,append=on -device virtserialport,chardev=virtioserial1,name=com.coreos.ignition.journal -display none -chardev file,id=log,path=tmp/kola/qemu-2024-07-16-1448-208/luks.cex/a076a2f6-5406-4876-a5a2-02cf3f9251a1/console.txt -serial chardev:log]
2024-07-16T14:49:04Z platform: machine a076a2f6-5406-4876-a5a2-02cf3f9251a1 entered emergency.target in initramfs
2024-07-16T14:49:04Z platform: Sleep 1 to allow for more journal messages to get flushed
2024-07-16T14:49:05Z platform: Killing qemu (230)
2024-07-16T14:49:05Z util: RetryUntilTimeout: f() took 46.099066502s
--- FAIL: luks.cex (50.85s)
        luks.go:246: Unable to create test machine: machine a076a2f6-5406-4876-a5a2-02cf3f9251a1 entered emergency.target in initramfs
FAIL, output in tmp/kola/qemu-2024-07-16-1448-208
Error: harness: test suite failed
2024-07-16T14:49:09Z cli: harness: test suite failed

I simply tried kola qemuexec --qemu-cex which fails on a permission issue but it shows the device attached.

[coreos-assembler]$ kola qemuexec --qemu-cex --debug
2024-07-16T14:55:16Z cli: Started logging at level DEBUG
2024-07-16T14:55:16Z cli: Started logging at level DEBUG
2024-07-16T14:55:16Z platform: Started qemu (271) with args: [qemu-system-s390x -machine s390-ccw-virtio,memory-backend=mem,accel=kvm -cpu host -object memory-backend-memfd,id=mem,size=2048M,share=on -m 2048 -smp 1 -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-ccw,rng=rng0 -nographic -nodefaults -boot order=c,strict=on -add-fd fd=3,set=1 -add-fd fd=4,set=2 -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/68cd2d83-3eef-4e45-b22c-534f90b16cb9 -device virtio-blk-ccw,drive=disk-1,serial=primary-disk,bootindex=1 -drive if=none,id=disk-1,file=/dev/fdset/1,auto-read-only=off,cache=unsafe -chardev socket,id=qemu-qmp,path=/var/tmp/mantle-qemu1814783892/qmp-1721141716762582949.sock,server=on,wait=off -mon chardev=qemu-qmp,mode=control -device virtio-serial -chardev file,id=virtioserial1,path=/dev/fdset/2,append=on -device virtserialport,chardev=virtioserial1,name=com.coreos.ignition.journal -serial mon:stdio]
qemu-system-s390x: -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/68cd2d83-3eef-4e45-b22c-534f90b16cb9: vfio 68cd2d83-3eef-4e45-b22c-534f90b16cb9: failed to open /dev/vfio/0: Permission denied
Error: failed to establish qmp connection: dial unix /var/tmp/mantle-qemu1814783892/qmp-1721141716762582949.sock: connect: connection refused
2024-07-16T14:55:45Z cli: failed to establish qmp connection: dial unix /var/tmp/mantle-qemu1814783892/qmp-1721141716762582949.sock: connect: connection refused

@jlebon
Member

jlebon commented Jul 16, 2024

> When I look at the debug output, qemu-system-s390x does not have the argument for KOLA_CEX_UUID attached.

OK, it sounds like the env var isn't getting propagated to kola correctly? I assume you're using the cosa alias. Make sure that you do export KOLA_CEX_UUID=... within a cosa shell and not when outside the container. (Unless you've modified the alias to explicitly pass through the env var.)

> I simply tried kola qemuexec --qemu-cex which fails on a permission issue but it shows the device attached.

Hmm, possibly you might have to pass through the device to the cosa container? Does it work if you run the cosa container as root with --privileged?

@madhu-pillai
Contributor Author

> When I look at the debug output, qemu-system-s390x does not have the argument for KOLA_CEX_UUID attached.

> OK, it sounds like the env var isn't getting propagated to kola correctly? I assume you're using the cosa alias. Make sure that you do export KOLA_CEX_UUID=... within a cosa shell and not when outside the container. (Unless you've modified the alias to explicitly pass through the env var.)

I did export KOLA_CEX_UUID in the cosa shell before running kola run luks.cex. I also tested without setting the env var and, as expected, it skips luks.cex.

[coreos-assembler]$ unset KOLA_CEX_UUID
[coreos-assembler]$ kola run luks.cex --debug
2024-07-17T06:30:28Z cli: Started logging at level DEBUG
2024-07-17T06:30:28Z cli: Started logging at level DEBUG
2024-07-17T06:30:28Z kola: Found kola-denylist.yaml. Processing listed denials.
2024-07-17T06:30:28Z kola: Parsed kola-denylist.yaml
2024-07-17T06:30:28Z kola: Denylist: Skipping tests for stream: '', osversion: 'rhel-9.4', arch: 's390x'
2024-07-17T06:30:28Z kola: Processing denial patterns from yaml...
⏭️  Skipping kola test pattern "ext.config.version.rhaos-pkgs-match-openshift":
  👉 https://issues.redhat.com/browse/RHEL-35883
=== RUN   luks.cex
--- SKIP: luks.cex (2.00s)
        luks.go:198: No CEX device found in KOLA_CEX_UUID env var
PASS, output in tmp/kola/qemu-2024-07-17-0630-108

> I simply tried kola qemuexec --qemu-cex which fails on a permission issue but it shows the device attached.

> Hmm, possibly you might have to pass through the device to the cosa container? Does it work if you run the cosa container as root with --privileged?

I set the env var for both kola commands. It looks like kola qemuexec --qemu-cex runs the AddCexDevice() function, but somehow kola run luks.cex does not call AddCexDevice(). I am running it as root and using the cosa alias from the coreos-assembler website. --privileged is already set in the cosa alias.

Technically it works when we use qemu-system-s390x without cosa.

@madhu-pillai
Contributor Author

In the meantime I tried:
[coreos-assembler]$ sudo -E kola qemuexec --qemu-cex
That fixed the permission issue, but the VM did not boot either and went to the emergency shell with the following error. I can see the CEX card and domain, but kola run luks.cex is still not working.

         Starting Ignition (fetch-offline)...
[    2.759528] systemd[1]: Starting Ignition (fetch-offline)...
[    2.773504] ignition[699]: Ignition v2.19.0-34-g494403a2
[    2.773614] ignition[699]: Stage: fetch-offline
[    2.773683] ignition[699]: reading system config file "/usr/lib/ignition/base.d/00-core.ign"
[    2.777796] ignition[699]: no config dir at "/usr/lib/ignition/base.platform.d/qemu"
[    2.777985] ignition[699]: no config URL provided
[    2.778014] ignition[699]: reading system config file "/usr/lib/ignition/user.ign"
[    2.778043] ignition[699]: no config at "/usr/lib/ignition/user.ign"
[    2.778072] ignition[699]: Fetching the Ignition config via the Virtio block driver is currently experimental and subject to change.
[    2.778126] ignition[699]: op(1): [started]  loading Virtio block driver module
[    2.780243] ignition[699]: op(1): [finished] loading Virtio block driver module

@madhu-pillai
Contributor Author

Hi @jlebon, I've attached the log.
[coreos-assembler]$sudo-E_kola_qemuexec_--qemu-cex.log

@jlebon
Member

jlebon commented Jul 17, 2024

Looks like the error is:

[  303.515773] ignition[689]: failed to fetch config: timed out after 5m0s waiting for block device "/dev/disk/by-id/virtio-ignition" to appear
[  303.515869] ignition[689]: failed to acquire config: timed out after 5m0s waiting for block device "/dev/disk/by-id/virtio-ignition" to appear
[  303.515889] ignition[689]: Ignition failed: timed out after 5m0s waiting for block device "/dev/disk/by-id/virtio-ignition" to appear

which means that no Ignition config was attached to the system. If you run it with --debug, what's the full QEMU command-line used?

@madhu-pillai
Contributor Author

> Looks like the error is:
>
> [  303.515773] ignition[689]: failed to fetch config: timed out after 5m0s waiting for block device "/dev/disk/by-id/virtio-ignition" to appear
> [  303.515869] ignition[689]: failed to acquire config: timed out after 5m0s waiting for block device "/dev/disk/by-id/virtio-ignition" to appear
> [  303.515889] ignition[689]: Ignition failed: timed out after 5m0s waiting for block device "/dev/disk/by-id/virtio-ignition" to appear
>
> which means that no Ignition config was attached to the system. If you run it with --debug, what's the full QEMU command-line used?

Hi @jlebon,

If I use Ignition in the arguments, sudo -E kola qemuexec --qemu-cex -i cex.ign -m 8192 does work, but without Ignition it fails. Is that how it runs with kola qemuexec?

Here is the kola run luks.cex QEMU command line.

platform: Started qemu (393) with args: [qemu-system-s390x -machine s390-ccw-virtio,memory-backend=mem,accel=kvm -cpu host -object memory-backend-memfd,id=mem,size=8192M,share=on -m 8192 -smp 1 -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-ccw,rng=rng0 -nographic -nodefaults -boot order=c,strict=on -add-fd fd=3,set=1 -add-fd fd=4,set=2 -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/68cd2d83-3eef-4e45-b22c-534f90b16cb9 -device virtio-blk-ccw,drive=disk-1,serial=primary-disk,bootindex=1 -drive if=none,id=disk-1,file=/dev/fdset/1,auto-read-only=off,cache=unsafe -drive if=none,id=ignition,format=raw,file=/var/tmp/mantle-qemu3342670463/config.ign,readonly=on -device virtio-blk,serial=ignition,drive=ignition -chardev socket,id=qemu-qmp,path=/var/tmp/mantle-qemu3342670463/qmp-1721285348259533130.sock,server=on,wait=off -mon chardev=qemu-qmp,mode=control -device virtio-serial -chardev file,id=virtioserial1,path=/dev/fdset/2,append=on -device virtserialport,chardev=virtioserial1,name=com.coreos.ignition.journal -serial mon:stdio]

Here is the QEMU command line without an Ignition config passed as an argument.
platform: Started qemu (550) with args: [qemu-system-s390x -machine s390-ccw-virtio,memory-backend=mem,accel=kvm -cpu host -object memory-backend-memfd,id=mem,size=8192M,share=on -m 8192 -smp 1 -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-ccw,rng=rng0 -nographic -nodefaults -boot order=c,strict=on -add-fd fd=3,set=1 -add-fd fd=4,set=2 -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/68cd2d83-3eef-4e45-b22c-534f90b16cb9 -device virtio-blk-ccw,drive=disk-1,serial=primary-disk,bootindex=1 -drive if=none,id=disk-1,file=/dev/fdset/1,auto-read-only=off,cache=unsafe -chardev socket,id=qemu-qmp,path=/var/tmp/mantle-qemu1964129807/qmp-1721294623277754231.sock,server=on,wait=off -mon chardev=qemu-qmp,mode=control -device virtio-serial -chardev file,id=virtioserial1,path=/dev/fdset/2,append=on -device virtserialport,chardev=virtioserial1,name=com.coreos.ignition.journal -serial mon:stdio]

With ignition, it works fine.

platform: Started qemu (571) with args: [qemu-system-s390x -machine s390-ccw-virtio,memory-backend=mem,accel=kvm -cpu host -object memory-backend-memfd,id=mem,size=8192M,share=on -m 8192 -smp 1 -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-ccw,rng=rng0 -nographic -nodefaults -boot order=c,strict=on -add-fd fd=3,set=1 -add-fd fd=4,set=2 -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/68cd2d83-3eef-4e45-b22c-534f90b16cb9 -device virtio-blk-ccw,drive=disk-1,serial=primary-disk,bootindex=1 -drive if=none,id=disk-1,file=/dev/fdset/1,auto-read-only=off,cache=unsafe -drive if=none,id=ignition,format=raw,file=/var/tmp/mantle-qemu2099442059/config.ign,readonly=on -device virtio-blk,serial=ignition,drive=ignition -chardev socket,id=qemu-qmp,path=/var/tmp/mantle-qemu2099442059/qmp-1721296163298592313.sock,server=on,wait=off -mon chardev=qemu-qmp,mode=control -device virtio-serial -chardev file,id=virtioserial1,path=/dev/fdset/2,append=on -device virtserialport,chardev=virtioserial1,name=com.coreos.ignition.journal -serial mon:stdio]

@madhu-pillai
Contributor Author

In addition to the above comments, I had a word with the zKVM team about the permission issue and the reply was to use libvirt instead.

qemu-system-s390x: -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/68cd2d83-3eef-4e45-b22c-534f90b16cb9: vfio 68cd2d83-3eef-4e45-b22c-534f90b16cb9: failed to open /dev/vfio/0: Permission denied

The only way it works is as below.

core@m13lp71:~$ sudo su -
root@m13lp71:~/rhcos417# cosa shell

then

[coreos-assembler] $ export KOLA_CEX_UUID="68cd2d83-3eef-4e45-b22c-534f90b16cb9"
[coreos-assembler] $ sudo -E kola run luks.cex

I also tried a setup closer to how the builder runs it.

core@m13lp71:~/rhcos417$ cosa shell
[coreos-assembler]$ sudo -E kola run luks.cex
⏭️  Skipping kola test pattern "ext.config.version.rhaos-pkgs-match-openshift":
  👉 https://issues.redhat.com/browse/RHEL-35883
=== RUN   luks.cex
qemu-system-s390x: -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/68cd2d83-3eef-4e45-b22c-534f90b16cb9: vfio 68cd2d83-3eef-4e45-b22c-534f90b16cb9: failed to open /dev/vfio/0: Permission denied

@jlebon
Member

jlebon commented Jul 18, 2024

> If I use Ignition in the arguments, sudo -E kola qemuexec --qemu-cex -i cex.ign -m 8192 does work, but without Ignition it fails. Is that how it runs with kola qemuexec?

Ahh right, sorry for the confusion. If you do cosa run instead, which is usually what you would use, that implies some switches to kola qemuexec that will always result in an Ignition config getting injected. Breaking boot on no Ignition config on s390x in this case is expected (a lot of backstory on this in coreos/ignition#928).

> In addition to the above comments, I had a word with the zKVM team about the permission issue and the reply was to use libvirt instead.

It would be a lot of non-trivial work to bring in libvirt at this point. We've discussed this in the past before deciding to stick with qemu and unprivileged.

Where is the limitation exactly? I see a mention of /dev/vfio/0 here; would it work to just open up permissions on the device node? Or does this require kernel changes?

@madhu-pillai
Contributor Author

> If I use Ignition in the arguments, sudo -E kola qemuexec --qemu-cex -i cex.ign -m 8192 does work, but without Ignition it fails. Is that how it runs with kola qemuexec?

> Ahh right, sorry for the confusion. If you do cosa run instead, which is usually what you would use, that implies some switches to kola qemuexec that will always result in an Ignition config getting injected. Breaking boot on no Ignition config on s390x in this case is expected (a lot of backstory on this in coreos/ignition#928).

> In addition to the above comments, I had a word with the zKVM team about the permission issue and the reply was to use libvirt instead.

> It would be a lot of non-trivial work to bring in libvirt at this point. We've discussed this in the past before deciding to stick with qemu and unprivileged.

> Where is the limitation exactly? I see a mention of /dev/vfio/0 here; would it work to just open up permissions on the device node? Or does this require kernel changes?

Hi @jlebon,

We can change the permissions of /dev/vfio/0 and the test works. Actually, I was a bit skeptical about suggesting this, since the permission change is not persistent across reboots: any time after a reboot, modprobe vfio_ap recreates the node and we need to set the permissions again.

Here is the output after changing the permissions.

core@m13lp71:~/rhcos417$ cosa shell
+ podman run --rm -ti --security-opt=label=disable --privileged --uidmap=1000:0:1 --uidmap=0:1:1000 --uidmap=1001:1001:64536 -v=/var/home/core/rhcos417:/srv/ --device=/dev/kvm --device=/dev/fuse --tmpfs=/tmp -v=/var/tmp:/var/tmp --name=cosa localhost/core-coreos-assembler shell

[coreos-assembler]$ export KOLA_CEX_UUID="68cd2d83-3eef-4e45-b22c-534f90b16cb9"

[coreos-assembler]$ kola run luks.cex
⏭️  Skipping kola test pattern "ext.config.version.rhaos-pkgs-match-openshift":
  👉 https://issues.redhat.com/browse/RHEL-35883
=== RUN   luks.cex
--- FAIL: luks.cex (92.99s)
        luks.go:103: Failed to run ignition-ostree-growfs.service: ignition-ostree-growfs.service did not start
FAIL, output in tmp/kola/qemu-2024-07-22-0300-29
Error: harness: test suite failed
2024-07-22T03:02:15Z cli: harness: test suite failed

[coreos-assembler]$ exit
exit
failed to execute cmd-shell: exit status `1`
+ rc=1
+ set +x

core@m13lp71:~/rhcos417$ ls -l /dev/vfio/
0     vfio  
core@m13lp71:~/rhcos417$ ls -l /dev/vfio/0
crw-rw-rw-. 1 root root 243, 0 Jul 22 02:49 /dev/vfio/0

@madhu-pillai
Contributor Author

The luks.cex test fails in TestRHCOSGrowfs(); it says the unit did not start. When I check the code, it fails on the following line. I do not know how to access the VM that runs this check. I used --ssh-on-test-failure too, but it did not work either.

return fmt.Errorf("%s did not start", unit)

@jlebon
Member

jlebon commented Jul 22, 2024

> We can change the permissions of /dev/vfio/0 and the test works. Actually, I was a bit skeptical about suggesting this, since the permission change is not persistent across reboots: any time after a reboot, modprobe vfio_ap recreates the node and we need to set the permissions again.

OK, that's great. We can add a systemd unit in the builder's Butane config to do this on every boot.

> The luks.cex test fails in TestRHCOSGrowfs(); it says the unit did not start. When I check the code, it fails on the following line. I do not know how to access the VM that runs this check. I used --ssh-on-test-failure too, but it did not work either.
>
> return fmt.Errorf("%s did not start", unit)

I assume this is with #3828, right? Can you upload the full journal and console outputs?

@madhu-pillai
Contributor Author

Hi @jlebon,

Yes, #3828. Here are the full logs.

Console.txt
journal-raw.txt
journal.txt

luksDump = c.MustSSH(m, "sudo cryptsetup luksDump "+rootPart)
mustMatch(c, "cipher: paes-*", luksDump)

err = coretest.TestRHCOSGrowfs()
Member

Call this before the reboot.

Contributor Author

I tried calling TestRHCOSGrowfs() before the reboot, but I get the same issue.

@@ -149,6 +149,12 @@ func (qc *Cluster) NewMachineWithQemuOptions(userdata *conf.UserData, options pl
primaryDisk = *diskp
}

if options.Cex {
Member

For consistency, let's make this

Suggested change
if options.Cex {
if qc.flight.opts.Cex || options.Cex {

That way e.g. if someone runs kola run basic --qemu-cex, it'll run the basic tests with the CEX device attached. Other options work that way as well.

Contributor Author

Done
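The suggested condition above combines a flight-wide setting with a per-machine setting. A minimal Go sketch of that pattern follows; the types `flightOptions` and `machineOptions` and the function `attachCex` are hypothetical stand-ins for illustration, not kola's actual types.

```go
package main

import "fmt"

// flightOptions and machineOptions are hypothetical stand-ins for the
// flight-wide options (qc.flight.opts) and the per-machine options
// referenced in the suggestion.
type flightOptions struct{ Cex bool }
type machineOptions struct{ Cex bool }

// attachCex mirrors the suggested condition: the CEX device is attached
// when either the flight-wide flag or the per-machine option requests it.
func attachCex(flight flightOptions, machine machineOptions) bool {
	return flight.Cex || machine.Cex
}

func main() {
	for _, f := range []bool{false, true} {
		for _, m := range []bool{false, true} {
			fmt.Printf("flight=%v machine=%v attach=%v\n",
				f, m, attachCex(flightOptions{Cex: f}, machineOptions{Cex: m}))
		}
	}
}
```

With this shape, a test that never sets the per-machine option still gets the device when the flag is passed on the command line, which is the behavior described for `kola run basic --qemu-cex`.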

luksDump = c.MustSSH(m, "sudo cryptsetup luksDump "+rootPart)
mustMatch(c, "cipher: paes-*", luksDump)

err = coretest.TestRHCOSGrowfs()
Member

OK, so the actual problem with this is that coretest.TestRHCOSGrowfs() assumes that it's running on the VM itself under test. In the basic tests, it's a native function. Here we're running it from the host so it's actually e.g. calling journalctl on your host system which is of course not what we want.

Will add comments for how we can fix this.

mantle/kola/tests/ignition/luks.go
Comment on lines 101 to 104
err = coretest.TestRHCOSGrowfs()
if err != nil {
c.Fatalf("Failed to run ignition-ostree-growfs.service: %v", err)
}
Member

Remove this.

Contributor Author

Done

@madhu-pillai
Contributor Author

Hi @jlebon ,
The suggested code worked. Test passed.

core@m13lp71:~/rhcos417$ cosa kola run luks.cex
+ podman run --rm -ti --security-opt=label=disable --privileged --uidmap=1000:0:1 --uidmap=0:1:1000 --uidmap=1001:1001:64536 -v=/var/home/core/rhcos417:/srv/ --device=/dev/kvm --device=/dev/fuse --tmpfs=/tmp -v=/var/tmp:/var/tmp --name=cosa localhost/core-coreos-assembler kola run luks.cex
kola -p qemu run luks.cex --output-dir tmp/kola
⏭️  Skipping kola test pattern "ext.config.version.rhaos-pkgs-match-openshift":
  👉 https://issues.redhat.com/browse/RHEL-35883
=== RUN   luks.cex
=== RUN   luks.cex/RHCOSGrowpart
--- PASS: luks.cex (139.74s)
    --- PASS: luks.cex/RHCOSGrowpart (0.22s)
PASS, output in tmp/kola
+ rc=0
+ set +x

Member

@jlebon jlebon left a comment

This looks sane overall but:

  1. We need another commit that bumps the Ignition vendoring to the latest version to bring in the spec with CEX support.
  2. Let's add a commit message please.

#3844 should fix CI.

@jlebon
Member

jlebon commented Aug 14, 2024

Requires: #3850

@madhu-pillai
Contributor Author

/test rhcos

@jlebon
Member

jlebon commented Aug 27, 2024

Still needs a commit message.

I think this is still blocked on #3850. I've restarted CI there.

@madhu-pillai
Contributor Author

Hi, @jlebon ,
Added commit message.

This kola test is crucial for verifying the security of CEX
hardware-based LUKS encryption on root volume. It guarantees that the
encrypted device employs protected keys to encrypt and decrypt the
volume.

This is essentially testing the enablement done in
coreos/ignition#1820.

To run this, it needs to be on a system with a CEX device with
passthrough enabled and the device's UUID exposed via KOLA_CEX_UUID. See
also coreos/fedora-coreos-pipeline#1010.

Co-authored-by: Jonathan Lebon <jonathan@jlebon.com>
@jlebon jlebon changed the title Add IBM Cex device test for the s390x build kola: add IBM CEX device test for the s390x build Aug 29, 2024
Member

@jlebon jlebon left a comment

LGTM! Linked to the other relevant PRs in the commit message.

Before we merge this, can you confirm that this is passing in FCOS and RHCOS (on their respective s390x builders)?

@madhu-pillai
Contributor Author

@jlebon, I've already run the test on RHCOS, but I'll rerun it on both RHCOS and FCOS and update you soon.

@madhu-pillai
Contributor Author

Hi @jlebon,

Here is the process I followed for the test:

1. Cloned the coreos-assembler CEX commit and rebased it onto main.
2. Hardcoded the CEX UUID instead of reading KOLA_CEX_UUID via os.Getenv.
3. Built the image.

RHCOS:
Built the RHCOS image with overrides for ignition, module-setup, and ignition-ostree-growfs.

Run the test `cosa kola run luks.cex`
[root@m13lp71 rhcos]# cosa kola run luks.cex
+ podman run --rm -ti --security-opt=label=disable --privileged --uidmap=1000:0:1 --uidmap=0:1:1000 --uidmap=1001:1001:64536 -v=/root/rhcos:/srv/ --device=/dev/kvm --device=/dev/fuse --tmpfs=/tmp -v=/var/tmp:/var/tmp --name=cosa localhost/core-coreos-assembler kola run luks.cex
kola -p qemu run luks.cex --output-dir tmp/kola
⏭️  Skipping kola test pattern "ext.config.version.rhaos-pkgs-match-openshift":
  👉 https://issues.redhat.com/browse/RHEL-35883
=== RUN   luks.cex
=== RUN   luks.cex/RHCOSGrowpart
--- PASS: luks.cex (153.09s)
--- PASS: luks.cex/RHCOSGrowpart (0.21s)
PASS, output in tmp/kola
+ rc=0
+ set +x
FCOS:
Built FCOS with the same overrides as for RHCOS, but added the s390utils.base RPM to the manifests and uncommented the perl entry in the package exclude list before the build.
Without these extra steps, the FCOS build fails the test.

Run the test `cosa kola run luks.cex`
[root@m13lp71 fcos]# cosa kola run luks.cex
+ podman run --rm -ti --security-opt=label=disable --privileged --uidmap=1000:0:1 --uidmap=0:1:1000 --uidmap=1001:1001:64536 -v=/root/fcos:/srv/ --device=/dev/kvm --device=/dev/fuse --tmpfs=/tmp -v=/var/tmp:/var/tmp --name=cosa localhost/core-coreos-assembler kola run luks.cex
kola -p qemu run luks.cex --output-dir tmp/kola
⏭️  Skipping kola test pattern "fcos.internet":
  👉 https://github.com/coreos/coreos-assembler/pull/1478
⏭️  Skipping kola test pattern "podman.workflow":
  👉 https://github.com/coreos/coreos-assembler/pull/1478
=== RUN   luks.cex
=== RUN   luks.cex/FCOSGrowpart
--- PASS: luks.cex (197.16s)
--- PASS: luks.cex/FCOSGrowpart (0.52s)
PASS, output in tmp/kola
+ rc=0
+ set +x

@jlebon
Member

jlebon commented Sep 4, 2024

Added coreos/fedora-coreos-tracker#1708 for meeting discussion.

@jlebon jlebon added the hold waiting on something label Sep 4, 2024
@jlebon jlebon removed the hold waiting on something label Sep 16, 2024
Member

@jlebon jlebon left a comment

LGTM!

This won't run in pipelines until we actually define KOLA_CEX_UUID in the pipecfg. We will not be able to run this in the FCOS pipeline because the s390x builder we use there is a cloud instance and has no access to a CEX device. Still, it would be beneficial to eventually have better packaging there, but it doesn't need to block this work.
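Since the test depends on KOLA_CEX_UUID being defined in the environment, the lookup can be sketched in Go as below. This is an illustration under assumptions, not the merged implementation: the helper `cexUUID` and its injectable `lookup` parameter are hypothetical, and how the real test behaves when the variable is unset is not shown in this thread.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// cexUUID returns the trimmed value of KOLA_CEX_UUID and whether it is
// usable. The lookup function is injectable so the logic can be exercised
// without touching the real environment. Hypothetical helper.
func cexUUID(lookup func(string) (string, bool)) (string, bool) {
	v, ok := lookup("KOLA_CEX_UUID")
	v = strings.TrimSpace(v)
	return v, ok && v != ""
}

func main() {
	uuid, ok := cexUUID(os.LookupEnv)
	if !ok {
		fmt.Println("KOLA_CEX_UUID is not set; there is no CEX device to attach")
		return
	}
	fmt.Println("would attach CEX mediated device", uuid)
}
```

Using `os.LookupEnv` rather than `os.Getenv` distinguishes an unset variable from one set to an empty string, although this sketch treats both as unusable.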

@jlebon jlebon merged commit 41e5c4a into coreos:main Sep 16, 2024
5 checks passed