-
Notifications
You must be signed in to change notification settings - Fork 166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kola: add IBM CEX device test for the s390x build #3828
Conversation
mantle/platform/qemu.go
Outdated
@@ -58,6 +58,7 @@ import ( | |||
var ( | |||
// ErrInitramfsEmergency is the marker error returned upon node blocking in emergency mode in initramfs. | |||
ErrInitramfsEmergency = errors.New("entered emergency.target in initramfs") | |||
cex_uuid = "68cd2d83-3eef-4e45-b22c-534f90b16cb9" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is this UUID from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The UUID is generated from my test coreos KVM lpar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The UUID is generated from my test coreos KVM lpar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same UUID we use in the fedora-coreos-pipeline - butane file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Let's not encode the UUID into kola though. It should be passed out-of-band instead. One way is using pipeline environment vars. It's not great, but at least it's colocated with where the UUID is defined in that repo.
Another approach is to have the builder provisioning script drop a file... somewhere... that kola remote-session create
then bind-mounts into the remote container. But yuck, it'd have to be s390x specific or we'd have to write that file on all arches, even if they're empty there. That's similar to what we do for secex. I've never been completely satisfied with how that worked out. Maybe for now, let's just use the env var approach and revisit this together with the secex trick in a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do that..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jlebon ,.
I've added AddCexDevice
in platform.qemu.go
as below. then called the function in mantle.cmd.qemuexec
. How does it read the env parameter?
// supports IBM Cex based LUKS encryption if it is s390x host (zKVM/LPAR)
func (builder *QemuBuilder) AddCexDevice(cex_uuid string) error {
builder.Append("-device", fmt.Sprintf("vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/%s", cex_uuid))
return nil
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use os.Getenv()
.
The function can error out if the env var is not defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jlebon ,
Just to clarify my doubts. Why cannot we hard code the UUID
in the AddCexDevice
function. If i understand correctly whether we pass through env or any other method still the qemuexec
is about to use same mediated device for the vm creation and no overhead of env and file creation. And the butane create-cex-device.sh
scripts always runs to make sure the UUID
is present if not it re-configures the mediated device in the builder lpar returns UUID
before qemuexec
use the UUID
to creates the VM.
mantle/kola/tests/ignition/luks.go
Outdated
if coreosarch.CurrentRpmArch() == "s390x" { | ||
rootPart = "/dev/disk/by-id/virtio-primary-disk-part4" | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be able to drop this. I think this is an old workaround for partlabel issues on s390x.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will correct it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we modifying this file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will remove it.
mantle/kola/tests/util/luks.go
Outdated
|
||
// LUKSSanityCEXTest verifies that the rootfs is encrypted with Cex based LUKS | ||
func LUKSSanityCEXTest(c cluster.TestCluster, m platform.Machine, cex bool, rootPart string) { | ||
if cex { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Always true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will correct it.
mantle/platform/qemu.go
Outdated
|
||
// supports IBM Cex based LUKS encryption if it is s390x host (zKVM/LPAR) | ||
func (builder *QemuBuilder) AddCexDevice() error { | ||
if builder.architecture != "s390x" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to guard on this. It should only be called on s390x.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will correct it
mantle/kola/tests/ignition/luks.go
Outdated
Distros: []string{"rhcos"}, | ||
Platforms: []string{"qemu"}, | ||
ExcludeArchitectures: []string{"aarch64", "ppc64le", "x86_64"}, | ||
Tags: []string{"luks", "cex", kola.NeedsInternetTag, "reprovision"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this test needs Internet access, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, Internet access is not required.
20a4f18
to
2df4b40
Compare
Hi @jlebon , I've added the changes as suggested. Kindly review. |
mantle/kola/tests/util/luks.go
Outdated
c.Fatalf("Failed to reboot the machine: %v", err) | ||
} | ||
luksDump = c.MustSSH(m, "sudo cryptsetup luksDump "+rootPart) | ||
mustMatch(c, "Cipher: paes-*", luksDump) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might as well test here also that ignition-ostree-growfs was successful. It'd probably work to just call TestRHCOSGrowfs()
.
And oh wow, I just realized that coreos/fedora-coreos-config#2986 is still not merged. Let's get that in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do
edad2cd
to
a012744
Compare
Hi @jlebon , Is there any method i can test the coreos-assember manually in my zKVM machine without CI pipeline? |
Add a new systemd unit that sets up a CEX device for use by kola tests. The device is associated with a UUID. This UUID is passed to kola via an environment variable for now. See also: coreos/coreos-assembler#3828 Co-authored-by: Jonathan Lebon <jonathan@jlebon.com>
Add a new systemd unit that sets up a CEX device for use by kola tests. The device is associated with a UUID. This UUID is passed to kola via an environment variable for now. See also: coreos/coreos-assembler#3828 Co-authored-by: Jonathan Lebon <jonathan@jlebon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any method i can test the coreos-assember manually in my zKVM machine without CI pipeline?
On a machine with a CEX device, in a cosa container, you should be able to run KOLA_CEX_UUID=... kola run luks.cex
to test this.
mantle/kola/tests/ignition/luks.go
Outdated
Name: `luks.cex`, | ||
Description: "Verify that CEX-based rootfs encryption works.", | ||
Flags: []register.Flag{}, | ||
Distros: []string{"rhcos"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should in theory work on FCOS, it's just that we don't have a CEX device on the FCOS builder currently, right? So I think it'd be more accurate to not limit to RHCOS here, and just let the test be skipped on FCOS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it something like
ExcludeDistros: []string{"fcos"}
, instead for Distros: []string{"rhcos"}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean to just remove Distros
entirely. The test should be skipped when running on the FCOS s390x builder (where we don't have a CEX device).
mantle/kola/tests/ignition/luks.go
Outdated
} | ||
] | ||
} | ||
}`, true)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: there's no point in making this an Sprintf
if the formatting arguments are not actually variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do that.
Hi @jlebon ,
then I ran
I simply tried
|
OK, it sounds like the env var isn't getting propagated to
Hmm, possibly you might have to pass through the device to the cosa container? Does it work if you run the cosa container as root with |
I did export the
For the both kola command i set the env. Looks like Technically it works when we use |
In between i tried with
|
Hi @jlebon , Attached the log herewith. |
Looks like the error is:
which means that no Ignition config was attached to the system. If you run it with |
Hi @jlebon , If i use ignition in argument Here is the
Here the QEMU command line without ignition as argument in cmdline. With ignition, it works fine.
|
Added to above comments, I had word with zKVM about the permission issue and the reply was to use libvirt instead.
The only way it works as below.
then
I also tried near to the builder level.
|
Ahh right, sorry for the confusion. If you do
It would be a lot of non-trivial work to bring in libvirt at this point. We've discussed this in the past before deciding to stick with qemu and unprivileged. Where is the limitation exactly? I see a mention to |
Hi @jlebon , We can change the permission of Following the snippet after changing the permission.
|
a012744
to
22fcef9
Compare
The
|
OK, that's great. We can add a systemd unit in the builder's Butane config to do this on every boot.
I assume this is with #3828, right? Can you upload the full journal and console outputs? |
mantle/kola/tests/util/luks.go
Outdated
luksDump = c.MustSSH(m, "sudo cryptsetup luksDump "+rootPart) | ||
mustMatch(c, "cipher: paes-*", luksDump) | ||
|
||
err = coretest.TestRHCOSGrowfs() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Call this before the reboot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried by adding the TestRHCOSGrowfs() before reboot but same issue.
@@ -149,6 +149,12 @@ func (qc *Cluster) NewMachineWithQemuOptions(userdata *conf.UserData, options pl | |||
primaryDisk = *diskp | |||
} | |||
|
|||
if options.Cex { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency, let's make this
if options.Cex { | |
if qc.flight.opts.Cex || options.Cex { |
That way e.g. if someone runs kola run basic --qemu-cex
, it'll run the basic tests with the CEX device attached. Other options work that way as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
mantle/kola/tests/util/luks.go
Outdated
luksDump = c.MustSSH(m, "sudo cryptsetup luksDump "+rootPart) | ||
mustMatch(c, "cipher: paes-*", luksDump) | ||
|
||
err = coretest.TestRHCOSGrowfs() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so the actual problem with this is that coretest.TestRHCOSGrowfs()
assumes that it's running on the VM itself under test. In the basic tests, it's a native function. Here we're running it from the host so it's actually e.g. calling journalctl on your host system which is of course not what we want.
Will add comments for how we can fix this.
mantle/kola/tests/util/luks.go
Outdated
err = coretest.TestRHCOSGrowfs() | ||
if err != nil { | ||
c.Fatalf("Failed to run ignition-ostree-growfs.service: %v", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Hi @jlebon ,
|
22fcef9
to
9b01e7b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requires: #3850 |
3cbb555
to
edd4983
Compare
/test rhcos |
Still needs a commit message. I think this is still blocked on #3850. I've restarted CI there. |
edd4983
to
fa2b306
Compare
Hi, @jlebon , |
This kola test is crucial for verifying the security of CEX hardware-based LUKS encryption on root volume. It guarantees that the encrypted device employs protected keys to encrypt and decrypt the volume. This is essentially testing the enablement done in coreos/ignition#1820. To run this, it needs to be on a system with a CEX device with passthrough enabled and the device's UUID exposed via KOLA_CEX_UUID. See also coreos/fedora-coreos-pipeline#1010. Co-authored-by: Jonathan Lebon <jonathan@jlebon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Linked to the other relevant PRs in the commit message.
Before we merge this, can you confirm that this is passing in FCOS and RHCOS (on their respective s390x builders)?
@jlebon , I've done test on RHCOS. But i'll rerun the test on RHCOS and FCOS again and update you soon. |
Hi @jlebon, Here is the process i followed for the test/
|
Added coreos/fedora-coreos-tracker#1708 for meeting discussion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
This won't run in pipelines until we actually define KOLA_CEX_UUID
in the pipecfg. We will not be able to run this in the FCOS pipeline because the s390x builder we use there is a cloud instance and has no access to a CEX device. Still, it would be beneficial to eventually have better packaging there, but it doesn't need to block this work.
This kola test is crucial for verifying the security of CEX
hardware-based LUKS encryption on root volume. It guarantees that the
encrypted device employs protected keys to encrypt and decrypt the
volume.
This is essentially testing the enablement done in
coreos/ignition#1820.
To run this, it needs to be on a system with a CEX device with
passthrough enabled and the device's UUID exposed via KOLA_CEX_UUID. See
also coreos/fedora-coreos-pipeline#1010.
Co-authored-by: Jonathan Lebon jonathan@jlebon.com