Support root on iSCSI (non-HBA) #1590

Closed
jlebon opened this issue Oct 3, 2023 · 9 comments
Labels
jira for syncing to jira kind/enhancement

Comments

@jlebon
Member

jlebon commented Oct 3, 2023

Some bare metal systems (and some cloud providers offering bare metal instance types, like Oracle Cloud) run their boot disk off of iSCSI. In setups without an HBA, it's up to the OS to act as the iSCSI initiator. During the initramfs, the OS connects to the target to be able to mount the root device and keep going from there. There's a pile of code that already exists for this in dracut that we should be able to reuse (look for the rd.iscsi keys in https://www.man7.org/linux/man-pages/man7/dracut.cmdline.7.html).

The main complexity for us is ensuring that it meshes well with our existing maze of initrd services and ordering.

@jlebon
Member Author

jlebon commented Oct 3, 2023

> The main complexity for us is ensuring that it meshes well with our existing maze of initrd services and ordering.

What makes this tricky is that it involves two areas our initramfs code already does a lot of work in: finding the bootfs/rootfs, and networking.

For example, we have an ordering right now between the bootfs and networking because we need to be able to apply NetworkManager keyfiles hosted in the bootfs (see coreos/coreos-installer#713). I think we'll need to relax this so that in the iSCSI case, applying custom keyfiles before networking is just not supported. Otherwise, we'll be stuck waiting for the bootfs before bringing up networking.
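The ordering problem can be illustrated with a toy dependency graph. This is a minimal sketch, not the real unit files: the node names `network` and `bootfs` are hypothetical stand-ins, and Python's `graphlib` (3.9+) plays the role of systemd's `After=` ordering. With both edges present there is no valid start order, which is why the keyfile ordering has to be relaxed for iSCSI.

```python
from graphlib import TopologicalSorter, CycleError


def boot_order(deps):
    """Return a valid start order, or None if the constraints form a cycle.

    Each key must start after every node in its value set (like After=).
    """
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError:
        return None


# Local disk: networking waits for the bootfs so that NetworkManager
# keyfiles stored there can be applied first.
local_disk = {"network": {"bootfs"}, "bootfs": set()}

# iSCSI: the bootfs only appears once networking is up, but networking
# still waits for the bootfs, so no valid order exists.
iscsi_strict = {"network": {"bootfs"}, "bootfs": {"network"}}

# Relaxed: on iSCSI, drop the "network after bootfs" edge, i.e. custom
# keyfiles before networking are simply not supported.
iscsi_relaxed = {"network": set(), "bootfs": {"network"}}

print(boot_order(local_disk))
print(boot_order(iscsi_strict))
print(boot_order(iscsi_relaxed))
```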

@jbtrystram jbtrystram self-assigned this Oct 4, 2023
@travier travier added the jira for syncing to jira label Oct 4, 2023
@jlebon
Member Author

jlebon commented Oct 16, 2023

xref progress in jbtrystram/targetcli-containers#1 (comment)

Re.

> The metal image needs to actually be expanded to some reasonable size, e.g.
>
> ```
> $ truncate -s 16G fcos.raw
> ```

The way we're testing things right now is that we're copying the metal raw image to some location and serving it over iSCSI, but a more realistic scenario is that the user would install FCOS/RHCOS onto a blank decently sized iSCSI target and then reboot into it. I think it's OK to take that shortcut for now (and have to manually add iSCSI-related kargs), but when we add proper testing for this, it should probably be a kola testiso scenario that goes through the real flow.
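For scripted test setups, the same image growth can be done from Python. This is a minimal sketch assuming a raw image file on a local filesystem; `grow_image` is a hypothetical helper name. Like `truncate -s`, `os.truncate()` extends the file with a zero-filled tail that is sparse on most filesystems.

```python
import os


def grow_image(path, size_bytes):
    """Grow a disk image to size_bytes, similar to `truncate -s`.

    The extension is zero-filled and, on most filesystems, sparse, so no
    real disk space is used until the guest writes to it.
    """
    current = os.path.getsize(path)
    if size_bytes < current:
        # Unlike truncate(1), refuse to cut data off the end of the image.
        raise ValueError("refusing to shrink the image")
    os.truncate(path, size_bytes)


# Equivalent of `truncate -s 16G fcos.raw` (coreutils' G means GiB):
# grow_image("fcos.raw", 16 * 1024**3)
```

Refusing to shrink is a deliberate difference from `truncate -s`, which would happily truncate an image that is already larger than the requested size.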

@jlebon
Member Author

jlebon commented Oct 18, 2023

For reference, here's how the kernel command-line looks on a CentOS Stream 8 machine booted in Oracle Cloud:

```
BOOT_IMAGE=(hd1,gpt2)/vmlinuz-4.18.0-512.el8.x86_64 root=/dev/mapper/centosvolume-root ro crashkernel=auto LANG=en_US.UTF-8 transparent_hugepage=never console=tty0 console=ttyS0,115200 libiscsi.debug_libiscsi_eh=1 rd.luks=0 rd.md=0 rd.dm=0 rd.lvm.vg=centosvolume rd.lvm.lv=centosvolume/root ip=single-dhcp net.ifnames=1 rd.net.timeout.dhcp=10 network-config=e2NvbmZpZzogZGlzYWJsZWR9Cg== rd.iscsi.param=node.session.timeo.replacement_timeout=6000 rd.iscsi.ibft=1 rd.iscsi.firmware=1 rd.iscsi.initiator=iqn.2010-04.org.ipxe
```

The important bits here are `rd.iscsi.ibft` (configure the NIC using iBFT), `rd.iscsi.firmware` (configure the iSCSI target using iBFT), and `rd.iscsi.initiator` (the initiator name of the node). The last one doesn't look particularly instance-specific, so in theory it could be baked into an Oracle Cloud image, though that needs a bit more digging.
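Because these are plain `key=value` kargs, it is easy to check which `rd.iscsi.*` options a machine booted with. A minimal sketch; `iscsi_kargs` is a hypothetical helper, and it splits on whitespace only, so double-quoted values containing spaces are not handled.

```python
def iscsi_kargs(cmdline):
    """Collect the rd.iscsi.* arguments from a kernel command line.

    Values are gathered into lists because keys such as rd.iscsi.param
    may be repeated. Splitting on whitespace is a simplification: quoted
    values containing spaces are not handled.
    """
    found = {}
    for arg in cmdline.split():
        # Split on the first "=" only; values may contain "=" themselves.
        key, sep, value = arg.partition("=")
        if key.startswith("rd.iscsi."):
            # A bare flag (no "=") is recorded as None.
            found.setdefault(key, []).append(value if sep else None)
    return found


# Usage, on an excerpt of the Oracle Cloud command line above
# (on a live system, read /proc/cmdline instead):
excerpt = ("ip=single-dhcp rd.iscsi.ibft=1 rd.iscsi.firmware=1 "
           "rd.iscsi.initiator=iqn.2010-04.org.ipxe")
print(iscsi_kargs(excerpt))
```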

@jbtrystram
Contributor

jbtrystram commented Oct 19, 2023

I did a bit of testing with iPXE and iBFT, booting the following `boot.ipxe`:

```
#!ipxe
sanboot iscsi:10.0.2.15::::iqn.2023-10.coreos.target.vm:coreos
```
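The `sanboot` URI packs the target details into colon-separated fields. As the iPXE documentation describes it, the layout is `iscsi:<servername>:<protocol>:<port>:<LUN>:<targetname>`, with an empty port meaning 3260 and an empty LUN meaning 0. The sketch below parses that layout; the names are hypothetical, it assumes a plain decimal LUN and a server address without colons (so no bare IPv6), and it splits at most five times because the target IQN itself contains a colon.

```python
from dataclasses import dataclass


@dataclass
class IscsiTarget:
    server: str
    port: int
    lun: int
    target_name: str


def parse_ipxe_iscsi_uri(uri):
    """Parse iscsi:<server>:<protocol>:<port>:<LUN>:<targetname>.

    The protocol field is ignored; empty port/LUN fields fall back to
    3260 and 0. maxsplit=5 keeps colons inside the IQN intact.
    """
    scheme, server, _protocol, port, lun, target = uri.split(":", 5)
    if scheme != "iscsi":
        raise ValueError(f"not an iSCSI URI: {uri}")
    return IscsiTarget(
        server=server,
        port=int(port) if port else 3260,
        lun=int(lun) if lun else 0,
        target_name=target,
    )


print(parse_ipxe_iscsi_uri(
    "iscsi:10.0.2.15::::iqn.2023-10.coreos.target.vm:coreos"))
```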

Then I added `rd.iscsi.ibft=1 rd.iscsi.firmware=1` to the kargs. In the initramfs, we have:

```
:/root# iscsiadm -m fw
# BEGIN RECORD 6.2.1.4
iface.initiatorname = iqn.2010-04.org.ipxe:00000000-0000-0000-0000-000000000000
iface.transport_name = tcp
iface.hwaddress = 52:54:00:12:34:56
iface.bootproto = DHCP
iface.ipaddress = 10.0.3.15
iface.prefix_len = 24
iface.subnet_mask = 255.255.255.0
iface.gateway = 10.0.3.2
iface.primary_dns = 10.0.3.3
iface.vlan_id = 0
iface.net_ifacename = ens3
node.name = iqn.2023-10.coreos.target.vm:coreos
node.conn[0].address = 10.0.2.15
node.conn[0].port = 3260
node.boot_lun = 00000000
# END RECORD
```
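If a script needs the firmware-provided values, the `iscsiadm -m fw` record format shown above is simple to parse. A minimal sketch assuming the `# BEGIN RECORD` / `# END RECORD` framing and `key = value` lines seen in this output; the function name is hypothetical.

```python
def parse_iscsiadm_fw(output):
    """Parse `iscsiadm -m fw` output into a list of {key: value} dicts.

    Each "# BEGIN RECORD ... # END RECORD" block becomes one dict; lines
    outside a record and other comment lines are ignored.
    """
    records, current = [], None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("# BEGIN RECORD"):
            current = {}
        elif line.startswith("# END RECORD"):
            if current is not None:
                records.append(current)
            current = None
        elif current is not None and not line.startswith("#"):
            # Split on the first "=" so values may contain "=".
            key, sep, value = line.partition("=")
            if sep:
                current[key.strip()] = value.strip()
    return records
```

In practice the input would come from something like `subprocess.run(["iscsiadm", "-m", "fw"], capture_output=True, text=True).stdout`.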

This looks like it's pulled from /sys/firmware/ibft:

```
:/root# cat /sys/firmware/ibft/target0/target-name
iqn.2023-10.coreos.target.vm:coreos
:/root# cat /sys/firmware/ibft/target0/{ip-addr,port}
10.0.2.15
3260
:/root# cat /sys/firmware/ibft/initiator/initiator-name
iqn.2010-04.org.ipxe:00000000-0000-0000-0000-000000000000
```
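The same information can be read straight from sysfs without `iscsiadm`. A minimal sketch assuming the `target0/` and `initiator/` attribute files shown above exist (only the first target is read); `read_ibft` is a hypothetical name, and it takes the root directory as a parameter so it can be tested against a fake tree.

```python
from pathlib import Path


def read_ibft(root="/sys/firmware/ibft"):
    """Collect the boot target and initiator from the iBFT sysfs tree."""
    base = Path(root)

    def attr(*parts):
        # sysfs attributes end with a newline; strip it.
        return base.joinpath(*parts).read_text().strip()

    return {
        "initiator": attr("initiator", "initiator-name"),
        "target": attr("target0", "target-name"),
        "address": attr("target0", "ip-addr"),
        "port": int(attr("target0", "port")),
    }
```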

jlebon added a commit to jlebon/fedora-coreos-config that referenced this issue Oct 30, 2023
Rather than having the unit have `After=coreos-gpt-setup.service`
as a way to conditionally require the bootfs to be available, have
`coreos-diskful-generator` actually inject the requirement dynamically,
just like it does already for `coreos-ignition-setup-user.service`.

This is prep for moving `coreos-gpt-setup.service`.

Part of: coreos/fedora-coreos-tracker#1590
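A systemd generator can add such a dependency at boot time by writing a drop-in into its output directory. The sketch below shows only that general mechanism; the real `coreos-diskful-generator` may do this differently, and the helper name, drop-in file name, and unit names in the usage comment are hypothetical.

```python
from pathlib import Path


def add_requirement(generator_dir, unit, required, name="10-require.conf"):
    """Write a drop-in so that `unit` requires and is ordered after `required`.

    systemd loads files a generator writes under its output directory like
    any other drop-in. All unit names passed in here are placeholders.
    """
    dropin_dir = Path(generator_dir) / f"{unit}.d"
    dropin_dir.mkdir(parents=True, exist_ok=True)
    dropin = dropin_dir / name
    dropin.write_text(
        "[Unit]\n"
        f"Requires={required}\n"
        f"After={required}\n"
    )
    return dropin


# Hypothetical usage, only on diskful boots, with the generator's first
# output directory from argv:
# add_requirement(sys.argv[1], "example-bootfs-user.service", "example-bootfs.mount")
```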
jlebon added a commit to jlebon/fedora-coreos-config that referenced this issue Oct 30, 2023
These services need to manipulate the boot disk but if the disk is
backed by iSCSI, we need networking first. Move them to run later in the
boot process but still before `ignition-disks`.

Part of: coreos/fedora-coreos-tracker#1590
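The resulting constraint can be checked with a toy ordering graph: the disk-manipulating service must start after networking (so the iSCSI disk exists) yet before `ignition-disks.service`. This is a simplified, hypothetical model using Python's `graphlib` (3.9+), not the actual unit dependencies; `network` is a placeholder node.

```python
from graphlib import TopologicalSorter

# Simplified, hypothetical After= graph for an iSCSI boot. Each unit maps
# to the units it must start after; "network" stands in for whatever
# brings networking (and thus the iSCSI disk) up.
AFTER = {
    "network": set(),
    "coreos-gpt-setup.service": {"network"},
    "ignition-fetch.service": {"network"},
    "ignition-disks.service": {"coreos-gpt-setup.service",
                               "ignition-fetch.service"},
}

order = list(TopologicalSorter(AFTER).static_order())

# Whichever valid order is chosen, the disk service sits between
# networking and ignition-disks.
assert order.index("network") < order.index("coreos-gpt-setup.service")
assert order.index("coreos-gpt-setup.service") < order.index("ignition-disks.service")
print(order)
```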
jlebon added a commit to jlebon/fedora-coreos-config that referenced this issue Oct 30, 2023
In an iSCSI boot, we need networking before we can see the bootfs. So
`coreos-ignition-setup-user.service` needs to run after networking. We
also can't rely on the `fetch-offline` stage. Tweak things so that in
an iSCSI boot (1) we neuter the `fetch-offline` stage, and (2) we still
ensure that `coreos-ignition-setup-user.service` runs before the `fetch`
stage, which will take care of "consuming" the injected Ignition config.

Part of: coreos/fedora-coreos-tracker#1590
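The two rules in this commit can be expressed as a check on a proposed step order for an iSCSI first boot. A sketch only: the step names are simplified placeholders (`networking` in particular is not a real unit), and the function is hypothetical.

```python
def check_iscsi_first_boot(steps):
    """Return a list of violations of the iSCSI first-boot rules.

    Rules: fetch-offline is neutered, networking precedes
    coreos-ignition-setup-user, which in turn precedes ignition-fetch.
    """
    problems = []
    if "ignition-fetch-offline" in steps:
        problems.append("fetch-offline must be neutered on iSCSI")
    pos = {name: i for i, name in enumerate(steps)}
    for before, after in [("networking", "coreos-ignition-setup-user"),
                          ("coreos-ignition-setup-user", "ignition-fetch")]:
        if before not in pos or after not in pos:
            problems.append(f"missing {before} or {after}")
        elif pos[before] > pos[after]:
            problems.append(f"{before} must run before {after}")
    return problems


good = ["networking", "coreos-ignition-setup-user", "ignition-fetch"]
print(check_iscsi_first_boot(good))
```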
jlebon added a commit to jlebon/fedora-coreos-config that referenced this issue Oct 30, 2023
In an iSCSI boot, we have no utility for
`coreos-copy-firstboot-network.service` since it copies network settings
from the bootfs, which will only show up after networking has already
come up. Users are required to configure non-default networking via
kargs for iSCSI boots.

Part of: coreos/fedora-coreos-tracker#1590
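The gating this implies might look like the following. The detection rule is an assumption for illustration (treating `rd.iscsi.firmware=1` or a `netroot=iscsi...` karg as marking an iSCSI boot); the actual services may detect iSCSI differently, and both function names are hypothetical.

```python
def is_iscsi_boot(cmdline):
    # Assumed heuristic, not the services' real detection logic.
    for arg in cmdline.split():
        if arg in ("rd.iscsi.firmware", "rd.iscsi.firmware=1"):
            return True
        if arg.startswith("netroot=iscsi"):
            return True
    return False


def should_copy_firstboot_network(cmdline):
    # On iSCSI the bootfs (and its keyfiles) only appears after networking
    # is already up, so copying them in the initramfs has no effect.
    return not is_iscsi_boot(cmdline)
```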
jlebon added a commit to jlebon/fedora-coreos-config that referenced this issue Oct 30, 2023
During an iSCSI boot, we rely on networking to access the root device.
So at no point can networking go down once the rootfs is mounted. This
means we have to skip the teardown logic in `coreos-teardown-initramfs`
in that case.

One way to look at this conceptually is: the goal of this service is to
reset everything so that the real root is configured on the first boot
like it would be on any other boot (i.e. avoid first boot-specific state
carried from the initramfs). Since initramfs networking is required on
every boot for iSCSI, it's true on every boot that networking needs to
be carried forward across switchroot. So the goal is still met since
it's not a first boot-only concern.

Part of: coreos/fedora-coreos-tracker#1590
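The teardown decision can be sketched as a function of whether the boot is iSCSI. The action names are hypothetical placeholders rather than what `coreos-teardown-initramfs` literally runs; the point is only that the network teardown step drops out on iSCSI.

```python
def teardown_actions(iscsi_boot):
    """Return hypothetical initramfs teardown steps run before switchroot."""
    # Placeholder for the parts of teardown that apply to every boot.
    actions = ["propagate-initramfs-state"]
    if not iscsi_boot:
        # Safe only when the rootfs does not depend on the network.
        actions.append("take-down-networking")
    # On iSCSI the root device is reached over the network on every boot,
    # so networking is carried across switchroot instead of torn down.
    return actions
```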
@jlebon
Member Author

jlebon commented Oct 30, 2023

I opened coreos/fedora-coreos-config#2702, with which I'm able to get FCOS up and running all the way to the real root on iSCSI via iPXE + rd.iscsi.firmware=1.

This addresses the generic case and isn't specific to Oracle Cloud. In theory, for Oracle Cloud enablement, all that's left is for AI to do the right wiring so that it adds rd.iscsi.firmware=1 at install time. But we still need to verify that's all we need.

@jlebon
Member Author

jlebon commented Nov 6, 2023

Follow-up on Oracle Cloud enablement: I've successfully tested RHCOS with coreos/fedora-coreos-config#2702 in Oracle Cloud on all the bare metal instance types certified for RHEL. For good measure, I've also tested the VM instance types there, but those don't use iSCSI so it was expected to work.

As expected, one needs to add rd.iscsi.firmware=1, just like you would in traditional RHEL. I've also had to add ip=ibft so that NM only brings up networking on the iBFT device. Otherwise, it'll try to bring it up on all the NICs, some of which take much longer to get DHCP and can cause boot to timeout. (Of course, an alternative there is to bump the timeout via e.g. x-systemd.device-timeout=0 though it seems much cleaner to only bring up the NIC we actually need to find the root disk.)
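At install time, extra kargs can be passed to `coreos-installer install` with its `--append-karg` option, repeated once per argument. The sketch builds such a command line; the target device `/dev/sda` is a placeholder, and `installer_command` is a hypothetical helper.

```python
import shlex


def installer_command(device, extra_kargs):
    """Build a `coreos-installer install` argv that appends each karg."""
    cmd = ["coreos-installer", "install", device]
    for karg in extra_kargs:
        cmd += ["--append-karg", karg]
    return cmd


# /dev/sda is a placeholder for the iSCSI-backed install target.
cmd = installer_command("/dev/sda", ["rd.iscsi.firmware=1", "ip=ibft"])
print(shlex.join(cmd))
```

Building an argv list rather than a single string avoids shell-quoting issues if the list is later passed to `subprocess.run`.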

cgwalters pushed five commits to coreos/fedora-coreos-config that referenced this issue Nov 6, 2023 (the same five commits from jlebon/fedora-coreos-config quoted above).
@dustymabe dustymabe added the status/pending-next-release Fixed upstream. Waiting on a next release. label Nov 6, 2023
@dustymabe
Member

The fix for this went into next stream release 39.20231106.1.0. Please try out the new release and report issues.

@dustymabe dustymabe added status/pending-testing-release Fixed upstream. Waiting on a testing release. and removed status/pending-next-release Fixed upstream. Waiting on a next release. labels Nov 7, 2023
@dustymabe
Member

The fix for this went into testing stream release 39.20231119.2.0. Please try out the new release and report issues.

@dustymabe dustymabe added status/pending-stable-release Fixed upstream and in testing. Waiting on stable release. and removed status/pending-testing-release Fixed upstream. Waiting on a testing release. labels Nov 21, 2023
@dustymabe
Member

The fix for this went into stable stream release 39.20231119.3.0.

@dustymabe dustymabe removed the status/pending-stable-release Fixed upstream and in testing. Waiting on stable release. label Jan 10, 2024
aaradhak pushed five commits to aaradhak/fedora-coreos-config that referenced this issue Mar 18, 2024 (the same five commits from jlebon/fedora-coreos-config quoted above).

4 participants