arm64: unable to PXE boot, hang after AHCI errors #871

Closed
Snaipe opened this issue Oct 13, 2022 · 16 comments · Fixed by flatcar-archive/coreos-overlay#2235
Labels
kind/bug (Something isn't working), platform/arm64

Comments

@Snaipe

Snaipe commented Oct 13, 2022

Description

On Gigabyte arm64 servers, the Flatcar PXE images hang during the boot process, making them unusable, while Fedora CoreOS images work. We think we have narrowed it down to CMA not being enabled in the kernel config; enabling it, both in general and for DMA, seems to get the boot process farther along.

Impact

Inability to use Flatcar images at all on arm64-based servers.

Environment and steps to reproduce

We are booting the Flatcar PXE vmlinuz and initrd images on a bare-metal arm64 server, a Gigabyte R152-P31.

This seems to be happening with multiple (likely all) published Flatcar images for arm64. I have tested 3033.3.5, 3227.2.2, and 3374.0.0. We have yet to find an image that works.

We use the following GRUB config to boot these images (adapting the versions appropriately):

function load_video {
        insmod efi_gop
        insmod efi_uga
        insmod video_bochs
        insmod video_cirrus
        insmod all_video
}

load_video
set gfxpayload=keep
insmod gzio

set timeout=30

menuentry 'Install Flatcar 3033.3.5 (aarch64)' --class gnu-linux --class gnu --class os --id flatcar3033.3.5-aarch64 {
    linux flatcar3033.3.5/aarch64/vmlinuz ip=dhcp inst.ks=http://10.90.21.50/cgi-bin/kickstart?os=flatcar3033.3.5-aarch64 coreos.autologin ipv6.disable=1 cloud-config-url="http://10.90.21.50/cgi-bin/bootstrap.sh?name={name}&arch=aarch64&domain={domain}{bootstrap_params}" flatcar.first_boot=1 flatcar.autologin console=tty0 module_blacklist=nvme,nvme_core
    initrd flatcar3033.3.5/aarch64/initrd.img
}

The kernel seems to start correctly; however, it invariably ends up printing these messages and hanging:

[   18.562856] ata1.00: qc timeout (cmd 0xec)
[   19.352940] ahci 000c:01:00.0: AHCI controller unavailable!
[   19.352969] pcieport 000c:00:01.0: AER: Uncorrected (Non-Fatal) error received: 000c:00:00.0
[   20.144487] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[   20.144490] pcieport 000c:00:01.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[   20.174143] pcieport 000c:00:01.0:   device [1def:e101] error status/mask=00004000/00400000
[   20.186763] pcieport 000c:00:01.0:    [14] CmpltTO                (First)
[   20.545764] ahci 000c:01:00.0: AHCI controller unavailable!
[   20.545779] ahci 000c:01:00.0: AER: can't recover (no error_detected callback)
[   20.567037] pcieport 000c:00:01.0: AER: device recovery failed
[   20.577034] pcieport 000c:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: 000c:00:00.0

... with stack dumps periodically printed out because the whole thing is blocked. Full logs for a 3227.2.2 boot: flatcar-3227.2.2-boot.log

Booting the kernel with module_blacklist=ahci of course works and gets us a shell, but it means no disks are visible.

One thing we have noticed is that Fedora CoreOS 36.20220918.3.0 has no issue getting past this point. Here is a diff of the boot logs for both:
flatcar-vs-fcos-boot.diff

One obvious difference is the kernel version (Flatcar uses 5.15 while FCOS uses 5.19), but we think we have narrowed it down to the following differences:

- Memory: 394529984K/402392064K available (9920K kernel code, 2314K rwdata, 7220K rodata, 39808K init, 780K bss, 7862080K reserved, 0K cma-reserved)
+ Memory: 394648504K/402392064K available (16576K kernel code, 4122K rwdata, 13564K rodata, 7488K init, 10738K bss, 7678024K reserved, 65536K cma-reserved)
...
- DMA: preallocated 4096 KiB GFP_KERNEL pool for atomic allocations
- DMA: preallocated 4096 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
- DMA: preallocated 4096 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
+ DMA: preallocated 16384 KiB GFP_KERNEL pool for atomic allocations
+ DMA: preallocated 16384 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
+ DMA: preallocated 16384 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
...
- pci 000c:01:00.0: [1b21:1164] typefailed to assign [io  size 0x1000]
- pci 0004:00:05.0: BAR 13: no space for [io  size 0x1000]
- pci 0004:00:05.0: BAR 13: failed to assign [io  size 0x1000]
- pci 0004:00:01.0: BAR 13: no space for [io  size 0x1000]
- pci 0004:00:01.0: BAR 13: failed to assign [io  size 0x1000]
- pci 0004:00:05.0: BAR 13: no space for [io  size 0x1000]
- pci 0004:00:05.0: BAR 13: failed to assign [io  size 0x1000]
- pci 0004:00:03.0: BAR 13: no spacTCP: Hash tables configured (established 524288 bind 65536)
+ pci 000c:01:00.0: [1b21:1164] type ata36: SATA max UDMA/133 abar m8192@0x40282000 port 0x40282680 irq 109

In particular, the DMA coherent pools are rather small, and our understanding is that the coherent pool can only work correctly with CMA enabled. The failure to allocate I/O space for the offending PCI device during boot seems to corroborate this. On Flatcar there is no space reserved for CMA while FCOS reserves 64 MiB, and specifying cma=64M on the kernel cmdline prints that the argument is unrecognized, hinting that CMA is disabled in the kernel config.

I have built the Flatcar 3227.2.3 kernel using the SDK and enabled the following configs:

CONFIG_CMA=y
CONFIG_DMA_CMA=y
CONFIG_CMA_AREAS=19
CONFIG_CMA_SIZE_MBYTES=64
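
(If I'm reading the config dependencies right, enabling these should also make the cma=<size> kernel parameter recognized again, e.g. cma=64M, so the reservation could be tuned at boot without another rebuild.)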

This seems to have gotten the boot process farther along, except that dracut has issues using iSCSI, though I think that is more likely down to my build than anything else. Boot logs for this are here: flatcar-with-cma.log. Currently investigating.

Expected behavior

We would have expected the boot to complete.

@Snaipe added the kind/bug label on Oct 13, 2022
@jepio
Member

jepio commented Oct 13, 2022

Hi @Snaipe,
it may very well be down to CMA, but could you try the following kernel parameters on stock Flatcar: coherent_pool=64M pci=realloc? Could you also upload the FCOS kconfig? The bigger atomic pool could come down to FCOS defaulting to 16K pages (reading the kernel source here; need to verify).
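
For reference, they would just be appended to the linux line of your existing GRUB entry, roughly like this (a sketch only -- keep the rest of your cmdline as-is):

    linux flatcar3033.3.5/aarch64/vmlinuz ip=dhcp flatcar.first_boot=1 flatcar.autologin console=tty0 coherent_pool=64M pci=realloc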

@Snaipe
Author

Snaipe commented Oct 13, 2022

Just tried to boot the vanilla 3227.2.2 PXE images with coherent_pool=64M pci=realloc -- same issue: flatcar-vanilla-with-opts.log

I'm not sure why, but here (and in my own tests) I'm not seeing any effect from setting coherent_pool=xyz; the boot logs still contain DMA: preallocated 4096 KiB GFP_KERNEL pool for atomic allocations. Unless I'm misreading the kernel code, this message is printed here: https://elixir.bootlin.com/linux/v5.15.70/source/kernel/dma/pool.c#L182, after expanding the pool by pool_size, which should be atomic_pool_size, which should in turn be overridden by coherent_pool. Maybe I'm just misunderstanding the effect of atomic_pool_expand and the size returned by gen_pool_size.
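
For context, this is roughly the logic I'm looking at (abbreviated from kernel/dma/pool.c in v5.15 -- a sketch, not a verbatim copy); my reading is that coherent_pool= should simply preset atomic_pool_size before the default is computed:

/* coherent_pool=<size> presets atomic_pool_size via an early_param hook. */
static int __init early_coherent_pool(char *p)
{
	atomic_pool_size = memparse(p, &p);
	return 0;
}
early_param("coherent_pool", early_coherent_pool);

static int __init dma_atomic_pool_init(void)
{
	/* Default: roughly 128 KiB per 1 GiB of RAM, clamped to MAX_ORDER_NR_PAGES. */
	if (!atomic_pool_size) {
		unsigned long pages = totalram_pages() / (SZ_1G / SZ_128K);
		pages = min_t(unsigned long, pages, MAX_ORDER_NR_PAGES);
		atomic_pool_size = max_t(size_t, pages << PAGE_SHIFT, SZ_128K);
	}
	/* ... the GFP_KERNEL/GFP_DMA/GFP_DMA32 pools are then created with atomic_pool_size ... */
}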

Anyway, here's FCOS' kconfig: fcos-kconfig.txt

@jepio
Member

jepio commented Oct 13, 2022

Would you be able to test-build a Flatcar image with the following value set in the kernel config:

CONFIG_FORCE_MAX_ZONEORDER=13

That would be the source of the 16M vs 4M atomic pool difference. But to be honest I have no idea why CMA would make a difference here, especially for assigning I/O space.
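
For the arithmetic: if I'm reading the clamp right, the default pool is capped at MAX_ORDER_NR_PAGES = 1 << (MAX_ORDER - 1) pages, so with 4 KiB pages MAX_ORDER=11 caps it at 1024 pages = 4 MiB, while MAX_ORDER=13 (i.e. FORCE_MAX_ZONEORDER=13) allows 4096 pages = 16 MiB.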

@Snaipe
Author

Snaipe commented Oct 14, 2022

I thought I had done something wrong and had to rebuild the kernel twice to confirm, but that config didn't seem to have any effect. I'm a bit puzzled.

I am also noticing that when I pass cma=0 to the kernel cmdline, I'm back at the same point as above, so maybe it's something else.

I'm also noticing that I am consistently getting a BTF mismatch on the scsi_common module so I think I might be doing something wrong. Is the guide up-to-date? I'm going to try enabling MODULE_ALLOW_BTF_MISMATCH for now to see if I can get this thing farther, but even better would be to actually reproduce the 3227.2.2 images.

@jepio
Member

jepio commented Oct 14, 2022

I thought I had done something wrong and had to rebuild the kernel twice to confirm, but that config didn't seem to have any effect. I'm a bit puzzled.

I am also noticing that when I pass cma=0 to the kernel cmdline, I'm back at the same point as above, so maybe it's something else.

That's weird. If I find some time, I'll check it myself.

I'm also noticing that I am consistently getting a BTF mismatch on the scsi_common module so I think I might be doing something wrong. Is the guide up-to-date? I'm going to try enabling MODULE_ALLOW_BTF_MISMATCH for now to see if I can get this thing farther, but even better would be to actually reproduce the 3227.2.2 images.

Sorry, just noticed that the guide has some issues. Try patching the config in coreos-overlay/sys-kernel/coreos-modules/files and then rebuild in this order:

emerge-amd64-usr coreos-modules
emerge-amd64-usr coreos-kernel
./build_image.sh
./image_to_vm.sh

Would you open a PR with the kconfig changes after you've checked which one works best?

@Snaipe
Author

Snaipe commented Oct 14, 2022

Sorry, just noticed that the guide has some issues. Try patching the config in coreos-overlay/sys-kernel/coreos-modules/files and then rebuild in this order:

emerge-amd64-usr coreos-modules
emerge-amd64-usr coreos-kernel
./build_image.sh
./image_to_vm.sh

Thanks, doing this right now.

Would you open a PR with the kconfig changes after you've checked which one works best?

Yep, will do.

@Snaipe
Author

Snaipe commented Oct 14, 2022

Okay, this was very helpful -- the image I got reproduces the original problem, which means I'll be able to properly troubleshoot things. It also suggests that CMA might have been a red herring: with all of the above config options set, I'm still seeing the problem.

@jepio
Member

jepio commented Oct 14, 2022

What about enabling CONFIG_IOMMU_SUPPORT=y and related ARM settings CONFIG_ARM_SMMU=y and CONFIG_ARM_SMMU_V3=y?

@Snaipe
Author

Snaipe commented Oct 14, 2022

Progress! Enabling IOMMU got me past the original failure. The boot now blocks for about 300 seconds after the NICs are initialized and the disks are SCSI-attached, before giving up and rebooting. Boot logs.

I'm going to try to bisect the kernel config differences with FCOS too in the meantime.

@Snaipe
Author

Snaipe commented Oct 17, 2022

Unfortunately I was not able to bisect the config diff down to a working set -- the initial set was nonfunctional from the get-go (it couldn't even boot) and too large to trim efficiently into something that worked. I did try investigating the DMA differences a bit further, but since I wasn't really trusting the config at that point, I applied this patch:

diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 4d40dcce..0467eb2e 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -196,7 +196,16 @@ static int __init dma_atomic_pool_init(void)
 		unsigned long pages = totalram_pages() / (SZ_1G / SZ_128K);
 		pages = min_t(unsigned long, pages, MAX_ORDER_NR_PAGES);
 		atomic_pool_size = max_t(size_t, pages << PAGE_SHIFT, SZ_128K);
+
+		pr_info("DMA: using default pool size\n");
+	}
+	pr_info("DMA: default pool size is %zu KiB\n", atomic_pool_size >> 10);
+
+	if (atomic_pool_size == (4096 << 10)) {
+		atomic_pool_size = 16384 << 10;
+		pr_info("DMA: force overriding atomic pool size to %zu KiB\n", atomic_pool_size >> 10);
 	}
+
 	INIT_WORK(&atomic_pool_work, atomic_pool_work_fn);

 	atomic_pool_kernel = __dma_atomic_pool_init(atomic_pool_size,
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 6ba10021..add66dea 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -25,7 +25,7 @@

 /* Free memory management - zoned buddy allocator.  */
 #ifndef CONFIG_FORCE_MAX_ZONEORDER
-#define MAX_ORDER 11
+#define MAX_ORDER 13
 #else
 #define MAX_ORDER CONFIG_FORCE_MAX_ZONEORDER
 #endif

This didn't seem to have any effect on the default preallocation size, though it did confirm that coherent_pool was correctly being picked up.

That said, I'm not sure this is related anymore, as I'm pretty sure the latest boot logs indicate that init is running pretty well. It looks like systemd is waiting on something and timing out, so I'm going to try and see if I can get it to dump some information.

@jepio
Member

jepio commented Oct 17, 2022

I don't know which console you grab the logs from, but make sure you have this on the command line:
console=ttyXXX systemd.journald.forward_to_console=1

@Snaipe
Author

Snaipe commented Oct 17, 2022

From serial through IPMI -- I've added forward_to_console=1, and I think it might be a config issue pulling something over the network. Currently troubleshooting, but it seems that the IOMMU configs were what really unwedged me. I'll report back if/once I get the config issues sorted out, and will make a PR with the kconfig changes if nothing else is breaking.

@jepio
Member

jepio commented Oct 17, 2022

I think you may want to set ignition.config.url instead of cloud-config-url, which will fetch the provisioning config and should run it through https://github.com/flatcar/coreos-cloudinit if it's not an Ignition file (https://www.flatcar.org/docs/latest/installing/bare-metal/booting-with-ipxe/).
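
Something along these lines on the kernel cmdline (a sketch -- the .ign path is just a placeholder for wherever you serve the config from):

    ignition.config.url=http://10.90.21.50/config.ign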

@Snaipe
Author

Snaipe commented Oct 17, 2022

I finally got it to work -- after stripping the config down to a pretty bare-bones Ignition config, it managed to get all the way to login:

$ curl http://10.90.21.50/snaipe-debug.ign | jq                                                                                                 
{
  "ignition": {
    "config": {},
    "security": {
      "tls": {}
    },
    "timeouts": {},
    "version": "2.3.0"
  },
  "networkd": {},
  "passwd": {
    "users": [
      {
        "name": "core",
        "sshAuthorizedKeys": [
          "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIAbcwEDwYPaFHr2VDFBFxH++6cz9Hthr8/FdNLaDCFby snaipe@arista.com"
        ]
      }
    ]
  },
  "storage": {},
  "systemd": {}
}

I guess we have some stuff to revise in our current configuration -- it was the same config used for older Flatcar deployments, so I had assumed it would continue working, but perhaps not. At the very least, being able to boot means I finally have something to go on.

I had some extra kernel configs enabled, so I'll try with just IOMMU first to see if I can identify the minimum set of configs to turn on, then put that in a pull request.

@Snaipe
Author

Snaipe commented Oct 18, 2022

I've made a PR at flatcar-archive/coreos-overlay#2235 with the kconfig changes.

@jepio
Member

jepio commented Oct 21, 2022

Thank you for this contribution @Snaipe. This fix will now progress through alpha->beta->stable.
