arm64: unable to PXE boot, hang after AHCI errors #871

Closed
Snaipe opened this issue Oct 13, 2022 · 16 comments · Fixed by flatcar-archive/coreos-overlay#2235
Labels
kind/bug (Something isn't working), platform/arm64

Comments

@Snaipe

Snaipe commented Oct 13, 2022

Description

On Gigabyte arm64 servers, the Flatcar PXE images hang during the boot process, making them unusable, while Fedora CoreOS images work. We think we have narrowed it down to CMA not being enabled in the kernel config; enabling it, both in general and for DMA, seems to get the boot process farther along.

Impact

Inability to use Flatcar images at all on arm64-based servers.

Environment and steps to reproduce

We are booting the Flatcar PXE vmlinuz and initrd images on a bare-metal arm64 server, a Gigabyte R152-P31.

This seems to be happening with multiple (likely all) published Flatcar images for arm64. I have tested 3033.3.5, 3227.2.2, and 3374.0.0. We have yet to find an image that works.

We use the following GRUB config to boot these images (adapting the versions appropriately):

function load_video {
        insmod efi_gop
        insmod efi_uga
        insmod video_bochs
        insmod video_cirrus
        insmod all_video
}

load_video
set gfxpayload=keep
insmod gzio

set timeout=30

menuentry 'Install Flatcar 3033.3.5 (aarch64)' --class gnu-linux --class gnu --class os --id flatcar3033.3.5-aarch64 {
    linux flatcar3033.3.5/aarch64/vmlinuz ip=dhcp inst.ks=http://10.90.21.50/cgi-bin/kickstart?os=flatcar3033.3.5-aarch64 coreos.autologin ipv6.disable=1 cloud-config-url="http://10.90.21.50/cgi-bin/bootstrap.sh?name={name}&arch=aarch64&domain={domain}{bootstrap_params}" flatcar.first_boot=1 flatcar.autologin console=tty0 module_blacklist=nvme,nvme_core
    initrd flatcar3033.3.5/aarch64/initrd.img
}

The kernel seems to start correctly; however, it invariably ends up printing these messages and hanging:

[   18.562856] ata1.00: qc timeout (cmd 0xec)
[   19.352940] ahci 000c:01:00.0: AHCI controller unavailable!
[   19.352969] pcieport 000c:00:01.0: AER: Uncorrected (Non-Fatal) error received: 000c:00:00.0
[   20.144487] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[   20.144490] pcieport 000c:00:01.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[   20.174143] pcieport 000c:00:01.0:   device [1def:e101] error status/mask=00004000/00400000
[   20.186763] pcieport 000c:00:01.0:    [14] CmpltTO                (First)
[   20.545764] ahci 000c:01:00.0: AHCI controller unavailable!
[   20.545779] ahci 000c:01:00.0: AER: can't recover (no error_detected callback)
[   20.567037] pcieport 000c:00:01.0: AER: device recovery failed
[   20.577034] pcieport 000c:00:01.0: AER: Multiple Uncorrected (Non-Fatal) error received: 000c:00:00.0

... with stack dumps periodically printed out because the whole thing is blocked. Full logs for a 3227.2.2 boot: flatcar-3227.2.2-boot.log

Booting the kernel with module_blacklist=ahci of course works and gets us a shell, but it means no disks are visible.

One thing we have noticed is that Fedora CoreOS 36.20220918.3.0 has no issue getting past this point. Here is a diff of the boot logs for both:
flatcar-vs-fcos-boot.diff

One obvious difference is the kernel version (Flatcar uses 5.15 while FCOS uses 5.19), but we think we have narrowed it down to the following differences:

- Memory: 394529984K/402392064K available (9920K kernel code, 2314K rwdata, 7220K rodata, 39808K init, 780K bss, 7862080K reserved, 0K cma-reserved)
+ Memory: 394648504K/402392064K available (16576K kernel code, 4122K rwdata, 13564K rodata, 7488K init, 10738K bss, 7678024K reserved, 65536K cma-reserved)
...
- DMA: preallocated 4096 KiB GFP_KERNEL pool for atomic allocations
- DMA: preallocated 4096 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
- DMA: preallocated 4096 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
+ DMA: preallocated 16384 KiB GFP_KERNEL pool for atomic allocations
+ DMA: preallocated 16384 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
+ DMA: preallocated 16384 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
...
- pci 000c:01:00.0: [1b21:1164] typefailed to assign [io  size 0x1000]
- pci 0004:00:05.0: BAR 13: no space for [io  size 0x1000]
- pci 0004:00:05.0: BAR 13: failed to assign [io  size 0x1000]
- pci 0004:00:01.0: BAR 13: no space for [io  size 0x1000]
- pci 0004:00:01.0: BAR 13: failed to assign [io  size 0x1000]
- pci 0004:00:05.0: BAR 13: no space for [io  size 0x1000]
- pci 0004:00:05.0: BAR 13: failed to assign [io  size 0x1000]
- pci 0004:00:03.0: BAR 13: no spacTCP: Hash tables configured (established 524288 bind 65536)
+ pci 000c:01:00.0: [1b21:1164] type ata36: SATA max UDMA/133 abar m8192@0x40282000 port 0x40282680 irq 109

In particular, the DMA coherent pools are rather small, and our understanding is that the coherent pool can only work correctly with CMA enabled. The failure to allocate I/O space for the offending PCI device during boot seems to corroborate this. On Flatcar there is no space reserved for CMA while FCOS reserves 64 MiB, and specifying cma=64M on the kernel cmdline prints that the argument is unrecognized, hinting that CMA is disabled in the kernel config.

I have built the Flatcar 3227.2.3 kernel using the SDK and enabled the following configs:

CONFIG_CMA=y
CONFIG_DMA_CMA=y
CONFIG_CMA_AREAS=19
CONFIG_CMA_SIZE_MBYTES=64
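
(If I'm reading the config dependencies right, enabling these should also make the cma=<size> kernel parameter recognized again, e.g. cma=64M, so the reservation could be tuned at boot without another rebuild.)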

This seems to have gotten the boot process farther along, except that dracut has issues using iSCSI, though I think that is more likely down to my build than anything else. Boot logs for this are here: flatcar-with-cma.log. Currently investigating.

Expected behavior

We would have expected the boot to complete.

@Snaipe added the kind/bug label on Oct 13, 2022
@jepio
Member

jepio commented Oct 13, 2022

Hi @Snaipe,
it may very well be down to CMA, but could you try the following kernel parameters on stock Flatcar: coherent_pool=64M pci=realloc? Could you also upload the FCOS kconfig? The bigger atomic pool could come down to FCOS defaulting to 16K pages (reading the kernel source here; need to verify).
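
For reference, they would just be appended to the linux line of your existing GRUB entry, roughly like this (a sketch only -- keep the rest of your cmdline as-is):

    linux flatcar3033.3.5/aarch64/vmlinuz ip=dhcp flatcar.first_boot=1 flatcar.autologin console=tty0 coherent_pool=64M pci=realloc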

@Snaipe
Author

Snaipe commented Oct 13, 2022

Just tried to boot the vanilla 3227.2.2 PXE images with coherent_pool=64M pci=realloc -- same issue: flatcar-vanilla-with-opts.log

I'm not sure why, but here (and in my own tests) I'm not seeing any effect from setting coherent_pool=xyz; the boot logs still contain DMA: preallocated 4096 KiB GFP_KERNEL pool for atomic allocations. Unless I'm misreading the kernel code, this message is printed here: https://elixir.bootlin.com/linux/v5.15.70/source/kernel/dma/pool.c#L182, after expanding the pool by pool_size, which should be atomic_pool_size, which should in turn be overridden by coherent_pool. Maybe I'm just misunderstanding the effect of atomic_pool_expand and the size returned by gen_pool_size.
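
For context, this is roughly the logic I'm looking at (abbreviated from kernel/dma/pool.c in v5.15 -- a sketch, not a verbatim copy); my reading is that coherent_pool= should simply preset atomic_pool_size before the default is computed:

/* coherent_pool=<size> presets atomic_pool_size via an early_param hook. */
static int __init early_coherent_pool(char *p)
{
	atomic_pool_size = memparse(p, &p);
	return 0;
}
early_param("coherent_pool", early_coherent_pool);

static int __init dma_atomic_pool_init(void)
{
	/* Default: roughly 128 KiB per 1 GiB of RAM, clamped to MAX_ORDER_NR_PAGES. */
	if (!atomic_pool_size) {
		unsigned long pages = totalram_pages() / (SZ_1G / SZ_128K);
		pages = min_t(unsigned long, pages, MAX_ORDER_NR_PAGES);
		atomic_pool_size = max_t(size_t, pages << PAGE_SHIFT, SZ_128K);
	}
	/* ... the GFP_KERNEL/GFP_DMA/GFP_DMA32 pools are then created with atomic_pool_size ... */
}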

Anyway, here's FCOS' kconfig: fcos-kconfig.txt

@jepio
Member

jepio commented Oct 13, 2022

Would you be able to test-build a Flatcar image with the following value set in the kernel config:

CONFIG_FORCE_MAX_ZONEORDER=13

That would be the source of the 16M vs 4M atomic pool difference. But to be honest I have no idea why CMA would make a difference here, especially for assigning I/O space.
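
For the arithmetic: if I'm reading the clamp right, the default pool is capped at MAX_ORDER_NR_PAGES = 1 << (MAX_ORDER - 1) pages, so with 4 KiB pages MAX_ORDER=11 caps it at 1024 pages = 4 MiB, while MAX_ORDER=13 (i.e. FORCE_MAX_ZONEORDER=13) allows 4096 pages = 16 MiB.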

@Snaipe
Author

Snaipe commented Oct 14, 2022

I thought I had done something wrong and had to rebuild the kernel twice to confirm, but that config didn't seem to have any effect. I'm a bit puzzled.

I am also noticing that when I pass cma=0 to the kernel cmdline, I'm back at the same point as above, so maybe it's something else.

I'm also noticing that I am consistently getting a BTF mismatch on the scsi_common module so I think I might be doing something wrong. Is the guide up-to-date? I'm going to try enabling MODULE_ALLOW_BTF_MISMATCH for now to see if I can get this thing farther, but even better would be to actually reproduce the 3227.2.2 images.

@jepio
Member

jepio commented Oct 14, 2022

I thought I had done something wrong and had to rebuild the kernel twice to confirm, but that config didn't seem to have any effect. I'm a bit puzzled.

I am also noticing that when I pass cma=0 to the kernel cmdline, I'm back at the same point as above, so maybe it's something else.

That's weird. If I find some time, I'll check it myself.

I'm also noticing that I am consistently getting a BTF mismatch on the scsi_common module so I think I might be doing something wrong. Is the guide up-to-date? I'm going to try enabling MODULE_ALLOW_BTF_MISMATCH for now to see if I can get this thing farther, but even better would be to actually reproduce the 3227.2.2 images.

Sorry, just noticed that the guide has some issues. Try patching the config in coreos-overlay/sys-kernel/coreos-modules/files and then rebuild in this order:

emerge-amd64-usr coreos-modules
emerge-amd64-usr coreos-kernel
./build_image.sh
./image_to_vm.sh

Would you open a PR with the kconfig changes after you've checked which one works best?

@Snaipe
Author

Snaipe commented Oct 14, 2022

Sorry, just noticed that the guide has some issues. Try patching the config in coreos-overlay/sys-kernel/coreos-modules/files and then rebuild in this order:

emerge-amd64-usr coreos-modules
emerge-amd64-usr coreos-kernel
./build_image.sh
./image_to_vm.sh

Thanks, doing this right now.

Would you open a PR with the kconfig changes after you've checked which one works best?

Yep, will do.

@Snaipe
Author

Snaipe commented Oct 14, 2022

Okay, this was very helpful -- the image I got reproduces the original problem, which means I'll be able to properly troubleshoot things. It also suggests that CMA might have been a red herring: with all of the above config options set, I'm still seeing the problem.

@jepio
Member

jepio commented Oct 14, 2022

What about enabling CONFIG_IOMMU_SUPPORT=y and related ARM settings CONFIG_ARM_SMMU=y and CONFIG_ARM_SMMU_V3=y?

@Snaipe
Author

Snaipe commented Oct 14, 2022

Progress! Enabling IOMMU got me past the original failure. The boot now blocks for about 300 seconds after the NICs are initialized and the disks are SCSI-attached, before giving up and rebooting. Boot logs.

I'm going to try to bisect the kernel config differences with FCOS too in the meantime.

@Snaipe
Author

Snaipe commented Oct 17, 2022

Unfortunately I was not able to bisect the config diff down to a working set -- the initial set was nonfunctional from the get-go (it couldn't even boot) and too large to trim efficiently into something that worked. I did try investigating the DMA differences a bit further, but since I wasn't really trusting the config at that point, I applied this patch:

diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 4d40dcce..0467eb2e 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -196,7 +196,16 @@ static int __init dma_atomic_pool_init(void)
 		unsigned long pages = totalram_pages() / (SZ_1G / SZ_128K);
 		pages = min_t(unsigned long, pages, MAX_ORDER_NR_PAGES);
 		atomic_pool_size = max_t(size_t, pages << PAGE_SHIFT, SZ_128K);
+
+		pr_info("DMA: using default pool size\n");
+	}
+	pr_info("DMA: default pool size is %zu KiB\n", atomic_pool_size >> 10);
+
+	if (atomic_pool_size == (4096 << 10)) {
+		atomic_pool_size = 16384 << 10;
+		pr_info("DMA: force overriding atomic pool size to %zu KiB\n", atomic_pool_size >> 10);
 	}
+
 	INIT_WORK(&atomic_pool_work, atomic_pool_work_fn);

 	atomic_pool_kernel = __dma_atomic_pool_init(atomic_pool_size,
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 6ba10021..add66dea 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -25,7 +25,7 @@

 /* Free memory management - zoned buddy allocator.  */
 #ifndef CONFIG_FORCE_MAX_ZONEORDER
-#define MAX_ORDER 11
+#define MAX_ORDER 13
 #else
 #define MAX_ORDER CONFIG_FORCE_MAX_ZONEORDER
 #endif

This didn't seem to have any effect on the default preallocation size, though it did confirm that coherent_pool was correctly being picked up.

That said, I'm not sure this is related anymore, as I'm pretty sure the latest boot logs indicate that init is running pretty well. It looks like systemd is waiting on something and timing out, so I'm going to try and see if I can get it to dump some information.

@jepio
Member

jepio commented Oct 17, 2022

I don't know which console you grab the logs from, but make sure you have this on the command line:
console=ttyXXX systemd.journald.forward_to_console=1

@Snaipe
Author

Snaipe commented Oct 17, 2022

From serial through IPMI -- I've added forward_to_console=1, and I think it might be a config issue pulling something over the network. Currently troubleshooting, but it seems that the IOMMU configs were what really unwedged me. I'll report back if/once I get the config issues sorted out, and will make a PR with the kconfig changes if nothing else is breaking.

@jepio
Member

jepio commented Oct 17, 2022

I think you may want to set ignition.config.url instead of cloud-config-url, which will fetch the provisioning config and should run it through https://github.com/flatcar/coreos-cloudinit if it's not an Ignition file (https://www.flatcar.org/docs/latest/installing/bare-metal/booting-with-ipxe/).
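
Something along these lines on the kernel cmdline (a sketch -- the .ign path is just a placeholder for wherever you serve the config from):

    ignition.config.url=http://10.90.21.50/config.ign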

@Snaipe
Author

Snaipe commented Oct 17, 2022

I finally got it to work -- after stripping the config down to a pretty bare-bones Ignition config, it managed to get all the way to login:

$ curl http://10.90.21.50/snaipe-debug.ign | jq                                                                                                 
{
  "ignition": {
    "config": {},
    "security": {
      "tls": {}
    },
    "timeouts": {},
    "version": "2.3.0"
  },
  "networkd": {},
  "passwd": {
    "users": [
      {
        "name": "core",
        "sshAuthorizedKeys": [
          "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIAbcwEDwYPaFHr2VDFBFxH++6cz9Hthr8/FdNLaDCFby snaipe@arista.com"
        ]
      }
    ]
  },
  "storage": {},
  "systemd": {}
}

I guess we have some stuff to revise in our current configuration -- it was the same config used for older Flatcar deployments, so I had assumed it would continue working, but perhaps not. At the very least, being able to boot means I finally have something to go on.

I had some extra kernel configs enabled, so I'll try with just IOMMU first to see if I can identify the minimum set of configs to turn on, then put that in a pull request.

@Snaipe
Author

Snaipe commented Oct 18, 2022

I've made a PR at flatcar-archive/coreos-overlay#2235 with the kconfig changes.

@jepio
Member

jepio commented Oct 21, 2022

Thank you for this contribution @Snaipe. This fix will now progress through alpha->beta->stable.
