Pin kernel to 5.15.x for now #152
The 6.1.x series of kernels has busted networking on the community box, so we pin to 5.15.x, which works fine. I'll look into what the problem may be, and see if I can either bisect the issue or find something about it in the kernel archives (somewhere), hopefully including a patch.

[ 291.298247] ------------[ cut here ]------------
[ 291.302859] NETDEV WATCHDOG: eth3 (mlx5_core): transmit queue 30 timed out
[ 291.309738] WARNING: CPU: 77 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x278/0x280
[ 291.318081] Modules linked in: cfg80211 rfkill mlx5_ib ib_uverbs ib_core acpi_ipmi crct10dif_ce mlx5_core polyval_ce ipmi_ssif polyval_generic arm_spe_pmu ast drm_vram_helper drm_ttm_helper ttm drm_kms_helper mlxfw ipmi_devintf psample pci_hyperv_intf ipmi_msghandler arm_cmn arm_dmc620_pmu xgene_hwmon cppc_cpufreq arm_dsu_pmu acpi_tad ip6_tables xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6t_rpfilter ipt_rpfilter xt_pkttype xt_LOG nf_log_syslog xt_tcpudp nft_compat sch_fq_codel nf_tables libcrc32c nfnetlink bonding tls tap macvlan bridge stp llc fuse drm dmi_sysfs ip_tables x_tables nvme nvme_core xhci_pci xhci_pci_renesas dm_mod dax zfs(PO) zunicode(PO) zzstd(O) zlua(O) zcommon(PO) znvpair(PO) zavl(PO) icp(PO) spl(O) overlay
[ 291.383581] CPU: 77 PID: 0 Comm: swapper/77 Tainted: P O 6.1.21 #1-NixOS
[ 291.391658] Hardware name: GIGABYTE R272-P30-JG/MP32-AR0-JG, BIOS F17a (SCP: 1.07.20210713) 07/22/2021
[ 291.400950] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 291.407899] pc : dev_watchdog+0x278/0x280
[ 291.411895] lr : dev_watchdog+0x278/0x280
[ 291.415891] sp : ffff80000826bdd0
[ 291.419192] x29: ffff80000826bdd0 x28: ffffcbd179f87000 x27: ffff80000826bee0
[ 291.426315] x26: ffffcbd17962f008 x25: 0000000000000000 x24: ffffcbd179f8ea58
[ 291.433437] x23: 0000000000000100 x22: ffffcbd179f87000 x21: 000000000000001e
[ 291.440560] x20: ffff07ff9b6c0000 x19: ffff07ff9b6c0488 x18: 0000000000000006
[ 291.447682] x17: ffff3c6ce6977000 x16: ffff80000826c000 x15: ffff80000826b910
[ 291.454804] x14: 0000000000000000 x13: 74756f2064656d69 x12: 7420303320657565
[ 291.461926] x11: 00000000ffffbfff x10: ffff083f5fec3bc0 x9 : ffffcbd1770095cc
[ 291.469048] x8 : 000000000005ffe8 x7 : c0000000ffffbfff x6 : 0000000000000000
[ 291.476170] x5 : ffff083e5ffa8b50 x4 : ffff083e5ffa8b50 x3 : ffff083e5ffb4cb0
[ 291.483292] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff07ff81582dc0
[ 291.490414] Call trace:
[ 291.492848] dev_watchdog+0x278/0x280
[ 291.496497] call_timer_fn+0x3c/0x15c
[ 291.500149] __run_timers+0x2e8/0x3a0
[ 291.503799] run_timer_softirq+0x28/0x50
[ 291.507709] __do_softirq+0x128/0x368
[ 291.511359] ____do_softirq+0x18/0x24
[ 291.515009] call_on_irq_stack+0x2c/0x60
[ 291.518919] do_softirq_own_stack+0x24/0x3c
[ 291.523089] __irq_exit_rcu+0x148/0x150
[ 291.526913] irq_exit_rcu+0x18/0x24
[ 291.530388] el1_interrupt+0x38/0x54
[ 291.533953] el1h_64_irq_handler+0x18/0x2c
[ 291.538036] el1h_64_irq+0x64/0x68
[ 291.541425] cpuidle_enter_state+0xbc/0x440
[ 291.545598] cpuidle_enter+0x40/0x60
[ 291.549162] do_idle+0x234/0x2c0
[ 291.552378] cpu_startup_entry+0x30/0x3c
[ 291.556288] secondary_start_kernel+0x130/0x154
[ 291.560807] __secondary_switched+0xb0/0xb4
[ 291.564978] ---[ end trace 0000000000000000 ]---
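
When checking whether a given kernel still hits this stall, it may be easier to scan the kernel log for the watchdog message than to read the whole trace. Here is a minimal sketch in Python, assuming the message keeps the `NETDEV WATCHDOG: <iface> (<driver>): transmit queue <n> timed out` shape seen in the trace above. The names `find_tx_timeouts` and `TxTimeout` are made up for illustration, and other kernel versions may word the message differently.

```python
import re
from typing import NamedTuple

# Matches the watchdog line as it appears in the trace above.
# Assumption: other kernel versions may format this message differently.
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


class TxTimeout(NamedTuple):
    """One transmit-queue timeout reported by the network watchdog."""
    iface: str
    driver: str
    queue: int


def find_tx_timeouts(log_text: str) -> list[TxTimeout]:
    """Return every transmit-queue timeout found in the given log text."""
    return [
        TxTimeout(m["iface"], m["driver"], int(m["queue"]))
        for m in WATCHDOG_RE.finditer(log_text)
    ]


# The watchdog line from the trace above, used as sample input.
sample = "[ 291.302859] NETDEV WATCHDOG: eth3 (mlx5_core): transmit queue 30 timed out"
print(find_tx_timeouts(sample))
```

An empty result only means the watchdog had not fired in the text that was scanned, so it is worth giving the machine enough uptime before calling a kernel good.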
Drafted because I haven't yet tested this exact change (I only tested by rolling back to a nixos-unstable commit from before the default switched to 6.1). Will undraft and merge once the machine comes back up in a good state (which I expect it to, but kernel issues are always fun). |
It worked. |
Bisecting this won't be fun, but probably worthwhile doing before the 23.05 release. |
Yeah, I'll probably spend my day doing that tomorrow... Note to self:
|
Progress update: I'm fairly certain I've found the problematic commit after bisecting. With that commit applied, networking reliably stalled; with it reverted, I saw no stall in any meaningful amount of time. I'll probably draft an email to the kernel mailing list about this tomorrow (which list specifically, though? I don't know yet :).
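
For readers unfamiliar with bisecting, the idea can be sketched as a binary search over an ordered commit range. This is only an illustration of the technique, not the procedure actually used here: the commit names and the `stalls` predicate are hypothetical, and in practice each test means building and booting a kernel and waiting to see whether networking stalls. Tools such as `git bisect` automate the bookkeeping.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search for the first bad commit.

    Assumes commits are ordered oldest to newest, the last one is known
    to be bad, and every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid        # the first bad commit is at mid or earlier
        else:
            lo = mid + 1    # the first bad commit is after mid
    return commits[lo]


# Hypothetical history of ten commits in which "c5" introduced the stall.
history = [f"c{i}" for i in range(10)]
tested = []


def stalls(commit: str) -> bool:
    """Stand-in for 'boot this kernel and see whether networking stalls'."""
    tested.append(commit)
    return int(commit[1:]) >= 5


print(first_bad(history, stalls), "found after testing", tested)
```

The search only works when the failure is reproducible, which is why a commit that reliably stalls matters: a flaky test would send the search down the wrong half.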
|
|
Did a little bit of searching on the kernel archives and noticed these two patches that are related:
I don't know how likely it is this gets backported because it doesn't apply cleanly to 6.1.23 ( |
Is this still an issue? |
I haven't tested any more recent kernels, so I don't know. It looks like Nixpkgs' |