Kernel panic on shutdown via qemu-ga when booted from ISO image #9017

Closed
mattwillsher opened this issue Jul 16, 2024 · 1 comment · Fixed by #9024

@mattwillsher

Bug Report

When booted from CD, shutting down via the QEMU guest agent causes a kernel panic.

Description

Running on Proxmox 8.2.4 with Talos Linux 1.7.5 (also tested 1.6.7), booted from an ISO with schematic_id ce4c980550dd2ab1b17bbf2b08801c7eb59418eafe8f279833297925d67c7515 (qemu-agent).

Shutting down the VM via the Proxmox API causes a kernel panic.

This does not happen when the shutdown is sent with qemu-agent disabled in the VM settings in Proxmox (that is, when shutdown is sent via ACPI).

This does not happen once configuration has been run: from then on, shutting down via the QEMU agent works as expected.

Logs

[  177.665653] [talos] shutdown via API received. actor id: 4e8e6603-c3cc-408a-9bd4-7c13e804a7a5
[  177.668181] [talos] task loadConfig (1/1): failed: context canceled
[  177.669196] [talos] phase config (10/11): failed
[  177.669778] [talos] initialize sequence: failed
[  177.670364] show_signal_msg: 28 callbacks suppressed
[  177.670366] init[2149]: segfault at 30 ip 00000000027f1575 sp 000000c000bb9bc0 error 4 in init[400000+2412000] likely on CPU 0 (core 0, socket 0)
[  177.672628] Code: 24 18 02 00 00 48 85 d2 74 0a 80 7a 28 00 74 04 31 d2 eb 36 48 8b 8c 24 08 02 00 00 48 8b 51 28 48 8b 84 24 10 02 00 00 ff d2 <48> 8b 48 30 48 89 d8 ff d1 48 8b 48 50 48 89 d8 ff d1 48 8b 48 68
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x27f1575]

goroutine 2344 [running]:
github.com/siderolabs/talos/internal/app/machined/pkg/runtime/v1alpha1.(*Sequencer).Shutdown(0xc00105b5c0?, {0x36c8ef8, 0xc000228cd0}, 0xc001032900)
        /src/internal/app/machined/pkg/runtime/v1alpha1/v1alpha1_sequencer.go:383 +0x155
github.com/siderolabs/talos/internal/app/machined/pkg/runtime/v1alpha1.(*Controller).phases(0xc0004d0840?, 0x36bbcc0?, {0x2f6a620?, 0xc001032900?})
        /src/internal/app/machined/pkg/runtime/v1alpha1/v1alpha1_controller.go:378 +0xc6
github.com/siderolabs/talos/internal/app/machined/pkg/runtime/v1alpha1.(*Controller).Run(0xc0004d0930, {0x36a59c0?, 0xc0010329f0?}, 0x4, {0x2f6a620, 0xc001032900}, {0xc0001baef0?, 0xc000cfb5f0?, 0x21?})
        /src/internal/app/machined/pkg/runtime/v1alpha1/v1alpha1_controller.go:132 +0x336
github.com/siderolabs/talos/internal/app/machined/internal/server/v1alpha1.(*Server).Shutdown.func1()
        /src/internal/app/machined/internal/server/v1alpha1/v1alpha1_server.go:433 +0x90
created by github.com/siderolabs/talos/internal/app/machined/internal/server/v1alpha1.(*Server).Shutdown in goroutine 2343
        /src/internal/app/machined/internal/server/v1alpha1/v1alpha1_server.go:432 +0x23a
[  177.697378] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000200
[  177.698319] CPU: 0 PID: 2150 Comm: init Not tainted 6.6.33-talos #1
[  177.699096] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 4.2023.08-4 02/15/2024
[  177.700111] Call Trace:
[  177.700412]  <TASK>
[  177.700678]  dump_stack_lvl+0x47/0x60
[  177.701161]  panic+0x1a8/0x380
[  177.701536]  ? raw_spin_rq_lock_nested.constprop.0+0x11/0x60
[  177.702243]  do_exit+0x92c/0xaa0
[  177.702639]  ? __futex_unqueue+0x29/0x40
[  177.703144]  ? futex_unqueue+0x2d/0x60
[  177.703596]  do_group_exit+0x31/0x80
[  177.704062]  get_signal+0x9d4/0xa00
[  177.704487]  arch_do_signal_or_restart+0x3e/0x240
[  177.705096]  exit_to_user_mode_prepare+0xe3/0x130
[  177.705659]  syscall_exit_to_user_mode+0x26/0x50
[  177.706244]  do_syscall_64+0x66/0x80
[  177.706683]  ? clear_bhb_loop+0x25/0x80
[  177.707179]  ? clear_bhb_loop+0x25/0x80
[  177.707639]  ? clear_bhb_loop+0x25/0x80
[  177.708137]  ? clear_bhb_loop+0x25/0x80
[  177.708601]  ? clear_bhb_loop+0x25/0x80
[  177.709094]  entry_SYSCALL_64_after_hwframe+0x78/0xe2
[  177.709714] RIP: 0033:0x475f43
[  177.710106] Code: 24 20 c3 cc cc cc cc 48 8b 7c 24 08 8b 74 24 10 8b 54 24 14 4c 8b 54 24 18 4c 8b 44 24 20 44 8b 4c 24 28 b8 ca 00 00 00 0f 05 <89> 44 24 30 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
[  177.712358] RSP: 002b:000000c000b17cf8 EFLAGS: 00000286 ORIG_RAX: 00000000000000ca
[  177.713276] RAX: fffffffffffffe00 RBX: 0000000000000000 RCX: 0000000000475f43
[  177.714145] RDX: 0000000000000000 RSI: 0000000000000080 RDI: 000000c000830948
[  177.715009] RBP: 000000c000b17d40 R08: 0000000000000000 R09: 0000000000000000
[  177.715875] R10: 0000000000000000 R11: 0000000000000286 R12: 000000c00007e008
[  177.716727] R13: 000000c000d1fe01 R14: 000000c000e9aa80 R15: 0000000000000001
[  177.717575]  </TASK>
[  177.718050] Kernel Offset: disabled
[  177.718473] Rebooting in 10 seconds..

Environment

  • Talos version: v1.7.5, v1.6.7
  • Kubernetes version: n/a
  • Platform: Proxmox 8.2.4
@mattwillsher mattwillsher changed the title Kernel panic on shutdown via qemu-ga when booted from CD Kernel panic on shutdown via qemu-ga when booted from ISO Jul 16, 2024
@mattwillsher mattwillsher changed the title Kernel panic on shutdown via qemu-ga when booted from ISO Kernel panic on shutdown via qemu-ga when booted from ISO image Jul 16, 2024
@smira
Member

smira commented Jul 16, 2024

yep, this looks like a bug, thanks for reporting it!

@smira smira self-assigned this Jul 16, 2024
smira added a commit to smira/talos that referenced this issue Aug 5, 2024
Fixes siderolabs#9017

Don't assume the config is there before trying to access it.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit d983e44)
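
For readers following along, here is a minimal sketch of the kind of guard the commit message describes: check that a config is present before reading from it while building the shutdown sequence. This is not the actual Talos code. `Config`, `Runtime`, `shutdownPhases`, and the phase names are all hypothetical stand-ins, and the only assumption taken from this issue is that no config is loaded yet when the machine has just booted from the ISO.

```go
package main

import "fmt"

// Config is a hypothetical stand-in for the machine configuration.
type Config struct {
	ClusterName string
}

// Runtime is a hypothetical stand-in for the runtime state. cfg stays nil
// until a machine configuration has been loaded, which is assumed to be the
// situation right after booting from the ISO.
type Runtime struct {
	cfg *Config
}

// Config returns the loaded configuration, or nil if none is loaded yet.
func (r *Runtime) Config() *Config { return r.cfg }

// shutdownPhases builds an illustrative list of shutdown phases. The
// config-dependent phase is added only when a config is present, so a
// shutdown request that arrives before configuration does not dereference
// a nil pointer.
func shutdownPhases(r *Runtime) []string {
	phases := []string{"stopServices"}

	// Guard: only touch the config when it actually exists.
	if cfg := r.Config(); cfg != nil {
		phases = append(phases, "leaveCluster:"+cfg.ClusterName)
	}

	return append(phases, "unmountFilesystems", "poweroff")
}

func main() {
	// No config loaded: the config-dependent phase is skipped.
	fmt.Println(shutdownPhases(&Runtime{}))

	// Config loaded: the config-dependent phase is included.
	fmt.Println(shutdownPhases(&Runtime{cfg: &Config{ClusterName: "demo"}}))
}
```

Without the `cfg != nil` check, the first call would panic with the same "invalid memory address or nil pointer dereference" error seen in the log above.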
smira added a commit to smira/talos that referenced this issue Aug 6, 2024
Fixes siderolabs#9017

Don't assume the config is there before trying to access it.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit d983e44)
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 15, 2024