
[BUG] DSP Panic on Intel MTL #9695

Open
as400l opened this issue Nov 29, 2024 · 14 comments · May be fixed by thesofproject/linux#5267
Labels: bug (Something isn't working as expected), P2 (Critical bugs or normal features)
Milestone: v2.12

Comments

@as400l

as400l commented Nov 29, 2024

Describe the bug
A DSP panic occurs, followed by a full freeze of the OS.

To Reproduce
Open pavucontrol and mute/unmute the microphone a few times. Close pavucontrol. Wait for the freeze.

Reproduction Rate
100%

Expected behavior
No DSP Panic.

Impact
Cannot use the built-in microphone.

Environment

  1. Branch name and commit hash of the 2 repositories: sof (firmware/topology) and linux (kernel driver).
    • Kernel: 6.12.1
    • SOF: sof-bin 2024.09.1
  2. Name of the topology file
    • Topology: sof-hda-generic-2ch.tplg
  3. Name of the platform(s) on which the bug is observed.
    • Platform: Intel Meteor Lake Ultra 9 185H, Asus Zenbook 14 OLED UX3405M, Alpine Linux

Screenshots or console output

[  186.448058] sof-audio-pci-intel-mtl 0000:00:1f.3: ------------[ DSP dump start ]------------
[  186.448069] sof-audio-pci-intel-mtl 0000:00:1f.3: DSP panic!
[  186.448071] sof-audio-pci-intel-mtl 0000:00:1f.3: fw_state: SOF_FW_BOOT_COMPLETE (7)
[  186.448078] sof-audio-pci-intel-mtl 0000:00:1f.3: 0x50000005: module: ROM_EXT, state: FW_ENTERED, running
[  186.448083] sof-audio-pci-intel-mtl 0000:00:1f.3: Firmware state: 0x5, status/error code: 0x0
[  186.448116] sof-audio-pci-intel-mtl 0000:00:1f.3: Unknown toolchain is used
[  186.448120] sof-audio-pci-intel-mtl 0000:00:1f.3: error: DSP Firmware Oops
[  186.448121] sof-audio-pci-intel-mtl 0000:00:1f.3: error: Exception Cause: AllocaCause, MOVSP instruction, if caller’s registers are not in the register file
[  186.448123] sof-audio-pci-intel-mtl 0000:00:1f.3: EXCCAUSE 0x00000005 EXCVADDR 0x00000000 PS       0x00060d20 SAR     0x0000000c
[  186.448126] sof-audio-pci-intel-mtl 0000:00:1f.3: EPC1     0xa007626d EPC2     0x00000000 EPC3     0x00000000 EPC4    0x00000000
[  186.448128] sof-audio-pci-intel-mtl 0000:00:1f.3: EPC5     0x00000000 EPC6     0x00000000 EPC7     0x00000000 DEPC    0x00000000
[  186.448129] sof-audio-pci-intel-mtl 0000:00:1f.3: EPS2     0x00000000 EPS3     0x00000000 EPS4     0x00000000 EPS5    0x00000000
[  186.448131] sof-audio-pci-intel-mtl 0000:00:1f.3: EPS6     0x00000000 EPS7     0x00000000 INTENABL 0x00000000 INTERRU 0x00000000
[  186.448132] sof-audio-pci-intel-mtl 0000:00:1f.3: stack dump from 0x00000000
[  186.448134] sof-audio-pci-intel-mtl 0000:00:1f.3: AR registers:
[  186.448136] sof-audio-pci-intel-mtl 0000:00:1f.3: 0x0: a004ed15 a0111680 00000000 4015a7c0
[  186.448138] sof-audio-pci-intel-mtl 0000:00:1f.3: 0x10: a0166b00 00000018 401492b0 a0111680
[  186.448140] sof-audio-pci-intel-mtl 0000:00:1f.3: 0x20: a005fb41 a0111640 401492b0 a006506c
[  186.448142] sof-audio-pci-intel-mtl 0000:00:1f.3: 0x30: a005fb41 a0111640 401492b0 a006506c
[  186.448144] sof-audio-pci-intel-mtl 0000:00:1f.3: ------------[ DSP dump end ]------------
[  186.946817] sof-audio-pci-intel-mtl 0000:00:1f.3: ipc timed out for 0xe030001|0x300
[  186.946837] sof-audio-pci-intel-mtl 0000:00:1f.3: ------------[ IPC dump start ]------------
[  186.946851] sof-audio-pci-intel-mtl 0000:00:1f.3: Host IPC initiator: 0x8e030001|0x300|0x0, target: 0x1b0a0000|0x0|0x0, ctl: 0x3
[  186.946856] sof-audio-pci-intel-mtl 0000:00:1f.3: ------------[ IPC dump end ]------------
[  186.946859] sof-audio-pci-intel-mtl 0000:00:1f.3: IPC timeout
[  186.946866] sof-audio-pci-intel-mtl 0000:00:1f.3: ASoC: error at soc_component_trigger on 0000:00:1f.3: -110
[  186.946878]  HDMI2: ASoC: trigger FE cmd: 1 failed: -110
[  186.946897] sof-audio-pci-intel-mtl 0000:00:1f.3: ipc4_tx_msg_unlocked: ipc message send for 0xe010001|0x0 failed: -19
[  186.946902] sof-audio-pci-intel-mtl 0000:00:1f.3: ASoC: error at soc_component_trigger on 0000:00:1f.3: -19
[  186.946904]  HDMI2: ASoC: trigger FE cmd: 0 failed: -19
[  186.947086] sof-audio-pci-intel-mtl 0000:00:1f.3: ipc4_tx_msg_unlocked: ipc message send for 0x13000003|0x1 failed: -19
[  186.947091] sof-audio-pci-intel-mtl 0000:00:1f.3: failed to pause all pipelines
[  186.947093] sof-audio-pci-intel-mtl 0000:00:1f.3: ASoC: error at soc_component_trigger on 0000:00:1f.3: -19
[  186.947096]  DMIC Raw: ASoC: trigger FE cmd: 0 failed: -19
[  186.947198] sof-audio-pci-intel-mtl 0000:00:1f.3: ipc4_tx_msg_unlocked: ipc message send for 0x46060004|0x19 failed: -19
[  186.947203] sof-audio-pci-intel-mtl 0000:00:1f.3: failed to unbind modules module-copier.12.2:0 -> tdfb.11.1:0
[  186.947208] sof-audio-pci-intel-mtl 0000:00:1f.3: ipc4_tx_msg_unlocked: ipc message send for 0x12040000|0x0 failed: -19
[  186.947212] sof-audio-pci-intel-mtl 0000:00:1f.3: failed to free pipeline widget pipeline.12
[  186.947219] sof-audio-pci-intel-mtl 0000:00:1f.3: ipc4_tx_msg_unlocked: ipc message send for 0x12050000|0x0 failed: -19
[  186.947222] sof-audio-pci-intel-mtl 0000:00:1f.3: failed to free pipeline widget pipeline.11
[  186.947225] sof-audio-pci-intel-mtl 0000:00:1f.3: Failed to free connected widgets
[  186.947233] sof-audio-pci-intel-mtl 0000:00:1f.3: sof_pcm_stream_free: sof_widget_list_free failed -19
[  186.947236] sof-audio-pci-intel-mtl 0000:00:1f.3: ASoC: error at snd_soc_pcm_component_prepare on 0000:00:1f.3: -19
[  186.947240]  DMIC Raw: ASoC: error at __soc_pcm_prepare on DMIC Raw: -19
[  186.947243]  DMIC Raw: ASoC: error at dpcm_fe_dai_prepare on DMIC Raw: -19
[  186.947344] sof-audio-pci-intel-mtl 0000:00:1f.3: ipc4_tx_msg_unlocked: ipc message send for 0x13020003|0x0 failed: -19
[  186.947348] sof-audio-pci-intel-mtl 0000:00:1f.3: ASoC: error at soc_dai_trigger on Analog CPU DAI: -19
[  186.947354]  HDA Analog: ASoC: error at dpcm_be_dai_trigger on HDA Analog: -19
[  186.947357]  HDA Analog: ASoC: trigger FE cmd: 0 failed: -19

The full dmesg is attached:
dmesg.txt

@as400l as400l added the bug Something isn't working as expected label Nov 29, 2024
@lgirdwood
Member

@as400l are you able to reproduce this with alsamixer? If so, with which DMIC kcontrol?
@ujfalusi are there any additional kernel debug options to enable?

@lgirdwood lgirdwood added this to the v2.12 milestone Nov 29, 2024
@lgirdwood lgirdwood added the P2 Critical bugs or normal features label Nov 29, 2024
@ujfalusi
Contributor

As usual, @as400l:
Can you add this file, sof-dyndbg.conf.txt, as /etc/modprobe.d/sof-dyndbg.conf, reboot, and re-attach a dmesg log that contains both the boot and the error itself?

If the log is truncated because of a small log buffer, please add log_buf_len=4M to the kernel command line (the parameters passed by the bootloader to the kernel).

@as400l
Author

as400l commented Nov 29, 2024

Here is the dmesg with the error, with sof-dyndbg.conf enabled.

BTW, isn't it strange that it uses the sof-hda-generic-2ch.tplg file?

@lgirdwood - I tried with alsamixer but can't reproduce it. On the other hand, I can't unmute the mic with alsamixer at all: the mic-mute LED on the keyboard stays on no matter what I try, which means the mic was not unmuted.

dmesg.log.gz

@ujfalusi
Contributor

@as400l, for some reason dyndbg did not enable the debug prints, so we can't see the last message that was sent to the firmware. We only know that the next one would have been 0xe010002|0x0, which was not sent because the firmware had crashed.
Can you check again that the dyndbg configuration is in place? The probing should be much more verbose, with lots of prints about modules and so on.

sof-hda-generic-2ch.tplg is chosen because you have DMICs in your system:

[   15.097047] sof-audio-pci-intel-mtl 0000:00:1f.3: DMICs detected in NHLT tables: 2

You also have BT offload advertised:

[   15.097044] sof-audio-pci-intel-mtl 0000:00:1f.3: NHLT device BT(0) detected, ssp_mask 0x4
[   15.097046] sof-audio-pci-intel-mtl 0000:00:1f.3: BT link detected in NHLT tables: 0x4

I'm not sure if that can cause any issues.

You can disable the DMIC to test the analog path (you will lose the laptop microphones):

options snd_sof_intel_hda_generic dyndbg=+pmf dmic_num=0

in, for example, /etc/modprobe.d/no-dmic.conf.

@as400l
Author

as400l commented Nov 29, 2024

I tried multiple times with "wpctl set-mute @DEFAULT_AUDIO_SOURCE@ toggle", but could not reproduce this behaviour.

So maybe the real cause is actually an Xe DRM module crash or hang related to pavucontrol, which can be seen at the end of the dmesg I sent? Is that even possible?

As to the debug prints: my kernel is really slimmed down, so that may be the reason. I may have to try the default distro kernel.

@lgirdwood
Member

OK, it's strange that alsamixer won't unmute the mic. I assume you tried alsamixer -c N (where N is the card number) to make sure all kcontrols have been tried.

By the way, is the keyboard LED on a key, i.e. can it be toggled with Fn/Alt/Ctrl/Shift combinations? That key should be mapped to the kcontrol that mutes/unmutes the mic.

Please do try the stock kernel. We need the stock kernel logs to figure out what happened here.

@as400l
Author

as400l commented Dec 2, 2024

@lgirdwood - as I mentioned above, I tried with "wpctl" and it correctly mutes/unmutes the microphone. The LED goes off/on as it should. But I could not reproduce this error.

I'm leaning towards something else causing this panic.

The stock Alpine kernel was not helpful either, since it's probably also stripped down.

@as400l
Author

as400l commented Dec 2, 2024

@lgirdwood
@ujfalusi

I compiled a kernel with DYNAMIC_DEBUG, and here are the logs with the error. I had to try to trigger it multiple times, as this time it wasn't so eager to panic.
The panic is at timestamp 313.393689.

dmesg.log.gz

@ujfalusi
Contributor

ujfalusi commented Dec 3, 2024

Based on the log, I think it is the ChainDMA (HDMI audio) that is causing the firmware panic:

[  313.391960] snd_sof:sof_pcm_trigger: sof-audio-pci-intel-mtl 0000:00:1f.3: pcm: trigger stream 4 dir 0 cmd 1
[  313.391963] snd_sof:sof_ipc4_trigger_pipelines: sof-audio-pci-intel-mtl 0000:00:1f.3: trigger cmd: 1 state: 4
[  313.391966] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-mtl 0000:00:1f.3: ipc tx      : 0xe030001|0x300
[  313.393682] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-mtl 0000:00:1f.3: ipc rx      : 0x1b0a0000|0x0
[  313.393687] sof-audio-pci-intel-mtl 0000:00:1f.3: ------------[ DSP dump start ]------------
[  313.393689] sof-audio-pci-intel-mtl 0000:00:1f.3: DSP panic!
[  313.393691] sof-audio-pci-intel-mtl 0000:00:1f.3: fw_state: SOF_FW_BOOT_COMPLETE (7)
[  313.393695] sof-audio-pci-intel-mtl 0000:00:1f.3: 0x50000005: module: ROM_EXT, state: FW_ENTERED, running
[  313.393700] sof-audio-pci-intel-mtl 0000:00:1f.3: Firmware state: 0x5, status/error code: 0x0
[  313.393733] sof-audio-pci-intel-mtl 0000:00:1f.3: Unknown toolchain is used
[  313.393735] sof-audio-pci-intel-mtl 0000:00:1f.3: error: DSP Firmware Oops
[  313.393737] sof-audio-pci-intel-mtl 0000:00:1f.3: error: Exception Cause: AllocaCause, MOVSP instruction, if caller’s registers are not in the register file
[  313.393741] sof-audio-pci-intel-mtl 0000:00:1f.3: EXCCAUSE 0x00000005 EXCVADDR 0x00000000 PS       0x00060d20 SAR     0x0000000c
[  313.393745] sof-audio-pci-intel-mtl 0000:00:1f.3: EPC1     0xa007626d EPC2     0x00000000 EPC3     0x00000000 EPC4    0x00000000
[  313.393748] sof-audio-pci-intel-mtl 0000:00:1f.3: EPC5     0x00000000 EPC6     0x00000000 EPC7     0x00000000 DEPC    0x00000000
[  313.393750] sof-audio-pci-intel-mtl 0000:00:1f.3: EPS2     0x00000000 EPS3     0x00000000 EPS4     0x00000000 EPS5    0x00000000
[  313.393752] sof-audio-pci-intel-mtl 0000:00:1f.3: EPS6     0x00000000 EPS7     0x00000000 INTENABL 0x00000000 INTERRU 0x00000000
[  313.393754] sof-audio-pci-intel-mtl 0000:00:1f.3: stack dump from 0x00000000
[  313.393756] sof-audio-pci-intel-mtl 0000:00:1f.3: AR registers:
[  313.393759] sof-audio-pci-intel-mtl 0000:00:1f.3: 0x0: a004ed15 a0111680 00000000 40152c80
[  313.393762] sof-audio-pci-intel-mtl 0000:00:1f.3: 0x10: a0166740 00000018 40149740 a0111680
[  313.393764] sof-audio-pci-intel-mtl 0000:00:1f.3: 0x20: a005fb41 a0111640 40149740 a006506c
[  313.393766] sof-audio-pci-intel-mtl 0000:00:1f.3: 0x30: a005fb41 a0111640 40149740 a006506c
[  313.393768] sof-audio-pci-intel-mtl 0000:00:1f.3: ------------[ DSP dump end ]------------
[  313.393770] snd_sof:sof_set_fw_state: sof-audio-pci-intel-mtl 0000:00:1f.3: fw_state change: 7 -> 8
[  313.393774] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-mtl 0000:00:1f.3: ipc rx done : 0x1b0a0000|0x0
[  313.898625] sof-audio-pci-intel-mtl 0000:00:1f.3: ipc timed out for 0xe030001|0x300
[  313.898645] sof-audio-pci-intel-mtl 0000:00:1f.3: ------------[ IPC dump start ]------------
[  313.898657] sof-audio-pci-intel-mtl 0000:00:1f.3: Host IPC initiator: 0x8e030001|0x300|0x0, target: 0x1b0a0000|0x0|0x0, ctl: 0x3
[  313.898661] sof-audio-pci-intel-mtl 0000:00:1f.3: ------------[ IPC dump end ]------------
[  313.898663] sof-audio-pci-intel-mtl 0000:00:1f.3: IPC timeout

0xe030001 is ChainDMA with the ALLOCATE and ENABLE bits set, but what is not right is that the Host DMA ID is 1 while the Link DMA ID is 0.
We had a similar issue in the past (thesofproject/linux#5116), which was supposed to be fixed by thesofproject/linux#5119.

There are lots of things happening in the log, but it looks like something (PipeWire?) is trying PCMs at random, keeping them open while stopping, starting and reconfiguring them.

@as400l
Author

as400l commented Dec 3, 2024

@ujfalusi - just a reminder: this happens only while using pavucontrol, which is actually a PulseAudio tool.
I could not reproduce it with the native WirePlumber tool, wpctl.

@ujfalusi
Contributor

ujfalusi commented Dec 3, 2024

OK, so to reproduce the issue:
aplay -Dhw:0,3 -c8 -r48000 -fS32_LE /dev/zero -d 120

[ 2810.282081] snd_sof:sof_pcm_trigger: sof-audio-pci-intel-tgl 0000:00:1f.3: pcm3 (HDMI1), dir 0: Entry: trigger (cmd: 1)
[ 2810.282087] snd_sof:sof_ipc4_trigger_pipelines: sof-audio-pci-intel-tgl 0000:00:1f.3: pcm3 (HDMI1), dir 0: cmd: 1, state: 4
[ 2810.282093] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx      : 0xe030000|0xc00: GLB_CHAIN_DMA
[ 2810.282656] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx reply: 0x2e000000|0xc00: GLB_CHAIN_DMA
[ 2810.282692] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx done : 0xe030000|0xc00: GLB_CHAIN_DMA
[ 2810.283232] snd_sof_intel_hda_common:hda_dsp_stream_trigger: sof-audio-pci-intel-tgl 0000:00:1f.3: FW Poll Status: reg[0x160]=0x2014001e successful

Press <CTRL+z> to freeze aplay

[ 2814.029625] snd_sof:sof_pcm_trigger: sof-audio-pci-intel-tgl 0000:00:1f.3: pcm3 (HDMI1), dir 0: Entry: trigger (cmd: 0)
[ 2814.029633] snd_sof:sof_ipc4_trigger_pipelines: sof-audio-pci-intel-tgl 0000:00:1f.3: pcm3 (HDMI1), dir 0: cmd: 0, state: 3
[ 2814.029645] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx      : 0xe010000|0x0: GLB_CHAIN_DMA
[ 2814.030855] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx reply: 0x2e000000|0x0: GLB_CHAIN_DMA
[ 2814.031022] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx done : 0xe010000|0x0: GLB_CHAIN_DMA
[ 2814.031034] snd_soc_core:dpcm_be_dai_trigger:  iDisp1: ASoC: trigger BE iDisp1 cmd 0
[ 2814.031045] snd_sof_intel_hda_common:hda_dai_trigger: sof-audio-pci-intel-tgl 0000:00:1f.3: cmd=0 dai iDisp1 Pin direction 0

Wait a second or two, then start a new HDMI playback (while hw:0,3 is frozen):
aplay -Dhw:0,4 -c8 -r48000 -fS32_LE /dev/zero -d 120

[ 2823.354025] snd_sof:sof_ipc4_trigger_pipelines: sof-audio-pci-intel-tgl 0000:00:1f.3: pcm4 (HDMI2), dir 0: cmd: 1, state: 4
[ 2823.354033] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc tx      : 0xe030001|0xc00: GLB_CHAIN_DMA
[ 2823.357361] snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-tgl 0000:00:1f.3: ipc rx      : 0x1b0a0000|0x0: GLB_NOTIFICATION|EXCEPTION_CAUGHT
[ 2823.357367] sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ DSP dump start ]------------
[ 2823.357370] sof-audio-pci-intel-tgl 0000:00:1f.3: DSP panic!
[ 2823.357373] sof-audio-pci-intel-tgl 0000:00:1f.3: fw_state: SOF_FW_BOOT_COMPLETE (7)
[ 2823.357381] sof-audio-pci-intel-tgl 0000:00:1f.3: 0x00000005: module: ROM, state: FW_ENTERED, running
[ 2823.357490] sof-audio-pci-intel-tgl 0000:00:1f.3: FW is built with Zephyr toolchain
[ 2823.357493] sof-audio-pci-intel-tgl 0000:00:1f.3: error: DSP Firmware Oops
[ 2823.357496] sof-audio-pci-intel-tgl 0000:00:1f.3: error: Exception Cause: AllocaCause, MOVSP instruction, if caller’s registers are not in the register file
[ 2823.357499] sof-audio-pci-intel-tgl 0000:00:1f.3: EXCCAUSE 0x00000005 EXCVADDR 0x00000000 PS       0x00060f20 SAR     0x0000001d
[ 2823.357503] sof-audio-pci-intel-tgl 0000:00:1f.3: EPC1     0xbe04126c EPC2     0x00000000 EPC3     0x00000000 EPC4    0x00000000
[ 2823.357507] sof-audio-pci-intel-tgl 0000:00:1f.3: EPC5     0x00000000 EPC6     0x00000000 EPC7     0x00000000 DEPC    0x00000000
[ 2823.357510] sof-audio-pci-intel-tgl 0000:00:1f.3: EPS2     0x00000000 EPS3     0x00000000 EPS4     0x00000000 EPS5    0x00000000
[ 2823.357513] sof-audio-pci-intel-tgl 0000:00:1f.3: EPS6     0x00000000 EPS7     0x00000000 INTENABL 0x00000000 INTERRU 0x00000000
[ 2823.357515] sof-audio-pci-intel-tgl 0000:00:1f.3: stack dump from 0x00000000
[ 2823.357518] sof-audio-pci-intel-tgl 0000:00:1f.3: AR registers:
[ 2823.357521] sof-audio-pci-intel-tgl 0000:00:1f.3: 0x0: be04156b be0a2eb0 9e0b1700 be0b17c0
[ 2823.357525] sof-audio-pci-intel-tgl 0000:00:1f.3: 0x10: fff001ff 00000000 00003000 be0a2eb0
[ 2823.357540] sof-audio-pci-intel-tgl 0000:00:1f.3: 0x20: 00000000 be0a2e90 9e0a8630 00060f25
[ 2823.357546] sof-audio-pci-intel-tgl 0000:00:1f.3: 0x30: 00000000 be0a2e90 9e0a8630 00060f25
[ 2823.357551] sof-audio-pci-intel-tgl 0000:00:1f.3: ------------[ DSP dump end ]------------

@ujfalusi
Contributor

ujfalusi commented Dec 3, 2024

Reverting 7eab5d86f218 ("ASoC: SOF: Intel: hda: Always clean up link DMA during stop") is fixing this particular issue.

That patch was part of thesofproject/linux#5197, which was fixing various metallic noise issues around similar sequences.

The issue is not limited to ChainDMA.
On a TGL HDA machine, this sequence will fail:

aplay -Dhw:0,3 -c2 -r48000 -fS32_LE /dev/zero -d 120
CTRL+z
aplay -Dhw:0,0 -c2 -r48000 -fS32_LE /dev/zero -d 120

and this one will cause a firmware panic:

aplay -Dhw:0,0 -c2 -r48000 -fS32_LE /dev/zero -d 120
CTRL+z
aplay -Dhw:0,3 -c2 -r48000 -fS32_LE /dev/zero -d 120

On LNL SoundWire it is the same with all endpoints:

aplay -Dhw:0,0 -c2 -r48000 -fS32_LE /dev/zero -d 120
CTRL+z
aplay -Dhw:0,2 -c2 -r48000 -fS32_LE /dev/zero -d 120

Only the ChainDMA PCMs cause a firmware panic; the others just fail.

@ranj063, I think it might be because we release the LinkDMA channel in sof/intel/ but don't inform the firmware about it (we don't do a full stop), and this causes a race if a new PCM comes in between the stop and the prepare/hw_params/start of the other PCM.

ujfalusi added a commit to ujfalusi/sof-linux that referenced this issue Dec 9, 2024
…IPC4

We need to reclaim the link DMA channel after clearing it with IPC4, as
the pipelines are not cleared in firmware and the Link DMA channel is
preserved.

This issue is not easy to reproduce under normal conditions since, usually,
after a stop the stream is closed or the same stream is restarted. But if
another stream gets in between the stop and the start, like this:
aplay -Dhw:0,3 -c2 -r48000 -fS32_LE /dev/zero -d 120
CTRL+z
aplay -Dhw:0,0 -c2 -r48000 -fS32_LE /dev/zero -d 120

then the link DMA channels will be mixed up, resulting in a firmware error or
crash.

Fixes: ab55937 ("ASoC: SOF: Intel: hda: Always clean up link DMA during stop")
Closes: thesofproject/sof#9695
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
@ujfalusi
Contributor

ujfalusi commented Dec 9, 2024

@ranj063, @as400l, this patch fixes the issue for me: thesofproject/linux#5267

@lgirdwood
Member

@abonislawski FYI: for the FW panic, it is probably worth checking whether it is due to the HW state transition (fixed above in SW) and whether it needs a FW fix too. Thanks!

ujfalusi added a commit to ujfalusi/sof-linux that referenced this issue Dec 10, 2024
We need to reclaim the link DMA channel after clearing it with IPC4, as
the pipelines are not cleared in firmware and the Link DMA channel is
preserved.

This issue is not easy to reproduce under normal conditions since, usually,
after a stop the stream is closed or the same stream is restarted. But if
another stream gets in between the stop and the start, like this:
aplay -Dhw:0,3 -c2 -r48000 -fS32_LE /dev/zero -d 120
CTRL+z
aplay -Dhw:0,0 -c2 -r48000 -fS32_LE /dev/zero -d 120

then the link DMA channels will be mixed up, resulting in a firmware error or
crash.

Fixes: ab55937 ("ASoC: SOF: Intel: hda: Always clean up link DMA during stop")
Closes: thesofproject/sof#9695
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
ujfalusi added a commit to ujfalusi/sof-linux that referenced this issue Dec 11, 2024
The linkDMA should not be released on the stop trigger, since a stream restart
might happen without the stream being closed. Releasing it leaves a short window
for other streams to 'steal' the linkDMA.

This issue is not easy to reproduce under normal conditions since, usually,
after a stop the stream is closed or the same stream is restarted. But if
another stream gets in between the stop and the start, like this:
aplay -Dhw:0,3 -c2 -r48000 -fS32_LE /dev/zero -d 120
CTRL+z
aplay -Dhw:0,0 -c2 -r48000 -fS32_LE /dev/zero -d 120

then the link DMA channels will be mixed up, resulting in a firmware error or
crash.

Fixes: ab55937 ("ASoC: SOF: Intel: hda: Always clean up link DMA during stop")
Closes: thesofproject/sof#9695
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>

3 participants