Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trace mailbox shows garbage #6

Closed
lgirdwood opened this issue Jun 21, 2018 · 3 comments
Closed

Trace mailbox shows garbage #6

lgirdwood opened this issue Jun 21, 2018 · 3 comments
Assignees
Labels
APL Applies to Apollolake platform
Milestone

Comments

@lgirdwood
Copy link
Member

lgirdwood commented Jun 21, 2018

The trace mailbox is outputting garbage on APL data after patch 80f3d47 was applied.

0x0 [6076024018.780075] delta [] value 0x03a3b651fd7d6ccc
0x10 [26736773379.643696] delta [
] value 0x0f2eac82f5e4a85a
0x20 [304907117229.521912] delta [] value 0xe5847e76513e5b06
0x30 [922837080142.584229] delta [
] value 0xfae37d57071fc582
0x40 [951347671207.020264] delta [] value 0xff17f99f019e7557
0x50 [451660774.405905] delta [
] value 0x00bfe0f6ff85d41d
0x60 [4067790368.758392] delta [] value 0x0132bf27fed5e827
0x70 [4336307921.055160] delta [
] value 0x0102b78afee84ac4

Are we missing some PM or cache writeback ?

@lgirdwood lgirdwood added the bug Something isn't working as expected label Jun 21, 2018
@lgirdwood lgirdwood added this to the 1.2 milestone Jun 21, 2018
@lgirdwood lgirdwood added the APL Applies to Apollolake platform label Jun 21, 2018
@lgirdwood
Copy link
Member Author

Can be reproduced by running "rmbox" on the host. Expectation is that SRAM window may contain 1 or 2 error message but that's all. The rest should be all 0s.

@lgirdwood lgirdwood added regression and removed bug Something isn't working as expected labels Jun 22, 2018
@lgirdwood
Copy link
Member Author

Confirmed on my UP^2.

@keqiaozhang
Copy link
Collaborator

This issue is fixed by #11

RDharageswari pushed a commit to RDharageswari/sof that referenced this issue Jun 6, 2020
This BUG halt was reported a while back, but the patch somehow got
missed:

PID: 2879   TASK: c16adaa0  CPU: 1   COMMAND: "sctpn"
 #0 [f418dd28] crash_kexec at c04a7d8c
 thesofproject#1 [f418dd7c] oops_end at c0863e02
 thesofproject#2 [f418dd90] do_invalid_op at c040aaca
 thesofproject#3 [f418de28] error_code (via invalid_op) at c08631a5
    EAX: f34baac0  EBX: 00000090  ECX: f418deb0  EDX: f5542950  EBP: 00000000
    DS:  007b      ESI: f34ba800  ES:  007b      EDI: f418dea0  GS:  00e0
    CS:  0060      EIP: c046fa5e  ERR: ffffffff  EFLAGS: 00010286
 thesofproject#4 [f418de5c] add_timer at c046fa5e
 thesofproject#5 [f418de68] sctp_do_sm at f8db8c77 [sctp]
 thesofproject#6 [f418df30] sctp_primitive_SHUTDOWN at f8dcc1b5 [sctp]
 thesofproject#7 [f418df48] inet_shutdown at c080baf9
 thesofproject#8 [f418df5c] sys_shutdown at c079eedf
 thesofproject#9 [f418df70] sys_socketcall at c079fe88
    EAX: ffffffda  EBX: 0000000d  ECX: bfceea90  EDX: 0937af98
    DS:  007b      ESI: 0000000c  ES:  007b      EDI: b7150ae4
    SS:  007b      ESP: bfceea7c  EBP: bfceeaa8  GS:  0033
    CS:  0073      EIP: b775c424  ERR: 00000066  EFLAGS: 00000282

It appears that the side effect that starts the shutdown timer was processed
multiple times, which can happen as multiple paths can trigger it.  This of
course leads to the BUG halt in add_timer getting called.

Fix seems pretty straightforward, just check before the timer is added if its
already been started.  If it has mod the timer instead to min(current
expiration, new expiration)

Its been tested but not confirmed to fix the problem, as the issue has only
occured in production environments where test kernels are enjoined from being
installed.  It appears to be a sane fix to me though.  Also, recentely,
Jere found a reproducer posted on list to confirm that this resolves the
issues

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: jere.leppanen@nokia.com
CC: marcelo.leitner@gmail.com
CC: netdev@vger.kernel.org
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
nashif added a commit to nashif/sof that referenced this issue May 7, 2021
marc-hb added a commit to marc-hb/sof that referenced this issue Feb 27, 2024
As discussed in the alternative approach
zephyrproject-rtos/zephyr#68494, k_panic() in
POSIX mode has various shortcomings that do not provide a useful
trace. Useless pointers to signal handlers or other cleanup routines are
printed instead. Leverage our already existing
k_sys_fatal_error_handler() and dereference a NULL pointer there when in
POSIX mode. This "fails fast" and provides a complete and relevant stack
trace in CI when fuzzing or when using some other static
analyzer. Example of how fuzzing failure thesofproject#8832 would have looked like in
CI results thanks to this commit:

```
./build-fuzz/zephyr/zephyr.exe: Running 1 inputs 1 time(s) each.
Running: ./rballoc_align_fuzz_crash
*** Booting Zephyr OS build zephyr-v3.5.0-3971-ge07de4e0a167 ***
[00:00:00.000,000] <inf> main: SOF on native_posix
[00:00:00.000,000] <inf> main: SOF initialized
@ WEST_TOPDIR/sof/zephyr/lib/alloc.c:391
[00:00:00.000,000] <err> os: >>> ZEPHYR FATAL ERROR 4: Kernel panic
[00:00:00.000,000] <err> os: Current thread: 0x891f8a0 (unknown)
[00:00:00.000,000] <err> zephyr: Halting emulation
AddressSanitizer:DEADLYSIGNAL
=================================================================
==1784402==ERROR: AddressSanitizer: SEGV on unknown address 0x00000000
==1784402==The signal is caused by a WRITE memory access.
==1784402==Hint: address points to the zero page.
    #0 0x829a77d in k_sys_fatal_error_handler zephyr/wrapper.c:352:19
    #1 0x829b8c0 in rballoc_align zephyr/lib/alloc.c:391:3
    thesofproject#2 0x8281438 in buffer_alloc src/audio/buffer.c:58:16
    thesofproject#3 0x826a60a in buffer_new src/ipc/ipc-helper.c:48:11
    thesofproject#4 0x8262107 in ipc_buffer_new src/ipc/ipc3/helper.c:459:11
    thesofproject#5 0x825944d in ipc_glb_tplg_buffer_new src/ipc/ipc3/handler.c:1305:8
    thesofproject#6 0x8257036 in ipc_cmd src/ipc/ipc3/handler.c:1651:9
    thesofproject#7 0x8272e59 in ipc_platform_do_cmd src/platform/posix/ipc.c:162:2
    thesofproject#8 0x826a2ac in ipc_do_cmd src/ipc/ipc-common.c:328:9
    thesofproject#9 0x829b0ab in task_run zephyr/include/rtos/task.h:94:9
    thesofproject#10 0x829abd8 in edf_work_handler zephyr/edf_schedule.c:32:16
    thesofproject#11 0x83560f7 in work_queue_main zephyr/kernel/work.c:688:3
    thesofproject#12 0x82244c2 in z_thread_entry zephyr/lib/os/thread_entry.c:48:2
```

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
marc-hb added a commit to marc-hb/sof that referenced this issue Feb 27, 2024
As discussed in the alternative approach
zephyrproject-rtos/zephyr#68494, k_panic() in
POSIX mode has various shortcomings that do not provide a useful
trace. Useless pointers to signal handlers or other cleanup routines are
printed instead. Leverage our already existing
k_sys_fatal_error_handler() and dereference a NULL pointer there when in
POSIX mode. This "fails fast" and provides a complete and relevant stack
trace in CI when fuzzing or when using some other static
analyzer. Example of how fuzzing failure thesofproject#8832 would have looked like in
CI results thanks to this commit:

```
./build-fuzz/zephyr/zephyr.exe: Running 1 inputs 1 time(s) each.
Running: ./rballoc_align_fuzz_crash
*** Booting Zephyr OS build zephyr-v3.5.0-3971-ge07de4e0a167 ***
[00:00:00.000,000] <inf> main: SOF on native_posix
[00:00:00.000,000] <inf> main: SOF initialized
@ WEST_TOPDIR/sof/zephyr/lib/alloc.c:391
[00:00:00.000,000] <err> os: >>> ZEPHYR FATAL ERROR 4: Kernel panic
[00:00:00.000,000] <err> os: Current thread: 0x891f8a0 (unknown)
[00:00:00.000,000] <err> zephyr: Halting emulation
AddressSanitizer:DEADLYSIGNAL
=================================================================
==1784402==ERROR: AddressSanitizer: SEGV on unknown address 0x00000000
==1784402==The signal is caused by a WRITE memory access.
==1784402==Hint: address points to the zero page.
    #0 0x829a77d in k_sys_fatal_error_handler zephyr/wrapper.c:352:19
    #1 0x829b8c0 in rballoc_align zephyr/lib/alloc.c:391:3
    thesofproject#2 0x8281438 in buffer_alloc src/audio/buffer.c:58:16
    thesofproject#3 0x826a60a in buffer_new src/ipc/ipc-helper.c:48:11
    thesofproject#4 0x8262107 in ipc_buffer_new src/ipc/ipc3/helper.c:459:11
    thesofproject#5 0x825944d in ipc_glb_tplg_buffer_new src/ipc/ipc3/handler.c:1305:8
    thesofproject#6 0x8257036 in ipc_cmd src/ipc/ipc3/handler.c:1651:9
    thesofproject#7 0x8272e59 in ipc_platform_do_cmd src/platform/posix/ipc.c:162:2
    thesofproject#8 0x826a2ac in ipc_do_cmd src/ipc/ipc-common.c:328:9
    thesofproject#9 0x829b0ab in task_run zephyr/include/rtos/task.h:94:9
    thesofproject#10 0x829abd8 in edf_work_handler zephyr/edf_schedule.c:32:16
    thesofproject#11 0x83560f7 in work_queue_main zephyr/kernel/work.c:688:3
    thesofproject#12 0x82244c2 in z_thread_entry zephyr/lib/os/thread_entry.c:48:2
```

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
lgirdwood pushed a commit that referenced this issue Feb 28, 2024
As discussed in the alternative approach
zephyrproject-rtos/zephyr#68494, k_panic() in
POSIX mode has various shortcomings that do not provide a useful
trace. Useless pointers to signal handlers or other cleanup routines are
printed instead. Leverage our already existing
k_sys_fatal_error_handler() and dereference a NULL pointer there when in
POSIX mode. This "fails fast" and provides a complete and relevant stack
trace in CI when fuzzing or when using some other static
analyzer. Example of how fuzzing failure #8832 would have looked like in
CI results thanks to this commit:

```
./build-fuzz/zephyr/zephyr.exe: Running 1 inputs 1 time(s) each.
Running: ./rballoc_align_fuzz_crash
*** Booting Zephyr OS build zephyr-v3.5.0-3971-ge07de4e0a167 ***
[00:00:00.000,000] <inf> main: SOF on native_posix
[00:00:00.000,000] <inf> main: SOF initialized
@ WEST_TOPDIR/sof/zephyr/lib/alloc.c:391
[00:00:00.000,000] <err> os: >>> ZEPHYR FATAL ERROR 4: Kernel panic
[00:00:00.000,000] <err> os: Current thread: 0x891f8a0 (unknown)
[00:00:00.000,000] <err> zephyr: Halting emulation
AddressSanitizer:DEADLYSIGNAL
=================================================================
==1784402==ERROR: AddressSanitizer: SEGV on unknown address 0x00000000
==1784402==The signal is caused by a WRITE memory access.
==1784402==Hint: address points to the zero page.
    #0 0x829a77d in k_sys_fatal_error_handler zephyr/wrapper.c:352:19
    #1 0x829b8c0 in rballoc_align zephyr/lib/alloc.c:391:3
    #2 0x8281438 in buffer_alloc src/audio/buffer.c:58:16
    #3 0x826a60a in buffer_new src/ipc/ipc-helper.c:48:11
    #4 0x8262107 in ipc_buffer_new src/ipc/ipc3/helper.c:459:11
    #5 0x825944d in ipc_glb_tplg_buffer_new src/ipc/ipc3/handler.c:1305:8
    #6 0x8257036 in ipc_cmd src/ipc/ipc3/handler.c:1651:9
    #7 0x8272e59 in ipc_platform_do_cmd src/platform/posix/ipc.c:162:2
    #8 0x826a2ac in ipc_do_cmd src/ipc/ipc-common.c:328:9
    #9 0x829b0ab in task_run zephyr/include/rtos/task.h:94:9
    #10 0x829abd8 in edf_work_handler zephyr/edf_schedule.c:32:16
    #11 0x83560f7 in work_queue_main zephyr/kernel/work.c:688:3
    #12 0x82244c2 in z_thread_entry zephyr/lib/os/thread_entry.c:48:2
```

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
jsarha pushed a commit to jsarha/sof that referenced this issue Oct 16, 2024
The fuzzer engine has produced crash caused by NULL pointer read that
originated from ipc_stream_pcm_free(). The crash happens when the
pipeline of the found comp_dev does not have a source_comp and
pipeline_reset() is called for it. This commit simply adds a test
for such a situation and bails out if it is found.

Here is the call stack from the situation:

    #0 0x81e9317 in dev_comp_pipe_id sof/sof/src/include/sof/audio/component.h:646:25
    thesofproject#1 0x81e8015 in pipeline_comp_reset sof/sof/src/audio/pipeline/pipeline-graph.c:326:22
    thesofproject#2 0x81e7d1d in pipeline_reset sof/sof/src/audio/pipeline/pipeline-graph.c:393:8
    thesofproject#3 0x820d7ea in ipc_stream_pcm_free sof/sof/src/ipc/ipc3/handler.c:398:8
    thesofproject#4 0x8208969 in ipc_cmd sof/sof/src/ipc/ipc3/handler.c:1689:9
    thesofproject#5 0x81cbed8 in ipc_platform_do_cmd sof/sof/src/platform/posix/ipc.c:162:2
    thesofproject#6 0x81d10db in ipc_do_cmd sof/sof/src/ipc/ipc-common.c:330:9
    thesofproject#7 0x81f87e9 in task_run sof/sof/zephyr/include/rtos/task.h:94:9
    thesofproject#8 0x81f8308 in edf_work_handler sof/sof/zephyr/edf_schedule.c:31:16
    thesofproject#9 0x82b4b32 in work_queue_main sof/zephyr/kernel/work.c:668:3
    thesofproject#10 0x8193ec2 in z_thread_entry sof/zephyr/lib/os/thread_entry.c:36:2
    thesofproject#11 0x815f639 in __asan::AsanThread::ThreadStart(unsigned long long) /src/llvm-project/compiler-rt/lib/asan/asan_thread.cpp:277:25

Signed-off-by: Jyri Sarha <jyri.sarha@linux.intel.com>
jsarha pushed a commit to jsarha/sof that referenced this issue Oct 17, 2024
The fuzzer engine has produced crash caused by NULL pointer read that
originated from ipc_stream_pcm_free(). The crash happens when the
pipeline of the found comp_dev does not have a source_comp and
pipeline_reset() is called for it. This commit simply adds a test
for such a situation and bails out if it is found.

Here is the call stack from the situation:

    #0 0x81e9317 in dev_comp_pipe_id sof/sof/src/include/sof/audio/component.h:646:25
    thesofproject#1 0x81e8015 in pipeline_comp_reset sof/sof/src/audio/pipeline/pipeline-graph.c:326:22
    thesofproject#2 0x81e7d1d in pipeline_reset sof/sof/src/audio/pipeline/pipeline-graph.c:393:8
    thesofproject#3 0x820d7ea in ipc_stream_pcm_free sof/sof/src/ipc/ipc3/handler.c:398:8
    thesofproject#4 0x8208969 in ipc_cmd sof/sof/src/ipc/ipc3/handler.c:1689:9
    thesofproject#5 0x81cbed8 in ipc_platform_do_cmd sof/sof/src/platform/posix/ipc.c:162:2
    thesofproject#6 0x81d10db in ipc_do_cmd sof/sof/src/ipc/ipc-common.c:330:9
    thesofproject#7 0x81f87e9 in task_run sof/sof/zephyr/include/rtos/task.h:94:9
    thesofproject#8 0x81f8308 in edf_work_handler sof/sof/zephyr/edf_schedule.c:31:16
    thesofproject#9 0x82b4b32 in work_queue_main sof/zephyr/kernel/work.c:668:3
    thesofproject#10 0x8193ec2 in z_thread_entry sof/zephyr/lib/os/thread_entry.c:36:2
    thesofproject#11 0x815f639 in __asan::AsanThread::ThreadStart(unsigned long long) /src/llvm-project/compiler-rt/lib/asan/asan_thread.cpp:277:25

Signed-off-by: Jyri Sarha <jyri.sarha@linux.intel.com>
jsarha pushed a commit to jsarha/sof that referenced this issue Oct 18, 2024
The fuzzer engine has produced crash caused by NULL pointer read
that originated from ipc3 ipc_stream_pcm_free(). The crash happens
when the pipeline of the found comp_dev does not have a
source_comp and pipeline_reset() is called for it. This commit
adds check to pipeline_comp_test() for this situation and bails
out if it is found.

Here is the call stack from the situation:

    #0 0x81e9317 in dev_comp_pipe_id sof/sof/src/include/sof/audio/component.h:646:25
    thesofproject#1 0x81e8015 in pipeline_comp_reset sof/sof/src/audio/pipeline/pipeline-graph.c:326:22
    thesofproject#2 0x81e7d1d in pipeline_reset sof/sof/src/audio/pipeline/pipeline-graph.c:393:8
    thesofproject#3 0x820d7ea in ipc_stream_pcm_free sof/sof/src/ipc/ipc3/handler.c:398:8
    thesofproject#4 0x8208969 in ipc_cmd sof/sof/src/ipc/ipc3/handler.c:1689:9
    thesofproject#5 0x81cbed8 in ipc_platform_do_cmd sof/sof/src/platform/posix/ipc.c:162:2
    thesofproject#6 0x81d10db in ipc_do_cmd sof/sof/src/ipc/ipc-common.c:330:9
    thesofproject#7 0x81f87e9 in task_run sof/sof/zephyr/include/rtos/task.h:94:9
    thesofproject#8 0x81f8308 in edf_work_handler sof/sof/zephyr/edf_schedule.c:31:16
    thesofproject#9 0x82b4b32 in work_queue_main sof/zephyr/kernel/work.c:668:3
    thesofproject#10 0x8193ec2 in z_thread_entry sof/zephyr/lib/os/thread_entry.c:36:2
    thesofproject#11 0x815f639 in __asan::AsanThread::ThreadStart(unsigned long long) /src/llvm-project/compiler-rt/lib/asan/asan_thread.cpp:277:25

Signed-off-by: Jyri Sarha <jyri.sarha@linux.intel.com>
lgirdwood pushed a commit that referenced this issue Oct 21, 2024
The fuzzer engine has produced crash caused by NULL pointer read
that originated from ipc3 ipc_stream_pcm_free(). The crash happens
when the pipeline of the found comp_dev does not have a
source_comp and pipeline_reset() is called for it. This commit
adds check to pipeline_comp_test() for this situation and bails
out if it is found.

Here is the call stack from the situation:

    #0 0x81e9317 in dev_comp_pipe_id sof/sof/src/include/sof/audio/component.h:646:25
    #1 0x81e8015 in pipeline_comp_reset sof/sof/src/audio/pipeline/pipeline-graph.c:326:22
    #2 0x81e7d1d in pipeline_reset sof/sof/src/audio/pipeline/pipeline-graph.c:393:8
    #3 0x820d7ea in ipc_stream_pcm_free sof/sof/src/ipc/ipc3/handler.c:398:8
    #4 0x8208969 in ipc_cmd sof/sof/src/ipc/ipc3/handler.c:1689:9
    #5 0x81cbed8 in ipc_platform_do_cmd sof/sof/src/platform/posix/ipc.c:162:2
    #6 0x81d10db in ipc_do_cmd sof/sof/src/ipc/ipc-common.c:330:9
    #7 0x81f87e9 in task_run sof/sof/zephyr/include/rtos/task.h:94:9
    #8 0x81f8308 in edf_work_handler sof/sof/zephyr/edf_schedule.c:31:16
    #9 0x82b4b32 in work_queue_main sof/zephyr/kernel/work.c:668:3
    #10 0x8193ec2 in z_thread_entry sof/zephyr/lib/os/thread_entry.c:36:2
    #11 0x815f639 in __asan::AsanThread::ThreadStart(unsigned long long) /src/llvm-project/compiler-rt/lib/asan/asan_thread.cpp:277:25

Signed-off-by: Jyri Sarha <jyri.sarha@linux.intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
APL Applies to Apollolake platform
Projects
None yet
Development

No branches or pull requests

5 participants