Watcher reports stack overflow in VC Packet Router #15440

Closed
nhuang-tt opened this issue Nov 25, 2024 · 0 comments · Fixed by #15441
Assignees
Labels
bug Something isn't working

Comments

@nhuang-tt
Member

Platform: Wormhole B0. When running tests that involve vc_packet_router, Watcher reports a stack overflow in the kernel.

export TT_METAL_WATCHER=1
./build/test/tt_metal/perf_microbenchmark/routing/test_vc_uni_tunnel
./build_Debug/tools/watcher_dump -w -d=0
Watcher detected stack overflow on Device 0 Core (x=1,y=2): brisc! Kernel tt_metal/impl/dispatch/kernels/vc_packet_router.cpp uses 768/768 of the stack.
Watcher dump tool finished
@nhuang-tt nhuang-tt added the bug Something isn't working label Nov 25, 2024
@nhuang-tt nhuang-tt self-assigned this Nov 25, 2024
@nhuang-tt nhuang-tt linked a pull request Nov 25, 2024 that will close this issue
5 tasks
nhuang-tt added a commit that referenced this issue Nov 26, 2024
### Ticket
[Link to GitHub Issue](#15440)

### Problem description
Stack overflow in VC Packet Router

### What's changed
Moved the allocation of the 10 packet queues out of the function to the
global data section.

Before: `Watcher detected stack overflow on Device 0 Core (x=1,y=2):
brisc! Kernel tt_metal/impl/dispatch/kernels/vc_packet_router.cpp uses
768/768 of the stack`

After: `brisc highest stack usage: 360/768, on core (x=1,y=2), running
kernel tt_metal/impl/dispatch/kernels/vc_packet_router.cpp`
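
As a rough illustration of the kind of change described under "What's changed", the sketch below contrasts a function-local array of queue objects with the same array given static storage duration. It is not the actual kernel code: `PacketQueueState`, `kNumQueues`, `g_queues`, `stack_bytes_saved`, and `init_queues` are hypothetical names, and the struct layout is invented for the example. Only the count of 10 queues comes from the PR text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a per-queue state object. The real kernel's
// queue types and their sizes are not shown in this issue.
struct PacketQueueState {
    uint32_t read_ptr;
    uint32_t write_ptr;
    uint32_t words[4];
};

constexpr std::size_t kNumQueues = 10;  // the PR mentions 10 packet queues

// After: the queues have static storage duration, so the linker places them
// in a global data section (.bss for zero-initialized data) and they take
// no stack space.
static PacketQueueState g_queues[kNumQueues];

// Before (for comparison): a function-local declaration such as
//     PacketQueueState queues[kNumQueues];
// would occupy this many bytes of the function's stack frame.
constexpr std::size_t stack_bytes_saved() {
    return sizeof(PacketQueueState) * kNumQueues;
}

// Kernel-style setup routine that only touches the global queues.
uint32_t init_queues() {
    uint32_t initialized = 0;
    for (std::size_t i = 0; i < kNumQueues; ++i) {
        g_queues[i].read_ptr = 0;
        g_queues[i].write_ptr = 0;
        ++initialized;
    }
    assert(initialized == kNumQueues);
    return initialized;
}
```

The trade-off in a sketch like this is that the memory is reserved for the whole lifetime of the program rather than only while the function runs, which is usually acceptable for a kernel whose main function runs for its entire lifetime.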

### Checklist
- [ ] Post commit CI passes
- [ ] Blackhole Post commit (if applicable)
- [ ] Model regression CI testing passes (if applicable)
- [ ] Device performance regression CI testing passes (if applicable)
- [ ] New/Existing tests provide coverage for changes