SubDevice Implementation Master Issue #13655

Open
10 of 17 tasks
tt-aho opened this issue Oct 9, 2024 · 2 comments

@tt-aho
Contributor

tt-aho commented Oct 9, 2024

  • SubDevice object

    • Vec of CoreRangeSets by CoreType
  • SubDeviceManager Object

    • Allocator changes to support a reduced-size global L1 allocator and local per-SubDevice allocators
    • State required to be tracked per manager
    • APIs for Create/Load/Clear of SubDeviceManager
    • Event/finish APIs need to take in SubDevices to block on
    • Program spanning multiple sub-devices (not needed for llama?)
      • CB validation needs to be across used sub-devices
      • Syncing/levelling of ring buffers for semaphores
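
As a rough illustration of the "Vec of CoreRangeSets by CoreType" item above, here is a minimal sketch. All types here (`CoreType`, `CoreCoord`, `CoreRange`, `CoreRangeSet`, `SubDevice`) are simplified, hypothetical stand-ins written for this example and are not the actual tt-metal definitions; the two core types and the non-overlapping-ranges assumption are also mine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for illustration only.
enum class CoreType : std::size_t { Worker = 0, Eth = 1 };
constexpr std::size_t kNumCoreTypes = 2;

struct CoreCoord {
    std::uint32_t x;
    std::uint32_t y;
};

// Inclusive rectangle of cores, from start to end.
struct CoreRange {
    CoreCoord start;
    CoreCoord end;

    bool contains(CoreCoord c) const {
        return c.x >= start.x && c.x <= end.x && c.y >= start.y && c.y <= end.y;
    }
    std::size_t num_cores() const {
        return static_cast<std::size_t>(end.x - start.x + 1) *
               static_cast<std::size_t>(end.y - start.y + 1);
    }
};

// A set of rectangles, assumed not to overlap.
struct CoreRangeSet {
    std::vector<CoreRange> ranges;

    bool contains(CoreCoord c) const {
        for (const CoreRange& r : ranges) {
            if (r.contains(c)) return true;
        }
        return false;
    }
    std::size_t num_cores() const {
        std::size_t total = 0;
        for (const CoreRange& r : ranges) total += r.num_cores();
        return total;
    }
};

class SubDevice {
public:
    // cores_by_type[i] holds the cores of CoreType i.
    // Missing trailing entries are treated as empty sets.
    explicit SubDevice(std::vector<CoreRangeSet> cores_by_type)
        : cores_(std::move(cores_by_type)) {
        assert(cores_.size() <= kNumCoreTypes);
        cores_.resize(kNumCoreTypes);
    }

    const CoreRangeSet& cores(CoreType type) const {
        return cores_[static_cast<std::size_t>(type)];
    }
    bool has_core(CoreType type, CoreCoord c) const {
        return cores(type).contains(c);
    }

private:
    std::vector<CoreRangeSet> cores_;
};
```

Indexing the vector by core type keeps lookups trivial; a fixed-size array would work equally well if the number of core types is known at compile time.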

Potential dependencies for feature completeness of SubDevice

  • Refactor of program to separate API and state
    • Potentially needed because a program is finalized only once, but recompiling it for a different device may produce different commands, so state must be tracked per device. May not be needed immediately, since op infra does not recompile a program across multiple devices.
  • API changes for accessing rtas/updating ptrs (ETA ?)
    • Needed for the general case of running the same program with different SubDeviceManagers loaded. May not be needed immediately for the Llama deadline.

Potentially needed features that depend on SubDevice:

tt-aho self-assigned this Oct 9, 2024
tt-aho added a commit that referenced this issue Oct 29, 2024
Support multiple dispatch entries for worker->dispatch sync
Update dispatch d/s to have a semaphore per dispatch entry to enable syncing on specific worker counts
Update LaunchMessageRingBufferState and WorkerConfigBufferMgr to be tracked per sub_device
Update various FD commands to support syncing on multiple sub devices:
- ERB, EWB, ERE will be updated to take in a list of sub devices for blocking on
- Trace will currently track all sub devices. Potential to track specific sub devices (could be automatic) in the future
- EP is currently hardcoded to sub device 0. This will be updated to determine the used sub devices in the future
tt-aho added a commit that referenced this issue Oct 29, 2024
Support multiple dispatch entries for worker->dispatch sync
Update dispatch d/s to have a semaphore per dispatch entry to enable syncing on specific worker counts
Update LaunchMessageRingBufferState and WorkerConfigBufferMgr to be tracked per sub_device
Update various FD commands to support syncing on multiple sub devices:
- ERB, EWB, ERE will be updated to take in a list of sub devices for blocking on in the future. Currently will sync all sub_devices
- Trace will currently track all sub devices. Potential to track specific sub devices (could be automatic) in the future
- EP is currently hardcoded to sub device 0. This will be updated to determine the used sub devices in the future
tt-aho added a commit that referenced this issue Oct 30, 2024
…h kernels

CQDispatchGoSignalMcastCmd now expects noc txn data to follow the cmd for sending go signal to cores
tt-aho changed the title from "SubCoreMesh Implementation Master Issue" to "SubDevice Implementation Master Issue" on Nov 1, 2024
tt-aho added a commit that referenced this issue Nov 12, 2024
Add support for splitting a device into multiple SubDevices, as well as maintaining different SubDeviceManager configurations, owned by the device
Add basic tests to validate sub-device support
tt-aho added a commit that referenced this issue Nov 12, 2024
Support multiple dispatch entries for worker->dispatch sync
Update dispatch d/s to have a semaphore per dispatch entry to enable syncing on specific worker counts
Update LaunchMessageRingBufferState and WorkerConfigBufferMgr to be tracked per sub_device
Update various FD commands to support syncing on multiple sub devices:
- ERB, EWB, ERE take in a list of sub devices to block/issue waits on. They will wait on all sub-devices if none are provided
- Trace will track only specific sub devices used
- EP currently only supports one sub-device
- Remove compile time mcast grid and unicast cores from dispatch kernels
  CQDispatchGoSignalMcastCmd now expects noc txn data to follow the cmd for sending go signal to cores
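
To make the per-sub-device sync described in this commit concrete, here is a small host-only model. It is a sketch under stated assumptions, not the dispatcher implementation: `DispatchSync`, `launch`, `signal_done`, and `is_idle` are hypothetical names, and plain counters stand in for the per-dispatch-entry semaphores. The one behavior taken from the commit message is that an empty sub-device list means "wait on all sub-devices".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side model: one counter pair per sub-device,
// standing in for one semaphore per dispatch entry.
class DispatchSync {
public:
    explicit DispatchSync(std::size_t num_sub_devices) : state_(num_sub_devices) {}

    // Record that a program was launched on num_workers cores of a sub-device.
    void launch(std::size_t sub_device, std::uint32_t num_workers) {
        state_.at(sub_device).expected += num_workers;
    }

    // Record that n workers on a sub-device reported completion.
    void signal_done(std::size_t sub_device, std::uint32_t n = 1) {
        Counters& c = state_.at(sub_device);
        assert(c.completed + n <= c.expected);
        c.completed += n;
    }

    // True when every listed sub-device has no outstanding workers.
    // An empty list means "check all sub-devices".
    bool is_idle(const std::vector<std::size_t>& sub_devices = {}) const {
        if (sub_devices.empty()) {
            for (const Counters& c : state_) {
                if (c.completed != c.expected) return false;
            }
            return true;
        }
        for (std::size_t id : sub_devices) {
            const Counters& c = state_.at(id);
            if (c.completed != c.expected) return false;
        }
        return true;
    }

private:
    struct Counters {
        std::uint32_t expected = 0;   // workers launched so far
        std::uint32_t completed = 0;  // workers that have finished
    };
    std::vector<Counters> state_;
};
```

Keeping a separate count per sub-device is what lets a blocking call wait on one sub-device while another is still busy.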
tt-aho added a commit that referenced this issue Nov 12, 2024
…s, instead of assuming a rectangular grid

Update device allocator related apis to take in a sub-device parameter
tt-aho added a commit that referenced this issue Nov 13, 2024
Add support for splitting a device into multiple SubDevices, as well as maintaining different SubDeviceManager configurations, owned by the device
Add basic tests to validate sub-device support
Update device apis to overload rather than take in optional sub_device ids
Make SubDeviceId, SubDeviceManagerId strong types
Refactor Device/SubDeviceManager state so that the default state is also encapsulated in a SubDeviceManager, and access the active SubDeviceManager through a pointer instead of a map lookup
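
For readers unfamiliar with the strong-type and overload points in this commit, a minimal sketch follows. The `StrongId` template, the `uint8_t` underlying type, and the `DeviceModel` class with its `num_worker_cores` methods are all assumptions made for illustration; the real tt-metal types and device APIs may look different.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical strong-type wrapper: the Tag parameter makes each alias a
// distinct type, so the two kinds of id cannot be mixed up by accident.
template <typename Tag>
class StrongId {
public:
    constexpr explicit StrongId(std::uint8_t value) : value_(value) {}
    constexpr std::uint8_t get() const { return value_; }

    friend constexpr bool operator==(StrongId a, StrongId b) { return a.value_ == b.value_; }
    friend constexpr bool operator!=(StrongId a, StrongId b) { return !(a == b); }

private:
    std::uint8_t value_;
};

struct SubDeviceIdTag {};
struct SubDeviceManagerIdTag {};
using SubDeviceId = StrongId<SubDeviceIdTag>;
using SubDeviceManagerId = StrongId<SubDeviceManagerIdTag>;

// Hypothetical device model showing overloads instead of an optional id.
class DeviceModel {
public:
    explicit DeviceModel(std::vector<std::uint32_t> workers_per_sub_device)
        : workers_(std::move(workers_per_sub_device)) {}

    // Overload for the whole device.
    std::uint32_t num_worker_cores() const {
        return std::accumulate(workers_.begin(), workers_.end(), std::uint32_t{0});
    }

    // Overload for a single sub-device.
    std::uint32_t num_worker_cores(SubDeviceId id) const {
        assert(id.get() < workers_.size());
        return workers_[id.get()];
    }

private:
    std::vector<std::uint32_t> workers_;
};
```

With the explicit constructor, a call such as `dev.num_worker_cores(1)` or one passing a `SubDeviceManagerId` does not compile, which is the main benefit over a raw integer or an optional parameter.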
tt-aho added a commit that referenced this issue Nov 13, 2024
…ter go signal command

Instead, when sub-device configurations change, we populate a static array on the dispatcher with all noc txn data, and read from it using an offset passed in the go signal command
Remove dynamic allocation of sub-devices/expected workers pairs, and pass them as separate spans
Fix cmd in sweep_pgm_dispatch
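
The table-plus-offset scheme from this commit can be modeled roughly as below. This is a sketch only: `DispatcherModel`, `GoSignal`, its fields, and the table capacity are hypothetical and do not reflect the layout of the real go signal command or dispatcher memory.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kMaxNocEntries = 64;  // assumed table capacity

// Hypothetical go signal descriptor: it carries only an offset and a count,
// not the NOC transaction data itself.
struct GoSignal {
    std::size_t noc_data_offset;
    std::size_t num_txns;
};

class DispatcherModel {
public:
    // Called when the sub-device configuration changes: rewrite the static
    // table and return one descriptor per sub-device.
    std::vector<GoSignal> load_config(
        const std::vector<std::vector<std::uint32_t>>& noc_data_per_sub_device) {
        std::vector<GoSignal> descriptors;
        std::size_t next = 0;
        for (const std::vector<std::uint32_t>& data : noc_data_per_sub_device) {
            assert(next + data.size() <= kMaxNocEntries);
            descriptors.push_back(GoSignal{next, data.size()});
            for (std::uint32_t word : data) table_[next++] = word;
        }
        return descriptors;
    }

    // Resolve a go signal to the NOC data it refers to.
    std::vector<std::uint32_t> targets(const GoSignal& go) const {
        assert(go.noc_data_offset + go.num_txns <= kMaxNocEntries);
        const auto first = table_.begin() + static_cast<std::ptrdiff_t>(go.noc_data_offset);
        return std::vector<std::uint32_t>(first, first + static_cast<std::ptrdiff_t>(go.num_txns));
    }

private:
    std::array<std::uint32_t, kMaxNocEntries> table_{};
};
```

The table is rewritten only on a configuration change, so each go signal stays small and fixed-size regardless of how many cores it targets.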
tt-aho added a commit that referenced this issue Nov 14, 2024
…g sub-device managers

Rename clear_loaded_sub_device_manager->reset_active_sub_device_manager
Remove device->allocator_ reacharound in ttnn
tt-aho added a commit that referenced this issue Nov 14, 2024
Was not accounting for alignment when calculating expected addresses
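
The kind of calculation this fix refers to can be sketched as follows. The helpers `align_up` and `expected_addresses` are hypothetical, and the sketch assumes a power-of-two alignment and bottom-up allocation from a base address; neither assumption comes from the commit itself.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Round addr up to the next multiple of alignment.
// Assumes alignment is a non-zero power of two.
inline std::uint64_t align_up(std::uint64_t addr, std::uint64_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + alignment - 1) & ~(alignment - 1);
}

// Expected start address of each buffer when buffers are placed one after
// another from base, with every start rounded up to the alignment.
inline std::vector<std::uint64_t> expected_addresses(
    std::uint64_t base, const std::vector<std::uint64_t>& sizes, std::uint64_t alignment) {
    std::vector<std::uint64_t> addresses;
    std::uint64_t next = base;
    for (std::uint64_t size : sizes) {
        next = align_up(next, alignment);
        addresses.push_back(next);
        next += size;
    }
    return addresses;
}
```

Without the rounding step, a 100-byte buffer followed by a second buffer would give an expected second address of 100 rather than 128 under 32-byte alignment, which is the sort of mismatch the commit message describes.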
@pgkeller
Contributor

@tt-aho is this still in flight or resolved at this point?

@tt-aho
Contributor Author

tt-aho commented Jan 30, 2025

There are a few outstanding features we specced out that weren't immediately needed for llama support. I can just spin those off as separate issues and close this one?
