[Feature Request] Implement distributed.py in C++ #15061
Comments
@cfjchu Do we have any updates on this bug, either an ETA or otherwise, from your team?
@staylorTT we are looking for a simpler multi-device sharding API than what is currently provided in If the multi-device sharding is a blocker for you, would it work to provide
@omilyutin-tt I think that API is workable for us. Ideally, if this eventually becomes a TTNN API, it would accept a tensor type and return a (multi-device sharded) tensor type. It also seems like it might be straightforward to achieve this by bouncing a host storage tensor through std::vector and then calling this proposed API. Anyway, we'll work with what we can get in the short term. Thank you for looking into this :)
### Ticket
#15061

### What's changed
* Refactor `DistributedTensorConfig` into its own header
* Use typed `struct` to represent `MeshShape` and `MeshOffset`

### Checklist
- [X] [Post commit CI passes](https://github.com/tenstorrent/tt-metal/actions/runs/12210236362)
- [X] New/Existing tests provide coverage for changes
…++ ttnn tensors (#15886)

### Ticket
#15755

### Problem description
Multi-device tensor distribution currently works through `distributed.py`, which relies on PyTorch libraries to perform sharding / concatenation.

### What's changed
* Add xtensor to ttnn.
* Lower facilities from tt-train down to ttnn. In particular: `chunk`, `concatenate` functions along with some conversion utils, and the relevant tests.
* Add `distributed_tensor.hpp` header with the multi-device distribution APIs.

**In follow up PRs:**
* Support bf4 / bf8 and other formats in `from_vector` / `to_vector` and other overloads.
* Support outputting a tilized tensor.
* Migrate functionality from `pytensor.cpp` to using the new APIs.

### Checklist
- [x] [Post commit CI passes](https://github.com/tenstorrent/tt-metal/actions/runs/12333746639/job/34427015707) (failure in clang-tidy in unrelated tt-train directory)
- [X] [code analysis run](https://github.com/tenstorrent/tt-metal/actions/runs/12360844971)
- [x] [T3K unit + frequent + model reg tests](https://github.com/tenstorrent/tt-metal/actions/runs/12360656141) - same breakage on main.
- [X] New/Existing tests provide coverage for changes
#15886 adds the initial support for distributing a tensor across devices. I'm working on a couple of follow-ups to support more data types, handle tilized layouts, and add some performance optimizations. Please report any issues you encounter!
…rmats (#16105)

### Ticket
#15061

### Problem description
`to_vector` / `from_vector` don't support some special cases, which prevents wider adoption (distributing tensors across a mesh of devices in particular).

### What's changed
* Support tilized layouts.
* Support bf4 / bf8 data types with auto-padding.
* Extended `chunk` / `concat` support for the added types.

### Next steps
* Optimize certain operations on-device, such as tilization, whenever possible.
* Perform auto-padding in tilized layouts / when using sharding.
* Switch pytensor logic to use the `from_vector` API.

### Checklist
- [X] [Post commit CI passes](https://github.com/tenstorrent/tt-metal/actions/runs/12422597810)
- [X] New/Existing tests provide coverage for changes

---------

Co-authored-by: Oleg Milyutin <omilyutin-tt@tenstorrent.com>
Is your feature request related to a problem? Please describe.
We have a lot of classes implemented in distributed.py that we also need in C++.
But we don't have PyTorch in our C++ code, so another library (xtensor) should be used instead.
Also, Python has to_torch/from_torch functions, but there are no convenient C++ equivalents.
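To make the gap concrete, here is a minimal sketch of the two operations that distributed.py delegates to PyTorch: splitting a host tensor along one dimension (one piece per device) and concatenating the pieces back. It uses only the C++ standard library. `HostTensor`, `volume`, `chunk`, and `concat` are hypothetical names for illustration, not the ttnn or xtensor API, and the sketch assumes row-major float data with a dimension that divides evenly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side tensor: row-major data plus a shape. Not a ttnn type.
struct HostTensor {
    std::vector<float> data;
    std::vector<std::size_t> shape;
};

// Product of the shape entries in [begin, end).
inline std::size_t volume(const std::vector<std::size_t>& shape, std::size_t begin, std::size_t end) {
    std::size_t v = 1;
    for (std::size_t i = begin; i < end; ++i) v *= shape[i];
    return v;
}

// Split `t` into `num_chunks` equal pieces along `dim`.
inline std::vector<HostTensor> chunk(const HostTensor& t, std::size_t num_chunks, std::size_t dim) {
    assert(dim < t.shape.size() && num_chunks > 0 && t.shape[dim] % num_chunks == 0);
    const std::size_t outer = volume(t.shape, 0, dim);
    const std::size_t inner = volume(t.shape, dim + 1, t.shape.size());
    // Number of contiguous elements one chunk owns per outer index.
    const std::size_t piece = t.shape[dim] / num_chunks * inner;
    std::vector<HostTensor> out(num_chunks);
    for (std::size_t c = 0; c < num_chunks; ++c) {
        out[c].shape = t.shape;
        out[c].shape[dim] = t.shape[dim] / num_chunks;
        out[c].data.reserve(outer * piece);
        for (std::size_t o = 0; o < outer; ++o) {
            const auto offset = static_cast<std::ptrdiff_t>((o * num_chunks + c) * piece);
            const auto first = t.data.begin() + offset;
            out[c].data.insert(out[c].data.end(), first, first + static_cast<std::ptrdiff_t>(piece));
        }
    }
    return out;
}

// Inverse of chunk: join `parts` along `dim`. All other dimensions must match.
inline HostTensor concat(const std::vector<HostTensor>& parts, std::size_t dim) {
    assert(!parts.empty() && dim < parts[0].shape.size());
    HostTensor out;
    out.shape = parts[0].shape;
    out.shape[dim] = 0;
    for (const auto& p : parts) out.shape[dim] += p.shape[dim];
    const std::size_t outer = volume(out.shape, 0, dim);
    const std::size_t inner = volume(out.shape, dim + 1, out.shape.size());
    for (std::size_t o = 0; o < outer; ++o) {
        for (const auto& p : parts) {
            const std::size_t piece = p.shape[dim] * inner;
            const auto first = p.data.begin() + static_cast<std::ptrdiff_t>(o * piece);
            out.data.insert(out.data.end(), first, first + static_cast<std::ptrdiff_t>(piece));
        }
    }
    return out;
}
```

For a 2x4 tensor holding 0..7, `chunk(t, 2, 1)` yields two 2x2 pieces, and `concat` of those pieces along dimension 1 restores the original data. A library such as xtensor would replace the manual index arithmetic with views, which is the motivation for the request.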
Describe the solution you'd like
The easiest way to add xtensor is via CPM:
CPMAddPackage(NAME xtl GITHUB_REPOSITORY xtensor-stack/xtl GIT_TAG 0.7.7 OPTIONS "XTL_ENABLE_TESTS OFF")
CPMAddPackage(NAME xtensor GITHUB_REPOSITORY xtensor-stack/xtensor GIT_TAG 0.25.0 OPTIONS "XTENSOR_ENABLE_TESTS OFF")
tt-metal/tt-train/sources/ttml/core/tt_tensor_utils.cpp (line 183 in 758f8c9)
We need to make sure that all types are supported. In my reference implementation I didn't add support for the 4-bit and 8-bit types.
It seems that xtensor can support our custom bfloat16, so it might be useful in some cases.
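As an illustration of the storage-and-conversion side of a custom bfloat16, here is a small sketch using only the standard library. `Bf16`, `to_bf16`, and `to_floats` are hypothetical names, not the tt-metal bfloat16 type or any xtensor API. Whether xtensor accepts the project's type depends on which operations are used, so this sketch covers only the float-to-bfloat16 round trip that a `from_vector` / `to_vector` style helper would need.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a custom bfloat16: the upper 16 bits of an IEEE-754 float.
struct Bf16 {
    std::uint16_t bits = 0;

    Bf16() = default;

    // Convert from float with round-to-nearest-even on the 16 dropped bits.
    // Assumption: NaN inputs are not special-cased in this sketch.
    explicit Bf16(float f) {
        std::uint32_t u = 0;
        std::memcpy(&u, &f, sizeof(u));
        u += 0x7FFFu + ((u >> 16) & 1u);
        bits = static_cast<std::uint16_t>(u >> 16);
    }

    // Widen back to float by restoring the dropped low bits as zeros.
    float to_float() const {
        const std::uint32_t u = static_cast<std::uint32_t>(bits) << 16;
        float f = 0.0f;
        std::memcpy(&f, &u, sizeof(f));
        return f;
    }
};

// Pack a float buffer into bfloat16 storage.
inline std::vector<Bf16> to_bf16(const std::vector<float>& values) {
    std::vector<Bf16> out;
    out.reserve(values.size());
    for (float v : values) out.emplace_back(v);
    return out;
}

// Unpack bfloat16 storage into floats.
inline std::vector<float> to_floats(const std::vector<Bf16>& values) {
    std::vector<float> out;
    out.reserve(values.size());
    for (const Bf16& v : values) out.push_back(v.to_float());
    return out;
}
```

Values that are exactly representable in bfloat16, such as 1.0, 0.5, and -2.5, survive the round trip unchanged. Other values lose their low 16 mantissa bits, which is why conversions should happen once at the host boundary rather than repeatedly.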
Describe alternatives you've considered
I've considered implementing a few hacks in tt-train.