-
Notifications
You must be signed in to change notification settings - Fork 100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#11403: SubMesh Support + Porting/Stamping T3K Tests to Galaxy #12962
Conversation
Hey @cfjchu I'm mildly concerned with this inversion of responsibility where the ccl must request the topology it "wants" because in general it shouldn't or wouldn't know what it wants. Instead what I thought the usage model would be is we have a universal entry point for a given ccl op and the ccl op is given APIS to query mesh information so it can set of its a ring or a line. Is this still the case except the mesh device infra knows which op variant to dispatch to? Little confused about this interface. |
@SeanNijjar This is a layered problem. So let's talk about what's currently in In My plan is to commonize |
@@ -196,17 +197,19 @@ Tensor all_gather( | |||
if (num_devices == 1){ | |||
topology = all_gather_op::Topology::Linear; | |||
} | |||
auto mesh_device = MeshDevice::fetch_mesh_device(devices); | |||
auto ring_devices = mesh_device->get_ring_devices(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is really my only current concern.
I agree that we should have only one allgather entrypoint which can temporarily be qualified with a topology. Then allgather can request a chip order accordingly. I also agree we need a way to have a consistent way to get an order of chips.
However, I wouldn't want the op to have to request a ring or line but instead just be given a sequence of chips and it decides if it can make a ring out of it or not. This way we get portability across galaxy variants ( those that are and aren't torused up) and the op can automatically take advantage of the torus without the model writer having to update all their ccl calls.
This is more inline with allgather forcing a line for a 2 chip invocation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that we should have only one allgather entrypoint which can temporarily be qualified with a topology. Then allgather can request a chip order accordingly. I also agree we need a way to have a consistent way to get an order of chips.
However, I wouldn't want the op to have to request a ring or line but instead just be given a sequence of chips and it decides if it can make a ring out of it or not.
Seems to conflict. MeshTensor is mapped onto a set of devices, and invocation to ttnn.all_gather(mesh_tensor)
will land here. What we're doing here is requesting the ring devices from the mesh tensor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good
a8df31f
to
83bfde3
Compare
Ticket
Link to Github Issue
Problem description
We want to add support to instantiate submeshes on our mesh. Key reasons:
What's changed
Example API Usage:
For example, create two ring 2x2 on T3000 machines and execute two ring all-gather on 2x2 ring.
Checklist