[MLIR] API Request: Query location of device #10745
Comments
Just adding an asterisk: to be as flexible as possible, we want this API for the TTMetal device; the TTNN device can potentially just forward the info from the TTMetal device and/or modify the API however it sees fit. We would like to know not only chip locations but also their connectivity. This could be a list of edges, where each edge is a pair of device ids plus a placeholder for additional info, like BW info. This API should also capture ethernet streams that are exclusively reserved for fast dispatch, i.e. not advertise those as available connections. |
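For illustration, the edge-list idea above could be sketched roughly as follows. This is only a sketch under assumptions: `DeviceId`, `LinkInfo`, `Edge`, and `usable_edges` are hypothetical names that do not exist in TTMetal or TTNN, and the bandwidth field is just the placeholder mentioned in the comment.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// All names below are hypothetical; none come from TTMetal or TTNN.
using DeviceId = int;

// Placeholder for per-link metadata; bandwidth is the only field suggested so far.
struct LinkInfo {
    double bandwidth_gbps;       // 0.0 means "not known yet"
    bool reserved_for_dispatch;  // true if fast dispatch owns this link
};

// One connection between two devices.
struct Edge {
    DeviceId a;
    DeviceId b;
    LinkInfo info;
};

// Return only the edges that are not reserved for fast dispatch,
// so reserved links are never advertised as available connections.
std::vector<Edge> usable_edges(const std::vector<Edge>& all_edges) {
    std::vector<Edge> usable;
    std::copy_if(all_edges.begin(), all_edges.end(), std::back_inserter(usable),
                 [](const Edge& e) { return !e.info.reserved_for_dispatch; });
    assert(usable.size() <= all_edges.size());
    return usable;
}
```

Keeping the reserved flag on the edge rather than dropping reserved edges at construction time is a design choice made here only so the filtering step is visible; an actual API could just as well never expose them.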
We're doing a round of refactor for Device APIs, probably going to pull out useful user-facing APIs into an api list like
|
Hey @aliuTT thanks, I have a couple more runtime API requests I'll file those now! Comments inline:
Chip location as in its physical coordinate in the galaxy system. This should include rack/shelf, or whatever nomenclature we're using to distinguish chips between galaxies (like on a TGG).
Sounds good.
So I think we're asking for the opposite: we want to know the set of usable links. I.e. these APIs should filter out connections / links / chips / etc. that are reserved for fast dispatch and only present usable ones.
Probably a static number of theoretical max (or max that we've physically measured) is the way to go. And then different users could implement whatever heuristic to better model their real world usage. |
Continuing discussions:
For tracking, can you follow what I tagged in this request? Link the
This is an interesting request. Overall ethernet coordinates are only used for link training. Today we have ethernet coordinates as (x, y, rack, shelf), and so it nicely maps to the physical topology. But in the future we will have 3D coordinates for Torus connectivity. Specs are still up in the air but as an example you can have four chips in the same Galaxy shelf/box with coordinates
I meant to say, we do have what you list here. Device APIs today present active links/connectivity that filter out links reserved for fast dispatch.
Sounds good! |
Done! Updated the other 2 issues I filed.
I see, are
Ah ok, I didn't realize this. I suppose we can get all of this information using: std::unordered_set<CoreCoord> get_active_ethernet_cores(bool skip_reserved_tunnel_cores=false) And then get connectivity using: std::tuple<chip_id_t, CoreCoord> get_connected_ethernet_core(CoreCoord eth_core) |
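As a rough sketch of how those two calls could be combined into a link list: the two method signatures are the ones quoted above, but everything else here is an assumption. `MockDevice`, `MockLink`, `EthLink`, `collect_links`, the public `id` field, and the stand-in definitions of `CoreCoord` and `chip_id_t` are hypothetical and exist only so the example compiles without the real headers. The mock also assumes that passing `true` for `skip_reserved_tunnel_cores` hides reserved cores, which may not match the real behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <tuple>
#include <unordered_set>
#include <vector>

using chip_id_t = int;  // stand-in for the real typedef
struct CoreCoord {      // stand-in for the real CoreCoord
    std::size_t x;
    std::size_t y;
    bool operator==(const CoreCoord& o) const { return x == o.x && y == o.y; }
};
namespace std {
template <> struct hash<CoreCoord> {
    size_t operator()(const CoreCoord& c) const noexcept {
        return hash<size_t>{}(c.x) ^ (hash<size_t>{}(c.y) << 1);
    }
};
}  // namespace std

// Hypothetical mock exposing the two signatures quoted in the thread.
struct MockLink { CoreCoord local; chip_id_t remote_chip; CoreCoord remote_core; bool reserved; };
struct MockDevice {
    chip_id_t id;
    std::vector<MockLink> links;

    std::unordered_set<CoreCoord> get_active_ethernet_cores(bool skip_reserved_tunnel_cores = false) const {
        std::unordered_set<CoreCoord> cores;
        for (const MockLink& l : links)
            if (!(skip_reserved_tunnel_cores && l.reserved)) cores.insert(l.local);
        return cores;
    }
    std::tuple<chip_id_t, CoreCoord> get_connected_ethernet_core(CoreCoord eth_core) const {
        for (const MockLink& l : links)
            if (l.local == eth_core) return std::make_tuple(l.remote_chip, l.remote_core);
        throw std::out_of_range("no link on this ethernet core");
    }
};

// One directed link: (source chip, source core) -> (destination chip, destination core).
struct EthLink { chip_id_t src_chip; CoreCoord src_core; chip_id_t dst_chip; CoreCoord dst_core; };

// Walk every device's active ethernet cores and record what each one connects to.
template <typename Device>
std::vector<EthLink> collect_links(const std::vector<Device>& devices) {
    std::vector<EthLink> out;
    for (const Device& dev : devices) {
        for (const CoreCoord& core : dev.get_active_ethernet_cores(true)) {
            std::tuple<chip_id_t, CoreCoord> remote = dev.get_connected_ethernet_core(core);
            out.push_back({dev.id, core, std::get<0>(remote), std::get<1>(remote)});
        }
    }
    return out;
}
```

Each physical link shows up twice in the result, once from each end, so a consumer that wants undirected edges would need to deduplicate.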
Thanks!
Device ids are guaranteed to be unique. And you're right, ids + APIs should be able to reconstruct full topology.
Right, exactly. Or we have this other concept of a socket, which returns an ordered representation of ethernet cores (also excludes dispatch-reserved cores).
So the following ethernet cores are connected:
|
Awesome, thanks @aliuTT for the explanations. I think we probably have what we need then. Will close this ticket and file a new one if something comes up. |
Requesting an API that can be used to query the location of a ttnn device.
Usage: contributes to creating a system description in Uhuru, which includes information of chip locations.
Related ticket: #10671