#10671: API to get chip location #10674

Merged: 1 commit, Jul 25, 2024


@jnie-TT (Contributor) commented Jul 24, 2024:

Ticket

Link to GitHub issue

What's changed

Added an API to Device and Cluster to get a chip's location.

@@ -341,6 +341,12 @@ uint32_t Cluster::get_harvested_rows(chip_id_t chip) const {
}
}

eth_coord_t Cluster::get_chip_location(chip_id_t chip) const {
static std::unordered_map<chip_id_t, eth_coord_t> chip_locations = this->cluster_desc_->get_chip_locations();
Contributor commented:

How expensive is the call to get_chip_locations? At first glance making this a static seems OK, but we've had nontrivial problems with this in the past, e.g. when the same program or test swaps out the cluster file within a single execution, the static still holds the stale state.
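The stale-state concern can be reproduced with a small, self-contained sketch. Everything here is a hypothetical stand-in (ClusterDescriptor, eth_coord_t as a four-field struct, and the two lookup helpers), not the actual tt-metal or UMD types. A function-local static is initialized only on the first call, so a descriptor swapped in later is silently ignored; a plain local lookup always reflects the current descriptor.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical stand-ins; only the shape of the problem matters here.
using chip_id_t = int;
struct eth_coord_t { int x, y, rack, shelf; };

// Hypothetical cluster descriptor that may be replaced during one execution,
// e.g. when a test loads a different cluster file.
struct ClusterDescriptor {
    std::unordered_map<chip_id_t, eth_coord_t> locations;
    const std::unordered_map<chip_id_t, eth_coord_t>& get_chip_locations() const {
        return locations;
    }
};

// Problematic pattern: the static is copied from whichever descriptor is
// passed on the FIRST call and is never refreshed afterwards.
eth_coord_t location_cached(const ClusterDescriptor& desc, chip_id_t chip) {
    static std::unordered_map<chip_id_t, eth_coord_t> chip_locations =
        desc.get_chip_locations();
    return chip_locations.at(chip);
}

// Fixed pattern: query the descriptor that is current on every call.
eth_coord_t location_fresh(const ClusterDescriptor& desc, chip_id_t chip) {
    return desc.get_chip_locations().at(chip);
}
```

For example, after location_cached is first called with descriptor a, calling it with a different descriptor b still returns a's coordinate, while location_fresh returns b's.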

@nsmithtt (Contributor) commented Jul 24, 2024:

I'm also thinking that get_chip_location is typically something only an app (like ttrt) will call, offline and/or infrequently, so I'm not too worried about the overhead.

@jnie-TT (Author) commented:

Hey @nsmithtt, great point; @aliuTT brought this up too. It's not expensive, so I'll just initialize it as a local variable. @aliuTT also suggested that we query this Cluster API directly instead of adding it to Device. I pushed an update that does that as well.

@jnie-TT force-pushed the jnie/chip_location_api branch from bf63394 to 87bcf36 on July 24, 2024 17:16
@@ -341,6 +341,12 @@ uint32_t Cluster::get_harvested_rows(chip_id_t chip) const {
}
}

eth_coord_t Cluster::get_chip_location(chip_id_t chip) const {
std::unordered_map<chip_id_t, eth_coord_t> chip_locations = this->cluster_desc_->get_chip_locations();
Contributor commented:

Can you make it a reference? Or just return the coord directly with this->cluster_desc_->get_chip_locations().at(chip). There's no need to make a copy here.

@jnie-TT (Author) commented:

Assigned it as a reference. I still want to catch the case where the chip isn't in the map, rather than debugging map_base::at errors.
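A minimal sketch of the approach settled on here: bind the map by const reference instead of copying it, and check for a missing chip explicitly. The types, the Cluster constructor, and the use of std::runtime_error are assumptions for illustration; the real code may use the project's own assertion macro instead.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins for the real types.
using chip_id_t = int;
struct eth_coord_t { int x, y, rack, shelf; };

struct ClusterDescriptor {
    std::unordered_map<chip_id_t, eth_coord_t> locations;
    const std::unordered_map<chip_id_t, eth_coord_t>& get_chip_locations() const {
        return locations;
    }
};

class Cluster {
public:
    explicit Cluster(const ClusterDescriptor* desc) : cluster_desc_(desc) {}

    eth_coord_t get_chip_location(chip_id_t chip) const {
        // Const reference: no copy of the whole map on each call.
        const auto& chip_locations = cluster_desc_->get_chip_locations();
        // Explicit lookup so a missing chip yields a descriptive error
        // instead of a bare std::out_of_range from unordered_map::at.
        auto it = chip_locations.find(chip);
        if (it == chip_locations.end()) {
            throw std::runtime_error("Chip " + std::to_string(chip) +
                                     " not found in cluster descriptor");
        }
        return it->second;
    }

private:
    const ClusterDescriptor* cluster_desc_;
};
```

Binding with const auto& also works if get_chip_locations happens to return by value, because the reference extends the temporary's lifetime. The explicit find costs one hash lookup, the same as at().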

@jnie-TT force-pushed the jnie/chip_location_api branch from 87bcf36 to 71cb95d on July 24, 2024 17:26
@jnie-TT force-pushed the jnie/chip_location_api branch from 71cb95d to 323dfa2 on July 24, 2024 18:57
@jnie-TT force-pushed the jnie/chip_location_api branch from 323dfa2 to 65f9fac on July 25, 2024 14:12
@jnie-TT force-pushed the jnie/chip_location_api branch from 65f9fac to 9b98198 on July 25, 2024 15:00
@jnie-TT merged commit 8bc9f18 into main on Jul 25, 2024 (5 checks passed)
@jnie-TT deleted the jnie/chip_location_api branch on July 25, 2024 15:01
@davorchap (Collaborator) commented Jul 25, 2024:

@jnie-TT: this API is a layer violation; it's a backdoor that bypasses the ttnn and Metal runtime APIs.

If you require a new API, please file an API request that covers:

  • the use case
  • why a new API is needed, i.e. why the current API isn’t sufficient

The Metalium stack has these layers:

  • UMD layer
  • FD layer
  • Metal run-time API layer
  • ttnn API layer

Layers shouldn't be bypassed; rather, if a new API is needed, it can be exposed and/or propagated up through the layers.
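To illustrate the layering rule, here is a hypothetical sketch in which each layer exposes a method that calls only the layer directly beneath it, so user-level code never reaches into the UMD layer. The class and function names are illustrative stand-ins, not the real Metalium classes, and the FD layer is omitted for brevity. Per this thread, any such propagation would follow an approved API request rather than precede it.

```cpp
#include <cassert>
#include <unordered_map>

using chip_id_t = int;
struct eth_coord_t { int x, y, rack, shelf; };

// Hypothetical UMD-level layer: owns the raw cluster description.
class UmdClusterDesc {
public:
    void set_location(chip_id_t chip, eth_coord_t c) { locations_[chip] = c; }
    eth_coord_t location(chip_id_t chip) const { return locations_.at(chip); }
private:
    std::unordered_map<chip_id_t, eth_coord_t> locations_;
};

// Hypothetical runtime-internal cluster: talks only to the UMD layer.
class Cluster {
public:
    explicit Cluster(const UmdClusterDesc& desc) : desc_(desc) {}
    eth_coord_t chip_location(chip_id_t chip) const { return desc_.location(chip); }
private:
    const UmdClusterDesc& desc_;
};

// Hypothetical Metal runtime API: a Device talks to Cluster, never to UMD.
class Device {
public:
    Device(const Cluster& cluster, chip_id_t id) : cluster_(cluster), id_(id) {}
    eth_coord_t chip_location() const { return cluster_.chip_location(id_); }
private:
    const Cluster& cluster_;
    chip_id_t id_;
};

// Hypothetical ttnn-level helper: user code sees only Device.
eth_coord_t get_device_location(const Device& device) {
    return device.chip_location();
}
```

The design choice being illustrated is that each layer can change its internals freely as long as the method it exposes upward keeps its contract.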

@davorchap (Collaborator) commented:

In addition to filing a request, the process for fulfilling a request is:

@jnie-TT (Author) commented Jul 25, 2024:

@davorchap I talked to @aliuTT, and we agreed to add it as a Cluster API and not expose it directly in the upper layers. Apologies if this caused any inconvenience. Would you like me to revert this and file an API request? @nsmithtt FYI.

@aliuTT (Contributor) commented Jul 25, 2024:

Please revert for now. This commit shouldn't be addressing any needs from MLIR, since we should never be bypassing Device APIs anyway. Apologies, Jackson; I jumped the gun on approving this PR. We should first sync on which APIs to expose, even at the ttCluster level, which isn't user-facing.

jnie-TT added a commit that referenced this pull request Jul 25, 2024
@jnie-TT (Author) commented Jul 25, 2024:

@aliuTT no worries, thanks. Created a PR to revert this here

jnie-TT added a commit that referenced this pull request Jul 25, 2024
@tt-rkim (Collaborator) commented Jul 25, 2024:

Note that there are no CODEOWNERS for this section of the runtime because the runtime team previously complained that PRs took too long.

I'm going to talk to the runtime team about putting more files under their purview, but this also means more PRs will need their approval.

jdesousa-TT pushed a commit that referenced this pull request Jul 26, 2024
ttmchiou pushed a commit that referenced this pull request Jul 30, 2024
ttmchiou pushed a commit that referenced this pull request Jul 30, 2024