#2004: Added shlo to ttir conversion pass for all_gather and updated shlo to ttir conversion test cases for all gather #2018

Merged
tapspatel merged 1 commit into main from tpatel/issue-2004
Jan 31, 2025

Conversation

tapspatel
Contributor

@tapspatel tapspatel commented Jan 29, 2025

Meshes we will support

  • 1x2
  • 1x8
  • 2x4
  • 1x32
  • 8x4

Multi-device tests consolidation

test/ttmlir/Conversion/StableHLOToTTIR/ccl_ops.mlir

  • all_reduce tests for all meshes
  • all_gather tests for all meshes

@tapspatel tapspatel self-assigned this Jan 29, 2025
@tapspatel tapspatel requested a review from vmilosevic as a code owner January 29, 2025 19:09
@tapspatel
Contributor Author

One issue I am trying to get around is preventing all the single-chip test cases from running on multi-chip systems. LIT gives us the ability to use boolean expressions in its REQUIRES: directive.

OR

// REQUIRES: feature1 || feature2

AND

// REQUIRES: feature1, feature2

So I was thinking we could set something like this:

// REQUIRES: num-chips-1 || num-chips-2
// REQUIRES: num-chips-32
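To make the OR/AND semantics above concrete, here is a minimal sketch in Python. It is a simplified model, not lit's actual parser (lit also accepts `&&`, `!`, and parentheses, which this ignores). The function name `requires_satisfied` is hypothetical, and the `num-chips-N` feature names are the ones proposed above, not features that exist in the config today.

```python
def requires_satisfied(requires_lines, available_features):
    """Simplified model of REQUIRES: evaluation (NOT lit's real parser).

    Every REQUIRES line must hold. Within a line, comma-separated
    clauses are ANDed, and '||' inside a clause means OR.
    """
    for line in requires_lines:
        for clause in line.split(","):
            # Any one of the '||' alternatives being available is enough.
            alternatives = [name.strip() for name in clause.split("||")]
            if not any(name in available_features for name in alternatives):
                return False
    return True


# Hypothetical feature sets for a 2-chip and a 32-chip system.
two_chip = {"num-chips-2"}
thirty_two_chip = {"num-chips-32"}

single_or_dual = ["num-chips-1 || num-chips-2"]
print(requires_satisfied(single_or_dual, two_chip))
print(requires_satisfied(single_or_dual, thirty_two_chip))
```

Under this model, a test tagged with the first directive would run on the 2-chip system and be skipped on the 32-chip one, which is the filtering behavior the comment is after.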

@tapspatel
Contributor Author

And in test/lit.cfg.py, we already do this for multi-chip systems:

def set_system_desc_features(system_desc):
    config.available_features.add(system_desc["chip_descs"][0]["arch"])
    if len(system_desc["chip_desc_indices"]) > 1:
        config.available_features.add("multi-chip")
    config.available_features.add(
        "multi-chip-x" + str(len(system_desc["chip_desc_indices"]))
    )
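One possible way to extend this, sketched under assumptions: compute the feature names in a pure helper and add a `num-chips-N` feature alongside the existing ones. The helper name `chip_count_features` and the `num-chips-N` naming are hypothetical (taken from the proposal earlier in this thread), and the `arch` value in the example is only a placeholder.

```python
def chip_count_features(system_desc):
    """Return the lit feature names derived from a system descriptor.

    Mirrors the logic in set_system_desc_features and adds a
    hypothetical "num-chips-N" feature on top of it.
    """
    num_chips = len(system_desc["chip_desc_indices"])
    features = {system_desc["chip_descs"][0]["arch"]}
    if num_chips > 1:
        features.add("multi-chip")
    features.add("multi-chip-x" + str(num_chips))
    # Hypothetical addition: lets tests write "REQUIRES: num-chips-2".
    features.add("num-chips-" + str(num_chips))
    return features


# Placeholder descriptor for a 2-chip system; the arch string is illustrative.
example_desc = {
    "chip_descs": [{"arch": "example-arch"}],
    "chip_desc_indices": [0, 0],
}
print(sorted(chip_count_features(example_desc)))
```

In lit.cfg.py the returned names would then be added one by one to `config.available_features`, the same way the existing function does.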

@tapspatel tapspatel requested a review from ctodTT January 29, 2025 19:22
This was referenced Jan 29, 2025
@tapspatel tapspatel linked an issue Jan 29, 2025 that may be closed by this pull request
@tapspatel
Contributor Author

We got some tg machine failures related to huge pages. Fixing the machine.

@tapspatel tapspatel requested a review from nsmithtt January 31, 2025 18:40
@tapspatel tapspatel changed the title from "#2004: Added shlo to ttir conversion pass for all_gather, ported over all existing multichip test cases to tg and increased multichip test coverage" to "#2004: Added shlo to ttir conversion pass for all_gather and updated shlo to ttir conversion test cases for all gather" on Jan 31, 2025
Contributor

@nsmithtt nsmithtt left a comment

Looks good!

…shlo to ttir conversion test cases for all gather
@tapspatel tapspatel merged commit b633ae5 into main Jan 31, 2025
28 checks passed
@tapspatel tapspatel deleted the tpatel/issue-2004 branch January 31, 2025 22:55
Development

Successfully merging this pull request may close these issues.

Add shlo to ttir pass for all_gather
2 participants