moved llama benchmark, sglang benchmark, sglang integration, and sdxl to ossci cluster #971

Merged: @Eliasj42 merged 21 commits into main from eliasj42/migrate-mi300-runners on Feb 18, 2025

@Eliasj42 (Contributor) commented Feb 15, 2025

moved llama benchmark, sglang benchmark, sglang integration, and sdxl to ossci cluster

Elias Joseph added 12 commits February 10, 2025 15:11
Signed-off-by: Elias Joseph <eljoseph@amd.com>
@saienduri (Contributor) commented:

let's move this one too:

runs-on: [llama-mi300x-3]

Elias Joseph and others added 3 commits February 14, 2025 19:32
@Eliasj42 marked this pull request as ready for review February 17, 2025 01:26
Elias Joseph added 3 commits February 16, 2025 19:34
Signed-off-by: Elias Joseph <eljoseph@amd.com>
@ScottTodd (Member) commented:

As on #938, please add a descriptive PR title (e.g. "Migrate more workflows to MI300 cluster." instead of "Eliasj42/migrate mi300 runners") and tag relevant issues like #793 in the PR description.

@Eliasj42 changed the title from "Eliasj42/migrate mi300 runners" to "moved sglang benchmark, sglang integration, and sdxl to ossci cluster" Feb 17, 2025
@Eliasj42 added the infra label (General category for infrastructure-related requests for common triaging and prioritization) Feb 17, 2025
Signed-off-by: Elias Joseph <eljoseph@amd.com>
@@ -102,6 +102,7 @@ jobs:
    env:
      VENV_DIR: ${{ github.workspace }}/.venv
      HF_HOME: "/data/huggingface"
      HF_TOKEN: ${{ secrets.HF_FLUX_TOKEN }}
Contributor commented:

you can remove this for now too
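
For illustration only, a sketch of how the `env` block quoted above might read once `HF_TOKEN` is dropped as the comment suggests. It assumes the other two keys stay unchanged; the indentation and any surrounding keys in the real workflow file are assumptions, not taken from the PR.

```yaml
# Sketch only: env block after removing HF_TOKEN, per the review comment.
# Indentation is assumed; other keys in the real workflow are not shown.
env:
  VENV_DIR: ${{ github.workspace }}/.venv
  HF_HOME: "/data/huggingface"
```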

Signed-off-by: Elias Joseph <eljoseph@amd.com>
@Eliasj42 changed the title from "moved sglang benchmark, sglang integration, and sdxl to ossci cluster" to "moved llama benchmark, sglang benchmark, sglang integration, and sdxl to ossci cluster" Feb 18, 2025
@saienduri self-requested a review February 18, 2025 00:35
@saienduri (Contributor) left a comment:

Thanks, we can add ci-sharktank in a follow-up. It looks like things are red at TOM anyway at the moment.

@Eliasj42 merged commit 80674bc into main Feb 18, 2025
34 of 35 checks passed
@Eliasj42 deleted the eliasj42/migrate-mi300-runners branch February 18, 2025 00:58
@yamiyysu left a comment:

All migrated workflows passed.

  • Not sure why this shortfin llm test is pending

  • The data-dependent test under CI-sharktank is failing in other PRs too.

@saienduri (Contributor) commented Feb 18, 2025

https://github.com/nod-ai/shark-ai/actions/runs/13380841378/job/37369105641?pr=971 This one is running (downloading the HF model at the moment), but Elias already verified that it passes before a minor edit retriggered it, so it should be fine.

renxida pushed a commit to renxida/shark-ai that referenced this pull request Feb 20, 2025
… to ossci cluster (nod-ai#971)

moved llama benchmark, sglang benchmark, sglang integration, and sdxl to
ossci cluster

---------

Signed-off-by: Elias Joseph <eljoseph@amd.com>
Co-authored-by: Elias Joseph <eljoseph@amd.com>
Co-authored-by: saienduri <saimanas.enduri@amd.com>