moved llama benchmark, sglang benchmark, sglang integration, and sdxl to ossci cluster (#971)

moved llama benchmark, sglang benchmark, sglang integration, and sdxl to
ossci cluster

---------

Signed-off-by: Elias Joseph <eljoseph@amd.com>
Co-authored-by: Elias Joseph <eljoseph@amd.com>
Co-authored-by: saienduri <saimanas.enduri@amd.com>
3 people authored Feb 18, 2025
1 parent 84a2a3a commit 80674bc
Showing 6 changed files with 11 additions and 14 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci-llama-large-tests.yaml
@@ -28,7 +28,7 @@ jobs:
matrix:
version: [3.11]
fail-fast: false
-runs-on: llama-mi300x-1
+runs-on: linux-mi300-1gpu-ossci
defaults:
run:
shell: bash
2 changes: 1 addition & 1 deletion .github/workflows/ci-sdxl.yaml
@@ -37,7 +37,7 @@ env:
jobs:
install-and-test:
name: Install and test
-runs-on: mi300x-3
+runs-on: linux-mi300-1gpu-ossci

steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
14 changes: 6 additions & 8 deletions .github/workflows/ci-sglang-benchmark.yml
@@ -40,7 +40,7 @@ jobs:
matrix:
version: [3.11]
fail-fast: false
-runs-on: mi300x-3
+runs-on: linux-mi300-1gpu-ossci
defaults:
run:
shell: bash
@@ -82,7 +82,9 @@ jobs:
- name: Login to huggingface
continue-on-error: true
-run: huggingface-cli login --token ${{ secrets.HF_TOKEN }}
+run: |
+pip install -U "huggingface_hub[cli]"
+huggingface-cli login --token ${{ secrets.HF_TOKEN }}
- name: Run Shortfin Benchmark Tests
run: |
@@ -101,7 +103,7 @@ jobs:
matrix:
version: [3.11]
fail-fast: false
-runs-on: mi300x-3
+runs-on: linux-mi300-1gpu-ossci
defaults:
run:
shell: bash
@@ -187,15 +189,11 @@ jobs:
needs: benchmark_sglang
name: "Docker Cleanup"
if: always()
-runs-on: mi300x-3
+runs-on: linux-mi300-1gpu-ossci
steps:
- name: Stop sglang-server
run: docker stop sglang-server || true # Stop container if it's running

-# Deleting image after run due to large disk space requirement (83 GB)
-- name: Cleanup SGLang Image
-run: docker image rm lmsysorg/sglang:v0.3.5.post1-rocm620

merge_and_upload_reports:
name: "Merge and upload benchmark reports"
needs: [benchmark_shortfin, benchmark_sglang]
3 changes: 1 addition & 2 deletions .github/workflows/ci-sglang-integration-tests.yml
@@ -29,7 +29,7 @@ jobs:
matrix:
version: [3.11]
fail-fast: false
-runs-on: mi300x-3
+runs-on: linux-mi300-1gpu-ossci
defaults:
run:
shell: bash
@@ -69,7 +69,6 @@ jobs:
pip install sentence_transformers
pip freeze
- name: Run Integration Tests
run: |
source ${VENV_DIR}/bin/activate
@@ -60,7 +60,7 @@ def test_shortfin_benchmark(
request,
):
# TODO: Remove when multi-device is fixed
-os.environ["ROCR_VISIBLE_DEVICES"] = "1"
+os.environ["ROCR_VISIBLE_DEVICES"] = "0"

process, port = server

2 changes: 1 addition & 1 deletion app_tests/integration_tests/llm/sglang/conftest.py
@@ -54,7 +54,7 @@ def model_artifacts(request, tmp_path_factory):

@pytest.fixture(scope="module")
def start_server(request, model_artifacts):
-os.environ["ROCR_VISIBLE_DEVICES"] = "1"
+os.environ["ROCR_VISIBLE_DEVICES"] = "0"
device_settings = request.param["device_settings"]

server_config = ServerConfig(

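Note on the two `ROCR_VISIBLE_DEVICES` changes: the new runner label `linux-mi300-1gpu-ossci` suggests a single-GPU machine, where device index 1 would presumably not exist, so the tests now select index 0. This reading is an assumption; the commit message does not state the reason. A minimal sketch of the same pinning without mutating the global `os.environ` is shown here. `single_gpu_env` is a hypothetical helper for illustration and is not part of this repository.

```python
import os


def single_gpu_env(device_index="0", base_env=None):
    """Return a copy of an environment with ROCR_VISIBLE_DEVICES pinned.

    Hypothetical helper: it copies the given mapping (or os.environ when
    none is given) and sets ROCR_VISIBLE_DEVICES, leaving the caller's
    environment untouched.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ROCR_VISIBLE_DEVICES"] = str(device_index)
    return env


if __name__ == "__main__":
    # Build an environment for a child process that should see only device 0.
    child_env = single_gpu_env("0")
    print(child_env["ROCR_VISIBLE_DEVICES"])
```

Passing the returned mapping as `env=` to `subprocess.Popen` would scope the setting to the server process alone, which may make it easier to drop once the multi-device TODO is resolved.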