
sync #341

Merged
merged 146 commits
Oct 5, 2024
Changes from all commits
146 commits
f069775
initial commit for llama2 gh action
anandhu-eng Sep 26, 2024
6ecd8b4
updated run cmds
anandhu-eng Sep 26, 2024
62c6d9f
added clean tad
anandhu-eng Sep 26, 2024
1fb631f
added submission tags
anandhu-eng Sep 26, 2024
965a402
Merge branch 'mlperf-inference' into llama2_gha_selfhosted
anandhu-eng Sep 26, 2024
909474e
updated to only run if owner is gateoverflow
anandhu-eng Sep 26, 2024
7135715
Added additional env key for host dataset download
anandhu-eng Sep 26, 2024
e30dfe0
Updated env key for 3d unet
anandhu-eng Sep 26, 2024
b2808ff
Modified for host model download - 3d unet
anandhu-eng Sep 26, 2024
76451bc
handle model accuracy variants
anandhu-eng Sep 26, 2024
46e5a28
3d unet dataset made as prehook deps
anandhu-eng Sep 26, 2024
6380c8f
Revert "3d unet dataset made as prehook deps"
anandhu-eng Sep 26, 2024
9c6098e
skips model download if model download to host is enabled
anandhu-eng Sep 26, 2024
e0a79b1
Changed for running in GO-i9 system
anandhu-eng Sep 26, 2024
7935270
Update and rename check-broken-links.md to check-broken-links.yml
arjunsuresh Sep 26, 2024
358b875
Merge branch 'mlcommons:mlperf-inference' into mlperf-inference
arjunsuresh Sep 26, 2024
e4fc590
Fix model deps for nvidia mlperf inference sdxl
arjunsuresh Sep 26, 2024
dd50c6e
Fix tflite dependency for app-mlperf-inference-mlcommons-python
arjunsuresh Sep 26, 2024
9d53999
Merge branch 'mlcommons:mlperf-inference' into mlperf-inference
arjunsuresh Sep 26, 2024
07d0a69
Update check-broken-links.yml
arjunsuresh Sep 26, 2024
75911cd
Merge pull request #301 from GATEOverflow/mlperf-inference
arjunsuresh Sep 26, 2024
9472060
fix typo
anandhu-eng Sep 26, 2024
42bd118
bug fix
anandhu-eng Sep 26, 2024
f56714c
updated with compatable scipy version
anandhu-eng Sep 26, 2024
63d280f
code clean
anandhu-eng Sep 26, 2024
ef31ade
Merge branch 'mlperf-inference' into nvidia-sdxl-v4.1
anandhu-eng Sep 26, 2024
4622237
Syntax correction for version - scipy
anandhu-eng Sep 26, 2024
68d645e
Merge pull request #137 from anandhu-eng/nvidia-sdxl-v4.1
arjunsuresh Sep 26, 2024
b6f54d7
Update code-review.yml
arjunsuresh Sep 26, 2024
6810d7c
Updated - llama2 model download to host
anandhu-eng Sep 27, 2024
e7a9293
Skip model download if user sets llama 2 download to host
anandhu-eng Sep 27, 2024
0455eee
model path env variable updated
anandhu-eng Sep 27, 2024
d3c1f2d
Merge branch 'mlperf-inference' into nvidia-sdxl-v4.1
anandhu-eng Sep 27, 2024
b4e83f4
Added numpy dependency for pycuda
arjunsuresh Sep 27, 2024
a7108f0
Merge pull request #138 from anandhu-eng/nvidia-sdxl-v4.1
arjunsuresh Sep 27, 2024
dd6fe0e
Update code-review.yml
arjunsuresh Sep 27, 2024
1c2d924
Merge pull request #306 from GATEOverflow/mlperf-inference
arjunsuresh Sep 27, 2024
b3cf801
Update test-scc24-sdxl.yaml | added SCC specific result directory
arjunsuresh Sep 27, 2024
c52e446
added starting weights filename
anandhu-eng Sep 27, 2024
4d6d737
Update test-scc24-sdxl.yaml
arjunsuresh Sep 27, 2024
91e792d
Improve the detect-sudo script | make it bot compatible
arjunsuresh Sep 27, 2024
07787c2
added additional tags
anandhu-eng Sep 27, 2024
645ad8f
Downgraded nltk
anandhu-eng Sep 27, 2024
3886ec2
Merge branch 'mlperf-inference' into nvidia-sdxl-v4.1
anandhu-eng Sep 27, 2024
19c6367
Update _cm.yaml
arjunsuresh Sep 27, 2024
8c438fd
Merge pull request #139 from anandhu-eng/nvidia-sdxl-v4.1
arjunsuresh Sep 27, 2024
41d5934
Update test-mlperf-inference-gptj.yml
arjunsuresh Sep 27, 2024
fcd586d
Update test-mlperf-inference-sdxl.yaml
arjunsuresh Sep 27, 2024
cf96b5a
Removed the version restrictions for dlrmv2, tested on torch 2.4, add…
arjunsuresh Sep 27, 2024
83eb295
Added set-user-limit CM script
arjunsuresh Sep 27, 2024
0f98c47
Update test-mlperf-inference-llama2.yml
arjunsuresh Sep 27, 2024
8386ce5
Merge pull request #134 from anandhu-eng/llama2_gha_selfhosted
arjunsuresh Sep 27, 2024
d551274
Merge pull request #308 from GATEOverflow/mlperf-inference
arjunsuresh Sep 27, 2024
341f782
fix indendation
anandhu-eng Sep 27, 2024
b4e6427
Fix issues with detect-sudo
arjunsuresh Sep 27, 2024
4f289ed
Merge pull request #309 from GATEOverflow/mlperf-inference
arjunsuresh Sep 27, 2024
1ded44c
Merge pull request #140 from anandhu-eng/nvidia-sdxl-v4.1
arjunsuresh Sep 27, 2024
b48aed7
Merge branch 'mlcommons:mlperf-inference' into mlperf-inference
arjunsuresh Sep 27, 2024
f524f0d
Merge pull request #310 from GATEOverflow/mlperf-inference
arjunsuresh Sep 27, 2024
aee19ba
Update test-scc24-sdxl.yaml
arjunsuresh Sep 28, 2024
056848f
Update test-scc24-sdxl.yaml | fixes the run
arjunsuresh Sep 28, 2024
61d0f5b
Update test-mlperf-inference-sdxl.yaml | turned off
arjunsuresh Sep 28, 2024
de13f1c
Added fvcore dependency for mlperf inference reference dlrmv2
arjunsuresh Sep 27, 2024
cee2e81
Added variations for pip index and extra-index urls
arjunsuresh Sep 28, 2024
ba39130
Update test-scc24-sdxl.yaml
arjunsuresh Sep 29, 2024
faa5781
Update test-mlperf-inference-gptj.yml
arjunsuresh Sep 29, 2024
cfeb0ab
Merge branch 'mlcommons:mlperf-inference' into mlperf-inference
arjunsuresh Sep 29, 2024
e9f9ced
Update test-scc24-sdxl.yaml
arjunsuresh Sep 29, 2024
f0d7f7e
Update test-scc24-sdxl.yaml
arjunsuresh Sep 29, 2024
663f4eb
Update test-scc24-sdxl.yaml
arjunsuresh Sep 29, 2024
f1728bf
Update test-scc24-sdxl.yaml
arjunsuresh Sep 30, 2024
b3e0caa
commit in reference to https://github.com/mlcommons/cm4mlops/issues/103
anandhu-eng Sep 30, 2024
c9bfea9
Merge branch 'mlperf-inference' into nvidia-sdxl-v4.1
anandhu-eng Sep 30, 2024
dd5a1d8
Added env key - docker_not_pull_update
anandhu-eng Sep 30, 2024
92dbfc7
Default value docker_not_pull_update - False
anandhu-eng Sep 30, 2024
41f17fa
Update test-scc24-sdxl.yaml
arjunsuresh Sep 30, 2024
102f774
added cm pull repo before cm run
anandhu-eng Sep 30, 2024
104886b
Update test-scc24-sdxl.yaml
arjunsuresh Sep 30, 2024
6d503a0
Update test-scc24-sdxl.yaml
arjunsuresh Sep 30, 2024
f0a4699
Update test-scc24-sdxl.yaml
arjunsuresh Sep 30, 2024
5f6ab95
bug fix
anandhu-eng Sep 30, 2024
aa8c8d8
Update test-scc24-sdxl.yaml
arjunsuresh Sep 30, 2024
34f9ccf
Update test-scc24-sdxl.yaml
arjunsuresh Sep 30, 2024
a7376e3
Merge branch 'mlperf-inference' into nvidia-sdxl-v4.1
anandhu-eng Sep 30, 2024
38f6c6c
Merge pull request #141 from anandhu-eng/nvidia-sdxl-v4.1
arjunsuresh Sep 30, 2024
40461ec
Update test-scc24-sdxl.yaml
arjunsuresh Sep 30, 2024
332ec61
Improve detect-sudo for non-interactive shells
arjunsuresh Sep 30, 2024
2f843cb
Added tabulate install in github action
arjunsuresh Oct 1, 2024
fbf3832
Update test-scc24-sdxl.yaml | use SDXL model from host
arjunsuresh Oct 1, 2024
8626640
Update test-scc24-sdxl.yaml | Added the perf+acc run
arjunsuresh Oct 1, 2024
93267a8
Added CHECK_CMD for rsync
arjunsuresh Oct 1, 2024
1fc5a53
Update test-scc24-sdxl.yaml
arjunsuresh Oct 1, 2024
1662691
Update test-scc24-sdxl.yaml | Dont force reinstall cm4mlops
arjunsuresh Oct 1, 2024
88f56e7
Support pip install of loadgen
arjunsuresh Oct 1, 2024
ce56f82
Merge branch 'mlcommons:mlperf-inference' into mlperf-inference
arjunsuresh Oct 1, 2024
8371f0b
Fix bug in get-mlperf-inference-loadgen
arjunsuresh Oct 1, 2024
0fe7e2a
Merge pull request #321 from GATEOverflow/mlperf-inference
arjunsuresh Oct 1, 2024
6f73b44
Update test-mlperf-inference-resnet50.yml
arjunsuresh Oct 1, 2024
98b9a22
Merge branch 'mlcommons:mlperf-inference' into mlperf-inference
arjunsuresh Oct 1, 2024
4427270
Update test-mlperf-inference-resnet50.yml
arjunsuresh Oct 1, 2024
459a440
Avoid get,compiler when mlperf run is with python
arjunsuresh Oct 1, 2024
8e9493e
Fix github test for pip-loadgen
arjunsuresh Oct 1, 2024
ea1ae8e
Update test-mlperf-inference-resnet50.yml
arjunsuresh Oct 1, 2024
e402dbd
Update test-mlperf-inference-resnet50.yml
arjunsuresh Oct 1, 2024
6845d0e
Update test-mlperf-inference-resnet50.yml
arjunsuresh Oct 1, 2024
8c22922
Update test-mlperf-inference-resnet50.yml
arjunsuresh Oct 1, 2024
3a05a1c
Update test-mlperf-inference-resnet50.yml
arjunsuresh Oct 1, 2024
d676c63
Update test-mlperf-inference-resnet50.yml
arjunsuresh Oct 1, 2024
1de5a86
Skip get,compiler for mlperf-loadgen when using pypi
arjunsuresh Oct 1, 2024
dcad163
Make mlperf-inference submission checker windows compatible
arjunsuresh Oct 1, 2024
0e3f58b
Make mlperf-inference submission checker windows compatible
arjunsuresh Oct 1, 2024
9426635
Merge pull request #322 from GATEOverflow/mlperf-inference
arjunsuresh Oct 1, 2024
a38837f
Update test-mlperf-inference-llama2.yml
arjunsuresh Oct 1, 2024
12044a3
Merge branch 'mlcommons:mlperf-inference' into mlperf-inference
arjunsuresh Oct 2, 2024
5dbadf5
Handled some more cases for detect-sudo
arjunsuresh Oct 2, 2024
b374e90
Update test-mlperf-inference-gptj.yml
arjunsuresh Oct 2, 2024
6de8677
added hf token - to prevent user interaction if model is absent
anandhu-eng Oct 3, 2024
30751f0
installation of hf cli library limited to local virtual env
anandhu-eng Oct 3, 2024
1d177dd
pip run through python interpreter
anandhu-eng Oct 3, 2024
84a333f
Update test-scc24-sdxl.yaml
arjunsuresh Oct 3, 2024
3e5855c
Update test-mlperf-inference-llama2.yml
arjunsuresh Oct 3, 2024
9779781
Merge pull request #143 from anandhu-eng/hftokenadd
arjunsuresh Oct 3, 2024
dc92673
Update test-mlperf-inference-gptj.yml
arjunsuresh Oct 3, 2024
d5f275a
Fix numpy version for SDXL accuracy
arjunsuresh Oct 3, 2024
8c2f7bc
Merge pull request #332 from GATEOverflow/mlperf-inference
arjunsuresh Oct 3, 2024
4631186
Update test-mlperf-inference-gptj.yml
arjunsuresh Oct 4, 2024
4acebf3
Update test-mlperf-inference-resnet50.yml
arjunsuresh Oct 4, 2024
cd1e447
Update test-mlperf-inference-resnet50.yml
arjunsuresh Oct 4, 2024
e446ae1
Update test-mlperf-inference-resnet50.yml
arjunsuresh Oct 4, 2024
54c9fb9
Update test-mlperf-inference-resnet50.yml
arjunsuresh Oct 4, 2024
c516986
Update test-mlperf-inference-resnet50.yml
arjunsuresh Oct 4, 2024
b1151a6
Update test-mlperf-inference-resnet50.yml
arjunsuresh Oct 4, 2024
d2d187f
Update test-mlperf-inference-resnet50.yml
arjunsuresh Oct 4, 2024
1e355b7
Update test-mlperf-inference-resnet50.yml
arjunsuresh Oct 4, 2024
c00de07
Update test-mlperf-inference-resnet50.yml
arjunsuresh Oct 4, 2024
0edaa6e
Update test-mlperf-inference-resnet50.yml
arjunsuresh Oct 4, 2024
2a0be1b
Merge pull request #335 from GATEOverflow/mlperf-inference
arjunsuresh Oct 4, 2024
06da009
Merge branch 'main' into mlperf-inference
gfursin Oct 4, 2024
8092d05
Merge pull request #338 from mlcommons/dev
arjunsuresh Oct 4, 2024
238325f
Update README.md
arjunsuresh Oct 4, 2024
b10d07e
Merge pull request #331 from mlcommons/mlperf-inference
arjunsuresh Oct 4, 2024
ba1e507
fixing imagenet 500 url
gfursin Oct 5, 2024
73b9b3a
Merge branch 'main' of https://github.com/flexaihq/cm4mlops
gfursin Oct 5, 2024
6ab2217
Merge pull request #7 from mlcommons/main
gfursin Oct 5, 2024
861c4aa
restoring old links to dataset and win bin tools; however need to fin…
gfursin Oct 5, 2024
a7d560e
Merge pull request #340 from flexaihq/main
ctuning-admin Oct 5, 2024
.github/workflows/check-broken-links.yml

@@ -1,13 +1,16 @@
-name: Check .md README files for broken links
+name: "Check .md README files for broken links"

-on: [pull_request]
+on:
+  push:
+    branches:
+      - master

 jobs:
   markdown-link-check:
     runs-on: ubuntu-latest
     # check out the latest version of the code
     steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4

       # Checks the status of hyperlinks in .md files in verbose mode
       - name: Check links
4 changes: 2 additions & 2 deletions .github/workflows/code-review.yml
@@ -2,7 +2,7 @@ name: OpenAI Code Review

 on:
   pull_request_target:
-    types: [opened, synchronize]
+    types: [opened]
     paths:
       - 'automation/**'
       - 'script/**'
@@ -15,7 +15,7 @@ permissions:
 jobs:
   code_review:
     runs-on: ubuntu-latest
-    if: github.repository_owner == 'gateoverflow' && github.event.pull_request.changed_files > 0
+    if: github.repository_owner == 'gateoverflow_off' && github.event.pull_request.changed_files > 0
     steps:
       # Run code review via OpenAI
       # Step to run the OpenAI Code Review using the GATEOverflow action
9 changes: 6 additions & 3 deletions .github/workflows/test-mlperf-inference-gptj.yml
@@ -5,12 +5,12 @@ name: MLPerf inference GPT-J

 on:
   schedule:
-    - cron: "1 1 * * */3"
+    - cron: "1 2 * * *"

 jobs:
   build:
     if: github.repository_owner == 'gateoverflow'
-    runs-on: [ self-hosted, linux, x64 ]
+    runs-on: [ self-hosted, linux, x64, GO-spr ]
     strategy:
       fail-fast: false
       matrix:
@@ -24,7 +24,10 @@ jobs:
         source gh_action/bin/deactivate || python3 -m venv gh_action
         source gh_action/bin/activate
         export CM_REPOS=$HOME/GH_CM
-        cm pull repo --url=${{ github.event.pull_request.head.repo.html_url }} --checkout=${{ github.event.pull_request.head.ref }}
+        python3 -m pip install cm4mlops
+        cm pull repo
     - name: Test MLPerf Inference GPTJ
       run: |
         cm run script --tags=run-mlperf,inference,_submission,_short --submitter="MLCommons" --docker --model=gptj-99 --backend=${{ matrix.backend }} --device=cuda --scenario=Offline --test_query_count=1 --precision=${{ matrix.precision }} --target_qps=1 --quiet --docker_it=no --docker_cm_repo=gateoverflow@cm4mlops --adr.compiler.tags=gcc --beam_size=1 --hw_name=gh_action --docker_dt=yes --results_dir=$HOME/gh_action_results --submission_dir=$HOME/gh_action_submissions --clean
+        cm run script --tags=push,github,mlperf,inference,submission --repo_url=https://github.com/gateoverflow/mlperf_inference_test_submissions_v5.0 --repo_branch=main --commit_message="Results from self hosted Github actions - NVIDIARTX4090" --quiet --submission_dir=$HOME/gh_action_submissions
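A note on the `source gh_action/bin/deactivate || python3 -m venv gh_action` idiom these workflows use: virtual environments expose `deactivate` as a shell function, not a file, so the `source` almost always fails and the `||` branch creates a fresh environment. A minimal sketch of the pattern (the temp directory and venv name are illustrative, not from the PR):

```shell
# Reset-or-create the "gh_action" virtual environment, mirroring the
# workflow's install step. "source gh_action/bin/deactivate" fails when
# the file is absent (venvs do not ship a bin/deactivate script), so
# "||" falls through to creating a fresh environment.
cd "$(mktemp -d)"
source gh_action/bin/deactivate 2>/dev/null || python3 -m venv gh_action
source gh_action/bin/activate
python3 -c 'import sys; print(sys.prefix)'   # now points inside gh_action
```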

33 changes: 33 additions & 0 deletions .github/workflows/test-mlperf-inference-llama2.yml
@@ -0,0 +1,33 @@
+# This workflow will install Python dependencies, run tests and lint with a variety of Python versions
+# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions
+
+name: MLPerf inference LLAMA 2 70B
+
+on:
+  schedule:
+    - cron: "30 19 * * 4"
+
+jobs:
+  build_reference:
+    if: github.repository_owner == 'gateoverflow'
+    runs-on: [ self-hosted, GO-i9, linux, x64 ]
+    strategy:
+      fail-fast: false
+      matrix:
+        python-version: [ "3.12" ]
+        backend: [ "pytorch" ]
+        device: [ "cpu" ]
+
+    steps:
+      - name: Install dependencies
+        run: |
+          source gh_action/bin/deactivate || python3 -m venv gh_action
+          source gh_action/bin/activate
+          export CM_REPOS=$HOME/GH_CM
+          python3 -m pip install cm4mlops
+          cm pull repo
+          python3 -m pip install "huggingface_hub[cli]"
+          huggingface-cli login --token ${{ secrets.HF_TOKEN }} --add-to-git-credential
+      - name: Test MLPerf Inference LLAMA 2 70B reference implementation
+        run: |
+          cm run script --tags=run-mlperf,inference,_submission,_short --submitter="MLCommons" --model=llama2-70b-99 --implementation=reference --backend=${{ matrix.backend }} --category=datacenter --scenario=Offline --execution_mode=test --device=${{ matrix.device }} --docker --quiet --test_query_count=1 --target_qps=1 --docker_it=no --docker_cm_repo=gateoverflow@cm4mlops --adr.compiler.tags=gcc --hw_name=gh_action --docker_dt=yes --results_dir=$HOME/gh_action_results --submission_dir=$HOME/gh_action_submissions --env.CM_MLPERF_MODEL_LLAMA2_70B_DOWNLOAD_TO_HOST=yes --adr.inference-src.tags=_repo.https://github.com/anandhu-eng/inference.git --clean
30 changes: 24 additions & 6 deletions .github/workflows/test-mlperf-inference-resnet50.yml
@@ -4,7 +4,7 @@
 name: MLPerf inference ResNet50

 on:
-  pull_request:
+  pull_request_target:
     branches: [ "main", "dev", "mlperf-inference" ]
     paths:
       - '.github/workflows/test-mlperf-inference-resnet50.yml'
@@ -28,9 +28,7 @@ jobs:
         - os: macos-latest
           backend: tf
         - os: windows-latest
-          # MLPerf requires interaction when installing LLVM on Windows - that's why we excluded it here
-
-
+          implementation: cpp
     steps:
       - uses: actions/checkout@v4
       - name: Set up Python ${{ matrix.python-version }}
@@ -41,6 +39,26 @@
       run: |
         python3 -m pip install cmind
         cm pull repo --url=${{ github.event.pull_request.head.repo.html_url }} --checkout=${{ github.event.pull_request.head.ref }}
-    - name: Test MLPerf Inference ResNet50
+    - name: Test MLPerf Inference ResNet50 (Windows)
+      if: matrix.os == 'windows-latest'
       run: |
-        cm run script --tags=run-mlperf,inference,_submission,_short --submitter="cTuning" --hw_name=default --model=resnet50 --implementation=${{ matrix.implementation }} --backend=${{ matrix.backend }} --device=cpu --scenario=Offline --test_query_count=500 --target_qps=1 -v --quiet
+        cm run script --tags=run-mlperf,inference,_submission,_short --submitter="MLCommons" --hw_name=gh_windows --model=resnet50 --adr.loadgen.tags=_from-pip --pip_loadgen=yes --implementation=${{ matrix.implementation }} --backend=${{ matrix.backend }} --device=cpu --scenario=Offline --test_query_count=500 --target_qps=1 -v --quiet
+    - name: Test MLPerf Inference ResNet50 (Linux/macOS)
+      if: matrix.os != 'windows-latest'
+      run: |
+        cm run script --tags=run-mlperf,inference,_submission,_short --submitter="MLCommons" --hw_name=gh_${{ matrix.os }}_x86 --model=resnet50 --implementation=${{ matrix.implementation }} --backend=${{ matrix.backend }} --device=cpu --scenario=Offline --test_query_count=500 --target_qps=1 -v --quiet
+    - name: Push Results
+      if: github.repository_owner == 'gateoverflow'
+      env:
+        USER: "GitHub Action"
+        EMAIL: "admin@gateoverflow.com"
+      run: |
+        git config --global user.name "$USER"
+        git config --global user.email "$EMAIL"
+        git config --global credential.https://git.luolix.top.helper ""
+        git config --global credential.https://git.luolix.top.helper "!gh auth git-credential"
+        git config --global credential.https://gist.git.luolix.top.helper ""
+        git config --global credential.https://gist.git.luolix.top.helper "!gh auth git-credential"
+
+        cm run script --tags=auth,gh,cli --with_token="${{ secrets.TEST_RESULTS_GITHUB_TOKEN }}"
+        cm run script --tags=push,github,mlperf,inference,submission --repo_url=https://github.com/gateoverflow/mlperf_inference_test_submissions_v5.0 --repo_branch=main --commit_message="Results from R50 GH action" --quiet
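The Push Results step added here clears any inherited credential helper before routing GitHub pushes through the `gh` CLI. A hedged sketch of that configuration, run against a throwaway `HOME` so it cannot touch real settings; the user name and email are placeholders, and the `--add` on the second helper call is this sketch's choice so both entries persist:

```shell
# Point git's credential lookup for github.com at the gh CLI.
# An empty helper entry first clears any helper inherited from
# system- or runner-level config; the "!" prefix marks the helper
# value as a shell command.
export HOME="$(mktemp -d)"               # throwaway config location
git config --global user.name "GitHub Action"
git config --global user.email "admin@example.com"
git config --global credential.https://git.luolix.top.helper ""
git config --global --add credential.https://git.luolix.top.helper "!gh auth git-credential"
git config --global --get-all credential.https://git.luolix.top.helper
```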
19 changes: 8 additions & 11 deletions .github/workflows/test-mlperf-inference-sdxl.yaml
@@ -1,12 +1,12 @@
 name: MLPerf inference SDXL

+#off now as we have SCC24 test doing the same
 on:
   schedule:
     - cron: "1 2 * * *"

 jobs:
   build_reference:
-    if: github.repository_owner == 'gateoverflow'
+    if: github.repository_owner == 'gateoverflow_off'
     runs-on: [ self-hosted, linux, x64 ]
     strategy:
       fail-fast: false
@@ -15,18 +15,17 @@ jobs:
         backend: [ "pytorch" ]
         precision: [ "float16" ]
     steps:
-      - name: Install dependencies
+      - name: Test MLPerf Inference SDXL Reference
         run: |
           source gh_action/bin/deactivate || python3 -m venv gh_action
           source gh_action/bin/activate
           export CM_REPOS=$HOME/GH_CM
-          cm pull repo --url=${{ github.event.pull_request.head.repo.html_url }} --checkout=${{ github.event.pull_request.head.ref }}
-      - name: Test MLPerf Inference SDXL
-        run: |
+          python3 -m pip install cm4mlops
+          cm pull repo
           cm run script --tags=run-mlperf,inference,_submission,_short --submitter="MLCommons" --docker --model=sdxl --backend=${{ matrix.backend }} --device=cuda --scenario=Offline --test_query_count=1 --precision=${{ matrix.precision }} --target_qps=1 --quiet --docker_it=no --docker_cm_repo=gateoverflow@cm4mlops --adr.compiler.tags=gcc --hw_name=gh_action --docker_dt=yes --results_dir=$HOME/gh_action_results --submission_dir=$HOME/gh_action_submissions --clean

   build_nvidia:
-    if: github.repository_owner == 'gateoverflow'
+    if: github.repository_owner == 'gateoverflow_off'
     runs-on: [ self-hosted, linux, x64 ]
     strategy:
       fail-fast: false
@@ -36,12 +35,10 @@
         precision: [ "float16" ]
         implementation: [ "nvidia" ]
     steps:
-      - name: Install dependencies
+      - name: Test MLPerf Inference SDXL Nvidia
         run: |
           source gh_action/bin/deactivate || python3 -m venv gh_action
           source gh_action/bin/activate
           export CM_REPOS=$HOME/GH_CM
-          cm pull repo --url=${{ github.event.pull_request.head.repo.html_url }} --checkout=${{ github.event.pull_request.head.ref }}
-      - name: Test MLPerf Inference SDXL
-        run: |
+          cm pull repo
           cm run script --tags=run-mlperf,inference,_submission,_short --submitter="MLCommons" --docker --model=sdxl --implementation=${{ matrix.implementation }} --backend=${{ matrix.backend }} --device=cuda --scenario=Offline --test_query_count=1 --precision=${{ matrix.precision }} --target_qps=1 --quiet --docker_it=no --docker_cm_repo=gateoverflow@cm4mlops --adr.compiler.tags=gcc --hw_name=gh_action --docker_dt=yes --results_dir=$HOME/gh_action_results --submission_dir=$HOME/gh_action_submissions --clean
50 changes: 26 additions & 24 deletions .github/workflows/test-scc24-sdxl.yaml
@@ -1,13 +1,15 @@
-name: MLPerf inference SDXL
+name: MLPerf inference SDXL (SCC)

 on:
   schedule:
-    - cron: "43 1 * * *"
+    - cron: "1 3 * * *"

 jobs:
   build_reference:
     if: github.repository_owner == 'gateoverflow'
-    runs-on: [ self-hosted, linux, x64 ]
+    runs-on: [ self-hosted, linux, x64, GO-spr ]
+    env:
+      CM_REPOS: $HOME/GH_CM
     strategy:
       fail-fast: false
       matrix:
@@ -16,23 +18,23 @@
         precision: [ "float16" ]
         device: [ "cuda" ]
     steps:
-      - name: Install dependencies
+      - name: Test MLPerf Inference reference SDXL SCC
         run: |
-          source gh_action/bin/deactivate || python3 -m venv gh_action
+          if [ -f "gh_action/bin/deactivate" ]; then source gh_action/bin/deactivate; fi
+          python3 -m venv gh_action
           source gh_action/bin/activate
           export CM_REPOS=$HOME/GH_CM
-          cm pull repo --url=${{ github.event.pull_request.head.repo.html_url }} --checkout=${{ github.event.pull_request.head.ref }}
-      - name: Test MLPerf Inference reference SDXL SCC
-        env:
-          GITHUB_TOKEN: ${{ secrets.GH_TOKEN }}
-        run: |
-          cm run script --tags=run-mlperf,inference,_find-performance,_r4.1-dev,_short,_scc24-base --model=sdxl --implementation=reference --backend=${{ matrix.backend }} --category=datacenter --scenario=Offline --execution_mode=test --device=${{ matrix.device }} --precision=${{ matrix.precision }} --quiet --results_dir=$HOME/gh_action_results --submission_dir=$HOME/gh_action_submissions --precision=float16 --clean |
-          cm run script --tags=generate,inference,submission --clean --preprocess_submission=yes --run-checker --tar=yes --env.CM_TAR_OUTFILE=submission.tar.gz --division=open --category=datacenter --env.CM_DETERMINE_MEMORY_CONFIGURATION=yes --run_style=test --adr.submission-checker.tags=_short-run --quiet --submitter=MLCommons |
-          cm run script --tags=push,github,mlperf,inference,submission --repo_url=https://github.com/gateoverflow/cm4mlperf-inference --repo_branch=mlperf-inference-results-scc24 --commit_message="Results from self hosted Github actions - NVIDIARTX4090" --quiet
+          pip install --upgrade cm4mlops
+          pip install tabulate
+          cm pull repo
+          cm run script --tags=run-mlperf,inference,_find-performance,_r4.1-dev,_short,_scc24-base --model=sdxl --implementation=reference --backend=${{ matrix.backend }} --category=datacenter --scenario=Offline --execution_mode=test --device=${{ matrix.device }} --precision=${{ matrix.precision }} --docker --docker_it=no --docker_cm_repo=gateoverflow@cm4mlops --docker_dt=yes --quiet --results_dir=$HOME/scc_gh_action_results --submission_dir=$HOME/scc_gh_action_submissions --precision=float16 --env.CM_MLPERF_MODEL_SDXL_DOWNLOAD_TO_HOST=yes --clean
+          cm run script --tags=run-mlperf,inference,_r4.1-dev,_short,_scc24-base --model=sdxl --implementation=reference --backend=${{ matrix.backend }} --category=datacenter --scenario=Offline --execution_mode=test --device=${{ matrix.device }} --precision=${{ matrix.precision }} --docker --docker_it=no --docker_cm_repo=gateoverflow@cm4mlops --docker_dt=yes --quiet --results_dir=$HOME/scc_gh_action_results --submission_dir=$HOME/scc_gh_action_submissions --precision=float16 --env.CM_MLPERF_MODEL_SDXL_DOWNLOAD_TO_HOST=yes --clean
+          cm run script --tags=generate,inference,submission --clean --preprocess_submission=yes --run-checker --tar=yes --env.CM_TAR_OUTFILE=submission.tar.gz --division=open --category=datacenter --run_style=test --adr.submission-checker.tags=_short-run --quiet --submitter=MLCommons --submission_dir=$HOME/scc_gh_action_submissions --results_dir=$HOME/scc_gh_action_results/test_results
+          cm run script --tags=push,github,mlperf,inference,submission --repo_url=https://github.com/gateoverflow/cm4mlperf-inference --repo_branch=mlperf-inference-results-scc24 --commit_message="Results from self hosted Github actions - NVIDIARTX4090" --quiet --submission_dir=$HOME/scc_gh_action_submissions

   build_nvidia:
     if: github.repository_owner == 'gateoverflow'
-    runs-on: [ self-hosted, linux, x64 ]
+    runs-on: [ self-hosted, linux, x64, GO-spr]
     strategy:
       fail-fast: false
       matrix:
@@ -41,16 +43,16 @@
         precision: [ "float16" ]
         implementation: [ "nvidia" ]
     steps:
-      - name: Install dependencies
+      - name: Test MLPerf Inference NVIDIA SDXL SCC
         run: |
-          source gh_action/bin/deactivate || python3 -m venv gh_action
+          if [ -f "gh_action/bin/deactivate" ]; then source gh_action/bin/deactivate; fi
+          python3 -m venv gh_action
           source gh_action/bin/activate
           export CM_REPOS=$HOME/GH_CM
-          cm pull repo --url=${{ github.event.pull_request.head.repo.html_url }} --checkout=${{ github.event.pull_request.head.ref }}
-      - name: Test MLPerf Inference NVIDIA SDXL SCC
-        env:
-          GITHUB_TOKEN: ${{ secrets.GH_TOKEN }}
-        run: |
-          cm run script --tags=run-mlperf,inference,_find-performance,_r4.1-dev,_short,_scc24-base --model=sdxl --implementation=nvidia --backend=${{ matrix.backend }} --category=datacenter --scenario=Offline --execution_mode=test --device=${{ matrix.device }} --precision=${{ matrix.precision }} --docker --docker_it=no --docker_cm_repo=gateoverflow@cm4mlops --docker_dt=yes --quiet --results_dir=$HOME/gh_action_results --submission_dir=$HOME/gh_action_submissions --precision=float16 --clean |
-          cm run script --tags=generate,inference,submission --clean --preprocess_submission=yes --run-checker --tar=yes --env.CM_TAR_OUTFILE=submission.tar.gz --division=open --category=datacenter --env.CM_DETERMINE_MEMORY_CONFIGURATION=yes --run_style=test --adr.submission-checker.tags=_short-run --quiet --submitter=MLCommons |
-          cm run script --tags=push,github,mlperf,inference,submission --repo_url=https://github.com/gateoverflow/cm4mlperf-inference --repo_branch=mlperf-inference-results-scc24 --commit_message="Results from self hosted Github actions - NVIDIARTX4090" --quiet
+          pip install --upgrade cm4mlops
+          pip install tabulate
+          cm pull repo
+          cm run script --tags=run-mlperf,inference,_find-performance,_r4.1-dev,_short,_scc24-base --model=sdxl --implementation=nvidia --backend=${{ matrix.backend }} --category=datacenter --scenario=Offline --execution_mode=test --device=${{ matrix.device }} --precision=${{ matrix.precision }} --docker --docker_it=no --docker_cm_repo=gateoverflow@cm4mlops --docker_dt=yes --quiet --results_dir=$HOME/scc_gh_action_results --submission_dir=$HOME/scc_gh_action_submissions --precision=float16 --env.CM_MLPERF_MODEL_SDXL_DOWNLOAD_TO_HOST=yes --hw_name=go-spr --clean
+          cm run script --tags=run-mlperf,inference,_r4.1-dev,_short,_scc24-base --model=sdxl --implementation=nvidia --backend=${{ matrix.backend }} --category=datacenter --scenario=Offline --execution_mode=test --device=${{ matrix.device }} --precision=${{ matrix.precision }} --docker --docker_it=no --docker_cm_repo=gateoverflow@cm4mlops --docker_dt=yes --quiet --results_dir=$HOME/scc_gh_action_results --submission_dir=$HOME/scc_gh_action_submissions --precision=float16 --env.CM_MLPERF_MODEL_SDXL_DOWNLOAD_TO_HOST=yes --clean
+          cm run script --tags=generate,inference,submission --clean --preprocess_submission=yes --run-checker --tar=yes --env.CM_TAR_OUTFILE=submission.tar.gz --division=open --category=datacenter --run_style=test --adr.submission-checker.tags=_short-run --quiet --submitter=MLCommons --submission_dir=$HOME/scc_gh_action_submissions --results_dir=$HOME/scc_gh_action_results/test_results
+          cm run script --tags=push,github,mlperf,inference,submission --repo_url=https://github.com/gateoverflow/cm4mlperf-inference --repo_branch=mlperf-inference-results-scc24 --commit_message="Results from self hosted Github actions - NVIDIARTX4090" --quiet --submission_dir=$HOME/scc_gh_action_submissions
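The SCC24 workflow replaces the `source ... || python3 -m venv ...` idiom with an explicit existence check before recreating the venv. The behaviour is the same, but no command is expected to fail, so the step also survives shells running with `set -e`. An illustrative sketch (temp directory and venv name are placeholders):

```shell
# Safer venv reset: only attempt deactivate when the file exists, then
# always recreate the environment from scratch. (Venvs do not actually
# ship a bin/deactivate file, so the guard normally just skips.)
set -e
cd "$(mktemp -d)"
if [ -f "gh_action/bin/deactivate" ]; then source gh_action/bin/deactivate; fi
python3 -m venv gh_action
source gh_action/bin/activate
python3 -m pip --version
```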
1 change: 0 additions & 1 deletion README.md
@@ -1,6 +1,5 @@
## Unified and cross-platform CM interface for DevOps, MLOps and MLPerf

[![arXiv](https://img.shields.io/badge/arXiv-2406.16791-b31b1b.svg)](https://arxiv.org/abs/2406.16791)
[![License](https://img.shields.io/badge/License-Apache%202.0-green)](LICENSE.md)
[![Python Version](https://img.shields.io/badge/python-3+-blue.svg)](https://github.com/mlcommons/ck/tree/master/cm/cmind)
[![Powered by CM](https://img.shields.io/badge/Powered_by-MLCommons%20CM-blue)](https://github.com/mlcommons/ck).
2 changes: 1 addition & 1 deletion script/app-mlperf-inference-ctuning-cpp-tflite/_cm.json
@@ -120,7 +120,7 @@
     {
       "names": [
         "tensorflow",
-        "tflite"
+        "tflite"
       ],
       "tags": "get,tensorflow,lib,_tflite"
     },