
Push benchmark artifacts for auto-validation #2157

Merged: 23 commits into master on Mar 7, 2023

Conversation

@agunapal (Collaborator) commented Feb 24, 2023

Description

For auto-validation of benchmarks, we will validate every metric against its average value over 7 consecutive successful runs.

To achieve this, we need to save ab_report.csv for all the models across those 7 consecutive successful runs.

This PR does the following:

  • Check whether the benchmark workflow runs already have existing auto-validation artifacts
  • If yes, download the artifacts, update them with the latest successful run, and upload them again
  • If no, create new artifacts from the latest successful run and upload them
  • The logs below show how the moving window is updated

The artifacts are stored in the following structure:

.
└── cpu_benchmark_validation/
    ├── 0/
    │   ├── eager_mode_mnist_w4_b1/
    │   │   └── ab_report.csv
    │   ├── eager_mode_mnist_w4_b2/
    │   │   └── ab_report.csv
    │   └── ... 
    ├── 1/
    │   ├── eager_mode_mnist_w4_b1/
    │   │   └── ab_report.csv
    │   ├── eager_mode_mnist_w4_b2/
    │   │   └── ab_report.csv
    │   └── ... 
    ├── ...
    └── 6/
        ├── eager_mode_mnist_w4_b1/
        │   └── ab_report.csv
        ├── eager_mode_mnist_w4_b2/
        │   └── ab_report.csv
        └── ... 
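As a rough illustration of how the validation side could consume this layout, the sketch below averages each metric in ab_report.csv across the window slots for one model configuration and flags deviations in the latest run. The helper names, column handling, and tolerance check are assumptions for illustration, not the validation code in this PR.

# Hypothetical sketch; paths and column handling are assumptions.
import csv
import glob
import os
from collections import defaultdict


def load_metrics(report_path):
    """Read one ab_report.csv into a {metric_name: float} dict (assumes a single data row)."""
    with open(report_path) as f:
        row = next(csv.DictReader(f))
    metrics = {}
    for name, value in row.items():
        try:
            metrics[name] = float(value)
        except (TypeError, ValueError):
            pass  # skip non-numeric columns such as the model name
    return metrics


def window_averages(root, model_dir):
    """Average each metric over all saved window slots for one model configuration."""
    totals, counts = defaultdict(float), defaultdict(int)
    for report in glob.glob(os.path.join(root, "*", model_dir, "ab_report.csv")):
        for name, value in load_metrics(report).items():
            totals[name] += value
            counts[name] += 1
    return {name: totals[name] / counts[name] for name in totals}


def validate(latest_report, root, model_dir, tolerance=0.3):
    """Return metrics in the latest run that deviate from the window average by more than `tolerance`."""
    baseline = window_averages(root, model_dir)
    latest = load_metrics(latest_report)
    return {
        name: (latest[name], avg)
        for name, avg in baseline.items()
        if name in latest and avg and abs(latest[name] - avg) > tolerance * avg
    }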

Fixes #(issue)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Feature/Issue validation/testing

Please describe the Unit or Integration tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.

  • Local Test
(torchserve) ubuntu@ip-172-31-60-100:~/serve$ python benchmarks/utils/update_artifacts.py --output /tmp/ts_artifacts/cpu_benchmark_validation
There are no artifacts. A new package needs to be created starting at /tmp/ts_artifacts/cpu_benchmark_validation/0
(torchserve) ubuntu@ip-172-31-60-100:~/serve$ python benchmarks/utils/update_artifacts.py --output /tmp/ts_artifacts/cpu_benchmark_validation
Creating artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/1
Removing artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/2
(torchserve) ubuntu@ip-172-31-60-100:~/serve$ python benchmarks/utils/update_artifacts.py --output /tmp/ts_artifacts/cpu_benchmark_validation
Creating artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/2
Removing artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/3
(torchserve) ubuntu@ip-172-31-60-100:~/serve$ python benchmarks/utils/update_artifacts.py --output /tmp/ts_artifacts/cpu_benchmark_validation
Creating artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/3
Removing artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/4
(torchserve) ubuntu@ip-172-31-60-100:~/serve$ python benchmarks/utils/update_artifacts.py --output /tmp/ts_artifacts/cpu_benchmark_validation
Creating artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/4
Removing artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/5
(torchserve) ubuntu@ip-172-31-60-100:~/serve$ python benchmarks/utils/update_artifacts.py --output /tmp/ts_artifacts/cpu_benchmark_validation
Creating artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/5
Removing artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/6
(torchserve) ubuntu@ip-172-31-60-100:~/serve$ python benchmarks/utils/update_artifacts.py --output /tmp/ts_artifacts/cpu_benchmark_validation
Creating artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/6
Removing artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/7
(torchserve) ubuntu@ip-172-31-60-100:~/serve$ python benchmarks/utils/update_artifacts.py --output /tmp/ts_artifacts/cpu_benchmark_validation
Creating artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/7
Removing artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/0
(torchserve) ubuntu@ip-172-31-60-100:~/serve$ python benchmarks/utils/update_artifacts.py --output /tmp/ts_artifacts/cpu_benchmark_validation
Creating artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/0
Removing artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/1
(torchserve) ubuntu@ip-172-31-60-100:~/serve$ python benchmarks/utils/update_artifacts.py --output /tmp/ts_artifacts/cpu_benchmark_validation
Creating artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/1
Removing artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/2
(torchserve) ubuntu@ip-172-31-60-100:~/serve$ python benchmarks/utils/update_artifacts.py --output /tmp/ts_artifacts/cpu_benchmark_validation
Creating artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/2
Removing artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/3
(torchserve) ubuntu@ip-172-31-60-100:~/serve$ python benchmarks/utils/update_artifacts.py --output /tmp/ts_artifacts/cpu_benchmark_validation
Creating artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/3
Removing artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/4
(torchserve) ubuntu@ip-172-31-60-100:~/serve$ python benchmarks/utils/update_artifacts.py --output /tmp/ts_artifacts/cpu_benchmark_validation
Creating artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/4
Removing artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/5
(torchserve) ubuntu@ip-172-31-60-100:~/serve$ python benchmarks/utils/update_artifacts.py --output /tmp/ts_artifacts/cpu_benchmark_validation
Creating artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/5
Removing artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/6
(torchserve) ubuntu@ip-172-31-60-100:~/serve$ python benchmarks/utils/update_artifacts.py --output /tmp/ts_artifacts/cpu_benchmark_validation
Creating artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/6
Removing artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/7
(torchserve) ubuntu@ip-172-31-60-100:~/serve$ python benchmarks/utils/update_artifacts.py --output /tmp/ts_artifacts/cpu_benchmark_validation
Creating artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/7
Removing artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/0
(torchserve) ubuntu@ip-172-31-60-100:~/serve$ 

Checklist:

  • Did you have fun?
  • Have you added tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

@codecov bot commented Feb 24, 2023

Codecov Report

Merging #2157 (229d9b8) into master (86d4400) will not change coverage.
The diff coverage is n/a.

❗ Current head 229d9b8 differs from pull request most recent head f9b458a. Consider uploading reports for the commit f9b458a to get more accurate results

@@           Coverage Diff           @@
##           master    #2157   +/-   ##
=======================================
  Coverage   53.37%   53.37%           
=======================================
  Files          71       71           
  Lines        3226     3226           
  Branches       57       57           
=======================================
  Hits         1722     1722           
  Misses       1504     1504           


@agunapal changed the title from "(WIP) Push benchmark artifacts for auto-validation" to "Push benchmark artifacts for auto-validation" on Feb 24, 2023
@msaroufim (Member) left a comment

Left some preliminary feedback

My main concern is that I'm confused by how update_artifacts() actually works. It feels like the code could be simpler if we leveraged shutil.copytree, and each of the if conditions really needs more comments and better variable names explaining how you'd reach it. I see 3 scenarios:

  1. No artifacts have been uploaded
  2. Some artifacts have been uploaded but less than window size
  3. Max window length has been achieved so delete old artifacts

So for each scenario, explain how you can reach that branch and what you're doing at a high level. It might also make sense to add some simple unit tests, since the code will be brittle to changes.
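A minimal sketch of the three scenarios above, using shutil.copytree as suggested. WINDOW_LEN and the next_slot.txt marker file are assumptions for illustration; this is not the implementation that was merged in this PR.

# Hypothetical sketch of the moving-window update, not the merged code.
import os
import shutil

WINDOW_LEN = 7  # number of consecutive successful runs kept in the window


def update_artifacts(input_dir, output_dir):
    """Copy the latest run's reports into the next window slot, evicting the oldest when full."""
    os.makedirs(output_dir, exist_ok=True)
    marker = os.path.join(output_dir, "next_slot.txt")  # assumed marker file

    if os.path.exists(marker):
        with open(marker) as f:
            next_slot = int(f.read())
    else:
        # Scenario 1: no artifacts have been uploaded yet -> start the window at slot 0.
        next_slot = 0

    target = os.path.join(output_dir, str(next_slot))
    if os.path.isdir(target):
        # Scenario 3: the window is full and has wrapped around -> evict the oldest slot first.
        shutil.rmtree(target)
    # Scenario 2 (and 1): the window is still filling up -> just add the new slot.
    shutil.copytree(input_dir, target)

    # Advance the pointer so the next successful run lands in the following slot.
    with open(marker, "w") as f:
        f.write(str((next_slot + 1) % WINDOW_LEN))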

Review threads (resolved):
  • .github/workflows/benchmark_nightly_gpu.yml
  • benchmarks/utils/update_artifacts.py (4 threads, 3 outdated)
@agunapal requested a review from msaroufim on March 3, 2023 at 20:54
@msaroufim (Member)

Much clearer thanks!

@namannandan (Collaborator) left a comment

LGTM.
Since we already publish the benchmark data to an S3 bucket and also publish benchmark results as CloudWatch metrics, was that considered as an alternative source of benchmark data for the validation?

def update_new_report(input_dir, output_dir, add_report_id, del_report_id):

# Add new report
new_dir = os.path.join(output_dir, str(add_report_id))

Nit: Would it make sense to sanity check if add_report_id is an int and use add_report_id % WINDOW_LEN?
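A small sketch of the suggested check, assuming a module-level WINDOW_LEN constant; the helper name is illustrative and not the merged code.

# Hypothetical sketch of the suggested sanity check.
import os

WINDOW_LEN = 7


def slot_dir(output_dir, add_report_id):
    """Validate the report id and map it into the fixed-size window."""
    if not isinstance(add_report_id, int):
        raise TypeError(f"add_report_id must be an int, got {type(add_report_id).__name__}")
    # Wrap the id into [0, WINDOW_LEN) so an ever-increasing run counter reuses old slots.
    return os.path.join(output_dir, str(add_report_id % WINDOW_LEN))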

@msaroufim (Member)

Since we already publish the benchmark data to an S3 bucket and also publish benchmark results as cloudwatch metrics, I was wondering if that was considered as an option as the source of benchmark data to do validation as well?

It depends on whether we can make that S3 bucket publicly available to both the Meta and AWS teams; historically that's been a challenge, so I'd personally rather keep as much as possible on GitHub infra.

@agunapal merged commit fd8f1b3 into master on Mar 7, 2023
@agunapal deleted the feature/publish_benchmark_artifacts branch on March 7, 2023 at 22:08