
Push benchmark artifacts for auto-validation #2157

Merged: 23 commits into master on Mar 7, 2023

Conversation

@agunapal (Collaborator) commented Feb 24, 2023

Description

For auto-validation of benchmarks, we will validate every metric against its average value over 7 consecutive successful runs.

To achieve this, we need to save ab_report.csv for all the models across those 7 consecutive successful runs.

This PR does the following:

  • Check whether the benchmark workflow runs already have existing auto-validation artifacts
  • If yes, download the artifacts, update them with the latest successful run, and upload them again
  • If no, create new artifacts from the latest successful run and upload them
  • The logs below show how the moving window is updated

The artifacts are stored in the following structure:

.
└── cpu_benchmark_validation/
    ├── 0/
    │   ├── eager_mode_mnist_w4_b1/
    │   │   └── ab_report.csv
    │   ├── eager_mode_mnist_w4_b2/
    │   │   └── ab_report.csv
    │   └── ... 
    ├── 1/
    │   ├── eager_mode_mnist_w4_b1/
    │   │   └── ab_report.csv
    │   ├── eager_mode_mnist_w4_b2/
    │   │   └── ab_report.csv
    │   └── ... 
    ├── ...
    └── 6/
        ├── eager_mode_mnist_w4_b1/
        │   └── ab_report.csv
        ├── eager_mode_mnist_w4_b2/
        │   └── ab_report.csv
        └── ... 
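As a rough illustration of how the validation side could consume this layout, the sketch below averages each metric in ab_report.csv across the window slots for one model configuration and flags deviations in the latest run. The helper names, column handling, and tolerance check are assumptions for illustration, not the validation code in this PR.

# Hypothetical sketch; paths and column handling are assumptions.
import csv
import glob
import os
from collections import defaultdict


def load_metrics(report_path):
    """Read one ab_report.csv into a {metric_name: float} dict (assumes a single data row)."""
    with open(report_path) as f:
        row = next(csv.DictReader(f))
    metrics = {}
    for name, value in row.items():
        try:
            metrics[name] = float(value)
        except (TypeError, ValueError):
            pass  # skip non-numeric columns such as the model name
    return metrics


def window_averages(root, model_dir):
    """Average each metric over all saved window slots for one model configuration."""
    totals, counts = defaultdict(float), defaultdict(int)
    for report in glob.glob(os.path.join(root, "*", model_dir, "ab_report.csv")):
        for name, value in load_metrics(report).items():
            totals[name] += value
            counts[name] += 1
    return {name: totals[name] / counts[name] for name in totals}


def validate(latest_report, root, model_dir, tolerance=0.3):
    """Return metrics in the latest run that deviate from the window average by more than `tolerance`."""
    baseline = window_averages(root, model_dir)
    latest = load_metrics(latest_report)
    return {
        name: (latest[name], avg)
        for name, avg in baseline.items()
        if name in latest and avg and abs(latest[name] - avg) > tolerance * avg
    }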

Fixes #(issue)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Feature/Issue validation/testing

Please describe the Unit or Integration tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.

  • Local Test
(torchserve) ubuntu@ip-172-31-60-100:~/serve$ python benchmarks/utils/update_artifacts.py --output /tmp/ts_artifacts/cpu_benchmark_validation
There are no artifacts. A new package needs to be created starting at /tmp/ts_artifacts/cpu_benchmark_validation/0
(torchserve) ubuntu@ip-172-31-60-100:~/serve$ python benchmarks/utils/update_artifacts.py --output /tmp/ts_artifacts/cpu_benchmark_validation
Creating artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/1
Removing artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/2
(torchserve) ubuntu@ip-172-31-60-100:~/serve$ python benchmarks/utils/update_artifacts.py --output /tmp/ts_artifacts/cpu_benchmark_validation
Creating artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/2
Removing artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/3
(torchserve) ubuntu@ip-172-31-60-100:~/serve$ python benchmarks/utils/update_artifacts.py --output /tmp/ts_artifacts/cpu_benchmark_validation
Creating artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/3
Removing artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/4
(torchserve) ubuntu@ip-172-31-60-100:~/serve$ python benchmarks/utils/update_artifacts.py --output /tmp/ts_artifacts/cpu_benchmark_validation
Creating artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/4
Removing artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/5
(torchserve) ubuntu@ip-172-31-60-100:~/serve$ python benchmarks/utils/update_artifacts.py --output /tmp/ts_artifacts/cpu_benchmark_validation
Creating artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/5
Removing artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/6
(torchserve) ubuntu@ip-172-31-60-100:~/serve$ python benchmarks/utils/update_artifacts.py --output /tmp/ts_artifacts/cpu_benchmark_validation
Creating artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/6
Removing artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/7
(torchserve) ubuntu@ip-172-31-60-100:~/serve$ python benchmarks/utils/update_artifacts.py --output /tmp/ts_artifacts/cpu_benchmark_validation
Creating artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/7
Removing artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/0
(torchserve) ubuntu@ip-172-31-60-100:~/serve$ python benchmarks/utils/update_artifacts.py --output /tmp/ts_artifacts/cpu_benchmark_validation
Creating artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/0
Removing artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/1
(torchserve) ubuntu@ip-172-31-60-100:~/serve$ python benchmarks/utils/update_artifacts.py --output /tmp/ts_artifacts/cpu_benchmark_validation
Creating artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/1
Removing artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/2
(torchserve) ubuntu@ip-172-31-60-100:~/serve$ python benchmarks/utils/update_artifacts.py --output /tmp/ts_artifacts/cpu_benchmark_validation
Creating artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/2
Removing artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/3
(torchserve) ubuntu@ip-172-31-60-100:~/serve$ python benchmarks/utils/update_artifacts.py --output /tmp/ts_artifacts/cpu_benchmark_validation
Creating artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/3
Removing artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/4
(torchserve) ubuntu@ip-172-31-60-100:~/serve$ python benchmarks/utils/update_artifacts.py --output /tmp/ts_artifacts/cpu_benchmark_validation
Creating artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/4
Removing artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/5
(torchserve) ubuntu@ip-172-31-60-100:~/serve$ python benchmarks/utils/update_artifacts.py --output /tmp/ts_artifacts/cpu_benchmark_validation
Creating artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/5
Removing artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/6
(torchserve) ubuntu@ip-172-31-60-100:~/serve$ python benchmarks/utils/update_artifacts.py --output /tmp/ts_artifacts/cpu_benchmark_validation
Creating artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/6
Removing artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/7
(torchserve) ubuntu@ip-172-31-60-100:~/serve$ python benchmarks/utils/update_artifacts.py --output /tmp/ts_artifacts/cpu_benchmark_validation
Creating artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/7
Removing artifacts  /tmp/ts_artifacts/cpu_benchmark_validation/0
(torchserve) ubuntu@ip-172-31-60-100:~/serve$ 

Checklist:

  • Did you have fun?
  • Have you added tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

@codecov bot commented Feb 24, 2023

Codecov Report

Merging #2157 (229d9b8) into master (86d4400) will not change coverage.
The diff coverage is n/a.

❗ Current head 229d9b8 differs from pull request most recent head f9b458a. Consider uploading reports for the commit f9b458a to get more accurate results

@@           Coverage Diff           @@
##           master    #2157   +/-   ##
=======================================
  Coverage   53.37%   53.37%           
=======================================
  Files          71       71           
  Lines        3226     3226           
  Branches       57       57           
=======================================
  Hits         1722     1722           
  Misses       1504     1504           


@agunapal changed the title from "(WIP) Push benchmark artifacts for auto-validation" to "Push benchmark artifacts for auto-validation" on Feb 24, 2023
@msaroufim (Member) left a comment

Left some preliminary feedback

My main concern is that I'm confused by how update_artifacts() actually works. It feels like the code could be simpler if we leveraged shutil.copytree, and each of the if conditions really needs more comments and better variable names explaining how you'd reach it. I see 3 scenarios:

  1. No artifacts have been uploaded
  2. Some artifacts have been uploaded but less than window size
  3. Max window length has been achieved so delete old artifacts

So for each scenario, explain how you can reach that branch and what you're doing at a high level. It might also make sense to add some simple unit tests, since the code will be brittle to changes.
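A minimal sketch of the three scenarios above, using shutil.copytree as suggested. WINDOW_LEN and the next_slot.txt marker file are assumptions for illustration; this is not the implementation that was merged in this PR.

# Hypothetical sketch of the moving-window update, not the merged code.
import os
import shutil

WINDOW_LEN = 7  # number of consecutive successful runs kept in the window


def update_artifacts(input_dir, output_dir):
    """Copy the latest run's reports into the next window slot, evicting the oldest when full."""
    os.makedirs(output_dir, exist_ok=True)
    marker = os.path.join(output_dir, "next_slot.txt")  # assumed marker file

    if os.path.exists(marker):
        with open(marker) as f:
            next_slot = int(f.read())
    else:
        # Scenario 1: no artifacts have been uploaded yet -> start the window at slot 0.
        next_slot = 0

    target = os.path.join(output_dir, str(next_slot))
    if os.path.isdir(target):
        # Scenario 3: the window is full and has wrapped around -> evict the oldest slot first.
        shutil.rmtree(target)
    # Scenario 2 (and 1): the window is still filling up -> just add the new slot.
    shutil.copytree(input_dir, target)

    # Advance the pointer so the next successful run lands in the following slot.
    with open(marker, "w") as f:
        f.write(str((next_slot + 1) % WINDOW_LEN))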

Review threads (resolved):
  • .github/workflows/benchmark_nightly_gpu.yml
  • benchmarks/utils/update_artifacts.py (4 threads, 3 outdated)
@agunapal requested a review from msaroufim on March 3, 2023 at 20:54
@msaroufim (Member)

Much clearer thanks!

@namannandan (Collaborator) left a comment

LGTM.
Since we already publish the benchmark data to an S3 bucket and also publish benchmark results as CloudWatch metrics, was that considered as an alternative source of benchmark data for the validation?

def update_new_report(input_dir, output_dir, add_report_id, del_report_id):

# Add new report
new_dir = os.path.join(output_dir, str(add_report_id))

Nit: Would it make sense to sanity check if add_report_id is an int and use add_report_id % WINDOW_LEN?
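A small sketch of the suggested check, assuming a module-level WINDOW_LEN constant; the helper name is illustrative and not the merged code.

# Hypothetical sketch of the suggested sanity check.
import os

WINDOW_LEN = 7


def slot_dir(output_dir, add_report_id):
    """Validate the report id and map it into the fixed-size window."""
    if not isinstance(add_report_id, int):
        raise TypeError(f"add_report_id must be an int, got {type(add_report_id).__name__}")
    # Wrap the id into [0, WINDOW_LEN) so an ever-increasing run counter reuses old slots.
    return os.path.join(output_dir, str(add_report_id % WINDOW_LEN))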

@msaroufim (Member)

Since we already publish the benchmark data to an S3 bucket and also publish benchmark results as cloudwatch metrics, I was wondering if that was considered as an option as the source of benchmark data to do validation as well?

It depends on whether we can make that S3 bucket publicly available to both the Meta and AWS teams; historically that's been a challenge, so I'd personally rather keep as much as possible on GitHub infra.

@agunapal merged commit fd8f1b3 into master on Mar 7, 2023
@agunapal deleted the feature/publish_benchmark_artifacts branch on March 7, 2023 at 22:08