Push benchmark artifacts for auto-validation #2157
Conversation
Codecov Report
@@           Coverage Diff           @@
##           master    #2157   +/-   ##
=======================================
  Coverage   53.37%   53.37%
=======================================
  Files          71       71
  Lines        3226     3226
  Branches       57       57
=======================================
  Hits         1722     1722
  Misses       1504     1504
Left some preliminary feedback.
My main concern is that I'm confused by how update_artifacts() actually works; it feels like the code could be simpler if we leverage shutil.copytree, and each of the if conditions really needs more comments and better variable names explaining how you'd reach it. I see 3 scenarios:
- No artifacts have been uploaded yet
- Some artifacts have been uploaded, but fewer than the window size
- The max window length has been reached, so the oldest artifacts are deleted
For each of these, explain how you can reach that branch and what you're doing at a high level (a rough sketch of the copytree idea follows below). It might also make sense to add some simple unit tests, since the code will be brittle to changes.
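For concreteness, here is a minimal sketch of the copytree idea with one branch per scenario. The names (WINDOW_LEN, the update_artifacts signature, and the numeric per-run directory layout) are assumptions for illustration, not the PR's actual code.

```python
# Hypothetical sketch of the reviewer's suggestion -- WINDOW_LEN, the function
# signature, and the numeric per-run directory layout are assumptions, not the
# PR's actual implementation.
import os
import shutil

WINDOW_LEN = 7  # number of consecutive successful runs to keep


def update_artifacts(new_report_dir, artifacts_dir):
    """Keep at most WINDOW_LEN benchmark reports in artifacts_dir."""
    os.makedirs(artifacts_dir, exist_ok=True)
    existing = sorted(int(d) for d in os.listdir(artifacts_dir) if d.isdigit())

    if not existing:
        # Scenario 1: no artifacts have been uploaded yet -> start at id 0.
        next_id = 0
    elif len(existing) < WINDOW_LEN:
        # Scenario 2: fewer than WINDOW_LEN reports saved -> just append one.
        next_id = existing[-1] + 1
    else:
        # Scenario 3: window is full -> delete the oldest report, then append.
        shutil.rmtree(os.path.join(artifacts_dir, str(existing[0])))
        next_id = existing[-1] + 1

    # shutil.copytree copies the whole report directory in one call.
    shutil.copytree(new_report_dir, os.path.join(artifacts_dir, str(next_id)))
```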
…com/pytorch/serve into feature/publish_benchmark_artifacts
Much clearer, thanks!
LGTM.
Since we already publish the benchmark data to an S3 bucket and also publish benchmark results as CloudWatch metrics, was either of those considered as a source of benchmark data for this validation as well?
def update_new_report(input_dir, output_dir, add_report_id, del_report_id):

    # Add new report
    new_dir = os.path.join(output_dir, str(add_report_id))
Nit: Would it make sense to sanity check that add_report_id is an int and use add_report_id % WINDOW_LEN?
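A rough illustration of this nit (WINDOW_LEN and the report_slot helper are assumed names, not from the PR): wrapping the id modulo the window length keeps the directory names within a fixed set of slots, so the newest run overwrites the oldest.

```python
# Hypothetical illustration of the nit above; WINDOW_LEN and the helper name
# are assumptions, not taken from the PR.
WINDOW_LEN = 7


def report_slot(add_report_id):
    if not isinstance(add_report_id, int):
        raise TypeError(f"add_report_id must be an int, got {type(add_report_id)!r}")
    # Wrapping with % WINDOW_LEN keeps directory names within 0..WINDOW_LEN-1,
    # so the newest run always replaces the oldest slot.
    return add_report_id % WINDOW_LEN
```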
It depends on whether we can make that S3 bucket publicly available to both the Meta and AWS teams; historically that's been a challenge, so I'd personally rather have as much as possible on GitHub infra.
Description
For auto-validation of benchmarks, we will validate all the metrics against their average values from 7 consecutive successful runs. To achieve this, we need to save ab_report.csv for all the models across 7 consecutive successful runs.
This PR does the following:
The artifacts are stored in the following structure
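As an illustration only, here is a sketch of how the saved runs could be consumed to compute the rolling averages and flag regressions. The directory layout in the comments, the handling of ab_report.csv columns, and the 30% tolerance are all assumptions, not taken from this PR.

```python
# Sketch only: the layout and file names below are assumptions based on this
# description (numbered run directories, one ab_report.csv per model), not the
# PR's actual structure.
#
#   artifacts/
#     0/<model>/ab_report.csv
#     1/<model>/ab_report.csv
#     ...
#     6/<model>/ab_report.csv
import csv
import glob
import os
from collections import defaultdict


def average_metrics(artifacts_dir, model):
    """Average each numeric column of ab_report.csv across the saved runs."""
    totals, counts = defaultdict(float), defaultdict(int)
    for path in glob.glob(os.path.join(artifacts_dir, "*", model, "ab_report.csv")):
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                for key, value in row.items():
                    try:
                        totals[key] += float(value)
                        counts[key] += 1
                    except (TypeError, ValueError):
                        continue  # skip non-numeric columns
    return {key: totals[key] / counts[key] for key in totals if counts[key]}


def validate(current_report, averages, tolerance=0.3):
    """Return metrics that deviate from the rolling average by more than tolerance."""
    return {
        key: value
        for key, value in current_report.items()
        if key in averages and abs(value - averages[key]) > tolerance * abs(averages[key])
    }
```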
Fixes #(issue)
Type of change
Please delete options that are not relevant.
Feature/Issue validation/testing
Please describe the Unit or Integration tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.
Checklist: