Enable opt-6.7b benchmark on inf2 #2400
Conversation
Codecov Report
@@           Coverage Diff           @@
##           master    #2400   +/-   ##
=======================================
  Coverage   71.89%   71.89%
=======================================
  Files          78       78
  Lines        3654     3654
  Branches       58       58
=======================================
  Hits         2627     2627
  Misses       1023     1023
  Partials        4        4
Why do we have different mar files for each batch size?
For inferentia2, we need to trace the model separately for each batch size, because a Neuron compilation is specialized to fixed input shapes. Here, the model is traced at model load time.
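For illustration, a minimal sketch of that shape-specific tracing with torch_neuronx.trace; the model id, sequence length, and output file names below are assumptions for the sketch, not the exact code used in this PR:

```python
import torch
import torch_neuronx
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "facebook/opt-6.7b"  # assumed model id for illustration
SEQ_LEN = 128                   # assumed fixed sequence length

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torchscript=True)
model.eval()

for batch_size in (1, 2, 4, 8):
    prompts = ["Hello, my dog is cute"] * batch_size
    inputs = tokenizer(
        prompts, return_tensors="pt", padding="max_length", max_length=SEQ_LEN
    )
    # The Neuron compiler specializes the graph to the exact input shapes it
    # sees here, so each batch size yields a distinct compiled artifact.
    traced = torch_neuronx.trace(
        model, (inputs["input_ids"], inputs["attention_mask"])
    )
    torch.jit.save(traced, f"opt-6.7b_neuronx_batch_{batch_size}.pt")
```

This is why one mar file per batch size is needed: a trace compiled for batch size 1 cannot serve batch-size-4 requests.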
unblocking
Description
Enable benchmarking for the opt-6.7b model on inferentia2, based on the inf2 example: #2399

Model archives:
Type of change
Feature/Issue validation/testing
inf2-opt-benchmark-test
Benchmark results
TorchServe Benchmark on neuronx
Date: 2023-06-22 08:44:16
TorchServe Version: inf2-opt-benchmark-test
scripted_mode_opt_6.7b_neuronx_batch_1
scripted_mode_opt_6.7b_neuronx_batch_2
scripted_mode_opt_6.7b_neuronx_batch_4
scripted_mode_opt_6.7b_neuronx_batch_8
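Each configuration above corresponds to its own model archive registered at the matching batch size. A hedged sketch of that registration through TorchServe's management API; the .mar names are hypothetical:

```python
import requests

MANAGEMENT_URL = "http://localhost:8081"  # default TorchServe management port

for batch_size in (1, 2, 4, 8):
    resp = requests.post(
        f"{MANAGEMENT_URL}/models",
        params={
            # Hypothetical archive name, one per traced batch size.
            "url": f"opt-6.7b-neuronx-batch-{batch_size}.mar",
            # Must match the batch size the model was traced with.
            "batch_size": batch_size,
            "initial_workers": 1,
            "max_batch_delay": 100,  # ms to wait while accumulating a batch
        },
    )
    resp.raise_for_status()
```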