
Enable opt-6.7b benchmark on inf2 #2400

Merged: 6 commits into pytorch:master on Jun 29, 2023

Conversation

@namannandan (Collaborator) commented Jun 8, 2023

Description

Enable benchmarking for the opt-6.7b model on Inferentia2, based on the inf2 example: #2399
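For context, TorchServe's automated benchmark suite is driven by per-model YAML entries. Below is a rough sketch of the kind of entry this PR adds for the batch-1 configuration; the entry name, mar URL, and input path are placeholders, the field names follow the existing benchmark configs as best I recall them, and the numeric values mirror the batch-1 results reported below:

```yaml
# Sketch of a benchmark model config entry (hypothetical file; the URL and
# input path are placeholders). Values mirror the batch-1 run below.
opt_6.7b_neuronx_batch_1:
  scripted_mode:
    benchmark_engine: "ab"   # ApacheBench; shows up as "AB" in the report
    url: "https://example.com/opt_6.7b_neuronx_batch_1.mar"
    workers:
      - 1
    batch_delay: 100         # max ms the frontend waits to fill a batch
    batch_size:
      - 1
    input: "./sample_input.txt"
    requests: 2000
    concurrency: 10
    exec_env: "local"
    processors:
      - "neuronx"            # renders as "Benchmark on neuronx" in the report
```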

Model archives:

Type of change

  • New feature (non-breaking change which adds functionality)

Feature/Issue validation/testing

Benchmark results

TorchServe Benchmark on neuronx

Date: 2023-06-22 08:44:16

TorchServe Version: inf2-opt-benchmark-test

scripted_mode_opt_6.7b_neuronx_batch_1

(Latency and *_mean columns in the tables below are in milliseconds; TS error rate is a percentage, i.e. failed requests / requests × 100.)

| version | Benchmark | Batch size | Batch delay | Workers | Model | Concurrency | Input | Requests | TS failed requests | TS throughput | TS latency P50 | TS latency P90 | TS latency P99 | TS latency mean | TS error rate | Model_p50 | Model_p90 | Model_p99 | predict_mean | handler_time_mean | waiting_time_mean | worker_thread_mean | cpu_percentage_mean | memory_percentage_mean | gpu_percentage_mean | gpu_memory_percentage_mean | gpu_memory_used_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| inf2-opt-benchmark-test | AB | 1 | 100 | 1 | .mar | 10 | input | 2000 | 1946 | 0.63 | 15945 | 16075 | 16132 | 15974.07 | 97.3 | 1591.6 | 1593.65 | 1594.07 | 1596.79 | 1596.7 | 14332.83 | 0.28 | 2.88 | 6.76 | 0.0 | 0.0 | 0.0 |

scripted_mode_opt_6.7b_neuronx_batch_2

| version | Benchmark | Batch size | Batch delay | Workers | Model | Concurrency | Input | Requests | TS failed requests | TS throughput | TS latency P50 | TS latency P90 | TS latency P99 | TS latency mean | TS error rate | Model_p50 | Model_p90 | Model_p99 | predict_mean | handler_time_mean | waiting_time_mean | worker_thread_mean | cpu_percentage_mean | memory_percentage_mean | gpu_percentage_mean | gpu_memory_percentage_mean | gpu_memory_used_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| inf2-opt-benchmark-test | AB | 2 | 100 | 1 | .mar | 10 | input | 2000 | 1934 | 1.13 | 8860 | 8938 | 8953 | 8881.404 | 96.7 | 1769.37 | 1770.75 | 1770.97 | 1773.37 | 1773.28 | 7075.18 | 0.49 | 0.0 | 6.8 | 0.0 | 0.0 | 0.0 |

scripted_mode_opt_6.7b_neuronx_batch_4

| version | Benchmark | Batch size | Batch delay | Workers | Model | Concurrency | Input | Requests | TS failed requests | TS throughput | TS latency P50 | TS latency P90 | TS latency P99 | TS latency mean | TS error rate | Model_p50 | Model_p90 | Model_p99 | predict_mean | handler_time_mean | waiting_time_mean | worker_thread_mean | cpu_percentage_mean | memory_percentage_mean | gpu_percentage_mean | gpu_memory_percentage_mean | gpu_memory_used_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| inf2-opt-benchmark-test | AB | 4 | 100 | 1 | .mar | 10 | input | 2000 | 1955 | 2.19 | 3666 | 5483 | 5493 | 4566.966 | 97.75 | 1819.03 | 1822.06 | 1822.97 | 1821.48 | 1821.39 | 2725.47 | 0.65 | 5.0 | 7.4 | 0.0 | 0.0 | 0.0 |

scripted_mode_opt_6.7b_neuronx_batch_8

| version | Benchmark | Batch size | Batch delay | Workers | Model | Concurrency | Input | Requests | TS failed requests | TS throughput | TS latency P50 | TS latency P90 | TS latency P99 | TS latency mean | TS error rate | Model_p50 | Model_p90 | Model_p99 | predict_mean | handler_time_mean | waiting_time_mean | worker_thread_mean | cpu_percentage_mean | memory_percentage_mean | gpu_percentage_mean | gpu_memory_percentage_mean | gpu_memory_used_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| inf2-opt-benchmark-test | AB | 8 | 100 | 1 | .mar | 10 | input | 2000 | 1938 | 4.28 | 1863 | 3724 | 3732 | 2337.482 | 96.9 | 1857.53 | 1859.36 | 1859.8 | 1859.7 | 1859.61 | 463.83 | 1.11 | 0.0 | 7.3 | 0.0 | 0.0 | 0.0 |
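A quick consistency check on the headline numbers: ab drives a fixed pool of 10 concurrent connections, so by Little's law throughput and mean latency should be tied together, and they are for all four runs (values taken from the tables above):

```math
\text{TS throughput} \approx \frac{\text{Concurrency}}{\text{TS latency mean}},\qquad
\frac{10}{15.974\,\mathrm{s}} \approx 0.63\ \mathrm{req/s}\ (\text{batch }1),\qquad
\frac{10}{2.337\,\mathrm{s}} \approx 4.28\ \mathrm{req/s}\ (\text{batch }8)
```

Throughput scales nearly linearly with batch size while per-batch model latency (Model_p50) grows only from ~1.59 s to ~1.86 s, which is the expected benefit of batching. The high "TS failed requests" counts are most likely an ab artifact rather than real errors: ab flags any response whose byte length differs from the first one as "failed (length)", which is expected for generative text output.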

@codecov bot commented Jun 8, 2023

Codecov Report

Merging #2400 (307ac65) into master (ec3b992) will not change coverage.
The diff coverage is n/a.

❗ Current head 307ac65 differs from pull request most recent head 06ea628. Consider uploading reports for the commit 06ea628 to get more accurate results

```diff
@@           Coverage Diff           @@
##           master    #2400   +/-   ##
=======================================
  Coverage   71.89%   71.89%
=======================================
  Files          78       78
  Lines        3654     3654
  Branches       58       58
=======================================
  Hits         2627     2627
  Misses       1023     1023
  Partials        4        4
```


@namannandan namannandan marked this pull request as ready for review June 20, 2023 20:54
@agunapal (Collaborator) left a comment:

Why do we have different mar files for each batch size?

@namannandan (Collaborator, Author) replied:

For Inferentia2, we need to trace the model separately for each batch size we want to support. Here, the model is traced at load time using the settings in model-config.yaml. Since each batch size requires a different model-config.yaml, I've packaged them into separate mar files.
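For illustration, a minimal sketch of what one of these per-batch-size model-config.yaml files could look like. The top-level keys (minWorkers, maxWorkers, maxBatchDelay, responseTimeout) are standard TorchServe model-config options; the handler section is hypothetical, standing in for whatever settings the custom inf2 handler reads when it traces the model at load time:

```yaml
# Sketch of a per-batch-size model-config.yaml (handler keys are illustrative).
minWorkers: 1
maxWorkers: 1
maxBatchDelay: 100      # matches the 100 ms batch delay in the benchmark runs
responseTimeout: 900    # tracing at load time is slow; allow a generous timeout

handler:                # hypothetical handler settings
    model_name: "facebook/opt-6.7b"
    batch_size: 1       # the only value that differs across the four mar files
    amp: "f16"
    max_length: 50
```

Since a Neuron-compiled graph is fixed to the input shapes it was traced with, the batch size can't be changed at runtime, hence one archive per batch size.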

@msaroufim (Member) left a comment:

unblocking

@namannandan merged commit b260776 into pytorch:master on Jun 29, 2023