
BERT nightly benchmark on Inferentia2 #2283

Merged 2 commits into pytorch:master on May 16, 2023

Conversation

@namannandan (Collaborator) commented on Apr 28, 2023

Description

Benchmark BERT model on Inferentia2 instance

Model artifacts:

Self-hosted runner (inf2.8xlarge):

  • 32 vCPUs
  • 2 Inferentia2 chips (2 neuron cores per chip)

Type of change

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Feature testing

Checkpoint file generation

Note: The artifacts above were traced using transformers version 4.19.0. With more recent transformers versions, the model traced for Neuron may produce incorrect inference results (the model output is NaN).
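As a quick sanity check for the NaN symptom described above, a traced model's logits can be scanned before packaging. This is a minimal, hypothetical helper (`has_nan` is not part of the repo), assuming the outputs have been converted to nested Python lists:

```python
import math

def has_nan(values):
    """Return True if any element in a (possibly nested) list of floats is NaN."""
    for v in values:
        if isinstance(v, (list, tuple)):
            if has_nan(v):
                return True
        elif math.isnan(v):
            return True
    return False

# A healthy sequence-classification output has finite logits per label:
print(has_nan([[0.12, -1.7], [2.3, 0.05]]))  # False
print(has_nan([[float("nan"), -1.7]]))       # True
```

Running such a check right after tracing catches the bad-transformers-version case before the checkpoint is archived into a MAR file.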

$ cd examples/Huggingface_Transformers/
$ cat setup_config.json
{
 "model_name":"bert-base-uncased",
 "mode":"sequence_classification",
 "do_lower_case":true,
 "num_labels":"2",
 "save_mode":"torchscript",
 "max_length":"150",
 "captum_explanation":false,
 "embedding_name": "bert",
 "FasterTransformer":false,
 "BetterTransformer":false,
 "model_parallel":false,
 "hardware": "neuronx",
 "batch_size": "2"
}
$ python Download_Transformer_models.py setup_config.json
$ ls Transformer_model/
traced_bert-base-uncased_model_neuronx_batch_2.pt
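Note that several numeric fields in the config above ("num_labels", "max_length", "batch_size") are stored as JSON strings and coerced at load time. A small sketch of parsing such a config (a hypothetical helper, not the repo's actual loader, assuming the stringly-typed layout shown above):

```python
import json

# Fields that setup_config.json stores as strings but that are numeric in use.
INT_FIELDS = ("num_labels", "max_length", "batch_size")

def load_setup_config(text):
    """Parse a setup_config.json payload and coerce stringly-typed ints."""
    cfg = json.loads(text)
    for key in INT_FIELDS:
        if key in cfg:
            cfg[key] = int(cfg[key])
    return cfg

cfg = load_setup_config("""
{"model_name": "bert-base-uncased", "hardware": "neuronx",
 "num_labels": "2", "max_length": "150", "batch_size": "2"}
""")
print(cfg["model_name"], cfg["batch_size"])  # bert-base-uncased 2
```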

MAR file generation

$ cat requirements.txt
torch-neuronx
$ torch-model-archiver --model-name BERTSeqClassification_torchscript_neuronx_batch_2 --version 1.0 --serialized-file ./examples/Huggingface_Transformers/Transformer_model/traced_bert-base-uncased_model_neuronx_batch_2.pt --handler ./examples/Huggingface_Transformers/Transformer_handler_generalized_neuron.py --extra-files "./examples/Huggingface_Transformers/setup_config.json,./examples/Huggingface_Transformers/Seq_classification_artifacts/index_to_name.json,./examples/Huggingface_Transformers/Transformer_handler_generalized.py" --requirements-file requirements.txt
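The archiver invocation above is repeated for batch sizes 1, 2, 4, and 8; the model name and serialized-file path differ only in the batch suffix. A hypothetical helper sketching that naming convention (the function and defaults are illustrative, not part of the repo):

```python
def artifact_names(model="BERTSeqClassification", framework="torchscript",
                   hardware="neuronx", batch_size=2):
    """Derive the MAR model name and traced-checkpoint filename per batch size."""
    mar = f"{model}_{framework}_{hardware}_batch_{batch_size}"
    pt = f"traced_bert-base-uncased_model_{hardware}_batch_{batch_size}.pt"
    return mar, pt

for bs in (1, 2, 4, 8):
    print(artifact_names(batch_size=bs)[0])
```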

Workflow test

Test branch: test-inf2-benchmark
Workflow run and artifacts: https://github.com/pytorch/serve/actions/runs/4834127396
(Artifacts and metrics are being published, but validation currently fails.)

Benchmark results:

TorchServe Benchmark on neuronx

Date: 2023-04-28 20:57:15

TorchServe Version: torchserve-nightly==2023.4.27

scripted_mode_bert_neuronx_batch_1

| version | Benchmark | Batch size | Batch delay | Workers | Model | Concurrency | Input | Requests | TS failed requests | TS throughput | TS latency P50 | TS latency P90 | TS latency P99 | TS latency mean | TS error rate | Model_p50 | Model_p90 | Model_p99 | predict_mean | handler_time_mean | waiting_time_mean | worker_thread_mean | cpu_percentage_mean | memory_percentage_mean | gpu_percentage_mean | gpu_memory_percentage_mean | gpu_memory_used_mean |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| torchserve-nightly==2023.4.27 | AB | 1 | 100 | 2 | .mar | 100 | input | 10000 | 0 | 518.99 | 190 | 194 | 248 | 192.682 | 0.0 | 3.43 | 3.45 | 3.46 | 3.48 | 3.41 | 187.23 | 0.17 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |

scripted_mode_bert_neuronx_batch_2

| version | Benchmark | Batch size | Batch delay | Workers | Model | Concurrency | Input | Requests | TS failed requests | TS throughput | TS latency P50 | TS latency P90 | TS latency P99 | TS latency mean | TS error rate | Model_p50 | Model_p90 | Model_p99 | predict_mean | handler_time_mean | waiting_time_mean | worker_thread_mean | cpu_percentage_mean | memory_percentage_mean | gpu_percentage_mean | gpu_memory_percentage_mean | gpu_memory_used_mean |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| torchserve-nightly==2023.4.27 | AB | 2 | 100 | 2 | .mar | 100 | input | 10000 | 0 | 634.03 | 155 | 157 | 199 | 157.722 | 0.0 | 5.64 | 5.71 | 5.72 | 5.71 | 5.64 | 148.06 | 0.25 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |

scripted_mode_bert_neuronx_batch_4

| version | Benchmark | Batch size | Batch delay | Workers | Model | Concurrency | Input | Requests | TS failed requests | TS throughput | TS latency P50 | TS latency P90 | TS latency P99 | TS latency mean | TS error rate | Model_p50 | Model_p90 | Model_p99 | predict_mean | handler_time_mean | waiting_time_mean | worker_thread_mean | cpu_percentage_mean | memory_percentage_mean | gpu_percentage_mean | gpu_memory_percentage_mean | gpu_memory_used_mean |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| torchserve-nightly==2023.4.27 | AB | 4 | 100 | 2 | .mar | 100 | input | 10000 | 0 | 686.83 | 143 | 148 | 153 | 145.597 | 0.0 | 10.57 | 10.68 | 10.7 | 10.71 | 10.64 | 130.77 | 0.34 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |

scripted_mode_bert_neuronx_batch_8

| version | Benchmark | Batch size | Batch delay | Workers | Model | Concurrency | Input | Requests | TS failed requests | TS throughput | TS latency P50 | TS latency P90 | TS latency P99 | TS latency mean | TS error rate | Model_p50 | Model_p90 | Model_p99 | predict_mean | handler_time_mean | waiting_time_mean | worker_thread_mean | cpu_percentage_mean | memory_percentage_mean | gpu_percentage_mean | gpu_memory_percentage_mean | gpu_memory_used_mean |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| torchserve-nightly==2023.4.27 | AB | 8 | 100 | 2 | .mar | 100 | input | 10000 | 0 | 716.99 | 134 | 149 | 154 | 139.472 | 0.0 | 20.36 | 20.68 | 20.71 | 20.67 | 20.6 | 114.37 | 0.69 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
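Across the four runs, TS throughput improves sub-linearly with batch size while mean latency drops. A quick sketch of the scaling computed from the reported numbers (throughput values copied from the tables above; the helper itself is illustrative):

```python
# TS throughput (req/s) per batch size, copied from the benchmark tables above.
throughput = {1: 518.99, 2: 634.03, 4: 686.83, 8: 716.99}

def speedup_vs_batch1(tp):
    """Throughput of each batch size relative to batch size 1."""
    base = tp[1]
    return {bs: round(v / base, 2) for bs, v in tp.items()}

print(speedup_vs_batch1(throughput))  # {1: 1.0, 2: 1.22, 4: 1.32, 8: 1.38}
```

So going from batch 1 to batch 8 yields roughly a 1.4x throughput gain on this inf2.8xlarge runner, with diminishing returns past batch 4.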

Checklist:

  • Did you have fun?
  • Have you added tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

codecov bot commented on Apr 28, 2023

Codecov Report

Merging #2283 (28b482b) into master (f01868f) will increase coverage by 0.42%.
The diff coverage is n/a.

❗ Current head 28b482b differs from pull request most recent head 4146b29. Consider uploading reports for the commit 4146b29 to get more accurate results

@@            Coverage Diff             @@
##           master    #2283      +/-   ##
==========================================
+ Coverage   69.39%   69.82%   +0.42%     
==========================================
  Files          77       77              
  Lines        3441     3420      -21     
  Branches       57       57              
==========================================
  Hits         2388     2388              
+ Misses       1050     1029      -21     
  Partials        3        3              

see 2 files with indirect coverage changes


@namannandan namannandan marked this pull request as ready for review April 28, 2023 22:00
@agunapal (Collaborator) left a comment


@namannandan If the expected transformers version is 4.19.0, where is this being set?

@namannandan (Collaborator, Author) replied:

@agunapal the issue with the transformers version is only observed when tracing the model. Loading the traced model and running inference work as expected even with more recent versions of transformers.

@agunapal (Collaborator) left a comment


@namannandan Is the issue with the validate_benchmark.py resolved now?

@namannandan (Collaborator, Author) replied:

@agunapal benchmark validation is currently still failing. Tracking it here: #2318

@namannandan (Collaborator, Author) replied:

Successful benchmark run with validation: https://github.com/pytorch/serve/actions/runs/4986426850

@namannandan merged commit 25f3700 into pytorch:master on May 16, 2023