
BERT nightly benchmark on Inferentia2 #2283

Merged 2 commits into pytorch:master on May 16, 2023

Conversation

@namannandan (Collaborator) commented on Apr 28, 2023

Description

Benchmark BERT model on Inferentia2 instance

Model artifacts:

Self-hosted runner (inf2.8xlarge):

  • 32 vCPUs
  • 2 Inferentia2 chips (2 neuron cores per chip)

Type of change

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Feature testing

Checkpoint file generation

Note: The artifacts above were traced using transformers version 4.19.0. With more recent transformers versions, the model traced for Neuron may produce incorrect inference results (the model output is NaN).
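As a quick sanity check for the NaN symptom described above, a traced model's logits can be scanned before packaging. This is a minimal, hypothetical helper (`has_nan` is not part of the repo), assuming the outputs have been converted to nested Python lists:

```python
import math

def has_nan(values):
    """Return True if any element in a (possibly nested) list of floats is NaN."""
    for v in values:
        if isinstance(v, (list, tuple)):
            if has_nan(v):
                return True
        elif math.isnan(v):
            return True
    return False

# A healthy sequence-classification output has finite logits per label:
print(has_nan([[0.12, -1.7], [2.3, 0.05]]))  # False
print(has_nan([[float("nan"), -1.7]]))       # True
```

Running such a check right after tracing catches the bad-transformers-version case before the checkpoint is archived into a MAR file.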

$ cd examples/Huggingface_Transformers/
$ cat setup_config.json
{
 "model_name":"bert-base-uncased",
 "mode":"sequence_classification",
 "do_lower_case":true,
 "num_labels":"2",
 "save_mode":"torchscript",
 "max_length":"150",
 "captum_explanation":false,
 "embedding_name": "bert",
 "FasterTransformer":false,
 "BetterTransformer":false,
 "model_parallel":false,
 "hardware": "neuronx",
 "batch_size": "2"
}
$ python Download_Transformer_models.py setup_config.json
$ ls Transformer_model/
traced_bert-base-uncased_model_neuronx_batch_2.pt
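Note that several numeric fields in the config above ("num_labels", "max_length", "batch_size") are stored as JSON strings and coerced at load time. A small sketch of parsing such a config (a hypothetical helper, not the repo's actual loader, assuming the stringly-typed layout shown above):

```python
import json

# Fields that setup_config.json stores as strings but that are numeric in use.
INT_FIELDS = ("num_labels", "max_length", "batch_size")

def load_setup_config(text):
    """Parse a setup_config.json payload and coerce stringly-typed ints."""
    cfg = json.loads(text)
    for key in INT_FIELDS:
        if key in cfg:
            cfg[key] = int(cfg[key])
    return cfg

cfg = load_setup_config("""
{"model_name": "bert-base-uncased", "hardware": "neuronx",
 "num_labels": "2", "max_length": "150", "batch_size": "2"}
""")
print(cfg["model_name"], cfg["batch_size"])  # bert-base-uncased 2
```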

MAR file generation

$ cat requirements.txt
torch-neuronx
$ torch-model-archiver --model-name BERTSeqClassification_torchscript_neuronx_batch_2 --version 1.0 --serialized-file ./examples/Huggingface_Transformers/Transformer_model/traced_bert-base-uncased_model_neuronx_batch_2.pt --handler ./examples/Huggingface_Transformers/Transformer_handler_generalized_neuron.py --extra-files "./examples/Huggingface_Transformers/setup_config.json,./examples/Huggingface_Transformers/Seq_classification_artifacts/index_to_name.json,./examples/Huggingface_Transformers/Transformer_handler_generalized.py" --requirements-file requirements.txt
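The archiver invocation above is repeated for batch sizes 1, 2, 4, and 8; the model name and serialized-file path differ only in the batch suffix. A hypothetical helper sketching that naming convention (the function and defaults are illustrative, not part of the repo):

```python
def artifact_names(model="BERTSeqClassification", framework="torchscript",
                   hardware="neuronx", batch_size=2):
    """Derive the MAR model name and traced-checkpoint filename per batch size."""
    mar = f"{model}_{framework}_{hardware}_batch_{batch_size}"
    pt = f"traced_bert-base-uncased_model_{hardware}_batch_{batch_size}.pt"
    return mar, pt

for bs in (1, 2, 4, 8):
    print(artifact_names(batch_size=bs)[0])
```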

Workflow test

Test branch: test-inf2-benchmark
Workflow run and artifacts: https://github.com/pytorch/serve/actions/runs/4834127396
(Artifacts and metrics are being published, but validation currently fails.)

Benchmark results:

TorchServe Benchmark on neuronx

Date: 2023-04-28 20:57:15

TorchServe Version: torchserve-nightly==2023.4.27

scripted_mode_bert_neuronx_batch_1

| version | Benchmark | Batch size | Batch delay | Workers | Model | Concurrency | Input | Requests | TS failed requests | TS throughput | TS latency P50 | TS latency P90 | TS latency P99 | TS latency mean | TS error rate | Model_p50 | Model_p90 | Model_p99 | predict_mean | handler_time_mean | waiting_time_mean | worker_thread_mean | cpu_percentage_mean | memory_percentage_mean | gpu_percentage_mean | gpu_memory_percentage_mean | gpu_memory_used_mean |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| torchserve-nightly==2023.4.27 | AB | 1 | 100 | 2 | .mar | 100 | input | 10000 | 0 | 518.99 | 190 | 194 | 248 | 192.682 | 0.0 | 3.43 | 3.45 | 3.46 | 3.48 | 3.41 | 187.23 | 0.17 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |

scripted_mode_bert_neuronx_batch_2

| version | Benchmark | Batch size | Batch delay | Workers | Model | Concurrency | Input | Requests | TS failed requests | TS throughput | TS latency P50 | TS latency P90 | TS latency P99 | TS latency mean | TS error rate | Model_p50 | Model_p90 | Model_p99 | predict_mean | handler_time_mean | waiting_time_mean | worker_thread_mean | cpu_percentage_mean | memory_percentage_mean | gpu_percentage_mean | gpu_memory_percentage_mean | gpu_memory_used_mean |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| torchserve-nightly==2023.4.27 | AB | 2 | 100 | 2 | .mar | 100 | input | 10000 | 0 | 634.03 | 155 | 157 | 199 | 157.722 | 0.0 | 5.64 | 5.71 | 5.72 | 5.71 | 5.64 | 148.06 | 0.25 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |

scripted_mode_bert_neuronx_batch_4

| version | Benchmark | Batch size | Batch delay | Workers | Model | Concurrency | Input | Requests | TS failed requests | TS throughput | TS latency P50 | TS latency P90 | TS latency P99 | TS latency mean | TS error rate | Model_p50 | Model_p90 | Model_p99 | predict_mean | handler_time_mean | waiting_time_mean | worker_thread_mean | cpu_percentage_mean | memory_percentage_mean | gpu_percentage_mean | gpu_memory_percentage_mean | gpu_memory_used_mean |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| torchserve-nightly==2023.4.27 | AB | 4 | 100 | 2 | .mar | 100 | input | 10000 | 0 | 686.83 | 143 | 148 | 153 | 145.597 | 0.0 | 10.57 | 10.68 | 10.7 | 10.71 | 10.64 | 130.77 | 0.34 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |

scripted_mode_bert_neuronx_batch_8

| version | Benchmark | Batch size | Batch delay | Workers | Model | Concurrency | Input | Requests | TS failed requests | TS throughput | TS latency P50 | TS latency P90 | TS latency P99 | TS latency mean | TS error rate | Model_p50 | Model_p90 | Model_p99 | predict_mean | handler_time_mean | waiting_time_mean | worker_thread_mean | cpu_percentage_mean | memory_percentage_mean | gpu_percentage_mean | gpu_memory_percentage_mean | gpu_memory_used_mean |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| torchserve-nightly==2023.4.27 | AB | 8 | 100 | 2 | .mar | 100 | input | 10000 | 0 | 716.99 | 134 | 149 | 154 | 139.472 | 0.0 | 20.36 | 20.68 | 20.71 | 20.67 | 20.6 | 114.37 | 0.69 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
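Across the four runs, TS throughput improves sub-linearly with batch size while mean latency drops. A quick sketch of the scaling computed from the reported numbers (throughput values copied from the tables above; the helper itself is illustrative):

```python
# TS throughput (req/s) per batch size, copied from the benchmark tables above.
throughput = {1: 518.99, 2: 634.03, 4: 686.83, 8: 716.99}

def speedup_vs_batch1(tp):
    """Throughput of each batch size relative to batch size 1."""
    base = tp[1]
    return {bs: round(v / base, 2) for bs, v in tp.items()}

print(speedup_vs_batch1(throughput))  # {1: 1.0, 2: 1.22, 4: 1.32, 8: 1.38}
```

So going from batch 1 to batch 8 yields roughly a 1.4x throughput gain on this inf2.8xlarge runner, with diminishing returns past batch 4.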

Checklist:

  • Did you have fun?
  • Have you added tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

codecov bot commented on Apr 28, 2023

Codecov Report

Merging #2283 (28b482b) into master (f01868f) will increase coverage by 0.42%.
The diff coverage is n/a.

❗ Current head 28b482b differs from pull request most recent head 4146b29. Consider uploading reports for the commit 4146b29 to get more accurate results

@@            Coverage Diff             @@
##           master    #2283      +/-   ##
==========================================
+ Coverage   69.39%   69.82%   +0.42%     
==========================================
  Files          77       77              
  Lines        3441     3420      -21     
  Branches       57       57              
==========================================
  Hits         2388     2388              
+ Misses       1050     1029      -21     
  Partials        3        3              

see 2 files with indirect coverage changes


@namannandan namannandan marked this pull request as ready for review April 28, 2023 22:00
@agunapal (Collaborator) left a comment


@namannandan If the expected transformers version is 4.19.0, where is this being set?

@namannandan (Collaborator, Author) replied:

@agunapal the issue with the transformers version is only observed when tracing the model. Loading the traced model and running inference work as expected even with more recent versions of transformers.

@agunapal (Collaborator) left a comment


@namannandan Is the issue with the validate_benchmark.py resolved now?

@namannandan (Collaborator, Author) replied:

@agunapal benchmark validation is currently still failing. Tracking it here: #2318

@namannandan (Collaborator, Author) replied:

Successful benchmark run with validation: https://github.com/pytorch/serve/actions/runs/4986426850

@namannandan merged commit 25f3700 into pytorch:master on May 16, 2023