BERT nightly benchmark on Inferentia1 #2167
Conversation
Codecov Report

```
@@            Coverage Diff             @@
##           master    #2167      +/-   ##
==========================================
+ Coverage   71.41%   71.45%   +0.03%
==========================================
  Files          73       73
  Lines        3296     3296
  Branches       57       57
==========================================
+ Hits         2354     2355       +1
+ Misses        942      941       -1
```

See 2 files with indirect coverage changes.
Please file a ticket with the Inferentia team about the `transformers==4.6.0` issue.
@namannandan If we are going to run this nightly, it would be good to add it to the benchmark_gpu workflow and modify it to run on both machines with a matrix strategy. The yaml file can be branched with an if/else condition.
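A minimal sketch of what that consolidation might look like. This is an illustration only: the job name, runner labels, config file name, and benchmark entry point here are hypothetical, not taken from the actual workflow files in this PR.

```yaml
jobs:
  benchmark:
    strategy:
      fail-fast: false
      matrix:
        # hypothetical runner labels; the real workflow's labels may differ
        hardware: [cpu, gpu, inf1]
    runs-on: [self-hosted, "${{ matrix.hardware }}"]
    steps:
      - name: Benchmark nightly
        # branch per platform with an `if` condition instead of
        # maintaining separate workflow files per hardware type
        if: matrix.hardware == 'inf1'
        run: python benchmarks/auto_benchmark.py --input inf1_config.yaml
```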
@lxning The Inferentia team now has an internal ticket to track the issue. @agunapal since these models are traced and intended to run on Inferentia1, which is an additional hardware platform alongside CPU and GPU, would it make sense to maintain this in a separate workflow? Unless there is a downside to creating separate workflows for different hardware platforms. Please let me know your thoughts.
@namannandan you can test the workflows you created in this PR and link the results here.
Force-pushed from 8c673c4 to a10368d: Consolidate neuron benchmark model config files into a single file for BERT; set the NEURON_RT_NUM_CORES value as a string in the inf1 nightly benchmark workflow file.
```
sudo apt-get install -y apache2-utils
pip install -r benchmarks/requirements-ab.txt
export omp_num_threads=1
sudo apt-get update -y
```
Thanks for consolidating the yml files.
L41-43 is common across platforms. This can sit outside the if block.
@msaroufim @min-jean-cho Is `export omp_num_threads=1` applicable to CPU benchmarks only?
Yes, currently it's only applicable to CPU benchmarks. By the way, I noticed `export omp_num_threads=1` doesn't correctly set `OMP_NUM_THREADS` to `1`; see #2151. You may want to double check in the GitHub Action.
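One plausible reason for this (an inference from the lowercase spelling in the workflow above, not a confirmed diagnosis from #2151): environment variable names are case-sensitive, so exporting the lowercase `omp_num_threads` defines a different variable than the uppercase `OMP_NUM_THREADS` that OpenMP actually reads. A quick shell demonstration:

```shell
# Env var names are case-sensitive: exporting the lowercase name
# leaves the uppercase one (the one OpenMP reads) unset.
unset OMP_NUM_THREADS
export omp_num_threads=1
echo "omp_num_threads: ${omp_num_threads:-unset}"   # prints: omp_num_threads: 1
echo "OMP_NUM_THREADS: ${OMP_NUM_THREADS:-unset}"   # prints: OMP_NUM_THREADS: unset
```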
@agunapal makes sense, I'll update the install dependencies step.
@min-jean-cho thanks for spotting that! I'll fix the env var.
Thanks @namannandan, by the way I recall that setting the environment variable with `export OMP_NUM_THREADS=1` did not seem to correctly set the number of threads: https://github.com/pytorch/serve/pull/2151/files#diff-cac3a24029ba9498c7e1735f8fc6e65b5a8a090d7f015bd3c35051f57a9981caR178-R179. You may also want to double check, thanks!
Ah I see, I'll double check that. I wonder if using the `env` key in the `Benchmark cpu nightly` step would do the trick. It is documented here. I'll try this method as well.
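The `env` key approach might look something like this. The step name comes from the thread; the run command and config path are hypothetical placeholders.

```yaml
- name: Benchmark cpu nightly
  env:
    # Applied to every process launched by this step, unlike a plain
    # `export` inside a multi-line run script, which only affects that
    # shell and can be lost or mistyped.
    OMP_NUM_THREADS: 1
  run: python benchmarks/auto_benchmark.py --input cpu_config.yaml
```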
Looks like using the `env` key works as expected: https://github.com/pytorch/serve/actions/runs/4410687740/workflow#L48

CPU benchmark logs:
```
OMP_NUM_THREADS: 1
torch.get_num_threads: 1
NEURON_RT_NUM_CORES:
```
GPU benchmark logs:
```
OMP_NUM_THREADS:
torch.get_num_threads: 24
NEURON_RT_NUM_CORES:
```
Inf1 benchmark logs:
```
OMP_NUM_THREADS:
torch.get_num_threads: 12
NEURON_RT_NUM_CORES: 4
```
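The env-var lines in those logs can be reproduced with a small shell snippet (the `torch.get_num_threads` line comes from Python and is not covered here); an unset variable expands to an empty string, matching the blank values in the gpu and inf1 output:

```shell
# Print env vars the way the benchmark logs them; unset vars show as empty.
echo "OMP_NUM_THREADS: ${OMP_NUM_THREADS:-}"
echo "NEURON_RT_NUM_CORES: ${NEURON_RT_NUM_CORES:-}"
```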
Add necessary env variables for cpu and inf1; disable fail-fast so all benchmarks run even if one of them fails.
LGTM
Successful consolidated benchmark workflow run: https://github.com/pytorch/serve/actions/runs/4451500075
* BERT nightly benchmark on Inferentia1
* Consolidate neuron benchmark model config files into a single file for BERT; set the NEURON_RT_NUM_CORES value as a string in the inf1 nightly benchmark workflow file
* Update transformer model downloader documentation
* test workflow before merge
* Consolidate benchmark workflows
* Update runs-on syntax
* Remove hardware specific benchmark workflow files
* Consolidate install dependencies step; add necessary env variables for cpu and inf1; disable fail-fast to enable all benchmarks to run even if one of them fails
* update documentation

Co-authored-by: Naman Nandan <namannan@amazon.com>
Description
Benchmark BERT model on Inferentia1 instance
Model artifacts:
Self-hosted runner (inf1.6xlarge):
Type of change
Feature testing
Checkpoint file generation
Note: The artifacts above were traced using `transformers` version `4.6.0`, as documented in the Inferentia tutorial. With more recent `transformers` versions, the traced model for Neuron may generate incorrect inference results (model output is `NaN`).
MAR file generation
Benchmark run
Workflow test
Test branch: test-neuron-benchmark-workflow
Successful workflow run and artifacts: https://github.com/pytorch/serve/actions/runs/4352309159
Benchmark results:
TorchServe Benchmark on neuron
Date: 2023-03-07 23:18:08
TorchServe Version: torchserve-nightly==2023.3.6
scripted_mode_bert_neuron_batch_1
scripted_mode_bert_neuron_batch_2
scripted_mode_bert_neuron_batch_4
scripted_mode_bert_neuron_batch_8
Consolidated benchmark workflow test
Test branch: test-neuron-benchmark-workflow
Successful workflow run: https://github.com/pytorch/serve/actions/runs/4400212613
Regression test
CPU: neuron_benchmark_regression_log_cpu.txt
GPU: neuron_benchmark_regression_log_gpu.txt
Checklist: