BERT nightly benchmark on Inferentia1 #2167
Conversation
Codecov Report

```
@@            Coverage Diff             @@
##           master    #2167      +/-   ##
==========================================
+ Coverage   71.41%   71.45%   +0.03%
==========================================
  Files          73       73
  Lines        3296     3296
  Branches       57       57
==========================================
+ Hits         2354     2355       +1
+ Misses        942      941       -1
```

See 2 files with indirect coverage changes.
Please file a ticket with the Inferentia team about the `transformers==4.6.0` issue.
@namannandan If we are going to run this nightly, it would be good to add it to the benchmark_gpu workflow and modify it to run on both machines with a matrix strategy. The yaml file can be branched with an if/else condition.
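A minimal sketch of what that consolidation might look like. This is an illustration only: the job name, runner labels, config file name, and benchmark entry point here are hypothetical, not taken from the actual workflow files in this PR.

```yaml
jobs:
  benchmark:
    strategy:
      fail-fast: false
      matrix:
        # hypothetical runner labels; the real workflow's labels may differ
        hardware: [cpu, gpu, inf1]
    runs-on: [self-hosted, "${{ matrix.hardware }}"]
    steps:
      - name: Benchmark nightly
        # branch per platform with an `if` condition instead of
        # maintaining separate workflow files per hardware type
        if: matrix.hardware == 'inf1'
        run: python benchmarks/auto_benchmark.py --input inf1_config.yaml
```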
@lxning The Inferentia team now has an internal ticket to track the issue. @agunapal since these models are traced and intended to run on Inferentia1, which is an additional hardware platform alongside CPU and GPU, would it make sense to maintain this in a separate workflow? Unless there is a downside to creating separate workflows for different hardware platforms. Please let me know your thoughts.
@namannandan you can test the workflows you created in this PR and link the results here.
Force-pushed from 8c673c4 to a10368d: Consolidate neuron benchmark model config files into a single file for BERT; set the NEURON_RT_NUM_CORES value as a string in the inf1 nightly benchmark workflow file.
```
sudo apt-get install -y apache2-utils
pip install -r benchmarks/requirements-ab.txt
export omp_num_threads=1
sudo apt-get update -y
```
Thanks for consolidating the yml files.
L41-43 is common across platforms. This can sit outside the if block.
@msaroufim @min-jean-cho Is `export omp_num_threads=1` applicable to CPU benchmarks only?
Yes, currently it's only applicable to CPU benchmarks. By the way, I noticed `export omp_num_threads=1` doesn't correctly set `OMP_NUM_THREADS` to `1`; see #2151. You may want to double check in the GitHub Action.
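One plausible reason for this (an inference from the lowercase spelling in the workflow above, not a confirmed diagnosis from #2151): environment variable names are case-sensitive, so exporting the lowercase `omp_num_threads` defines a different variable than the uppercase `OMP_NUM_THREADS` that OpenMP actually reads. A quick shell demonstration:

```shell
# Env var names are case-sensitive: exporting the lowercase name
# leaves the uppercase one (the one OpenMP reads) unset.
unset OMP_NUM_THREADS
export omp_num_threads=1
echo "omp_num_threads: ${omp_num_threads:-unset}"   # prints: omp_num_threads: 1
echo "OMP_NUM_THREADS: ${OMP_NUM_THREADS:-unset}"   # prints: OMP_NUM_THREADS: unset
```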
@agunapal makes sense, I'll update the install dependencies step.
@min-jean-cho thanks for spotting that! I'll fix the env var.
Thanks @namannandan, by the way I recall that setting the environment variable with `export OMP_NUM_THREADS=1` did not seem to correctly set the number of threads: https://github.com/pytorch/serve/pull/2151/files#diff-cac3a24029ba9498c7e1735f8fc6e65b5a8a090d7f015bd3c35051f57a9981caR178-R179. You may also want to double check, thanks!
Ah I see, I'll double check that. I wonder if using the `env` key in the `Benchmark cpu nightly` step would do the trick. It is documented here. I'll try this method as well.
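The `env` key approach might look something like this. The step name comes from the thread; the run command and config path are hypothetical placeholders.

```yaml
- name: Benchmark cpu nightly
  env:
    # Applied to every process launched by this step, unlike a plain
    # `export` inside a multi-line run script, which only affects that
    # shell and can be lost or mistyped.
    OMP_NUM_THREADS: 1
  run: python benchmarks/auto_benchmark.py --input cpu_config.yaml
```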
Looks like using the `env` key works as expected: https://github.com/pytorch/serve/actions/runs/4410687740/workflow#L48

CPU benchmark logs:
```
OMP_NUM_THREADS: 1
torch.get_num_threads: 1
NEURON_RT_NUM_CORES:
```
GPU benchmark logs:
```
OMP_NUM_THREADS:
torch.get_num_threads: 24
NEURON_RT_NUM_CORES:
```
Inf1 benchmark logs:
```
OMP_NUM_THREADS:
torch.get_num_threads: 12
NEURON_RT_NUM_CORES: 4
```
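The env-var lines in those logs can be reproduced with a small shell snippet (the `torch.get_num_threads` line comes from Python and is not covered here); an unset variable expands to an empty string, matching the blank values in the gpu and inf1 output:

```shell
# Print env vars the way the benchmark logs them; unset vars show as empty.
echo "OMP_NUM_THREADS: ${OMP_NUM_THREADS:-}"
echo "NEURON_RT_NUM_CORES: ${NEURON_RT_NUM_CORES:-}"
```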
Add necessary env variables for cpu and inf1; disable fail-fast so all benchmarks run even if one of them fails.
LGTM
Successful consolidated benchmark workflow run: https://github.com/pytorch/serve/actions/runs/4451500075
* BERT nightly benchmark on Inferentia1
* Consolidate neuron benchmark model config files into a single file for BERT; set the NEURON_RT_NUM_CORES value as a string in the inf1 nightly benchmark workflow file
* Update transformer model downloader documentation
* test workflow before merge
* Consolidate benchmark workflows
* Update runs-on syntax
* Remove hardware specific benchmark workflow files
* Consolidate install dependencies step; add necessary env variables for cpu and inf1; disable fail-fast to enable all benchmarks to run even if one of them fails
* update documentation

Co-authored-by: Naman Nandan <namannan@amazon.com>
Description
Benchmark BERT model on Inferentia1 instance
Model artifacts:
Self-hosted runner (inf1.6xlarge):
Type of change
Feature testing
Checkpoint file generation
Note: The artifacts above were traced using `transformers` version `4.6.0`, as documented in the Inferentia tutorial. With more recent `transformers` versions, the traced model for Neuron may generate incorrect inference results (model output is `NaN`).
MAR file generation
Benchmark run
Workflow test
Test branch: test-neuron-benchmark-workflow
Successful workflow run and artifacts: https://github.com/pytorch/serve/actions/runs/4352309159
Benchmark results:
TorchServe Benchmark on neuron
Date: 2023-03-07 23:18:08
TorchServe Version: torchserve-nightly==2023.3.6
scripted_mode_bert_neuron_batch_1
scripted_mode_bert_neuron_batch_2
scripted_mode_bert_neuron_batch_4
scripted_mode_bert_neuron_batch_8
Consolidated benchmark workflow test
Test branch: test-neuron-benchmark-workflow
Successful workflow run: https://github.com/pytorch/serve/actions/runs/4400212613
Regression test
CPU: neuron_benchmark_regression_log_cpu.txt
GPU: neuron_benchmark_regression_log_gpu.txt
Checklist: