[1.x] TensorRT: add INT8 with calibration #19011
Conversation
Hey @Kh4L, thanks for submitting the PR.
CI supported jobs: [website, centos-cpu, unix-gpu, miscellaneous, windows-gpu, centos-gpu, clang, unix-cpu, windows-cpu, edge, sanity]
export CUDNN_VERSION=${CUDNN_VERSION:-7.0.3}
export MXNET_ENABLE_CYTHON=0
export DMLC_LOG_STACK_TRACE_DEPTH=10
pip3 install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali-cuda100==0.24
@leezu could you assist in reviewing this PR? I think the structure w.r.t. installation of packages and creating a new job might not be in line with existing conventions.
this address returns 404 for me: https://developer.download.nvidia.com/compute/redist
https://developer.download.nvidia.com/compute/redist isn't supposed to be accessed by itself.
The pip install line is the standard way of installing DALI:
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/installation.html
Force-pushed from 8090dd1 to 99efa1c
@mxnet-bot run ci [sanity]
Jenkins CI successfully triggered: [sanity]
@mxnet-bot run ci [unix-gpu]
Jenkins CI successfully triggered: [unix-gpu]
Reviewed the subgraph property portions; they look good to me. Please find someone to review the TRT-specific changes (maybe @ptrendx or @KellenSunderland) and the CI changes (maybe @josephevans?) too.
pip3 install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali-cuda100==0.24
wget -nc http://data.mxnet.io/data/val_256_q90.rec
python3.6 tests/python/tensorrt/rec2idx.py val_256_q90.rec val_256_q90.idx
nosetests-3.4 $NOSE_COVERAGE_ARGUMENTS $NOSE_TIMER_ARGUMENTS --with-xunit --xunit-file nosetests_trt_gpu.xml --verbose --nocapture tests/python/tensorrt/
In MXNet we switched to pytest and are no longer using nose.
I missed that discussion; can you point me to the RFC so I can catch up?
I forgot to add the [1.x] tag; this is a 1.x PR, where we still use nosetests.
@szha is it fine to leave this as nosetests, since this is the 1.8 branch?
@@ -0,0 +1,107 @@
# Licensed to the Apache Software Foundation (ASF) under one
DALI has this script; why include it here too?
Signed-off-by: Serge Panev <spanev@nvidia.com>
LGTM
Description
This PR adds INT8 with calibration support to MXNet-TensorRT.
It enables TensorRT's internal optimizer to create an INT8 engine (one that will contain INT8 kernels wherever they are faster than the FP16 or FP32 ones).
In this first version, the quantization and de-quantization values are computed during a calibration phase. During this phase (whose number of iterations is set by `calibration_iters`), the user is expected to provide samples representative of the inference data, which are used to calibrate the engine. The inference model is slower during this phase. Once the calibration is done, the MXNet-TensorRT inference model is ready for fast INT8 inference.
Saving and loading of the calibration tables will be added in a later PR.
Usage
We set `calibration_iters` to the number of batches we can feed from the calibration dataset.
For instance:
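(The original snippet didn't survive the page capture; here is a minimal sketch, with the dataset size and batch size as illustrative numbers.)

```python
# Hypothetical calibration set of 500 samples with batch size 32.
num_calib_samples = 500
batch_size = 32

# Use every full batch we can feed from the calibration dataset.
calibration_iters = num_calib_samples // batch_size  # -> 15
```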
We call optimize_for:
Symbolic
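(Sketch only, since the original snippet is missing. It assumes the INT8 options are passed as backend options to `optimize_for`; the `precision` option name and the checkpoint are illustrative, and only `calibration_iters` is confirmed by this description.)

```python
import mxnet as mx

# Load a pretrained symbolic model (checkpoint name is illustrative).
sym, arg_params, aux_params = mx.model.load_checkpoint('resnet50_v2', 0)

# Partition the graph for TensorRT and request INT8 with calibration.
trt_sym = sym.optimize_for('TensorRT', args=arg_params, aux=aux_params,
                           ctx=mx.gpu(0), precision='int8',
                           calibration_iters=calibration_iters)
```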
Gluon
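(Likewise a sketch; it assumes the same options are forwarded as `hybridize` backend options, which may differ from the actual API.)

```python
from mxnet.gluon.model_zoo import vision

net = vision.resnet18_v2(pretrained=True, ctx=mx.gpu(0))

# Assumed: backend options are forwarded to the TensorRT subgraph backend.
net.hybridize(backend='TensorRT',
              backend_opts={'precision': 'int8',
                            'calibration_iters': calibration_iters})
```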
We create the executor and we feed the calibration data:
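(Sketch, assuming a `simple_bind`-style executor and an iterator over the calibration batches; shapes and names are illustrative.)

```python
# Bind an executor for the partitioned symbol (input shape is illustrative).
executor = trt_sym.simple_bind(ctx=mx.gpu(0),
                               data=(batch_size, 3, 224, 224),
                               grad_req='null', force_rebind=True)

# Each forward pass during the calibration phase feeds the TensorRT calibrator.
for _ in range(calibration_iters):
    batch = next(calib_data_iter)  # assumed iterator yielding input NDArrays
    executor.forward(is_train=False, data=batch)
    executor.outputs[0].wait_to_read()
```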
Calibration is slower than regular inference. Once it's done, we get an info message on stdout.
The executor with TRT INT8 engines is ready!