TensorRT OSS 21.04 release #1185

rajeevsrao · 2021-04-12T18:59:26Z

Added

SM86 kernels for BERT MHA plugin
Added opset13 support for SoftMax, LogSoftmax, Squeeze, and Unsqueeze.
Added support for the EyeLike and GatherElements operators.

Changed

Updated TensorRT version to v7.2.3.4.
Updated to ONNX-TensorRT 21.03
ONNX-GraphSurgeon (v0.3.4) - updates fold_constants to correctly exit early.
Set default CUDA_INSTALL_DIR #798
Plugin bugfixes, qkv kernels for sm86
Fixed GroupNorm CMakeFile for cu sources #1083
Permit groupadd with non-unique GID in build containers #1091
Avoid reinterpret_cast #146
Clang-format plugins and samples
Avoid arithmetic on void pointer in multilevelProposeROIPlugin.cpp #1028
Update BERT plugin documentation.

Removed

Removes extra terminate call in InstanceNorm

The PriorBox plugin serialized CPU metadata (array size) A and GPU data (array elements) B' into the engine. B' is derived from the CPU array B when the object is constructed, so a deserialized object holds A and B', which differs from the original (A and B). If a new object is then created from the deserialized one via `PriorBox::clone()`, which rebuilds the GPU-side array elements from the CPU-side data A and B', the generated GPU data is incorrect (A and B''), resulting in wrong inference results. Because PriorBox is designed to track data in a specific format, we now serialize only the CPU data A and B, i.e. the parameters used to construct a PriorBox object, into the engine.

bad image processing with deserialized engine

1. Fixed the memory deallocation error in the plugin's `PriorBox::clone()` method, which occurred even without serialization, by initializing the empty pointer to nullptr.
2. Initialized weights to empty structs.
3. Added mParam.aspectRatios to serialization and deserialization, since mParam.aspectRatios differs from the aspectRatios device weights in both count and values.

Signed-off-by: Rajeev Rao <rajeevrao@nvidia.com>
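The round-trip problem described in this commit message can be illustrated with a small host-only sketch. Everything here is hypothetical: `ToyParams`, `ToyPriorBox`, and the transformation in the constructor are stand-ins invented for illustration, not the plugin's actual types or math. The point is only that serializing the constructor parameters (B), rather than the data derived from them (B'), keeps `deserialize()` and `clone()` consistent with the original object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical stand-in for the plugin's constructor parameters ("A and B").
struct ToyParams
{
    std::vector<float> aspectRatios; // B: the values exactly as supplied by the user
};

// Hypothetical stand-in for the plugin. mDerived plays the role of B',
// which the real plugin keeps as device weights.
class ToyPriorBox
{
public:
    explicit ToyPriorBox(ToyParams params)
        : mParam(std::move(params))
    {
        // Assumed transformation, for illustration only: the derived array
        // differs from the input in both count and values.
        mDerived.push_back(1.0f);
        for (float r : mParam.aspectRatios)
        {
            mDerived.push_back(r);
            mDerived.push_back(1.0f / r);
        }
    }

    // Serialize only the constructor parameters, never the derived array.
    std::vector<char> serialize() const
    {
        const std::size_t count = mParam.aspectRatios.size();
        std::vector<char> buffer(sizeof(count) + count * sizeof(float));
        std::memcpy(buffer.data(), &count, sizeof(count));
        if (count > 0)
        {
            std::memcpy(buffer.data() + sizeof(count), mParam.aspectRatios.data(), count * sizeof(float));
        }
        return buffer;
    }

    // Rebuild the object through the normal constructor, so the derived
    // array is regenerated from the original parameters.
    static ToyPriorBox deserialize(const std::vector<char>& buffer)
    {
        assert(buffer.size() >= sizeof(std::size_t));
        std::size_t count = 0;
        std::memcpy(&count, buffer.data(), sizeof(count));
        assert(buffer.size() == sizeof(count) + count * sizeof(float));

        ToyParams params;
        params.aspectRatios.resize(count);
        if (count > 0)
        {
            std::memcpy(params.aspectRatios.data(), buffer.data() + sizeof(count), count * sizeof(float));
        }
        return ToyPriorBox(std::move(params));
    }

    // clone() also goes through the constructor with the stored parameters.
    ToyPriorBox clone() const
    {
        return ToyPriorBox(mParam);
    }

    const std::vector<float>& derived() const
    {
        return mDerived;
    }

private:
    ToyParams mParam;
    std::vector<float> mDerived;
};
```

Had `serialize()` written `mDerived` instead, a deserialized object would treat the derived values as if they were constructor parameters, and a subsequent `clone()` would apply the transformation a second time, which is the A and B'' situation the message describes.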

Removes extra terminate call in InstanceNorm

562fc1a

Signed-off-by: Rajeev Rao <rajeevrao@nvidia.com>

rajeevsrao requested review from kevinch-nv and pranavm-nvidia April 12, 2021 18:59

pranavm-nvidia approved these changes Apr 12, 2021

rajeevsrao force-pushed the dev/21.04-release branch from 67a520c to 7cf5f18 April 12, 2021 19:08

rajeevsrao and others added 12 commits April 12, 2021 12:14

Update TensorRT versions to 7.2.3.4

6d20c94

Signed-off-by: Rajeev Rao <rajeevrao@nvidia.com>

Permit groupadd with non-unique GID in build containers

af1b827

Signed-off-by: Rajeev Rao <rajeevrao@nvidia.com>

Updates fold_constants to correctly exit early when there is nothing to fold

cdb402e

Signed-off-by: Rajeev Rao <rajeevrao@nvidia.com>

clang-format plugins and samples

42bbc2d

Signed-off-by: Rajeev Rao <rajeevrao@nvidia.com>

Update TensorRT headers to 7.2.3.4

77384c1

Signed-off-by: Rajeev Rao <rajeevrao@nvidia.com>

Allow MHA plugin to run on SM_86 as well

3102875

Signed-off-by: Rajeev Rao <rajeevrao@nvidia.com>

Avoid arithmetic on void pointer in multilevelProposeROIPlugin.cpp NVIDIA#1028

c2f95cc

Signed-off-by: Rajeev Rao <rajeevrao@nvidia.com>
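For context on why this kind of change is needed: `void` has no size, so adding an offset to a `void*` is ill-formed in standard C++ and is only accepted by some compilers as an extension. A minimal sketch of the usual remedy follows. The helper names `offsetBytes` and `offsetElements` are hypothetical and are not taken from the TensorRT sources; the actual change in the plugin may look different.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: advance an untyped pointer by a byte count.
// Casting to a byte-sized type first gives the addition a well-defined stride.
inline void* offsetBytes(void* base, std::size_t numBytes)
{
    assert(base != nullptr);
    return static_cast<std::uint8_t*>(base) + numBytes;
}

// Hypothetical typed variant: step over `count` elements of type T.
template <typename T>
inline T* offsetElements(void* base, std::size_t count)
{
    assert(base != nullptr);
    return static_cast<T*>(base) + count;
}
```

For example, `offsetElements<float>(workspace, n)` yields a pointer to the n-th float in an untyped buffer named `workspace`, with the stride stated explicitly in the code rather than left to a compiler extension.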

Samples refresh and bugfixes

4d65c80

Signed-off-by: Rajeev Rao <rajeevrao@nvidia.com>

fix doc for bert plugins

5f6ac20

Signed-off-by: Rajeev Rao <rajeevrao@nvidia.com>

Update ONNX parser

0fa021a

Signed-off-by: Rajeev Rao <rajeevrao@nvidia.com>

Add varlen MHA fp16 slen=384 kernels for sm_86

3ea9099

1. Add varlen MHA fp16 slen=384 kernel for sm_86.
2. Refresh all sm_86 kernels; they now use NVCC `-gencode=arch=compute_86,code="sm_86"`.
3. Use unfused kernel for fixed-length s=384 fp16.

Signed-off-by: Rajeev Rao <rajeevrao@nvidia.com>

rajeevsrao force-pushed the dev/21.04-release branch from 7cf5f18 to 3ea9099 April 12, 2021 19:14

TensorRT OSS 21.04 release - update Changelog

8e4af19

Signed-off-by: Rajeev Rao <rajeevrao@nvidia.com>

kevinch-nv approved these changes Apr 12, 2021

rajeevsrao merged commit 4c99d07 into NVIDIA:master Apr 12, 2021

rajeevsrao deleted the dev/21.04-release branch August 23, 2023 22:23
