This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-703] TensorRT runtime integration #11325

Merged
merged 47 commits into from
Aug 10, 2018

Conversation

mkolod
Contributor

@mkolod mkolod commented Jun 18, 2018

Description

This PR introduces runtime integration of TensorRT into MXNet to accelerate inference.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • Test coverage:
    • the end-to-end application of the graph partitioner, NNVM-to-ONNX converter, and TensorRT inference is covered by a test (LeNet-5 on MNIST). Some intermediate methods aren't tested, but end-to-end is.
  • CI build updates
    • There is a new Docker build for TensorRT, along with a unit test.
  • Code is well-documented:
    • Code is fairly documented
    • There is additional documentation regarding the design here.
    • A README discusses usage

Changes

  • NNVM graph partitioner for TensorRT graph compatibility
  • NNVM-to-ONNX converter (for common layers such as convolution, batchnorm, pooling, fully-connected, etc.)
  • Git submodule to support the ONNX-to-TensorRT graph converter
  • Makefile updates to support linking against ONNX, Protobuf (for ONNX) and TensorRT
  • Code guards to only enable the TensorRT-inclusive build when the USE_TENSORRT flag is added to config.mk
  • End-to-end (training/inference) unit test for LeNet-5 on MNIST
  • Tests of Gluon CV zoo models converted to Symbol API graphs (ResNet v1/v2, ResNeXt, SSD)

Comments

  • The TensorRT 3.0 package only works with CUDA up to 9.0. TensorRT 4.0 is in RC and will soon be GA, and it will support CUDA 9.1 onwards. The build was tested with both TensorRT 3.0 GA and 4.0 RC, but won't be updated to TensorRT 4.0 until it becomes GA.

@mkolod
Contributor Author

mkolod commented Jun 18, 2018

@KellenSunderland ^^ in case you're interested.

@mkolod
Contributor Author

mkolod commented Jun 18, 2018

Also, I'd like to add that this PR represents joint work with @Caenorst. All intermediate commits got squashed into one, but his significant contributions are attributed in the comments, and I wanted to mention his work here as well.

@reminisce
Contributor

@mkolod It would be useful if you could document the criteria adopted in this PR for finding subgraphs. Is the process solely based upon matching operator names, or are certain network structures/patterns factored in? It seems this is not explained in the design doc. Could you also answer the questions there? Thanks.
https://docs.google.com/document/d/1UbsUacxWRKXCEE6v0r4VmKL76QLmFQYgMyAcQP0I8U0/edit

@mkolod
Contributor Author

mkolod commented Jun 21, 2018

@reminisce Thanks for the request. @Caenorst wrote the partitioner and he will answer the questions regarding that component. Hence, the design doc will be updated. I originally wanted to put the whole design doc on the MXNet Confluence page, but I didn't have write permissions. If anyone does and feels like migrating the doc there, that would probably be quite useful for long-term storage of the documentation.
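In the meantime, a name-based compatibility check — one plausible criterion for the question above — might look like the sketch below. The op set and function names are illustrative only, not the PR's actual partitioner:

```python
# Hypothetical sketch: decide TensorRT compatibility by operator name
# alone, then group a topologically ordered op list into maximal runs
# of compatible ops. The real partitioner works on the NNVM graph.
TRT_COMPATIBLE_OPS = {"Convolution", "BatchNorm", "Pooling",
                      "FullyConnected", "Activation", "Flatten"}

def is_trt_compatible(op_name):
    """Return True if the operator could be offloaded to TensorRT."""
    return op_name in TRT_COMPATIBLE_OPS

def partition_by_name(op_names):
    """Split an op sequence into maximal runs of TRT-compatible ops
    (a simplification of real subgraph search, which must also respect
    graph structure, not just a linear order)."""
    groups, current = [], []
    for name in op_names:
        if is_trt_compatible(name):
            current.append(name)
        elif current:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups
```

Whether structural patterns (e.g. avoiding cycles between the subgraph and the rest of the graph) are also factored in is exactly the open question above.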

@zhreshold
Member

It is a very nice feature, solid PR.
I do observe that the graph executors now depend on contrib features like the ONNX graph and the TensorRT op, introducing a bit of chaos.

@@ -0,0 +1,117 @@
# MxNet-TensorRT Runtime Integration
Member

nit: MXNet

Contributor

I had the same observation and provided a pull to Marek that should address the issue.

import ctypes as C
from ctypes.util import find_library

def cint(init_val=0):
Member

This doesn't look CUDA-specific. Suggest moving it to base.py.

Contributor Author

@eric-haibin-lin Good point, the Ctypes utils could just be moved to base, and then reused in cuda_utils.
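For reference, the kind of ctypes helper under discussion is tiny and CUDA-agnostic, which is why moving it to base.py makes sense. A sketch (not the exact code from the PR):

```python
import ctypes

def cint(init_val=0):
    """Wrap an int in a mutable ctypes cell, e.g. to pass by
    reference into a C API call that writes a result back."""
    return ctypes.c_int(init_val)

def c_str(string):
    """Encode a Python string as a NUL-terminated C char*."""
    return ctypes.c_char_p(string.encode('utf-8'))
```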

namespace common {

template<typename T>
inline size_t serialized_size(const T& obj);
Member

Do we rely on the end-to-end example to test all these functions? I'm curious about the test coverage. Would C++ unit tests increase the coverage?

Contributor Author

@eric-haibin-lin It would make sense to increase test coverage for this independently. Will add it to the to-do list for polishing up the PR.
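To give a flavor of what standalone unit tests for such helpers could look like, here is a hypothetical Python analogue of a `serialized_size` helper. The convention — 8 bytes per number, 4-byte length prefixes — is purely illustrative and is not the PR's C++ wire format:

```python
def serialized_size(obj):
    """Compute the byte size an object would occupy under a simple
    serialization scheme: numbers take 8 bytes; strings and sequences
    carry a 4-byte length prefix. Illustrative convention only."""
    if isinstance(obj, bool):          # bool before int: bool is an int subclass
        return 1
    if isinstance(obj, (int, float)):
        return 8
    if isinstance(obj, str):
        return 4 + len(obj.encode('utf-8'))
    if isinstance(obj, (list, tuple)):
        return 4 + sum(serialized_size(x) for x in obj)
    raise TypeError("unsupported type: %r" % type(obj))
```

Each branch of such a helper can then be asserted directly, without going through the end-to-end LeNet-5 test.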

* \brief Replace subgraphs by TRT (forward only)
*/
Graph ReplaceSubgraph(Graph&& g,
std::unordered_set<nnvm::Node*> set_subgraph,
Member

const?

Contributor Author

Changing to const&

Graph ReplaceSubgraph(Graph&& g,
std::unordered_set<nnvm::Node*> set_subgraph,
std::unordered_map<std::string, NDArray>* const params_map) {
// Create MxNet subgraph
Member

What do you think of the graph partition and subgraph selector PR https://github.com/apache/incubator-mxnet/pull/11251/files ? Is it possible to create the TRT subgraph in the same way? What do you see as missing?

const nnvm::Graph &g,
std::unordered_map<std::string, NDArray> *const shared_buffer);

static const std::unordered_map<std::string, ConverterFunction> converter_map = {
Member

Are these the only set of ops supported by this PR?

Contributor Author

@eric-haibin-lin Yes, so far. TensorRT supports more operators, so the list will be expanded once the initial integration is in place.

# MxNet-TensorRT Runtime Integration
## What is this?

This document described how to use the [MxNet](http://mxnet.incubator.apache.org/)-[TensorRT](https://developer.nvidia.com/tensorrt) runtime integration to accelerate model inference.
Contributor

MxNet -> MXNet. There are multiple places where "MxNet" is used and should be corrected.

Contributor

I had the same observation and provided a pull to Marek that should address the issue.

1. When loading a pre-trained model, the inference will be handled using the Symbol API, rather than the Module API.
2. In order to provide the weights to the MxNet (NNVM) to TensorRT graph converter befor the symbol is fully bound (before the memory is allocated, etc.), the `arg_params` and `aux_params` need to be provided to the symbol's `simple_bind` method. The weights and other values (e.g. moments learned from data by batch normalization, provided via `aux_params`) will be provided via the `shared_buffer` argument to `simple_bind` as follows:
```python
executor = sym.simple_bind(ctx=ctx, data = data_shape,
Contributor

Is simple_bind the only interface supported in this integration? It is far less common for users to use this API directly. Normally, Module API and C Predict API are adopted in inference production jobs.


all_params = merge_dicts(arg_params, aux_params)
```
This `all_params` dictionary cn be seem in use in the `simple_bind` call in `#2`.
Contributor

cn -> can
seem -> seen

Contributor Author

@reminisce Good catch, thanks!
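The `merge_dicts` helper referenced in the README excerpt can be written as below; the commented-out binding call sketches how the merged dictionary feeds `shared_buffer`. Argument names follow the README excerpt, and an MXNet build with TensorRT enabled is assumed — this is a sketch, not the PR's exact test code:

```python
def merge_dicts(*dict_args):
    """Merge any number of dicts into a new dict; later keys win."""
    result = {}
    for d in dict_args:
        result.update(d)
    return result

# Illustrative use with simple_bind (requires MXNet built with
# USE_TENSORRT=1; shapes and names are placeholders):
# all_params = merge_dicts(arg_params, aux_params)
# executor = sym.simple_bind(ctx=mx.gpu(0), data=data_shape,
#                            softmax_label=label_shape,
#                            shared_buffer=all_params,
#                            grad_req='null', force_rebind=True)
```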

When building your own models, feel free to use the above ResNet-50 model as an example. Here, we highlight a small number of issues that need to be taken into account.

1. When loading a pre-trained model, the inference will be handled using the Symbol API, rather than the Module API.
2. In order to provide the weights to the MxNet (NNVM) to TensorRT graph converter befor the symbol is fully bound (before the memory is allocated, etc.), the `arg_params` and `aux_params` need to be provided to the symbol's `simple_bind` method. The weights and other values (e.g. moments learned from data by batch normalization, provided via `aux_params`) will be provided via the `shared_buffer` argument to `simple_bind` as follows:
Contributor

moments -> momenta

@@ -152,19 +152,19 @@ class Executor {
static Executor* SimpleBind(nnvm::Symbol symbol,
Contributor

This breaks API backward compatibility. You can define another simple bind function, such as "SimpleBindEx", to achieve your purpose.

Contributor

@reminisce: I'm not sure that I'd consider this a breaking API change. My understanding from what was discussed on the dev list was that we would follow the hour-glass model for the c_api, and make the c_api the only native interface that we semantically version.

Contributor

@KellenSunderland You got a valid point. My worry was that since this function is declared in the header placed in the include directory, we have no idea whether users have used this function or not to build something for their own.

Contributor

@reminisce Yes I do see what you mean. It's certainly possible someone used the header. As long as it's not too much work would you mind creating a second function @mkolod?

Contributor

I think we don't need to maintain compatibility on this. Its unlikely any user would depend on it.

Contributor Author

Thanks @piiswrong !

Contributor

@mkolod As we discussed offline about not breaking existing APIs, could you create a simple bind API for you to use only, rather than modifying this one?

}
});
inputs.insert(inputs.begin(), _inputs.begin(), _inputs.end());
return inputs;
Contributor

Same question as for GetSubgraphOutputs.

Contributor Author

@reminisce See my question to your question for GetSubgraphOutputs.

const auto& idx = g.indexed_graph();
const auto& sub_idx = subgraph.indexed_graph();

const auto shape = g.GetAttr<nnvm::ShapeVector>("shape");
Contributor

const reference. Same for the following.

Contributor Author

Good point.

Contributor Author

Done.

const std::vector<TBlob>& inputs, const std::vector<OpReqType>& req,
const std::vector<TBlob>& outputs);

inline bool TRTInferShape(const NodeAttrs& attrs,
Contributor

You can move these functions to .cc if they are not called anywhere else.

Contributor Author

Done.

const std::vector<int>& itype);

template<typename xpu>
void TRTCompute(const OpStatePtr& state, const OpContext& ctx,
Contributor

Does it make sense to define FCompute for CPU? You can just define the GPU function in .cu and only register FStatefulCompute<gpu>.

Contributor Author

@reminisce Yes, to cause it to fail with a useful message. Otherwise it will fail as not implemented, but if the message needs to be overridden, this could be useful. However, if the default not-implemented exception message is good enough, this could be refactored. Thoughts?

Contributor

IMHO, it's the framework's job to throw error messages like this. Registering a CPU version of stateful FCompute for TRT doesn't sound semantically correct, even though it would print an error message in the Forward function. If the framework's error message is not informative enough, we can always improve it.

Contributor

Agree with @reminisce . I think that if we're improving the messaging here we should do it in the framework.

Contributor Author

@reminisce @KellenSunderland Makes sense, thanks!

Contributor Author

Done.
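For illustration, the pattern that was removed amounted to a CPU stub whose only job is to raise a descriptive error. Sketched here in Python with invented names (the reviewers' conclusion, reflected above, was to let the framework raise instead):

```python
def trt_compute_cpu(*args, **kwargs):
    """Hypothetical CPU stand-in for a GPU-only TensorRT operator:
    it exists solely to fail with an actionable message."""
    raise RuntimeError(
        "TensorRT operators support GPU execution only; bind the "
        "symbol with a GPU context (and an MXNet build with "
        "USE_TENSORRT=1).")
```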

return true;
}

inline std::vector<std::string> TRTListInputNames(const NodeAttrs& attrs) {
Contributor

Does this keep the order of inputs consistent with their corresponding nodes' topological order in the original graph?

Contributor

@Caenorst Caenorst Jul 10, 2018

No, the order of inputs is random (we iterate through an nnvm::NodeEntryMap).

Contributor

If the input order is random, would this mismatch the original user input data order? For example, if op A is TRT compatible and it has multiple inputs whose order is A.ListInputNames(), after you make a TRT subgraph for A, the input order of the whole graph may be different from before. When users call on the executor of the TRT graph for inference, their data inputs still keep the original order while the TRT graph is expecting another order. Would this cause a mismatch problem?

Contributor Author

@reminisce I think that the user sees only the original symbol, and not the optimized graph after the rewrite, because the optimization happens at simple_bind time, but the executor gets a copy of the original symbol, not the optimized one. So, from the user's perspective, nothing changes. Also, the weight matrices are fed in via the shared_buffer, so their order also doesn't matter, since shared_buffer is a dict in Python and a std::unordered_map in C++. To me the whole graph rewrite is encapsulated and the user doesn't see it at all. As for inputs to the graph (not weights), they are set via the executor's arg dict, which again is unordered (a Python dict). Am I missing something?

Contributor

I see. This only works when using the Symbol API directly. For more common use cases, where the Module API is adopted, this would not work, since the order of inputs may be different after partitioning, right?

Contributor Author

@reminisce It's a good point regarding the Module. Unfortunately this initial PR does not support Module, partly because it relies on Symbol's simple_bind method to provide the shared buffer, which is necessary because TensorRT stores copies of the weights. Binding from Module hides the access to the weights from the user, and the user providing the weights during binding makes this approach work not only with symbolic models, but also with Gluon models converted to symbolic form from the pre-trained GluonCV model zoo. The concern over Module is valid, and supporting it would probably require some eventual re-working of the integration, but I thought it would be good to at least get this initial integration out there, and then broaden the scope to the more common Module approach.

num_ex = 10000
all_preds = np.zeros([num_ex, 10])

train_iter, test_iter = get_cifar10_iterator(args={'data_dir':data_dir, 'batch_size':batch_size}, kv={'num_workers':1, 'rank':0})
Contributor

Are train_iter, test_iter used / required by this function?


train_iter2, test_iter2 = get_cifar10_iterator(args={'data_dir':data_dir, 'batch_size':num_ex}, kv={'num_workers':1, 'rank':0})

all_label_train = train_iter2.next().label[0].asnumpy()
Contributor

Is all_label_train used / required by this function?

import numpy as np
from time import time
import sys
import urllib
Contributor

Is urllib used / required somewhere in this test?

Contributor Author

It's not used anymore (it used to be). Good catch! It's fixed now.

@@ -94,6 +94,14 @@ else
endif
CFLAGS += -I$(TPARTYDIR)/mshadow/ -I$(TPARTYDIR)/dmlc-core/include -fPIC -I$(NNVM_PATH)/include -I$(DLPACK_PATH)/include -I$(TPARTYDIR)/tvm/include -Iinclude $(MSHADOW_CFLAGS)
LDFLAGS = -pthread $(MSHADOW_LDFLAGS) $(DMLC_LDFLAGS)


ifeq ($(USE_TENSORRT), 1)
Contributor

We spoke offline about this, but just a quick note that we should also add the ability to build MXNet-TensorRT integration to our cmake builds.

Contributor Author

@KellenSunderland I agree. Should the CMake build be part of the initial PR or a subsequent one?

Contributor

I think either way would work.

return fnames

def get_cifar10_iterator(args, kv):
data_shape = (3, 32, 32) #28, 28)
Contributor

Is this comment needed?

from __future__ import print_function

import os.path
import subprocess
Contributor

Is subprocess used / required somewhere in this test?

Contributor Author

It isn't anymore (it used to be). Good catch!

Contributor Author

Fixed.

try: # try to create the directory if it doesn't exists
os.makedirs(dir_name)
except OSError as exc:
if exc.errno != errno.EEXIST:
Contributor

Missing errno import?
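The snippet above does appear to need `import errno` at the top of the file. A self-contained, race-safe version of the directory-creation helper would read as follows (the function name is illustrative):

```python
import errno
import os

def ensure_dir(dir_name):
    """Create dir_name (and parents) if it does not exist; tolerate
    the race where another process creates it first."""
    try:
        os.makedirs(dir_name)
    except OSError as exc:
        # Only swallow "already exists"; re-raise real failures.
        if exc.errno != errno.EEXIST:
            raise
```

(On Python 3.2+, `os.makedirs(dir_name, exist_ok=True)` achieves the same thing.)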

@mkolod mkolod force-pushed the tensorrt_integration branch 6 times, most recently from 941ff3a to 2f0b3c2 on June 30, 2018 at 22:02
@mkolod
Contributor Author

mkolod commented Jun 30, 2018

@reminisce As far as I can tell, the two unresolved issues are your questions here and here. Since I'm not quite sure whether I understood your questions completely, I asked for a follow-up. Please let me know if these are still valid concerns, and if so, I need more info. Thanks!

@@ -56,8 +56,8 @@ GraphExecutor::~GraphExecutor() {
}
}

inline NDArray InitZeros(const NDArrayStorageType stype, const TShape &shape,
const Context &ctx, const int dtype) {
inline NDArray GraphExecutor::InitZeros(const NDArrayStorageType stype, const TShape &shape,
Contributor

I can see the reasoning of making these free util functions as member functions of graph executor after reading the trt executor implementation. However, this kind of encapsulation is unnecessary and against the goal of cutting component design dependencies. These util functions can be reused by any other modules/functions/classes other than graph executor. Can we put them in a util file as free functions instead? See here for the argument of preferring free functions to member functions.
https://stackoverflow.com/questions/5989734/effective-c-item-23-prefer-non-member-non-friend-functions-to-member-functions

Contributor

Good point @reminisce. Missed that there's a few util headers these can move into. Updating now.

@szha szha removed their request for review August 9, 2018 18:57
if not use_tensorrt:
shared_buffer = dict([(k, v.as_in_context(mx.gpu(0))) for k, v in shared_buffer.items()])

executor = sym.simple_bind(ctx=mx.gpu(0),
Contributor

This is still using shared_buffer to pass in weights.

Per our last discussion could you make a new API. For example, mx.sym.contrib.tensorrt_simple_bind.

Contributor

I overlooked this part. Thanks @piiswrong for pointing it out. @KellenSunderland I believe we all agreed that the symbol binding API for TRT should also be in contrib. Could you make the change? Thanks.

Contributor

Yeah I've got a version of an implementation that does it. Shouldn't take me too long to add it. I wasn't 100% sure this was what was being asked for offline.

import logging
import os

from mxnet.symbol import Symbol
Contributor

Should this be a relative import?

Contributor

Addressed here: #12124

@merrymercy
Member

@mkolod The link to README is broken. Can you update it?

aaronmarkham added a commit to aaronmarkham/incubator-mxnet that referenced this pull request Aug 17, 2018
adding tutorial index pages to whitelist

added custom fork feature

adding settings to turn off/on doc sets

using custom fork directory for artifacts

automate upstream branch refresh

switched to boolean types and added debug messaging

build will copy current config files to each version build

build will copy current config files to each version build

stashing config files before checking out new version

put mxnet.css as artifact to be copied during build

fix formatting issues in h tags

refactored to build each version in a different folder

grab latest README from local fork

using settings.ini for document sets per version

fix R doc config for mxnet root

matching conf.py updates to current and excluding 3rdparty folder

align R doc gen bug fix with other PR 11970

pass the current tag in the make args and set to default if empty

fix bug for default version and add BUILD_VER to make html call

turning off scala docs for versions less than 1.2.0

turning off r docs until CI can handle it

enabling new docs build capability in CI

failover to fetching remote branch

Remove stale Keras-MXNet tests from MXNet repo (apache#11902)

Disable flaky cpp test (apache#12056)

Adjusting tolerance level and removing fixed seed for tests: test_ifft, test_fft (apache#12010)

* adjusting tolerance level and removing fixed seed

* CI retrigger

* removing status

[MXNET-774] Flaky test in test_executor.py:test_bind (apache#12016)

* fix test bind, remove fixed seed

* add tracking info

* remove tracking info

fix flaky test_quantization.test_get_optimal_thresholds (apache#12004)

removed fixed seed 1234 (apache#12072)

tested with 100k runs, no failures

improve error message of cudnn operators (apache#11886)

Fix for undefined variable errors (apache#12037)

* Undefined name in initializer

* Fix undefined name in test_mkldnn

* Fix for undefined names in examples

Fix undefined_variable lint errors in examples (apache#12052)

* Fix lint errors in dqn example

* Fix lint error in gluon example

* Fix undefined error in autoencoder example

MXNET-776 [Perl] Better documentation/bug fixes. (apache#12038)

* MXNET-776
1) Several new metric classes.
2) Improved documentation.
3) Bugfixes.

* added links and fixed a typo.

Redesign Jenkinsfiles (apache#12000)

* Rework Jenkinsfile

* Add functionality to assign node labels dynamically

* Extract functions into util file

* Change all Jenkinsfiles to use utils

* Make a new commit...

* Address review comments 1

* Address review comments 2

fix unidirectional model's parameter format (apache#12055)

* fix unidirectional model's parameter format

* Update rnn_layer.py

Fix syntax errors in Jenkinsfiles (apache#12095)

[MXAPPS-581] Straight Dope nightly fixes. (apache#11934)

Enable 3 notebooks that were failing tests after making updates to the
Straight Dope book. We also add pandas required by one of these
notebooks.

Fix jenkinsfile syntax errors (apache#12096)

remove fixed seed for test_triplet_loss (apache#12011)

got rid of fixed seed for test_optimizer/test_operator_gpu.test_ftml (apache#12003)

[MXNET-696] Fix undefined variable errors (apache#11982)

* Fix undefined error in image segmentation

ctx is used undefined. Setting the default ctx to cpu and
editing the comment to let the user know that it can be
changed to GPU as required.

* Fix undefined names in SSD example

maskUtils is disabled. Remove code referencing it.
Initializing start_offset.

got rid of fixed seed for test_optimizer/test_operator_gpu.test_nag (apache#11981)

Fix flaky test for elementwise_sum (apache#11959)

Re-enabling test_operator.test_binary_math_operators (apache#11712) (apache#12053)

Test passes on CPU and GPU (10000 runs)

update docs to explain CPU incompatibilities (apache#11931)

removed fixed from test_optimizer.test_signum (apache#12088)

Add missing object to tests/nightly/model_backwards_compatibility_check/JenkinsfileForMBCC (apache#12108)

Add GetName function in Symbol class for cpp pack (apache#12076)

Add unique number of parameters to summary output in Gluon Block (apache#12077)

* add unique parameters in summary output

* rebuild

Update fully_connected.cc documentation (apache#12097)

[MXNET-244] Update RaspberryPI instructions (apache#11562)

* Update RaspberryPI instructions

[MXNET-749] Correct usages of `CutSubgraph` in 3 control flow operators (apache#12078)

* Fix cut graph

* Copy only when necessary

* Add unittest for while_loop

* Add unittest for foreach

* Add unittest for cond

* Avoid magic number: 0 => kUndefinedStorage

[MXNET-703] TensorRT runtime integration (apache#11325)

* [MXNET-703] TensorRT runtime integration

Co-authored-by: Clement Fuji-Tsang <caenorst@hotmail.com>
Co-authored-by: Kellen Sunderland <kellen.sunderland@gmail.com>

* correctly assign self._optimized_symbol in executor

* declare GetTrtCompatibleSubsets and ReplaceSubgraph only if MXNET_USE_TENSORRT

* add comments in ReplaceSubgraph

* Addressing Haibin's code review points

* Check that shared_buffer is not empty when USE_TENSORRT is set

* Added check that TensorRT binding is for inference only

* Removed redundant decl.

* WIP Refactored TRT integration and tests

* Add more build guards, remove unused code

* Remove ccache report

* Remove redundant const in declaration

* Clean Cmake TRT files

* Remove TensorRT env var usage

We don't want to use environment variables with TensorRT yet, the
logic being that we want to try and have as much fwd compatibility as
possible when working on an experimental feature. Were we to add
env vars they would have to be guaranteed to work in the future until
a major version change. Moving the functionality to a contrib call
reduces this risk.

* Use contrib optimize_graph instead of bind

* Clean up cycle detector

* Convert lenet test to contrib optimize

* Protect interface with trt build flag

* Fix whitespace issues

* Add another build guard to c_api

* Move get_optimized_symbol to contrib area

* Ignore gz files in test folder

* Make trt optimization implicit

* Remove unused declaration

* Replace build guards with runtime errors

* Change default value of TensorRT to off

This change applies to both TensorRT and non-TensorRT builds.

* Warn user when TRT not active at runtime

* Move TensorRTBind declaration, add descriptive errors

* Test TensorRT graph execution, fix bugs

* Fix lint and whitespace issues

* Fix typo

* Removed default value for set_use_tensorrt

* Improved documentation and fixed spacing issues

* Move static exec funcs to util files

* Update comments to match util style

* Apply const to loop element

* Fix a few namespace issues

* Make static funcs inline to avoid compiler warning

* Remove unused inference code from lenet5_train

* Add explicit trt contrib bind, update tests to use it

* Rename trt bind call

* Remove documentation that is not needed for trt

* Reorder arguments, allow position calling

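The graph-partitioning bullets above (e.g. "Clean up cycle detector") reflect a constraint on subgraph fusion: collapsing a TensorRT-compatible subgraph into a single node must not introduce a cycle in the NNVM graph, which happens whenever a path leaves the subgraph and re-enters it. A minimal, hypothetical sketch of that check in pure Python (the function name and graph representation are illustrative, not MXNet's actual implementation):

```python
def creates_cycle(edges, subgraph):
    """Return True if contracting `subgraph` (a set of node names) into a
    single super-node would introduce a cycle in the directed graph
    `edges` (a dict mapping node -> list of successor nodes)."""
    # A cycle appears iff some path leaves the subgraph and comes back
    # to it through at least one outside node.
    outside_starts = {dst for src in subgraph for dst in edges.get(src, [])
                      if dst not in subgraph}
    stack, seen = list(outside_starts), set()
    while stack:
        node = stack.pop()
        if node in subgraph:
            return True  # re-entered the subgraph via an outside path
        if node in seen:
            continue
        seen.add(node)
        stack.extend(edges.get(node, []))
    return False

# a -> b -> c plus a -> c: fusing {a, c} traps b between the fused nodes,
# so the fused node would both feed and consume b -- a cycle.
edges = {"a": ["b", "c"], "b": ["c"], "c": []}
print(creates_cycle(edges, {"a", "c"}))  # True
print(creates_cycle(edges, {"b", "c"}))  # False
```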
Decrease success rate to make test more stable (apache#12092)

I have added this test back to unit-test coverage and decreased the required success rate even further, so that spurious failures become even rarer.

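The change above works because a stochastic test that asserts a weaker success threshold fails by chance far less often. As a hedged illustration (the numbers below are made up for the example, not taken from the test in question), the tail probability of a binomial drops sharply as the required count decreases:

```python
from math import comb

def prob_fewer_than(k, n=100, p=0.9):
    """P[Binomial(n, p) < k]: the chance that a stochastic test with true
    per-trial success rate p fails an 'at least k successes out of n'
    assertion purely by bad luck."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))

# Relaxing the threshold makes spurious failures dramatically rarer.
for k in (85, 80, 70):
    print(k, prob_fewer_than(k))
```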
Add Clojure to website nav (apache#12075)

* adding clojure to API navigation

* adding clojure to the sidebar

* switched order

Fix flaky tests for quantize and requantize (apache#12040)

[MXNET-703] Use relative path for symbol import (apache#12124)

Fix shared memory with gluon dataloader, add option pin_memory (apache#11908)

* use threading for mp dataloader fetching, allow pin_memory option

* allow pin tuple of data into cpu_pinned

* fix as_in_context if not cpu_pinned

* fix cpu_pinned

* fix unittest for windows, update doc that windows mp is available

* fix pin_memory

* fix lint

* always use simplequeue for data queue

* remove main thread clearing for data_queue

* do not use outside folder as pythonpath but run nosetests inside

* use :MXNET_LIBRARY_PATH= to locate dll

* fix dll path

* correct dll path

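The "use threading for mp dataloader fetching" bullet above replaces a multiprocessing-based fetcher with a background thread on the consumer side (the `pin_memory` part needs CUDA and is not shown). A minimal generic sketch of that producer-thread pattern, not Gluon's actual `DataLoader` internals:

```python
import queue
import threading

def prefetch(iterable, buffer_size=2):
    """Yield items from `iterable`, fetching ahead in a background thread
    so the consumer can overlap computation with data loading."""
    q = queue.Queue(maxsize=buffer_size)
    END = object()  # sentinel marking the end of the stream

    def worker():
        for item in iterable:
            q.put(item)  # blocks when the buffer is full (backpressure)
        q.put(END)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = q.get()
        if item is END:
            return
        yield item

print(list(prefetch(range(5))))  # [0, 1, 2, 3, 4]
```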
reduce a copy for rowsparse parameter.reduce (apache#12039)

GPU Memory Query to C API (apache#12083)

* add support for GPU memory query

* remove lint

take custom dataset into consideration (apache#12093)

[MXNET-782] Fix Custom Metric Creation in R tutorial (apache#12117)

* fix tutorial

* install instructions

* fix typo

[MXAPPS-805] Notebook execution failures in CI. (apache#12068)

* [MXAPPS-805] Notebook execution failures in CI.

* Add a retry policy when starting a notebook executor, to handle startup
  failures (due to a port collision, the kernel taking too long to start, etc.).

* Change the logging level for tests to INFO so that we get more
  informative test output.

* Make the retry logic for Jupyter notebook execution specific to the error
message we are looking for, to prevent false positives.

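The retry logic described above, which matches only a specific error message so unrelated failures are not masked, can be sketched generically (the function names and message are illustrative, not the CI utility's actual API):

```python
import time

def retry_on_message(fn, needle, attempts=3, delay=0.0):
    """Call `fn`, retrying only when the raised error's message contains
    `needle` (e.g. a port-collision message); re-raise anything else
    immediately so genuine failures surface on the first attempt."""
    for attempt in range(attempts):
        try:
            return fn()
        except RuntimeError as exc:
            if needle not in str(exc) or attempt == attempts - 1:
                raise
            time.sleep(delay)

calls = {"n": 0}
def flaky_start():
    """Simulated executor start that fails twice with a known message."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("port is already in use")
    return "kernel started"

print(retry_on_message(flaky_start, "port is already in use"))  # kernel started
```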
rm wrong infertype for AdaptiveAvgPool and BilinearReisze2D (apache#12098)

Document MXNET_LIBRARY_PATH environment variable which was not documented explicitly. (apache#12074)

Generalized reshape_like operator (apache#11928)

* first commit

* fix documentation

* changed static_cast<bool>(end) to end.has_value()
fixed documentation issues

* change begin from int to optional

* test None as lhs

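The generalized `reshape_like` above replaces a slice of the LHS shape with a slice of the RHS shape, with optional `begin`/`end` bounds. A NumPy sketch of the semantics suggested by the commit messages (the exact `begin`/`end` conventions here are an illustrative reading, not the operator's spec):

```python
import numpy as np

def reshape_like(lhs, rhs, lhs_begin=0, lhs_end=None, rhs_begin=0, rhs_end=None):
    """Reshape `lhs` so that its dims [lhs_begin, lhs_end) are replaced by
    rhs.shape[rhs_begin:rhs_end]; end=None means 'through the last axis'."""
    lhs_end = lhs.ndim if lhs_end is None else lhs_end
    rhs_end = rhs.ndim if rhs_end is None else rhs_end
    new_shape = (lhs.shape[:lhs_begin]
                 + rhs.shape[rhs_begin:rhs_end]
                 + lhs.shape[lhs_end:])
    return lhs.reshape(new_shape)

# Replace lhs dim 0 (size 30) with rhs dims 0..1 (5 x 6); sizes must agree.
a = np.zeros((30, 7))
b = np.zeros((5, 6, 11))
print(reshape_like(a, b, lhs_begin=0, lhs_end=1, rhs_begin=0, rhs_end=2).shape)
# (5, 6, 7)
```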
fix cython nnvm include path (apache#12133)

CI scripts refinements. Separate Py2 and Py3 install scripts. Fix perms. (apache#12125)

Zipfian random sampler without replacement (apache#12113)

* code compiles

* update doc

* fix bug and add test

* fix lint

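A Zipfian (log-uniform) sampler of the kind added above is commonly used to draw negative-class candidates for sampled softmax, where low class ids are far more likely than high ones. A pure-Python sketch of the standard log-uniform distribution, with without-replacement handled by rejecting duplicates (function name and rejection strategy are illustrative, not the operator's implementation):

```python
import math
import random

def rand_zipfian(num_sampled, range_max, seed=None):
    """Draw `num_sampled` distinct class ids from an approximately Zipfian
    (log-uniform) distribution over [0, range_max), where
    P(k) = log(1 + 1/(k+1)) / log(range_max + 1)."""
    rng = random.Random(seed)
    log_range = math.log(range_max + 1)
    sampled = set()
    while len(sampled) < num_sampled:
        # Inverse-CDF trick: floor(exp(U * log_range)) - 1 is log-uniform.
        k = int(math.exp(rng.random() * log_range)) - 1
        sampled.add(min(k, range_max - 1))  # guard floating-point edge cases
    return sorted(sampled)

print(rand_zipfian(5, 1000, seed=0))
```

Note the telescoping sum: the P(k) terms over k = 0..range_max-1 add up to exactly 1, which is what makes the inverse-CDF draw valid.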
update dmlc-core (apache#12129)

Fix quantized graphpass bug (apache#11937)

* fix quantized graphpass bug

* add residual quantization testcase

* handle dtype and backend issues

support selu activation function (apache#12059)

Fix flaky test test_operator_gpu:deformable_conv and deformable_psroi_pooling (apache#12070)

[MXNET-767] Fix flaky test for kl_loss (apache#11963)

* Fix flaky test for kl_loss

* remove comment.

[MXNET-788] Fix for issue apache#11733 pooling op test (apache#12067)

* added support to the check_consistency function for generating random numbers in a specific datatype (e.g. fp16);
this ensures that for tests comparing results across precisions, data is generated in the least precise type and cast to the more precise ones

changed the test_pooling_with_type test case to specify fp16 precision for random input data
renamed the 2nd test_pooling_with_type function to test_pooling_with_type2 so it doesn't redefine the first and both are tested

fixed equation formatting issue in pooling operator description

Added myself to the contributors readme file

* updated from latest in master (had old version of the file)

* shortened lines per lint spec

* renamed default_type argument to rand_type for clarity
updated function docstring with argument description

removed rand_type setting for non-max pooling tests

* cleaned up check_consistency function docstring

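The data-generation rule above (generate in the least precise dtype, then cast up) guarantees every precision under test consumes bit-identical, exactly representable inputs. A NumPy sketch of the idea (helper name is hypothetical, not the test utility's API):

```python
import numpy as np

def make_test_data(shape, dtypes, rand_type=np.float16, seed=0):
    """Generate one random array in the least precise dtype (`rand_type`)
    and cast it to each dtype under test, so fp16/fp32/fp64 runs all see
    exactly the same values (upcasting fp16 -> fp32/fp64 is exact)."""
    base = np.random.default_rng(seed).uniform(-1, 1, shape).astype(rand_type)
    return {dt: base.astype(dt) for dt in dtypes}

data = make_test_data((4, 4), [np.float16, np.float32, np.float64])
# Because casting fp16 upward is exact, the arrays agree to the last bit.
assert np.array_equal(data[np.float32], data[np.float16].astype(np.float32))
```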
Do not show "needs to register block" warning for registered blocks. (apache#12130)

Fix precision issue of test case test_rnnrelu_bidirectional (apache#12099)

* adjust the tolerance only for relu, to fix the test case bug

* only adjust the tolerance for test_rnnrelu_bidirectional and revert it for test_rnnrelu_sym

Accelerate the performance of topk for CPU side (apache#12085)

* Accelerate the performance of topk for CPU side

* Add comments for the code changes

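A common way to accelerate CPU top-k, and the general idea behind partial selection (though not necessarily the exact change in the commit above), is to avoid a full sort:

```python
import heapq
import random

def topk(values, k):
    """Partial selection: O(n log k) with a bounded heap instead of an
    O(n log n) full sort; returns the k largest values, descending."""
    return heapq.nlargest(k, values)

rng = random.Random(42)
data = [rng.random() for _ in range(10_000)]
# Matches a full sort on the top slice, at a fraction of the work.
assert topk(data, 5) == sorted(data, reverse=True)[:5]
```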
Remove unused TensorRT code (apache#12147)

Removing some python code that isn't in the current TensorRT execution paths.
This should make the code more readable and avoid potential linting errors.

Thanks to @vandanavk for pointing out the dead code and @cclauss for a quick
alternative fix.

Co-authored-by: Vandana Kannan <vandanavk@users.noreply.github.com>
Co-authored-by: cclauss <cclauss@bluewin.ch>

Disable test_io.test_CSVIter (apache#12146)

Fix RAT license checker which is broken in trunk (apache#12148)

Remove obsolete CI folder

set bind flag after bind completes (apache#12155)

Fix MXPredReshape in the c_predict_api (apache#11493)

* Fix MXPredReshape in the c_predict_api.

* Add unittest for the C predict API.

* Fix path in the test.

* Fix for Windows.

* Try again to fix for Windows.

* One more try to fix test on Windows.

* Try again with CI.

* Try importing from mxnet first if cannot find the amalgamation lib.

* Add a log message when libmxnet_predict.so is not found.

* Set specific rtol and atol values.

* Fix missing rtol and atol values.

* Empty commit.

* Try again with CI.

* One more try with CI.

* Retry CI.

[Flaky Test] Fix test_gluon_model_zoo.test_models when MXNET_MKLDNN_DEBUG=1  (apache#12069)

* reorder inputs

* use function flatten vs build in method

* update similar_array atol to 0.01

* fix reorder

* enable MXNET_MKLDNN_DEBUG in CI

* add exclude debug flag

* fix lint

* add warning log for excluded op

* retrigger

RAT check readme updated (apache#12170)

update ndarray stack Doc for apache#11925 (apache#12015)

* update ndarray stack Doc

Add worker_fn argument to multiworker function (apache#12177)

* add worker_fn argument to multiworker function

* fix pylint

Remove fixed seed for test_huber tests (apache#12169)

Removed fixed seed and increased learning rate and tolerance for test_nadam (apache#12164)

documentation changes. added full reference (apache#12153)

* documentation changes. added full reference

* fixing lint

* fixing more lint

* jenkins

* adding the coding line utf-8

Partially enable flaky test for norm operator (apache#12027)

add examples for slicing option (apache#11918)

Module predict API can accept NDArray as input (apache#12166)

* forward and predict can accept nd.array np.array

[MXNET-744] Docs build tools update (apache#11990)

[MXNET-696] Fix undefined name errors (apache#12137)

* Fix undefined name error in neural style example

* Fix import exception error

* Fix undefined name in AUCMetric

* Fix undefined name in a3c example

Fix profiler executer when memonger is used (apache#12152)

add handling for grad req type other than kNullOp for indices (apache#11983)

Fix a minor bug in deformable_im2col.cuh (apache#12060)

The function `deformable_col2im_coord` launched `deformable_col2im_coord_gpu_kernel` but ran the CUDA error check against `deformable_col2im_gpu_kernel`.

[MXNet-744] Fix website build pipeline Python 3 issues (apache#12195)

* Fix website build pipeline Python 3 issues (apache#12195)

Fix MKLDNNSum cpp test failure (apache#12080)

bump timeout on Jenkins for docs/website to 120 min (apache#12199)

* bump timeout on Jenkins to 120 min

* add branches to settings using v notation; apply appropiate settings

Fixing typo in python/mxnet/symbol/image.py (apache#12194)

Fixing typo in python/mxnet/symbol/image.py

Fix the topk regression issue (apache#12197) (apache#12202)

* Fix the topk regression issue (apache#12197)

* Add comments

pull changes in from master
XinYao1994 pushed a commit to XinYao1994/incubator-mxnet that referenced this pull request Aug 29, 2018
@ThomasDelteil
Contributor

@mkolod @KellenSunderland @aaronmarkham
I can't find the README.md/tensorrt.md with the TensorRT integration usage. Has it been lost at merge time? That would be critical for including the TensorRT integration in the 1.3.0 release announcement.

The TensorRT contrib package does not appear in the docs since there is no tensorrt.md. @aaronmarkham, are the docs not versioned anymore? Sorry, I've been a bit out of the loop lately.

@KellenSunderland
Contributor

KellenSunderland commented Sep 12, 2018

@mkolod (no need to worry about this one, I know you're slammed this week).

@ThomasDelteil I'm drafting a how-to based on the old doc that should get users up to speed.
