
[TensorRT] Support Multiple EP Context #23294

Open · wants to merge 31 commits into base: main
Conversation

jingyanwangms (Contributor) commented Jan 8, 2025

Description

  • Use CreateEpContextModel from graph_partitioner.cc to save the model with EP context nodes. Multiple EP context nodes in a model are now supported.
  • Updated the merging of EP-context-related options from the session options and the TensorRT options.
  • Updated and added unit tests.

Supported scenarios:

  • Save/run a static single EP context node using the engine cache
  • Save/run a static single EP context node with embedded EP context info
  • Save/run static multiple EP context nodes using the engine cache
  • Save/run static multiple EP context nodes with embedded EP context info
  • Save/run dynamic multiple EP context nodes using the engine cache
  • Save/run dynamic multiple EP context nodes with embedded EP context info

Unsupported scenarios:

  • Subsequent runs with an embedded dynamic-input EP context node where the dynamic input dimensions have changed.
    This does not work because the TensorRT engine might be updated at run time due to an input size change, but ORT has no callback mechanism to invoke CreateEpContextModel and update the embedded EP context. Supporting this would require significant changes to the existing infrastructure.

Motivation and Context

@jywu-msft jywu-msft requested a review from chilo-ms January 10, 2025 17:06
chilo-ms (Contributor) commented Jan 14, 2025

You should modify tensorrt_execution_provider.cc lines 3853 to 3856:

      // dump ep context model
      if (dump_ep_context_model_ && ep_context_embed_mode_) {
        UpdateCtxNodeModelEngineContext(model_proto_.get(), reinterpret_cast<char*>(serialized_engine->data()), serialized_engine->size());
        DumpCtxModel(model_proto_.get(), ctx_model_path_);
      }

The code above handles the case where the graph has dynamic-shape input(s) and the engine is updated during inference.
The old TRT EP behavior updates the engine binary embedded in the EP Context node and dumps the EP Context model to disk.
In this PR, which supports EP Context models for partitioning, it is the graph partitioner that dumps the model to disk, but we still need to think about how to handle this special case for the TRT EP. Otherwise, the new TRT EP might not work for existing apps that rely on dynamic-shape input with ep_context_embed_mode set to 1.

jingyanwangms (Contributor, Author) commented Jan 22, 2025

> You should modify tensorrt_execution_provider.cc line # 3853 to 3856 […]

I added a warning in the if (dump_ep_context_model_ && ep_context_embed_mode_) case to prompt the user to regenerate the EP context model. Handling this case would require changes to the overall EP context design. We have confirmed this is a lower-priority use case.

github-actions bot left a comment: You can commit the suggested changes from lintrunner.

jingyanwangms and others added 5 commits January 30, 2025 21:52
…r.cc

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
github-actions bot left a comment: You can commit the suggested changes from lintrunner.

Comment on lines 591 to 593
ASSERT_TRUE(status.IsOK());
// run inference
// TRT engine will be created and cached
// TRT profile will be created and cached only for dynamic input shape
// Data in profile,
// X: 1, 3, 3, 2, 2, 2
// Y: 1, 3, 3, 2, 2, 2
// Z: 1, 3, 3, 2, 2, 2
RunSession(session_object3, run_options, feeds, output_names, expected_dims_mul_m, expected_values_mul_m);

// Test engine cache path:
Suggested change (drop the redundant inference run and its comments):

ASSERT_TRUE(status.IsOK());
// Test engine cache path:
github-actions bot left a comment: You can commit the suggested changes from lintrunner.

Comment on lines 211 to 213


std::vector<char> ReadFileFromDisk(const PathString& path) {
Suggested change (remove the extra blank lines):

std::vector<char> ReadFileFromDisk(const PathString& path) {
Comment on lines 477 to 479
std::vector<int> dims = {1, 3, 2};

remove(ctx_model_path.c_str()); // remove the context model file generated by previous test
Suggested change (remove the blank line between the two statements):

std::vector<int> dims = {1, 3, 2};
remove(ctx_model_path.c_str()); // remove the context model file generated by previous test
github-actions bot left a comment: You can commit the suggested changes from lintrunner.

Comment on lines 776 to 778
std::vector<int64_t> expected_dims_mul_m = {3, 6};
std::vector<int64_t> expected_values_mul_m = { 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 13, 14, 14, 16, 16, 18, 0, 1 };

Suggested change (lintrunner brace spacing):

std::vector<int64_t> expected_dims_mul_m = {3, 6};
std::vector<int64_t> expected_values_mul_m = {1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 13, 14, 14, 16, 16, 18, 0, 1};
github-actions bot left a comment: You can commit the suggested changes from lintrunner.

Comment on lines 22 to +24
bool GraphHasCtxNode(const GraphViewer& graph_viewer) {
for (int i = 0; i < graph_viewer.MaxNodeIndex(); ++i) {
auto node = graph_viewer.GetNode(i);
for (auto node_index: graph_viewer.GetNodesInTopologicalOrder()) {
auto node = graph_viewer.GetNode(node_index);
Suggested change (formatting: space before the colon in the range-for):

bool GraphHasCtxNode(const GraphViewer& graph_viewer) {
  for (auto node_index : graph_viewer.GetNodesInTopologicalOrder()) {
    auto node = graph_viewer.GetNode(node_index);
Comment on lines +376 to +378
const auto& subgraph_node_list = graph_viewer.GetNodesInTopologicalOrder();
assert(subgraph_node_list.size() == 1); // There should only be 1 node in filtered graph
const auto node = graph_viewer.GetNode(subgraph_node_list[0]);
Suggested change (whitespace only):

const auto& subgraph_node_list = graph_viewer.GetNodesInTopologicalOrder();
assert(subgraph_node_list.size() == 1);  // There should only be 1 node in filtered graph
const auto node = graph_viewer.GetNode(subgraph_node_list[0]);
Comment on lines 1734 to 1772
The VERBOSE provider-options logging block is duplicated in this range; the suggested change keeps a single copy:

LOGS_DEFAULT(VERBOSE) << "[TensorRT EP] TensorRT provider options: "
                      << "device_id: " << device_id_
                      << ", trt_max_partition_iterations: " << max_partition_iterations_
                      << ", trt_min_subgraph_size: " << min_subgraph_size_
                      << ", trt_max_workspace_size: " << max_workspace_size_
                      << ", trt_fp16_enable: " << fp16_enable_
                      << ", trt_int8_enable: " << int8_enable_
                      << ", trt_int8_calibration_cache_name: " << int8_calibration_cache_name_
                      << ", int8_calibration_cache_available: " << int8_calibration_cache_available_
                      << ", trt_int8_use_native_tensorrt_calibration_table: " << int8_use_native_tensorrt_calibration_table_
                      << ", trt_dla_enable: " << dla_enable_
                      << ", trt_dla_core: " << dla_core_
                      << ", trt_dump_subgraphs: " << dump_subgraphs_
                      << ", trt_engine_cache_enable: " << engine_cache_enable_
                      << ", trt_weight_stripped_engine_enable: " << weight_stripped_engine_enable_
                      << ", trt_onnx_model_folder_path: " << onnx_model_folder_path_
                      << ", trt_cache_path: " << cache_path_
                      << ", trt_global_cache_path: " << global_cache_path_
                      << ", trt_engine_decryption_enable: " << engine_decryption_enable_
                      << ", trt_engine_decryption_lib_path: " << engine_decryption_lib_path_
                      << ", trt_force_sequential_engine_build: " << force_sequential_engine_build_
                      << ", trt_context_memory_sharing_enable: " << context_memory_sharing_enable_
                      << ", trt_layer_norm_fp32_fallback: " << layer_norm_fp32_fallback_
                      << ", trt_build_heuristics_enable: " << build_heuristics_enable_
                      << ", trt_sparsity_enable: " << sparsity_enable_
                      << ", trt_builder_optimization_level: " << builder_optimization_level_
                      << ", trt_auxiliary_streams: " << auxiliary_streams_
                      << ", trt_tactic_sources: " << tactic_sources_
                      << ", trt_profile_min_shapes: " << profile_min_shapes
                      << ", trt_profile_max_shapes: " << profile_max_shapes
                      << ", trt_profile_opt_shapes: " << profile_opt_shapes
                      << ", trt_cuda_graph_enable: " << cuda_graph_enable_
                      << ", trt_dump_ep_context_model: " << dump_ep_context_model_
                      << ", trt_ep_context_file_path: " << ep_context_file_path_
                      << ", trt_ep_context_embed_mode: " << ep_context_embed_mode_
                      << ", trt_cache_prefix: " << cache_prefix_
                      << ", trt_engine_hw_compatible: " << engine_hw_compatible_
                      << ", trt_onnx_model_bytestream_size_: " << onnx_model_bytestream_size_;
}

Comment on lines 3201 to 3203
// Generate file name for dumping ep context model
if (dump_ep_context_model_ && ctx_model_path_.empty()) {
ctx_model_path_ = GetCtxModelPath(ep_context_file_path_, model_path_);
}


if (!has_dynamic_shape) {
Suggested change (collapse the double blank line before the dynamic-shape check):

// Generate file name for dumping ep context model
if (dump_ep_context_model_ && ctx_model_path_.empty()) {
  ctx_model_path_ = GetCtxModelPath(ep_context_file_path_, model_path_);
}

if (!has_dynamic_shape) {
Comment on lines +352 to 354
bool is_single_node_epcontext_graph = false;

std::unordered_set<std::string> control_flow_op_set_ = {"If", "Loop", "Scan"};
Suggested change (remove the blank line between the declarations):

bool is_single_node_epcontext_graph = false;
std::unordered_set<std::string> control_flow_op_set_ = {"If", "Loop", "Scan"};