
TensorRT example with torch.compile #3203

Merged (7 commits, Jun 26, 2024)
Conversation

agunapal (Collaborator)

Description

Show TensorRT example with torch.compile
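The example wires TensorRT into TorchServe through the `torch.compile` path. A minimal sketch of what the model's `model-config.yaml` might look like is below; the exact key names follow TorchServe's `pt2` config convention and are assumptions here, not copied from the PR:

```yaml
# model-config.yaml (sketch; key names assumed)
minWorkers: 1
maxWorkers: 1
pt2:
  compile:
    enable: true
    backend: tensorrt
```

With a config like this, the handler's `torch.compile` call is routed to the Torch-TensorRT backend, which produces the engine-build log lines shown in the testing section below.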

Fixes #(issue)

Type of change

  • New feature (non-breaking change which adds functionality)

Feature/Issue validation/testing

TorchServe logs from a test inference against the res50-trt model:

2024-06-22T18:40:15,859 [INFO ] epollEventLoopGroup-3-1 TS_METRICS - ts_inference_requests_total.Count:1.0|#model_name:res50-trt,model_version:default|#hostname:ip-172-31-4-205,timestamp:1719081615
2024-06-22T18:40:15,861 [DEBUG] W-9000-res50-trt_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd PREDICT repeats 1 to backend at: 1719081615861
2024-06-22T18:40:15,861 [INFO ] W-9000-res50-trt_1.0 org.pytorch.serve.wlm.WorkerThread - Looping backend response at: 1719081615861
2024-06-22T18:40:15,863 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - Backend received inference at: 1719081615
2024-06-22T18:40:15,873 [INFO ] W-9000-res50-trt_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - result=[METRICS]ts_handler_preprocess.Milliseconds:10.249504089355469|#ModelName:res50-trt,Level:Model|#type:GAUGE|#hostname:ip-172-31-4-205,1719081615,7a5845b4-cbfb-4a97-bbc4-6f7900a1870f, pattern=[METRICS]
2024-06-22T18:40:16,795 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - Using Default Torch-TRT Runtime (as requested by user)
2024-06-22T18:40:16,796 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - Device not specified, using Torch default current device - cuda:0. If this is incorrect, please specify an input device, via the device keyword.
2024-06-22T18:40:16,796 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - Compilation Settings: CompilationSettings(enabled_precisions={<dtype.f32: 7>}, debug=False, workspace_size=0, min_block_size=5, torch_executed_ops=set(), pass_through_build_failures=False, max_aux_streams=None, version_compatible=False, optimization_level=None, use_python_runtime=False, truncate_double=False, use_fast_partitioner=True, enable_experimental_decompositions=False, device=Device(type=DeviceType.GPU, gpu_id=0), require_full_compilation=False, disable_tf32=False, assume_dynamic_shape_support=False, sparse_weights=False, refit=False, engine_capability=<EngineCapability.STANDARD: 1>, num_avg_timing_iters=1, dla_sram_size=1048576, dla_local_dram_size=1073741824, dla_global_dram_size=536870912, dryrun=False, hardware_compatible=False)
2024-06-22T18:40:16,796 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - 
2024-06-22T18:40:18,108 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - Node _param_constant0 of op type get_attr does not have metadata. This could sometimes lead to undefined behavior.
2024-06-22T18:40:18,109 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - Some nodes do not have metadata (shape and dtype information). This could lead to problems sometimes if the graph has PyTorch and TensorRT segments.
2024-06-22T18:40:18,762 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - [MemUsageChange] Init CUDA: CPU +1, GPU +0, now: CPU 398, GPU 568 (MiB)
2024-06-22T18:40:20,965 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - [MemUsageChange] Init builder kernel library: CPU +1621, GPU +290, now: CPU 2167, GPU 858 (MiB)
2024-06-22T18:40:22,096 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - TRT INetwork construction elapsed time: 0:00:01.094557
2024-06-22T18:40:22,113 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - Global timing cache in use. Profiling results in this builder pass will be stored.

2024-06-22T18:40:35,970 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - Detected 1 inputs and 1 output network tensors.
2024-06-22T18:40:36,566 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - Total Host Persistent Memory: 363840
2024-06-22T18:40:36,567 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - Total Device Persistent Memory: 6656
2024-06-22T18:40:36,567 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - Total Scratch Memory: 524800
2024-06-22T18:40:36,567 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - [BlockAssignment] Started assigning block shifts. This will take 97 steps to complete.
2024-06-22T18:40:36,568 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - [BlockAssignment] Algorithm ShiftNTopDown took 2.07453ms to assign 5 blocks to 97 nodes requiring 7326208 bytes.
2024-06-22T18:40:36,569 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - Total Activation Memory: 7325696
2024-06-22T18:40:36,569 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - Total Weights Memory: 112691616
2024-06-22T18:40:36,574 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - Engine generation completed in 14.4618 seconds.
2024-06-22T18:40:36,575 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 9 MiB, GPU 115 MiB
2024-06-22T18:40:36,640 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - [MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 4113 MiB
2024-06-22T18:40:36,644 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - Serialized 26 bytes of code generator cache.
2024-06-22T18:40:36,645 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - Serialized 320 timing cache entries
2024-06-22T18:40:36,645 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - Build TRT engine elapsed time: 0:00:14.548270
2024-06-22T18:40:36,645 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - TRT Engine uses: 114533172 bytes of Memory
2024-06-22T18:40:37,268 [WARN ] W-9000-res50-trt_1.0-stderr MODEL_LOG - WARNING: [Torch-TensorRT] - Using default stream in enqueueV3() may lead to performance issues due to additional calls to cudaStreamSynchronize() by TensorRT to ensure correct synchronization. Please use non-default stream instead.
2024-06-22T18:40:37,270 [INFO ] W-9000-res50-trt_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - result=[METRICS]ts_handler_inference.Milliseconds:21395.96484375|#ModelName:res50-trt,Level:Model|#type:GAUGE|#hostname:ip-172-31-4-205,1719081637,7a5845b4-cbfb-4a97-bbc4-6f7900a1870f, pattern=[METRICS]
2024-06-22T18:40:37,308 [INFO ] W-9000-res50-trt_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - result=[METRICS]ts_handler_postprocess.Milliseconds:38.030784606933594|#ModelName:res50-trt,Level:Model|#type:GAUGE|#hostname:ip-172-31-4-205,1719081637,7a5845b4-cbfb-4a97-bbc4-6f7900a1870f, pattern=[METRICS]
2024-06-22T18:40:37,308 [INFO ] W-9000-res50-trt_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - result=[METRICS]HandlerTime.Milliseconds:21445.23|#ModelName:res50-trt,Level:Model|#type:GAUGE|#hostname:ip-172-31-4-205,1719081637,7a5845b4-cbfb-4a97-bbc4-6f7900a1870f, pattern=[METRICS]
2024-06-22T18:40:37,309 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_METRICS - HandlerTime.ms:21445.23|#ModelName:res50-trt,Level:Model|#hostname:ip-172-31-4-205,requestID:7a5845b4-cbfb-4a97-bbc4-6f7900a1870f,timestamp:1719081637
2024-06-22T18:40:37,309 [INFO ] W-9000-res50-trt_1.0 org.pytorch.serve.wlm.BatchAggregator - Sending response for jobId 7a5845b4-cbfb-4a97-bbc4-6f7900a1870f
2024-06-22T18:40:37,309 [INFO ] W-9000-res50-trt_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - result=[METRICS]PredictionTime.Milliseconds:21445.32|#ModelName:res50-trt,Level:Model|#type:GAUGE|#hostname:ip-172-31-4-205,1719081637,7a5845b4-cbfb-4a97-bbc4-6f7900a1870f, pattern=[METRICS]
2024-06-22T18:40:37,309 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_METRICS - PredictionTime.ms:21445.32|#ModelName:res50-trt,Level:Model|#hostname:ip-172-31-4-205,requestID:7a5845b4-cbfb-4a97-bbc4-6f7900a1870f,timestamp:1719081637
2024-06-22T18:40:37,310 [INFO ] W-9000-res50-trt_1.0 ACCESS_LOG - /127.0.0.1:49456 "PUT /predictions/res50-trt HTTP/1.1" 200 21453
2024-06-22T18:40:37,311 [INFO ] W-9000-res50-trt_1.0 TS_METRICS - Requests2XX.Count:1.0|#Level:Host|#hostname:ip-172-31-4-205,timestamp:1719081637
2024-06-22T18:40:37,311 [INFO ] W-9000-res50-trt_1.0 TS_METRICS - ts_inference_latency_microseconds.Microseconds:2.1448542294E7|#model_name:res50-trt,model_version:default|#hostname:ip-172-31-4-205,timestamp:1719081637
2024-06-22T18:40:37,311 [INFO ] W-9000-res50-trt_1.0 TS_METRICS - ts_queue_latency_microseconds.Microseconds:181.33|#model_name:res50-trt,model_version:default|#hostname:ip-172-31-4-205,timestamp:1719081637
2024-06-22T18:40:37,312 [DEBUG] W-9000-res50-trt_1.0 org.pytorch.serve.job.RestJob - Waiting time ns: 181330, Backend time ns: 21450549064
2024-06-22T18:40:37,312 [INFO ] W-9000-res50-trt_1.0 TS_METRICS - QueueTime.Milliseconds:0.0|#Level:Host|#hostname:ip-172-31-4-205,timestamp:1719081637
2024-06-22T18:40:37,312 [INFO ] W-9000-res50-trt_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 21447
2024-06-22T18:40:37,312 [INFO ] W-9000-res50-trt_1.0 TS_METRICS - WorkerThreadTime.Milliseconds:4.0|#Level:Host|#hostname:ip-172-31-4-205,timestamp:1719081637
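Most of the ~21.4 s first-request latency in the log above is one-time Torch-TensorRT compilation, not steady-state inference. A quick sanity check using the timings copied from the log:

```python
# Timings copied from the TorchServe log above (milliseconds).
handler_time_ms = 21445.23    # HandlerTime.Milliseconds
engine_build_ms = 14548.270   # "Build TRT engine elapsed time: 0:00:14.548270"
network_build_ms = 1094.557   # "TRT INetwork construction elapsed time: 0:00:01.094557"

# One-time compilation cost as a share of the first request's handler time.
one_time_ms = engine_build_ms + network_build_ms
share = one_time_ms / handler_time_ms
print(f"one-time compilation accounts for {share:.0%} of the first request")
```

Subsequent requests reuse the compiled engine and skip this cost entirely.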
{
  "tabby": 0.27221813797950745,
  "tiger_cat": 0.13754481077194214,
  "Egyptian_cat": 0.04620043560862541,
  "lynx": 0.003195191267877817,
  "lens_cap": 0.00225762533955276
}
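The response above is a JSON map of class labels to probabilities. A client can parse it and pick the top prediction like this (response string pasted from above):

```python
import json

# Inference response copied from the test run above.
response = """{
  "tabby": 0.27221813797950745,
  "tiger_cat": 0.13754481077194214,
  "Egyptian_cat": 0.04620043560862541,
  "lynx": 0.003195191267877817,
  "lens_cap": 0.00225762533955276
}"""

probabilities = json.loads(response)
# max over keys, ordered by their probability values
top_label = max(probabilities, key=probabilities.get)
print(top_label)  # tabby
```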

Checklist:

  • Did you have fun?
  • Have you added tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

@agunapal agunapal added this to the v0.12.0 milestone Jun 22, 2024
@agunapal agunapal requested a review from mreso June 22, 2024 19:26
@mreso (Collaborator) left a comment:

LGTM, minor comments

Review comments (resolved):

  • examples/torch_tensorrt/torchcompile/README.md
  • examples/torch_tensorrt/torchscript/README.md
@agunapal agunapal enabled auto-merge June 26, 2024 01:40
@agunapal agunapal added this pull request to the merge queue Jun 26, 2024
Merged via the queue into master with commit 7a9b145 Jun 26, 2024
12 checks passed