
TensorRT example with torch.compile #3203

Merged (7 commits, Jun 26, 2024)
Conversation

agunapal (Collaborator)

Description

Show TensorRT example with torch.compile
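The example wires TensorRT into TorchServe through the `torch.compile` path. A minimal sketch of what the model's `model-config.yaml` might look like is below; the exact key names follow TorchServe's `pt2` config convention and are assumptions here, not copied from the PR:

```yaml
# model-config.yaml (sketch; key names assumed)
minWorkers: 1
maxWorkers: 1
pt2:
  compile:
    enable: true
    backend: tensorrt
```

With a config like this, the handler's `torch.compile` call is routed to the Torch-TensorRT backend, which produces the engine-build log lines shown in the testing section below.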

Fixes #(issue)

Type of change

  • New feature (non-breaking change which adds functionality)

Feature/Issue validation/testing

TorchServe logs from a test inference against the res50-trt model:

2024-06-22T18:40:15,859 [INFO ] epollEventLoopGroup-3-1 TS_METRICS - ts_inference_requests_total.Count:1.0|#model_name:res50-trt,model_version:default|#hostname:ip-172-31-4-205,timestamp:1719081615
2024-06-22T18:40:15,861 [DEBUG] W-9000-res50-trt_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd PREDICT repeats 1 to backend at: 1719081615861
2024-06-22T18:40:15,861 [INFO ] W-9000-res50-trt_1.0 org.pytorch.serve.wlm.WorkerThread - Looping backend response at: 1719081615861
2024-06-22T18:40:15,863 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - Backend received inference at: 1719081615
2024-06-22T18:40:15,873 [INFO ] W-9000-res50-trt_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - result=[METRICS]ts_handler_preprocess.Milliseconds:10.249504089355469|#ModelName:res50-trt,Level:Model|#type:GAUGE|#hostname:ip-172-31-4-205,1719081615,7a5845b4-cbfb-4a97-bbc4-6f7900a1870f, pattern=[METRICS]
2024-06-22T18:40:16,795 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - Using Default Torch-TRT Runtime (as requested by user)
2024-06-22T18:40:16,796 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - Device not specified, using Torch default current device - cuda:0. If this is incorrect, please specify an input device, via the device keyword.
2024-06-22T18:40:16,796 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - Compilation Settings: CompilationSettings(enabled_precisions={<dtype.f32: 7>}, debug=False, workspace_size=0, min_block_size=5, torch_executed_ops=set(), pass_through_build_failures=False, max_aux_streams=None, version_compatible=False, optimization_level=None, use_python_runtime=False, truncate_double=False, use_fast_partitioner=True, enable_experimental_decompositions=False, device=Device(type=DeviceType.GPU, gpu_id=0), require_full_compilation=False, disable_tf32=False, assume_dynamic_shape_support=False, sparse_weights=False, refit=False, engine_capability=<EngineCapability.STANDARD: 1>, num_avg_timing_iters=1, dla_sram_size=1048576, dla_local_dram_size=1073741824, dla_global_dram_size=536870912, dryrun=False, hardware_compatible=False)
2024-06-22T18:40:16,796 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - 
2024-06-22T18:40:18,108 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - Node _param_constant0 of op type get_attr does not have metadata. This could sometimes lead to undefined behavior.
2024-06-22T18:40:18,109 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - Some nodes do not have metadata (shape and dtype information). This could lead to problems sometimes if the graph has PyTorch and TensorRT segments.
2024-06-22T18:40:18,762 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - [MemUsageChange] Init CUDA: CPU +1, GPU +0, now: CPU 398, GPU 568 (MiB)
2024-06-22T18:40:20,965 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - [MemUsageChange] Init builder kernel library: CPU +1621, GPU +290, now: CPU 2167, GPU 858 (MiB)
2024-06-22T18:40:22,096 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - TRT INetwork construction elapsed time: 0:00:01.094557
2024-06-22T18:40:22,113 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - Global timing cache in use. Profiling results in this builder pass will be stored.

2024-06-22T18:40:35,970 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - Detected 1 inputs and 1 output network tensors.
2024-06-22T18:40:36,566 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - Total Host Persistent Memory: 363840
2024-06-22T18:40:36,567 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - Total Device Persistent Memory: 6656
2024-06-22T18:40:36,567 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - Total Scratch Memory: 524800
2024-06-22T18:40:36,567 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - [BlockAssignment] Started assigning block shifts. This will take 97 steps to complete.
2024-06-22T18:40:36,568 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - [BlockAssignment] Algorithm ShiftNTopDown took 2.07453ms to assign 5 blocks to 97 nodes requiring 7326208 bytes.
2024-06-22T18:40:36,569 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - Total Activation Memory: 7325696
2024-06-22T18:40:36,569 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - Total Weights Memory: 112691616
2024-06-22T18:40:36,574 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - Engine generation completed in 14.4618 seconds.
2024-06-22T18:40:36,575 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 9 MiB, GPU 115 MiB
2024-06-22T18:40:36,640 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - [MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 4113 MiB
2024-06-22T18:40:36,644 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - Serialized 26 bytes of code generator cache.
2024-06-22T18:40:36,645 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - Serialized 320 timing cache entries
2024-06-22T18:40:36,645 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - Build TRT engine elapsed time: 0:00:14.548270
2024-06-22T18:40:36,645 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_LOG - TRT Engine uses: 114533172 bytes of Memory
2024-06-22T18:40:37,268 [WARN ] W-9000-res50-trt_1.0-stderr MODEL_LOG - WARNING: [Torch-TensorRT] - Using default stream in enqueueV3() may lead to performance issues due to additional calls to cudaStreamSynchronize() by TensorRT to ensure correct synchronization. Please use non-default stream instead.
2024-06-22T18:40:37,270 [INFO ] W-9000-res50-trt_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - result=[METRICS]ts_handler_inference.Milliseconds:21395.96484375|#ModelName:res50-trt,Level:Model|#type:GAUGE|#hostname:ip-172-31-4-205,1719081637,7a5845b4-cbfb-4a97-bbc4-6f7900a1870f, pattern=[METRICS]
2024-06-22T18:40:37,308 [INFO ] W-9000-res50-trt_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - result=[METRICS]ts_handler_postprocess.Milliseconds:38.030784606933594|#ModelName:res50-trt,Level:Model|#type:GAUGE|#hostname:ip-172-31-4-205,1719081637,7a5845b4-cbfb-4a97-bbc4-6f7900a1870f, pattern=[METRICS]
2024-06-22T18:40:37,308 [INFO ] W-9000-res50-trt_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - result=[METRICS]HandlerTime.Milliseconds:21445.23|#ModelName:res50-trt,Level:Model|#type:GAUGE|#hostname:ip-172-31-4-205,1719081637,7a5845b4-cbfb-4a97-bbc4-6f7900a1870f, pattern=[METRICS]
2024-06-22T18:40:37,309 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_METRICS - HandlerTime.ms:21445.23|#ModelName:res50-trt,Level:Model|#hostname:ip-172-31-4-205,requestID:7a5845b4-cbfb-4a97-bbc4-6f7900a1870f,timestamp:1719081637
2024-06-22T18:40:37,309 [INFO ] W-9000-res50-trt_1.0 org.pytorch.serve.wlm.BatchAggregator - Sending response for jobId 7a5845b4-cbfb-4a97-bbc4-6f7900a1870f
2024-06-22T18:40:37,309 [INFO ] W-9000-res50-trt_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - result=[METRICS]PredictionTime.Milliseconds:21445.32|#ModelName:res50-trt,Level:Model|#type:GAUGE|#hostname:ip-172-31-4-205,1719081637,7a5845b4-cbfb-4a97-bbc4-6f7900a1870f, pattern=[METRICS]
2024-06-22T18:40:37,309 [INFO ] W-9000-res50-trt_1.0-stdout MODEL_METRICS - PredictionTime.ms:21445.32|#ModelName:res50-trt,Level:Model|#hostname:ip-172-31-4-205,requestID:7a5845b4-cbfb-4a97-bbc4-6f7900a1870f,timestamp:1719081637
2024-06-22T18:40:37,310 [INFO ] W-9000-res50-trt_1.0 ACCESS_LOG - /127.0.0.1:49456 "PUT /predictions/res50-trt HTTP/1.1" 200 21453
2024-06-22T18:40:37,311 [INFO ] W-9000-res50-trt_1.0 TS_METRICS - Requests2XX.Count:1.0|#Level:Host|#hostname:ip-172-31-4-205,timestamp:1719081637
2024-06-22T18:40:37,311 [INFO ] W-9000-res50-trt_1.0 TS_METRICS - ts_inference_latency_microseconds.Microseconds:2.1448542294E7|#model_name:res50-trt,model_version:default|#hostname:ip-172-31-4-205,timestamp:1719081637
2024-06-22T18:40:37,311 [INFO ] W-9000-res50-trt_1.0 TS_METRICS - ts_queue_latency_microseconds.Microseconds:181.33|#model_name:res50-trt,model_version:default|#hostname:ip-172-31-4-205,timestamp:1719081637
2024-06-22T18:40:37,312 [DEBUG] W-9000-res50-trt_1.0 org.pytorch.serve.job.RestJob - Waiting time ns: 181330, Backend time ns: 21450549064
2024-06-22T18:40:37,312 [INFO ] W-9000-res50-trt_1.0 TS_METRICS - QueueTime.Milliseconds:0.0|#Level:Host|#hostname:ip-172-31-4-205,timestamp:1719081637
2024-06-22T18:40:37,312 [INFO ] W-9000-res50-trt_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 21447
2024-06-22T18:40:37,312 [INFO ] W-9000-res50-trt_1.0 TS_METRICS - WorkerThreadTime.Milliseconds:4.0|#Level:Host|#hostname:ip-172-31-4-205,timestamp:1719081637
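Most of the ~21.4 s first-request latency in the log above is one-time Torch-TensorRT compilation, not steady-state inference. A quick sanity check using the timings copied from the log:

```python
# Timings copied from the TorchServe log above (milliseconds).
handler_time_ms = 21445.23    # HandlerTime.Milliseconds
engine_build_ms = 14548.270   # "Build TRT engine elapsed time: 0:00:14.548270"
network_build_ms = 1094.557   # "TRT INetwork construction elapsed time: 0:00:01.094557"

# One-time compilation cost as a share of the first request's handler time.
one_time_ms = engine_build_ms + network_build_ms
share = one_time_ms / handler_time_ms
print(f"one-time compilation accounts for {share:.0%} of the first request")
```

Subsequent requests reuse the compiled engine and skip this cost entirely.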
{
  "tabby": 0.27221813797950745,
  "tiger_cat": 0.13754481077194214,
  "Egyptian_cat": 0.04620043560862541,
  "lynx": 0.003195191267877817,
  "lens_cap": 0.00225762533955276
}
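The response above is a JSON map of class labels to probabilities. A client can parse it and pick the top prediction like this (response string pasted from above):

```python
import json

# Inference response copied from the test run above.
response = """{
  "tabby": 0.27221813797950745,
  "tiger_cat": 0.13754481077194214,
  "Egyptian_cat": 0.04620043560862541,
  "lynx": 0.003195191267877817,
  "lens_cap": 0.00225762533955276
}"""

probabilities = json.loads(response)
# max over keys, ordered by their probability values
top_label = max(probabilities, key=probabilities.get)
print(top_label)  # tabby
```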

Checklist:

  • Did you have fun?
  • Have you added tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

@agunapal agunapal added this to the v0.12.0 milestone Jun 22, 2024
@agunapal agunapal requested a review from mreso June 22, 2024 19:26
@mreso (Collaborator) left a comment:

LGTM, minor comments

Review comments (resolved):

  • examples/torch_tensorrt/torchcompile/README.md
  • examples/torch_tensorrt/torchscript/README.md
@agunapal agunapal enabled auto-merge June 26, 2024 01:40
@agunapal agunapal added this pull request to the merge queue Jun 26, 2024
Merged via the queue into master with commit 7a9b145 Jun 26, 2024
12 checks passed