Add support for Torch-TensorRT Nightly Install in Docker #1909

gs-olive · 2023-09-19T02:34:57Z

Add install for Torch-TRT nightly
Add validation to ensure Torch-TRT nightly versions are installed correctly and are functional

xuzhao9 · 2023-09-19T14:32:18Z

Using TensorRT in PyTorch requires both torch_tensorrt and NVIDIA's TensorRT SDK: https://developer.nvidia.com/tensorrt. Currently, the Docker image does not have the TensorRT SDK installed.

gs-olive · 2023-09-19T18:25:05Z

@xuzhao9 - the torch_tensorrt install should pull in the tensorrt Python package, so we would not need the SDK installed in the container to use torch_tensorrt with the nightly distributions here. I am wondering if there is a way I can sample a build of the nightly Docker to validate the installation of the Python package.
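One way to sanity-check this inside a pulled image is to ask Python whether the packages are visible, without importing them. This is only a minimal sketch: `describe_package` is a hypothetical helper, and it assumes the distribution name matches the module name, which may not hold for every package.

```python
import importlib.util
from importlib import metadata


def describe_package(module_name, dist_name=None):
    """Return (importable, version) for a top-level module without importing it.

    dist_name is the installed distribution name; it defaults to module_name,
    which is an assumption and may need adjusting for some packages.
    """
    importable = importlib.util.find_spec(module_name) is not None
    try:
        version = metadata.version(dist_name or module_name)
    except metadata.PackageNotFoundError:
        version = None
    return importable, version


if __name__ == "__main__":
    # Package names taken from the discussion above.
    for name in ("torch", "torch_tensorrt", "tensorrt"):
        print(name, describe_package(name))
```

If `tensorrt` is reported as importable after installing torch_tensorrt, that would be consistent with the Python package being pulled in as a dependency.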

xuzhao9 · 2023-09-19T20:51:40Z

@gs-olive You can get the latest built docker image using the following command:

docker pull ghcr.io/pytorch/torchbench:latest

gs-olive · 2023-09-20T00:13:45Z

Thanks for the resource! I just tested out:

pip install --pre --no-cache-dir torch torchvision torchaudio torch_tensorrt -i https://download.pytorch.org/whl/nightly/cu118

I believe the above is the same command that would run in utils/cuda_utils.py. The command does not succeed: pip downloads all available versions of the torch_tensorrt package, then errors. I replaced it with the following, which succeeds and installs the correct versions:

pip install --pre --no-cache-dir torch torchvision torchaudio torch_tensorrt --extra-index-url https://download.pytorch.org/whl/nightly/cu118

The above appears to have the same install behavior as the previous command for torch, torchvision, and torchaudio, and it also works for torch_tensorrt. I have made the corresponding change in this PR. Please let me know what you think.
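For context, pip's `-i` / `--index-url` replaces the default package index, while `--extra-index-url` adds a second index alongside it. A rough sketch of how the working command could be assembled from Python follows; `build_nightly_install_cmd` and `NIGHTLY_INDEX` are illustrative names, not the actual contents of utils/cuda_utils.py.

```python
import subprocess
import sys

# Index URL copied from the commands above.
NIGHTLY_INDEX = "https://download.pytorch.org/whl/nightly/cu118"


def build_nightly_install_cmd(packages, index_url=NIGHTLY_INDEX):
    """Build a pip command that adds the nightly index on top of the default one."""
    return [
        sys.executable, "-m", "pip", "install", "--pre", "--no-cache-dir",
        *packages,
        "--extra-index-url", index_url,
    ]


if __name__ == "__main__":
    cmd = build_nightly_install_cmd(
        ["torch", "torchvision", "torchaudio", "torch_tensorrt"]
    )
    print(" ".join(cmd))
    # Uncomment to actually run the install:
    # subprocess.check_call(cmd)
```

Invoking pip through `sys.executable -m pip` keeps the install tied to the interpreter that runs the script.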

xuzhao9 · 2023-09-20T20:54:26Z

Hi @gs-olive, thanks for your effort. Could you test the torch_tensorrt backend after installing it? For example, what is the output of the following command?

python run.py resnet50 -d cuda -t eval --backend torch_trt

gs-olive · 2023-09-21T19:18:50Z

Here is the output of the command you referenced: python run.py resnet50 -d cuda -t eval --backend torch_trt

Compiling resnet50 with batch size 32, precision fp16, and default IR
INFO:torch_tensorrt._compile:ir was set to default, using dynamo as ir
WARNING:torch_tensorrt.dynamo.compile:The Dynamo backend is an experimental feature, for which only the following arguments are supported: {enabled_precisions, debug, workspace_size, min_block_size, max_aux_streams, version_compatible, optimization_level, torch_executed_ops, pass_through_build_failures, use_fast_partitioner, enable_experimental_decompositions, require_full_compilation}
INFO:torch_tensorrt.dynamo.compile:Compilation Settings: CompilationSettings(precision=torch.float16, debug=False, workspace_size=0, min_block_size=5, torch_executed_ops=[], pass_through_build_failures=False, max_aux_streams=None, version_compatible=False, optimization_level=None, use_python_runtime=False, truncate_long_and_double=False, use_fast_partitioner=True, enable_experimental_decompositions=False, device=Device(type=DeviceType.GPU, gpu_id=0), require_full_compilation=False)

INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:TRT INetwork construction elapsed time: 0:00:00.005506
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:Build TRT engine elapsed time: 0:00:36.552237
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:TRT Engine uses: 102760960 bytes of Memory
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:TRT INetwork construction elapsed time: 0:00:03.441001
[09/21/2023-17:55:44] [TRT] [W] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[09/21/2023-17:55:44] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[09/21/2023-17:55:44] [TRT] [W] Check verbose logs for the list of affected weights.
[09/21/2023-17:55:44] [TRT] [W] - 1 weights are affected by this issue: Detected subnormal FP16 values.
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:Build TRT engine elapsed time: 0:03:20.398813
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:TRT Engine uses: 256901120 bytes of Memory
Running eval method from resnet50 on cuda in torch_trt mode with input batch size 32 and precision fp16.
INFO:numba.cuda.cudadrv.driver:init
GPU Time:             10.823 milliseconds
CPU Total Wall Time:  10.841 milliseconds
GPU 0 Peak Memory:              1.4231 GB
CPU Peak Memory:                4.0303 GB
PT2 Compilation time:       1.405 seconds

Below is the output of a similar command: python run.py resnet50 -d cuda -t eval --backend torch_trt --ir torch_compile, which uses the torch_compile backend we want to test:

Compiling resnet50 with batch size 32, precision fp16, and torch_compile IR
Running eval method from resnet50 on cuda in torch_trt mode with input batch size 32 and precision fp16.
INFO:torch_tensorrt.dynamo.utils:Using Default Torch-TRT Runtime (as requested by user)
INFO:torch_tensorrt.dynamo.utils:Device not specified, using Torch default current device - cuda:0. If this is incorrect, please specify an input device, via the device keyword.
INFO:torch_tensorrt.dynamo.utils:Compilation Settings: CompilationSettings(precision=torch.float16, debug=False, workspace_size=0, min_block_size=5, torch_executed_ops=set(), pass_through_build_failures=False, max_aux_streams=None, version_compatible=False, optimization_level=None, use_python_runtime=False, truncate_long_and_double=False, use_fast_partitioner=True, enable_experimental_decompositions=False, device=Device(type=DeviceType.GPU, gpu_id=0), require_full_compilation=False)

INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:TRT INetwork construction elapsed time: 0:00:00.005175
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:Build TRT engine elapsed time: 0:00:35.786096
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:TRT Engine uses: 102760960 bytes of Memory
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:TRT INetwork construction elapsed time: 0:00:03.359519
[09/21/2023-18:14:10] [TRT] [W] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[09/21/2023-18:14:10] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[09/21/2023-18:14:10] [TRT] [W] Check verbose logs for the list of affected weights.
[09/21/2023-18:14:10] [TRT] [W] - 1 weights are affected by this issue: Detected subnormal FP16 values.
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:Build TRT engine elapsed time: 0:03:15.177063
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:TRT Engine uses: 269746176 bytes of Memory
INFO:numba.cuda.cudadrv.driver:init
GPU Time:             11.090 milliseconds
CPU Total Wall Time:  11.111 milliseconds
GPU 0 Peak Memory:              1.3997 GB
CPU Peak Memory:                3.7539 GB
PT2 Compilation time:     249.850 seconds
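To compare the two runs above side by side, the summary lines can be pulled out of the log text. The following is a small sketch that assumes the `name: value unit` layout shown in these outputs; `parse_metrics` is a hypothetical helper and not part of run.py.

```python
import re

# Matches summary lines such as "GPU Time:   10.823 milliseconds".
METRIC_RE = re.compile(r"^([^:]+):\s+(\d+(?:\.\d+)?)\s+(\w+)$")


def parse_metrics(log_text):
    """Map each metric name to a (value, unit) tuple; other log lines are ignored."""
    metrics = {}
    for line in log_text.splitlines():
        match = METRIC_RE.match(line.strip())
        if match:
            name, value, unit = match.groups()
            metrics[name.strip()] = (float(value), unit)
    return metrics


if __name__ == "__main__":
    sample = """INFO:numba.cuda.cudadrv.driver:init
GPU Time:             11.090 milliseconds
PT2 Compilation time:     249.850 seconds
"""
    for name, (value, unit) in parse_metrics(sample).items():
        print(f"{name}: {value} {unit}")
```

Logger lines such as `INFO:...` are skipped because their first colon is not followed by whitespace.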

xuzhao9

LGTM!

xuzhao9 · 2023-09-22T21:49:44Z

The result looks great! Do we still need to make sure that the version of the torch_trt nightly matches the torch nightly? My thinking is that if the package declares its dependencies correctly and is published at https://download.pytorch.org/, we should be able to install compatible versions.

The downside is that if the torch_trt nightly package fails to build on some day, all other torchbench workflows that do not depend on torch_trt will also fail, because the nightly docker build would fail. We can adopt it in the nightly docker run for now; if we find its build to be unreliable, we will have to revert this PR and make installing torch_trt optional by installing it only for the torch_trt userbenchmark.

gs-olive · 2023-09-22T22:10:04Z

@xuzhao9 - I see, this is a good point. I will look into adding a check to this PR, similar to check_torch_nightly_version, as an extra layer of validation for the torch_tensorrt package. That way, if a nightly is missed, only the Torch-TRT benchmarks will fail to run, and not all the others too.
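As a rough illustration of what such a check might look like, the sketch below extracts the build date from a version string. It assumes nightly versions embed a `.devYYYYMMDD` segment (for example `2.2.0.dev20230926+cu118`); `nightly_date` and `is_nightly` are hypothetical names and do not reflect how check_torch_nightly_version is actually implemented.

```python
import re
from typing import Optional

# Assumed nightly version layout: "<release>.devYYYYMMDD+<cuda tag>".
DEV_DATE_RE = re.compile(r"\.dev(\d{8})")


def nightly_date(version: str) -> Optional[str]:
    """Return the YYYYMMDD build date embedded in a version string, or None."""
    match = DEV_DATE_RE.search(version)
    return match.group(1) if match else None


def is_nightly(version: str) -> bool:
    """True when the version string carries a nightly build date."""
    return nightly_date(version) is not None


if __name__ == "__main__":
    # Example version strings, made up for illustration.
    for v in ("2.2.0.dev20230926+cu118", "2.0.1"):
        print(v, is_nightly(v), nightly_date(v))
```

A validation step could call this on the installed torch_tensorrt version and report a problem for that package alone, leaving the rest of the image build unaffected.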

gs-olive · 2023-09-26T20:30:41Z

@xuzhao9 - I've added install validation for the Torch-TensorRT package and separated out the torch_tensorrt package from the other install dependencies. Please let me know what you think of the changes!

xuzhao9

Looking good to me, we can accept this after the inline comment is addressed.

utils/cuda_utils.py

- Add install for Torch-TRT nightly - Add install validation

facebook-github-bot · 2023-09-27T13:33:32Z

@xuzhao9 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2023-09-27T16:41:35Z

@xuzhao9 merged this pull request in 160bcfe.

gs-olive · 2023-09-27T21:39:29Z

Thanks @xuzhao9! I am wondering where I would find the results for future runs, to see if the installation and testing are working properly?

xuzhao9 · 2023-09-28T13:21:56Z

@gs-olive
First, you could inspect the log of https://github.com/pytorch/benchmark/actions/workflows/build-nightly-docker.yml to find if torch-tensorrt is included in the nightly docker image (or pull ghcr.io/pytorch/torchbench:latest and inspect manually).

Second, after it's been built in the nightly docker, ping me at #1823, and I can start another CI workflow run of the torch_trt userbenchmark, and if it succeeds, the benchmark metrics can be downloaded as GitHub artifacts.

xuzhao9 · 2023-09-28T16:21:28Z

@gs-olive I tried running the userbenchmark on the latest nightly docker container with torch_trt nightly installed: https://github.com/pytorch/benchmark/actions/runs/6341149525/job/17224134160

It seems to fail on compiling the BERT_pytorch model.

The convention in userbenchmark is that a parent process runs each model in its own child process, so that if the child process crashes or throws an exception, the parent process can still run the next model.
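The pattern can be sketched with the standard library as below. This is an illustration of the idea only, not the userbenchmark code; `run_each_in_child` and the demo commands are made up for the example.

```python
import subprocess
import sys


def run_each_in_child(commands, timeout=None):
    """Run each command in its own child process and record its exit code.

    A crash or uncaught exception in one child only affects that entry;
    the loop moves on to the next command. A timeout is recorded as None.
    """
    results = {}
    for name, cmd in commands.items():
        try:
            proc = subprocess.run(
                cmd, capture_output=True, text=True, timeout=timeout
            )
            results[name] = proc.returncode
        except subprocess.TimeoutExpired:
            results[name] = None
    return results


if __name__ == "__main__":
    demo = {
        "crashes": [sys.executable, "-c", "raise RuntimeError('boom')"],
        "succeeds": [sys.executable, "-c", "print('ok')"],
    }
    print(run_each_in_child(demo))
```

Because each model gets a fresh process, a failure while compiling one model shows up as a nonzero exit code for that model only.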

gs-olive · 2023-09-28T16:56:38Z

Thanks for this! It seems this is testing against a different IR than I had intended (it's using the default IR, but it should be torch_compile) - I will add a PR to fix the issue.

gs-olive · 2023-09-28T17:29:40Z

Added #1946 to fix the selected IR. After this, I expect the BERT_pytorch benchmark to complete, as I had verified that one locally.

facebook-github-bot added the cla signed label Sep 19, 2023

gs-olive temporarily deployed to docker-s3-upload September 19, 2023 02:35 — with GitHub Actions Inactive

gs-olive force-pushed the add_torch_tensorrt_nightly branch from 674bd84 to 5a1f7ec Compare September 20, 2023 00:12

gs-olive temporarily deployed to docker-s3-upload September 20, 2023 00:13 — with GitHub Actions Inactive

xuzhao9 self-requested a review September 22, 2023 21:44

xuzhao9 approved these changes Sep 22, 2023

gs-olive force-pushed the add_torch_tensorrt_nightly branch from 5a1f7ec to 3f84f62 Compare September 26, 2023 20:23

gs-olive had a problem deploying to docker-s3-upload September 26, 2023 20:23 — with GitHub Actions Error

gs-olive had a problem deploying to docker-s3-upload September 26, 2023 20:24 — with GitHub Actions Error

gs-olive requested a review from xuzhao9 September 26, 2023 20:27

gs-olive force-pushed the add_torch_tensorrt_nightly branch from 3f84f62 to ba8722d Compare September 26, 2023 20:28

gs-olive marked this pull request as ready for review September 26, 2023 20:28

gs-olive had a problem deploying to docker-s3-upload September 26, 2023 20:28 — with GitHub Actions Error

gs-olive force-pushed the add_torch_tensorrt_nightly branch from ba8722d to aa27dc0 Compare September 26, 2023 20:29

gs-olive temporarily deployed to docker-s3-upload September 26, 2023 20:29 — with GitHub Actions Inactive

gs-olive temporarily deployed to docker-s3-upload September 26, 2023 20:30 — with GitHub Actions Inactive

xuzhao9 reviewed Sep 26, 2023

gs-olive force-pushed the add_torch_tensorrt_nightly branch from aa27dc0 to 7694449 Compare September 27, 2023 06:03

gs-olive had a problem deploying to docker-s3-upload September 27, 2023 06:03 — with GitHub Actions Error

gs-olive had a problem deploying to docker-s3-upload September 27, 2023 06:04 — with GitHub Actions Error

Add support for Torch-TensorRT in Docker

e527a45

- Add install for Torch-TRT nightly
- Add install validation

gs-olive force-pushed the add_torch_tensorrt_nightly branch from 7694449 to e527a45 Compare September 27, 2023 06:11

gs-olive temporarily deployed to docker-s3-upload September 27, 2023 06:12 — with GitHub Actions Inactive

facebook-github-bot closed this in 160bcfe Sep 27, 2023

facebook-github-bot added the Merged label Sep 27, 2023

gs-olive deleted the add_torch_tensorrt_nightly branch September 27, 2023 16:54

gs-olive mentioned this pull request Sep 27, 2023

TorchBench Integration Part 3 pytorch/TensorRT#2093

Closed
