Add support for Torch-TensorRT Nightly Install in Docker #1909

Closed
gs-olive wants to merge 1 commit from the add_torch_tensorrt_nightly branch


gs-olive
Contributor

@gs-olive gs-olive commented Sep 19, 2023

  • Add install for Torch-TRT nightly
  • Add validation to ensure Torch-TRT nightly versions are installed correctly and are functional

@gs-olive gs-olive temporarily deployed to docker-s3-upload September 19, 2023 02:35 — with GitHub Actions Inactive
@xuzhao9
Contributor

xuzhao9 commented Sep 19, 2023

Using TensorRT in PyTorch requires both torch_tensorrt and NVIDIA's TensorRT SDK: https://developer.nvidia.com/tensorrt. Currently, the Docker image does not have the TensorRT SDK installed.

@gs-olive
Contributor Author

@xuzhao9 - the torch_tensorrt install should pull in the tensorrt Python package, so we would not need the SDK installed in the container to use torch_tensorrt with the nightly distributions here. I am wondering if there is a way I can sample a build of the nightly Docker to validate the installation of the Python package.

@xuzhao9
Contributor

xuzhao9 commented Sep 19, 2023

@gs-olive You can get the latest built docker image using the following command:

docker pull ghcr.io/pytorch/torchbench:latest
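
Once a container from that image is running, one way to see which of the relevant Python packages are present is to query the installed distribution metadata. The following is a minimal sketch and not part of this PR; the distribution names in `PACKAGES` are assumptions and may need adjusting to match what the image actually ships.

```python
from importlib import metadata

# Distribution names to look for. These are assumed for illustration;
# adjust them to match the packages expected in the image.
PACKAGES = ("torch", "torchvision", "torchaudio", "torch_tensorrt", "tensorrt")


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

This only confirms that the packages are installed; it does not import them, so it will not catch a package that installs but fails to load.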

@gs-olive gs-olive force-pushed the add_torch_tensorrt_nightly branch from 674bd84 to 5a1f7ec Compare September 20, 2023 00:12
@gs-olive gs-olive temporarily deployed to docker-s3-upload September 20, 2023 00:13 — with GitHub Actions Inactive
@gs-olive
Contributor Author

Thanks for the resource! I just tested out:

pip install --pre --no-cache-dir torch torchvision torchaudio torch_tensorrt -i https://download.pytorch.org/whl/nightly/cu118

I believe this is the same command that utils/cuda_utils.py would run. It does not succeed: pip downloads every available version of the torch_tensorrt package and then errors out. Replacing it with the following command succeeds and installs the correct versions:

pip install --pre --no-cache-dir torch torchvision torchaudio torch_tensorrt --extra-index-url https://download.pytorch.org/whl/nightly/cu118

The above appears to have the same install behavior as the previous command for torch, torchvision, and torchaudio, and it also works for torch_tensorrt. I have made the corresponding change in this PR. Please let me know what you think.
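
For context, the two flags differ in one documented way: `-i` / `--index-url` replaces pip's default index, while `--extra-index-url` adds an index that pip consults alongside the default one. Whether that difference is what caused the failure above is not established in this thread. The sketch below shows how such a command could be assembled in Python; `build_pip_command` is a hypothetical helper written for illustration and is not the code in utils/cuda_utils.py.

```python
import sys

NIGHTLY_INDEX = "https://download.pytorch.org/whl/nightly/cu118"
PACKAGES = ["torch", "torchvision", "torchaudio", "torch_tensorrt"]


def build_pip_command(packages, index_url, extra=True):
    """Return a pip argv list (hypothetical helper, for illustration only).

    extra=True  -> --extra-index-url: the index is used in addition to the default.
    extra=False -> --index-url: the index replaces the default.
    """
    index_flag = "--extra-index-url" if extra else "--index-url"
    return [
        sys.executable, "-m", "pip", "install",
        "--pre", "--no-cache-dir",
        *packages,
        index_flag, index_url,
    ]


if __name__ == "__main__":
    # Print the command rather than running it, so the sketch has no side effects.
    print(" ".join(build_pip_command(PACKAGES, NIGHTLY_INDEX)))
```

The returned list can be handed to `subprocess.check_call` to perform the install.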

@xuzhao9
Contributor

xuzhao9 commented Sep 20, 2023

Hi @gs-olive, thanks for your effort. Could you test the torch_tensorrt backend after installing it? For example, what is the output of the following command?

python run.py resnet50 -d cuda -t eval --backend torch_trt

@gs-olive
Contributor Author

Here is the output of the command you referenced: python run.py resnet50 -d cuda -t eval --backend torch_trt

Compiling resnet50 with batch size 32, precision fp16, and default IR
INFO:torch_tensorrt._compile:ir was set to default, using dynamo as ir
WARNING:torch_tensorrt.dynamo.compile:The Dynamo backend is an experimental feature, for which only the following arguments are supported: {enabled_precisions, debug, workspace_size, min_block_size, max_aux_streams, version_compatible, optimization_level, torch_executed_ops, pass_through_build_failures, use_fast_partitioner, enable_experimental_decompositions, require_full_compilation}
INFO:torch_tensorrt.dynamo.compile:Compilation Settings: CompilationSettings(precision=torch.float16, debug=False, workspace_size=0, min_block_size=5, torch_executed_ops=[], pass_through_build_failures=False, max_aux_streams=None, version_compatible=False, optimization_level=None, use_python_runtime=False, truncate_long_and_double=False, use_fast_partitioner=True, enable_experimental_decompositions=False, device=Device(type=DeviceType.GPU, gpu_id=0), require_full_compilation=False)

INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:TRT INetwork construction elapsed time: 0:00:00.005506
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:Build TRT engine elapsed time: 0:00:36.552237
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:TRT Engine uses: 102760960 bytes of Memory
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:TRT INetwork construction elapsed time: 0:00:03.441001
[09/21/2023-17:55:44] [TRT] [W] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[09/21/2023-17:55:44] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[09/21/2023-17:55:44] [TRT] [W] Check verbose logs for the list of affected weights.
[09/21/2023-17:55:44] [TRT] [W] - 1 weights are affected by this issue: Detected subnormal FP16 values.
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:Build TRT engine elapsed time: 0:03:20.398813
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:TRT Engine uses: 256901120 bytes of Memory
Running eval method from resnet50 on cuda in torch_trt mode with input batch size 32 and precision fp16.
INFO:numba.cuda.cudadrv.driver:init
GPU Time:             10.823 milliseconds
CPU Total Wall Time:  10.841 milliseconds
GPU 0 Peak Memory:              1.4231 GB
CPU Peak Memory:                4.0303 GB
PT2 Compilation time:       1.405 seconds

Below is the output of a similar command: python run.py resnet50 -d cuda -t eval --backend torch_trt --ir torch_compile, which uses the torch_compile backend we want to test:

Compiling resnet50 with batch size 32, precision fp16, and torch_compile IR
Running eval method from resnet50 on cuda in torch_trt mode with input batch size 32 and precision fp16.
INFO:torch_tensorrt.dynamo.utils:Using Default Torch-TRT Runtime (as requested by user)
INFO:torch_tensorrt.dynamo.utils:Device not specified, using Torch default current device - cuda:0. If this is incorrect, please specify an input device, via the device keyword.
INFO:torch_tensorrt.dynamo.utils:Compilation Settings: CompilationSettings(precision=torch.float16, debug=False, workspace_size=0, min_block_size=5, torch_executed_ops=set(), pass_through_build_failures=False, max_aux_streams=None, version_compatible=False, optimization_level=None, use_python_runtime=False, truncate_long_and_double=False, use_fast_partitioner=True, enable_experimental_decompositions=False, device=Device(type=DeviceType.GPU, gpu_id=0), require_full_compilation=False)

INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:TRT INetwork construction elapsed time: 0:00:00.005175
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:Build TRT engine elapsed time: 0:00:35.786096
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:TRT Engine uses: 102760960 bytes of Memory
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:TRT INetwork construction elapsed time: 0:00:03.359519
[09/21/2023-18:14:10] [TRT] [W] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[09/21/2023-18:14:10] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[09/21/2023-18:14:10] [TRT] [W] Check verbose logs for the list of affected weights.
[09/21/2023-18:14:10] [TRT] [W] - 1 weights are affected by this issue: Detected subnormal FP16 values.
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:Build TRT engine elapsed time: 0:03:15.177063
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:TRT Engine uses: 269746176 bytes of Memory
INFO:numba.cuda.cudadrv.driver:init
GPU Time:             11.090 milliseconds
CPU Total Wall Time:  11.111 milliseconds
GPU 0 Peak Memory:              1.3997 GB
CPU Peak Memory:                3.7539 GB
PT2 Compilation time:     249.850 seconds
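
To compare the two runs side by side, the summary lines can be parsed into numbers. This is a small sketch using only the figures printed above; the regular expression and the helper names are illustrative assumptions, not part of run.py.

```python
import re

DEFAULT_IR = """\
GPU Time:             10.823 milliseconds
CPU Total Wall Time:  10.841 milliseconds
GPU 0 Peak Memory:              1.4231 GB
CPU Peak Memory:                4.0303 GB
PT2 Compilation time:       1.405 seconds
"""

TORCH_COMPILE_IR = """\
GPU Time:             11.090 milliseconds
CPU Total Wall Time:  11.111 milliseconds
GPU 0 Peak Memory:              1.3997 GB
CPU Peak Memory:                3.7539 GB
PT2 Compilation time:     249.850 seconds
"""

# Matches lines of the form "<name>: <number> <unit>".
METRIC = re.compile(r"^(?P<name>[^:]+):\s+(?P<value>\d+(?:\.\d+)?)\s+(?P<unit>\w+)\s*$")


def parse_metrics(text):
    """Return {metric name: (value, unit)} for every summary line in text."""
    metrics = {}
    for line in text.splitlines():
        match = METRIC.match(line)
        if match:
            metrics[match["name"].strip()] = (float(match["value"]), match["unit"])
    return metrics


def relative_change(old, new):
    """Fractional change from old to new (0.05 means 5% higher)."""
    return (new - old) / old


if __name__ == "__main__":
    base = parse_metrics(DEFAULT_IR)
    other = parse_metrics(TORCH_COMPILE_IR)
    for name, (value, unit) in base.items():
        new_value = other[name][0]
        change = relative_change(value, new_value)
        print(f"{name}: {value} -> {new_value} {unit} ({change:+.1%})")
```

On these numbers the GPU time differs by roughly 2.5% between the two runs, while the reported PT2 compilation time differs widely (1.405 versus 249.850 seconds). In the second log the engine build messages appear after the "Running eval method" line, which may account for the difference.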

@xuzhao9 xuzhao9 self-requested a review September 22, 2023 21:44
Contributor

@xuzhao9 xuzhao9 left a comment


LGTM!

@xuzhao9
Contributor

xuzhao9 commented Sep 22, 2023

The result looks great! Do we still need to make sure that the version of the torch_trt nightly matches the torch nightly? My thinking is that if the package sets its dependencies correctly and is published at https://download.pytorch.org/, we should be able to install compatible versions.

The downside is that if the torch_trt nightly package fails to build on some day, all other torchbench workflows that do not depend on torch_trt will also fail, because the nightly docker build would fail. We can adopt it in the nightly docker run for now; if we find its build to be unreliable, we will have to revert this PR and make installing torch_trt optional by installing it only for the torch_trt userbenchmark.

@gs-olive
Contributor Author

@xuzhao9 - I see, this is a good point. I will look into adding a check to this PR, similar to check_torch_nightly_version, as an extra layer of validation for the torch_tensorrt package. That way, if a nightly is missed, only the Torch-TRT benchmarks will fail to run, not all the others too.
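
One possible shape for such a check is sketched below. It is not the check added in this PR: `nightly_date` and `check_nightly_pair` are hypothetical names, the rule that both packages must carry the same nightly date is an assumption, and the sketch assumes nightly version strings contain a `.devYYYYMMDD` segment. The version strings in the example are made up for illustration.

```python
import re
from typing import Optional, Tuple

# Assumes nightly versions look like "2.2.0.dev20230926+cu118".
DEV_DATE = re.compile(r"\.dev(\d{8})")


def nightly_date(version: str) -> Optional[str]:
    """Return the YYYYMMDD stamp of a nightly version string, or None."""
    match = DEV_DATE.search(version)
    return match.group(1) if match else None


def check_nightly_pair(torch_version: str, trt_version: str) -> Tuple[bool, str]:
    """Report whether both versions are nightlies from the same date.

    Hypothetical validation rule: a mismatch is reported to the caller
    instead of raising, so the caller can decide to skip only the
    Torch-TRT benchmarks.
    """
    torch_date = nightly_date(torch_version)
    trt_date = nightly_date(trt_version)
    if torch_date is None or trt_date is None:
        return False, f"not a nightly build: torch={torch_version}, torch_tensorrt={trt_version}"
    if torch_date != trt_date:
        return False, f"nightly dates differ: torch={torch_date}, torch_tensorrt={trt_date}"
    return True, f"both packages are nightlies from {torch_date}"


if __name__ == "__main__":
    # Made-up version strings, for illustration only.
    print(check_nightly_pair("2.2.0.dev20230926+cu118", "2.2.0.dev20230926+cu118"))
    print(check_nightly_pair("2.2.0.dev20230926+cu118", "2.2.0.dev20230925+cu118"))
```

Returning a status rather than raising keeps the failure local to the caller that needs torch_tensorrt, which matches the intent described above.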

@gs-olive gs-olive force-pushed the add_torch_tensorrt_nightly branch from 5a1f7ec to 3f84f62 Compare September 26, 2023 20:23
@gs-olive gs-olive requested a review from xuzhao9 September 26, 2023 20:27
@gs-olive gs-olive force-pushed the add_torch_tensorrt_nightly branch from 3f84f62 to ba8722d Compare September 26, 2023 20:28
@gs-olive gs-olive marked this pull request as ready for review September 26, 2023 20:28
@gs-olive gs-olive force-pushed the add_torch_tensorrt_nightly branch from ba8722d to aa27dc0 Compare September 26, 2023 20:29
@gs-olive gs-olive temporarily deployed to docker-s3-upload September 26, 2023 20:29 — with GitHub Actions Inactive
@gs-olive gs-olive temporarily deployed to docker-s3-upload September 26, 2023 20:30 — with GitHub Actions Inactive
@gs-olive
Contributor Author

@xuzhao9 - I've added install validation for the Torch-TensorRT package and separated out the torch_tensorrt package from the other install dependencies. Please let me know what you think of the changes!

Contributor

@xuzhao9 xuzhao9 left a comment


Looks good to me; we can accept this once the inline comment is addressed.

utils/cuda_utils.py (inline comment, resolved)
- Add install for Torch-TRT nightly
- Add install validation
@gs-olive gs-olive force-pushed the add_torch_tensorrt_nightly branch from 7694449 to e527a45 Compare September 27, 2023 06:11
@gs-olive gs-olive temporarily deployed to docker-s3-upload September 27, 2023 06:12 — with GitHub Actions Inactive
@facebook-github-bot
Contributor

@xuzhao9 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@xuzhao9 merged this pull request in 160bcfe.

@gs-olive gs-olive deleted the add_torch_tensorrt_nightly branch September 27, 2023 16:54
@gs-olive
Contributor Author

gs-olive commented Sep 27, 2023

Thanks @xuzhao9! Where would I find the results of future runs, to see whether the installation and testing are working properly?

@xuzhao9
Contributor

xuzhao9 commented Sep 28, 2023

@gs-olive
First, you could inspect the log of https://github.com/pytorch/benchmark/actions/workflows/build-nightly-docker.yml to check whether torch-tensorrt is included in the nightly docker image (or pull ghcr.io/pytorch/torchbench:latest and inspect it manually).

Second, after it's been built in the nightly docker, ping me at #1823, and I can start another CI workflow run of the torch_trt userbenchmark, and if it succeeds, the benchmark metrics can be downloaded as GitHub artifacts.

@xuzhao9
Contributor

xuzhao9 commented Sep 28, 2023

@gs-olive I tried running the userbenchmark on the latest nightly docker container with torch_trt nightly installed: https://github.com/pytorch/benchmark/actions/runs/6341149525/job/17224134160

It seems to fail on compiling the BERT_pytorch model.

The convention in userbenchmark is that a parent process runs each model in its own child process, so that if the child process crashes or throws an exception, the parent process can still run the next model.
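
As a rough illustration of that convention, and not the actual userbenchmark code, the sketch below runs each job in a fresh Python interpreter and records how it ended. The job names and snippets are stand-ins invented for the example; a real runner would launch a benchmark command per model instead.

```python
import subprocess
import sys


def run_in_child(code: str, timeout: float = 60.0) -> dict:
    """Run a Python snippet in a fresh interpreter and report how it ended."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "returncode": None, "stderr": "timed out"}
    return {
        "ok": proc.returncode == 0,
        "returncode": proc.returncode,
        "stderr": proc.stderr.strip(),
    }


def run_all(jobs: dict) -> dict:
    """Run every job in its own child process; a failure never stops the loop."""
    return {name: run_in_child(code) for name, code in jobs.items()}


# Stand-in jobs (hypothetical): one succeeds, one raises, one exits abruptly,
# and the last shows that later jobs still run.
JOBS = {
    "model_a": "print('ok')",
    "model_b": "raise RuntimeError('compile failed')",
    "model_c": "import os; os._exit(3)",
    "model_d": "print('still runs')",
}

if __name__ == "__main__":
    for name, result in run_all(JOBS).items():
        status = "passed" if result["ok"] else f"failed (returncode={result['returncode']})"
        print(f"{name}: {status}")
```

Because each job has its own interpreter, a crash in one model cannot take down the parent or corrupt state shared with later models.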

@gs-olive
Contributor Author

gs-olive commented Sep 28, 2023

Thanks for this! It seems this is testing against a different IR than I had intended (it's using the default IR, but it should be torch_compile) - I will add a PR to fix the issue.

@gs-olive
Contributor Author

Added #1946 to fix the selected IR. After this I expect the BERT_pytorch benchmark to complete, as I had verified that one locally.

3 participants