
[backends] Add functionality to TRT backend #1753

Status: Closed (wants to merge 4 commits)


@gs-olive (Contributor) commented Jul 7, 2023

  • Add argument parsing for backend arguments to be passed to Torch-TRT
  • Add capability to specify IR and other Torch-TRT fields via command line interface
  • Add functionality to compilation path and clean up code

cc: @narendasan

@xuzhao9 self-requested a review on July 7, 2023 20:06
@gs-olive force-pushed the trt_backend_improvements branch 3 times, most recently from 23724b7 to 0465851, on July 12, 2023 16:59
@xuzhao9 (Contributor) left a comment

LGTM. Just curious, would you be interested in setting up a CI to maintain this?

@gs-olive (Contributor Author)

Thank you for the review. I think we would be interested in setting up a CI to maintain it. @narendasan - what are your thoughts on this?

@gs-olive (Contributor Author)

@xuzhao9 - what would you need from us to set up a CI to maintain the TRT backend?

- Add argument parsing for backend arguments to pass to TRT
- Add capability to specify IR via command line CLI
- Add functionality to compilation path and clean up code
@gs-olive force-pushed the trt_backend_improvements branch from 0465851 to 695842c on July 12, 2023 22:24
@xuzhao9 (Contributor) commented Jul 12, 2023

@gs-olive Here is the guide on how to develop a CI: https://github.com/pytorch/benchmark/blob/main/userbenchmark/ADDING_USERBENCHMARKS.md

- Add functionality to perform benchmarking using the Torch-TRT backend,
along with output metrics
@gs-olive (Contributor Author) commented Jul 14, 2023

Hi @xuzhao9 - thanks for the reference. I've added functionality to this PR in a new commit which adds the userbenchmark folder and adapts functionality from the main run.py. I also added a ci.yaml file. I wasn't sure how the benchmarks are invoked in CI, however. My intent was to have this sort of usage:

# Invokes a singular model
python run_benchmark.py torch_trt --model resnet18 --precision fp32 --bs 4 --ir ts

# Invokes all models in the directory
python run_benchmark.py torch_trt --precision fp32 --bs 4
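The intended usage above implies an argument parser accepting `--model`, `--precision`, `--bs`, and `--ir`. A minimal sketch of what that parsing could look like, assuming `argparse` and flag names mirroring the commands (the PR's actual parser may differ, and the real `ir` choices may include more backends):

```python
# Hypothetical sketch of argument parsing for the usage shown above.
# Flag names and defaults are assumptions, not the PR's actual code.
import argparse

def make_parser():
    parser = argparse.ArgumentParser(description="Torch-TRT userbenchmark")
    parser.add_argument("--model", default=None,
                        help="Run a single model; omit to run all models")
    parser.add_argument("--precision", default="fp32",
                        choices=["fp32", "fp16"],
                        help="Numeric precision for compilation")
    parser.add_argument("--bs", type=int, default=1, help="Batch size")
    parser.add_argument("--ir", default="ts", choices=["ts", "dynamo"],
                        help="Torch-TRT intermediate representation")
    return parser

# Parse the single-model invocation from the example above.
args = make_parser().parse_args(
    ["--model", "resnet18", "--precision", "fp32", "--bs", "4", "--ir", "ts"]
)
```

Omitting `--model` would then fall through to the run-all-models path.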

@gs-olive requested a review from xuzhao9 on July 14, 2023 20:37

trt_input = [
    torch_tensorrt.Input(shape=input_.shape, dtype=input_.dtype)
    for input_ in example_inputs
]
Contributor

Is it necessary to add a check that the type of example_inputs is List[Tensor]? Many models actually have different types of inputs.

Contributor

One suggestion is to use pytree to traverse the inputs and cast them to torch_tensorrt.Input, similar to this:

from torch.utils._pytree import tree_map
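To illustrate the reviewer's suggestion without requiring torch at hand, here is a minimal sketch of the idea: a `tree_map`-style traversal (`torch.utils._pytree.tree_map` behaves like this over lists, tuples, and dicts) that casts tensor leaves to a TRT-style input spec. `FakeTensor` and `FakeInput` are stand-ins for `torch.Tensor` and `torch_tensorrt.Input`, not real classes:

```python
# Sketch of casting nested example inputs via a pytree-style traversal.
# FakeTensor/FakeInput are illustrative stand-ins for torch.Tensor and
# torch_tensorrt.Input.
from dataclasses import dataclass

@dataclass
class FakeTensor:          # stand-in for torch.Tensor
    shape: tuple
    dtype: str

@dataclass
class FakeInput:           # stand-in for torch_tensorrt.Input
    shape: tuple
    dtype: str

def tree_map(fn, tree):
    """Apply fn to every leaf of a nested list/tuple/dict structure,
    preserving the container shape (mimics torch.utils._pytree.tree_map)."""
    if isinstance(tree, (list, tuple)):
        return type(tree)(tree_map(fn, t) for t in tree)
    if isinstance(tree, dict):
        return {k: tree_map(fn, v) for k, v in tree.items()}
    return fn(tree)

def to_trt_input(leaf):
    # Only cast tensor-like leaves; pass scalars and strings through.
    if isinstance(leaf, FakeTensor):
        return FakeInput(shape=leaf.shape, dtype=leaf.dtype)
    return leaf

# Mixed-type inputs, as the reviewer notes many models have:
example_inputs = (
    FakeTensor((1, 3, 224, 224), "fp32"),
    {"mask": FakeTensor((1, 224), "int32"), "scale": 0.5},
)
trt_inputs = tree_map(to_trt_input, example_inputs)
```

The point is that non-tensor leaves (here the `0.5` scale) survive the traversal untouched, which a flat `List[Tensor]` comprehension cannot guarantee.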

Contributor Author

Since the different ir choices for Torch-TRT process inputs differently, and all of them can handle Torch Tensor inputs, I passed the example inputs directly to the compiler instead, so the selected ir can handle the inputs and any casting as necessary.
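The resolution above can be illustrated with a small sketch: rather than pre-casting inputs, dispatch on the chosen `ir` and let each path normalize the inputs itself. `compile_ts` and `compile_dynamo` here are hypothetical stand-ins, not Torch-TRT's real entry points:

```python
# Sketch of why raw example inputs can be passed straight through:
# each ir path normalizes inputs itself. All names are illustrative.

def compile_ts(module, inputs):
    # TorchScript path: wraps raw inputs into input specs internally.
    return ("ts", [("input-spec", tuple(i["shape"])) for i in inputs])

def compile_dynamo(module, inputs):
    # Dynamo path: traces with the example inputs directly.
    return ("dynamo", [tuple(i["shape"]) for i in inputs])

IR_DISPATCH = {"ts": compile_ts, "dynamo": compile_dynamo}

def compile_model(module, example_inputs, ir="ts"):
    """Pass example inputs through unchanged; the selected ir handles
    any casting to backend-specific input specs."""
    return IR_DISPATCH[ir](module, example_inputs)

inputs = [{"shape": (1, 3, 224, 224)}]
backend, specs = compile_model(None, inputs, ir="ts")
```

This keeps the benchmark harness agnostic to how each ir represents inputs.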

userbenchmark/torch_trt/__init__.py (review thread marked outdated and resolved)
all_metrics = {}

for Model in list_models():
    metrics = run_single_model(
Contributor

This code will work for running a single model. However, it won't work for running a batch of models.

This is because there is no isolation between the model runs. For example, model 1 might set some global torch configuration (for example, torch.cudnn.benchmark) that will cause model 2 to be very slow or even crash. Some models also have a benign "memory leak" that won't cause problems in model training, but will cause problems when benchmarking multiple models in the same process.

We suggest using the ModelTask() approach used by the torch-nightly userbenchmark: https://github.com/pytorch/benchmark/blob/main/userbenchmark/torch-nightly/run.py#L163
It runs each model in an isolated process and doesn't have the limitations mentioned above.

@xuzhao9 (Contributor) commented Jul 21, 2023

I suggest we first remove the ci.yaml file and land this PR, then submit a follow-up PR to fix the userbenchmark as I commented and add the CI.

@gs-olive (Contributor Author)

@xuzhao9 - Thank you for the detailed comments and review. This sounds like a good idea to me; I have removed the ci.yaml file as requested, and I will follow up with a subsequent PR addressing the remaining comments.

@xuzhao9 (Contributor) left a comment

LGTM, please address my inline comment about checking the input type of example_inputs, thanks!

@gs-olive (Contributor Author) commented Jul 21, 2023

I have addressed the comment regarding example inputs here: #1753 (comment), with this commit: 8ccf8e6

@xuzhao9 (Contributor) left a comment

LGTM, thanks!

@facebook-github-bot (Contributor)

@xuzhao9 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor)

@xuzhao9 merged this pull request in 30a3c4a.

3 participants