This demo application ("demoDiffusion") showcases the acceleration of the Stable Diffusion and ControlNet pipelines using TensorRT.
Clone the TensorRT OSS repository:

```bash
git clone git@github.com:NVIDIA/TensorRT.git -b release/8.6 --single-branch
cd TensorRT
```
Install nvidia-docker using these instructions, then launch the NVIDIA PyTorch container:

```bash
docker run --rm -it --gpus all -v $PWD:/workspace nvcr.io/nvidia/pytorch:23.06-py3 /bin/bash
```
Install the latest TensorRT release:

```bash
python3 -m pip install --upgrade pip
python3 -m pip install --upgrade tensorrt
```
The minimum required version is TensorRT 8.6.0. Check your installed version using:

```bash
python3 -c 'import tensorrt;print(tensorrt.__version__)'
```
NOTE: Alternatively, you can download and install TensorRT packages from the NVIDIA TensorRT Developer Zone.
Install the required packages:

```bash
export TRT_OSSPATH=/workspace
cd $TRT_OSSPATH/demo/Diffusion
pip3 install -r requirements.txt
```
NOTE: demoDiffusion has been tested on systems with NVIDIA A100, RTX3090, and RTX4090 GPUs, with the following software configuration:
| Package | Version |
| --- | --- |
| diffusers | 0.14.0 |
| onnx | 1.13.1 |
| onnx-graphsurgeon | 0.3.26 |
| onnxruntime | 1.14.1 |
| polygraphy | 0.47.1 |
| tensorrt | 8.6.1.6 |
| tokenizers | 0.13.2 |
| torch | 2.1.0 |
| transformers | 4.26.1 |
| controlnet-aux | 0.0.6 |
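To compare your environment against this configuration, one option (assuming pip is available inside the container) is:

```bash
# List installed versions of the packages named above
python3 -m pip list | grep -E 'diffusers|onnx|polygraphy|tensorrt|tokenizers|torch|transformers|controlnet-aux'
```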
NOTE: optionally install the HuggingFace accelerate package for faster and less memory-intensive model loading.
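For example (`accelerate` is the package name on PyPI):

```bash
pip3 install accelerate
```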
Review the usage instructions for each supported pipeline:

```bash
python3 demo_txt2img.py --help
python3 demo_img2img.py --help
python3 demo_inpaint.py --help
python3 demo_controlnet.py --help
```
To download the model checkpoints for the Stable Diffusion pipeline, you will need a read access token; see the HuggingFace documentation for instructions on creating one.

```bash
export HF_TOKEN=<your access token>
```
To generate an image guided by a single text prompt:

```bash
python3 demo_txt2img.py "a beautiful photograph of Mt. Fuji during cherry blossom" --hf-token=$HF_TOKEN
```
To generate an image guided by an initial image and a text prompt:

```bash
python3 demo_img2img.py "photorealistic new zealand hills" --hf-token=$HF_TOKEN
```
Use `--input-image=<path to image>` to specify your image. Otherwise the example image will be downloaded from the Internet.
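For example, a hypothetical invocation with a local initial image (`hills.png` is a placeholder path):

```bash
python3 demo_img2img.py "photorealistic new zealand hills" --hf-token=$HF_TOKEN --input-image=hills.png
```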
To generate an inpainted image guided by an image, a mask, and a text prompt:

```bash
python3 demo_inpaint.py "a mecha robot sitting on a bench" --hf-token=$HF_TOKEN --version=1.5 --onnx-dir=onnx-1.5 --engine-dir=engine-1.5
```
Use `--input-image=<path to image>` and `--mask-image=<path to mask>` to specify your inputs. They must have the same dimensions. Otherwise the example image and mask will be downloaded from the Internet.
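For example, a hypothetical invocation with a local image and mask of matching dimensions (`bench.png` and `bench_mask.png` are placeholder paths):

```bash
python3 demo_inpaint.py "a mecha robot sitting on a bench" --hf-token=$HF_TOKEN --version=1.5 --onnx-dir=onnx-1.5 --engine-dir=engine-1.5 --input-image=bench.png --mask-image=bench_mask.png
```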
To generate an image guided by a text prompt and conditioned by a ControlNet input:

```bash
python3 demo_controlnet.py "bag" --hf-token=$HF_TOKEN --version=1.5 --denoising-steps 20 --controlnet-type scribble
```
Use `--input-image=<path to image(s)>` to specify your image inputs. If not specified, the input images will be downloaded and preprocessed according to the `--controlnet-type` argument. If images are provided via `--input-image`, they are expected to be already preprocessed for the chosen ControlNet type. The ControlNet type can be specified using the `--controlnet-type` argument; choose from `canny`, `depth`, `hed`, `mlsd`, `normal`, `openpose`, `scribble`, and `seg`.
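For example, a hypothetical invocation with a preprocessed scribble map (`scribble.png` is a placeholder path):

```bash
python3 demo_controlnet.py "bag" --hf-token=$HF_TOKEN --version=1.5 --controlnet-type scribble --input-image=scribble.png
```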
Multiple ControlNet types can also be specified to combine the conditionings. When specifying multiple conditionings, ControlNet scales must also be provided; the scales signify the importance of each conditioning relative to the others. For example, to condition using `openpose` and `canny` with scales of 1.0 and 0.8 respectively, pass `--controlnet-type openpose canny` and `--controlnet-scale 1.0 0.8`. Note that the number of ControlNet scales provided must match the number of ControlNet types.
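Putting that together, a full command line for the `openpose` plus `canny` example (reusing the "bag" prompt from above for illustration) might look like:

```bash
python3 demo_controlnet.py "bag" --hf-token=$HF_TOKEN --version=1.5 --controlnet-type openpose canny --controlnet-scale 1.0 0.8
```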
Run the command below to generate an image with Stable Diffusion XL:
```bash
python3 demo_txt2img_xl.py "a beautiful photograph of Mt. Fuji during cherry blossom" --hf-token=$HF_TOKEN --version=xl-1.0
```
- The scheduler can be set using `--scheduler=EulerA`. Note that some schedulers are not available for some pipelines or versions (see the combined example after this list).
- To accelerate engine building time, use `--timing-cache=<path to cache file>`. This cache file will be created if it does not exist. Note that performance may be affected if a cache file created on different hardware is used, so it is suggested to use this flag only during development; to achieve the best performance during deployment, build engines without a timing cache.
- To switch between versions or pipelines, either clear the ONNX and engine directories, specify `--force-onnx-export --force-onnx-optimize --force-engine-build`, or create new directories and specify `--onnx-dir=<new onnx dir>` and `--engine-dir=<new engine dir>`.
- Inference performance can be improved by enabling CUDA graphs using `--use-cuda-graph`. Enabling CUDA graphs requires fixed input shapes, so this flag must be combined with `--build-static-batch` and cannot be combined with `--build-dynamic-shape`.
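As an illustration, a hypothetical txt2img invocation combining several of these flags (`timing.cache` is a placeholder path, and each flag is assumed to be supported by the chosen pipeline and version):

```bash
python3 demo_txt2img.py "a beautiful photograph of Mt. Fuji during cherry blossom" --hf-token=$HF_TOKEN \
    --scheduler=EulerA --timing-cache=timing.cache --use-cuda-graph --build-static-batch
```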