DPIR

DPIR, or Plug-and-Play Image Restoration with Deep Denoiser Prior, is a denoising and deblocking neural network. See also https://github.com/HolyWu/vs-dpir.

DPIR requires a strength parameter.

Link:

(stable) https://github.com/AmusementClub/vs-mlrt/releases/download/model-20211209/dpir_v3.7z

Includes these models:

Denoise models (default sigma is 5.0):
- drunet_gray: GRAY denoise
- drunet_color: RGB denoise

Deblocking models (default sigma is 50.0):
- drunet_deblocking_grayscale: GRAY deblocking
- drunet_deblocking_color: RGB deblocking

Requirements & Parameters

block_w and block_h (tile size) must be multiples of 8.
All DPIR models require a strength parameter (sigma). It must be passed as a GRAYS clip whose pixel values are the strength scaled by the normalization factor 1.0/255. See the examples below for details.
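Both requirements reduce to simple arithmetic. A minimal sketch follows; `round_up_to_multiple` and `sigma_to_plane_value` are hypothetical helper names, not part of vs-mlrt, and the 2x2 split of a 1920x1080 frame is only an illustration:

```python
def round_up_to_multiple(size: int, multiple: int = 8) -> int:
    """Round size up to the nearest multiple (tile sizes must be multiples of 8)."""
    return -(-size // multiple) * multiple


def sigma_to_plane_value(sigma: float) -> float:
    """Scale a strength (sigma) by the 1.0/255 normalization factor."""
    return sigma / 255.0


# Halving a 1920x1080 frame in each direction gives 960x540 tiles;
# 540 is not a multiple of 8, so it is rounded up here.
block_w = round_up_to_multiple(1920 // 2)
block_h = round_up_to_multiple(1080 // 2)
print(block_w, block_h)

# Pixel value to store in the GRAYS strength clip for sigma = 5.0.
print(sigma_to_plane_value(5.0))
```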

`vsmlrt.py` wrapper Usage

To simplify usage, we provide a Python wrapper module, vsmlrt, with a more Pythonic interface:

import vapoursynth as vs
from vapoursynth import core
from vsmlrt import DPIR, DPIRModel, Backend

src = core.std.BlankClip(format=vs.RGBS) # or vs.GRAYS for gray-only models

# backend could be:
#  - CPU Backend.OV_CPU(): the recommended CPU backend; generally faster than ORT-CPU.
#  - CPU Backend.ORT_CPU(num_streams=1, verbosity=2): vs-ort cpu backend.
#  - GPU Backend.ORT_CUDA(device_id=0, cudnn_benchmark=True, num_streams=1, verbosity=2)
#     - use device_id to select device
#     - set cudnn_benchmark=False to reduce script reload latency when debugging, at the cost of a slight throughput penalty.
#  - GPU Backend.TRT(fp16=True, device_id=0, num_streams=1): TensorRT runtime, the fastest NV GPU runtime.
# DPIR is a huge model, so a GPU backend is highly recommended (TRT gives the best performance).
# If the model runs out of GPU memory, increase the tiles parameter.
flt = DPIR(src, strength=5, model=DPIRModel.drunet_color, tiles=2, backend=Backend.ORT_CUDA())

To use variable strength, pass a GRAYS or GRAY8 clip as the strength parameter instead. It must have the same dimensions as the input clip, and each pixel stores the DPIR strength for that pixel.
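One possible way to build such a clip is sketched below: it uses `std.Expr` to pick a stronger sigma for dark pixels and a weaker one elsewhere. The helper names, the threshold, and the two strength values are illustrative. It also assumes the strength clip stores raw strength values on the same scale as the scalar `strength=5` (which the GRAY8 option suggests) rather than values already divided by 255.

```python
def strength_expr(dark: float, bright: float, threshold: float) -> str:
    """RPN expression for std.Expr: `dark` where luma < threshold, else `bright`."""
    return f"x {threshold} < {dark} {bright} ?"


def dpir_variable_strength(src, dark=10.0, bright=5.0, threshold=0.25):
    """Denoise an RGBS clip with a per-pixel strength map (illustrative values)."""
    # Imported lazily so strength_expr() can be used without VapourSynth installed.
    import vapoursynth as vs
    from vapoursynth import core
    from vsmlrt import DPIR, DPIRModel, Backend

    # Luma of the RGBS source, nominally in the range [0, 1].
    luma = core.resize.Bicubic(src, format=vs.GRAYS, matrix_s="709")
    # Same dimensions as src; each pixel holds the strength for that pixel.
    strength = core.std.Expr([luma], strength_expr(dark, bright, threshold))
    return DPIR(src, strength=strength, model=DPIRModel.drunet_color,
                backend=Backend.ORT_CUDA())
```

Any other mask source (an edge mask, a hand-drawn region) could stand in for the luma threshold, as long as the resulting clip is GRAY and matches the input dimensions.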

Raw Model Usage

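import vapoursynth as vs
from vapoursynth import core
src = core.std.BlankClip(width=640, height=360, format=vs.GRAYS)
sigma = 2.0
# The second input is a constant GRAYS clip holding sigma scaled by 1.0/255.
flt = core.ov.Model([src, core.std.BlankClip(src, color=sigma/255.0)], "drunet_gray.onnx")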

Notes

DPIR is a huge network and is extremely slow on CPU (e.g., for 360p input, you might see 0.05 fps/cpu).
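To put that figure in perspective, here is a back-of-the-envelope estimate. The clip length and frame rate are made-up example values; the 0.05 fps throughput is the figure quoted above.

```python
def processing_hours(num_frames: int, throughput_fps: float) -> float:
    """Hours needed to process num_frames at the given throughput."""
    return num_frames / throughput_fps / 3600.0


frames = 60 * 24  # one minute of video at 24 fps
hours = processing_hours(frames, 0.05)
print(f"{hours:.1f} hours")  # prints "8.0 hours"
```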

Benchmarking

Measurements: FPS / Device Memory (MB)

Device memory:

GPU: device memory including context

RTX 3090

Software: VapourSynth R57, Windows 10 LTSC 2021, Graphics Driver 511.23.

Input size: 1920x1080

Backends

[1] vs-mlrt v6
[2] vs-dpir v1.7.1, PyTorch 1.10.1+cu113, TensorRT 8.2.2, torch2trt 2732b35
[3] vs-mlrt v8 (driver 511.79)

Performance

FP32

Model	[1] ort-cuda	[1] trt	[2] cuda	[2] trt	[3] ort-cuda	[3] trt	[3] trt (no tf32)
gray	2.46 / 5947	2.95 / 4157	2.34 / 12015	2.43 / 4300	2.92 / 5759	3.26 / 4243	3.07 / 4261
color	2.30 / 5979	2.75 / 4187	2.13 / 12031	2.12 / 4384	2.86 / 5790	3.25 / 4330	3.02 / 4291

FP16

Model	[1] ort-cuda	[1] trt	[1] trt (2 streams)	[2] cuda	[2] trt	[3] ort-cuda	[3] trt	[3] trt (2 streams)
gray	3.67 / 3777	9.60 / 3585	10.6 / 5430	3.47 / 11751	7.18 / 4015	4.65 / 5759	10.9 / 2397	11.6 / 3895
color	3.26 / 3817	8.65 / 3619	10.5 / 5492	3.02 / 11765	5.67 / 4277	4.41 / 3628	9.85 / 2440	11.5 / 3975
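Each cell pairs throughput with device memory, so comparing two entries is a matter of splitting the cells and dividing. A small sketch follows; the `parse_cell` helper and `Measurement` type are hypothetical, and the inputs are the RTX 3090 `[1] trt` gray entries from the FP32 and FP16 tables above.

```python
from typing import NamedTuple


class Measurement(NamedTuple):
    fps: float
    memory_mb: int


def parse_cell(cell: str) -> Measurement:
    """Split a 'FPS / Device Memory (MB)' cell into its two numbers."""
    fps, memory = cell.split("/")
    return Measurement(float(fps), int(memory))


fp32 = parse_cell("2.95 / 4157")
fp16 = parse_cell("9.60 / 3585")

speedup = fp16.fps / fp32.fps
memory_saved = fp32.memory_mb - fp16.memory_mb
print(f"FP16 speedup: {speedup:.2f}x, memory saved: {memory_saved} MB")
```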

RTX 2080 Ti

Software: VapourSynth R57, Windows 10 LTSC 2021, Graphics Driver 511.23.

Input size: 1920x1080

Backends

[1] vs-mlrt v6
[2] vs-dpir v1.7.1, PyTorch 1.10.1+cu113, TensorRT 8.2.2, torch2trt 2732b35
[3] vs-mlrt v8 (driver 511.79)

Performance

FP32

Model	[1] ort-cuda	[1] trt	[2] cuda	[2] trt	[3] ort-cuda	[3] trt
gray	1.68 / 5277	1.84 / 4004	1.67 / 6916	1.87 / 4163	1.60 / 5190	1.91 / 3659
color	1.53 / 5309	1.66 / 4034	1.56 / 6942	1.71 / 4183	1.57 / 5222	1.78 / 3691

FP16

Model	[1] ort-cuda	[1] trt	[1] trt (2 streams)	[2] cuda	[2] trt	[3] ort-cuda	[3] trt	[3] trt (2 streams)
gray	3.04 / 3619	6.18 / 2780	6.77 / 4531	3.07 / 6730	5.98 / 3249	3.10 / 3276	7.22 / 2101	7.89 / 3529
color	2.70 / 3659	5.64 / 2598	6.72 / 4274	2.65 / 6744	4.78 / 3261	2.93 / 3571	6.38 / 2323	7.64 / 3874

Tesla V100

Software: VapourSynth R57, Windows Server 2019, Graphics Driver 511.23.

Input size: 1920x1080

Backends

[1] vs-mlrt v6
[2] vs-dpir v1.7.1, PyTorch 1.10.1+cu113, TensorRT 8.2.2, torch2trt 2732b35

Performance

FP32

Model	[1] ort-cuda	[1] trt	[1] trt (2 streams)	[2] cuda	[2] trt
gray	2.45 / 5188	2.59 / 3979	2.59 / 6829	2.27 / 11552	2.45 / 3959
color	2.39 / 5220	2.51 / 4011	2.56 / 6893	2.12 / 11558	2.26 / 3979

FP16

Model	[1] ort-cuda	[1] trt	[1] trt (2 streams)	[2] cuda	[2] trt
gray	5.20 / 3018	8.09 / 2831	8.50 / 4617	5.09 / 11289	6.93 / 3461
color	4.95 / 3058	7.54 / 2863	8.47 / 4687	4.29 / 11302	5.60 / 3473

Tesla A10

Software: VapourSynth R57, Windows Server 2019, Graphics Driver 511.23, GPU clocks locked at max frequency.

Input size: 1920x1080

Backends

[1] vs-mlrt v6
[2] vs-dpir v1.7.1, PyTorch 1.10.1+cu113, TensorRT 8.2.2, torch2trt 2732b35

Performance

FP32

Model	[1] ort-cuda	[1] trt	[1] trt (2 streams)	[2] cuda	[2] trt
gray	2.34 / 5791	2.75 / 4015	2.78 / 6641	2.20 / 11837	2.67 / 4189
color	2.29 / 5823	2.73 / 4075	2.78 / 6747	2.12 / 11853	2.54 / 4209

FP16

Model	[1] ort-cuda	[1] trt	[1] trt (2 streams)	[2] cuda	[2] trt
gray	3.73 / 3621	6.67 / 3437	6.33 / 5285	3.72 / 11853	6.17 / 4079
color	3.65 / 3661	6.26 / 3423	6.32 / 5277	3.45 / 11597	5.25 / 4103

Tesla A10G

Software: VapourSynth R58, Windows Server 2022, Graphics Driver 511.65, GPU clocks locked at max frequency.

Input size: 1920x1080

Backends

[1] vs-mlrt v8

Performance

FP32

Model	[1] trt
gray	2.75 / 4285
color	2.70 / 4317

FP16

Model	[1] trt
gray	7.00 / 2336
color	6.80 / 2368

Tesla A100 (PCIe, 40 GB)

Software: VapourSynth R57, Windows Server 2019, Graphics Driver 511.23.

Input size: 1920x1080

Backends

[1] vs-mlrt v6
[2] vs-dpir v1.7.1, PyTorch 1.10.1+cu113, TensorRT 8.2.2, torch2trt 2732b35

Performance

FP32

Model	[1] ort-cuda	[1] trt	[1] trt (2 streams)	[2] cuda	[2] trt
gray	7.12 / 5853	9.68 / 4111	10.3 / 6737	6.43 / 11973	8.56 / 4261
color	6.95 / 5885	9.31 / 4143	10.2 / 6801	5.62 / 11979	7.21 / 4281

FP16

Model	[1] ort-cuda	[1] trt	[1] trt (2 streams)	[2] cuda	[2] trt
gray	10.1 / 3683	18.9 / 3015	20.5 / 4603	9.67 / 11709	14.6 / 3679
color	9.55 / 3723	17.7 / 3041	20.3 / 4657	7.65 / 11713	10.5 / 3691

Tesla A100 (SXM4, 80 GB)

Software: VapourSynth R57-A4, Windows Server 2022, Graphics Driver 516.94.

Input size: 1920x1080

Backends

[1] vs-mlrt v9

Performance

FP16

Model	[1] trt	[1] trt (2 streams)
color	20.5 / 2022	24.3 / 3325
