v6: further performance optimizations of vs-trt, and vs-ov & vs-ort bugfixes
This release contains performance optimizations for the vs-trt plugin. The general takeaway is that vs-trt can beat all benchmarked solutions on the DPIR, waifu2x and RealESRGANv2 models. Specific highlights are as follows:
- waifu2x: on CPU, vs-ov beats waifu2x-w2xc by 2.7x (Intel 32C64T); on GPU, vs-ort/vs-trt beat vulkan-ncnn by ~4x.
- DPIR: vs-trt beats existing implementations on both Volta (Tesla V100) and Ampere (A10) platforms (by at most 1.5x), and vs-ort saves a significant amount of GPU memory (by as much as 3.7x) compared to its counterpart.
- RealESRGANv2: vs-trt, being the only backend that utilizes TensorRT, is up to 3.3x faster than the reference implementation.
Detailed benchmark results are available in the wiki:
- waifu2x: https://github.com/AmusementClub/vs-mlrt/wiki/waifu2x#benchmarking
- DPIR: https://github.com/AmusementClub/vs-mlrt/wiki/DPIR#benchmarking
- RealESRGANv2: https://github.com/AmusementClub/vs-mlrt/wiki/RealESRGANv2#benchmarking
This release also fixes the following two bugs:
- vs-ov: some openvino error messages were sent to stdout, affecting `vspipe | x265` usage.
- vs-ort/vs-ov: error in converting RealESRGANv2 model to fp16 format.
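
To illustrate why stray messages on stdout break a `vspipe | x265` pipeline: the pipe carries raw frame data on stdout, so any text written there ends up inside the stream the encoder reads. The sketch below is only an illustration of that principle in Python, not the actual vs-ov fix; `write_frames` and the tiny byte strings standing in for frames are hypothetical.

```python
import io


def write_frames(frames, data_out, log_out):
    """Write raw frame bytes to data_out and diagnostics to log_out.

    Keeping the two streams separate means a downstream consumer of
    data_out never sees log text mixed into the payload.
    """
    total = 0
    for index, frame in enumerate(frames):
        data_out.write(frame)  # payload only
        total += len(frame)
        # Diagnostics go to the separate log stream (stderr in a real pipe).
        print(f"frame {index}: {len(frame)} bytes", file=log_out)
    return total


# Demo with in-memory streams standing in for stdout and stderr.
data, log = io.BytesIO(), io.StringIO()
write_frames([b"\x00\x01", b"\x02\x03"], data, log)
assert data.getvalue() == b"\x00\x01\x02\x03"  # payload contains no log text
```

In a real script the same function would be called with `sys.stdout.buffer` as `data_out` and `sys.stderr` as `log_out`, so that only frame data reaches the pipe.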