v10: new Vulkan-based vsncnn (AMD GPU supported)
Release Highlight
Vulkan-based AMD GPU support has been added with the new vsncnn-vk backend.
Major features
- Introduced the ncnn-based vsncnn plugin, which supports any GPU with Vulkan support (NVIDIA, AMD, Intel integrated & discrete).
- Good news for AMD GPU users! vs-mlrt has finally achieved full platform coverage: from x86 CPUs to GPUs from all three major vendors.
- Please refer to the benchmark below for performance details. TL;DR: it's comparable to vsort-cuda on most networks (except waifu2x-cunet), but (significantly) slower than vstrt. Owing to its C++ implementation, it's generally faster than Python-based ncnn implementations.
- Hint: If your GPU has enough memory, please consider setting `num_streams>1` to extract more performance.
- Even though it's possible to use software-based Vulkan implementations (as we did in the GHA tests), if you want to do CPU-only inference, it's much better to use vsov-cpu (or vsort-cpu).
- Introduced a new, smaller Vulkan-based GPU binary package (`vsmlrt-windows-x64-vk.v10.7z`) that only includes vsov-{cpu,gpu}, vsort-cpu and vsncnn-vk. Use this if you only use Intel/AMD GPUs or don't want to download 1 GB of data in exchange for a backend that is merely 2~8x faster. Now there shouldn't be any reason not to use vs-mlrt.
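
To put the `num_streams` hint above in numbers, here is a minimal sketch that compares the 1-stream and 2-stream ncnn-vk results. The figures are copied from the Benchmark section of these notes (a single RTX 3090 configuration); the dictionary and the helper name `stream_scaling` are illustrative and not part of vs-mlrt.

```python
# (fps, GPU memory in MB) for the ncnn-vk backend, copied from the
# Benchmark section: first tuple is 1 stream, second tuple is 2 streams.
NCNN_VK = {
    "dpir color": ((4.33, 3347), (4.72, 6119)),
    "waifu2x upconv_7": ((9.46, 6820), (14.71, 13468)),
    "waifu2x cunet": ((1.46, 11908), (1.53, 23574)),
    "realesrgan v2/v3": ((7.23, 2781), (8.35, 5330)),
}


def stream_scaling(one_stream, two_streams):
    """Return (fps ratio, memory ratio) of 2 streams relative to 1 stream."""
    (fps1, mem1), (fps2, mem2) = one_stream, two_streams
    return fps2 / fps1, mem2 / mem1


scaling = {name: stream_scaling(*cells) for name, cells in NCNN_VK.items()}

for name, (fps_ratio, mem_ratio) in scaling.items():
    print(f"{name}: {fps_ratio:.2f}x fps for {mem_ratio:.2f}x GPU memory")
```

On this particular setup, GPU memory usage roughly doubles with two streams, while the fps gain varies a lot by network, so the hint pays off most when memory is plentiful and the model scales well.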
Benchmark
Configuration: NVIDIA RTX 3090, driver 516.94, Windows Server 2019, VapourSynth R60, Python 3.10.7, 1080p fp16
Backends: ncnn-vk, ort-cuda, trt from vs-mlrt v10, dpir-ncnn v2.0.0, w2xncnnvk r2
Data format: fps / GPU memory usage (MB)
dpir color
backend | 1 stream | 2 streams |
---|---|---|
ncnn-vk | 4.33/3347 | 4.72/6119 |
ort-cuda | 4.56/3595 | |
trt | 10.64/2595 | 11.10/4593 |
dpir-ncnn | 3.68/3326 | |
waifu2x upconv_7
backend | 1 stream | 2 streams |
---|---|---|
ncnn-vk | 9.46/6820 | 14.71/13468 |
ort-cuda | 12.10/6411 | 13.98/11273 |
trt | 21.32/3317 | 29.10/5053 |
w2xncnnvk | 6.68/6931 | 12.70/13626 |
waifu2x cunet
backend | 1 stream | 2 streams |
---|---|---|
ncnn-vk | 1.46/11908 | 1.53/23574 |
ort-cuda | 4.85/8793 | 5.18/16231 |
trt | 11.60/4960 | 15.60/9057 |
w2xncnnvk | 1.38/11966 | 1.58/23687 |
realesrgan v2/v3
backend | 1 stream | 2 streams |
---|---|---|
ncnn-vk | 7.23/2781 | 8.35/5330 |
ort-cuda | 9.05/2669 | 10.18/4539 |
trt | 15.93/1667 | 19.58/2543 |
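
As a rough cross-check of the "2~8x faster" remark in the feature list, the sketch below computes the 1-stream fps ratio of trt over ncnn-vk from the tables above. It assumes that the remark refers to trt relative to ncnn-vk, which these notes do not state explicitly; the variable names are illustrative only.

```python
# 1-stream fps per network, copied from the benchmark tables above.
FPS_1_STREAM = {
    "dpir color": {"ncnn-vk": 4.33, "trt": 10.64},
    "waifu2x upconv_7": {"ncnn-vk": 9.46, "trt": 21.32},
    "waifu2x cunet": {"ncnn-vk": 1.46, "trt": 11.60},
    "realesrgan v2/v3": {"ncnn-vk": 7.23, "trt": 15.93},
}

# Speedup of trt over ncnn-vk (assumed to be the comparison behind "2~8x").
speedup = {
    name: fps["trt"] / fps["ncnn-vk"] for name, fps in FPS_1_STREAM.items()
}

for name, ratio in speedup.items():
    print(f"{name}: trt is {ratio:.2f}x the fps of ncnn-vk")
```

With these figures, every ratio falls between 2x and 8x, and waifu2x cunet sits at the upper end, in line with the exception noted for that network.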