
v10: new Vulkan-based vsncnn (AMD GPU supported)

@github-actions github-actions released this 15 Sep 11:02
· 271 commits to master since this release

Release Highlight

Vulkan-based AMD GPU support has been added through the new vsncnn-vk backend.

Major features

  • Introduced the ncnn-based vsncnn plugin, which supports any GPU with Vulkan support (NVIDIA, AMD, Intel integrated & discrete).
    • Good news for AMD GPU users! vs-mlrt has finally achieved full platform coverage: from x86 CPUs to GPUs from all three major vendors.
    • Please refer to the benchmark below for performance details. TL;DR: it is comparable to vsort-cuda on most networks (waifu2x-cunet is the exception, where it is markedly slower), but significantly slower than vstrt. Owing to its C++ implementation, it is generally faster than Python-based ncnn implementations.
    • Hint: If your GPU has enough memory, please consider setting num_streams>1 to extract more performance.
    • It is possible to use software-based Vulkan implementations (as we did in the GitHub Actions tests), but for CPU-only inference it is much better to use vsov-cpu (or vsort-cpu).
  • Introduced a new, smaller Vulkan-based GPU binary package (vsmlrt-windows-x64-vk.v10.7z) that includes only vsov-{cpu,gpu}, vsort-cpu and vsncnn-vk. Use it if you only have an Intel or AMD GPU, or if you don't want to download 1 GB of data in exchange for a backend that is merely 2~8x faster. There should no longer be any reason not to use vs-mlrt.
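
To put the num_streams hint in numbers, the sketch below compares the 1-stream and 2-stream ncnn-vk results from the benchmark in this release note. The figures are copied from the benchmark tables; the dictionary layout and the helper name `stream_tradeoff` are illustrative only and are not part of vs-mlrt.

```python
# ncnn-vk results copied from the benchmark tables in this release note.
# Each entry: (1-stream fps, 1-stream MB, 2-stream fps, 2-stream MB)
NCNN_VK = {
    "dpir color": (4.33, 3347, 4.72, 6119),
    "waifu2x upconv_7": (9.46, 6820, 14.71, 13468),
    "waifu2x cunet": (1.46, 11908, 1.53, 23574),
    "realesrgan v2/v3": (7.23, 2781, 8.35, 5330),
}


def stream_tradeoff(fps1, mem1, fps2, mem2):
    """Return (fps gain, memory growth) of 2 streams relative to 1 stream."""
    return fps2 / fps1, mem2 / mem1


# Hypothetical report loop: print the trade-off for every benchmarked model.
for model, row in NCNN_VK.items():
    gain, growth = stream_tradeoff(*row)
    print(f"{model:18s} fps x{gain:.2f}  memory x{growth:.2f}")
```

On these measurements GPU memory use roughly doubles with a second stream, while the throughput gain ranges from about 5% (waifu2x cunet) to about 55% (waifu2x upconv_7), so the benefit depends heavily on the network.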

Benchmark

Configuration: NVIDIA RTX 3090, driver 516.94, Windows Server 2019, VapourSynth R60, Python 3.10.7, 1080p, fp16

Backends: ncnn-vk, ort-cuda and trt from vs-mlrt v10; dpir-ncnn v2.0.0; w2xncnnvk r2

Data format: fps / GPU memory usage (MB)

dpir color

| backend | 1 stream | 2 streams |
| --- | --- | --- |
| ncnn-vk | 4.33/3347 | 4.72/6119 |
| ort-cuda | 4.56/3595 | |
| trt | 10.64/2595 | 11.10/4593 |
| dpir-ncnn | 3.68/3326 | |

waifu2x upconv_7

| backend | 1 stream | 2 streams |
| --- | --- | --- |
| ncnn-vk | 9.46/6820 | 14.71/13468 |
| ort-cuda | 12.10/6411 | 13.98/11273 |
| trt | 21.32/3317 | 29.10/5053 |
| w2xncnnvk | 6.68/6931 | 12.70/13626 |

waifu2x cunet

| backend | 1 stream | 2 streams |
| --- | --- | --- |
| ncnn-vk | 1.46/11908 | 1.53/23574 |
| ort-cuda | 4.85/8793 | 5.18/16231 |
| trt | 11.60/4960 | 15.60/9057 |
| w2xncnnvk | 1.38/11966 | 1.58/23687 |

realesrgan v2/v3

| backend | 1 stream | 2 streams |
| --- | --- | --- |
| ncnn-vk | 7.23/2781 | 8.35/5330 |
| ort-cuda | 9.05/2669 | 10.18/4539 |
| trt | 15.93/1667 | 19.58/2543 |
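
As a cross-check of the "2~8x faster" remark in the package description, the sketch below derives the 1-stream speed ratios between backends from the tables above. The fps values are copied from this benchmark; the `speedup` helper and the data layout are hypothetical names chosen for illustration.

```python
# 1-stream fps per backend, copied from the benchmark tables above.
ONE_STREAM_FPS = {
    "dpir color": {"ncnn-vk": 4.33, "ort-cuda": 4.56, "trt": 10.64},
    "waifu2x upconv_7": {"ncnn-vk": 9.46, "ort-cuda": 12.10, "trt": 21.32},
    "waifu2x cunet": {"ncnn-vk": 1.46, "ort-cuda": 4.85, "trt": 11.60},
    "realesrgan v2/v3": {"ncnn-vk": 7.23, "ort-cuda": 9.05, "trt": 15.93},
}


def speedup(model, fast="trt", slow="ncnn-vk"):
    """How many times faster `fast` is than `slow` on `model` (1 stream)."""
    fps = ONE_STREAM_FPS[model]
    return fps[fast] / fps[slow]


# Print both comparisons discussed in the release note for each model.
for model in ONE_STREAM_FPS:
    print(
        f"{model:18s} trt/ncnn-vk x{speedup(model):.2f}  "
        f"ort-cuda/ncnn-vk x{speedup(model, fast='ort-cuda'):.2f}"
    )
```

With these figures trt is a little over 2x faster than ncnn-vk on dpir, waifu2x upconv_7 and realesrgan, and just under 8x faster on waifu2x cunet. ort-cuda stays within about 1.3x of ncnn-vk except on cunet, where it is about 3.3x faster.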