Skip to content

Decoders

WangBin edited this page Sep 16, 2023 · 142 revisions

Video Decoders

Decoders are selected by Player.setDecoders(MediaType::Video, ...), or -c:v names option for examples(glfwplay, mdkplay etc.)

To test decoder performance and options, use DecodeFps or framereader program in sdk package.

Available builtin decoder names and properties are:

FFmpeg

All platforms. Provides software decoding if no property is specified. You can use system ffmpeg if version >= 4.0 because it may provide more features, for example ffmpeg in RaspberryPi OS supports hevc hardware acceleteration and v4l2m2m drm prime output, mdk can benefit from it with OpenGLES(desktop GL is not suitable for hevc) contexts created by EGL(glfwplay -gl -c:v V4L2M2M,FFmpeg:hwcontext=drm).

It's the default decoder if no name is specified. Properties(also for ffmpeg based hw decoders):

  • threads: number of decoding threads. 0: logical cpu cores+1
  • codec: force a codec name
  • hwaccel: codec name suffix, a (hardware) decoder name is {codec}_{suffix} will be selected. For example FFmpeg:hwaccel=rkmpp:hwcontext=drm.
  • hwcontext: AVHWDeviceType to use. the value is av_hwdevice_get_type_name(). For decoding in current FFmpeg version it's cuda, drm, dxva2, d3d11va, qsv, vaapi, vdpau, videotoolbox, mediacodec. Also support for a future hardware decoder with internal hwconfig method.
  • drop: 0, 1(default), 2, 3. control frame drop in decoder
    • 0: never drop
    • 1: drop non-ref frames when seeking
    • 2: drop non-ref frames
    • 3: only decode I-frames
  • sw_fallback: 0(default) or 1. fallback to sw dec if hw dec error occured. Usually should be disabled, you can use a decoder list instead Player.setDecoders(MediaType::Video, ["D3D11", "FFmpeg"]), otherwise won't try the next decoder if failed.
  • pool: frame pool for direct rendering. Currently it can be "CVPixelBuffer" for Apple OSes.
  • ignore_profile: 0(default) or 1. ignore codec profile. may result in incorrect result(d3d11/dxva hevc) if turned on
  • error: int, default -1. number of decode errors allowed. -1 means unlimited. if decode error count < this value, continue to decode. decoder error usually appears when seeking mpegts(it's a bug in ffmpeg, avbuild fixed the bug, but here we set error to -1 to be compatible with official ffmpeg's bug)
  • other ffmpeg avcodec properties. for example lowres=N for jpeg2000.
  • external: 0(default) or 1(auto detect). use drm prime as an rgb texture in renderer. It's auto enabled when required(e.g. RaspberryPi OS hevc hardware decoder). Can also be tested on VAAPI and VADRM decoder. can also use global option eglimage.reuse
  • reuse: 0(default) or 1. set 1 to reuse texture to get better performance, known to support on raspberry pi with mesa driver. can also use global option eglimage.reuse

R3D

You have to enable the plugin via SetGlobalOption("plugins", "mdk-r3d") before any Player is created, and put R3DSDK runtime(from official sdk or download from here) to app working dir, or SetGlobalOption("R3DSDK_DIR", "your r3d runtime dir"). mdk-r3d plugin must be in the path can be dynamically loaded, for macOS, main bundle's PlugIns/mdk and PlugIns dir are searched first. To enable multiple plugins, SetGlobalOption("plugins", "mdk-r3d:mdk-braw")

Properties:

  • gpu:
    • opencl: default. should work everywhere.
    • cuda: seems slowest
    • metal: debay by metal. decompress must be async or gpu, format must be rgb48le, output is rgba64le
    • other values: use cpu decoder. other decoders will fallback to cpu decoder if fail to initialize.
  • scale: down scale to a resolution closest to given target resolution. can significantly improve performance. usually you can use screen resolution or video renderer resolution. value patterns are
    • widthxheight: set target width and height
    • width: set target width, height = width
    • 1/2, 1/4, 1/8, 1/16: scale to given value
  • format: output frame format. can be:
    • bgr0: default. bgra with alpha ignored.
    • bgra
    • rgbp16le: planar rgb, 16bit unsigned channels
    • rgb48le: 16bit rgb. gpu=metal will output rgba64le, otherwise packed 3 channels rgb is not recommended because not supported by gpu renderers except OpenGL.
  • decompress:
    • r3d: works on all platforms
    • async: currently macOS only. faster then r3d. gpu must be metal, format must be rgb48le.
    • gpu: currently macOS only. gpu must be metal, format must be rgb48le. lower cpu load, very high gpu load. fps does not increase if scale to a lower resolution than 1/4.

BRAW

Full featured and high performance Blackmagic RAW decoder via Blackmagic RAW SDK. You have to enable the plugin via SetGlobalOption("plugins", "mdk-braw") before any Player is created, and put BlackmacgicRawAPI runtime(from sdk or download from here) BlackmagicRawAPI.dll(windows x64)/BlackmagicRawAPI.framework(mac)/libBlackmagicRawAPI.so(linux x64) to the location can be loaded. mdk-braw plugin must be in the path can be dynamically loaded, for macOS, main bundle's PlugIns/mdk and PlugIns dir are searched first. To enable multiple plugins, SetGlobalOption("plugins", "mdk-r3d:mdk-braw")

Properties:

  • gpu or pipeline:
    • no, cpu(default). lower fps, higher cpu load.
    • auto: metal > cuda > opencl > cpu
    • metal: metal gpu pipeline. Requires braw runtime component DecoderMetal
    • opencl: opencl gpu pipeline. Requires braw runtime component DecoderOpenCL.dll(windows) or libDecoderOpenCL.so(linux).
    • cuda: cuda gpu pipeline. Requires braw runtime component DecoderCUDA.dll(windows) or libDecoderCUDA.so(linux).
  • device: if there are multiple devices supports a pipeline, a device name can be used to select one
    • nvidia: nvidia gpu. multiple gpu devices support opencl pipeline, select via device name. (nvidia opencl pipeline does not work in my tests)
    • avx2: instruction for cpu pipeline. Requires braw runtime component InstructionSetServicesAVX2.dll(windows), libInstructionSetServicesAVX2.so(linux) or InstructionSetServicesAVX2(mac)
    • sse: instruction for cpu pipeline. full name is SSE 4.1.
    • avx: instruction for cpu pipeline. WORST performance! Requires braw runtime component InstructionSetServicesAVX.dll(windows) or libInstructionSetServicesAVX.so(linux) or InstructionSetServicesAVX(mac)
    • other names: will list all devices of the pipeline in the log, then you can choose the substring of one
  • scale: down scale to a resolution closest to given target resolution. can significantly improve performance. usually you can use screen resolution or video renderer resolution. value patterns are
    • widthxheight: set target width and height
    • width: set target width, height = width
  • copy: copy gpu resource to host memory or not
    • 1(default)
    • 0: currently only works for metal and cuda pipeline. opencl pipeline will always in copy mode.
  • format: output frame format. can be:
    • rgba: default. rgba 8bit
    • bgra: bgra
    • rgb48le: 16bit rgb. packed 3 channels rgb is not recommended because not supported by gpu renderers except OpenGL.
    • rgba64le: 16bit rgba
    • bgra64le: 16bit bgra
    • rgbp16le: 16bit planar rgb
    • rgbpf32le: 32bit planar rgb
    • bgraf32le: 32bit packed bgra

Recommended options: BRAW:gpu=auto:copy=1:scale=ScreenWidthxScreenHeight

VT

RECOMMENDED for apple platforms. MDK's own implementation. Async mode(default) has better performance than VideoToolbox decoder in ffmpeg. Support more codecs, for example VP9 on macOS11+, 10bit H264, 10bit YUV444 and H264/HEVC GBRP pixel format on M1 and ProRes. Support more output formats, 10~16bit yuv 422/444. Properties:

  • copy: can be 0(default), 1 and 2.
    • 0: (default) no copy. best performance.
    • 1: lock decoded buffers to be accessible for CPU.
    • 2: lock decoded buffers and copy to host memory. slowest.
  • async: can be 1(default) or 0. enable async decoding or not
  • hardware: 0 or 1(default). enable hardware decoding(1) or not(0)
  • realTime: 0 or 1(default)
  • threads: number of decoding threads. only supported if hardware=0 for some decoders
  • deinterlace: 0 or 1(default)
  • width: desired output width in pixels
  • height: desired output height in pixels
  • format: decoded frame pixel format. default is the closest supported format(same chroma subsample size) of the optimal output, usually it's a semi-planar format, e.g. nv12 for 8bit yuv 420, p010le for 10bit yuv 420. It also can be packed formats, e.g. uyvy422 for 8bit yuv 422. Some default(optimal?) output formats are not supported in mdk, mdk choose the closest format. for example 10bit yuv422 default output is 'v216', mdk choose 'p210', and choose p416('sv44') instead of 'y416' for yuv444p12 input . Apple silicon may produce 'p420' for 10bit yuv420, and Y planeis r10g10b10a2 alike, mdk choose p010le format
  • fourcc or CVPixelFormat: force the output CVPixelFormat (for testing). To show the complete format list on your device, run cvformats in the sdk package.
  • alpha: 1(default) or 0. Enable hevc with alpha channel decoding.

MFT

RECOMMENDED for windows. Win32, UWP hardware and software decoder via Media Foundation. MDK's own implementation. Performance is better than D3D11 decoder in ffmpeg. Plugins from microsoft store is required to support HEVC and AV1. Source Code. Properties:

  • d3d: can be 0, 9, 11, 12.
    • 0: not using d3d, software decoder.
    • 9: (win32) use d3d9 device, i.e. DXVA2 hardware accelerated.
    • 11: default. (win7+,UWP) use d3d11 device, i.e. d3d11 hardware accelerated.
    • 12: use d3d12 device, hardware accelerated. currently 0-copy is only supported by d3d12 renderer.
  • fallback:
    • 1: default. fallback to lower d3d version if specified version is unsupported. will not fallback to d3d=0
    • 0: returns error if failed to use given d3d version
  • pool:
    • 0: default for audio(required by dolby audio decoder?). allocate memory for every decoded sample if necessary.
    • 1: default for video. use a pool for decoded samples to reduce memory allocation and copy.
  • copy: can be 0, 1, 2.
    • 0: default. for d3d=9/11. No GPU to host memory copy. Best performance
    • 1: map to host memory for d3d=9/11, or copy to host memory for d3d=0
    • 2: map and copy to host memory
  • feature_level: d3d11 device feature level to create. from "9.1" to "12.1"(default)
  • vendor: d3d11 device vendor name. can be intel, amd, nvidia, qualcom, microsoft
  • adapter: d3d9/11 adapter number. default 0. Used to select GPU for decoding. -1 to select by system
  • low_latency: low latency mode. can be 0(default) and 1. decoded frames can be out of order for some videos with b-frames if low latency is enabled.
  • ignore_profile
  • ignore_level
  • kmt: 0(default) or 1. resource sharing via keyed mutex. kmt may produce error error ProcessOutput: (80070057), ProcessInput c00d36b5. for d3d=11 win8+.
  • shared: 0 or 1(default). resource sharing via legacy mechanism. for d3d=11 win8+.
  • shader_resource: 0(default) or 1. d3d=11 and win8+ only. best performance if output texture is shader resource, but may result in decode error for some gpus.
  • blacklist: codec blacklist because of poor support. string, "codec1codec2..." or "codec1+codec2+...", default is "mpeg4".
  • debug: 0(default) or 1. enable d3d11 debug layer

Recommended options: MFT:d3d=11

AMediaCodec

RECOMMENDED for android. Android hardware decoder. MDK's own implementation. Support both NDK(default if available) and java implementation. Supports async mode. . Properties:

  • java: can be 0(default) or 1. Use java or NDK api. DO NOT set both 0 and 1 for multiple decoders at the same time
    • 0: prefer NDK C API if available. the default and recommended. ndk api is available since android 5.0(api level 21).
    • 1: always use java API. NOT RECOMMENDED.
  • surface:
    • 0: decode to host memory(ByteBuffer). slower
    • 1: default. decode to a surface provided by SurfaceTexture, AImage or user. Better performance.
    • ANativeWindow address: decode into user provided ANativeWindow. NDK mode only
  • image: 0 or 1
    • 0: use SurfaceTexture or user provided surface
    • 1: use AImageReader to provide Surface. Required by Vulkan(WIP). Best performance. NDK mode only
  • copy: can be 0(default) or 1
    • 0: no copy. faster
    • 1: copy from decoded ByteBuffer. Conflicts with surface=1 and will be ignored
  • async: can be 0 or 1(default).
    • 0: sync mode
    • 1: async mode if available. API 28(android P) and java=0 is required.
  • dv: use dolby vision decoder for dolby vision videos. can be 0, 1(default) or -1.
    • 0: do not use dolby vision decoder
    • 1: use dolby vision decoder for profiles not backward compatible if necessary
    • -1: always use dolby vision decoder to process color

CUDA

Win32 and linux hardware decoder for NVIDIA GPUs. MDK's own implementation. Properties:

  • copy: 0(default) or 1. copy to host memory or not. 0 is faster.
  • deinterlace: deinterlace algorithm. can be bob, weave, adaptive
  • surfaces: (default=20) number of decoder surfaces
  • maxdelay: (default=4) Max display queue delay. recommend is 2~4
  • lock: (default=0) use cuvid lock or not
  • padding: 0(default) or 1. Output has padding data.(test only)
  • map: (default 8) Maximum number of output surfaces simultaneously mapped.

VAAPI

Linux hardware decoder for intel and AMD GPUs(from ffmpeg). Also supported on windows. Properties:

  • copy: 0(default) or 1

  • threads: number of decoding threads. 0: logical cpu cores+1

  • hwdevice: drm fd device path, or XOpenDisplay() parameter

  • display: va display type, can be x11, drm

  • x11: x11 display ptr value as a string

  • drm: drm device path

  • device: drm device path on linux(same as drm). adapter index or vendor name(case insensitive substring) on windows, e.g. device=nv or device=amd.

  • DRMFd: address of drm fd

  • interop: interop method for 0-copy rendering. can be x11, drm, drm2, dri3. legacy drm may not work on new drivers.

    • x11: glx texture or egl image from pixmap. requires libva-x11. may not work for some drivers/hardwares, e.g. intel iHD/i965 driver may report error in vaPutSurface, and eglCreateImageKHR may fail, then will fallback to dri3 interop. Supports external=1 property.
    • dri3: glx texture or egl image from x11 dri3 pixmap from drm exported from libva. requires VAEntrypointVideoProc(vainfo -a). for intel you may need to install intel-media-va-driver-non-free, other drivers(opensource iHD and i965) may not support VideoProc. Tested on intel, amd gpus. dri3 may be more efficient than x11 interop(no blit). Supports external=1 property.
    • drm: legacy drm prime api from libva-drm
    • drm2: drm prime 2 api from libva-drm
  • composed: 0(default) or 1, drm2 interop only. 1: export composed multi-planes yuv layer. 0: export single plane multi layers, 1 layer per plane.

  • external: 0(default) or 1(auto detect). OpenGLES is required. Use drm prime or x11/dri3 pixmap as an rgb texture in renderer. It's auto enabled when required(e.g. RaspberryPi OS hevc hardware decoder). Can also be tested on VADRM decoder.

For drm/drm2 interop, all display, interop, composed and external combinations are supported.

display\interop x11 pixmap legacy drm drm2
x11 EGL/GLX EGL EGL
drm EGL EGL

You may have to set environment var LIBVA_DRIVER_NAME if the default does not work. For example export LIBVA_DRIVER_NAME=i965 for intel

For x11, you MUST call SetGlobalOption("X11Display", DisplayPtr) to enable 0-copy rendering, otherwise may fail. Qt example. You also have to ensure XInitThread() is called before any x11 call(Qt, glfw already does).

VADRM

almost the same as VAAPI except the output is drm prime instead of vaapi surface. May not work

VDPAU

Linux hardware decoder for NVIDIA GPUs(from ffmpeg). Properties:

  • copy: 0(default) or 1
  • threads: number of decoding threads. 0: logical cpu cores+1
  • hwdevice: XOpenDisplay() parameter
  • interop: force an interop method, can be
    • video: map VdpVideoSurface as yuv textures. default for nvidia gpus
    • output: map VdpOutputSurface as an rgb texture. default for non-nvidia gpus
    • pixmap: render to an x11 pixmap and use as an rgb texture. support any combination of EGL+OpenGLES and GLX+OpenGL/OpenGLES. But seems broken in my recent test on some devices.
  • connection_type: can be "drm" or "x11". NOT IMPLEMENTED
  • kernel_driver: NOT IMPLEMENTED
  • driver: NOT IMPLEMENTED

You may have to set environment var VDPAU_DRIVER if the default does not work. For example export VDPAU_DRIVER=radeonsi for amd

You MUST call SetGlobalOption("X11Display", DisplayPtr) to enable 0-copy rendering, otherwise may fail. Qt example. You also have to ensure XInitThread() is called before any x11 call(Qt already does).

MMAL

Raspberry pi hardware decoder(from ffmpeg). Properties:

  • copy: 0(default) or 1
  • threads: number of decoding threads. 0: logical cpu cores+1

mmal

Raspberry pi hardware decoder. MDK's own implementation. Properties:

  • copy: 0(default), 1 or 2.
    • 0: decoded frame can be rendered without copy. Best performance
    • 1: decode into host memory.
    • 2: decode into host memory and copy. slowest.
  • format: decoded pixel format name. Seems only yuv420p works.
  • extra_in: (default 10) extra input buffers
  • extra_out: (default 0) extra output buffers

CedarX

Sunxi hardware decoder. MDK's own implementation. Can be rendered via arm UMP acceleration.

V4L2M2M

From ffmpeg. alias of FFmpeg:hwaccel=v4l2m2m. RaspberryPi OS system ffmpeg supports drm_prime frame output, you can use it to get maximum performance.

dav1d

software av1 decoder via dav1d, requires dav1d runtime library, e.g. libdav1d.5.dylib for apple, libdav1d.so.5 for linux, libdav1d.dll(download from https://code.videolan.org/videolan/dav1d/-/pipelines) for windows and libdav1d.so for android. Will automatically fallback to dav1d decoder if Player.setDecoders() is not called and dav1d library is found. Properties:

  • threads: default 0(logical cores)
  • tile_threads: tile threads. default is 0(sqrt(threads))
  • frame_threads: frame threads. default is 0(threads/tile_threads)

__NOTE: if setDecoders(MediaType::Video,...) is not called, dav1d will be automatically used when necessary, otherwise you have to explicitly add dav1d in decoder list setDecoders(MediaType::Video, {..., "dav1d", ...})

hap

Hap is decoded as compressed textures, and uncompressed on GPU. Lower cpu, gpu and memory load compared with ffmpeg hap decoder. Only desktop platforms are supported, including windows, uwp, macOs and linux. Supported codecs are Hap1, Hap5, HapY and HapM.

__NOTE: if setDecoders(MediaType::Video,...) is not called, hap will be automatically used when necessary, otherwise you have to explicitly add hap in decoder list setDecoders(MediaType::Video, {..., "hap", ...}). FFmpeg based hardware decoders may fallback to ffmpeg hap decoder, so you may insert hap before those decoders

D3D11

Win32, UWP hardware decoder(from ffmpeg). Properties:

  • copy: 0(default) or 1
  • threads: number of decoding threads. 0: logical cpu cores+1
  • hwdevice: dxgi adapter index
  • debug: debug layer
  • shader_resource: 0(default) or 1. set srv flags to AVD3D11VAFramesContext so can be used by renderer

DXVA

Win32 hardware decoder(from ffmpeg). Properties:

  • copy: 0(default) or 1
  • threads: number of decoding threads. 0: logical cpu cores+1
  • hwdevice: d3d9 adapter index

QSV

Intel QuickSync (tested on windows) (from ffmpeg). Properties:

  • copy: 0(default) or 1
  • d3d: 11(default) or 9. requires ffmpeg5.0+. output d3d11 or d3d9.
  • threads: number of decoding threads. 0: logical cpu cores+1
  • hwdevice : implementation. can be "auto", "sw", "hw", "auto_any", "hw_any", "hw2", "hw3", "hw4"

NVDEC

Win32 and linux hardware decoder for NVIDIA GPUs(from ffmpeg). Properties:

  • copy: 0(default) or 1
  • threads: number of decoding threads. 0: logical cpu cores+1
  • hwdevice: cuda device index

VideoToolbox

macOS and iOS hardware decoder(from ffmpeg).

  • format: software pixel format. can be "nv12", "p010le", "yuv420p", "uyvy422" etc.

MediaCodec

Android hardware decoder(from ffmpeg) Properties:

  • copy: 0(default) or 1
  • threads: number of decoding threads. 0: logical cpu cores+1

Audio Decoders

Decoders are selected by Player.setDecoders(MediaType::Audio, ...), or -c:a names option for examples(glfwplay, mdkplay etc.)

Available builtin decoder names and properties are:

FFmpeg

MFT

AMediaCodec

Clone this wiki locally