inputdrv_videodecoder

VirtualDub Plugin SDK 1.2

Video decoder

The video decoder is responsible for converting samples into frames.

Interaction with the video decoder model

A video decoder cannot request samples by itself, so it depends on the host to provide the necessary samples for each desired frame. The host does this by consulting a video decoder model, which simulates the decoder and allows the host to prefetch video samples ahead of time. This means that any internal buffer manipulation in the decoder must match that done by the decoder model, or the two will desynchronize and frame decoding will malfunction.
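
A minimal host-side sketch of this interaction follows. It assumes the decoder-model and decoder methods declared in the SDK's vdinputdriver.h (SetDesiredFrame(), GetNextRequiredSample(), DecodeFrame()); the ReadSample() helper, the return convention of GetNextRequiredSample(), and the header path are assumptions, so verify them against your copy of the SDK.

```cpp
#include <vd2/plugin/vdinputdriver.h>   // assumed header path; declares IVDXVideoDecoder,
                                        // IVDXVideoDecoderModel and VDXPixmap

// Hypothetical helper: fetches one compressed sample from the input file.
void ReadSample(sint64 sampleNum, const void *&data, uint32 &len);

// Host-side decode of one frame.  The model is stepped first to learn which
// samples are needed; the decoder is then fed exactly the same samples in the
// same order, which keeps the two in sync.
void DecodeOneFrame(IVDXVideoDecoderModel *model, IVDXVideoDecoder *decoder,
                    sint64 targetFrame) {
    model->SetDesiredFrame(targetFrame);

    for (;;) {
        bool preroll = false;
        sint64 sample = model->GetNextRequiredSample(preroll);
        if (sample < 0)
            break;                      // model reports no further samples needed
                                        // (return convention assumed -- check the header)

        const void *data;
        uint32 len;
        ReadSample(sample, data, len);

        // Preroll samples update the decoder's internal buffers only; the last
        // non-preroll sample produces the desired frame.
        decoder->DecodeFrame(data, len, preroll, sample, targetFrame);
    }

    // The desired frame is now available through the decoder's frame buffer
    // (described by its VDXPixmap).
}
```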

The video decoder must operate entirely independently of the video decoder model, as video frames may be decoded at different times, and on different threads, than the decoder model is run on. It is also possible, although not generally required, to use multiple video decoder models with a single video decoder, or vice versa. Like the decoder model, however, the video decoder is tied to its parent video source and can share data structures with it.

Video decoder structure

A video decoder consists of a number of internal buffers and a frame buffer which is exposed to the host. Each decoded sample changes one or more of the internal buffers, one of which is then converted into a decoded frame in the frame buffer. The host need not know the number or structure of the internal buffers, as these are abstracted by the video decoder model.
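
As a purely illustrative sketch (none of this structure is mandated by the SDK), a decoder for a hypothetical codec with temporal prediction might keep two internal reference buffers plus one output frame buffer. The methods declared here are filled in by the sketches in the following sections; the class and member names are invented for illustration.

```cpp
#include <vector>
#include <cstdint>
#include <cstddef>
#include <vd2/plugin/vdinputdriver.h>   // assumed header path for VDXPixmap

// Illustrative only: a decoder whose samples update hidden reference buffers,
// with a single converted frame buffer exposed to the host through VDXPixmap.
class SampleDecoder /* would also implement the SDK's IVDXVideoDecoder */ {
public:
    bool SetTargetFormat(int format, bool useDIBAlignment);            // sketched below
    void DescribeFrameBuffer(VDXPixmap& px, void *base, int w, int h); // sketched below

private:
    std::vector<uint8_t> mRefBuffers[2];   // internal buffers, invisible to the host
    int                  mCurrentRef = 0;  // which reference holds the newest image
    std::vector<uint8_t> mFrameBuffer;     // converted output exposed to the host
    int                  mFormat = 0;
    bool                 mUseDIBAlignment = false;
};
```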

Frame buffer formats

Video decoders can offer a number of video frame formats to the host; these are identified by the VDXPixmapFormat enumeration. A video decoder may support only a small subset of these formats, but it must support at least one, and it is recommended that at least 24-bit RGB (RGB888) be supported. If the video format uses YCbCr internally, as most do, supporting at least one of the 4:2:2 interleaved YCbCr formats (YUV422_UYVY or YUV422_YUYV) is also recommended.
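
Continuing the illustrative SampleDecoder above, format negotiation might look like the following. The SetTargetFormat() name and the kPixFormat_* constants in the nsVDXPixmap namespace follow the SDK headers as best the author recalls them, so treat them as assumptions to verify.

```cpp
// The host proposes a format; the decoder either accepts it (reallocating its
// frame buffer accordingly) or returns false so the host can try another one.
bool SampleDecoder::SetTargetFormat(int format, bool useDIBAlignment) {
    using namespace nsVDXPixmap;        // enum namespace name assumed from the SDK

    switch (format) {
    case kPixFormat_RGB888:             // recommended baseline
    case kPixFormat_YUV422_UYVY:        // recommended for YCbCr-based codecs
    case kPixFormat_YUV422_YUYV:
        break;
    default:
        return false;                   // unsupported; the host will propose another format
    }

    mFormat = format;
    mUseDIBAlignment = useDIBAlignment;
    // ...reallocate mFrameBuffer for the new format here...
    return true;
}
```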

Frame buffer layout linearization

Video decoders expose their frame buffers using the VDXPixmap structure, which describes up to three planes, each with an arbitrary base address and pitch. For interleaved (chunky) formats, the planes must be aligned to at least the natural alignment of a pixel group (2 bytes for 16-bit RGB formats, 4 bytes for 32-bit RGB and 4:2:2 YCbCr). The pitch is the memory offset between scanlines; it may be greater than the valid data length within a row, to accommodate extra padding for alignment, and it may be negative to support a bottom-up orientation.
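
Again continuing the SampleDecoder sketch, this is how a decoder might fill in the VDXPixmap for a padded, interleaved 4:2:2 buffer. The field names (data/pitch, data2/pitch2, data3/pitch3, w, h, format, palette) follow the SDK's VDXPixmap declaration as the author recalls it; verify them against the header.

```cpp
// Describe a UYVY (4:2:2 interleaved) frame buffer whose rows are padded for
// alignment, so the pitch is larger than the valid data length per row.
void SampleDecoder::DescribeFrameBuffer(VDXPixmap& px, void *base, int w, int h) {
    const ptrdiff_t rowBytes = (ptrdiff_t)w * 2;                     // 2 bytes/pixel for UYVY
    const ptrdiff_t pitch    = (rowBytes + 15) & ~(ptrdiff_t)15;     // pad each row to 16 bytes

    px.data    = base;              // top-left pixel of the single interleaved plane
    px.pitch   = pitch;             // offset between scanlines (>= rowBytes)
    px.w       = w;
    px.h       = h;
    px.format  = nsVDXPixmap::kPixFormat_YUV422_UYVY;
    px.palette = nullptr;
    px.data2   = nullptr;           // second/third planes unused for chunky formats
    px.pitch2  = 0;
    px.data3   = nullptr;
    px.pitch3  = 0;

    // A bottom-up layout would instead point at the last scanline and negate the pitch:
    //   px.data  = (char *)base + (ptrdiff_t)(h - 1) * pitch;
    //   px.pitch = -pitch;
}
```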

The host can request frame buffer linearization when selecting a format. When this is set, the decoder is required to arrange the planes back-to-back at a known starting address (returned by GetFrameBuffer()) and in a Windows DIB-compatible layout. This means:

  • RGB pixmaps must be stored in a bottom-up orientation, with scanlines aligned and padded to DWORD (4-byte) boundaries.
  • YCbCr planes must be stored back-to-back in top-down orientation without padding between scanlines. The plane order is Y, Cr (V), Cb (U).

Linearization allows the host to pass the data directly to a video codec without requiring an additional memory copy.
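
The two linearized layouts translate into pointer arithmetic along these lines. This is an illustration of the rules above rather than SDK code: the kPixFormat_YUV420_Planar name and the mapping of data2 to the Cb plane and data3 to the Cr plane are assumptions taken from the SDK headers as the author recalls them.

```cpp
#include <cstddef>
#include <vd2/plugin/vdinputdriver.h>   // assumed path; declares VDXPixmap

// Linearized RGB888: bottom-up rows padded to DWORD (4-byte) boundaries -- the
// classic Windows DIB layout.
void SetupLinearRGB888(VDXPixmap& px, void *base, int w, int h) {
    const ptrdiff_t pitch = ((ptrdiff_t)w * 3 + 3) & ~(ptrdiff_t)3;  // DWORD-aligned rows
    px.data   = (char *)base + (ptrdiff_t)(h - 1) * pitch;  // top row sits last in memory
    px.pitch  = -pitch;                                     // negative pitch = bottom-up
    px.w      = w;
    px.h      = h;
    px.format = nsVDXPixmap::kPixFormat_RGB888;
}

// Linearized planar 4:2:0 YCbCr: planes packed back-to-back, top-down, with no
// row padding, in Y, Cr (V), Cb (U) order (the YV12 convention).
void SetupLinearYUV420(VDXPixmap& px, void *base, int w, int h) {
    char *p = (char *)base;

    px.data  = p;       px.pitch  = w;          // Y plane, w x h
    p += (size_t)w * h;

    px.data3 = p;       px.pitch3 = w / 2;      // Cr plane, (w/2) x (h/2)
    p += (size_t)(w / 2) * (h / 2);

    px.data2 = p;       px.pitch2 = w / 2;      // Cb plane, (w/2) x (h/2)

    px.w      = w;
    px.h      = h;
    px.format = nsVDXPixmap::kPixFormat_YUV420_Planar;      // constant name assumed
}
```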


Copyright (C) 2007-2012 Avery Lee.
