Support yuv444p/yuvj444p in `to_ndarray`/`from_ndarray` #788

NPN · 2021-05-21T08:50:30Z

No description provided.

WyattBlue

This doesn't look right. Applying these changes with a yuv444p or yuvj444p pixel format produces a video with wild distortions.

Original

Processed

I'm curious, how did you determine how the data was processed? I'm trying add support for the yuv420p10le and I'm struggling to find good resources?

command used to generate test video:

ffmpeg -f lavfi -i testsrc -t 30 -pix_fmt yuv444p testsrc.mp4

NPN · 2021-08-26T08:49:58Z

How are you getting that error? I generated the test video and used this code, which reads in the video, converts each frame to a yuv444p ndarray, then back to a yuv444p frame, and then writes out the frame:

import av

with av.open("testsrc.mp4", "r") as vin, av.open("out.mp4", "w") as vout:
    sout = vout.add_stream("h264", rate=25)
    sout.width = 320
    sout.height = 240
    sout.pix_fmt = "yuv444p"

    for frame in vin.decode(video=0):
        assert frame.format.name == "yuv444p"
        array = frame.to_ndarray()
        assert array.shape == (240, 320, 3)
        new_frame = av.VideoFrame.from_ndarray(array, "yuv444p")
        for packet in sout.encode(new_frame):
            vout.mux(packet)

    for packet in sout.encode():
        vout.mux(packet)

For me, the output out.mp4 looks the same as testsrc.mp4.

As for figuring out the pixel format, I think I searched and read until I knew enough to understand the terminology around pixel formats. Then I could reference FFmpeg's pixfmt.h header, which has some comments about each pixel format. For yuv420p10le, it says:

Notice that each 9/10 bits sample is stored in 16 bits with extra padding.
...
planar YUV 4:2:0, 15bpp, (1 Cr & Cb sample per 2x2 Y samples), little-endian

So we know that this format is:

planar: Pixels are stored as YYY...UUU...VVV..., as opposed to packed/interleaved, which is YUVYUV...
YUV: as opposed to RGB
4:2:0 chroma subsampling (Wikipedia has a section on the chroma subsampling notation)
15bpp (bits per pixel): 4 real pixels are stored as 4 luma (Y) pixels and 2 chroma (Cr, Cb) pixels. Each format pixel is 10 bits, so that's (6 format pixels * 10 bits) / 4 real pixels = 15 bits per pixel

little-endian: This means that each 10-bit sample is packed into 16 bits as follows:

               MSB      LSB
               v        v
10-bit pixel:  0123456789

               Byte 1      Byte 2
               v           v
16-bit pixel:  23456789    ......01

If you understand all the terminology, the description for each format should be enough. Unfortunately, I don't know of a resource that describes everything in one place, but as long as you search around, look at existing code, etc., you should be good.

FirefoxMetzger · 2022-02-02T11:24:19Z

This is ultimately an API design decision, but I am not 100% sure if pyAV should convert yuv444p to channel-last internally. p indicates a planar pixel format, and I would have consequentially expected the axis order of [channel, height, width] and not [height, width, channel]. The latter, I would have expected to see from yuyv444 (which is a packed yuv444).

Doing this internal conversion may result in interesting behavior. The final np.moveaxis in this PR creates a view into a planar buffer (not a copy!), so depending on which application consumes the data later on, you may see artifacts due to malaligned channels (@WyattBlue 's comment). Here is a snippet to demonstrate the "problem":

>>> import av
>>> import numpy as np
>>> foo = av.open("testsrc.mp4")      
>>> frame = next(foo.decode(video=0))
>>> np_frame = np.stack([np.frombuffer(frame.planes[idx], dtype=np.uint8) for idx in range(3)]).reshape(3, frame.height, frame.width)
>>> np_frame2 = np.moveaxis(np_frame, 0, -1)
>>> np_frame2.flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

Another point is that np.stack will definitely create a copy of the frame's data, which could be avoided by directly using frame._buffer instead of splitting the data into planes and reassembling later. This suggestion does come with a grain of salt though, because I don't know if FFMPEG guarantees contiguity of planes (though I don't see any obvious reason why it shouldn't be able to do so), and further because I don't know the memory management model that pyAV uses for frames.

@WyattBlue You can find a good overview/breakdown of the various names used by ffmpeg here: https://ffmpeg.org/doxygen/trunk/pixfmt_8h_source.html (it's C code, but the comments are the important part).

yuv420p10le is a bit more involved than yuv444p, because it is subsampled and not 8-bit. If you want to get it into a similar format as discussed here ([height, width, channels]), you have to upsample the UV channels. (Fortunately, the data comes byte-aligned instead of being a bitstream)

>>> bar = av.open("testsrc.mp4")
>>> frame = next(bar.decode(video=0))
>>> # cast the planes to np.uint16 (they are 10-bit each in little endian, but have 2-byte alignment)
>>> y_plane, u_plane, v_plane = tuple(np.frombuffer(frame.planes[idx], dtype="<u2").reshape(frame.planes[idx].height, frame.planes[idx].width) for idx in range(3))
>>> # upsample U and V to match Y plane (this essentially creates a YUV444p image)
>>> np_frame = np.stack([y_plane, np.repeat(np.repeat(u_plane, 2, axis=0), 2, axis=1), np.repeat(np.repeat(v_plane, 2, axis=0), 2, axis=1)])
>>> # as per the above, if you want channel-last contiguous data you need to copy or use axis=-1 when stacking
>>> np_frame2 = np.moveaxis(np_frame, 0, -1).copy()

The above is a manual example of doing the extraction. You can alternatively use frame.reformat("yuyv444") and have FFMPEG do this conversion for you. While at it, why not reformat to RGB or which ever your target format is?

NPN · 2022-02-08T07:32:13Z

@FirefoxMetzger Thanks for the suggestion--it does make more sense to put channels first. I think I was blindly following the order of the RGB/BGR/etc formats without realizing that they were packed.

WyattBlue · 2022-02-08T22:27:11Z

@NPN Your last commit solved the problem I had before. This pull request is ready to be merged.

FirefoxMetzger · 2022-02-09T06:29:02Z

@jlaine When you do your next round of issue reporting, could you let us know pyAV's policy on tests and coverage? Is it mandatory, and if so to what extent?

Side question: How does pyAV manage ffmpeg's frame object? Is it ref-counted and decommissioned by FFmpeg (afaik default), or are you handling the frame yourself? I.e., is it sane to do something like gray_image = np.frombuffer(frame.planes[0], dtype=np.uint8).reshape(...) in python land or do I have to worry that FFmpeg will garbage collect that buffer, because it doesn't know about numpy holding a pointer/reference to it? I'm asking because this has performance implications in this PR, but also in other places.

jlaine · 2022-02-25T14:19:15Z

Hi @FirefoxMetzger yes we do need unit tests for these new formats. I unfortunately haven't been able to wire up an automatic check for this, I'm not sure how coverage works on Cython code. Could we also have this code rebased on top of main as there have been some changes in VideoFrame?

I'm not sure I understand the question about memory ownership? As long a you hold a reference to the frame I don't expect its buffer to disappear under your feet?

EDIT: I've rebased on top of main, but we do need tests

FirefoxMetzger · 2022-02-25T15:00:15Z

I unfortunately haven't been able to wire up an automatic check for this, I'm not sure how coverage works on Cython code.

I guess you are aware of coverage.py support for cython (see here)? I know that it exists, but never actually set it up myself.

After you get the coverage xml, it should behave like any other project, i.e., upload to CodeCov (or another provider) and have it hook into CI to pass/fail the test suite based on coverage. This part I've set up for ImageIO, so I'm happy to help if you decide to go for it.

I'm not sure I understand the question about memory ownership? As long a you hold a reference to the frame I don't expect its buffer to disappear under your feet?

>>> import av
>>> import numpy as np
>>> frame = av.VideoFrame(16, 16, "rgb24")
>>> no_copy_array = np.frombuffer(frame.planes[0], dtype=np.uint8).reshape((16, 16, 3))
>>> del frame  # will this be dangerous?

Here, we created the image buffer via lib.av_image_alloc during frame's construction and then we free it via av_freep during the deconstruction of the frame. However, python land still holds a reference to that buffer (no_copy_array) which FFmpeg doesn't know about, so it will happily free the buffer. At least I think that's what will happen, but I am not sure. I wonder if this could lead to undefined behavior, especially if we do the above on a decoded frame instead.

jlaine · 2022-02-25T15:27:20Z

Feel free to open a new discussion on the memory ownership question, I'd rather keep discussion here narrowly focused on the PR.

jlaine · 2022-02-25T22:06:44Z

I've been trying to put together a unit test for this, and the test would look like this:

    def test_ndarray_yuv444p(self):
        array = numpy.random.randint(0, 256, size=(3, 480, 640), dtype=numpy.uint8)
        frame = VideoFrame.from_ndarray(array, format="yuv444p")
        self.assertEqual(frame.width, 640)
        self.assertEqual(frame.height, 480)
        self.assertEqual(frame.format.name, "yuv444p")
        self.assertNdarraysEqual(frame.to_ndarray(), array)

=> AFAICT this would be the only format for which our ndarray has the channel along the first axis. For all the other formats it looks like (height, widtht, channel)?

FirefoxMetzger · 2022-02-26T06:02:20Z

AFAICT this would be the only format for which our ndarray has the channel along the first axis. For all the other formats it looks like (height, widtht, channel)?

There is also yuv420p and yuvj420p that are supported and technically channel-first; For these formats, however, it seems like we just get an array of lines:

>>> import av
>>> foo = av.VideoFrame(5*16, 5*16, "yuv420p")
>>> bar = foo.to_ndarray()
>>> bar.shape
(120, 80)

So I guess the answer is no you can get other layouts than (height, width, channel).

I wonder if subsampled formats like this should raise an error, promote to yuv444p or return ~~macro-block~~ macro-pixel strided arrays instead of an array of lines, but that's a different discussion. Is it worth having @jlaine ? If so, I'll open a new issue.

NPN · 2022-03-29T23:42:03Z

I have added tests based on @jlaine's comment above. Let me know if anything else is needed.

jlaine · 2022-03-30T06:54:35Z

As stated I'm still not comfortable with the array dimensions, this creates an element of surprise for the user: the other formats do (height, width, channels). Either convince me or change it :)

FirefoxMetzger · 2022-03-30T07:51:00Z

As stated I'm still not comfortable with the array dimensions, this creates an element of surprise for the user: the other formats do (height, width, channels). Either convince me or change it :)

@jlaine What about yuv420p and yuvj420p? Both are supported and don't return (height, width, channels). (See #788 (comment) for an example)

jlaine · 2022-03-30T08:25:08Z

As stated I'm still not comfortable with the array dimensions, this creates an element of surprise for the user: the other formats do (height, width, channels). Either convince me or change it :)

@jlaine What about yuv420p and yuvj420p? Both are supported and don't return (height, width, channels). (See #788 (comment) for an example)

Those formats decimate the chrominance, so there is no way they are do have a "normal" array, regardless of the order of dimensions :)

FirefoxMetzger · 2022-03-30T09:26:59Z

Those formats decimate the chrominance, so there is no way they are do have a "normal" array, regardless of the order of dimensions :)

@jlaine What would be the difference between copying pixels here (channel-first to channel-last) and copying pixels for yuv420p? (we do need a copy here, because #788 (review))

yuv420p shares a chroma pixel amongst multiple luma pixels, so making a new copy that duplicates values and serves the familiar [height, width, channels] format seems about as surprising to me as yuv444p (which is planar/channel-first) serving [height, width, channels] (which is channel-last). Personally, I would not copy in either case, because the result is really surprising and unexpected (to me). I ask for something that is channel-first (yuv444p), but get something that is channel-last instead; worse, yuv444 is - afaik - not even a valid format in FFMPEG. If we have to do a copy, I would at least like to to be consistent across YUV formats in that they all make the copy to [height, width, channels].

NPN · 2022-03-30T09:36:45Z

For channels first, the benefit is faithfulness to the underlying pixel format. So, code which operates on the raw ndarray data and expects a planar ordering will work correctly (e.g. WyattBlue's example above).

For channels last, the benefit is consistency with the other non-chroma subsampled formats. So, if you just want to get/set pixels using ndarray indexing, then you don't have to worry about whether the underlying format is planar or packed. Also, Pillow seems to always use channels last (though their YCbCr is based on JPEG, anyway).

Honestly, I'm not sure which is better. Unless someone else has an opinion, we can go with channels last, if that's what you prefer. I can just go back to the old code (after changing it to make a copy instead of a view, since np.moveaxis seems to cause problems?)

Edit: Somehow I missed FirefoxMetzger's comment above when writing this response. Those arguments seem pretty good, so I suppose it's up to what jlaine thinks.

WyattBlue · 2023-10-08T05:32:36Z

@NPN @FirefoxMetzger @jlaine I have merged these changes as-is in my fork.

FirefoxMetzger · 2023-10-12T18:28:38Z

Nice!

@WyattBlue Are you planning on forking PyAV and maintaining it, or is this more of a "one-off" thing?

WyattBlue · 2023-10-14T20:21:16Z

Nice!

@WyattBlue Are you planning on forking PyAV and maintaining it, or is this more of a "one-off" thing?

I definitely plan on maintaining it. I'm already using the fork for my project, auto-editor.

Co-authored-by: Jeremy Lainé <jeremy.laine@m4x.org>

jlaine · 2023-10-30T15:21:25Z

Thanks @NPN , it's merged!

NPN changed the title ~~Support yuv444p in to_ndarray~~ Support yuv444p/yuvj444p in to_ndarray/from_ndarray May 21, 2021

WyattBlue requested changes Aug 18, 2021

View reviewed changes

FirefoxMetzger mentioned this pull request Feb 25, 2022

Memory Management of VideoFrame image buffers #891

Closed

jlaine force-pushed the main branch from 895bcf2 to 6d2d9aa Compare March 7, 2022 23:46

jlaine added the changes requested label Mar 21, 2022

WyattBlue approved these changes Oct 8, 2023

View reviewed changes

Add VideoFrame ndarray operations for yuv444p/yuvj444p formats

ad3d7b4

Co-authored-by: Jeremy Lainé <jeremy.laine@m4x.org>

jlaine merged commit 29ef112 into PyAV-Org:main Oct 30, 2023
17 checks passed

NPN deleted the patch-1 branch December 1, 2023 00:43

WyattBlue mentioned this pull request May 5, 2024

Create fast path for returning YUVPlanes to numpy arrays #1393

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support yuv444p/yuvj444p in `to_ndarray`/`from_ndarray` #788

Support yuv444p/yuvj444p in `to_ndarray`/`from_ndarray` #788

NPN commented May 21, 2021

WyattBlue left a comment

NPN commented Aug 26, 2021

FirefoxMetzger commented Feb 2, 2022

NPN commented Feb 8, 2022

WyattBlue commented Feb 8, 2022

FirefoxMetzger commented Feb 9, 2022

jlaine commented Feb 25, 2022 •

edited

Loading

FirefoxMetzger commented Feb 25, 2022

jlaine commented Feb 25, 2022

jlaine commented Feb 25, 2022 •

edited

Loading

FirefoxMetzger commented Feb 26, 2022 •

edited

Loading

NPN commented Mar 29, 2022

jlaine commented Mar 30, 2022

FirefoxMetzger commented Mar 30, 2022 •

edited

Loading

jlaine commented Mar 30, 2022

FirefoxMetzger commented Mar 30, 2022

NPN commented Mar 30, 2022 •

edited

Loading

WyattBlue commented Oct 8, 2023

FirefoxMetzger commented Oct 12, 2023

WyattBlue commented Oct 14, 2023

jlaine commented Oct 30, 2023

Support yuv444p/yuvj444p in to_ndarray/from_ndarray #788

Support yuv444p/yuvj444p in to_ndarray/from_ndarray #788

Conversation

NPN commented May 21, 2021

WyattBlue left a comment

Choose a reason for hiding this comment

NPN commented Aug 26, 2021

FirefoxMetzger commented Feb 2, 2022

NPN commented Feb 8, 2022

WyattBlue commented Feb 8, 2022

FirefoxMetzger commented Feb 9, 2022

jlaine commented Feb 25, 2022 • edited Loading

FirefoxMetzger commented Feb 25, 2022

jlaine commented Feb 25, 2022

jlaine commented Feb 25, 2022 • edited Loading

FirefoxMetzger commented Feb 26, 2022 • edited Loading

NPN commented Mar 29, 2022

jlaine commented Mar 30, 2022

FirefoxMetzger commented Mar 30, 2022 • edited Loading

jlaine commented Mar 30, 2022

FirefoxMetzger commented Mar 30, 2022

NPN commented Mar 30, 2022 • edited Loading

WyattBlue commented Oct 8, 2023

FirefoxMetzger commented Oct 12, 2023

WyattBlue commented Oct 14, 2023

jlaine commented Oct 30, 2023

Support yuv444p/yuvj444p in `to_ndarray`/`from_ndarray` #788

Support yuv444p/yuvj444p in `to_ndarray`/`from_ndarray` #788

jlaine commented Feb 25, 2022 •

edited

Loading

jlaine commented Feb 25, 2022 •

edited

Loading

FirefoxMetzger commented Feb 26, 2022 •

edited

Loading

FirefoxMetzger commented Mar 30, 2022 •

edited

Loading

NPN commented Mar 30, 2022 •

edited

Loading