Support yuv444p/yuvj444p in to_ndarray/from_ndarray #788
---

This doesn't look right. Applying these changes with a yuv444p or yuvj444p pixel format produces a video with wild distortions.
---

I'm curious, how did you determine how the data was processed? I'm trying to add support for yuv420p10le and I'm struggling to find good resources.
Command used to generate the test video:

```
ffmpeg -f lavfi -i testsrc -t 30 -pix_fmt yuv444p testsrc.mp4
```
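As a quick sanity check (my own addition, reusing only PyAV calls that appear elsewhere in this thread), you can confirm the generated file really decodes as yuv444p before testing the round trip:

```python
import av

# Decode one frame and confirm its pixel format matches what we
# asked ffmpeg for.
with av.open("testsrc.mp4") as container:
    frame = next(container.decode(video=0))
    print(frame.format.name)  # expected: "yuv444p"
```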
---

How are you getting that error? I generated the test video and used this code, which reads in the video, converts each frame to an ndarray and back, and re-encodes it:

```python
import av

with av.open("testsrc.mp4", "r") as vin, av.open("out.mp4", "w") as vout:
    sout = vout.add_stream("h264", rate=25)
    sout.width = 320
    sout.height = 240
    sout.pix_fmt = "yuv444p"
    for frame in vin.decode(video=0):
        assert frame.format.name == "yuv444p"
        array = frame.to_ndarray()
        assert array.shape == (240, 320, 3)
        new_frame = av.VideoFrame.from_ndarray(array, "yuv444p")
        for packet in sout.encode(new_frame):
            vout.mux(packet)
    for packet in sout.encode():
        vout.mux(packet)
```

For me, the output video looks correct.

As for figuring out the pixel format, I think I searched and read until I knew enough to understand the terminology around pixel formats. Then I could reference FFmpeg's description of the format.
So we know that this format is: planar (one plane each for Y, U, and V), 4:4:4 (no chroma subsampling), and 8 bits per sample.

If you understand all the terminology, the description for each format should be enough. Unfortunately, I don't know of a resource that describes everything in one place, but as long as you search around, look at existing code, etc., you should be good.
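A small sketch to make that description concrete (my own illustration; the `>=` allows for possible row padding in each plane's buffer):

```python
import av
import numpy as np

frame = av.VideoFrame(320, 240, "yuv444p")
# Planar: three separate planes, one per component.
assert len(frame.planes) == 3
for plane in frame.planes:
    samples = np.frombuffer(plane, dtype=np.uint8)
    # 4:4:4 and 8-bit: every plane is full resolution, one byte per sample.
    assert samples.size >= 320 * 240
```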
---

This is ultimately an API design decision, but I am not 100% sure if PyAV should convert yuv444p to channel-last internally. Doing this internal conversion may result in interesting behavior. The final array, for example, is no longer contiguous:

```python
>>> import av
>>> import numpy as np
>>> foo = av.open("testsrc.mp4")
>>> frame = next(foo.decode(video=0))
>>> np_frame = np.stack([np.frombuffer(frame.planes[idx], dtype=np.uint8) for idx in range(3)]).reshape(3, frame.height, frame.width)
>>> np_frame2 = np.moveaxis(np_frame, 0, -1)
>>> np_frame2.flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False
```

Another point is that …

@WyattBlue You can find a good overview/breakdown of the various names used by ffmpeg here: https://ffmpeg.org/doxygen/trunk/pixfmt_8h_source.html (it's C code, but the comments are the important part). yuv420p10le is a bit more involved than yuv444p, because it is subsampled and not 8-bit. If you want to get it into a similar format as discussed here, you can do something like:

```python
>>> bar = av.open("testsrc.mp4")
>>> frame = next(bar.decode(video=0))
>>> # cast the planes to np.uint16 (they are 10-bit each in little endian, but have 2-byte alignment)
>>> y_plane, u_plane, v_plane = tuple(np.frombuffer(frame.planes[idx], dtype="<u2").reshape(frame.planes[idx].height, frame.planes[idx].width) for idx in range(3))
>>> # upsample U and V to match the Y plane (this essentially creates a YUV444p image)
>>> np_frame = np.stack([y_plane, np.repeat(np.repeat(u_plane, 2, axis=0), 2, axis=1), np.repeat(np.repeat(v_plane, 2, axis=0), 2, axis=1)])
>>> # as per the above, if you want channel-last contiguous data you need to copy or use axis=-1 when stacking
>>> np_frame2 = np.moveaxis(np_frame, 0, -1).copy()
```

The above is a manual example of doing the extraction. You can alternatively use …
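On the contiguity point, a minimal standalone sketch (my own addition) of how the channel-last view can be made contiguous explicitly:

```python
import numpy as np

# Toy stand-in for the (3, height, width) channel-first array above.
np_frame = np.zeros((3, 240, 320), dtype=np.uint8)

# moveaxis returns a strided view, hence C_CONTIGUOUS: False.
view = np.moveaxis(np_frame, 0, -1)
assert not view.flags["C_CONTIGUOUS"]

# An explicit copy (equivalently view.copy()) restores contiguity.
contiguous = np.ascontiguousarray(view)
assert contiguous.flags["C_CONTIGUOUS"]
assert contiguous.shape == (240, 320, 3)
```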
---

@FirefoxMetzger Thanks for the suggestion -- it does make more sense to put channels first. I think I was blindly following the order of the RGB/BGR/etc. formats without realizing that they were packed.
---

@NPN Your last commit solved the problem I had before. This pull request is ready to be merged.
---

@jlaine When you do your next round of issue reporting, could you let us know PyAV's policy on tests and coverage? Is it mandatory, and if so, to what extent?

Side question: How does PyAV manage ffmpeg's frame object? Is it ref-counted and decommissioned by FFmpeg (afaik the default), or are you handling the frame yourself? I.e., is it sane to do something like …
---

Hi @FirefoxMetzger, yes we do need unit tests for these new formats. I unfortunately haven't been able to wire up an automatic check for this; I'm not sure how coverage works on Cython code. Could we also have this code rebased on top of …?

I'm not sure I understand the question about memory ownership? As long as you hold a reference to the frame, I don't expect its buffer to disappear under your feet?

EDIT: I've rebased on top of …
---

I guess you are aware of coverage.py support for Cython (see here)? I know that it exists, but never actually set it up myself. After you get the coverage XML, it should behave like any other project, i.e., upload to CodeCov (or another provider) and have it hook into CI to pass/fail the test suite based on coverage. This part I've set up for ImageIO, so I'm happy to help if you decide to go for it.

On the memory ownership question:

```python
>>> import av
>>> import numpy as np
>>> frame = av.VideoFrame(16, 16, "rgb24")
>>> no_copy_array = np.frombuffer(frame.planes[0], dtype=np.uint8).reshape((16, 16, 3))
>>> del frame  # will this be dangerous?
```

Here, we created the image buffer via …
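If the frame's lifetime is in doubt, a defensive variant (my own sketch, not from this thread) is to copy the plane data up front:

```python
import av
import numpy as np

frame = av.VideoFrame(16, 16, "rgb24")
# .copy() detaches the ndarray from the frame's buffer, so deleting the
# frame afterwards cannot invalidate the array's memory.
safe_array = np.frombuffer(frame.planes[0], dtype=np.uint8).reshape((16, 16, 3)).copy()
del frame
assert safe_array.flags["OWNDATA"]
```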
---

Feel free to open a new discussion on the memory ownership question; I'd rather keep discussion here narrowly focused on the PR.
---

I've been trying to put together a unit test for this, and the test would look like this:

```python
def test_ndarray_yuv444p(self):
    array = numpy.random.randint(0, 256, size=(3, 480, 640), dtype=numpy.uint8)
    frame = VideoFrame.from_ndarray(array, format="yuv444p")
    self.assertEqual(frame.width, 640)
    self.assertEqual(frame.height, 480)
    self.assertEqual(frame.format.name, "yuv444p")
    self.assertNdarraysEqual(frame.to_ndarray(), array)
```

=> AFAICT this would be the only format for which our ndarray has the channel along the first axis. For all the other formats it looks like (height, width, channel)?
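Presumably the yuvj444p variant would be tested the same way; a hypothetical companion test (my own sketch, mirroring the one above):

```python
def test_ndarray_yuvj444p(self):
    array = numpy.random.randint(0, 256, size=(3, 480, 640), dtype=numpy.uint8)
    frame = VideoFrame.from_ndarray(array, format="yuvj444p")
    self.assertEqual(frame.width, 640)
    self.assertEqual(frame.height, 480)
    self.assertEqual(frame.format.name, "yuvj444p")
    self.assertNdarraysEqual(frame.to_ndarray(), array)
```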
---

There is also yuv420p:

```python
>>> import av
>>> foo = av.VideoFrame(5*16, 5*16, "yuv420p")
>>> bar = foo.to_ndarray()
>>> bar.shape
(120, 80)
```

So I guess the answer is no, you can get layouts other than (height, width, channel). I wonder if subsampled formats like this should raise an error, promote to yuv444p, or return …
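For context on that (120, 80) shape: assuming to_ndarray flattens the three planes row-wise into a single width-80 array (an assumption consistent with the shape above, where 120 = 80 + 20 + 20), the planes can be recovered like this:

```python
import av
import numpy as np

w = h = 80
flat = av.VideoFrame(w, h, "yuv420p").to_ndarray()

# Y plane first at full resolution, then U and V (each 40x40) packed
# two chroma rows per output row of width 80.
y = flat[:h]
u = flat[h:h + h // 4].reshape(h // 2, w // 2)
v = flat[h + h // 4:].reshape(h // 2, w // 2)
assert y.shape == (80, 80)
assert u.shape == (40, 40) and v.shape == (40, 40)
```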
---

I have added tests based on @jlaine's comment above. Let me know if anything else is needed.
---

As stated, I'm still not comfortable with the array dimensions; this creates an element of surprise for the user: the other formats do (height, width, channels). Either convince me or change it :)
---

@jlaine What about …?
---

Those formats decimate the chrominance, so there is no way they can have a "normal" array, regardless of the order of dimensions :)
---

@jlaine What would be the difference between copying pixels here (channel-first to channel-last) and copying pixels for …?
---

For channels first, the benefit is faithfulness to the underlying pixel format. So, code which operates on the raw …

For channels last, the benefit is consistency with the other non-chroma-subsampled formats. So, if you just want to get/set pixels using …

Edit: Somehow I missed FirefoxMetzger's comment above when writing this response. Those arguments seem pretty good, so I suppose it's up to what jlaine thinks.
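A toy illustration of the two access patterns being weighed here (my own sketch; the shapes follow the test above):

```python
import numpy as np

h, w = 480, 640
chan_first = np.zeros((3, h, w), dtype=np.uint8)  # matches the planar layout
chan_last = np.zeros((h, w, 3), dtype=np.uint8)   # matches rgb24 & friends

# Channel-first: each plane is a contiguous 2-D slice, as in the raw frame.
y_plane = chan_first[0]
# Channel-last: per-pixel access yields one (Y, U, V) triple.
pixel = chan_last[10, 20]
assert y_plane.shape == (h, w)
assert pixel.shape == (3,)
```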
---

@NPN @FirefoxMetzger @jlaine I have merged these changes as-is in my fork.
---

Nice! @WyattBlue Are you planning on forking PyAV and maintaining it, or is this more of a "one-off" thing?
---

I definitely plan on maintaining it. I'm already using the fork for my project, auto-editor.
---

Co-authored-by: Jeremy Lainé <jeremy.laine@m4x.org>
---

Thanks @NPN, it's merged!