Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create fast path for returning YUVPlanes to numpy arrays #1393

Closed
wants to merge 2 commits into from

Conversation

hmaarrfk
Copy link
Contributor

@hmaarrfk hmaarrfk commented May 4, 2024

No description provided.

@hmaarrfk hmaarrfk marked this pull request as ready for review May 4, 2024 14:18
@hmaarrfk
Copy link
Contributor Author

hmaarrfk commented May 4, 2024

Thank you for considering.

@WyattBlue
Copy link
Member

The problem with this patch is that when the fast path is used, it disrupts the structure of the underlying pixel format. Any code that relies on that behavior is going to break. This was discussed in #788

@hmaarrfk
Copy link
Contributor Author

hmaarrfk commented May 5, 2024

Yes. I remember now. Ultimately this function doesn't give that guarantee. It al4eady changes between a view and a copy.

Would you consider the addition of a keyword argument to specify the users intention?

@WyattBlue
Copy link
Member

Any libraries that uses on to_ndarray() for VideoFrames probably without realizing it, reply on the underlying data format to be preserved. auto-editor, for example, depends on frame.to_ndarray().to_bytes() to return the underlying bytes so the ffmpeg cli can treat them as raw frames. This allows for efficient rendering videos.

What is the point of to_ndarray() being fast if it returns data in an undefined layout? Do you have a particular use case?

@hmaarrfk
Copy link
Contributor Author

hmaarrfk commented May 6, 2024

Any libraries that uses on to_ndarray() for VideoFrames probably without realizing it, reply on the underlying data format to be preserved.

The data "format" hasn't changed.

What has changed is the fact that VideoFrame and the returned numpy array share the same memory. If you think this is an issue, i'm happy to add different function, but the in its current form on main to_ndarray does the following

  • yuv420p yuvj420p -- data copy
  • yuyv422 -- data copy
  • gbrp10be", "gbrp12be", "gbrp14be", "gbrp16be", "gbrp10le", "gbrp12le", "gbrp14le", "gbrp16le" -- data copy
  • "gbrpf32be", "gbrpf32le" -- data copy
  • "rgb24", "bgr24" -- data view
  • "gray", "gray8", "rgb8", "bgr8" -- data view

so rgb24 and gray formats are returning views. I presume it was because it was easier to guarantee correctness then.

I'm asking here that yuv420p and yuvj420p return "maybe views" if the underlying image data is well organized (according to the checks in the if statement).

auto-editor, for example, depends on frame.to_ndarray().to_bytes() to return the underlying bytes so the ffmpeg cli can treat them as raw frames.

This should still work. I can add a test for this usecase.

What is the point of to_ndarray() being fast if it returns data in an undefined layout?

I utilize the well defined layout of the YUV420 format in FFMPEG to avoid the memory copy.

For a 8x4 array of pixels, YUV420 and YUV420J would store:

yyyyyyyy
yyyyyyyy
yyyyyyyy
yyyyyyyy
uuuuuuuu
vvvvvvvv

From wikipedia: https://en.wikipedia.org/wiki/Chroma_subsampling
image

So the "fastpath" checks that the memory is all contiguous, and simply returns a pointer to the underlying data from the FFMPEG VideoFrame object.

Avoiding this copy is pretty important since for 4k (or 8k! which I am interested in) video decoding:

3840 x 2160 x 1.5 = 12MB

per frame.

Copying this around unnecessarily creates REAL problems with your CPU cache and just slows things down alot.

As for my usecase, it revolves around 4k and 8k video encoding / decoding. And as such benefits from what appears to be micro optimizations.

I can add a test if you want, but generally speaking, I remember that without my check, something already failed. I recall it being due to some odd image shape in the testing suite which was interesting to learn about.

@WyattBlue
Copy link
Member

It applying the fastpath does break auto-editor though:
mpv-shot0001

It wouldn't be an issue as long as the fastpath is not applied by default. I'm sympathetic to your use-case. I do think having a new default argument is a little janky, with the **kwargs already there.

I think the way forward is make a new method to_ndarray2(). It would handle the yuv420p yuvj420p like your code does now. Making it a new method has the benefit of side-stepping any backwards compatibility problems.

@hmaarrfk

This comment was marked as off-topic.

@hmaarrfk
Copy link
Contributor Author

hmaarrfk commented May 7, 2024

It applying the fastpath does break auto-editor though:

This is really strange....

I find on place where you use the "naked" to_ndarray
https://github.com/WyattBlue/auto-editor/blob/8f11ab54fade4fa5ab4cd7cb43ae54a4212dbff3/auto_editor/analyze.py#L392

but nothing jumps to me as getting mutated.

(In fact you could save yourself some time by doing current_frame = frame.to_ndarray().astype('int16') but that is beside the point)

I would love to recreate that screenshot to see what might be going on?

I have a feeling that maybe I'm not adding enough safeguards on the fastpath.

Can you teach me how you got to your screenshot?

@hmaarrfk
Copy link
Contributor Author

hmaarrfk commented May 7, 2024

@hmaarrfk
Copy link
Contributor Author

hmaarrfk commented May 7, 2024

I suspect the reason that I don't see this bug it is because you are making use of elaborate filters and using filter graphs which is not what workflow i go through.

I added two checks for data continuity:

                y_plane._buffer_ptr + y_plane.buffer_size * bytes_per_pixel == u_plane._buffer_ptr and
                u_plane._buffer_ptr + u_plane.buffer_size * bytes_per_pixel == v_plane._buffer_ptr

I think they should resolve things, but I need to check that it is still "correct" in the sense that the fast path is still triggered.

@hmaarrfk hmaarrfk marked this pull request as draft May 7, 2024 12:04
@hmaarrfk
Copy link
Contributor Author

hmaarrfk commented May 7, 2024

I need to recheck my buffer continuity check, it doesn't seem to be triggering "true" on what I thought it should.

@hmaarrfk
Copy link
Contributor Author

hmaarrfk commented May 7, 2024

Hmmm, it seems that the array isn't guaranteed to be contiguous, even in the "nice" cases

ipdb> u_plane.buffer_ptr - (y_plane.buffer_ptr + y_plane.buffer_size)
4864
ipdb> v_plane.buffer_ptr - (u_plane.buffer_ptr + u_plane.buffer_size)
1728

how is any of my code working.....

@hmaarrfk
Copy link
Contributor Author

hmaarrfk commented May 7, 2024

it seems that maybe the function i'm looking for is:
get_buffer2 that would allow me to create the buffer that I need to ensure that memory isn't copied.
https://ffmpeg.org/doxygen/6.0/structAVCodecContext.html#aef79333a4c6abf1628c55d75ec82bede

@hmaarrfk
Copy link
Contributor Author

hmaarrfk commented May 7, 2024

Thank you for your thoroughness in testing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants