
Allow Pyav objects to be serialized by pickle #652

Closed
vtexier opened this issue May 1, 2020 · 9 comments

@vtexier

vtexier commented May 1, 2020

Overview

Make it possible to serialize/deserialize PyAV objects with pickle.

Pickle raises an error, complaining that the __reduce__ magic method is not implemented.

pickle.dumps(packet)
  File "stringsource", line 2, in av.packet.Packet.__reduce_cython__
TypeError: no default __reduce__ due to non-trivial __cinit__

Desired Behavior

PyAV objects should be serializable and deserializable with pickle.

Example API

import pickle

# pickle an av.packet.Packet instance
pickled = pickle.dumps(mypacket)

# get the packet instance back
unpickled = pickle.loads(pickled)

To achieve this, it seems that Cython has provided a decorator since version 0.26:

@cython.auto_pickle(True)

http://blog.behnel.de/posts/whats-new-in-cython-026.html

https://stackoverflow.com/questions/12646436/pickle-cython-class

Additional context

I am trying to use PyAV on Apache Spark to distribute encoding across servers.

PySpark (the Python API for Spark) uses pickle to transfer code and object instances to the servers.

Making PyAV classes picklable would therefore allow PyAV jobs to be distributed across a farm of servers with PySpark or similar tools.

@mikeboers
Member

I'm not convinced this is a good idea, because there is a lot of internal state we will never be able to pickle.

We could likely pickle Packet and Frame, accepting that all links to the streams/codecs/contexts they came from would break, and I'm not sure I like that.

It is easy enough to convert a packet to bytes and back.
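For reference, a minimal sketch of that bytes-and-back route. It assumes PyAV's buffer export (`bytes(packet)`), the `av.Packet(data)` constructor, and settable `pts`, `dts` and `time_base`; `PacketPayload`, `packet_to_payload` and `payload_to_packet` are hypothetical helpers, not part of PyAV. As noted above, only the compressed data and timestamps survive the trip: the rebuilt packet is not attached to any stream.

```python
import pickle
from dataclasses import dataclass
from fractions import Fraction
from typing import Optional


@dataclass
class PacketPayload:
    """Hypothetical plain-data snapshot of a packet; pickles like any dataclass."""
    data: bytes
    pts: Optional[int] = None
    dts: Optional[int] = None
    time_base: Optional[Fraction] = None


def packet_to_payload(packet) -> PacketPayload:
    # bytes(packet) copies the compressed payload out of the FFmpeg-owned buffer;
    # the stream, codec and container links are deliberately left behind.
    return PacketPayload(bytes(packet), packet.pts, packet.dts, packet.time_base)


def payload_to_packet(payload: PacketPayload):
    # Imported lazily so this module (and pickled payloads) can be loaded without PyAV.
    import av

    packet = av.Packet(payload.data)  # assumption: constructor accepts a bytes buffer
    if payload.pts is not None:
        packet.pts = payload.pts
    if payload.dts is not None:
        packet.dts = payload.dts
    if payload.time_base is not None:
        packet.time_base = payload.time_base
    return packet
```

On the sending side one would call `pickle.dumps(packet_to_payload(packet))` instead of `pickle.dumps(packet)`; the receiving side unpickles the payload and calls `payload_to_packet` before handing the result to a decoder.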

@vtexier
Author

vtexier commented May 1, 2020

I am new to PyAV, so maybe there is something I am missing here...

I don't know how to distribute the CPU-intensive functions across servers without passing object instances.

As I understand it, a PySpark map function running on each frame of the dataset needs to call some PyAV functions.

Distributed frame encoding requires 1 function and 2 classes:

  • Stream.encode(Frame)

Distributed frame filtering through a Graph requires 3 functions and 3 classes:

  • Graph.push(Frame)
  • Graph.pull()
  • Stream.encode(Frame)

Since the Graph can be created inside the map function, only the data passed to it (Frame and Stream instances) needs to be picklable.

Maybe implementing __reduce__ could help pickle preserve the tricky but essential state of these 2 classes.

I'd be glad to spend plenty of time testing and researching this, to enable PyAV to be used in a fully distributed map function.

@mikeboers
Member

You can't distribute the encoding of most codecs this way. Each frame depends on the previous, and often the future, frames.

It may be that they depend on the previous and future raw frames rather than the encoded ones, but I don't think FFmpeg will be easy to trick into doing that.

I'd be happy to be proved wrong though.

@jlaine
Member

jlaine commented May 1, 2020

I'm going to side with @mikeboers here: PyAV's objects are deeply bound to the FFmpeg libraries through pointers into FFmpeg's data structures. As for distributed encoding, this would have to be supported by FFmpeg itself; you can't just tack it on from the outside.

@mikeboers
Member

It could be possible, but a quick pickle of the PyAV objects won't do it.

@jlaine
Member

jlaine commented May 2, 2020

I really have no plans to do this, so either the poster is planning to work on a PR or this issue should be closed as won't fix.

@koenvo

koenvo commented May 2, 2020

If you group the packets per GOP, decoding should be possible, right?

I believe I have a working PoC somewhere that splits reading and decoding into completely separate processes, but the extradata needs to be shared. If you like, I can search for the code.
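A rough sketch of the per-GOP grouping suggested above. It assumes each packet exposes PyAV's `is_keyframe` flag and that GOPs are closed (no frame references the previous GOP); `split_into_gops` is a hypothetical helper. It does not address the shared extradata, which each worker would still need in order to configure its decoder.

```python
from typing import Iterable, List, Protocol, TypeVar


class HasKeyframeFlag(Protocol):
    is_keyframe: bool


P = TypeVar("P", bound=HasKeyframeFlag)


def split_into_gops(packets: Iterable[P]) -> List[List[P]]:
    """Group a demuxed packet sequence so that every group starts at a keyframe.

    Assumes closed GOPs: with open GOPs (e.g. leading B-frames that reference
    the previous GOP) a group may not decode on its own.
    """
    gops: List[List[P]] = []
    for packet in packets:
        if packet.is_keyframe:
            gops.append([packet])  # a keyframe opens a new group
        elif gops:
            gops[-1].append(packet)
        # packets before the first keyframe cannot be decoded alone, so they are skipped
    return gops
```

With PyAV the input would typically come from `container.demux(stream)`, skipping the empty flushing packets it yields at the end (`if packet.dts is None: continue`).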

@vtexier
Author

vtexier commented May 2, 2020

@jlaine Having no experience with C bindings, I cannot contribute a C-side solution, and doing clean distributed work with FFmpeg seems non-trivial, so you can close this as won't fix.

@koenvo Thanks! I would be very happy to get any help distributing encoding across servers/processes with PyAV. I suggest we discuss it in a new issue.

@jlaine
Member

jlaine commented Feb 26, 2021

Closing as "won't fix" as suggested by @vtexier

@jlaine closed this as completed on Feb 26, 2021
@WyattBlue closed this as not planned on Jan 16, 2024