Use out-of-band buffers for Python buffer protocol supporting objects #365

Open
jakirkham opened this issue May 7, 2020 · 10 comments

@jakirkham
Member

For objects supporting Python's buffer protocol (like memoryview), it would be great to have a path where we use Pickle's protocol 5 for serialization of out-of-band buffers. This should avoid some copies when pickling/unpickling and allow more efficient transmission.
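For context, a minimal sketch of what out-of-band buffers look like with the standard library alone, assuming Python 3.8+ (where protocol 5 and `pickle.PickleBuffer` are available). The variable names are illustrative only and this is not cloudpickle code:

```python
import pickle

# A writable bytes-like object standing in for a large payload.
data = bytearray(b"x" * 1024)

# With protocol 5, a PickleBuffer can be handed to buffer_callback
# instead of being copied into the pickle stream.
buffers = []
payload = pickle.dumps(
    pickle.PickleBuffer(data),
    protocol=5,
    buffer_callback=buffers.append,  # returns None, so the buffer goes out-of-band
)

# The consumer must supply the same buffers, in the same order, when loading.
restored = pickle.loads(payload, buffers=buffers)

# The restored object supports the buffer protocol, so wrap it to read it.
restored_bytes = bytes(memoryview(restored))
```

Here `payload` carries only a small marker for the buffer, and the 1024 bytes travel separately in `buffers`, which is where the copy savings would come from.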

@pierreglaser
Member

Sure. Is there any reason why CPython does not have a memoryview reducer?

@jakirkham
Member Author

That's a good question. I don't know. @pitrou, do you know? 🙂

@pitrou
Member

pitrou commented May 8, 2020

Because it's not obvious how to serialize a memoryview.

@jakirkham
Member Author

Meaning we don't know what memory to use for backing it?

@pitrou
Member

pitrou commented May 8, 2020

Yes, exactly. One way of punting the question is to decide to deserialize a memoryview by recreating a bytes object with similar content. But what if your memoryview was backed by e.g. a mmap'ed file?

@jakirkham
Member Author

Yeah that makes sense. I think in cloudpickle we have chosen to just use bytearray, but it makes sense this would be ambiguous generally and may not make sense outside of cloudpickle.
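To illustrate the bytearray fallback mentioned above, here is a rough sketch using only the standard library. It is not cloudpickle's actual implementation; `reduce_memoryview_as_bytearray` and `MemoryviewPickler` are hypothetical names chosen for this example:

```python
import copyreg
import io
import pickle


def reduce_memoryview_as_bytearray(mv):
    # Hypothetical reducer: copies the viewed bytes, so any link to the
    # original backing memory (e.g. an mmap'ed file) is lost on unpickling.
    return bytearray, (mv.tobytes(),)


class MemoryviewPickler(pickle.Pickler):
    # Per-class dispatch table, extended with a reducer for memoryview.
    dispatch_table = copyreg.dispatch_table.copy()
    dispatch_table[memoryview] = reduce_memoryview_as_bytearray


# Plain pickle refuses memoryview objects.
try:
    pickle.dumps(memoryview(b"abc"))
    plain_pickle_failed = False
except TypeError:
    plain_pickle_failed = True

# The custom pickler round-trips the content as a bytearray.
stream = io.BytesIO()
MemoryviewPickler(stream).dump(memoryview(b"abc"))
restored = pickle.loads(stream.getvalue())
```

The result is a `bytearray` regardless of what originally backed the view, which is exactly the ambiguity raised in the previous comment.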

What about other objects like bytes, bytearray, array, etc. where we do know how to allocate for them? Does it make sense for CPython to support out-of-band buffer serialization for them or are there some issues there as well?

@pitrou
Member

pitrou commented May 8, 2020

It would certainly make sense indeed. But it's not necessarily trivial and someone has to do the work ;-)

(note that a memoryview may have been sliced or cast... of course, you could choose to forbid those cases, but you would still have to detect them)
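One possible way to detect those cases, sketched here as a heuristic rather than a complete check: compare the view's format, shape, and strides against a fresh view of its underlying exporter (`mv.obj`). The helper name `looks_unmodified` is hypothetical:

```python
def looks_unmodified(mv):
    """Heuristic: True if mv matches a fresh, full view of its exporter."""
    if mv.obj is None:
        # No exporter to compare against, so be conservative.
        return False
    with memoryview(mv.obj) as base:
        return (
            mv.format == base.format      # differs after cast() to another format
            and mv.shape == base.shape    # differs after slicing or reshaping
            and mv.strides == base.strides  # differs after stepped slicing
        )


data = bytearray(b"0123456789")
full_view_ok = looks_unmodified(memoryview(data))
sliced_ok = looks_unmodified(memoryview(data)[2:])
stepped_ok = looks_unmodified(memoryview(data)[::2])
cast_ok = looks_unmodified(memoryview(data).cast("b"))
```

A view that fails this check could either be rejected or serialized by copying, depending on which trade-off is acceptable.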

@jakirkham
Member Author

What challenges do you envision in implementing out-of-band serialization for those other objects?

(yeah that's fair. there is a lot the current implementation brushes under the rug ;)

@jakirkham
Member Author

FWIW opened upstream issue ( https://bugs.python.org/issue40718 ) on serializing builtin bytes-like types.

@jakirkham
Member Author

The one trick for performing out-of-band pickling of memoryviews is that we would need access to the protocol version (as is the case with __reduce_ex__). A function won't work, as we can't pass more arguments to the function when using the dispatch_table. Maybe we can work around this constraint by using a method and accessing the protocol version via class state. Thoughts?
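One way this could look, as a sketch only: a `pickle.Pickler` subclass that records the protocol on the instance and consults it from the `reducer_override` hook (Python 3.8+). `BufferAwarePickler` and `_protocol` are hypothetical names, and rebuilding as a `bytearray` is an assumption carried over from the earlier discussion, not a settled design:

```python
import io
import pickle


class BufferAwarePickler(pickle.Pickler):
    def __init__(self, file, protocol=None, **kwargs):
        super().__init__(file, protocol, **kwargs)
        # Remember the effective protocol so reducers can consult it.
        if protocol is None:
            protocol = pickle.DEFAULT_PROTOCOL
        elif protocol < 0:
            protocol = pickle.HIGHEST_PROTOCOL
        self._protocol = protocol

    def reducer_override(self, obj):
        if type(obj) is memoryview:
            if self._protocol >= 5:
                # PickleBuffer lets the data go out-of-band when the
                # pickler was given a buffer_callback.
                return bytearray, (pickle.PickleBuffer(obj),)
            # Older protocols: fall back to an in-band copy.
            return bytearray, (obj.tobytes(),)
        # Defer to the regular machinery for everything else.
        return NotImplemented


# Protocol 5 with out-of-band buffers.
buffers = []
stream5 = io.BytesIO()
BufferAwarePickler(stream5, protocol=5, buffer_callback=buffers.append).dump(
    memoryview(b"hello")
)
restored5 = pickle.loads(stream5.getvalue(), buffers=buffers)

# Protocol 4 falls back to copying the bytes in-band.
stream4 = io.BytesIO()
BufferAwarePickler(stream4, protocol=4).dump(memoryview(b"hello"))
restored4 = pickle.loads(stream4.getvalue())
```

Keeping the protocol on the pickler instance avoids needing extra arguments in a `dispatch_table` function, at the cost of requiring a `Pickler` subclass.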


3 participants