Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Share PyArrow table data with the server through ByteBuffer #4936

Merged
merged 7 commits into from
Jan 10, 2024

Conversation

jmao-denver
Copy link
Contributor

@jmao-denver jmao-denver commented Dec 11, 2023

Fixes #4935
Fixes #3709

depends on jpy-consortium/jpy#112
depends on #5022

@jmao-denver jmao-denver added this to the December 2023 milestone Dec 11, 2023
@jmao-denver jmao-denver self-assigned this Dec 11, 2023
@jmao-denver jmao-denver changed the title Use ByteBuffer to receive data from Python Share PyArrow table data with the server through ByteBuffer Dec 13, 2023
chipkent
chipkent previously approved these changes Dec 18, 2023
@@ -70,7 +69,7 @@ private static BarrageProtoUtil.MessageInfo parseArrowIpcMessage(final byte[] ip
}

@ScriptApi
public synchronized void setSchema(final byte[] ipcMessage) {
public synchronized void setSchema(final ByteBuffer ipcMessage) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I think we need to add a big warning here and link to jpy-consortium/jpy#126; no strong references to this buffer can exist by the end of this method. I think we need to extend this warning into the implementation of parseSchema as well - it can't hold any references to Schema header (since that has references to the byte buffer).

@@ -192,7 +198,7 @@ protected BarrageMessage createBarrageMessage(BarrageProtoUtil.MessageInfo mi, i
return msg;
}

private BarrageProtoUtil.MessageInfo getMessageInfo(byte[] ipcMessage) {
private BarrageProtoUtil.MessageInfo getMessageInfo(ByteBuffer ipcMessage) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flatbuffer almost certainly retains references to ByteBuffer. We need to be very careful that nothing retains this flatbuffer object. I think we need to add a big warning in io.deephaven.extensions.barrage.util.ArrowToTableConverter#createBarrageMessage that nothing can retain reference to BarrageProtoUtil.MessageInfo mi

@devinrsmith
Copy link
Member

I think we need to carefully audit all of the parsing logic here to make sure no ByteBuffer / Flatbuffer object is retained. It's not obvious to me without digging in further to the implementations, which is why trying to use jpy.byte_buffer as it exists today is so dangerous. I think @nbauernfeind to review with this context in mind and go down through the implementations to double check.

chipkent
chipkent previously approved these changes Jan 10, 2024
Copy link
Member

@chipkent chipkent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK with me if all of @devinrsmith 's suggestions have been addressed. My comments are optional.

nbauernfeind
nbauernfeind previously approved these changes Jan 10, 2024
Copy link
Member

@nbauernfeind nbauernfeind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dug in a bit. All of the flatbuffer usages that could reference the byte buffer (such as for strings) perform copies. (In particular the internal flatbuffer __string method copies during decoding). The extractChunkFromInputStream implementations similarly will copy via new String(buf, offset, length) invocations. Similarly any arrays extract a copy from the contiguous body payload rather than an effective slice.

AFAICT this change looks safe.

Copy link
Member

@devinrsmith devinrsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add warning comment inside of parseSchema and createBarrageMessage as well.

@devinrsmith devinrsmith self-requested a review January 10, 2024 21:52
devinrsmith
devinrsmith previously approved these changes Jan 10, 2024
@jmao-denver jmao-denver merged commit 4e783a1 into deephaven:main Jan 10, 2024
19 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Jan 10, 2024
@jmao-denver jmao-denver deleted the 4935-arrow-use-bytebuffer branch March 4, 2024 15:21
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
4 participants