-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-41728:[Java][Vector] Add ExtensionWriter and fix StructVector construction with ExtensionTypes #41731
base: main
Are you sure you want to change the base?
Conversation
|
import org.apache.arrow.vector.ExtensionTypeVector; | ||
import org.apache.arrow.vector.types.pojo.Field; | ||
|
||
public class UnionExtensionWriter extends AbstractFieldWriter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class is currently more skeleton like as the concrete details of the ExtensionWriter
interface will need to be nailed down.
934466e
to
b4bda84
Compare
@@ -61,20 +61,25 @@ public interface StructWriter extends BaseWriter { | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where some design decisions have to be made of how to actually expose the ExtensionWriter
to the user and how the user should actually extend some ExtensionWriter
.
As mentioned above I would make this one interface BaswWriter
exposing all the possible writers and leaving them abstract in the concrete implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds reasonable to me.
@@ -342,6 +347,34 @@ public void set(int index, UUID uuid) { | |||
bb.putLong(uuid.getLeastSignificantBits()); | |||
getUnderlyingVector().set(index, bb.array()); | |||
} | |||
|
|||
@Override | |||
public TransferPair makeTransferPair(ValueVector to) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This TransferPair
is likely a very common pattern for extension types, so one could create a concrete ExtensionWriter
which actually implements this and something like UuidWriter
could inherit from.
We do something similar for our writers: https://github.com/xtdb/xtdb/blob/e9a15ee12244367c1d3d1a75c00e4f5c0e10008f/core/src/main/kotlin/xtdb/vector/extensions/XtExtensionVector.kt#L30-L40
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to be fine, let's get this one working and further evaluate.
b4bda84
to
f4f0e4d
Compare
@FiV0 I will take a look at this PR later today or earlier tomorrow. |
f4f0e4d
to
7cb7fd5
Compare
@@ -85,21 +88,21 @@ public void roundtripUuid() throws IOException { | |||
final ArrowFileReader reader = new ArrowFileReader(channel, allocator)) { | |||
reader.loadNextBatch(); | |||
final VectorSchemaRoot readerRoot = reader.getVectorSchemaRoot(); | |||
Assert.assertEquals(root.getSchema(), readerRoot.getSchema()); | |||
assertEquals(root.getSchema(), readerRoot.getSchema()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a thought: since we are updating test cases, shall we use JUnit5? juputer.api.
?
@@ -17,12 +17,17 @@ | |||
|
|||
package org.apache.arrow.vector; | |||
|
|||
import static org.junit.Assert.*; | |||
import static org.junit.Assert.assertEquals; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we use JUnit5?
final StructVector toVector = (StructVector) tp.getTo(); | ||
assertEquals(s1.getField(), toVector.getField()); | ||
assertEquals(s1.getField().getChildren().get(0), toVector.getField().getChildren().get(0)); | ||
// also fails but probably another issue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I also debugged the code. It seems the copy didn't work properly. I guess you're still not finished with this work, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The overall idea looks fine to me, but need to run another pass.
And the added test case seems to need some work as we don't get the expected results from the transferPair AFAIU.
I think the main part of moving this forward is the design work on the writer or more specifically
I think all of this should have a clear answer before moving forward with this proposal. Maybe have a look at what other language implementations do in this case. |
They don't really have answers for this. Does it work to just ask users to build the underlying type directly for now? |
@lidavidm Ok, I will then create a Test for |
Thanks @FiV0! |
|
Rationale for this change
For me personally I mainly want to fix apache/arrow-java#87, but this change also includes some larger changes to the writer interface. This will likely include some design work and some back and forth as I can't make these decisions.
What changes are included in this PR?
A commit that extends the writer interfaces with an
ExtensionWriter
as well as dealing with theMinorType.EXTENSIONTYPE
in various places. TheExtensionWriter
was left rather skeleton like as this needs some design decisions to be made before implementing the concrete details. I will leave some comments throughout the code of where I think stuff needs to be decided.This next bit are my 2cents (hence taken with a grain of salt) of how I would do things as I have the impression that one will eventually hit a wall with the current writer interfaces. For example the
PromotableWriter
currently makes the assumption that the parent vector is a struct or a list vector, making it difficult to extend this to composite extension types.I would just create one single writer interface (
VectorWriter
or whatever you want to call) implemented by all concrete writers. If a writer doesn't support an operation just keep it abstract or throw an UOE.Are these changes tested?
There are two tests that test the issue of apache/arrow-java#87. I have not tested (because of the lack of a concrete implementation) the
ExtensionWriter
interface.Are there any user-facing changes?
Yes.
ExtensionWriter
will be exposed for the first time. This is only an additive change and shouldn't break any existing API.