[help/feature] Streaming multi collection size serializer #2694
Comments
@Chuckame The way you would normally implement something like this in a format would be to use a specialised decoder for the collection. This decoder would then record the item counts for the blocks (and special case of the empty block). You can record these counts when the collection serializer requests this information from the format. |
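To illustrate the idea of a decoder that records per-block item counts and special-cases the empty block, here is a minimal stdlib-only model. It is a sketch, not kotlinx.serialization code: `BlockTracker`, `readCount` and `readAllItems` are hypothetical names, the input is assumed to be already decoded into longs, and Avro's negative block counts are deliberately rejected rather than handled.

```kotlin
// Mirrors the value of CompositeDecoder.DECODE_DONE in kotlinx.serialization.
const val DECODE_DONE = -1

/**
 * Hypothetical helper: tracks the position inside a block-encoded collection.
 * `readCount` stands in for however the real decoder reads a block header.
 */
class BlockTracker(private val readCount: () -> Long) {
    /** Item count of the most recently read block, or -1 before the first read. */
    var lastBlockCount = -1L
        private set

    private var remainingInBlock = 0L
    private var nextIndex = 0
    private var finished = false

    /** Returns the index of the next element, or DECODE_DONE after the empty block. */
    fun nextElementIndex(): Int {
        if (finished) return DECODE_DONE
        if (remainingInBlock == 0L) {
            // Current block exhausted: read the next block header.
            lastBlockCount = readCount()
            require(lastBlockCount >= 0) { "negative block counts are not handled in this sketch" }
            if (lastBlockCount == 0L) {
                finished = true
                return DECODE_DONE
            }
            remainingInBlock = lastBlockCount
        }
        remainingInBlock--
        return nextIndex++
    }
}

/** Drains values laid out as: count, items..., count, items..., 0. */
fun readAllItems(values: Iterator<Long>): List<Long> {
    val tracker = BlockTracker { values.next() }
    val items = mutableListOf<Long>()
    while (tracker.nextElementIndex() != DECODE_DONE) {
        items.add(values.next())
    }
    return items
}
```

In a real format the same state machine would presumably sit behind the decoder's `decodeElementIndex`, with `lastBlockCount` available as a capacity hint for each new block.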
Sorry @pdvrieze, I don't really understand. Do you have an example? Currently I'm using the code from my original post, where you can see that I'm overriding |
@Chuckame Looking back at your original post, I guess that the main challenge you have is that:
The issues you encounter are:
As to the solution: write the collection implementation of decodeSerializableValue so that it pretends to use a composite deserializer that flattens a collection of collections (you kind of have this already). You also need a way to detect the end of this list. To do so, introduce a new decoder (all the boring bits left out):
internal class ListSizeDecoder(val delegate: Decoder) : Decoder, CompositeDecoder {
    // Only the "interesting bits" are shown; everything else is delegated to `delegate`.
    var lastListSize = -1
    var compositeDelegate: CompositeDecoder? = null

    override fun beginStructure(descriptor: SerialDescriptor): CompositeDecoder {
        val composite = delegate.beginStructure(descriptor)
        compositeDelegate = composite // set it back to null in endStructure
        return composite
    }

    override fun decodeCollectionSize(descriptor: SerialDescriptor): Int {
        // Record the size reported by the underlying format before handing it on.
        lastListSize = checkNotNull(compositeDelegate).decodeCollectionSize(descriptor)
        return lastListSize
    }
}
Using this decoder as the first parameter when calling |
For primitives you may want to have a special case (use the |
After reading it several times, I still don't get how your example would read multiple blocks :/
I cannot, as it is protected. Also, readAll is not calling
By the way, after decompiling, I can see how a list item is deserialized, which looks like a very good entry point for this need:
@NotNull
public Clients deserialize(@NotNull Decoder decoder) {
    Intrinsics.checkNotNullParameter(decoder, "decoder");
    SerialDescriptor var2 = this.getDescriptor();
    boolean var3 = true;
    boolean var4 = false;
    int var5 = 0;
    List var6 = null;
    CompositeDecoder var7 = decoder.beginStructure(var2);
    KSerializer[] var8 = Clients.$childSerializers;
    if (var7.decodeSequentially()) {
        var6 = (List)var7.decodeSerializableElement(var2, 0, (DeserializationStrategy)var8[0], var6);
        var5 |= 1;
    } else {
        while(var3) {
            int var9 = var7.decodeElementIndex(var2);
            switch (var9) {
                case -1:
                    var3 = false;
                    break;
                case 0:
                    var6 = (List)var7.decodeSerializableElement(var2, 0, (DeserializationStrategy)var8[0], var6);
                    var5 |= 1;
                    break;
                default:
                    throw new UnknownFieldException(var9);
            }
        }
    }
    var7.endStructure(var2);
    return new Clients(var5, var6, (SerializationConstructorMarker)null);
}
|
Actually, for your example code, if you create the |
So I'll need 2 implementations:
Is that what you meant? I'll try it. |
Yes. You need to create a specific decoder for lists. Note also that this may work differently with beginStructure/endStructure, as you may have markers for regular structs that differ from those used for lists. I don't know the specifics of your data structure. |
Sorry for the bad title, it's quite difficult to sum up 😞
I need to implement array serialization for Avro, but it works differently from the usual encodings.
A collection (arrays & maps) is serialized as a series of blocks, where each block starts with the number of items in that block (a long in the Avro specification). A block with a count of 0 marks the end of the collection. Here is a more visual explanation:
So just one block would be serialized like this:
Encoding is not an issue as we can make chunks quite easily.
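To make the block layout concrete, here is a small stdlib-only sketch of the encoding side. It assumes Avro's zig-zag variable-length encoding for both block counts and items; `writeLong`, `writeIntArray` and the `blockSize` parameter are hypothetical names chosen for illustration, not part of any library.

```kotlin
import java.io.OutputStream

// Avro writes int/long values as zig-zag encoded variable-length integers.
fun writeLong(out: OutputStream, value: Long) {
    var n = (value shl 1) xor (value shr 63) // zig-zag: small magnitudes stay small
    while ((n and 0x7FL.inv()) != 0L) {
        out.write(((n and 0x7FL) or 0x80L).toInt()) // low 7 bits plus continuation bit
        n = n ushr 7
    }
    out.write(n.toInt())
}

// Writes the items as blocks of at most `blockSize` items, then the empty block.
fun writeIntArray(out: OutputStream, items: List<Int>, blockSize: Int) {
    require(blockSize > 0) { "blockSize must be positive" }
    for (block in items.chunked(blockSize)) {
        writeLong(out, block.size.toLong()) // block header: item count
        block.forEach { writeLong(out, it.toLong()) }
    }
    writeLong(out, 0L) // a count of zero terminates the collection
}
```

With these definitions, writing `listOf(1, 2, 3)` with a `blockSize` of 2 produces the bytes `04 02 04 02 06 00`: a block of two items, a block of one item, and the terminating empty block.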
But decoding is harder:
- With sequential decoding, we can only read 1 block and check that it is followed by a zero; we cannot decode multiple blocks, as sequential decoding relies on the decoded size only once.
- With decodeElementIndex, we can handle the blocks easily, but this is really inefficient, as the array or map is initialized with the minimal size and grows on each decoded element.
I also tried to change the behavior inside decodeSerializableValue, but T.collectionSize() is not accessible as it is protected. All the possible implementations of AbstractCollectionSerializer are also internal, so I cannot get at the real type, such as HashMap or ArrayList, to obtain the collection size.
Here is the "wanted" code:
Currently, I check the type of result to get its size properly, but this becomes hard to maintain:
Proposal / Ideas
- Make AbstractCollectionSerializer.collectionSize() public instead of protected (and maybe the other methods for API consistency)
- Allow subclassing AbstractCollectionSerializer to access the protected methods, to enable delegation
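For comparison, the behaviour this issue is after (reserve capacity once per block instead of growing on every element) can be sketched outside kotlinx.serialization. This is only an illustration under stated assumptions: `readVarLong` and `readIntArray` are hypothetical helpers, items are assumed to be Avro ints, and negative block counts (which Avro follows with a byte size) are rejected rather than handled.

```kotlin
import java.io.InputStream

// Reads one Avro-style zig-zag variable-length integer.
fun readVarLong(input: InputStream): Long {
    var raw = 0L
    var shift = 0
    while (true) {
        val b = input.read()
        require(b >= 0) { "unexpected end of stream" }
        raw = raw or ((b and 0x7F).toLong() shl shift)
        if ((b and 0x80) == 0) break // no continuation bit: last byte
        shift += 7
    }
    return (raw ushr 1) xor -(raw and 1L) // undo zig-zag
}

// Reads blocks until the empty block, reserving capacity once per block.
fun readIntArray(input: InputStream): List<Int> {
    val result = ArrayList<Int>()
    var count = readVarLong(input).toInt()
    while (count != 0) {
        require(count > 0) { "negative block counts are not handled in this sketch" }
        result.ensureCapacity(result.size + count) // one resize per block at most
        repeat(count) { result.add(readVarLong(input).toInt()) }
        count = readVarLong(input).toInt() // header of the next block
    }
    return result
}
```

Feeding it the bytes `04 02 04 02 06 00` (two blocks followed by the empty block) yields `[1, 2, 3]`. The open question in this issue is how to get the same per-block capacity hint through the collection serializers, which is what access to collectionSize() would enable.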