feat: New Avro entrypoint #186

Merged
merged 4 commits into from
Apr 22, 2024



@Chuckame Chuckame commented Feb 7, 2024

Here is the new proposed avro entrypoint 🥳

Goal

Provide a unified, easy and extensible way of encoding and decoding Avro. Unified and easy, as there should only be methods on the Avro class (no streams, no builders, ...). Extensible, as Avro can be used in multiple contexts, some of which require adding metadata (data files) or schema info (Confluent schema registry, single object).

Closes #163 #141

Requirements to merge:

  • Encode everything at root (not only records)
  • Encode natively to bytes

Breaking changes:

  • No more AvroInputStream and AvroOutputStream: they are replaced by the encodeSequenceXxx and decodeSequenceXxx methods, combined with AvroSerializationDelegate
  • No more encoding to and from Generic data: the final goal of this library is to encode and decode Avro, not Generic data. This is also another step toward removing the Java Avro library (no more Java types in the public API, except for the schemas)
  • encodeToByteArray and decodeFromByteArray are now pure Avro (just the encoded data itself, with no header or decorators), whereas before they worked in Data mode (the Avro data file format)
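
To make "pure avro" concrete, here is a minimal sketch of what the Avro specification's binary encoding produces for a root-level string: a zigzag varint length followed by the UTF-8 bytes, and nothing else. By contrast, an Avro data file starts with the magic bytes `Obj` and `0x01`, then metadata and a sync marker. The helper names `writeAvroLong` and `encodeAvroString` are hypothetical and only for illustration; this is not how the library implements it.

```kotlin
import java.io.ByteArrayOutputStream

// Hypothetical helper: zigzag-maps a signed long, then writes it as a
// little-endian base-128 varint, as the Avro spec describes for int and long.
fun writeAvroLong(out: ByteArrayOutputStream, value: Long) {
    var n = (value shl 1) xor (value shr 63)
    while ((n and 0x7FL.inv()) != 0L) {
        // More bytes follow: emit the low 7 bits with the continuation bit set.
        out.write(((n and 0x7FL) or 0x80L).toInt())
        n = n ushr 7
    }
    out.write(n.toInt())
}

// Hypothetical helper: an Avro string is its UTF-8 byte count (as a long)
// followed by the UTF-8 bytes. No header, no schema, no sync marker.
fun encodeAvroString(value: String): ByteArray {
    val utf8 = value.toByteArray(Charsets.UTF_8)
    val out = ByteArrayOutputStream()
    writeAvroLong(out, utf8.size.toLong())
    out.write(utf8)
    return out.toByteArray()
}
```

With this sketch, `encodeAvroString("foo")` yields the four bytes `06 66 6f 6f`, which matches the example given in the Avro specification.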

Tickets to open after that:

  • Create the necessary serializers to encode and decode Generic data (from bytes to generic, and from generic to bytes)

The methods

Each base method that takes a Schema and a SerializationStrategy or DeserializationStrategy has a matching extension method that automatically resolves the strategy and the schema from the reified type.

We now fully support Java streams and ByteArray.

We are now able to stream Avro contents with the new encodeSequence and decodeSequence methods, which is useful for reading Avro files, for example!
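
As a rough illustration of the streaming idea (not the library's decodeSequence implementation), the sketch below lazily pulls Avro-encoded long values from a Java InputStream through a Kotlin Sequence, so values are only read when the consumer asks for them. The names `readAvroLongOrNull` and `decodeLongSequence` are hypothetical.

```kotlin
import java.io.InputStream

// Hypothetical helper: reads one zigzag varint long as defined by the Avro
// spec, or returns null when the stream is already at its end.
fun readAvroLongOrNull(input: InputStream): Long? {
    var byte = input.read()
    if (byte < 0) return null
    var raw = 0L
    var shift = 0
    while (true) {
        raw = raw or ((byte and 0x7F).toLong() shl shift)
        if ((byte and 0x80) == 0) break // continuation bit not set: last byte
        shift += 7
        byte = input.read()
        require(byte >= 0) { "Stream ended inside a varint" }
    }
    // Undo the zigzag mapping to recover the signed value.
    return (raw ushr 1) xor (-(raw and 1L))
}

// Hypothetical helper: a lazy, single-pass sequence over the stream.
// Nothing is read until the sequence is iterated.
fun decodeLongSequence(input: InputStream): Sequence<Long> =
    generateSequence { readAvroLongOrNull(input) }
```

Because the sequence is backed by the stream, it can only be iterated once, and the caller stays responsible for closing the stream after consuming it.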


Example of usage

  • Just encode a string payload
val bytes = Avro.encodeToByteArray("a value to be encoded")
val value = Avro.decodeFromByteArray<String>(bytes)
  • Just encode a record payload
val bytes = Avro.encodeToByteArray(ARecord(stringField = "something", intField = 42))
val value = Avro.decodeFromByteArray<ARecord>(bytes)
  • Encode to the avro data file format
val avro = Avro {
    serializationDelegate = Builtins.DataFile
}
File("file.avro").outputStream().use { outputStream ->
    AvroObjectContainerFile().encodeToStream(sequenceOf("a", "b", "c"), outputStream)
}
File("file.avro").inputStream().use { inputStream ->
    val sequence = avro.decodeFromStream<String>(inputStream)
}
  • Just generate a schema from a type:
val schema = Avro.schema<MySerializedType>()

@Chuckame Chuckame linked an issue Feb 7, 2024 that may be closed by this pull request
@jangalinski

Seems like you are aiming for static access, while in the current version there is a default instance (Avro.default) to use, but also the option to create a custom instance. This is quite important when working with custom logical types. Am I missing something, or would we lose the option of configuring a custom instance?

@Chuckame
Contributor Author

> Seems like you are aiming for static access, while in the current version there is a default instance (Avro.default) to use, but also the option to create a custom instance. This is quite important when working with custom logical types. Am I missing something, or would we lose the option of configuring a custom instance?

To encode in single object mode, you cannot use the default Avro instance; you need to create your own. Avro {...} is the new constructor, working exactly the same way as Json from the official kotlinx format.

So to sum up, technically it's exactly the same as before: you can use the default instance Avro (previously Avro.default), or you can create and customize your own instance with Avro {...}, which is the same as Avro({...}), where before it was Avro(AvroConfiguration(...)). The difference is that we now have a more Kotlin-esque API.
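
For readers unfamiliar with the pattern, here is a minimal sketch of how a class name can serve both as the default instance and as a builder-style constructor, in the same shape as kotlinx.serialization's Json. Every name here (`Format`, `FormatBuilder`, `FormatConfiguration`, the `strict` option) is hypothetical and does not reflect the library's actual source.

```kotlin
// Hypothetical configuration with a single illustrative option.
data class FormatConfiguration(val strict: Boolean = false)

// Mutable builder exposed as the receiver of the `Format { ... }` lambda.
class FormatBuilder internal constructor(from: FormatConfiguration) {
    var strict: Boolean = from.strict
    internal fun build() = FormatConfiguration(strict)
}

sealed class Format(val configuration: FormatConfiguration) {
    // The companion object extends the class itself, so the bare name
    // `Format` can be used wherever a default instance is expected.
    companion object Default : Format(FormatConfiguration())
}

private class FormatImpl(configuration: FormatConfiguration) : Format(configuration)

// A top-level function named like the class makes `Format { ... }` read as a
// constructor call; `from` allows deriving a new instance from an existing one.
fun Format(from: Format = Format.Default, builderAction: FormatBuilder.() -> Unit): Format {
    val builder = FormatBuilder(from.configuration)
    builder.builderAction()
    return FormatImpl(builder.build())
}
```

With this shape, `Format.configuration` reads the default settings, while `Format { strict = true }` returns a separate customized instance and leaves the default untouched.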

@Chuckame Chuckame force-pushed the avro-api branch 2 times, most recently from 95ba93d to 030c6d6 Compare April 11, 2024 12:27
@Chuckame Chuckame force-pushed the avro-api branch 4 times, most recently from dbc3be9 to 7485020 Compare April 22, 2024 23:16
@Chuckame Chuckame marked this pull request as ready for review April 22, 2024 23:22
@Chuckame Chuckame merged commit 85dc5c5 into avro-kotlin:main-v2 Apr 22, 2024
1 check passed
@Chuckame Chuckame deleted the avro-api branch April 22, 2024 23:57
@Chuckame Chuckame linked an issue Jul 15, 2024 that may be closed by this pull request
Development

Successfully merging this pull request may close these issues.

  • Beautify the Avro api entrypoint
  • Support avro single object encoding