v2.0.0-RC2

Pre-release
@github-actions github-actions released this 29 May 19:54
2040d25

Introduction of v2

Avro4k was created back in 2019. Over the past 5 years, a lot of great work has gone into Avro generic records and schema generation.

Recently, kotlinx-serialization and Kotlin had big releases that improved a lot of things (features, performance, better APIs). The JSON API of kotlinx-serialization is a great API, so we tried to replicate its simplicity.

A big focus has been put on making Avro4k more lenient, to simplify developers' lives and improve adoption.

I hope this major release will make Avro easier to use, even more so in pure Kotlin 🚀

As a side note, we may implement our own plugins to generate data classes and schemas. Stay tuned!

Highlights and Breaking changes

Needs Kotlin 2.0.0 and kotlinx.serialization 1.7.0-RC

You need at least Kotlin 2.0.0 and kotlinx.serialization 1.7.0-RC to use Avro4k v2 (the version matrix is in the README), as there are breaking changes in the kotlinx-serialization plugin and library (released in tandem with the Kotlin version).

More information here: kotlinx-serialization v1.7.0-RC

ExperimentalSerializationApi

Since the API changed deeply, all the new functions, properties, classes and annotations that are annotated with ExperimentalSerializationApi will show a warning, as they could change at any moment. These members will be un-annotated after a few releases once they have proven their stability 🪨

To suppress this warning, you may opt in to the experimental serialization API. It is advised not to opt in globally through the compiler arguments, to avoid surprises when using experimental stuff 😅
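A local opt-in could look like the sketch below. It assumes the avro4k 2.x artifact is on the classpath and that the reified encodeToByteArray extension lives in the com.github.avrokotlin.avro4k package; the name field of TheDataClass is hypothetical and only there to make the example complete.

```kotlin
import com.github.avrokotlin.avro4k.*
import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.Serializable

// Hypothetical data class, used for illustration only.
@Serializable
data class TheDataClass(val name: String)

// The opt-in is scoped to this single function, so experimental calls made
// anywhere else in the code base still raise the warning.
@OptIn(ExperimentalSerializationApi::class)
fun encode(value: TheDataClass): ByteArray = Avro.encodeToByteArray(value)
```

Scoping the annotation to a function or a class keeps every use of the experimental API visible in the code, which is the point of not opting in through the compiler arguments.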

Direct binary serialization

Previously, serializing Avro with Avro4k went through a generic step: the data classes were first converted to generic maps, and this generic data was then passed to the Apache Avro library.

Now, encoding to and decoding from binary is done directly, which improves performance a lot (see the Performances & benchmark section).

Note

We still support generic data serialization until there is a solution for Kafka schema registry serialization (a future avro4k module to be created), but it will be removed in the future to simplify the avro4k library, as it is not really serialization but rather a conversion.

Support anything to encode and decode at root level

Now, there is no need to wrap your value in a record: you can serialize nearly anything and generate the corresponding schema!

This includes any data class, enum, sealed interface, value class, primitive or contextual value 🚀
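As a rough illustration, root-level values might be handled as in the sketch below. It assumes the avro4k 2.x artifact is on the classpath and that Avro.schema<T>() is the schema-generation call (these notes do not show it, so treat that name as an assumption). Suit and UserId are hypothetical types.

```kotlin
import com.github.avrokotlin.avro4k.*
import kotlinx.serialization.Serializable

// Hypothetical types, used for illustration only.
@Serializable
enum class Suit { HEARTS, SPADES }

@JvmInline
@Serializable
value class UserId(val value: String)

fun main() {
    // A primitive at the root: no wrapping record needed.
    val intBytes = Avro.encodeToByteArray(42)
    val decodedInt = Avro.decodeFromByteArray<Int>(intBytes)

    // An enum and a value class at the root.
    val suitBytes = Avro.encodeToByteArray(Suit.SPADES)
    val decodedSuit = Avro.decodeFromByteArray<Suit>(suitBytes)
    val idBytes = Avro.encodeToByteArray(UserId("abc"))
    val decodedId = Avro.decodeFromByteArray<UserId>(idBytes)

    println(listOf(decodedInt, decodedSuit, decodedId))

    // Assumed API: generate the schema matching a root-level type.
    println(Avro.schema<Suit>())
}
```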

Totally new API

The previous API required a good understanding to be used correctly, especially when working with InputStream and OutputStream.

There are now different entrypoints for different purposes:

  • Avro: the main entrypoint to generate schemas, encode and decode avro format. This is the pure raw avro format without anything else
  • AvroObjectContainerFile: the entrypoint to encode avro data files, following the official spec, and using Avro for each value serialization.
  • AvroSingleObject: the entrypoint for encoding a single object prefixed with the schema fingerprint, following the official spec, and also using Avro for value serialization.

Here are some examples of the changes:

Pure avro serialization (no specific format, no prefix, no magic byte, just pure avro binary)
// Previously
val bytes = Avro.default.encodeToByteArray(TheDataClass.serializer(), TheDataClass(...))
Avro.default.decodeFromByteArray(TheDataClass.serializer(), bytes)

// Now
val bytes = Avro.encodeToByteArray(TheDataClass(...))
Avro.decodeFromByteArray<TheDataClass>(bytes)

Generic data serialization (convert a kotlin data class to a GenericRecord to then be handled by a `GenericDatumWriter` in avro)
// Previously
val genericRecord: GenericRecord = Avro.default.toRecord(TheDataClass.serializer(), TheDataClass(...))
Avro.default.fromRecord(TheDataClass.serializer(), genericRecord)

// Now
val genericData: Any? = Avro.encodeToGenericData(TheDataClass(...))
Avro.decodeFromGenericData<TheDataClass>(genericData)

Configure the `Avro` instance
// Previously
val avro = Avro(
    AvroConfiguration(
        namingStrategy = SnakeCaseNamingStrategy,
        implicitNulls = true,
    ),
    SerializersModule {
         contextual(CustomSerializer())
    }
)

// Now
val avro = Avro {
    fieldNamingStrategy = FieldNamingStrategy.SnakeCase
    implicitNulls = true
    serializersModule = SerializersModule {
         contextual(CustomSerializer())
    }
}

Changing the name of a record
// Previously
@AvroName("TheName")
@AvroNamespace("a.custom.namespace")
data class TheDataClass(...)

// Now
@SerialName("a.custom.namespace.TheName")
data class TheDataClass(...)

Writing an avro object container file with a custom field naming strategy
// Previously
Files.newOutputStream(Path("/your/file.avro")).use { outputStream ->
    Avro(AvroConfiguration(namingStrategy = SnakeCaseNamingStrategy))
        .openOutputStream(TheDataClass.serializer()) { encodeFormat = AvroEncodeFormat.Data(CodecFactory.snappyCodec()) }
        .to(outputStream)
        .write(TheDataClass(...))
        .write(TheDataClass(...))
        .write(TheDataClass(...))
        .close()
}


// Now
val dataSequence = sequenceOf(
    TheDataClass(...),
    TheDataClass(...),
    TheDataClass(...),
)
val avro = Avro { fieldNamingStrategy = FieldNamingStrategy.SnakeCase }
Files.newOutputStream(Path("/your/file.avro")).use { outputStream ->
    AvroObjectContainerFile(avro)
        .encodeToStream(dataSequence, outputStream) {
            codec(CodecFactory.snappyCodec())
            // you can also add your metadata !
            metadata("myProp", 1234L)
            metadata("a string metadata", "hello")
        }
}

Warning

Migration guide: WIP

Implicit nulls by default

Previously, decoding failed when no value was present for a nullable field.

Now, null is decoded instead and decoding does not fail. To opt out of this feature, configure your Avro instance with implicitNulls = false.

It is enabled by default to simplify the use of Avro4k and make it more lenient.

Lenient

The Apache Avro library is strict about types and strongly follows the Avro spec. For example, a float in Kotlin can be written and read as a float or a double in Avro.

Avro4k pushes lenience further: a float can be written and read as a float, a double, a string, an int or a long in Avro.

A type matrix is available in the README.

No more reflection

Thanks to this little change, there is absolutely no more reflection, which allows using Android or GraalVM AOT native compilation (requires kotlinx-serialization 1.7.0).

Unified & cleaned annotations

Some numbers: 4 annotations out of 12 have been removed!

  • AvroJsonProp has been merged into AvroProp: the JSON content is automatically detected, so any non-JSON content is handled as a string
  • AvroAliases has been merged into AvroAlias: it now takes a vararg, so you can pass as many aliases as you want with the same annotation
  • AvroInline has been removed in favor of Kotlin's native value classes
  • AvroEnumDefault is now to be applied directly on the default enum member
  • ScalePrecision has been renamed to AvroDecimal to keep a common prefix
  • AvroNamespace and AvroName have been replaced by the native kotlinx-serialization SerialName annotation
  • AvroNamespaceOverride has been created to allow replacing the namespace of a field schema (⚠️ this annotation is not stable and can disappear at any moment)

Caching

All schemas are cached using a WeakIdentityHashMap, which allows the GC to remove the cache entries in case of low available memory.

Some other expensive internal parts are also cached for quicker encoding and decoding.
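To illustrate the general idea of a weakly-keyed cache, here is a minimal sketch that uses only the JDK. It is not Avro4k's implementation: it swaps in java.util.WeakHashMap for the WeakIdentityHashMap mentioned above, and Descriptor and SchemaCache are hypothetical names. The sketch is not thread-safe.

```kotlin
import java.util.WeakHashMap

// Hypothetical stand-in for a serial descriptor; not an Avro4k type.
class Descriptor(val name: String)

// Hypothetical cache. WeakHashMap holds its keys weakly, so an entry becomes
// collectable once nothing else references its key. It compares keys with
// equals(), but Descriptor does not override equals(), so lookups are by
// identity here as well.
class SchemaCache(private val compute: (Descriptor) -> String) {
    private val cache = WeakHashMap<Descriptor, String>()

    // Counts how many times the expensive computation actually ran.
    var computations = 0
        private set

    fun schemaFor(descriptor: Descriptor): String =
        cache.getOrPut(descriptor) {
            computations++
            compute(descriptor)
        }
}

fun main() {
    val cache = SchemaCache { """{"type":"record","name":"${it.name}"}""" }
    val descriptor = Descriptor("TheDataClass")
    cache.schemaFor(descriptor)
    cache.schemaFor(descriptor) // served from the cache, no recomputation
    println(cache.computations) // prints 1
}
```

The cached value must not hold a strong reference to its key, otherwise the entry can never be collected; here the value is a plain string built from the key's name.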

Performances & benchmark

Warning

WIP

What's Changed

  • fix: Assume kotlin.Pair as a normal data class instead of an union by @Chuckame in #174
  • feat!: No more reflection and customizable logical types by @Chuckame in #175
  • feat: Add support for decoding with avro aliases by @Chuckame in #177
  • Generalize encoding/decoding tests (#168) by @Chuckame in #179
  • chore: Add spotless with ktlint + editorconfig by @Chuckame in #180
  • feat: Support kotlin's value classes by @Chuckame in #183
  • feat: Revamp naming strategy and related annotations by @Chuckame in #182
  • feat: Merge ScalePrecision to AvroDecimalLogicalType by @Chuckame in #191
  • chore: Upgrade github actions and use standard gradle actions by @Chuckame in #192
  • feat: revamp the schema generation by @Chuckame in #190
  • feat: New Avro entrypoint by @Chuckame in #186
  • feat: Support everything at root level by @Chuckame in #202
  • feat!: Set @AvroEnumDefault directly to the enum value instead of the class by @Chuckame in #203
  • feat!: Merge AvroJsonProp to AvroProp by @Chuckame in #204
  • build: Explicit API mode to prevent exposing internal stuff by @Chuckame in #205
  • Union perf improvement by @Chuckame in #208
  • deps: Upgrade kotlinx-serialization and kotlin by @Chuckame in #209
  • docs: Improve documentation by @Chuckame in #210
  • feat!: No more kotlin-reflect for logical types by @Chuckame in #214
  • Direct encoding by @Chuckame in #215
  • feat: Allow generating a release on a non-main branch by @Chuckame in #217

Full Changelog: v1.10.1...v2.0.0-RC2