v2.0.0
Introduction of v2
Avro4k was created back in 2019. Over the following 5 years, a lot of work went into Avro generic records and schema generation.
Recently, kotlinx-serialization and Kotlin shipped major releases that improved a lot of things (features, performance, better APIs). The JSON API of kotlinx-serialization is a great one, so we tried to replicate its simplicity.
A big focus has been put on making Avro4k more lenient, to simplify developers' lives and improve adoption.
I hope this major release will make Avro easier to use, especially in pure Kotlin 🚀
As a side note, we may implement our own plugins to generate data classes and schemas, stay tuned!
Highlights and Breaking changes
Performances & benchmark
Long story
Well... Building a like-for-like benchmark is complicated, as v2 adds a lot of features and fixes compared to v1. The following benchmark is not fully representative, as it does not compare all the features.
We will compare an easy use case: encoding and decoding a simple data class with all the primitive types, a String and a ByteArray:
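```kotlin
import kotlinx.serialization.Serializable

// The data class encoded and decoded by every benchmark run.
@Serializable
data class SimpleDataClass(
    val bool: Boolean,
    val byte: Byte,
    val short: Short,
    val int: Int,
    val long: Long,
    val float: Float,
    val double: Double,
    val string: String,
    val bytes: ByteArray,
)
```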
The benchmark has been executed on a MacBook Air M2 in a single-threaded environment.
Avro4k v2 (binary) is MUCH faster than v1 (generic records), and is now also faster than Jackson and the standard Apache Avro library (using reflection). It has not been tested against SpecificRecord yet.
Encoding Performance
| Implementation | Encoding (ops/s) | Relative difference vs. Avro4k v1 |
|---|---|---|
| Avro4k v1 (generic records) | 109 327 | 0% |
| Jackson | 134 774 | +23% |
| Avro4k v2 (generic records) | 190 365 | +74% |
| Apache avro ReflectData (direct binary) | 332 438 | +204% |
| Avro4k v2 (direct binary) | 459 751 | +321% 🚀 |
Decoding Performance
| Implementation | Decoding (ops/s) | Relative difference vs. Avro4k v1 |
|---|---|---|
| Avro4k v1 (generic records) | 67 825 | 0% |
| Jackson | 71 146 | +5% |
| Avro4k v2 (generic records) | 114 511 | +69% |
| Apache avro ReflectData (direct binary) | 151 287 | +123% |
| Avro4k v2 (direct binary) | 174 063 | +157% 🚀 |
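For reference, the relative differences in both tables are computed against the Avro4k v1 (generic records) row of the same table. A minimal sketch of that arithmetic, using the encoding figures (the helper name `relativeDifferencePercent` is only illustrative):

```kotlin
import kotlin.math.roundToInt

// Relative difference of a throughput against a baseline, rounded to a whole percent.
fun relativeDifferencePercent(opsPerSecond: Int, baselineOpsPerSecond: Int): Int =
    ((opsPerSecond - baselineOpsPerSecond) * 100.0 / baselineOpsPerSecond).roundToInt()

fun main() {
    val encodingBaseline = 109_327 // Avro4k v1 (generic records)
    val encodingResults = listOf(
        "Jackson" to 134_774,
        "Avro4k v2 (generic records)" to 190_365,
        "Apache avro ReflectData (direct binary)" to 332_438,
        "Avro4k v2 (direct binary)" to 459_751,
    )
    for ((name, ops) in encodingResults) {
        println("$name: +${relativeDifferencePercent(ops, encodingBaseline)}%")
    }
}
```

The same function applied to the decoding figures, with 67 825 ops/s as the baseline, gives the percentages of the decoding table.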
Migration guide
As a lot of APIs, classes, packages, and more have changed, here is the migration guide. Don't hesitate to file an issue if something is missing!
Needs Kotlin 2.0.0 and kotlinx.serialization 1.7.0
You need at least Kotlin 2.0.0 and kotlinx.serialization 1.7.0 to use Avro4k v2.0.0+ (the version matrix is indicated in the README), as there are breaking changes in the kotlinx-serialization plugin and library (released in tandem with the Kotlin version).
More information here: kotlinx-serialization v1.7.0
ExperimentalSerializationApi
Since the API changed deeply, all the new functions, properties, classes, and annotations that are annotated with `ExperimentalSerializationApi` will show a warning, as they could change at any moment. These annotated members will be un-annotated after a few releases if they prove stable 🪨
You may see a lot of `ExperimentalSerializationApi` warnings, as everything has been reworked. The common APIs may stabilize more quickly, so they could be un-annotated in the next minor release. The more complex or less used APIs could be un-annotated later.
To suppress this warning, you may opt in to the experimental serialization API. It is advised not to opt in globally through the compiler arguments, to avoid surprises when using experimental stuff 😅
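A local opt-in could look like the sketch below. `@OptIn` and `ExperimentalSerializationApi` come from Kotlin and kotlinx-serialization; which Avro4k members actually carry the annotation is not listed here, so the `Avro.schema` call only stands in for "some experimental API", and `Project` is a made-up class.

```kotlin
import com.github.avrokotlin.avro4k.*
import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.Serializable

// Hypothetical class, only here to have something to generate a schema for.
@Serializable
data class Project(val name: String, val language: String)

// Opt in at the narrowest scope that uses the experimental API,
// rather than through a global -opt-in compiler argument.
@OptIn(ExperimentalSerializationApi::class)
fun projectSchemaJson(): String = Avro.schema<Project>().toString()

fun main() {
    println(projectSchemaJson())
}
```

Keeping the opt-in on individual functions or classes makes every use of an experimental member visible when it later changes or disappears.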
Warning
Any removal of an API annotated with `ExperimentalSerializationApi` won't be considered a breaking change with regard to the semver standard, so given a version `A.B.C`, only the minor `B` number will be incremented, not the major `A`.
Direct binary serialization
Before, serializing Avro with Avro4k went through a generic step: the data classes were first converted to generic maps, and this generic data was then passed to the Apache Avro library.
Now, encoding to and decoding from binary is done directly, which improved performance a lot (see the Performances & benchmark section).
Note
We still support generic data serialization as long as there is no dedicated solution for Kafka schema registry serialization (a future avro4k module to be created), but it may be removed in the future to simplify the avro4k library, as it is less a serialization than a conversion.
Support anything to encode and decode at root level
Before, we were only able to encode and decode `GenericRecord`. No primitives, no arrays, no value classes, just generic records.
Now, there is no need to wrap your value in a record: you can serialize nearly everything and generate the corresponding schema!
This includes any data class, enum, sealed interface or class, value class, primitive value, or contextual serializer 🚀
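As an illustration of root-level values, here is a minimal sketch that round-trips a bare list of strings without any wrapping record. It assumes the reified `Avro.schema`, `Avro.encodeToByteArray` and `Avro.decodeFromByteArray` extensions as shown in the README; the helper name `roundTripNames` is made up.

```kotlin
import com.github.avrokotlin.avro4k.*

// Encodes a bare list at the root, then decodes it back.
fun roundTripNames(names: List<String>): List<String> {
    val bytes = Avro.encodeToByteArray(names)
    return Avro.decodeFromByteArray<List<String>>(bytes)
}

fun main() {
    // The generated schema is expected to be a plain array of strings, not a record.
    println(Avro.schema<List<String>>())

    val names = listOf("avro4k", "kotlinx.serialization")
    println(roundTripNames(names) == names)
}
```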
Totally new API
The previous API required a good understanding of how to use it, especially when playing with InputStream and OutputStream.
There are now different entrypoints for different purposes:
- `Avro`: the main entrypoint to generate schemas, encode and decode in the Avro format. This is the pure raw Avro format without anything else around it.
- `AvroObjectContainer`: the entrypoint to encode Avro data files, following the official spec, and using `Avro` for each value serialization.
- `AvroSingleObject`: the entrypoint for encoding a single object prefixed with the schema fingerprint, following the official spec, and also using `Avro` for value serialization.
Warning
`Avro.encodeToByteArray` now encodes in pure binary Avro. If you still need to encode in the object container format as in v1 (the DATA format), you have to use `AvroObjectContainer`.
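To make the difference between the two formats concrete, here is a hedged sketch that encodes the same value with `Avro` and with `AvroObjectContainer`. The `encodeToStream` and `decodeFromStream` calls are an assumption based on the v2.0.0 README and may differ in later versions; `Project` and the helper names are made up.

```kotlin
import com.github.avrokotlin.avro4k.*
import kotlinx.serialization.Serializable
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream

@Serializable
data class Project(val name: String, val language: String)

// Pure binary Avro: only the encoded value, no header and no embedded schema.
fun encodeRaw(project: Project): ByteArray = Avro.encodeToByteArray(project)

// Object container format: a header carrying the writer schema, followed by the values.
// Assumption: encodeToStream(values, outputStream) as documented for v2.0.0.
fun encodeContainer(projects: List<Project>): ByteArray {
    val output = ByteArrayOutputStream()
    AvroObjectContainer.encodeToStream(projects.asSequence(), output)
    return output.toByteArray()
}

// Assumption: decodeFromStream<T>(inputStream) returns a Sequence<T>.
fun decodeContainer(bytes: ByteArray): List<Project> =
    AvroObjectContainer.decodeFromStream<Project>(ByteArrayInputStream(bytes)).toList()

fun main() {
    val project = Project("avro4k", "Kotlin")
    println("raw: ${encodeRaw(project).size} bytes")
    println("container: ${encodeContainer(listOf(project)).size} bytes")
    println(decodeContainer(encodeContainer(listOf(project))))
}
```

Per the Avro spec, a container file starts with the magic bytes `O`, `b`, `j`, `1`, so the two outputs are not interchangeable: a v1 consumer expecting the container format cannot read the output of `Avro.encodeToByteArray`.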
Implicit nulls by default
Previously, when a nullable field was missing from the writer schema while decoding, a failure happened.
Now, it decodes `null` for all the nullable fields instead of failing. To opt out of this feature, configure your `Avro` instance with `implicitNulls = false`.
It is enabled by default to simplify the use of Avro4k and make it more lenient, for better adoption.
Implicit empty maps, collections and arrays by default
Previously, when a map or collection-like field was missing from the writer schema while decoding, a failure happened.
Now, it decodes an empty collection instead of failing (an empty map, list, array or set depending on the field type). To opt out of this feature, configure your `Avro` instance with `implicitEmptyCollections = false`.
It is enabled by default to simplify the use of Avro4k and make it more lenient, for better adoption.
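The two implicit behaviours can be sketched together as follows. The classes are hypothetical, and the `decodeFromByteArray(writerSchema, bytes)` overload is assumed from the README; both classes share the same `SerialName` so that the reader and writer record names match.

```kotlin
import com.github.avrokotlin.avro4k.*
import kotlinx.serialization.SerialName
import kotlinx.serialization.Serializable

// Writer side: an older version of the record, with a single field.
@Serializable
@SerialName("Person")
data class PersonV1(val name: String)

// Reader side: two extra fields that the writer schema does not contain.
@Serializable
@SerialName("Person")
data class PersonV2(val name: String, val nickname: String?, val tags: List<String>)

// Encodes with the old shape, then decodes with the new one using the writer schema.
fun decodeOldPayload(avro: Avro): PersonV2 {
    val writerSchema = avro.schema<PersonV1>()
    val bytes = avro.encodeToByteArray(PersonV1("Ada"))
    return avro.decodeFromByteArray<PersonV2>(writerSchema, bytes)
}

fun main() {
    // Defaults: nickname should decode as null and tags as an empty list.
    println(decodeOldPayload(Avro))

    // Strict instance: the same decoding is expected to fail.
    val strictAvro = Avro {
        implicitNulls = false
        implicitEmptyCollections = false
    }
    println(runCatching { decodeOldPayload(strictAvro) }.isFailure)
}
```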
Lenient
The Apache Avro library is strict regarding types and strictly follows the Avro spec. As an example, a Kotlin float can be written as a float, and decoded as a float or a double.
Avro4k pushes the leniency further: a float can be written and read as a float, a double, a string, an int or a long in Avro.
A type matrix is available in the README.
No more reflection
Thanks to this little change, there is absolutely no more reflection, which allows you to use Android or GraalVM AOT native compilation (not tested, but it should work, let us know!).
Unified & cleaned annotations
- `AvroJsonProp` has been merged into `AvroProp`: the JSON content is automatically detected, so any non-JSON content is handled as a string
- `AvroAliases` has been merged into `AvroAlias`: there is now a `vararg` to pass as many aliases as you want using the same annotation
- `AvroInline` has been removed in favor of Kotlin's native `value class`
- `AvroEnumDefault` is now to be applied directly on the default enum member
- `ScalePrecision` has been renamed to `AvroDecimal` to unify annotations under a common prefix. Also, the `decimal`'s `scale` and `precision` do not have defaults anymore
- `AvroNamespace` and `AvroName` have been replaced by the native kotlinx-serialization `SerialName` annotation
- `AvroStringable` has been added to easily force a field type to be inferred as a string (this works for all the primitive types and the built-in logical types)
- `AvroFixed` now only applies to compatible types (ByteArray, String, decimal logical type); annotating other types does nothing
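Put together, a migrated model could look like the sketch below. All class, field and alias names are hypothetical, and the annotation parameters (`scale`, `precision`, key/value for `AvroProp`) are assumed from the README rather than from this page.

```kotlin
import com.github.avrokotlin.avro4k.*
import kotlinx.serialization.Contextual
import kotlinx.serialization.SerialName
import kotlinx.serialization.Serializable
import java.math.BigDecimal

@Serializable
enum class Status {
    ACTIVE,

    @AvroEnumDefault // now set on the default member itself
    UNKNOWN,
}

// A value class replaces the removed AvroInline.
@JvmInline
@Serializable
value class OrderId(val value: String)

@Serializable
@SerialName("com.example.Order") // replaces AvroNamespace + AvroName
data class Order(
    val id: OrderId,
    @AvroAlias("oldTotal", "legacyTotal") // vararg: several aliases in one annotation
    @AvroDecimal(scale = 2, precision = 10) // both values are now mandatory
    @Contextual
    val total: BigDecimal,
    @AvroStringable // the Int is expected to be inferred as an Avro string
    val quantity: Int,
    @AvroProp("owner", "billing-team") // non-JSON content, so handled as a string
    val status: Status,
)

fun main() {
    println(Avro.schema<Order>().toString(true))
}
```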
Only ByteArray is now handled as BYTES
Previously, all collection-like types of bytes were handled as BYTES.
Now, only `ByteArray` is handled as BYTES, and the other collection-like types of bytes are handled as arrays of INT. If you still want to encode a BYTES type, just use the `ByteArray` type or write your own `AvroSerializer` to control the schema and its serialization.
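A quick way to check which Avro type a field gets is to generate the schema. A minimal sketch with a hypothetical `Payload` class:

```kotlin
import com.github.avrokotlin.avro4k.*
import kotlinx.serialization.Serializable

@Serializable
class Payload(
    val raw: ByteArray,     // expected Avro type: bytes
    val legacy: List<Byte>, // expected Avro type: array of int
)

fun main() {
    // Pretty-prints the generated schema, showing the two different field types.
    println(Avro.schema<Payload>().toString(true))
}
```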
Improved custom serializers API
The custom serializer API, `AvroSerializer`, has been improved to force custom encodings to provide their own schema (previously this was optional and covered only a subset of the use cases).
It also provides two additional methods, `serializeGeneric` and `deserializeGeneric`, to allow the custom serializer to be used by other non-Avro formats. That way, the same classes and serializers can now be used for both Avro and JSON formats 🚀
Finally, `AvroEncoder` now provides `encodeResolving` and `AvroDecoder` provides `decodeResolving` to delegate the possible `union` resolution and focus the custom serialization on the main types.
It has been made public as it is heavily used internally, and it provides a clean and performant way to handle unions thanks to inlined functions. Note that it is still experimental and could change in the future.
So for any custom serialization, schema, or logical type, you must implement your own `AvroSerializer`.
Caching
All schemas are cached using `WeakIdentityHashMap`, to allow the GC to remove cache entries when available memory is low.
Also, some other expensive internal parts are cached for quicker encoding and decoding.
Normally we could use a `WeakHashMap`, but we cannot rely on equals/hashCode, as different classes could have the same serial descriptor.
It should not happen, but let's be safe first and then iterate on it if needed 🛡
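The difference between equality-based and identity-based keys can be illustrated with the JDK's own maps. This is not Avro4k's cache (it leaves weak references out entirely, and `FakeDescriptor` is a made-up stand-in for a serial descriptor); it only shows why two equal keys collide in one kind of map and not in the other.

```kotlin
import java.util.IdentityHashMap

// Hypothetical stand-in for a serial descriptor: equal content, distinct instances.
data class FakeDescriptor(val serialName: String)

// Returns the number of entries in an equality-based map and in an identity-based map
// after inserting two equal but distinct keys.
fun cacheSizes(): Pair<Int, Int> {
    val first = FakeDescriptor("com.example.User")
    val second = FakeDescriptor("com.example.User") // equals(first), but another object

    val byEquality = HashMap<FakeDescriptor, String>()
    val byIdentity = IdentityHashMap<FakeDescriptor, String>()
    for ((key, schema) in listOf(first to "schema A", second to "schema B")) {
        byEquality[key] = schema // the second put overwrites the first entry
        byIdentity[key] = schema // each instance keeps its own entry
    }
    return byEquality.size to byIdentity.size
}

fun main() {
    println(cacheSizes()) // (1, 2)
}
```

With equality-based keys, the second descriptor would silently reuse (or replace) the schema cached for the first one, which is the situation the identity-based cache avoids.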
New logical type: duration
Following the new Avro specs, the logical type `duration` has been added to the built-in logical types. It has been implemented for the following types:
- `kotlin.time.Duration` (do not annotate it with `@Serializable` or `@Contextual`, as it is a native kotlinx-serialization type)
- A new avro4k class `AvroDuration` (same)
- `java.time.Duration` (this time it needs to be annotated with `@Serializable(with = JavaDurationSerializer::class)` or `@Contextual`)
- `java.time.Period` (this time it needs to be annotated with `@Serializable(with = JavaPeriodSerializer::class)` or `@Contextual`)
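A short sketch of the annotation rules above, with a hypothetical `Job` class. The import path of `JavaDurationSerializer` is an assumption (the `serializer` sub-package); adjust it to wherever the class lives in your version.

```kotlin
import com.github.avrokotlin.avro4k.*
import com.github.avrokotlin.avro4k.serializer.JavaDurationSerializer // assumed package
import kotlinx.serialization.Serializable
import kotlin.time.Duration

@Serializable
data class Job(
    // kotlin.time.Duration: no annotation needed.
    val timeout: Duration,
    // java.time.Duration: needs an explicit serializer (or @Contextual).
    @Serializable(with = JavaDurationSerializer::class)
    val retention: java.time.Duration,
)

fun main() {
    // Both fields are expected to carry the "duration" logical type.
    println(Avro.schema<Job>().toString(true))
}
```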
Better documentation
Last but not least, all the documentation has been reworked from scratch to fit all that new stuff! 📚
What's Changed
- fix: Assume kotlin.Pair as a normal data class instead of an union by @Chuckame in #174
- feat!: No more reflection and customizable logical types by @Chuckame in #175
- feat: Add support for decoding with avro aliases by @Chuckame in #177
- Generalize encoding/decoding tests (#168) by @Chuckame in #179
- chore: Add spotless with ktlint + editorconfig by @Chuckame in #180
- feat: Support kotlin's value classes by @Chuckame in #183
- feat: Revamp naming strategy and related annotations by @Chuckame in #182
- feat: Merge ScalePrecision to AvroDecimalLogicalType by @Chuckame in #191
- chore: Upgrade github actions and use standard gradle actions by @Chuckame in #192
- feat: revamp the schema generation by @Chuckame in #190
- feat: New Avro entrypoint by @Chuckame in #186
- feat: Support everything at root level by @Chuckame in #202
- feat!: Set @AvroEnumDefault directly to the enum value instead of the class by @Chuckame in #203
- feat!: Merge AvroJsonProp to AvroProp by @Chuckame in #204
- build: Explicit API mode to prevent exposing internal stuff by @Chuckame in #205
- Union perf improvement by @Chuckame in #208
- deps: Upgrade kotlinx-serialization and kotlin by @Chuckame in #209
- docs: Improve documentation by @Chuckame in #210
- feat!: No more kotlin-reflect for logical types by @Chuckame in #214
- Direct encoding by @Chuckame in #215
- feat: Allow generating a release on a non-main branch by @Chuckame in #217
- feat: Allow adding props to a given type using value classes by @Chuckame in #219
- deps: Use non-RC version of kotlinx-serialization by @Chuckame in #221
- deps: Upgrade plugins, trying to fix publication failing by @Chuckame in #222
- fix: Removed AvroNamespaceOverride as it was not fully implemented by @Chuckame in #224
- Improve benchmark by @Chuckame in #225
- Add dependabot by @Chuckame in #227
- docs: Fix avro version in docs by @Chuckame in #226
- handle nullable bytearrays and add null values in benchmark by @Chuckame in #228
- feat: Add duration logical type by @Chuckame in #233
- fix: Only handle ByteArrays as bytes or fixed, and collection of Byte as arrays of int by @Chuckame in #234
- fix: No more automatic padding for fixed type by @Chuckame in #235
- fix: Update docs by @Chuckame in #237
- feat: Add @AvroStringable by @Chuckame in #236
- feat: Remove AvroDecimal defaults by @Chuckame in #238
- feat: Implicit empty collections by @Chuckame in #239
- Release v2 by @Chuckame in #240
Full Changelog: v1.10.1...v2.0.0