Apache Spark and Confluent Avro/Schema Registry Integration POC

A repo to hold the work done for the spark confluent avro/schema registry integration POC.

Confluent Avro Background

Avro is an open source data serialization system for data exchange between systems.
Avro helps you define a compact binary format for your data.

Confluent Schema Registry

Confluent Schema Registry provides a serving layer for your metadata (schemas). It provides a RESTful interface for storing and retrieving your Avro, JSON Schema, and Protobuf schemas.

Avro Wire Format

Bytes	Area	Description
0	Magic Byte	Confluent serialization format version number; currently always 0.
1-4	Schema ID	4-byte schema ID as returned by the Schema Registry
5-…	Data	Avro serialized data in Avro’s binary encoding. The only exception is raw bytes, which will be written directly without any special Avro encoding.

Advantages of Confluent Avro

It has a direct mapping to and from JSON.
Very compact format compared bulky JSON serialization where you're repeating every field name with every single record.
Serde is very fast.
Polyglot bindings to many programming languages (although mainly used with Java)
Rich extensible schema language defined in pure JSON.
Schema evolution (The ability to update the schema used to write new data, while maintaining backwards compatibility with the schema of your old data so they can be treated as one dataset)

Spark Confluent Avro Integration

Kafka Message (Confluent Avro) -> Spark (Dataframe)

Spark (Dataframe) -> Kafka Message (Confluent Avro)

This integration is provided on data bricks platform but you have to pay for the databricks platform to use it (paid service) and cannot make changes to suit use cases.

ABRiS - Avro Bridge for Spark

An open sourced library that integrates Spark with Confluent Avro and vice versa. Seems to be the only lib providing this. We use this it in this poc for the integration.

https://github.com/AbsaOSS/ABRiS

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
src/main/scala/poc		src/main/scala/poc
LICENSE		LICENSE
README.md		README.md
build.sbt		build.sbt
docker-compose.yaml		docker-compose.yaml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Apache Spark and Confluent Avro/Schema Registry Integration POC

Confluent Avro Background

Confluent Schema Registry

Avro Wire Format

Advantages of Confluent Avro

Spark Confluent Avro Integration

ABRiS - Avro Bridge for Spark

About

Releases

Packages

Languages

License

jhsb25/spark-confluent-avro-poc

Folders and files

Latest commit

History

Repository files navigation

Apache Spark and Confluent Avro/Schema Registry Integration POC

Confluent Avro Background

Confluent Schema Registry

Avro Wire Format

Advantages of Confluent Avro

Spark Confluent Avro Integration

ABRiS - Avro Bridge for Spark

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages