Kafi[^1] is a Python library for anybody working with Kafka (or any solution based on the Kafka API). It is your Swiss army knife for Kafka. It has already been presented at Current 2023 and Current 2024 (you can find the Jupyter notebook here).
Kafi supports two main modes:
- Real Kafka
  - Kafka API via confluent_kafka
  - Kafka REST Proxy API
- Emulated Kafka/files
  - local file system
  - S3
  - Azure Blob Storage
Emulated Kafka is useful e.g. for debugging, as there is no need to run an additional Kafka cluster. It can also be used to download snapshots of Kafka topics or to create backups.
Kafi also fully supports the Schema Registry API, including Avro, Protobuf and JSONSchema.
Kafi is fun to use either in the interactive Python interpreter (where it acts a bit like a shell) or inside your Python (micro-)service code - and it is the ideal tool for Kafka in your Jupyter notebooks :-)
This "README" is split into a basic part:
...and a more detailed part:
- Full Configuration
- More on Producing Messages
- More on Consuming Messages
- Architecture
- Kafka Emulation
- All Methods
Kafi is on PyPI. Hence:
pip install kafi
Kafi is configured using YAML files. As an example, here is a YAML file for a local Kafka installation, including Schema Registry:
kafka:
  bootstrap.servers: localhost:9092
schema_registry:
  schema.registry.url: http://localhost:8081
And this is a YAML file for a local emulated Kafka in the /tmp directory:
local:
  root.dir: /tmp
Kafi looks for these YAML files in:

1. the local directory (`.`) or the directory set in `KAFI_HOME` (if set)
2. the `configs/<storage_type>/<storage_config>` sub-directory of 1 (`.` or `KAFI_HOME`). Here, `storage_type` is either `azblobs`, `clusters`, `locals`, `restproxies` or `s3s`, and `storage_config` is the name of your configuration file (in Kafi, a connection to one of its back-ends is called a storage).
Within Kafi, you can refer to these files by their name without the `.yml` or `.yaml` suffix, e.g. `local` for `local.yaml`.
You can also use environment variables in the YAML files, e.g.:
kafka:
  bootstrap.servers: ${KAFI_KAFKA_SERVER}
  security.protocol: SASL_SSL
  sasl.mechanisms: PLAIN
  sasl.username: ${KAFI_KAFKA_USERNAME}
  sasl.password: ${KAFI_KAFKA_PASSWORD}
schema_registry:
  schema.registry.url: ${KAFI_SCHEMA_REGISTRY_URL}
  basic.auth.credentials.source: USER_INFO
  basic.auth.user.info: ${KAFI_SCHEMA_REGISTRY_USER_INFO}
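Assuming the variables are resolved when the configuration file is loaded, you can also supply them from Python itself. A minimal sketch (the values and the `ccloud` configuration name are placeholders):

import os

os.environ["KAFI_KAFKA_SERVER"] = "localhost:9092"
os.environ["KAFI_KAFKA_USERNAME"] = "myuser"
os.environ["KAFI_KAFKA_PASSWORD"] = "mypassword"

from kafi.kafi import *
c = Cluster("ccloud")  # assuming clusters/ccloud.yaml references the variables above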
We provide example YAML files in this GitHub repository under `configs`:
- Real Kafka
  - Kafka API:
    - Local Kafka installation: `clusters/local.yaml`
    - Confluent Cloud: `clusters/ccloud.yaml`
    - Redpanda: `clusters/redpanda.yaml`
  - Kafka REST Proxy API:
    - Local Kafka/REST Proxy installation: `restproxies/local.yaml`
- Emulated Kafka/files
  - local file system: `locals/local.yaml`
  - S3: `s3s/local.yaml`
  - Azure Blob Storage: `azblobs/local.yaml`
More details on configuring Kafi can be found here.
What can Kafi be for you?
I initially started developing Kafi because I was not a big fan of the existing Kafka CLI tools. Hence, one way Kafi can help you is by acting as an alternative to these tools, e.g. those from the Apache Kafka distribution. Just have a look.
To get started, just enter your Python interpreter, import Kafi and create a `Cluster` object (e.g. pointing to your local Kafka cluster):
from kafi.kafi import *
c = Cluster("local")
Now you can create topics with a shell-inspired command:
c.touch("topic_json")
instead of:
kafka-topics --bootstrap-server localhost:9092 --topic topic_json --create
You can list topics:
c.ls()
instead of:
kafka-topics --bootstrap-server localhost:9092 --list
Produce messages (pure JSON without schema):
p = c.producer("topic_json")
p.produce({"bla": 123}, key="123")
p.produce({"bla": 456}, key="456")
p.produce({"bla": 789}, key="789")
p.close()
instead of:
kafka-console-producer \
--bootstrap-server localhost:9092 \
--topic topic_json \
--property parse.key=true \
--property key.separator=':'
123:{"bla": 123}
456:{"bla": 456}
789:{"bla": 789}
And consume them:
c.cat("topic_json")
[{'topic': 'topic_json', 'headers': None, 'partition': 0, 'offset': 0, 'timestamp': (1, 1732660705555), 'key': '123', 'value': {'bla': 123}}, {'topic': 'topic_json', 'headers': None, 'partition': 0, 'offset': 1, 'timestamp': (1, 1732660710565), 'key': '456', 'value': {'bla': 456}}, {'topic': 'topic_json', 'headers': None, 'partition': 0, 'offset': 2, 'timestamp': (1, 1732660714166), 'key': '789', 'value': {'bla': 789}}]
instead of:
kafka-console-consumer \
--bootstrap-server localhost:9092 \
--topic topic_json \
--from-beginning
{"bla": 123}
{"bla": 456}
{"bla": 789}
^CProcessed a total of 3 messages
Producing messages with a schema is as effortless as possible with Kafi. Here is a simple example using an Avro schema:
t = "topic_avro"
s = """
{
"type": "record",
"name": "myrecord",
"fields": [
{
"name": "bla",
"type": "int"
}
]
}
"""
p = c.producer(t, value_type="avro", value_schema=s)
p.produce({"bla": 123}, key="123")
p.produce({"bla": 456}, key="456")
p.produce({"bla": 789}, key="789")
p.close()
instead of:
kafka-avro-console-producer \
--broker-list localhost:9092 \
--topic topic_avro \
--property schema.registry.url=http://localhost:8081 \
--property key.serializer=org.apache.kafka.common.serialization.StringSerializer \
--property value.schema='{"type":"record","name":"myrecord","fields":[{"name":"bla","type":"int"}]}' \
--property parse.key=true \
--property key.separator=':'
123:{"bla": 123}
456:{"bla": 456}
789:{"bla": 789}
t = "topic_protobuf"
s = """
message value {
required int32 bla = 1;
}
"""
p = c.producer(t, value_type="protobuf", value_schema=s)
p.produce({"bla": 123}, key="123")
p.produce({"bla": 456}, key="456")
p.produce({"bla": 789}, key="789")
p.close()
instead of:
kafka-protobuf-console-producer \
--broker-list localhost:9092 \
--topic topic_protobuf \
--property schema.registry.url=http://localhost:8081 \
--property key.serializer=org.apache.kafka.common.serialization.StringSerializer \
--property value.schema='message value { required int32 bla = 1; }' \
--property parse.key=true \
--property key.separator=':'
123:{"bla": 123}
456:{"bla": 456}
789:{"bla": 789}
t = "topic_jsonschema"
s = """
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"title": "myrecord",
"properties": {
"bla": {
"type": "integer"
}
},
"required": ["bla"],
"additionalProperties": false
}
"""
p = c.producer(t, value_type="jsonschema", value_schema=s)
p.produce({"bla": 123}, key="123")
p.produce({"bla": 456}, key="456")
p.produce({"bla": 789}, key="789")
p.close()
instead of:
kafka-json-schema-console-producer \
--broker-list localhost:9092 \
--topic topic_jsonschema \
--property schema.registry.url=http://localhost:8081 \
--property key.serializer=org.apache.kafka.common.serialization.StringSerializer \
--property value.schema='{ "$schema": "http://json-schema.org/draft-07/schema#", "type": "object", "title": "myrecord", "properties": { "bla": { "type": "integer" } }, "required": ["bla"], "additionalProperties": false }' \
--property parse.key=true \
--property key.separator=':'
123:{"bla": 123}
456:{"bla": 456}
789:{"bla": 789}
Consuming messages with a schema is just as simple - for example, grep-ing the Avro topic:
c.grep("topic_avro", ".*456.*", value_type="avro")
([{'topic': 'topic_avro', 'headers': None, 'partition': 0, 'offset': 1, 'timestamp': (1, 1732666578986), 'key': '456', 'value': {'bla': 456}}], 1, 3)
instead of:
kafka-avro-console-consumer \
--bootstrap-server localhost:9092 \
--property schema.registry.url=http://localhost:8081 \
--topic topic_avro \
--from-beginning \
| grep 456
{"bla":456}
^CProcessed a total of 3 messages
The supported types are:

- `bytes`: Pure bytes
- `str`: String (default for keys)
- `json`: Pure JSON (default for values)
- `avro`: Avro (requires Schema Registry)
- `protobuf` or `pb`: Protobuf (requires Schema Registry)
- `jsonschema` or `json_sr`: JSONSchema (requires Schema Registry)
You can specify the serialization/deserialization types as follows (see the sketch after this list):

- `key_type`/`key_schema`/`key_schema_id`: Type/schema/schema ID for the key
- `value_type`/`value_schema`/`value_schema_id`: Type/schema/schema ID for the value
- `type`: Same type for both the key and the value
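For example, here is a small sketch of overriding these types on the consuming side (assuming `cat` accepts the same type kwargs as the `grep` and `filter` calls shown elsewhere in this README):

# Deserialize the values via the Schema Registry, keys as strings:
c.cat("topic_avro", key_type="str", value_type="avro")

# Skip deserialization altogether (raw bytes for both key and value):
c.cat("topic_avro", type="bytes")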
You can also use Kafi to directly interact with the Schema Registry API. Here are some examples.
List the subjects:
c.get_subjects()
['topic_avro-value', 'topic_jsonschema-value', 'topic_protobuf-value']
First, soft-delete a subject:
c.delete_subject("topic_avro-value")
[1]
Then list the subjects again:
c.get_subjects()
['topic_jsonschema-value', 'topic_protobuf-value']
List also the soft-deleted subjects:
c.get_subjects(deleted=True)
['topic_avro-value', 'topic_jsonschema-value', 'topic_protobuf-value']
Then hard-delete the subject:
c.delete_subject("topic_avro-value", permanent=True)
[1]
And check whether it is really gone:
c.get_subjects(deleted=True)
['topic_jsonschema-value', 'topic_protobuf-value']
Get the latest schema version of a subject:
c.get_latest_version("topic_jsonschema-value")
{'schema_id': 3, 'schema': {'schema_str': '{"$schema":"http://json-schema.org/draft-07/schema#","type":"object","title":"myrecord","properties":{"bla":{"type":"integer"}},"required":["bla"],"additionalProperties":false}', 'schema_type': 'JSON'}, 'subject': 'topic_jsonschema-value', 'version': 1}
etc.
You can also use Kafi as a simple non-stateful stream processing tool.
You can use Kafi to just copy topics[^2]:
c.cp("topic_json", c, "topic_json_copy")
(3, 3)
Of course you can also use schemas here, e.g. you could convert a Protobuf topic to a pure JSON topic:
c.cp("topic_protobuf", c, "topic_avro_json_copy", source_value_type="protobuf")
(3, 3)
...or copy a pure JSON topic to an Avro topic:
s = """
{
"type": "record",
"name": "myrecord",
"fields": [
{
"name": "bla",
"type": "int"
}
]
}
"""
c.cp("topic_json", c, "topic_json_avro_copy", target_value_type="avro", target_value_schema=s)
(3, 3)
In the example below, we use a single message transform. In our `map_function`, we add 42 to the "bla" field of all messages from the input topic `topic_json` and write the processed messages to the output topic `topic_json_mapped`:
def plus_42(x):
    x["value"]["bla"] += 42
    return x
c.cp("topic_json", c, "topic_json_mapped", map_function=plus_42)
(3, 3)
...and look at the result:
c.cat("topic_json_mapped")
[{'topic': 'topic_json_mapped', 'headers': None, 'partition': 0, 'offset': 0, 'timestamp': (1, 1732668466442), 'key': '123', 'value': {'bla': 165}}, {'topic': 'topic_json_mapped', 'headers': None, 'partition': 0, 'offset': 1, 'timestamp': (1, 1732668466442), 'key': '456', 'value': {'bla': 498}}, {'topic': 'topic_json_mapped', 'headers': None, 'partition': 0, 'offset': 2, 'timestamp': (1, 1732668466442), 'key': '789', 'value': {'bla': 831}}]
Of course, all that also works seamlessly with schemas, for example:
c.cp("topic_protobuf", c, "topic_protobuf_json_mapped", map_function=plus_42, source_value_type="protobuf")
(3, 3)
You can also use Kafi for filtering (or exploding) using its `flatmap` functionality. In the example below, we only keep those messages from the input topic `topic_json` where "bla" equals 456. Only those messages are written to the output topic `topic_json_flatmapped`:
def keep_456(x):
    if x["value"]["bla"] == 456:
        return [x]
    else:
        return []

c.cp("topic_json", c, "topic_json_flatmapped", flatmap_function=keep_456)
(3, 1)
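The same `flatmap` mechanism can also be used for exploding, i.e. turning one input message into several output messages. A minimal sketch:

# Emit every input message twice; cp should return (3, 6) for our
# three-message topic (3 messages consumed, 6 produced):
def duplicate(x):
    return [x, x]

c.cp("topic_json", c, "topic_json_exploded", flatmap_function=duplicate)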
The input and output topics can be on any cluster - i.e., you can easily do simple stream processing across clusters. In a sense, Kafi thus allows you to easily spin up your own simple MirrorMaker (below, `c1` is the source cluster and `c2` the target):
c1 = Cluster("cluster1")
c2 = Cluster("cluster2")
c1.cp("my_topic_on_cluster1", c2, "my_topic_on_cluster2")
This works analogously to setting the serialization/deserialization types above - you just add the prefixes `source_` and `target_` (see the sketch after this list):

- `source_key_type`/`source_key_schema`/`source_key_schema_id`: Type/schema/schema ID for the key of the source topic
- `source_value_type`/`source_value_schema`/`source_value_schema_id`: Type/schema/schema ID for the value of the source topic
- `source_type`: Same type for both the key and the value of the source topic

...and analogously for `target_`.
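For example, here is a sketch combining both prefixes, re-using `c1` and `c2` from the MirrorMaker example above to convert a hypothetical Avro topic on `c1` into a Protobuf topic on `c2` (the topic names are placeholders):

# The Protobuf schema is just the example schema from above:
s_pb = """
message value {
    required int32 bla = 1;
}
"""

c1.cp("topic_avro", c2, "topic_protobuf_copy",
      source_value_type="avro",
      target_value_type="protobuf",
      target_value_schema=s_pb)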
You can also use Kafi as a backup tool - using its built-in "Kafka emulation".
In the example, the source (`cluster`) is a real Kafka cluster and the target (`localfs`) is Kafi's Kafka emulation on your local file system. Kafi's Kafka emulation keeps all the Kafka metadata (keys, values, headers, timestamps) such that you can later easily restore the backed-up topics without losing data. We set the type to `bytes` to get a 1:1 carbon copy of the data in our backup (no deserialization/serialization).
cluster = Cluster("cluster")
localfs = Local("local")
cluster.cp("my_topic", localfs, "my_topic_backup", type="bytes")
Below, we bring back the backed-up data to Kafka:
localfs.cp("my_topic_backup", cluster, "my_topic", type="bytes")
Backing up to S3 works in exactly the same way; you just need to configure the `s3` storage correctly beforehand:
s3 = S3("local")
cluster.cp("my_topic", s3, "my_topic_backup", type="bytes")
If you are e.g. a data scientist, Kafi can play the role of a bridge between Kafka and files for you. Based on Pandas, it allows you to transform Kafka topics into Pandas dataframes and vice versa, and similarly for all kinds of file formats:
- CSV
- Feather
- JSON
- ORC
- Parquet
- Excel
- XML
Converting a topic into a dataframe is as simple as:
df = c.topic_to_df("topic_protobuf", value_type="protobuf")
df
bla
0 123
1 456
2 789
The other way round:
c.df_to_topic(df, "topic_json_from_df")
c.cat("topic_json_from_df)
[{'topic': 'topic_json_from_df', 'headers': None, 'partition': 0, 'offset': 0, 'timestamp': (1, 1732669665739), 'key': None, 'value': {'bla': 123}}, {'topic': 'topic_json_from_df', 'headers': None, 'partition': 0, 'offset': 1, 'timestamp': (1, 1732669666743), 'key': None, 'value': {'bla': 456}}, {'topic': 'topic_json_from_df', 'headers': None, 'partition': 0, 'offset': 2, 'timestamp': (1, 1732669666744), 'key': None, 'value': {'bla': 789}}]
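Since `df` is an ordinary Pandas dataframe, you can of course transform it in between. A minimal sketch (the target topic name is a placeholder):

# Double all "bla" values and write the result to a new topic:
df["bla"] = df["bla"] * 2
c.df_to_topic(df, "topic_json_doubled")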
Copying a topic to a file (here: an Excel file) is as simple as:
l = Local("local")
c.topic_to_file("topic_json", l, "topic_json.xlsx")
Similarly, for Parquet:
l = Local("local")
c.topic_to_file("topic_json", l, "topic_json.parquet")
The other way round:
l = Local("local")
l.file_to_topic("topic_json.parquet", c, "topic_json_from_parquet")
Because Kafi is just a Python library integrated into the Python ecosystem, it can be a powerful tool for debugging and fixing bugs - for developers and Kafka administrators alike. Here are some examples.
A typical recurring problem is that, at the beginning of their development, producers forget to use a proper serializer, and the first bunch of messages on dev are then not e.g. JSONSchema-serialized. This is how you can find the messages in a topic that do not start with the magic byte 0:
c.filter("my_topic", type="bytes", filter_function=lambda x: x["value"][0] != 0)
Kafi supports all of the not-too-specific AdminClient methods of confluent_kafka, so you can use it to do (and automate) all kinds of configuration tasks. For example, deleting the first 100 messages of a topic:
c.delete_records({"my_topic": {0: 100}})
...and then to get the watermarks of a topic:
c.watermarks("my_topic")
etc.
The following Kafi code snippet collects the list of schema IDs used in a topic and prints out the corresponding schemas retrieved from the Schema Registry:
def collect_ids(acc, x):
    # Bytes 1-4 of the serialized value contain the schema ID (big-endian);
    # byte 0 is the magic byte.
    id = int.from_bytes(x["value"][1:5], "big")
    acc.add(id)
    return acc

(ids, _) = c.foldl("my_topic", collect_ids, set(), type="bytes")
for id in ids:
    print(c.get_schema(id))
In Kafi, one configuration file corresponds to a "connection" to a so-called storage (Kafka API, Kafka REST Proxy API, local file system, S3 or Azure Blob Storage). Each storage has one section that only makes sense for itself:

- Kafka API: `kafka`
- Kafka REST Proxy API: `rest_proxy`
- Local file system: `local`
- S3: `s3`
- Azure Blob Storage: `azure_blob`

In addition, all storages can have one or two of the following sections:

- `schema_registry` (Schema Registry configuration)
- `kafi` (additional configuration items)
Please also have a look at the example YAML files in the GitHub repo for further illustration.
The following configuration items are shared across all storages (defaults in brackets):

- `schema_registry`:
  - `schema.registry.url`
  - `basic.auth.credentials.source`
  - `basic.auth.user.info`
- `kafi`:
  - `progress.num.messages` (`1000`)
  - `consume.batch.size` (`1000`)
  - `produce.batch.size` (`1000`)
  - `verbose` (`1` if run in the interactive Python interpreter, `0` if not)
  - `auto.offset.reset` (`earliest`)
  - `consumer.group.prefix` (`""`)
  - `enable.auto.commit` (`false`)
  - `commit.after.processing` (`true`)
  - `key.type` (`str`)
  - `value.type` (`json`)
The Kafka API storage (`clusters`) additionally supports:

- `kafka`:
  - `bootstrap.servers`
  - `security.protocol`
  - `sasl.mechanisms`
  - `sasl.username`
  - `sasl.password`
  - `log_level` (`3` if run in the interactive Python interpreter, `6` if not)
  - etc.: any librdkafka configuration item
- `kafi`:
  - `flush.timeout` (`-1.0`)
  - `retention.ms` (`604800000`)
  - `consume.timeout` (`5.0`)
  - `session.timeout.ms` (`45000`)
  - `block.num.retries` (`10`)
  - `block.interval` (`0.5`)
The Kafka REST Proxy API storage (`restproxies`) additionally supports:

- `rest_proxy`:
  - `rest.proxy.url`
  - `basic.auth.user.info`
- `kafi`:
  - `fetch.min.bytes` (`-1`)
  - `consumer.request.timeout.ms` (`1000`)
  - `consume.num.attempts` (`3`)
  - `requests.num.retries` (`10`)
The local file system storage (`locals`) additionally supports:

- `local`:
  - `root.dir` (`.`)
The S3 storage (`s3s`) additionally supports:

- `s3`:
  - `endpoint`
  - `access.key`
  - `secret.key`
  - `bucket.name` (`test`)
  - `root.dir` (`""`)
The Azure Blob Storage storage (`azblobs`) additionally supports:

- `azure_blob`:
  - `connection.string`
  - `container.name` (`test`)
  - `root.dir` (`""`)
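The `kafi` section items are also exposed as accessor methods on the storage objects (as illustrated further below), so you can override them at runtime. A small sketch, assuming the accessor names mirror the configuration items with dots replaced by underscores:

c = Cluster("local")
c.consume_timeout(10.0)     # override consume.timeout
c.produce_batch_size(500)   # override produce.batch.size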
To streamline its syntax, Kafi employs a number of defaults/assumptions. All of them can of course be overridden.
Look at the following code from above:
p = c.producer("topic_json")
p.produce({"bla": 123}, key="123")
p.produce({"bla": 456}, key="456")
p.produce({"bla": 789}, key="789")
p.close()
Kafi uses the following defaults/assumptions here. First, for setting up the producer object:

- The maximum batch size for producing is set to the corresponding value `produce.batch.size` in the `kafi` section of the configuration file, e.g. `1000` in `clusters/local.yaml`.
- The flush timeout for `flush` calls to the Kafka API is set to the corresponding value `flush.timeout` in the `kafi` section of the configuration file, e.g. `-1.0` in `clusters/local.yaml`.
- The default key type is set to the corresponding value `key.type` in the `kafi` section of the configuration file, e.g. `str` in `clusters/local.yaml`. It can also be overridden with the `key_type` kwargs parameter.
- The default value type is set to the corresponding value `value.type` in the `kafi` section of the configuration file, e.g. `json` in `clusters/local.yaml`. It can also be overridden with the `value_type` kwargs parameter.
- No delivery callback function is called. This can be overridden with the `delivery_function` kwargs parameter.
Then, for each individual `produce` call:

- there are no headers (you can add headers using the `headers` kwargs parameter).
- the target partition is any (= `-1`) (you can set the target partition explicitly using the `partition` kwargs parameter).
- the timestamp is set automatically using the `CURRENT_TIME` setting (= `0`) (you can set the timestamp to a specific value using the `timestamp` kwargs parameter).
- after producing each message, the producer does not call `flush` from the Kafka API (you can control this behavior using the `flush` kwargs parameter).
The call could also be written out as follows, assuming the values from the example configuration file `clusters/local.yaml`:
c.produce_batch_size(1000)
c.flush_timeout(-1.0)
c.key_type("str")
c.value_type("json")
p = c.producer("topic_json", delivery_function=None)
p.produce({"bla": 123}, key="123", headers=None, partition=-1, timestamp=0, flush=False)
p.produce({"bla": 456}, key="456", headers=None, partition=-1, timestamp=0, flush=False)
p.produce({"bla": 789}, key="789", headers=None, partition=-1, timestamp=0, flush=False)
p.close()
For consuming messages, Kafi also makes use of a number of defaults/assumptions.
To illustrate this, look at the following call:
c.cat("topic_json")
Kafi uses the following defaults/assumptions here. For one, `cat` hides the implicit creation of a consumer object. Then, for setting up the consumer:

- It implicitly creates a new consumer group based on the current timestamp.
- `auto.offset.reset` is set to the corresponding value `auto.offset.reset` in the `kafi` section of the configuration file, e.g. `earliest` in `clusters/local.yaml`.
- `session.timeout.ms` is set to the corresponding value `session.timeout.ms` in the `kafi` section of the configuration file, e.g. `45000` milliseconds in `clusters/local.yaml`.
- `enable.auto.commit` is set to the corresponding value `enable.auto.commit` in the `kafi` section of the configuration file, e.g. `false` in `clusters/local.yaml`.
- The consume timeout is set to the corresponding value `consume.timeout` in the `kafi` section of the configuration file, e.g. `1.0` (for 1 second) in `clusters/local.yaml`. If you set this to `-1`, Kafi will "wait forever", as in a typical never-ending consumer loop.
- The consumer group prefix is set to the corresponding value `consumer.group.prefix` in the `kafi` section of the configuration file, e.g. `""` in `clusters/local.yaml`.
- The maximum batch size for consuming is set to the corresponding value `consume.batch.size` in the `kafi` section of the configuration file, e.g. `1000` in `clusters/local.yaml`.
- The default key type is set to the corresponding value `key.type` in the `kafi` section of the configuration file, e.g. `str` in `clusters/local.yaml`. It can also be overridden with the `key_type` kwargs parameter.
- The default value type is set to the corresponding value `value.type` in the `kafi` section of the configuration file, e.g. `json` in `clusters/local.yaml`. It can also be overridden with the `value_type` kwargs parameter.
And for the `consume` calls:

- It attempts to read infinitely many messages (parameter `n=-1`).
The call could also be written out as follows, assuming that the current timestamp is `1732669768728` and the values from the example configuration file `clusters/local.yaml`:
c.auto_offset_reset("earliest")
c.session_timeout_ms(45000)
c.enable_auto_commit(False)
c.consume_timeout(1.0)
c.consumer_group_prefix("")
c.consume_batch_size(1000)
c.key_type("str")
c.value_type("json")
co = c.consumer("topic_json", group="1732669768728")
co.consume(n=-1)
co.close()
Thus, you can freely change these settings either in your configuration file or, like here, in the code (using the accessor methods, e.g. `auto_offset_reset` for the `auto.offset.reset` configuration item).
This section is about the architecture of Kafi.
Essentially, Kafi is built on the concept of a "Storage". There are two kinds of Storages:
- Kafka (real Kafka: Kafka API or Kafka REST Proxy API)
- FS (file system: local file system, S3 or Azure Blob Storage)
The `Storage` class inherits from:

- `Shell`: Shell-like commands like `cat`, `head`, `tail`, `cp`...
- `Files`: Copying Kafka topics to files (`topic_to_file`) and vice versa (`file_to_topic`)
- `AddOns`: Higher-level add-on methods (`compact`, `compact_to`, `recreate`)
- `SchemaRegistry`: Schema Registry API
The classes `Shell`, `Files` and `AddOns` inherit from the class `Functional`, which offers functional methods (`foldl`, `flatmap`, `map`, `filter`, `foreach`, `zip_foldl`, `foldl_to`, `flatmap_to`, `map_to`, `filter_to`, `zip_foldl_to`).
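For example, a minimal sketch of one of these functional methods, counting the messages in a topic with `foldl` (following the `foldl` call shown earlier in this README):

# The accumulator starts at 0 and is incremented once per message:
(num_messages, _) = c.foldl("topic_json", lambda acc, x: acc + 1, 0)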
`Files` inherits indirectly from `Functional` through `Pandas`, which allows copying topics to Pandas dataframes (`topic_to_df`) and vice versa (`df_to_topic`).
---
title: Kafi class diagram (Storage)
---
classDiagram
Functional <|-- Shell
Functional <|-- AddOns
Functional <|-- Pandas
Pandas <|-- Files
Shell <|-- Storage
Files <|-- Storage
AddOns <|-- Storage
SchemaRegistry <|-- Storage
Storage <|-- Kafka
Kafka <|-- Cluster
Kafka <|-- RestProxy
Storage <|-- FS
FS <|-- Local
FS <|-- S3
FS <|-- AzureBlob
`StorageConsumer` is the base class for consuming records. It inherits from `Deserializer`, which in turn inherits from `SchemaRegistry`.
The individual storages have their own implementations.
---
title: Kafi class diagram (Consumer)
---
classDiagram
SchemaRegistry <|-- Deserializer
Deserializer <|-- StorageConsumer
StorageConsumer <|-- KafkaConsumer
KafkaConsumer <|-- ClusterConsumer
KafkaConsumer <|-- RestProxyConsumer
StorageConsumer <|-- FSConsumer
FSConsumer <|-- LocalConsumer
FSConsumer <|-- S3Consumer
FSConsumer <|-- AzureBlobConsumer
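In everyday use you rarely instantiate these consumer classes yourself; they are created for you, e.g. by the `consumer` method. A sketch of the explicit route (mirroring the written-out `cat` call above):

co = c.consumer("topic_json")
messages = co.consume(n=3)  # read (up to) three messages
co.close()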
`StorageProducer` is the base class for producing records. It inherits from `Serializer`, which in turn inherits from `SchemaRegistry`.
The individual storages have their own implementations.
---
title: Kafi class diagram (Producer)
---
classDiagram
SchemaRegistry <|-- Serializer
Serializer <|-- StorageProducer
StorageProducer <|-- KafkaProducer
KafkaProducer <|-- ClusterProducer
KafkaProducer <|-- RestProxyProducer
StorageProducer <|-- FSProducer
FSProducer <|-- LocalProducer
FSProducer <|-- S3Producer
FSProducer <|-- AzureBlobProducer
`StorageAdmin` is the base class for administrative methods (e.g. for the Kafka API, the implementation is based on the Kafka Admin Client API).
---
title: Kafi class diagram (Admin)
---
classDiagram
StorageAdmin <|-- KafkaAdmin
KafkaAdmin <|-- ClusterAdmin
KafkaAdmin <|-- RestProxyAdmin
StorageAdmin <|-- FSAdmin
FSAdmin <|-- LocalAdmin
FSAdmin <|-- S3Admin
FSAdmin <|-- AzureBlobAdmin
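The admin functionality is reached through the storage object itself. A short sketch using calls that appear elsewhere in this README:

c.touch("my_topic")                       # create a topic
c.watermarks("my_topic")                  # get the watermarks
c.delete_records({"my_topic": {0: 100}})  # delete the first 100 messages of partition 0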
...
...
Footnotes

[^1]: "Kafi" stands for "(Ka)fka and (fi)les". And, "Kafi" is the Swiss word for a coffee or a coffee place. Kafi is the successor of kash.py, which is the successor of streampunk.

[^2]: Please note that you need to set the `consume_timeout` to `-1` on the source cluster for Kafi to always wait for new messages: `c.consume_timeout(-1)`.