Apache Kafka is a distributed streaming platform. It provides publish-subscribe APIs and can store and process streams of records at large scale.
Kafka is made up of the following components:
- Broker: One or more brokers form a Kafka cluster
- Producer: Client to send records into Kafka
- Consumer: Client to read records from Kafka
- Admin: Client to manage Kafka clusters
- Connect: Runtime for Sink and Source Connectors to stream data between external systems and Kafka
- Streams: Library to process streams of records in real time
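To make the producer and consumer roles concrete, here is a sketch using the console tools set up later in this workshop. It assumes a broker is reachable at localhost:9092 and that the commands are run from a Kafka directory; the topic name demo is arbitrary.

```shell
# Create a topic, send one record, then read it back (assumes a broker at localhost:9092)
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --topic demo --partitions 1 --replication-factor 1
echo "hello, kafka" | bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic demo
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic demo --from-beginning --max-messages 1
```

The consumer exits after reading one record because of the --max-messages flag; without it, the console consumer keeps waiting for new records.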
You can learn more about Kafka in the introductory article "What is Apache Kafka" or in the conference presentation "Introducing Apache Kafka."
To complete this workshop, you need the following dependency installed: the Kafka command line tools.
Download the 2.5.1 source package from the Apache Kafka website, uncompress it, and compile it.
> tar -xzf kafka-2.5.1-src.tgz
> cd kafka-2.5.1-src
> gradle
> ./gradlew assemble # This command takes a few minutes to complete
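Once the build completes, you can sanity-check it by asking one of the bundled tools for its version (run from the kafka-2.5.1-src directory):

```shell
# Should report version 2.5.1 along with a commit id
bin/kafka-topics.sh --version
```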
This workshop assumes you are running a Unix-based platform such as macOS or Linux, and the examples use the scripts from the bin directory. If you are on Windows, use the scripts in the bin/windows directory instead.
This workshop requires access to a Kafka cluster. If you don't already have one, you can use IBM Event Streams or set up a Kafka cluster on your computer.
- If you want to use IBM Event Streams, follow the Event Streams setup steps.
- If you want to use a local Kafka cluster, follow the local Kafka setup steps.
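If you choose the local route, the core of the setup looks like the sketch below: Kafka 2.5 still depends on ZooKeeper, and both commands use the default configuration files shipped with Kafka. Run them from the Kafka directory, each in its own terminal (or backgrounded as shown).

```shell
# Start ZooKeeper first, then a single Kafka broker, using the bundled default configs
bin/zookeeper-server-start.sh config/zookeeper.properties &
bin/kafka-server-start.sh config/server.properties &
```

This gives you a single-node cluster listening on localhost:9092, which is enough for the exercises in this workshop.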
Continue to Part 2.