Follow up on #2655 Clean up kafka source connector docs #12343

Merged 3 commits on May 12, 2022
10 changes: 2 additions & 8 deletions airbyte-integrations/connectors/source-kafka/README.md
@@ -43,16 +43,10 @@ We use `JUnit` for Java tests.
Place unit tests under `src/test/io/airbyte/integrations/source/kafka`.

#### Acceptance Tests
Airbyte has a standard test suite that all destination connectors must pass. Implement the `TODO`s in
`src/test-integration/java/io/airbyte/integrations/source/KafkaSourceAcceptanceTest.java`.
Airbyte has a standard test suite that all source connectors must pass.

### Using gradle to run tests
All commands should be run from airbyte project root.
To run unit tests:
```
./gradlew :airbyte-integrations:connectors:source-kafka:unitTest
```
To run acceptance and custom integration tests:
All commands should be run from airbyte project root. To run acceptance and custom integration tests:
```
./gradlew :airbyte-integrations:connectors:source-kafka:integrationTest
```
83 changes: 27 additions & 56 deletions docs/integrations/sources/kafka.md
@@ -1,73 +1,44 @@
# Kafka

## Overview
This page guides you through the process of setting up the Kafka source connector.

The Airbyte Kafka source allows you to sync data from Kafka. Each Kafka topic is written to the corresponding stream.
# Set up guide

### Sync overview
## Step 1: Set up Kafka

#### Output schema
To use the Kafka source connector, you'll need:

Each Kafka topic will be output into a stream.
* [A Kafka cluster 1.0 or above](https://kafka.apache.org/quickstart)
* The Airbyte user should be allowed to read messages from the topics, and these topics should be created before reading from Kafka.

Currently, this connector only reads data in JSON format. More formats \(e.g. Apache Avro\) will be supported in the future.
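Since the connector reads JSON-encoded message values, each record boils down to decoding the raw bytes of a Kafka message into a JSON object. A minimal sketch of that step (the payload and field names here are hypothetical, not part of the connector's spec):

```python
import json

def record_from_message(value_bytes: bytes) -> dict:
    """Decode a Kafka message value assumed to be UTF-8 encoded JSON."""
    return json.loads(value_bytes.decode("utf-8"))

# A message value as it might arrive from a topic (payload is made up):
raw = b'{"user_id": 42, "event": "click"}'
print(record_from_message(raw))  # {'user_id': 42, 'event': 'click'}
```

Messages whose values are not valid JSON would fail this decoding step, which is why other formats such as Avro need dedicated support.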
## Step 2: Set up the Kafka source in Airbyte

#### Features
You'll need the following information to configure the Kafka source:

| Feature | Supported?\(Yes/No\) | Notes |
| :--- | :--- | :--- |
| Full Refresh Sync | Yes | |
| Incremental - Append Sync | Yes | |
| Namespaces | No | |

## Getting started

### Requirements

To use the Kafka source, you'll need:

* A Kafka cluster 1.0 or above.

### Setup guide

#### Network Access
* **Group ID** - The Group ID distinguishes different consumer groups. (e.g. group.id)
* **Protocol** - The protocol used to communicate with brokers.
* **Client ID** - An ID string to pass to the server when making requests. This lets a logical application name be included in server-side request logging, so the source of requests can be tracked beyond just IP/port. (e.g. airbyte-consumer)
* **Test Topic** - A topic used to verify that Airbyte can consume messages. (e.g. test.topic)
* **Subscription Method** - You can choose to manually assign a list of partitions, or subscribe to all topics matching a specified pattern to get dynamically assigned partitions.
* **List of topics**
* **Bootstrap Servers** - A list of host/port pairs used to establish the initial connection to the Kafka cluster.
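Putting the options above together, a source configuration might look roughly like the sketch below. The field names are assumptions based on the option labels, not the connector's exact spec, and all values are examples:

```json
{
  "group_id": "airbyte-group",
  "protocol": "PLAINTEXT",
  "client_id": "airbyte-consumer",
  "test_topic": "test.topic",
  "subscription": {
    "subscription_type": "subscribe",
    "topic_pattern": "sample.*"
  },
  "bootstrap_servers": "broker1:9092,broker2:9092"
}
```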

Make sure your Kafka brokers can be accessed by Airbyte.
### For Airbyte Cloud:

#### **Permissions**
1. [Log into your Airbyte Cloud](https://cloud.airbyte.io/workspaces) account.
2. In the left navigation bar, click **Sources**. In the top-right corner, click **+new source**.
3. On the Set up the source page, enter the name for the Kafka connector and select **Kafka** from the Source type dropdown.
4. Follow the [Setup the Kafka source in Airbyte](kafka.md#Setup-the-Kafka-Source-in-Airbyte) steps.

Airbyte should be allowed to read messages from topics, and these topics should be created before reading from Kafka.
## Supported sync modes

#### Target topics
The Kafka source connector supports the following [sync modes](https://docs.airbyte.com/cloud/core-concepts#connection-sync-modes):

You can determine the topics from which messages are read via the `topic_pattern` configuration parameter. Messages can be read from a hardcoded, pre-defined topic.

To read all messages from a single hardcoded topic, enter its name in the `topic_pattern` field, e.g. setting `topic_pattern` to `my-topic-name` will read all messages from that topic.

You can determine the topic partitions from which messages are read via the `topic_partitions` configuration parameter.
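The `topic_pattern` behavior can be pictured as a regular-expression match over the available topic names, where a literal topic name simply matches itself. A small sketch of that idea (the topic names are made up; this mirrors, rather than reproduces, the Kafka client's pattern subscription):

```python
import re

def matching_topics(topic_pattern: str, available_topics: list[str]) -> list[str]:
    """Select topics whose names fully match the pattern,
    the way a regex subscription picks topics."""
    return [t for t in available_topics if re.fullmatch(topic_pattern, t)]

topics = ["my-topic-name", "orders.2022", "orders.2021", "internal"]
print(matching_topics("my-topic-name", topics))   # a literal name matches only itself
print(matching_topics(r"orders\..*", topics))     # a pattern matches a family of topics
```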

### Setup the Kafka source in Airbyte

You should now have all the requirements needed to configure Kafka as a source in the UI. You can configure the following parameters on the Kafka source \(though many of these are optional or have default values\):

* **Bootstrap servers**
* **Topic pattern**
* **Topic partition**
* **Test topic**
* **Group ID**
* **Max poll records**
* **SASL JAAS config**
* **SASL mechanism**
* **Client ID**
* **Enable auto commit**
* **Auto commit interval ms**
* **Client DNS lookup**
* **Retry backoff ms**
* **Request timeout ms**
* **Receive buffer bytes**
* **Repeated calls**

More info about this can be found in the [Kafka consumer configs documentation site](https://kafka.apache.org/documentation/#consumerconfigs).
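Many of the parameters listed above map directly onto standard Kafka consumer config keys. As an illustration, a consumer configured with these options might correspond to a properties file like the following (all values here are example values, not the connector's defaults):

```properties
# Example values only; see the Kafka consumer configs documentation for defaults
bootstrap.servers=broker1:9092,broker2:9092
group.id=airbyte-group
client.id=airbyte-consumer
max.poll.records=500
enable.auto.commit=true
auto.commit.interval.ms=5000
request.timeout.ms=30000
retry.backoff.ms=100
receive.buffer.bytes=65536
```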
| Feature | Supported?\(Yes/No\) | Notes |
| :--- | :--- | :--- |
| Full Refresh Sync | Yes | |
| Incremental - Append Sync | Yes | |
| Namespaces | No | |

## Changelog
