This set of playbooks automates the installation, basic configuration, and tuning of the Confluent platform using either enterprise or community components. It does not currently configure TLS-based encryption between nodes; that functionality is available as part of the DataNexus platform.
Configure the ansible hostsfile to resemble your preferred cluster topology, placing the IP address of each node in its respective section. Note that you can co-locate multiple services as long as each node has sufficient memory.
# all hosts configured to act as zookeepers
[zookeeper]
10.10.1.122 ansible_user=centos ansible_ssh_private_key_file=./server-key.pem
10.10.1.28 ansible_user=centos ansible_ssh_private_key_file=./server-key.pem
10.10.1.32 ansible_user=centos ansible_ssh_private_key_file=./server-key.pem
# all hosts configured to act as kafka brokers
[kafka_broker]
10.10.1.142 ansible_user=centos ansible_ssh_private_key_file=./server-key.pem
10.10.1.216 ansible_user=centos ansible_ssh_private_key_file=./server-key.pem
10.10.1.196 ansible_user=centos ansible_ssh_private_key_file=./server-key.pem
10.10.1.13 ansible_user=centos ansible_ssh_private_key_file=./server-key.pem
# all hosts configured to act as schema registries
[registry]
10.10.1.154 ansible_user=centos ansible_ssh_private_key_file=./server-key.pem
# all hosts configured to act as distributed connectors
[kafka_connect]
10.10.1.252 ansible_user=centos ansible_ssh_private_key_file=./server-key.pem
# all hosts configured to act as rest proxies
[rest_proxy]
10.10.1.71 ansible_user=centos ansible_ssh_private_key_file=./server-key.pem
# all hosts configured to act as ksql servers
[kafka_ksql]
10.10.1.155 ansible_user=centos ansible_ssh_private_key_file=./server-key.pem
# all hosts configured to act as control center servers (note they must also run kafka brokers)
[controlcenter]
10.10.1.13 ansible_user=centos ansible_ssh_private_key_file=./server-key.pem
Run the code using the following format:
./deploy HOSTSFILE TENANT PROJECT CLOUD REGION DOMAIN CLUSTER
If deploying on bare metal, every parameter after HOSTSFILE is still required but ignored. For AWS, the following is sufficient:
./deploy hostsfile datanexus demo aws us-east-1 development none
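On bare metal, placeholder values satisfy the argument count; for example (the values below are arbitrary, since everything after the hostsfile is ignored):
./deploy hostsfile datanexus demo none none development none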
/roles - the ansible code that does all the work
- aws - configure AWS-only security groups across each node
- confluent - configure confluent repos
- connect - install distributed connector
- controlcenter - install control center
- kafka - install kafka brokers
- kafkarest - install kafka rest service
- ksql - install ksql
- preflight - configure cloud-only data file systems and hostnames
- registry - install schema registry
- zookeeper - install zookeeper
/vars - default variables for each platform component most likely to change across any given installation
build_aws_static_inventory.sh - helper script for generating an ansible hosts file based off meta-data tags (requires AWS CLI)
deploy - simple shell wrapper for calling ansible with CLI variables
provision-confluent - playbook for calling roles in a specific order (entry point into the code)
- ansible version 2.7.5 (slightly older versions will likely work just fine)
- validated SSH connectivity to each platform node (knowledgeable configuration of .ssh/config does wonders)
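For example, a minimal .ssh/config entry along these lines (illustrative only; adjust the host pattern, user, and key to your environment) keeps the per-host options in one place:
# illustrative entry covering the 10.10.1.0/24 nodes used in the examples
Host 10.10.1.*
    User centos
    IdentityFile ./server-key.pem
    # convenient on throwaway demo clusters; not recommended in production
    StrictHostKeyChecking no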
Minimal node specs:
- 2 vCPUs
- 8 GB RAM
- 10 GB root filesystem
- 10 GB data filesystem
CentOS/RedHat (7+) Linux on each node.
Easily deploy secure cloud infrastructure using the DataNexus platform. Remember to zero out any previous jumphosts in ~/.ssh/config. Application overlay errors are expected since we aren't technically deploying any overlays:
export key_path=/DataNexus/Demos/infrastructure
time groves/orchestrator --keypath $key_path --tenantpath `pwd`/datanexus infrastructure-small-unified.yml directives/confluent.yml
The confluent platform consists of seven separate components:
- zookeeper - required
- kafka brokers - required
- schema registry - optional, but usually deployed
- control center - optional
- kafka connect - optional, but usually deployed
- kafka rest - optional
- ksql - optional
If you wish to skip a particular component, simply leave the corresponding ansible host group blank. The code will handle the absence or presence of any particular component across the platform.
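For example, to deploy without ksql, leave its group empty in the hostsfile:
# no ksql servers will be deployed
[kafka_ksql]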
The /vars subdirectory contains the variables most likely to change per deployment. Reasonable defaults have been chosen.
/vars/confluent.yml - JVM version, confluent platform version
/vars/zookeeper.yml - data directory for zookeeper
/vars/kafka.yml - data directory for kafka, default partitions per topic, topic deletion enablement
/vars/registry.yml - nothing defined
/vars/controlcenter.yml - data directory for control center, location of confluent license (note that confluent grants 30 days usage without a valid file)
/vars/connect.yml - nothing defined
/vars/rest.yml - nothing defined
/vars/ksql.yml - nothing defined
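As a sketch, overriding one of these defaults is a one-line edit to the relevant file; the variable name below is hypothetical, so check the file itself for the names this repository actually uses:
# /vars/kafka.yml -- illustrative variable name
kafka_data_dir: /data/kafka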
The /defaults subdirectory under each ansible role contains variables that generally require more specialized knowledge of the confluent platform before changing, such as TCP ports and JVM memory.
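Purely for illustration, a role-level default might look like the following sketch (names, ports, and values are hypothetical, not this repository's actual settings):
# roles/kafka/defaults/main.yml -- illustrative only
kafka_broker_port: 9092
kafka_jvm_heap_opts: "-Xms4g -Xmx4g"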
Easy generation of a static hosts file is supported within AWS only (this can be run as soon as the VMs are application-tagged with the ansible group names):
./build_aws_static_inventory.sh | tee hostsfile
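The tag key the script matches on is defined inside build_aws_static_inventory.sh itself; purely as an illustration (assuming a tag key of Application, which you should verify against the script), a VM could be tagged with the AWS CLI like so:
# hypothetical tag key -- confirm against build_aws_static_inventory.sh
aws ec2 create-tags --resources i-0123456789abcdef0 --tags Key=Application,Value=zookeeper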
Otherwise, configure the ansible hostsfile by hand to resemble your preferred cluster topology, placing the IP address of each node in its respective section (see the example hostsfile earlier in this document).
Once the ansible hostsfile has been generated (either automatically or by hand), run the deployment. Note that the installation is idempotent, so multiple runs of the playbook are permissible:
./deploy hostsfile datanexus demo aws us-east-1 development none
Once complete, verify the platform is running by port forwarding the control center UI over SSH:
ssh -i aws-us-east-1-demo-broker-development-private-key.pem centos@10.10.1.67 -L 9021:localhost:9021
Open a tab in your local browser to http://localhost:9021.
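With the SSH tunnel from the previous step still active, a quick command-line check (instead of a browser) confirms the UI is answering:
# expect a 200 (or a redirect code) once control center is up
curl -sS -o /dev/null -w '%{http_code}\n' http://localhost:9021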
To replicate data between two clusters, first deploy the source cluster:
./deploy hostsfile.a datanexus demo aws us-east-1 development a
Deploy the destination cluster:
./deploy hostsfile.b datanexus demo aws us-east-1 development b
Deploy the connect replicator (in this case the replicator is not part of any other cluster):
./deploy hostsfile.replication datanexus demo aws us-east-1 development none replication
To check for configuration drift from the baseline, the ansible dry run output can be piped through the check wrapper:
./deploy hostsfile datanexus demo aws us-east-1 development none drift
Logs can be collected from the nodes of a cluster with:
./collector -i hostsfile.a
Logs are compressed and can be unarchived with:
tar -zxf ./log.tar.gz