# Zookeeper
ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications. Each time they are implemented, a lot of work goes into fixing the bugs and race conditions that are inevitable. Because of the difficulty of implementing these kinds of services, applications initially tend to skimp on them, which makes them brittle in the presence of change and difficult to manage. Even when done correctly, different implementations of these services lead to management complexity when the applications are deployed. Other projects such as HBase and Kafka use ZooKeeper to coordinate leader election so that they remain highly available.
## Requirements

At least 3 AWS instances.

## Dependencies

ZooKeeper requires Java. If Java is not installed on all the machines, install it with the following commands:

```
node:~$ sudo apt-get update
node:~$ sudo apt-get install openjdk-7-jdk
```
You can check if the installation was successful by typing the following
```
node:~$ which java
/usr/bin/java
node:~$ java -version
java version "1.7.0_79"
OpenJDK Runtime Environment (IcedTea 2.5.5) (7u79-2.5.5-0ubuntu0.14.04.2)
OpenJDK 64-Bit Server VM (build 24.79-b02, mixed mode)
```

## Install Zookeeper

This installation process must be executed on each machine.
We will grab ZooKeeper version 3.4.6 and save it to a Downloads folder. Next we will extract it into our /usr/local directory and rename the folder to simply 'zookeeper'.
```
node:~$ wget http://mirror.cc.columbia.edu/pub/software/apache/zookeeper/stable/zookeeper-3.4.6.tar.gz -P ~/Downloads
node:~$ sudo tar zxvf ~/Downloads/zookeeper-*.tar.gz -C /usr/local
node:~$ sudo mv /usr/local/zookeeper-3.4.6/ /usr/local/zookeeper
```

## Configure Zookeeper

First we will copy the sample configuration file to be the actual ZooKeeper configuration file. Each machine's configuration file will be exactly the same. The example configuration below is for a cluster with 3 nodes.
```
node:~$ cp /usr/local/zookeeper/conf/zoo_sample.cfg /usr/local/zookeeper/conf/zoo.cfg
node:~$ sudo nano /usr/local/zookeeper/conf/zoo.cfg
```
In /usr/local/zookeeper/conf/zoo.cfg, change `dataDir` from /tmp/zookeeper to /var/lib/zookeeper, leave `clientPort` at 2181, and add a `server.N` entry for every node in the cluster (add more entries if you have more nodes). The edited file should look like the following:
```
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage
dataDir=/var/lib/zookeeper
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60

# Add more server.N entries if you have more nodes
server.1=public_dns_1:2888:3888
server.2=public_dns_2:2888:3888
server.3=public_dns_3:2888:3888
```
Also uncomment the lines for auto-purging; otherwise ZooKeeper will keep accumulating snapshots and transaction logs and eventually run out of disk space.
```
# The number of snapshots to retain in dataDir
autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
autopurge.purgeInterval=1
```
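If you prefer not to edit the file by hand on every node, a minimal scripted sketch is below. It assumes the stock zoo_sample.cfg you copied above; public_dns_1/2/3 are placeholders for your instances' addresses.

```
# Sketch: apply the zoo.cfg edits with sed/tee instead of a manual editor.
# Run on each node after copying zoo_sample.cfg to zoo.cfg.
ZK_CONF=/usr/local/zookeeper/conf/zoo.cfg

# Point dataDir at /var/lib/zookeeper instead of /tmp/zookeeper
sudo sed -i 's|^dataDir=.*|dataDir=/var/lib/zookeeper|' "$ZK_CONF"

# Uncomment the auto-purge settings
sudo sed -i 's|^#autopurge.snapRetainCount=3|autopurge.snapRetainCount=3|' "$ZK_CONF"
sudo sed -i 's|^#autopurge.purgeInterval=1|autopurge.purgeInterval=1|' "$ZK_CONF"

# Append the ensemble membership (replace with your nodes' addresses)
cat <<'EOF' | sudo tee -a "$ZK_CONF"
server.1=public_dns_1:2888:3888
server.2=public_dns_2:2888:3888
server.3=public_dns_3:2888:3888
EOF
```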
Each ZooKeeper node identifies itself by the number in its myid file, located at /var/lib/zookeeper/myid. This file must be created on each machine, and the number in it must match that node's server.N entry in the ZooKeeper configuration file from the previous step. You can use a simple shell command to assign each machine its appropriate ID.
Executed on the first zookeeper node
```
node-1:~$ sudo mkdir /var/lib/zookeeper
node-1:~$ sudo touch /var/lib/zookeeper/myid
node-1:~$ echo 'echo 1 >> /var/lib/zookeeper/myid' | sudo -s
```
Executed on the second zookeeper node and so on
```
node-2:~$ sudo mkdir /var/lib/zookeeper
node-2:~$ sudo touch /var/lib/zookeeper/myid
node-2:~$ echo 'echo 2 >> /var/lib/zookeeper/myid' | sudo -s
```
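If you would rather set all the IDs from a single admin machine, a sketch is below. The hostnames zk1, zk2, zk3 are placeholders, and SSH access with passwordless sudo on each node is assumed; the id written must match the node's server.N entry in zoo.cfg.

```
# Sketch: write each node's myid over SSH from one machine.
id=1
for host in zk1 zk2 zk3; do
  ssh "$host" "sudo mkdir -p /var/lib/zookeeper && echo $id | sudo tee /var/lib/zookeeper/myid"
  id=$((id + 1))
done
```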
Now that ZooKeeper is configured on each node, we can start the ZooKeeper server with the following command. This must be executed on each machine.
```
node:~$ sudo /usr/local/zookeeper/bin/zkServer.sh start
```

## Check Zookeeper Configuration

We can check whether ZooKeeper is running correctly by SSH-ing into each node and seeing if it is a follower or a leader. There should be only one leader in a cluster. You can check the status of each ZooKeeper node by executing the following command:
```
node:~$ echo srvr | nc localhost 2181 | grep Mode
```
The output on each node should look like the following.
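The srvr command reports the node's role on a `Mode:` line, so two of the three nodes should report follower and exactly one should report leader (which node is the leader will vary). For example:

```
node-1:~$ echo srvr | nc localhost 2181 | grep Mode
Mode: follower
node-2:~$ echo srvr | nc localhost 2181 | grep Mode
Mode: leader
node-3:~$ echo srvr | nc localhost 2181 | grep Mode
Mode: follower
```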
One of ZooKeeper's features is the ability to elect a new leader in the event of a node failure. We can simulate this by stopping the server on the leader node. After we shut down the leader's server, we can check the other nodes' statuses again. You should notice that another node has been elected leader.
```
leader-node:~$ sudo /usr/local/zookeeper/bin/zkServer.sh stop
```
If we bring the failed node back up, it will rejoin the cluster as a follower. We do this by simply starting the ZooKeeper server on the node we just stopped:
```
old-leader-node:~$ sudo /usr/local/zookeeper/bin/zkServer.sh start
```
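As an optional final sanity check, you can use the ZooKeeper command-line client that ships in the bin directory to write a znode on one node and read it back from another, confirming that the ensemble replicates data. The znode path /sanity_check is just an arbitrary example name.

```
node-1:~$ /usr/local/zookeeper/bin/zkCli.sh -server localhost:2181 create /sanity_check "hello"
node-2:~$ /usr/local/zookeeper/bin/zkCli.sh -server localhost:2181 get /sanity_check
```

If the second command prints "hello", the nodes are talking to each other and the cluster is working end to end.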
Find out more about the Insight Data Engineering Fellows Program in New York and Silicon Valley, apply today, or sign up for program updates.
You can also read our engineering blog here.