EC2 Performance Testing Quickstart

Andrew J. Stone edited this page Jul 31, 2013 · 4 revisions

Introduction

We use Ansible, and in particular Ansible-Riak-Benchmarking, to configure instances on EC2 to run Riak and Riak CS and to perform benchmarking.

This document will describe the basics of setting up clusters on Amazon EC2 and running benchmarks using Ansible. It will prescribe the minimum setup required to benchmark, and link to more specific documentation as appropriate. The setup described below uses Ubuntu 12.04.

Setting up a Riak CS cluster with HAProxy

EC2 Setup

Launch the nodes you need on EC2 with your tool of choice. Make sure they are large instances with plenty of memory and CPU. I've been using cr1.8xlarge nodes running Ubuntu 12.04.

All instances should be launched in the same security group and should use the same keypair. Make sure you have the private key stored locally, and ensure that port 22 (ssh) is open to 0.0.0.0/0 for your security group so Ansible can talk to the machines. Also open port 8080 to 0.0.0.0/0 so clients can reach both the individual CS instances and HAProxy over HTTP.
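
Once the instances are up, a quick reachability check can confirm the security group rules before Ansible runs. The following is a minimal Python sketch and is not part of ansible-riak-benchmarking; the function names port_open and check_hosts are hypothetical. Note that port 8080 only accepts connections once Riak CS and HAProxy are running, so before setup only the port 22 result is meaningful.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


def check_hosts(hosts, ports=(22, 8080)):
    """Map each (host, port) pair to whether it accepted a connection."""
    return {(h, p): port_open(h, p) for h in hosts for p in ports}


# Example usage (hostname is a placeholder, replace with your own):
# print(check_hosts(["ec2-54-227-34-31.compute-1.amazonaws.com"]))
```

A False result for port 22 usually points at the security group or at an instance that has not finished booting.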

Getting the ansible code

Some of what is written below will almost certainly change in the future. It is all based on ansible-riak-benchmarking. The code is written in Python and no build step is required. Just clone the code and follow the instructions below.

git clone git@github.com:basho/ansible-riak-benchmarking.git
cd ansible-riak-benchmarking

Modifying Ansible Config

Currently all configuration files live inside the repo and you will need to modify a few files in order to get ansible working properly on AWS. The files to modify are the following:

  • projects/riakcs/inventory/hosts: Defines all hosts in the system and gives them symbolic names
  • projects/riakcs/inventory/group_vars/all: All settings for hosts as named in the hosts file

Modifying the hosts file

  1. Under [riak_cluster] remove the currently configured IP addresses and add one line with the public DNS name of each host you want to use as part of the cluster. Each line should have a node_type variable set to one of the following three types:
    • primary - all nodes attempt to join this node
    • last - this node commits cluster changes
    • middle - all other nodes

The riak_cluster section should look like the following when complete:

[riak_cluster]
ec2-54-227-34-31.compute-1.amazonaws.com node_type=primary
ec2-54-227-41-156.compute-1.amazonaws.com node_type=middle
ec2-54-227-41-157.compute-1.amazonaws.com node_type=middle
ec2-54-227-41-158.compute-1.amazonaws.com node_type=middle
ec2-184-73-104-243.compute-1.amazonaws.com node_type=last

  2. Under [haproxy] replace the IP address with an EC2 instance that is not one of the cluster nodes.
  3. Under [stanchion] pick a node from your riak cluster to use as the stanchion node and replace the IP address.
  4. Under [all:vars] change the username to match your setup. For most Ubuntu VMs, the user is ubuntu. There is no password, as your EC2 instances are configured to use an RSA keypair.

Your [all:vars] section should look like the following when using Ubuntu 12.04:

[all:vars]
ansible_ssh_user=ubuntu
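
With more than a handful of nodes, typing the [riak_cluster] section by hand gets error-prone. The following is a minimal Python sketch that renders the section from an ordered list of public DNS names, treating the first host as primary, the final host as last, and everything in between as middle. The function name riak_cluster_section is hypothetical, and the requirement of at least two hosts is an assumption of this sketch, not something the playbooks are known to enforce.

```python
def riak_cluster_section(hosts):
    """Render a [riak_cluster] inventory section from an ordered host list.

    The first host becomes node_type=primary, the final host becomes
    node_type=last, and all others become node_type=middle.
    """
    # Assumption: a cluster needs distinct primary and last nodes.
    if len(hosts) < 2:
        raise ValueError("need at least two hosts: one primary and one last")
    lines = ["[riak_cluster]"]
    for i, host in enumerate(hosts):
        if i == 0:
            node_type = "primary"
        elif i == len(hosts) - 1:
            node_type = "last"
        else:
            node_type = "middle"
        lines.append(f"{host} node_type={node_type}")
    return "\n".join(lines)


# Example usage with the first, second and last hosts from the listing above:
# print(riak_cluster_section([
#     "ec2-54-227-34-31.compute-1.amazonaws.com",
#     "ec2-54-227-41-156.compute-1.amazonaws.com",
#     "ec2-184-73-104-243.compute-1.amazonaws.com",
# ]))
```

The output can be pasted over the existing [riak_cluster] section in projects/riakcs/inventory/hosts.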

Modifying group_vars/all

Ubuntu on EC2 uses eth0 for the network interface, so you'll have to set iface: eth0 for all nodes. Furthermore, the versions are currently set with CentOS package names. They will need to be changed to the Ubuntu versions listed below:

riak:
  version: 1.3.2~precise1
riakcs:
  version: 1.3.1-1
stanchion:
  version: 1.3.1-1

You can also use different package versions provided they are in an accessible repository. It may be beneficial in the future to run our own private repos for unreleased packages that we are testing or allow arbitrary package upload and install.

You also need to comment out the redhat partition line and add in the ubuntu version as shown below:

# partition: /dev/mapper/VolGroup-lv_root # for redhat
partition: /dev/mapper/ubuntu--server--1204--x64-root # for ubuntu

Don't worry about the admin key and secrets. Ansible will set those and rewrite the file for you.
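
The partition names above both refer to a root volume, and the device name may differ on your image. If you are unsure, the following minimal Python sketch, run on one of the nodes, reports the device mounted at /. It assumes that the partition setting is meant to name the root filesystem device and that /proc/mounts is available; the function name root_device is hypothetical.

```python
def root_device(mounts_text):
    """Return the /dev/... device mounted at "/", or None if not found.

    mounts_text is the content of /proc/mounts, where each line is
    "device mountpoint fstype options dump pass". Later entries win,
    since a later mount on "/" shadows an earlier one.
    """
    device = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "/" and fields[0].startswith("/dev/"):
            device = fields[0]
    return device


# Example usage on a node:
# with open("/proc/mounts") as f:
#     print(root_device(f.read()))
```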

Running Ansible

Once your instances have been launched and you have modified the ansible config files, you are ready to configure your machines. You can do that with the following command:

ansible-playbook -vv projects/riakcs/setup.yml -i projects/riakcs/inventory/hosts --private-key ~/.ssh/riakcs.pem

The above command installs and starts riak, riak cs, stanchion and haproxy on the configured nodes. It uses the private key from the keypair you configured your EC2 instances with. When the command completes you should have a riak cluster with riak cs running on top, fronted by haproxy. An s3cfg file will be generated for you in /tmp/s3cfg-ansible and can be used to talk to the new cluster. Before it will work, however, you need to change proxy_host to the public DNS name of the load balancer and set progress_meter = Yes.
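
The two s3cfg changes can be scripted. The following is a minimal Python sketch that rewrites the proxy_host and progress_meter lines in the file's text, appending either one if it is missing. The function name update_s3cfg is hypothetical, and the sketch assumes that /tmp/s3cfg-ansible is the file itself and that it contains a single section of key = value lines.

```python
import re


def update_s3cfg(text, proxy_host):
    """Return s3cfg text with proxy_host and progress_meter set."""
    wanted = {"proxy_host": proxy_host, "progress_meter": "Yes"}
    seen = set()
    out = []
    for line in text.splitlines():
        match = re.match(r"\s*([A-Za-z_]+)\s*=", line)
        if match and match.group(1) in wanted:
            key = match.group(1)
            out.append(f"{key} = {wanted[key]}")
            seen.add(key)
        else:
            out.append(line)
    # Assumption: the file has one section, so appending at the end is safe.
    for key, value in wanted.items():
        if key not in seen:
            out.append(f"{key} = {value}")
    return "\n".join(out) + "\n"


# Example usage (the load balancer hostname is a placeholder):
# path = "/tmp/s3cfg-ansible"
# with open(path) as f:
#     new_text = update_s3cfg(f.read(), "my-haproxy.compute-1.amazonaws.com")
# with open(path, "w") as f:
#     f.write(new_text)
```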