
Testing


Data

The raw data produced as a result of this test plan can be found below:

Test Plan

Table of Contents

  1. Test Plan Identifier
  2. Introduction
  3. Test Items
  4. Features to be Tested
  5. Features not to be Tested
  6. Approach
  7. Item Pass/Fail Criteria
  8. Suspension Criteria and Resumption Requirements
  9. Test Deliverables
  10. Testing Tasks
  11. Environmental Needs
  12. Responsibilities
  13. Staffing and Training
  14. Schedule

1. Test Plan Identifier

This test plan has the ID: Express.Testing.Integration.1.

2. Introduction

2.1 Objectives

The focus of this test plan is on testing the horizontal scalability of the Express project with an application container deployed within it. This will test how the various services respond to load when integrated in a production environment. The objective is to evaluate whether automatic linear horizontal scaling is achieved. This means that when an additional machine is added for the services to run on, the number of requests per second the system can handle should increase linearly with no diminishing returns.

2.2 Background

It is extremely important to test the ability of the Express project to scale automatically across multiple machines. One of the major incentives to use the Express project for deploying application containers is that it allows applications to be scaled across multiple machines without any configuration by the container's developer. All of the services within the Express environment have been designed to scale horizontally; however, to ensure that the system meets this requirement, it needs to be tested under load in a production environment.

3. Test Items

The integration of the services within the Express environment is to be tested. The tests focus on the ability of the services to automatically scale horizontally as the number of requests increases. This means the following services will be tested together for Express v1.0.0-alpha.1.

  1. Authorization and Identity Service.
  2. Nginx Reverse Proxy.
  3. Test API Application Container.
  4. Distributed Cassandra Database.

4. Features to be Tested

The primary feature that will be tested is the ability of the Express environment to scale when an application container is deployed inside it. This will test the following features.

  1. Automatic Horizontal Scaling. The ability of the services in the Express environment to automatically scale across multiple physical nodes as the number of requests increases.
    1. The authorization and identity service.
    2. The deployed application container.
    3. The distributed Cassandra database.
  2. Request Latency. The impact horizontal scalability has on request latency.
  3. Service Integration. The integration of all services within the Express environment.
    1. Reverse proxy delegation of requests for protected resources to the authorization and identity service.
    2. Sharing of the distributed Cassandra database between the authorization and identity service and the deployed application container.
    3. Identity management for the deployed application container by the authorization and identity service.

5. Features Not to be Tested

The Express environment consists of multiple separate services. The correctness of these individual services will not be evaluated directly by these tests. As a result, the following features will not be tested.

  1. Express Controller. The Express Controller will be used as part of the tests to deploy, update, and tear down application containers but will not be tested itself. It will not be directly evaluated because unit and functional tests are included separately.
  2. Express. Correctness or functionality of the individual services will not be tested. Correctness will be assumed based on separate unit tests and the absence of errors while performing the testing tasks.
    1. Authorization and Identity Service
    2. Deployed Application Container
    3. Nginx Reverse Proxy

6. Approach

The goal of the testing is to evaluate the Express project in a production environment with realistic traffic. Because of this, an approach based on separating the Express project from the test driver is used.

6.1 Express Services

The Express services are to be deployed to a Kubernetes cluster of 1 to 4 machines, each with its own dedicated resources. Having dedicated resources allows the Express services to be tested in a production environment.

6.2 Test Application

To evaluate the Express environment, a test API needed to be developed to deploy within it. A CRUD API that accepts scan requests for campgrounds within Canadian National Parks was created for this purpose.

Although the API is capable of accepting and persisting scan requests, actually performing the scans would require a supporting web-scraping service. This accompanying service was not used, both because of development time constraints and because it would have no impact on the test results: it has no public interface, and its scheduled scans are performed independently of the number of requests.

The test API has the following endpoints; a hypothetical client sketch follows the list.

  1. /campgrounds/v1/add Allows an account with admin scope to add a new campground that can then be included in scan requests.
  2. /campgrounds/v1/list Provides a list of all campgrounds that can be scanned for.
  3. /searches/v1/add Allows an account with user scope to add a new scan request to search for an available campsite from a list of provided campgrounds for a given date range.
  4. /searches/v1/list Allows an account with user scope to list their scan requests.
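
To illustrate how a client might exercise these endpoints, the sketch below uses the Python requests library. The host name, payload field names, and token value are hypothetical placeholders and are not defined by this test plan.

    import requests

    # Hypothetical values for illustration only; the real host, token and
    # payload fields are defined by the deployed test API, not by this plan.
    BASE = "https://cluster.example.com"
    TOKEN = "<user-scope access token obtained from /oauth2/login>"
    HEADERS = {"Authorization": "Bearer " + TOKEN}

    # List the campgrounds that can be included in scan requests.
    campgrounds = requests.get(BASE + "/campgrounds/v1/list", headers=HEADERS).json()

    # Add a scan request for an available campsite over a date range.
    requests.post(BASE + "/searches/v1/add", headers=HEADERS, json={
        "campgrounds": [c["id"] for c in campgrounds],  # hypothetical field names
        "start_date": "2018-07-01",
        "end_date": "2018-07-07",
    })

    # List the scan requests belonging to this account.
    print(requests.get(BASE + "/searches/v1/list", headers=HEADERS).json())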

6.3 Test Driver

The test driver is to be run on a separate machine that communicates with the Express services over HTTP. Using this approach ensures that the test driver does not have any impact on the performance of the Express services running on the cluster.

To simulate realistic traffic to the Express services, Locust will be used. This allows traffic to be modelled using a user simulation, with each test case defined by a maximum number of users, the starting number of users, and the rate at which new users are to be created.

This approach allows different parameters to be used to test different factors.

  1. Maximum Requests Per Second. Start with a high maximum number of users and a low user spawn rate to test where latency starts to increase and/or requests start to time out.
  2. Request Spikes. Start with a maximum number of users equal to or below the maximum requests per second and use a high spawn rate to test how latency and the frequency of timeouts and errors change with rapid increases in the number of requests per second.

User Model

To use this approach to test the Express services, a user model must be created to simulate the HTTP requests that a user might send. This user must interact with the test API and authorization / identity service in a pattern that simulates real use. To achieve this, the following model was developed.

  • /searches/v1/list
    • Weight: 30
  • /campgrounds/v1/list
    • Weight: 30
  • /searches/v1/add
    • Weight: 15
  • /oauth2/login
    • Weight: 2
  • /oauth2/refresh
    • Weight: 2
  • /oauth2/register
    • Weight: 1

With this model, the simulated users register accounts, log in, refresh their access tokens, list the available campgrounds, list their requested searches and add new searches. These requests are sent randomly; however, requests with higher weights are sent more frequently.

Common requests like listing the searches and campgrounds are weighted much higher to simulate a higher frequency. Logging in and refreshing an access token have much lower weights to reflect that they are performed less frequently. Registration occurs the least frequently; however, the simulated users must periodically re-register to avoid continually growing the list of searches, which would skew latency because the response size would increase as the test duration increases.
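
As a concrete illustration, the user model above could be expressed as a Locust task set roughly as follows. This is a minimal sketch using Locust's HttpLocust and TaskSet classes (current at the time of writing); the request payloads, credential handling and wait times are placeholders rather than the locustfile used for the actual tests.

    from locust import HttpLocust, TaskSet, task

    class SimulatedUser(TaskSet):
        # Task weights mirror the user model above; higher weights run more often.

        @task(30)
        def list_searches(self):
            self.client.get("/searches/v1/list")

        @task(30)
        def list_campgrounds(self):
            self.client.get("/campgrounds/v1/list")

        @task(15)
        def add_search(self):
            self.client.post("/searches/v1/add", json={})    # payload omitted

        @task(2)
        def login(self):
            self.client.post("/oauth2/login", json={})       # credentials omitted

        @task(2)
        def refresh(self):
            self.client.post("/oauth2/refresh", json={})     # refresh token omitted

        @task(1)
        def register(self):
            self.client.post("/oauth2/register", json={})    # account details omitted

    class ExpressUser(HttpLocust):
        task_set = SimulatedUser
        min_wait = 1000   # placeholder wait between tasks, in milliseconds
        max_wait = 5000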

7. Item Pass / Fail Criteria

7.1 Cutoff Criteria

The purpose of the test plan is to evaluate the ability of the Express environment to automatically scale across multiple machines. This involves repeating the testing tasks on Kubernetes clusters consisting of 1 to 4 machines. These tasks are based on cutoff criteria rather than pass/fail because the goal is to determine the maximum number of requests per second that can be handled. The test will be cut off if any of the following criteria are met (a sketch of the sliding-window metric follows the list).

  1. Average sliding-window response time. The average sliding-window response time exceeds 500 ms.
  2. Request Failures. The proportion of failed requests exceeds 2%.
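
Here, the sliding-window average is the mean response time over only the most recent responses rather than over the entire run, so a sustained slowdown triggers the cutoff even late in a long test. A minimal sketch, assuming a fixed window over the last 100 responses:

    from collections import deque

    # Sketch of a sliding-window average over recent response times; the
    # window size of 100 samples is an assumption, not part of the plan.
    window = deque(maxlen=100)

    def record(response_time_ms):
        window.append(response_time_ms)
        average = sum(window) / len(window)
        if average > 500:
            print("cutoff: average sliding-window response time exceeds 500 ms")
        return average

    for rt in (120, 180, 950, 900):   # example response times in milliseconds
        record(rt)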

7.2 Pass / Fail Criteria

The pass / fail criteria can be determined once the testing tasks have been performed on the clusters consisting of 1 to 4 machines. The goal is to achieve perfect horizontal scalability. To test this, linear regression will be used with x representing the number of machines in the cluster and y representing the maximum number of requests per second. The test will be considered a pass if the coefficient of correlation exceeds 0.7. For the purposes of this analysis, a point with 0 requests for a cluster of size 0 will be added.
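
A minimal sketch of this analysis, assuming the maximum requests per second have already been measured for each cluster size (the values below are placeholders, not results):

    import numpy as np

    # Cluster sizes with the added (0, 0) point and placeholder measurements.
    machines = np.array([0, 1, 2, 3, 4])
    max_rps = np.array([0.0, 150.0, 310.0, 460.0, 590.0])   # hypothetical results

    # Least-squares fit y = m*x + b and the correlation coefficient r.
    m, b = np.polyfit(machines, max_rps, 1)
    r = np.corrcoef(machines, max_rps)[0, 1]

    print("slope = %.1f requests/s per machine, r = %.3f" % (m, r))
    print("PASS" if r > 0.7 else "FAIL")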

8. Suspension Criteria and Resumption Requirements

8.1 Suspension Criteria

Testing will be suspended if the latency between the machine running the test driver and the machines running the Kubernetes cluster exceeds 200 ms when not under load. Verification should be performed by pinging the host.
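
A minimal sketch of this verification, shelling out to ping from the test machine; the host name is a placeholder, and the parsing assumes the standard min/avg/max summary line printed by ping on macOS and Linux:

    import subprocess

    def average_ping_ms(host, count=5):
        # Run ping and pull the average round-trip time from the summary line,
        # e.g. "round-trip min/avg/max/stddev = 9.8/11.2/13.4/1.1 ms".
        output = subprocess.check_output(["ping", "-c", str(count), host],
                                         universal_newlines=True)
        summary = next(line for line in output.splitlines() if "min/avg/max" in line)
        return float(summary.split("=")[1].split("/")[1])

    latency = average_ping_ms("cluster.example.com")   # hypothetical host
    print("suspend testing" if latency > 200 else "latency OK: %.1f ms" % latency)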

8.2 Resumption Requirements

Testing can be resumed when the latency between the test machine and cluster host has dropped below 200 ms when not under load. This may require reducing the noise on the channel, reducing the distance to the cluster, or increasing the bandwidth of the channel.

9. Test Deliverables

The following documents shall be included:

  1. Test Plan. This document is the test plan for the Express project.
  2. Test Summary Reports. A brief summary will be included each time the testing tasks are performed to discuss any notable trends. The summary will not include analysis of these trends.
  3. Test Input Data. This will be in the form of the number of users to simulate and the number of users spawned/second.
  4. Test Output Data. Each time the testing tasks are performed, the following will be produced (a parsing sketch follows the list).
    • distribution.csv containing the absolute number of each HTTP request sent during the test.
    • exceptions.csv containing any errors that occurred during the test.
    • requests.csv containing the requests per second, average sliding-window response time, and number of users.
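
To support the pass/fail analysis in section 7.2, the maximum requests per second can be pulled from requests.csv. A minimal sketch, assuming the file contains one row per reporting interval with a Requests/s column; the exact column names depend on the Locust version:

    import csv

    def max_requests_per_second(path="requests.csv"):
        # Assumed column name; adjust to match the CSV produced by Locust.
        with open(path, newline="") as f:
            return max(float(row["Requests/s"]) for row in csv.DictReader(f))

    print("maximum requests per second: %.1f" % max_requests_per_second())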

10. Testing Tasks

The testing tasks are to be performed four times: first on a Kubernetes cluster consisting of a single physical or virtual machine. The tasks must then be repeated on the same cluster after the addition of another machine. This process will be performed until the testing tasks have been completed on clusters consisting of one, two, three and four machines.

10.1 Preparation

All preparation will be performed from the single test machine. The preparation assumes that the environmental needs in section 11 have been met. It is also required that the express-controller executable and Express v1.0.0-alpha.1 archive have been downloaded to the single test machine from the Releases section of the Express GitHub repository.

Cluster Reset

The Kubernetes cluster must be in a consistent state when starting the tasks. This can be accomplished by performing the following tasks. These tasks are only necessary if the testing tasks have already been performed on the cluster.

  1. Remove the previously deployed application using express-controller teardown testapi.
  2. Remove the Express environment from the cluster using helm delete terrifying-bronco.
  3. Remove all persistent volume claims using kubectl delete pvc terrifying-bronco-database-0.
  4. Remove all persistent volumes using kubectl delete pv terrifying-bronco-database-0.
  5. Wait until No resources found. is shown when using kubectl get pods.

Cluster Setup

The Kubernetes cluster must be prepared for running the testing tasks. This can be done by performing the following tasks; a polling sketch for the wait steps follows the list.

  1. Navigate to the root directory of the Express v1.0.0-alpha.1 download in bash.
  2. Add the Express environment to the cluster using helm install --replace --name terrifying-bronco ./express/.
  3. Wait until all pods are ready when using kubectl get pods.
  4. Deploy the test CRUD API using express-controller deploy testapi allankerr/api --port 8080 --max X --endpoint-config ./containers/api/endpoints.yaml. The parameter max must be changed to the number of nodes in the cluster.
  5. Wait until all pods are ready when using kubectl get pods.
  6. Create a new admin OAuth2 client for the authorization service using ./bin/authorization-cli.sh client create admin demo-password.
  7. Add the user scope to the admin OAuth2 client using ./bin/authorization-cli.sh scope add admin user.
  8. Wait until the cluster is stable by ensuring the horizontal pod autoscalers do not have <unknown> targets when using kubectl get hpa.
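
For steps 3, 5 and 8, a small polling helper can avoid manually re-running the kubectl commands. This is a hypothetical sketch that shells out to kubectl and waits until every pod reports all containers ready and no horizontal pod autoscaler shows an <unknown> target; it is not part of the Express tooling.

    import subprocess
    import time

    def kubectl(*args):
        cmd = ["kubectl"] + list(args) + ["--no-headers"]
        return subprocess.check_output(cmd, universal_newlines=True).splitlines()

    def cluster_ready():
        # A pod line looks like: NAME  READY  STATUS  RESTARTS  AGE (READY is "1/1").
        for line in kubectl("get", "pods"):
            name, ready, status = line.split()[:3]
            if status != "Running" or ready.split("/")[0] != ready.split("/")[1]:
                return False
        # HPA targets must no longer be reported as <unknown>.
        return all("<unknown>" not in line for line in kubectl("get", "hpa"))

    while not cluster_ready():
        time.sleep(10)
    print("cluster is ready for the test procedure")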

10.2 Test Procedure

With the cluster fully configured for running the tests, the test procedure can be executed. This is handled by the Locust test driver by completing the following steps.

  1. Navigate to the /integration-testing directory of the Express v1.0.0-alpha.1 download in bash.
  2. Execute the command locust --host=https://<cluster-ip> substituting <cluster-ip> for the IP address or domain name of the Kubernetes cluster being used.
  3. Open a web browser and navigate to http://localhost:8089.
  4. Enter 20,000 as the number of users to simulate and 1 as the hatch rate and click Start Swarming. The Locust test driver will now automatically perform the user simulation.
  5. The test driver should be stopped when either of the cutoff criteria is met:
    • Average sliding-window response time. The average sliding-window response time exceeds 500 ms.
    • Request Failures. The proportion of failed requests exceeds 2%.
  6. After stopping the test driver, the statistics should be downloaded by going to Locust's download tab in the web browser and downloading the following.
    • request statistics CSV
    • response time distribution CSV
    • exceptions CSV
  7. Save the statistics in a new directory along with a text document containing the number of machines in the cluster that the test was performed on.

11. Environmental Needs

11.1 Hardware Needs

Four physical or virtual machines with identical hardware specifications are required to run the Kubernetes cluster. Each machine must have a minimum of 2 GB of memory and the following minimum CPU requirements.

  1. 2 AWS vCPUs
  2. 2 GCP cores
  3. 2 Azure vCores
  4. 2 hyperthreads on a bare-metal Intel processor with Hyper-Threading

A single physical or virtual test machine is required to run the tests and interface with the Kubernetes cluster. It is recommended that the machine's CPU is at minimum an Intel Core i5-3330 or equivalent.

The test machine must be able to communicate with the four physical machines over a public or private network using HTTP. The connection must have a maximum latency of 200 ms when not under load. For this reason, it is recommended that all machines are on the same continent.

11.2 Software Needs

The four machines must meet the following software requirements.

  1. Kubernetes cluster. A Kubernetes cluster running version 1.8.5 must be deployed on the machines. Documentation on setting up a Kubernetes cluster can be found here.
  2. Helm. The Helm server must be initialized on the Kubernetes cluster using the helm init command after installing the Helm client and kubectl on the single test machine.

The single test machine must meet the following software requirements.

  1. kubectl. kubectl must be installed to interface with the Kubernetes cluster. Documentation on setting up kubectl can be found here. The kubectl config must be configured to access the deployed Kubernetes cluster.
  2. Helm. The Helm client must be installed. Documentation on setting up the Helm client can be found here.
  3. Python 3.0.
  4. Bash.

11.3 Description of Actual Testing Environment

For executing the tests, Kubernetes Engine on the Google Cloud Platform will be utilized. The node pool will be set up in the us-central1-a zone with autoscaling disabled. Four Compute Engine VM instances with 2 vCPU cores and 4 GB of memory will be utilized. The cluster will be set up to use Kubernetes v1.8.5.

To execute the tests on the Kubernetes cluster, a Mid 2014 13-inch MacBook Pro running macOS Sierra Version 10.12.6 with 16 GB of memory and a 3 GHz Intel Core i7 processor will be utilized. The University of Saskatchewan's wireless network will be used to perform the tests.

12. Responsibilities

Allan Kerr will be responsible for managing, designing, preparing and executing the tests. All test items are provided as part of the Express project and the open source projects it utilizes. Environmental needs will be satisfied with hardware provided by Allan Kerr and machines provisioned using the Google Cloud Platform.

13. Staffing and Training

Testing will be performed by Allan Kerr; however, the tests can be performed by anyone. Knowledge in the following areas is required to perform the tests.

14. Schedule

Testing will be completed by February 18, 2018. Testing may continue beyond this date; however, further testing will be performed on Express v1.0.0-alpha.2 containing changes made to Express v1.0.0-alpha.1 as a result of the tests.
