
Traffic Capture

Table of Contents

- Overview
- Tools
  - Docker Solution
  - Traffic Capture Proxy Server
  - Traffic Replayer
  - Migration Console
    - Understanding Data from the Replayer
  - Capture Kafka Offloader
- Building
  - Gradle Scans

Overview

This directory provides a suite of tools designed to facilitate the migration and upgrade of OpenSearch clusters. Each tool serves a unique function, and together they provide an end-to-end migration and upgrade solution.

Tools

Docker Solution

The Docker Solution is a containerized environment that allows for the easy setup and deployment of the other tools in this repository. For more details, check out the Docker Solution README.

Traffic Capture Proxy Server

The Traffic Capture Proxy Server acts as a middleman, capturing traffic bound for a source cluster; the captured traffic can then be used by the Traffic Replayer.

This tool can be attached to the coordinator nodes of a cluster with a minimum of two coordinator nodes and begin capturing traffic with zero downtime on the cluster. Be aware that zero downtime is only achievable if the remaining in-service nodes can handle the additional load. More details on attaching a Capture Proxy can be found here: Capture Proxy.
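As a rough sketch of what attachment looks like (the entry point and flag names below are assumptions for illustration; the Capture Proxy documentation linked above has the authoritative invocation), the proxy takes over the client-facing port, forwards requests to the real node, and offloads a copy of the traffic:

# Illustrative only: the jar name and flags are assumptions; see the
# Capture Proxy docs for the real invocation on your version.
java -jar trafficCaptureProxyServer.jar \
  --destinationUri https://localhost:19200 \
  --listenPort 9200 \
  --kafkaConnection kafka-broker:9092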

Traffic Replayer

The Traffic Replayer consumes streams of IP packets that were previously recorded by the Traffic Capture Proxy Server and replays the requests to another HTTP server, recording the packet traffic of the new interactions.

Learn more about its functionality and setup here: Traffic Replayer

Migration Console

A container with a script to run different OpenSearch Benchmark workloads is brought up as part of the solution.

The workloads are started with the Traffic Capture Proxy Server set as the target, which captures the requests sent by OpenSearch Benchmark and saves them to Kafka before they continue on to a "Source Cluster". The Traffic Capture Puller then takes the captured traffic and sends it to the Traffic Replayer. The Traffic Replayer's logs (tuples consisting of a request and a pair of responses) are then stored in persistent storage for the user's own analytics.

Note that the script must be manually started.
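For example, to start the default workload from the Migration Console shell:

./runTestBenchmarks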

Partial example output of the OpenSearch Benchmark tool:


Running opensearch-benchmark w/ 'geonames' workload...

   ____                  _____                      __       ____                  __                         __
  / __ \____  ___  ____ / ___/___  ____ ___________/ /_     / __ )___  ____  _____/ /_  ____ ___  ____ ______/ /__
 / / / / __ \/ _ \/ __ \\__ \/ _ \/ __ `/ ___/ ___/ __ \   / __  / _ \/ __ \/ ___/ __ \/ __ `__ \/ __ `/ ___/ //_/
/ /_/ / /_/ /  __/ / / /__/ /  __/ /_/ / /  / /__/ / / /  / /_/ /  __/ / / / /__/ / / / / / / / / /_/ / /  / ,<
\____/ .___/\___/_/ /_/____/\___/\__,_/_/   \___/_/ /_/  /_____/\___/_/ /_/\___/_/ /_/_/ /_/ /_/\__,_/_/  /_/|_|
/_/

[INFO] You did not provide an explicit timeout in the client options. Assuming default of 10 seconds.
[INFO] Downloading workload data (20.5 kB total size)                             [100.0%]
[INFO] Decompressing workload data from [/root/.benchmark/benchmarks/data/geonames/documents-2-1k.json.bz2] to [/root/.benchmark/benchmarks/data/geonames/documents-2-1k.json] ... [OK]
[INFO] Preparing file offset table for [/root/.benchmark/benchmarks/data/geonames/documents-2-1k.json] ... [OK]
[INFO] Executing test with workload [geonames], test_procedure [append-no-conflicts] and provision_config_instance ['external'] with version [7.10.2].

[WARNING] indexing_total_time is 14 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] refresh_total_time is 65 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
Running delete-index                                                           [100% done]
Running create-index                                                           [100% done]
Running check-cluster-health                                                   [100% done]
Running index-append                                                           [100% done]

The runTestBenchmarks tool has a few configurable options. It will attempt to guess the correct endpoint to send traffic to, and it will automatically attach the basic auth credentials admin/admin.

To set a custom endpoint, specify it with --endpoint, for example ./runTestBenchmarks --endpoint https://capture-proxy-domain.com:9200.

To set custom basic auth parameters, use --auth_user and --auth_pass. To prevent the script from attaching any auth parameters, use the --no_auth flag. This flag overrides the other auth parameters, so if you pass both --auth_user and --no_auth, no auth will be applied.
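For instance, the following invocation sends unauthenticated requests even though an auth user is supplied:

./runTestBenchmarks --auth_user admin --no_auth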

As an example of including multiple options:

./runTestBenchmarks --endpoint https://capture-proxy-domain.com:9200 --auth_pass Admin123!

will send requests to capture-proxy-domain.com using the credentials admin/Admin123!.

Support for SigV4 signing and other auth options may be added in the future.

Understanding Data from the Replayer

The Migration Console can be used to access and help interpret the data from the replayer.

The data generated by the replayer is stored on an Elastic File System volume shared between the Replayer and the Migration Console. It is mounted on the Migration Console at the path /shared_replayer_output. The Replayer generates files named output_tuples.log, which are rolled over once they hit 10 MB to a series of output_tuples-%d{yyyy-MM-dd-HH:mm}.log files.
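For example, the output directory might look like this after two rollovers (timestamps are illustrative):

$ ls /shared_replayer_output
output_tuples-2024-01-09-14:05.log
output_tuples-2024-01-09-15:12.log
output_tuples.log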

The data in these files is in JSON Lines format; each line is a log message containing a specific request-response-response tuple. The message bodies are sometimes gzipped, which makes them difficult to represent as text within JSON. Therefore, the body field of every request and response is base64 encoded before it is logged. This makes the files stable, but not human-readable.
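To illustrate the encoding, a single body can be decoded by hand with standard tools. This is a sketch that assumes the body in question is gzipped JSON; the utility script described next handles all of this automatically:

# Decode the response body of the first tuple in a log file.
# Assumes a gzipped JSON body; omit `gunzip` for uncompressed bodies.
head -n 1 output_tuples.log | jq -r '.sourceResponse.body' | base64 --decode | gunzip | jq .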

We have provided a utility script that parses these files into a human-readable format: the bodies are base64 decoded and, where applicable, parsed as JSON. The result is written back out as JSON to stdout or to a file.

To use this utility from the Migration Console:

$ console tuples show --help
Usage: console tuples show [OPTIONS]

Options:
  --in FILENAME
  --out FILENAME
  --help          Show this message and exit.

# By default, the input and output files are `stdin` and `stdout` respectively, so they can be piped together with other tools.
$ console tuples show --in /shared-logs-output/traffic-replayer-default/86ca83e66197/tuples/mini_tuples.log | jq
{
  "sourceRequest": {
    "Request-URI": "/",
    "Method": "GET",
    "HTTP-Version": "HTTP/1.1",
    "Host": "capture-proxy:9200",
    "User-Agent": "python-requests/2.32.3",
    "Accept-Encoding": "gzip, deflate, zstd",
    "Accept": "*/*",
    "Connection": "keep-alive",
    "Authorization": "Basic YWRtaW46YWRtaW4=",
    "body": ""
  },
  "sourceResponse": {
    "HTTP-Version": {
      "keepAliveDefault": true
    },
    "Status-Code": 200,
    "Reason-Phrase": "OK",
    ...
  },
  ...
}
# A specific output file can also be specified.
$ console tuples show --in /shared_replayer_output/tuples.log --out local-tuples.log
Converted tuples output to local-tuples.log

Capture Kafka Offloader

The Capture Kafka Offloader acts as a Kafka producer, offloading captured traffic logs to the configured Kafka cluster.

Learn more about its functionality and setup here: Capture Kafka Offloader
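As a quick liveness check (a sketch: the broker address and topic name below are assumptions, since both are configurable per deployment), you can confirm that captured records are arriving with the standard Kafka console consumer:

# The records are binary-encoded traffic streams, so this verifies delivery
# rather than producing human-readable output. Topic name is an assumption.
kafka-console-consumer.sh --bootstrap-server kafka-broker:9092 \
  --topic logging-traffic-topic --from-beginning --max-messages 1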

Building

The build process for this project is streamlined through Gradle. This section outlines the steps needed to build the project and run its tests.

To compile the project and execute unit tests, use the following command:

../gradlew build

This command compiles the source code and runs the quick unit tests, ensuring the project is correctly assembled and functional.

For a comprehensive test run, including both quick unit tests and more extensive slow tests, execute:

../gradlew allTests --rerun

This command initiates all tests, ensuring thorough validation of the project. The --rerun option ignores the existing task output cache for the specified tasks.

Gradle Scans

Gradle Scans offer a more intuitive view of build outputs. Publishing a scan requires acceptance of the Gradle Scan terms of service. To automate this acceptance and enable scans by default, set the OS_MIGRATIONS_GRADLE_SCAN_TOS_AGREE_AND_ENABLED environment variable:

export OS_MIGRATIONS_GRADLE_SCAN_TOS_AGREE_AND_ENABLED=

For persistent configuration in Zsh:

echo 'export OS_MIGRATIONS_GRADLE_SCAN_TOS_AGREE_AND_ENABLED=' >> ~/.zshrc

Access your detailed build reports by following the link provided at the end of your Gradle command's output.