This repo contains a simple framework for running single-node OpenSearch experiments for the k-NN plugin, using Docker Compose and OpenSearch Benchmark (OSB).
The main goal of this project is to let users run highly controlled performance tests on PoC code in an efficient yet configurable manner. Specifically, the goals are:
- Abstract away the build of the plugin and the OpenSearch Docker image - instead, provide a k-NN plugin endpoint and some versioning information, and the framework will take care of the rest
- Provide extensive profiles - for PoC experiments, it is important to get insight into what is bottlenecking the system. However, it can be a hassle to set up JFR or async profiler. The framework does this automatically
- Provide extensive metrics - Fine-grained metrics should be available for experiments
- Provide extensive telemetry - Give users extensive system metrics that can be used to understand behavior
- Provide an out-of-the-box OSB environment - set up the OSB environment so the user doesn't have to do any setup
- Make OSB extendable - PoC features will typically not be available in OSB. So, it should be possible to extend OSB to support new experimental features
The system is architected around a single Docker Compose file. This Compose file does the following:
- Build a custom test OpenSearch Docker image based on the provided GitHub repo and test branch
- Run a single-node cluster with the custom Docker image and resource constraints. The node runs with JFR enabled, along with system metrics collection and profiling
- Run a separate lightweight OpenSearch metrics cluster for OSB to output results to (this collects all metrics in addition to the final report)
- Build a custom OSB image with the provided extensions
- Run the configured OSB workload identified by the run ID and kick off an async profile on the OpenSearch process
All results are written to the directory "/tmp/share-data". This directory is readable and writable from all containers, and it must be created before execution. All results are identified by the RUN_ID parameter provided as input. See Parameters for what needs to be filled out in the env file.
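Since the shared directory has to exist before the containers start, a small helper can create it up front. This is a minimal sketch, not part of the framework: the function name `prepare_share_dir` is hypothetical, and it only creates the top-level directory and checks that the current user can write to it. Depending on which users the containers run as, the directory permissions may also need adjusting.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the framework): create the shared
# results directory and confirm the current user can write to it.
prepare_share_dir() {
  dir="${1:-/tmp/share-data}"   # default is the path named in this README
  mkdir -p "$dir" || return 1
  if [ ! -w "$dir" ]; then
    echo "directory is not writable: $dir" >&2
    return 1
  fi
  echo "$dir"
}
```

Calling `prepare_share_dir` with no arguments prepares "/tmp/share-data"; passing a path is only useful for trying the helper out somewhere else.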
# Run the test based on configuration in test.env file.
docker compose --env-file test.env -f compose.yaml up -d
# Stop the framework
docker compose --env-file test.env -f compose.yaml down
In more complex setups, you may want to write a script that starts and stops the containers. This can be done while preserving ingested data. See experiments/low-mem-knn-exp/exp-1/run.sh for an example.
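A wrapper script along these lines might look like the sketch below. It is not a copy of run.sh; the function names and the `DRY_RUN` switch are hypothetical. It relies on the fact that `docker compose stop` halts containers without removing them and `docker compose start` brings the same containers back, so data inside them survives between phases, whereas `down` removes the containers.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper sketch (not the repo's run.sh).
ENV_FILE="${ENV_FILE:-test.env}"

# Run a docker compose subcommand against compose.yaml.
# With DRY_RUN=1 the command is printed instead of executed.
compose() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo docker compose --env-file "$ENV_FILE" -f compose.yaml "$@"
  else
    docker compose --env-file "$ENV_FILE" -f compose.yaml "$@"
  fi
}

start_framework()  { compose up -d; }   # create and start all containers
pause_framework()  { compose stop; }    # stop containers, keep them and their data
resume_framework() { compose start; }   # restart the previously stopped containers
teardown()         { compose down; }    # stop and remove containers
```

For example, `DRY_RUN=1 pause_framework` prints the command that would be run, which is a cheap way to check the script before a long experiment.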
Several environment variables need to be configured in order to run the Docker Compose setup:
Key Name | Description |
---|---|
RUN_ID | Run identifier. Will be used in file names |
TEST_REPO | Link to the k-NN repo. The plugin will be built from source from here (e.g. https://github.com/opensearch-project/k-NN.git) |
TEST_BRANCH | k-NN branch name. The plugin will be built from source from this branch |
TEST_JVM | Amount of JVM memory to be used for the test container (e.g. 32g) |
TEST_CPU_COUNT | Number of CPUs the test container will get (e.g. 2) |
TEST_MEM_SIZE | Total memory limit for the test container (e.g. 4G) |
METRICS_JVM | Amount of JVM memory to be used for the metrics container (e.g. 1g) |
METRICS_CPU_COUNT | Number of CPUs the metrics container will get (e.g. 2) |
METRICS_MEM_SIZE | Total memory limit for the metrics container (e.g. 4G) |
OSB_PROCEDURE | OSB procedure to be run |
OSB_PARAMS | OSB params file to be used (include the .json extension) |
OSB_SHOULD_PROFILE | Whether profiling should be triggered for this run (true or false) |
OSB_CPU_COUNT | Number of CPUs OSB gets (e.g. 2) |
OSB_MEM_SIZE | Amount of memory OSB gets (e.g. 4g) |
OPENSEARCH_VERSION | Version of OpenSearch to use (e.g. 3.0.0 or 2.15.0) |
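To make the table concrete, the sketch below writes a sample env file and checks that every key from the table is present. Both function names are hypothetical, and the values are placeholders only: in particular `my-run-1`, `main`, `my-procedure`, and `my-params.json` are made up for illustration and need to be replaced with a real run ID, branch, procedure, and params file.

```shell
#!/usr/bin/env bash
# Keys listed in the parameter table above.
REQUIRED_KEYS="RUN_ID TEST_REPO TEST_BRANCH TEST_JVM TEST_CPU_COUNT TEST_MEM_SIZE
METRICS_JVM METRICS_CPU_COUNT METRICS_MEM_SIZE OSB_PROCEDURE OSB_PARAMS
OSB_SHOULD_PROFILE OSB_CPU_COUNT OSB_MEM_SIZE OPENSEARCH_VERSION"

# Hypothetical helper: write a sample env file with placeholder values.
write_sample_env() {
  cat > "$1" <<'EOF'
RUN_ID=my-run-1
TEST_REPO=https://github.com/opensearch-project/k-NN.git
TEST_BRANCH=main
TEST_JVM=2g
TEST_CPU_COUNT=2
TEST_MEM_SIZE=4G
METRICS_JVM=1g
METRICS_CPU_COUNT=2
METRICS_MEM_SIZE=4G
OSB_PROCEDURE=my-procedure
OSB_PARAMS=my-params.json
OSB_SHOULD_PROFILE=true
OSB_CPU_COUNT=2
OSB_MEM_SIZE=4g
OPENSEARCH_VERSION=3.0.0
EOF
}

# Hypothetical helper: print "ok" if all keys are set, else list the missing ones.
check_env_file() {
  env_file="$1"
  missing=""
  for key in $REQUIRED_KEYS; do
    if ! grep -q "^${key}=" "$env_file"; then
      missing="${missing}${key} "
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing: ${missing% }"
    return 1
  fi
  echo "ok"
}
```

Running `check_env_file test.env` before `docker compose ... up -d` catches a forgotten key early instead of partway through an image build.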
OSB is the main benchmarking framework used for k-NN in OpenSearch.
From a high level, to run an experiment, we need to fill the osb/custom/params directory with the parameters we want to use for the run. This will tell OSB what to do. See experiments for examples of different parameter configurations that can be selected.
Also, if you want custom procedures or extensions, you need to add them directly to the source in the osb directory.
For more general info about OSB and vector search, see https://github.com/opensearch-project/opensearch-benchmark-workloads/tree/main/vectorsearch.
Several system metrics are captured during runs. See test-image/utils/process-stats-collector.sh for more details. In general, the following metrics are collected from the OpenSearch process every second and written to the file "/tmp/share-data/telemetry/process-stats-${RUN_ID}.csv":
- CPU_USAGE
- MEM_USAGE
- MINOR_FAULTS
- MAJOR_FAULTS
- ANON_RSS
- ANON_FILE
It is also possible to capture full system metrics such as I/O in wrapper scripts. See experiments/low-mem-knn-exp/exp-4/io-poll.sh for an example.
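Once a run finishes, the process-stats CSV described above can be summarized with standard tools. The sketch below averages one metric column. It assumes the file is comma-separated and starts with a header row containing the metric names listed above; that layout is an assumption, so check process-stats-collector.sh for the actual format. The function name `csv_column_mean` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the mean of one named column in a CSV file.
# Assumes a comma-separated file whose first row is a header.
csv_column_mean() {
  file="$1"
  column="$2"
  awk -F',' -v col="$column" '
    NR == 1 {
      # Locate the requested column by its header name.
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      next
    }
    idx > 0 && $idx != "" { sum += $idx; n++ }
    END {
      if (n == 0) exit 1      # column not found or no data rows
      printf "%.2f\n", sum / n
    }
  ' "$file"
}
```

For example, `csv_column_mean "/tmp/share-data/telemetry/process-stats-${RUN_ID}.csv" CPU_USAGE` prints the average CPU usage over the run, and returns a non-zero status if the column is not found.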
From its entry-point, OSB will start an async profile on the OpenSearch process that is delayed 120 seconds and runs for 60 seconds to capture a flamegraph of the process.
Additionally, the OpenSearch process is run with JFR configured to give more profiling insights.
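The delayed async profile can be pictured roughly as follows. This is a sketch under assumptions, not the framework's actual entrypoint: it assumes the async-profiler `asprof` launcher is on the PATH, and the function name, the `DRY_RUN` switch, and the way the PID and output path are passed in are all hypothetical. The 120-second delay and 60-second duration are the values stated above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a delayed profile (not the framework's entrypoint).
# Arguments: <pid> <output-file> [delay-seconds] [duration-seconds]
profile_after_delay() {
  pid="$1"
  out="$2"
  delay="${3:-120}"      # wait before profiling starts
  duration="${4:-60}"    # how long the profiler samples
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print what would be run without sleeping or profiling.
    echo "sleep ${delay} && asprof -d ${duration} -f ${out} ${pid}"
    return 0
  fi
  sleep "$delay"
  # -d sets the duration in seconds, -f the output file; an .html
  # output file produces a flamegraph.
  asprof -d "$duration" -f "$out" "$pid"
}
```

Delaying the profile lets the workload get past startup before sampling begins, so the flamegraph reflects steady-state behavior rather than warm-up.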
OSB writes its results to the "/tmp/share-data/results" path. This directory will contain a CSV file with the OSB report.
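To locate the report for a given run, something like the sketch below may help. It assumes the report file name contains the RUN_ID, which follows from the note that RUN_ID is used in file names but is not confirmed for this specific file. The function name `find_run_reports` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list CSV files in a results directory whose
# names contain the given run ID.
find_run_reports() {
  results_dir="$1"
  run_id="$2"
  find "$results_dir" -type f -name "*${run_id}*.csv" | sort
}
```

For example, `find_run_reports /tmp/share-data/results "$RUN_ID"` lists the matching report files, which can then be opened in a spreadsheet or inspected with `cat`.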