- Testing a Distribution
- Testing in CI/CD
- Manifest Files
- Dependency Management
- S3 Permission Model
- Appendix
Testing is run via `./test.sh`. The following options are available:
name | description |
---|---|
`test-type` | The test suite to run: one of `integ-test`, `bwc-test`, `perf-test`. |
`test-manifest-path` | Path to a test manifest. |
`--paths` | Location of manifest(s). |
`--test-run-id` | Unique identifier for a test run. |
`--component [name ...]` | Test only the specified subset of components. |
`--keep` | Do not delete the temporary working directory on either success or failure. |
`-v, --verbose` | Show more verbose output. |
Runs integration tests by invoking `run_integ_test.py` for each component in the distribution manifest.

To run integration tests locally, use the command below. It pulls down the built bundle and its manifest file, reads all components of the distribution, and runs integration tests against each component.
Usage:

```bash
./test.sh integ-test <test-manifest-path> <target>
```
For example, build locally and run integration tests:

```bash
./build.sh manifests/1.3.5/opensearch-1.3.5.yml
./assemble.sh builds/opensearch/manifest.yml
./test.sh integ-test manifests/1.3.5/opensearch-1.3.5-test.yml . # looks for "./builds/opensearch/manifest.yml" and "./dist/opensearch/manifest.yml"
```
Or run integration tests against an existing build:

```bash
./test.sh integ-test manifests/1.3.5/opensearch-1.3.5-test.yml --paths opensearch=https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/1.3.5/5960/linux/x64/tar # looks for https://.../builds/opensearch/manifest.yml and https://.../dist/opensearch/manifest.yml
```
To run OpenSearch Dashboards integration tests:

```bash
./test.sh integ-test manifests/1.3.0/opensearch-dashboards-1.3.0-test.yml --paths opensearch=https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/1.3.5/5960/linux/x64/tar opensearch-dashboards=https://ci.opensearch.org/ci/dbc/distribution-build-opensearch-dashboards/1.3.5/4056/linux/x64/tar
```
To run OpenSearch Dashboards integration tests with local artifacts on different distributions:

```bash
./test.sh integ-test manifests/2.0.0/opensearch-dashboards-2.0.0-test.yml --paths opensearch=https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/2.0.0-rc1/latest/linux/x64/tar opensearch-dashboards=https://ci.opensearch.org/ci/dbc/distribution-build-opensearch-dashboards/2.0.0-rc1/latest/linux/x64/tar
./test.sh integ-test manifests/2.0.0/opensearch-dashboards-2.0.0-test.yml --paths opensearch=https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/2.0.0-rc1/latest/linux/x64/rpm opensearch-dashboards=https://ci.opensearch.org/ci/dbc/distribution-build-opensearch-dashboards/2.0.0-rc1/latest/linux/x64/rpm
```

For the rpm distribution, run the `./test.sh` command with sudo permissions, as rpm requires root to install and start the service.
Runs backward compatibility tests by invoking `run_bwc_test.py` for each component in a distribution manifest.
Usage:

```bash
./test.sh bwc-test <test-manifest-path> <target>
```
For example, build locally and run BWC tests:

```bash
./build.sh manifests/1.3.0/opensearch-1.3.0.yml
./assemble.sh builds/opensearch/manifest.yml
./test.sh bwc-test manifests/1.3.0/opensearch-1.3.0-test.yml . # looks for "./builds/opensearch/manifest.yml" and "./dist/opensearch/manifest.yml"
```
Or run BWC tests against an existing build:

```bash
./test.sh bwc-test manifests/1.3.0/opensearch-1.3.0-test.yml --paths opensearch=https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/2.0.0-rc1/latest/linux/x64/tar # looks for https://.../builds/opensearch/manifest.yml and https://.../dist/opensearch/manifest.yml
```
To run OpenSearch Dashboards BWC tests:

```bash
./test.sh bwc-test manifests/1.3.0/opensearch-dashboards-1.3.0-test.yml --paths opensearch-dashboards=https://ci.opensearch.org/ci/dbc/distribution-build-opensearch-dashboards/1.3.5/4056/linux/x64/tar
```
The BWC tests run at the distribution level using the same framework as OpenSearch. The test cluster is spun up with the latest distribution bundle of the provided version only when the project is initialized with the Gradle property `-PcustomDistributionDownloadType=bundle`. In this repo, the test workflow enables this Gradle property by default; see the BWC test script.

Example distribution bundle URL: https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/1.3.2/latest/linux/x64/tar/dist/opensearch/opensearch-1.3.2-linux-x64.tar.gz

BWC testing against the distribution bundle is supported for distribution versions starting with v1.3.2.
At the CI level for plugins, security certificates need to be manually imported when spinning up the test cluster, since the security plugin is included in the distribution bundle. When upgrading the version within the test cluster, `nextNodeToNextVersion` is used for a single-node upgrade and `goToNextVersion` for a full restart upgrade.
See anomaly-detection#766 or observability#1366 for more information.
TODO: Add instructions for running performance tests with test.sh
Performance tests from `test.sh` are executed using an internal service which automatically provisions hosts that run OpenSearch Benchmark. Work to open source these internal features is being tracked in opensearch-benchmark#97.
Comparable performance data can be generated by directly using OpenSearch Benchmark, assuming that the same cluster and workload setups are used. More details on the performance testing configuration used for the nightly runs can be found in OpenSearch#2461.
In addition to the standard performance tests that run on the order of hours, longevity tests are run which apply load to a cluster for days or weeks. These tests are meant to validate cluster stability over a longer timeframe. Longevity tests are also executed using OpenSearch Benchmark, using a modified version of the nyc_taxis workload that repeats the schedule for hundreds of iterations.
Before trying to identify a performance regression, a set of baseline tests should be run to establish expected values for performance metrics and to understand the variance between tests of the same configuration. Performance regressions are primarily determined based on decreased indexing throughput and/or increased query latency. Some amount of variance is expected between any two tests. Empirically, tests of the same configuration generally differ by about 5% of the mean for average indexing throughput and by about 10% of the mean for p90 or p99 query latency. Note that these values may vary depending on the underlying hardware of the cluster and the workload being used.
If performance metrics for a certain testing configuration consistently fall outside the range created by the expected value for a metric +/- the standard deviation for the metric in the baseline tests then there is likely a performance regression.
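As a concrete sketch, this bound check can be expressed in a small shell helper. This is not part of `test.sh`; the helper name and the numbers below are illustrative only (roughly a 5% standard deviation on mean indexing throughput and 10% on p90 query latency).

```bash
# Hypothetical helper, not part of test.sh: flag a possible regression when a
# measured metric falls outside expected value +/- standard deviation.
check_metric() {
  local expected=$1 stdev=$2 measured=$3
  if awk -v e="$expected" -v s="$stdev" -v m="$measured" \
      'BEGIN { exit !(m < e - s || m > e + s) }'; then
    echo "possible regression"
  else
    echo "within expected range"
  fi
}

# Illustrative values only.
check_metric 30554 1528 28000   # indexing throughput below mean - stdev
check_metric 431 43 440         # p90 latency inside mean +/- stdev
```

In practice you would average several runs before drawing a conclusion, since a single run falling outside the bound is expected occasionally.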
The nightly performance runs use the nyc_taxis workload with 2 warmup and 3 test iterations; tests using this configuration can also use the particular values defined in this section for identifying performance regression.
Additionally, error rates can be indicative of a performance regression. Error rates on the order of 0.01% are acceptable, though higher values are cause for concern. High error rates may point to issues with cluster availability or a change in the logic for processing a specific operation.
For tests using OpenSearch Benchmark with an external OpenSearch cluster configured as the data store, more details on the cause of the errors can be found by searching for the test execution ID in the `benchmark-metrics-*` index of the metrics data store.
Using the aggregate results from the nightly performance test runs, compare indexing and query metrics to the specifications laid out in the table below. The data for this table came from tests using OpenSearch 1.2 build #762 (arm64/x64), more details on the test setup can be found in OpenSearch#2461.
Please keep in mind the following:
- Changing the number of iterations or the workload type for a test can drastically change performance characteristics. This table is not necessarily applicable to other workload configurations.
- StDev% Mean is the standard deviation as a percentage of the mean. It is expected that metrics for a test will be +/- this value relative to the expected value. If the average of several tests consistently falls outside this bound relative to the expected value there may be a performance regression (or improvement).
- MinMax% Diff is the worst-case variance between any two tests with the same configuration. If there is a difference greater than this value, then there is likely a performance regression or an issue with the test setup. In general, comparing one-off test runs should be avoided if possible.
Instance Type | Security | Expected Indexing Throughput Avg (req/s) | Expected Indexing Error Rate | Indexing StDev% Mean | Indexing MinMax% Diff | Expected Query Latency p90 (ms) | Expected Query Latency p99 (ms) | Expected Query Error Rate | Query StDev% Mean | Query MinMax% Diff |
---|---|---|---|---|---|---|---|---|---|---|
m5.xlarge | Enabled | 30554 | 0 | ~5% | ~12% | 431 | 449 | 0 | ~10% | ~23% |
m5.xlarge | Disabled | 34472 | 0 | ~5% | ~15% | 418 | 444 | 0 | ~10% | ~25% |
m6g.xlarge | Enabled | 38625 | 0 | ~3% | ~8% | 497 | 512 | 0 | ~8% | ~23% |
m6g.xlarge | Disabled | 45447 | 0 | ~2% | ~3% | 470 | 480 | 0 | ~5% | ~15% |
Longevity tests are long running performance tests meant to measure the stability of a cluster over the course of several days or weeks. Internal tools provide dashboards for monitoring cluster behavior during these tests. Use the following steps to spot issues in automated longevity tests:
- Navigate to the Jenkins build for a longevity test.
- In the Console Output, search for `INFO:root:Test can be monitored on <link>`.
- Navigate to that link, then click the link for "Live Dashboards".
- Use the table below to monitor metrics for the test:
Metric | Health Indicators / Expected Values | Requires Investigation / Cause for Concern |
---|---|---|
Memory | sawtooth graph | upward trends |
CPU | | upward trends or rising towards 100% |
Threadpool | 0 rejections | any rejections |
Indexing Throughput | consistent rate during each test iteration | downward trends |
Query Throughput | varies based on the query being issued | downward trends between iterations |
Indexing Latency | consistent during each test iteration | upward trends |
Query Latency | varies based on the query being issued | upward trends |
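The Console Output step above can be sketched as a simple text search over a saved copy of the build log. The file name and the example URL below are assumptions for illustration; only the `INFO:root:Test can be monitored on` log line format comes from the workflow.

```bash
# Simulate a saved copy of the Jenkins Console Output (file name is assumed).
printf 'INFO:root:Test can be monitored on https://example.invalid/dashboards\n' > console.log

# Extract the monitoring link from the log line.
grep -o 'INFO:root:Test can be monitored on .*' console.log | sed 's/.*monitored on //'
```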
Runs benchmark tests against a remote open-source OpenSearch cluster using OpenSearch Benchmark.

At a high level, the benchmark test workflow uses opensearch-cluster-cdk to first set up an OpenSearch cluster (single or multi-node) and then executes opensearch-benchmark to run benchmark tests against that cluster. The performance metrics that opensearch-benchmark generates during the run are ingested into another OpenSearch cluster for further analysis and dashboarding.

The benchmark tests run nightly. If you have a feature in any released/unreleased OpenSearch version that you want to benchmark periodically, please create an issue and the team will reach out to you. If you want to run the benchmark tests locally, you can use the opensearch-cluster-cdk repo to spin up an OpenSearch cluster in your personal AWS account and then use opensearch-benchmark to run performance tests against it. Detailed instructions are available in the respective GitHub repositories.
- Check out the opensearch-build repo and open the `jenkins/opensearch/benchmark-test.jenkinsfile` file.
- Add an entry to the `parameterizedCron` section of the Jenkinsfile.
- The structure of the `parameterizedCron` section is as follows:
  - Schedule: `H <HOUR> * * *`. Edit the `HOUR` section to any hour of the day, 0-23. `H` adds a jitter to the cron to make sure multiple crons are not started together.
  - BUNDLE_MANIFEST_URL: The distribution manifest URL that contains the artifact details such as tar location, arch, build id, commit id, etc.
  - TEST_WORKLOAD: Any workload that the opensearch-benchmark-workloads repo provides; if not provided, `nyc-taxis` is used as the default.
  - SINGLE_NODE_CLUSTER: Values are `true`/`false`. Whether to run the benchmark against a single-node or a multi-node cluster.
  - USE_50_PERCENT_HEAP: Values are `true`/`false`. It is recommended to use 50 percent of physical memory as heap; keep this `true`.
  - MIN_DISTRIBUTION: Values are `true`/`false`. If the `BUNDLE_MANIFEST_URL` you provided is for a min/snapshot distribution, set this to `true`; otherwise do not provide this parameter.
  - ADDITIONAL_CONFIG: The configuration that needs to be added to `opensearch.yml` to enable your feature.
  - USER_TAGS: The metadata added to the benchmark metrics ingested in the datastore; this helps filter the metrics for each use case. Mandatory tags are `run-type:nightly,segrep:<disabled|enabled>,arch:<arm64|x64>,instance-type:<instance-type>,major-version:<3x|2x>,cluster-config:<arch>-<instance-type>-<string that will help identify the feature>`.
  - WORKLOAD_PARAMS: Additional parameters that need to be passed to the opensearch-benchmark workload.
  - To get more information on each parameter and explore more options, please visit here.
Here's a sample entry for enabling nightly runs for the `remote-store` feature:

```
H 9 * * * %BUNDLE_MANIFEST_URL=https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/2.10.0/latest/linux/arm64/tar/dist/opensearch/manifest.yml;TEST_WORKLOAD=http_logs;SINGLE_NODE_CLUSTER=false;DATA_NODE_COUNT=3;USE_50_PERCENT_HEAP=true;ENABLE_REMOTE_STORE=true;CAPTURE_SEGMENT_REPLICATION_STAT=true;USER_TAGS=run-type:nightly,segrep:enabled-with-remote-store,arch:arm64,instance-type:r6g.xlarge,major-version:2x,cluster-config:arm64-r6g.xlarge-3-data-3-shards;ADDITIONAL_CONFIG=opensearch.experimental.feature.remote_store.enabled:true cluster.remote_store.enabled:true opensearch.experimental.feature.segment_replication_experimental.enabled:true cluster.indices.replication.strategy:SEGMENT;WORKLOAD_PARAMS={"number_of_replicas":"2","number_of_shards":"3"}
```
Once you have added the configuration to the Jenkinsfile, please raise a PR and the opensearch-infra team will review it.
The CI/CD infrastructure is divided into two main workflows: `build` and `test`. The `build` workflow automates the process of generating all OpenSearch and OpenSearch Dashboards artifacts and provides them as distributions to the `test` workflow, which runs exhaustive testing on the artifacts based on the artifact type. The next section talks in detail about the test workflow.
The test workflow development is in progress. To see a previously-unfinished prototype design, see #609.
The progress of this design is tracked in meta issue #123.
This pipeline is in development. To see a previously-unfinished prototype design, see #423, #523.
The development of `test-orchestration-pipeline` is tracked by meta issue #123.
It is a Jenkins job that runs integration tests on a build artifact. It reads the artifact's composition from the associated manifest files and spins up parallel, independent integration test runs for each component built into the artifact. For instance, if the artifact is a full distribution containing all OpenSearch plugins, the job kicks off the integration test suite for each individual plugin. Each plugin's integration tests run against a dedicated single-node cluster created from the built artifact. Once all integration tests complete, the job publishes the test results to an S3 bucket.
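The fan-out described above can be illustrated with a minimal sketch: read component names from a manifest and launch one test run per component. The manifest below is a simplified stand-in (the real schema has more fields), and `echo` stands in for the real job launch.

```bash
# A simplified stand-in for a build manifest; the real schema has more fields.
cat > manifest.yml <<'EOF'
components:
  - name: OpenSearch
  - name: index-management
  - name: job-scheduler
EOF

# One independent integration test run per component listed in the manifest.
grep -o 'name: .*' manifest.yml | sed 's/name: //' | while read -r component; do
  echo "integ-test: $component"
done
```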
See the integration test configuration file and related jenkins job
The development of the `integTest` job is tracked by meta issue #818.
It is a Jenkins job that runs BWC tests on the current version and compatible BWC versions of the artifact. OpenSearch core and each plugin have their own backwards compatibility tests and a `bwctest.sh` script that is used to trigger them. Currently, only core and anomaly-detection BWC tests run; other plugins can be added once their BWC tests are ready. For example, for anomaly-detection, the job currently runs BWC tests for current version 1.1.0.0 and BWC version 1.13.0.0.

When the BWC test is triggered for a particular component, the tests set up their own cluster and test the required functionality along the upgrade paths. For the example above, a multi-node cluster starts with the BWC versions of OpenSearch and anomaly-detection installed; one or more nodes are then upgraded to the current versions of OpenSearch and anomaly-detection, and backwards compatibility is tested. Plugins would add tests for all BWC versions (similar to OpenSearch core), and these can be triggered from the bwcTest job.
See the bwc test configuration file and related jenkins job
The development of the bwc test automation is tracked by meta issue #90.
It is a Jenkins job that runs performance tests on the bundled artifact using OpenSearch Benchmark (Mensor). It reads the bundle manifest and config files and spins up a remote cluster with the bundled artifact installed on it. It runs performance tests with and without security for the specified architecture of the OpenSearch bundle. The job kicks off the single-node CDK that sets up a remote cluster, then runs the performance tests on that cluster using the Mensor APIs from the whitelisted account and the remote cluster endpoint (accessible to the Mensor system). These tests are bundle-level tests, so plugin onboarding is not a separate process: if the plugin is part of the bundle, it is already onboarded.

Once the performance tests complete (usually 5-8 hours for the nyc_taxis track), the job reports the test results and publishes a human-readable report to an S3 bucket.
See the performance test configuration file and related jenkins job
You can download the test results report using the URL below:

```
https://ci.opensearch.org/ci/dbc/perf-test/<version>/<distribution-build-number>/linux/x64/tar/test-results/<job-build-number>/perf-test/<with/without-security>/perf-test.html
```

You can download the same results in JSON format using the same URL, replacing `.html` with `.json`.
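Filling in the placeholders, constructing a download URL might look like the following. The version, build numbers, and security flag below are hypothetical values chosen for illustration; substitute your own.

```bash
# Hypothetical build identifiers; substitute your own values.
VERSION=1.3.5
DIST_BUILD=5960
JOB_BUILD=123
SECURITY=with-security

URL="https://ci.opensearch.org/ci/dbc/perf-test/${VERSION}/${DIST_BUILD}/linux/x64/tar/test-results/${JOB_BUILD}/perf-test/${SECURITY}/perf-test.html"
echo "$URL"
# Then download with, e.g.: curl -O "$URL"
```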
Note: The without-security test results might not be present for a distribution that lacks the security plugin. As of now, we only run performance tests on tarballs.
Conversion of performance test results to HTML and JSON files:

After the performance test completes, it reports back the test results along with the test ID. The results are in JSON format and are converted into a tabular HTML string using the json2html template library. They are then written to an HTML file with a `.html` filename. Along with the HTML file, a JSON file containing the raw JSON data is generated with a `.json` filename. The reason for creating a JSON file is to give the user the option to view both the data in tabular form in the HTML file and the raw JSON data in the JSON file. The generated `.html` and `.json` files are stored in the path given by the user as a command-line argument during the test-suite flow, and are then published to the S3 bucket.
The development is tracked by meta issue #126
Manifest files are configurations for a particular bundle. `test-workflow` uses three types of manifest files to run test suites.

- `test-manifest.yml` provides a list of test configurations to run against a given component in the bundle. An example of a configuration would be: run integration tests for the `index-management` plugin `with-security` and `without-security`. This manifest file serves as a support-matrix config for testing and should be updated by plugins if new components or test suites are to be added as part of the release workflow. See here.
- `build-manifest.yml` is created by the build workflow and provides a list of artifacts built as part of it. It helps `test-workflow` pull the maven and build dependencies needed to run the test suites.
- `bundle-manifest.yml` is created by the build workflow and provides a list of components packaged in a given bundle. It helps `test-workflow` identify which components should be tested for a given bundle.
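As a rough sketch, a `test-manifest.yml` entry for the `index-management` example above might look like the following. The field names here are illustrative assumptions based on the description above; consult the actual manifests in the repo for the exact schema.

```yaml
# Illustrative sketch only; check the repo's manifests for the real schema.
components:
  - name: index-management
    integ-test:
      test-configs:
        - with-security
        - without-security
```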
This section describes how `test-workflow` gets the dependencies required by plugins to run the integration test suite (this will be extended to backward compatibility tests in the future). There are two types of dependencies:

- `maven` - maven artifacts required by plugins.
- `build` - assembled zip artifacts required by plugins, e.g. the job-scheduler zip required by the index-management plugin.
Plugins depend on a number of maven artifacts to successfully run integration tests. Normally the plugin build system pulls these maven artifacts from the Maven Central repository. However, when testing unreleased candidates, these maven dependencies are not yet available in Maven Central.

To get around this, the instrumentation logic in `test-workflow` installs these dependencies into the local maven repository before kicking off the integration tests for plugins. The test workflow installs them from the S3 bucket where the build workflow publishes them during the bundle creation phase. Once these dependencies are available in the local maven repository, the plugin build system can use them to run integration tests.
Similarly, some plugins depend on other plugins and require their zip artifacts to run integration tests. For example, `index-management` requires the `job-scheduler` zip artifact. These are referred to as build dependencies and are made available by `test-workflow` to the plugins before the test starts.
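A minimal sketch of what staging a build dependency amounts to is shown below. The directory layout and file name are assumptions for illustration; they are not the workflow's actual paths.

```bash
# Simulate the workflow staging a build dependency for a dependent plugin.
# Path and file name are illustrative assumptions, not the real layout.
DEP="builds/opensearch/plugins/opensearch-job-scheduler-1.3.5.0.zip"
mkdir -p "$(dirname "$DEP")"
touch "$DEP"

# The dependent plugin's test setup can then verify the dependency is present.
[ -f "$DEP" ] && echo "build dependency staged: $DEP"
```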
This section defines how permissions are configured for reading bundles, tarballs, maven dependencies, etc. from S3 and writing the result logs back to S3 once the tests complete.

The Jenkins infrastructure is set up in an AWS account via CDK. The account provides a `test-orchestrator-role` with a permission policy to read and write from S3; see [1] test-orchestrator-role policy. The `instance-profile` role has `assume-role` permissions on this `test-orchestrator-role`, which allows the Jenkins instance to read and write from the required S3 locations.
[1] test-orchestrator-role policy
```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "<S3 bucket arn>"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject"
            ],
            "Resource": [
                "<S3 bucket arn>/builds/*"
            ]
        },
        {
            "Action": "s3:GetObject",
            "Resource": [
                "<S3 bucket arn>/bundles/*"
            ],
            "Effect": "Allow"
        },
        {
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": "<S3 bucket arn>/bundles/*/tests/*"
        }
    ]
}
```