This is an issue based on one of the proposed priorities in this RFC: #627
Background
The current implementation of the Compare API in OSB allows users to compare and analyze the results of two benchmark test executions by providing the unique IDs (UIDs) of the test executions. However, users have expressed interest in comparing aggregated results across multiple runs of the same test. Running the same test multiple times is a common practice in performance testing to account for variability and ensure consistent results. Therefore, the ability to aggregate results across multiple test runs and compare these aggregated results is essential to the performance testing experience.
Proposed Design
To address this requirement, we propose introducing a new aggregate subcommand in OSB. This subcommand will allow users to specify a list of test execution IDs, and OSB will compute weighted averages or medians for each metric across the specified test runs.
For metrics involving percentiles (e.g., query latencies), the aggregation will compute the percentile values over the combined, weighted distribution of all data points from the individual test runs. Each test run's contribution will be weighted proportionally to the number of iterations or data points in that run.
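As an illustration of this weighting scheme (a minimal sketch, not OSB's actual implementation; the function names and input shapes are assumptions), weighted means and weighted percentiles could be computed roughly like this:

```python
import numpy as np

def weighted_mean(values, weights):
    """Weighted average of a per-run summary metric (e.g., median throughput)."""
    return float(np.average(values, weights=weights))

def weighted_percentile(samples_per_run, run_weights, pct):
    """Approximate a percentile over the combined distribution,
    weighting each run's samples by that run's iteration count."""
    samples = np.concatenate(samples_per_run)
    # Spread each run's weight evenly across its own samples.
    sample_weights = np.concatenate(
        [np.full(len(s), w / len(s)) for s, w in zip(samples_per_run, run_weights)]
    )
    order = np.argsort(samples)
    samples, sample_weights = samples[order], sample_weights[order]
    cumulative = np.cumsum(sample_weights) / sample_weights.sum()
    return float(np.interp(pct / 100.0, cumulative, samples))
```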
When aggregating results across multiple test executions, a validation step will be added to ensure that the underlying workload configuration (the type of operations being performed, the data set being used, the cluster configuration, etc.) is consistent across all the test executions being aggregated. Aggregating results from different workloads or configurations could lead to misleading or invalid results.
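A minimal sketch of such a validation step, assuming each stored test execution exposes its workload and cluster configuration as a dictionary (the field names here are illustrative, not OSB's actual record schema):

```python
def validate_compatible(test_executions):
    """Refuse to aggregate runs whose workload configuration differs.

    `test_executions` is assumed to be a list of dicts with 'workload',
    'test_procedure', and 'cluster_config' fields; the real OSB records
    may use different names.
    """
    keys = ("workload", "test_procedure", "cluster_config")
    reference = {k: test_executions[0].get(k) for k in keys}
    for execution in test_executions[1:]:
        for key, expected in reference.items():
            if execution.get(key) != expected:
                raise ValueError(
                    f"Cannot aggregate: '{key}' differs across test executions "
                    f"({expected!r} vs {execution.get(key)!r})"
                )
```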
The aggregated result will be assigned a new ID for future reference and stored in a separate folder to maintain a less cluttered file system.
Example:
If we have three test executions with the following median indexing throughput values and iteration counts:
- Test Execution 1: Median Indexing Throughput = 20,000 docs/s, Iterations = 1,000
- Test Execution 2: Median Indexing Throughput = 18,000 docs/s, Iterations = 2,000
- Test Execution 3: Median Indexing Throughput = 22,000 docs/s, Iterations = 1,500
The weighted average median indexing throughput would be calculated as follows:
Weighted Sum = (20,000 * 1,000) + (18,000 * 2,000) + (22,000 * 1,500)
= 20,000,000 + 36,000,000 + 33,000,000
= 89,000,000
Total Iterations = 1,000 + 2,000 + 1,500 = 4,500
Weighted Average Median Indexing Throughput = Weighted Sum / Total Iterations
= 89,000,000 / 4,500
= 19,777.78 docs/s
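The same calculation expressed in Python (illustrative only, using the example numbers above):

```python
throughputs = [20_000, 18_000, 22_000]  # median indexing throughput, docs/s
iterations  = [1_000, 2_000, 1_500]     # iteration count per test execution

weighted_sum = sum(t * i for t, i in zip(throughputs, iterations))  # 89,000,000
total_iterations = sum(iterations)                                  # 4,500
print(round(weighted_sum / total_iterations, 2))                    # 19777.78
```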
Proposed Priority
The ability to aggregate results across multiple test runs and compare these aggregated results is a highly requested feature from users. It will significantly enhance the performance testing experience and provide more reliable and representative performance measurements, enabling users to make more informed decisions about OpenSearch configurations and optimizations.