
Add aggregate command #638

Merged (4 commits, Sep 18, 2024)
Conversation

@OVI3D0 (Member) commented Sep 11, 2024

Description

Adds the aggregate feature, allowing users to aggregate multiple test executions into one aggregated test result. It is also compatible with other features, such as compare.

Usage

To aggregate multiple test executions, you can use the aggregate command like so:
opensearch-benchmark aggregate --test-executions=<test_execution_id1>,<test_execution_id2>,...

Sample output:

   ____                  _____                      __       ____                  __                         __
  / __ \____  ___  ____ / ___/___  ____ ___________/ /_     / __ )___  ____  _____/ /_  ____ ___  ____ ______/ /__
 / / / / __ \/ _ \/ __ \\__ \/ _ \/ __ `/ ___/ ___/ __ \   / __  / _ \/ __ \/ ___/ __ \/ __ `__ \/ __ `/ ___/ //_/
/ /_/ / /_/ /  __/ / / /__/ /  __/ /_/ / /  / /__/ / / /  / /_/ /  __/ / / / /__/ / / / / / / / / /_/ / /  / ,<
\____/ .___/\___/_/ /_/____/\___/\__,_/_/   \___/_/ /_/  /_____/\___/_/ /_/\___/_/ /_/_/ /_/ /_/\__,_/_/  /_/|_|
    /_/

Aggregate test execution ID:  aggregate_results_geonames_9aafcfb8-d3b7-4583-864e-4598b5886c4f

-------------------------------
[INFO] SUCCESS (took 1 seconds)
-------------------------------

The results will then be aggregated into one test execution and stored under the ID shown.

Additional flags:

  • --test-execution-id: Define a unique ID for the aggregated test execution.
  • --results-file: Write the aggregated results to the provided file.
  • --workload-repository: Define the repository from where OSB will load workloads (default: default).
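For example, to aggregate two test executions under a custom ID and write the results to a file, the flags can be combined in a single invocation (the IDs and path are placeholders):

opensearch-benchmark aggregate --test-executions=<test_execution_id1>,<test_execution_id2> --test-execution-id=<custom_id> --results-file=<path_to_file>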

Issues Resolved

#629 #630

Testing

Tested using make it and make test.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Michael Oviedo <mikeovi@amazon.com>
@OVI3D0 OVI3D0 force-pushed the add-aggregate-command branch 3 times, most recently from 95cb97e to ba9ae77 Compare September 11, 2024 23:55
@@ -1428,7 +1428,7 @@ def as_dict(self):
             }
         }
         if self.results:
-            d["results"] = self.results.as_dict()
+            d["results"] = self.results if isinstance(self.results, dict) else self.results.as_dict()
@OVI3D0 (Member Author):

Wondering what you all think of this. I figured since it needs to be a dict anyway, we can write it to accept results that are already dicts.

@IanHoang (Collaborator) commented Sep 12, 2024

Will there be cases where self.results is already a dict? I don't think it hurts to keep it as is and reinforce it.

@OVI3D0 (Member Author):

Makes sense. Reverted this change.

@OVI3D0 OVI3D0 marked this pull request as ready for review September 12, 2024 00:25
osbenchmark/aggregator.py (outdated review thread, resolved)
@OVI3D0 OVI3D0 force-pushed the add-aggregate-command branch from ba9ae77 to 9526f06 Compare September 12, 2024 16:34
self.accumulated_iterations: Dict[str, int] = {}

# count iterations for each operation in the workload
def iterations(self) -> None:
@IanHoang (Collaborator) commented Sep 12, 2024

Nit: it might be better to rename this to be more descriptive. Something like count_iterations or count_iterations_for_each_op could work. If we go with something descriptive like the latter, we can also remove the comment above the method.

@IanHoang (Collaborator):

Another conditional: After we verify that all test execution ids have the same test procedure, we'll need to add test_procedure to the config as well so that we can use that here to collect all the operations belonging to that test procedure.

@IanHoang (Collaborator):

An edge case: line 20 gets the default iterations from the workload, but some users override this. For situations like this, we'll need to forgo the default iterations from the workload definition and count the iterations based on the count recorded in the results file:

 "test_procedure": "big5",
 "workload-params": {
  "iterations": 200, 
  "search_clients": 3000,
  "target_throughput": 3000
 }

@IanHoang (Collaborator):

Had a sync with Michael offline: for now, we will check whether the user overrode iterations. If not, we'll grab the default from the workload (as the code is doing here). Cases like time-period will be addressed in a future PR.
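A rough sketch of that check in Python (the config accessor, parameter names, and task attributes here are assumptions for illustration, not the merged code):

# Assumption: user-supplied workload params (including "iterations") are
# retrievable from the config; fall back to the workload default otherwise.
workload_params = self.config.opts("workload", "params", default_value={}, mandatory=False)
if "iterations" in workload_params:
    iterations = int(workload_params["iterations"])  # user override wins
else:
    iterations = task.iterations  # default from the workload definition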

@OVI3D0 (Member Author):

Will also open issues to add configurable iterations to workloads, in order to handle workloads run with custom iteration counts as well.

@OVI3D0 (Member Author):

> Another conditional: After we verify that all test execution ids have the same test procedure, we'll need to add test_procedure to the config as well so that we can use that here to collect all the operations belonging to that test procedure.

If we already verified the test procedures are the same, shouldn't the operations be identical as well? I'm not sure what I would do with the collected operations.

@OVI3D0 (Member Author):

> Nit: it might be better to rename this to be more descriptive. Something like count_iterations or count_iterations_for_each_op could work. If we go with something descriptive like the latter, we can also remove the comment above the method.

Fixed!
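For reference, a minimal sketch of what the renamed method might look like (the traversal and attribute names are inferred from the snippets in this thread, not copied from the merged code):

def count_iterations(self) -> None:
    # count the iterations each task contributes, keyed by task name
    loaded_workload = workload.load_workload(self.config)
    test_procedure_name = self.config.opts("workload", "test_procedure.name")
    for test_procedure in loaded_workload.test_procedures:
        if test_procedure.name == test_procedure_name:
            for task in test_procedure.schedule:
                iterations = task.iterations or 1  # assumption: default to 1 when unset
                self.accumulated_iterations[task.name] = \
                    self.accumulated_iterations.get(task.name, 0) + iterations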

osbenchmark/aggregator.py (outdated review thread, resolved)
for id in self.test_executions.keys():
    test_execution = test_store.find_by_test_execution_id(id)
    if test_execution:
        if test_execution.workload != workload:
@IanHoang (Collaborator) commented Sep 12, 2024

On top of checking the workload, we should also verify that the first test execution id's test_procedure matches the rest, since some workloads have multiple test_procedures.

For example, NYC Taxis has 4 test procedures (3 from default.json): https://github.com/opensearch-project/opensearch-benchmark-workloads/blob/main/nyc_taxis/test_procedures/default.json

If a user aggregated a group of test execution ids that use the same workload but differ in test procedures, we could still run into the issue of comparing different operations.

@OVI3D0 (Member Author):

Added a check for this 👍
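The combined check could look roughly like this (a sketch; the merged structure and wording may differ):

for id in self.test_executions.keys():
    test_execution = test_store.find_by_test_execution_id(id)
    if test_execution:
        # all test executions must share one workload and one test procedure
        if test_execution.workload != workload:
            raise ValueError(f"Incompatible workload: test {id} has workload "
                             f"'{test_execution.workload}' instead of '{workload}'")
        if test_execution.test_procedure != test_procedure:
            raise ValueError(f"Incompatible test procedure: test {id} has test procedure "
                             f"'{test_execution.test_procedure}' instead of '{test_procedure}'")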

osbenchmark/aggregator.py (outdated review thread, resolved)
@IanHoang (Collaborator) left a comment

A much-needed contribution. Left some comments.

# count iterations for each operation in the workload
def iterations(self) -> None:
    loaded_workload = workload.load_workload(self.config)
    for task in loaded_workload.test_procedures:
@IanHoang (Collaborator):

I think loaded_workload.test_procedures returns a list of test procedures rather than a single test procedure instance. If that's the case, we should update line 17 to use test_procedure instead of task. We should also update line 18 to change operation to task, and task.schedule to test_procedure.schedule.

@IanHoang (Collaborator):

If the test_procedure matches the test_procedures from the test execution ids, we'll get those tasks / operations.

@OVI3D0 (Member Author) commented Sep 17, 2024

> If the test_procedure matches the test_procedures from the test execution ids, we'll get those tasks / operations.

Missed this comment, will add a check for this 👍

        type=non_empty_list,
        required=True,
        help="Comma-separated list of TestExecution IDs to aggregate")

@IanHoang (Collaborator):

We should include other common options like --test-execution-id and --results-file.

@OVI3D0 (Member Author):

Added these! Let me know if I should add others as well.
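A sketch of how the extra options might be registered next to --test-executions (the parser variable name and defaults are assumptions; the flag descriptions come from the PR description):

aggregate_parser.add_argument(
    "--test-executions",
    type=non_empty_list,
    required=True,
    help="Comma-separated list of TestExecution IDs to aggregate")
aggregate_parser.add_argument(
    "--test-execution-id",
    help="Define a unique ID for the aggregated test execution.",
    default="")
aggregate_parser.add_argument(
    "--results-file",
    help="Write the aggregated results to the provided file.",
    default="")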


aggregated_results = self.build_aggregated_results(test_execution_store)
file_test_exe_store = FileTestExecutionStore(self.config)
file_test_exe_store.store_test_execution(aggregated_results)
@IanHoang (Collaborator):

A couple of questions:

  • Will we be storing the aggregated results to ~/.benchmark/benchmarks/test_executions or to a separate directory for aggregated results?
  • Will we also store this in an OSTestExecutionStore if the user has their benchmark.ini file configured to use an external metrics data store? If so, let's implement this in a separate PR.

@OVI3D0 (Member Author):

For now, we'll store the results to the benchmarks test_executions folder, but I can add a separate folder in a future PR.

I did some testing and this does store in an OSTestExecutionStore when my benchmark.ini file is configured to use it!

@IanHoang (Collaborator):

Awesome, sounds good!

osbenchmark/aggregator.py (4 outdated review threads, resolved)
@IanHoang (Collaborator):

It'd be nice to include a sample output of how this command is used in the PR description.

@OVI3D0 OVI3D0 force-pushed the add-aggregate-command branch 2 times, most recently from b5da7f2 to 36cbcc7 Compare September 17, 2024 21:24
@OVI3D0 (Member Author) commented Sep 17, 2024

> It'd be nice to include a sample output of how this command is used in the PR description.

Updated the description with some more detail. Let me know what you think!

Signed-off-by: Michael Oviedo <mikeovi@amazon.com>
@OVI3D0 OVI3D0 force-pushed the add-aggregate-command branch from 36cbcc7 to 9b35fcd Compare September 17, 2024 23:06
@OVI3D0 OVI3D0 requested a review from IanHoang September 18, 2024 16:43
# accumulate metrics for each task from test execution results
def results(self, test_execution: Any) -> None:
    for test_procedure in loaded_workload.test_procedures:
        if test_procedure.name == self.config.opts("workload", "test_procedure.name"):
@IanHoang (Collaborator):

It would be good practice to add error handling here in case the test_procedure from the test executions is not found in the loaded_workload.

Even though all test executions might have the same test procedure by this point, it's possible that users are using a workload (e.g. a custom workload or a modified official workload) where that test procedure isn't available.

@OVI3D0 (Member Author):

Added this!
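A sketch of the suggested error handling (names inferred from the snippet above; the merged wording may differ):

test_procedure_name = self.config.opts("workload", "test_procedure.name")
# find the test procedure recorded in the test executions, or fail loudly
matching_procedure = next((tp for tp in loaded_workload.test_procedures
                           if tp.name == test_procedure_name), None)
if matching_procedure is None:
    raise ValueError(
        f"Test procedure '{test_procedure_name}' not found in the loaded workload. "
        f"Ensure that all test ids have the same test procedure from the same workload")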

 for item in test_execution.results.get("op_metrics", []):
     task = item.get("task", "")
     self.accumulated_results.setdefault(task, {})
-    for metric in ["throughput", "latency", "service_time", "client_processing_time", "processing_time", "error_rate", "duration"]:
+    for metric in self.statistics:
@IanHoang (Collaborator):

This is clean!

self.test_executions = test_executions_dict
self.accumulated_results: Dict[str, Dict[str, List[Any]]] = {}
self.accumulated_iterations: Dict[str, int] = {}
self.statistics = ["throughput", "latency", "service_time", "client_processing_time", "processing_time", "error_rate", "duration"]
@IanHoang (Collaborator):

Nit: use metrics instead of statistics, which would standardize on the official documentation terminology: https://opensearch.org/docs/latest/benchmark/reference/metrics/metric-keys/

@OVI3D0 (Member Author):

Makes sense, updated this name.
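After the rename, the constructor line from the snippet above would read along these lines (a sketch):

# same list of accumulated values, renamed to match documentation terminology
self.metrics = ["throughput", "latency", "service_time", "client_processing_time",
                "processing_time", "error_rate", "duration"]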

if test_execution.test_procedure != test_procedure:
    raise ValueError(
        f"Incompatible test procedure: test {id} has test procedure '{test_execution.test_procedure}'\n"
        f"instead of '{test_procedure}'"
@IanHoang (Collaborator) commented Sep 18, 2024

Nit: It's good that we state what's wrong, but it'd also be nice to point the user in the right direction:
f"Ensure that all test ids have the same test procedure from the same workload"

This is especially useful to inexperienced users who are not familiar with how OSB works. This can be applied to both ValueErrors, in line 205 and lines 208-209.

@OVI3D0 (Member Author):

Added this to all the error messages.
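Combining the snippet above with the suggested guidance might produce something like this (a sketch; the merged wording may differ):

if test_execution.test_procedure != test_procedure:
    raise ValueError(
        f"Incompatible test procedure: test {id} has test procedure '{test_execution.test_procedure}'\n"
        f"instead of '{test_procedure}'. Ensure that all test ids have the same test procedure "
        f"from the same workload")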

@IanHoang (Collaborator) left a comment

Left a few additional comments but this is great work and overall looks good! Thanks for doing this.

Signed-off-by: Michael Oviedo <mikeovi@amazon.com>