
Refactoring _run_analysis in composite_analysis #1268 #1286

Closed

Conversation


@Musa-Sina-Ertugrul Musa-Sina-Ertugrul commented Oct 14, 2023

Summary

Please describe what this PR changes as concisely as possible. Link to the issue(s) that this addresses, if any.
#1268

Details and comments

Some details that should be in this section include:

See #1268 for the full discussion. The issue was created by @nkanazawa1989.

  • Why this change was necessary
  • What alternative solutions were considered and why the current solution was chosen
  • What tests and documentation have been added/updated
  • What do users and developers need to know about this change

Note that this entire PR description field will be used as the commit message upon merge, so please keep it updated along with the PR. Secondary discussions, such as intermediate testing and bug statuses that do not affect the final PR, should be in the PR comments.

PR checklist (delete when all criteria are met)

  • I have read the contributing guide CONTRIBUTING.md.
  • I have added the tests to cover my changes.
  • I have updated the documentation accordingly.
  • I have added a release note file using reno if this change needs to be documented in the release notes.


CLAassistant commented Oct 14, 2023

CLA assistant check
All committers have signed the CLA.

@nkanazawa1989 nkanazawa1989 left a comment

Just some hints to make this work. Thanks for tackling this :)

return [marginalized_data[i] for i in sorted(marginalized_data.keys())]

@staticmethod
def _format_memory(datum: Dict, composite_clbits: List):

Can you also replace this with the marginal_memory function? It is also implemented in Rust, and should be faster.
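For reference, here is a pure-Python sketch of what memory marginalization does. The helper name is hypothetical; the review comment refers to the Rust-backed `marginal_memory` function in Qiskit, which this only illustrates:

```python
def marginalize_memory(memory, indices):
    """Keep only the classical bits at ``indices`` (little-endian) of each shot.

    Illustrative stand-in for the Rust-backed marginal_memory function.
    """
    selected = sorted(indices, reverse=True)  # most-significant kept bit first
    # shot[-1] is clbit 0, shot[-2] is clbit 1, ...
    return ["".join(shot[-(i + 1)] for i in selected) for shot in memory]

# Shots from a 4-clbit parallel experiment; keep clbits 0 and 1
print(marginalize_memory(["0110", "1010", "0011"], [0, 1]))  # ['10', '10', '11']
```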

@@ -795,6 +797,116 @@ def add_data(
self._add_result_data(datum)
else:
raise TypeError(f"Invalid data type {type(datum)}.")

def _add_data(

You need to update this line instead. Having a dedicated method for marginalization is also okay.
https://github.com/Qiskit-Extensions/qiskit-experiments/blob/f3a26684196b8acb2b0ca942cb7b262ce33c9c3c/qiskit_experiments/framework/experiment_data.py#L793

The returned Qiskit Result object is formatted into a canonical dictionary format, which contains experiment data and metadata (the metadata is created by the experiment class). When you run a composite experiment, your experiment object gains a special metadata field
https://github.com/Qiskit-Extensions/qiskit-experiments/blob/f3a26684196b8acb2b0ca942cb7b262ce33c9c3c/qiskit_experiments/framework/composite/composite_experiment.py#L193-L201
The structure looks ugly when you nest composite experiments -- this indicates you need to recursively check the field, e.g. datum["metadata"]["component_metadata"][0]["metadata"]["component_metadata"][0]... -- I want to refactor this at some point, but not for now.

You would check the datum["metadata"]. When these fields appear, you can instantiate component experiment data and set them on the current instance. Then you check the next datum, and create new component data when the corresponding container is missing (for example, if you run a batch experiment of two parallel experiments, the first half of the data contains results for half of the qubits, and the latter half contains the rest). This is just a clue for the implementation, so you can try your own idea if you have one in mind :)
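The recursive check described above might be sketched like this. This is a minimal illustration, not qiskit-experiments API; the helper name and the metadata shape are assumptions:

```python
def iter_component_metadata(metadata):
    """Recursively yield leaf component-metadata dicts from nested
    "component_metadata" fields (illustrative helper, not real API)."""
    for component in metadata.get("component_metadata", []):
        if "component_metadata" in component:
            # Nested composite experiment: descend another level
            yield from iter_component_metadata(component)
        else:
            yield component

nested = {
    "component_metadata": [
        {"component_metadata": [{"physical_qubits": [0]}, {"physical_qubits": [1]}]},
        {"physical_qubits": [2]},
    ]
}
print(list(iter_component_metadata(nested)))
# [{'physical_qubits': [0]}, {'physical_qubits': [1]}, {'physical_qubits': [2]}]
```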

@nkanazawa1989 nkanazawa1989 marked this pull request as draft October 16, 2023 01:50
with self._result_data.lock:
for datum in data:
if isinstance(datum, dict):
if datum["metadata"].get("composite_metadata"):

Suggested change
if datum["metadata"].get("composite_metadata"):
if "composite_metadata" in datum["metadata"]:

Checking the value's truthiness is not necessary; a plain membership test is enough.

for datum in data:
if isinstance(datum, dict):
if datum["metadata"].get("composite_metadata"):
tmp_exp_data = ExperimentData()

You need another conditional branch here. Note that each element in the list contains the result of a composite experiment. Let's assume a nested composite experiment consisting of three inner experiments, where result_expX_circY denotes the result of the Y-th circuit of experiment X. The list of dictionaries you may receive would look like:

[
    {result_expA_circ0, result_expB_circ0},  # loop 0
    {result_expC_circ0},  # loop 1
    {result_expA_circ1, result_expB_circ1},  # loop 2
    {result_expC_circ1},  # loop 3
]

Experiments A and B are combined by a ParallelExperiment in this example. Now you need to create an ExperimentData instance with three child ExperimentData instances for A, B, and C, where each child instance includes two .data() entries for circuit 0 and circuit 1. What you are currently implementing makes a child instance for each loop.

Instead, in each loop, you need to check whether the corresponding child container is already initialized, make a new one if it does not exist or call the existing one otherwise, then add the marginalized data to the corresponding container. More specifically,

loop 0:
  - Create sub-containers for A and B and set them on the outer container
  - Marginalize the counts for A and B, and add the marginalized count data of circuit 0 to each sub-container
loop 1:
  - Create a sub-container for C and set it on the outer container
  - Skip marginalization because there is only one sub-element; add the data to the container above
loop 2:
  - Call the child data for A and B
  - Marginalize and add the data of circuit 1 to the containers above
loop 3:
  - Call the child data for C
  - Add the count data to the container above
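The reuse-or-create bookkeeping in the loops above could be sketched as follows. Plain dicts and lists stand in for ExperimentData, and the marginalization callable is a placeholder; none of the names are real API:

```python
def route_datum(children, datum, marginalize):
    """Add one composite datum to per-component child containers,
    creating a container only on first sight (illustrative sketch)."""
    indices = datum["metadata"]["composite_index"]
    sub_data = marginalize(datum) if len(indices) > 1 else [datum]
    for index, sub_datum in zip(indices, sub_data):
        children.setdefault(index, []).append(sub_datum)

# Two loops over circuits of parallel experiments A (index 0) and B (index 1)
children = {}
fake_marginalize = lambda d: [f"{d['name']}@{i}" for i in d["metadata"]["composite_index"]]
route_datum(children, {"name": "circ0", "metadata": {"composite_index": [0, 1]}}, fake_marginalize)
route_datum(children, {"name": "circ1", "metadata": {"composite_index": [0, 1]}}, fake_marginalize)
print(children)  # {0: ['circ0@0', 'circ1@0'], 1: ['circ0@1', 'circ1@1']}
```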

else:
raise TypeError(f"Invalid data type {type(datum)}.")

def __add_data(

I don't think a subroutine is necessary. You can just recursively call add_data as long as you find composite_metadata in the circuit metadata dictionary.

@@ -792,9 +838,128 @@ def add_data(
if isinstance(datum, dict):
self._result_data.append(datum)
elif isinstance(datum, Result):
self._add_result_data(datum)
if datum["metadata"]:

Result is this object, which doesn't implement the __getitem__ method.

@nkanazawa1989 nkanazawa1989 left a comment

Thanks! This is good progress.

tmp_exp_data._set_child_data(list(experiment_seperator.values()))
self._set_child_data([tmp_exp_data])

return tmp_exp_data

This is creating a new object. Instead of preparing and returning a new tmp_exp_data, you need to update self._result_data. Note that ExperimentData receives a job object from an experiment and extracts the result data by itself. This means the running experiment data instance should mutate itself through the .add_data() method.

While investigating the codebase, I found another place we need to update. Let me explain step by step. When the experiment class creates a Qiskit Job, the ExperimentData receives the result data through the following steps:

  • Call the .add_jobs() method; it creates a future object with a callback ._add_job_future() to wait for result data at this line.
  • This callback further submits another callback ._add_job_data() to the executor, which extracts data from the result.
  • In the ._add_job_data() callback, it calls another callback ._add_result_data() at this line.
  • Finally, in the ._add_result_data() callback, it directly updates self._result_data at this line.

The implementation is quite complicated right now... But please note that the final step should call this .add_data() method with the list of dictionaries, instead of appending data in each loop. This allows ExperimentData to bootstrap result data for component analysis.

if inner_datum[0]["metadata"]["experiment_type"] in experiment_seperator:
experiment_seperator[inner_datum[0]["metadata"]["experiment_type"]].add_data(inner_datum[0])
else:
experiment_seperator[inner_datum[0]["metadata"]["experiment_type"]] = ExperimentData()

Python defaultdict is convenient in this situation.
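For illustration, defaultdict removes the explicit existence check. Here a plain list stands in for the ExperimentData container; with the real class the factory would be ExperimentData instead of list:

```python
from collections import defaultdict

# One container per composite index, created automatically on first access
containers = defaultdict(list)  # with real code: defaultdict(ExperimentData)
for index, datum in [(0, "circ0"), (1, "circ0"), (0, "circ1")]:
    containers[index].append(datum)  # no "if index in containers" branch needed

print(dict(containers))  # {0: ['circ0', 'circ1'], 1: ['circ0']}
```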

if isinstance(datum, dict):
if "composite_metadata" in datum["metadata"]:
composite_flag = True
marginalized_data = self._marginalized_component_data([datum])

Since this call is always with a single element, it's much simpler to assume Dict as an input instead of List[Dict].

marginalized_data = self._marginalized_component_data([datum])
for inner_datum in marginalized_data:
#print(inner_datum)
if "experiment_type" in inner_datum[0]["metadata"]:

You need to consider the case that composite experiment is nested. This experiment_type can be some composite experiment subclass. If you encounter this situation, you need to create another layer in the experiment data tree. I suggest turning the code inside this loop into a separate protected method (or inner function) so that you can call it recursively.

marginalized_datum = self._marginalized_component_data([datum])
for inner_datum in marginalized_datum:
for inner_inner_datum in inner_datum:
experiment_seperator[datum["metadata"]["experiment_type"]].__add_data([inner_inner_datum])

Hmm interesting, did you confirm the code works as expected? This is a private method, and access to it from outside the owning class is limited. See https://stackoverflow.com/questions/9145499/calling-private-function-within-the-same-class-python.
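The restriction the reviewer refers to is Python name mangling of double-underscore attributes, which a small sketch demonstrates (class and method names here are illustrative, not the PR's actual code):

```python
class Container:
    def __add_data(self, datum):  # stored on the class as _Container__add_data
        return datum

c = Container()
# c.__add_data("x") raises AttributeError outside the class body:
# double-underscore names are mangled, which is why calling __add_data
# on another instance does not resolve as written in the PR.
print(c._Container__add_data("x"))  # 'x' -- only the mangled name works
```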


In addition, keying on datum["metadata"]["experiment_type"] will cause conflicts, because we often build a composite (parallel) experiment from the same experiment type with different qubit indices. You should use the composite index as you implemented below.

for inner_datum in marginalized_datum:
for inner_inner_datum in inner_datum:
experiment_seperator[datum["metadata"]["experiment_type"]].__add_data([inner_inner_datum])
elif "composite_metadata" in datum:

When would this evaluation return True? "composite_metadata" is part of the metadata, and when the datum is just metadata there is no count information in the object, so there is nothing to store.


elif isinstance(datum, Result):
if datum["metadata"]:
self._set_child_data(datum["metadata"]._metadata())

As I said before, Result is not a dictionary subclass and doesn't implement __getitem__. I guess this raises an error. You can just remove this if clause.


component_index = self.metadata.get("component_child_index", [])
component_expdata = [self.child_data(i) for i in component_index]
marginalized_data = self._marginalized_component_data(data)

Why do you need to call marginalize multiple times here? You've defined the experiment_separator object, but it seems to be simply discarded.


# Directly add non-job data
with self._result_data.lock:
tmp_exp_data = ExperimentData()

Can you make this more efficient? If the data is not composite type, you don't need to instantiate a new object. Instead of using the experiment_separator dictionary, you can:

  • check if "composite_index" is in the metadata; if not, just append the datum to self._result_data. Otherwise:
    • marginalize the datum
    • zip-loop over the composite_index and the marginalized datum (they should have the same length)
      • get the child experiment data for a composite index; if it does not exist (try-except), set new child data
      • store the marginalized datum in the child data container, i.e. through the add_data method of the child container; this covers the nested composite case because the composite check is done recursively.
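The flow in those bullets might be sketched roughly as follows. Plain lists and dicts stand in for ExperimentData containers, and both the function name and the marginalize callable are illustrative placeholders:

```python
def add_datum(result_data, children, datum, marginalize):
    """Sketch of the proposed flow: fast path for non-composite data,
    reuse-or-create child containers otherwise (illustrative only)."""
    metadata = datum["metadata"]
    if "composite_index" not in metadata:
        result_data.append(datum)            # non-composite: store directly
        return
    for index, sub_datum in zip(metadata["composite_index"], marginalize(datum)):
        try:
            child = children[index]          # reuse existing child container
        except KeyError:
            child = children[index] = []     # create it on first sight
        child.append(sub_datum)              # real code: child.add_data(sub_datum)

result_data, children = [], {}
add_datum(result_data, children, {"metadata": {}}, None)
add_datum(result_data, children, {"metadata": {"composite_index": [0, 1]}}, lambda d: ["a", "b"])
print(result_data, children)  # [{'metadata': {}}] {0: ['a'], 1: ['b']}
```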

@nkanazawa1989 nkanazawa1989 left a comment

Hi @Musa-Sina-Ertugrul we are getting very close! You likely have a mistake in the implementation of composite analysis; let's focus on that in this review round. After you resolve it, the skeleton of the new implementation will be almost good, and we can focus on some details for performance gain and readability.

What I have in mind now is the use of an iterator over child data and the Rust version of the memory formatter. But I'll write about this in the next review. Let's finish step by step :)

Comment on lines 807 to 810
for inner_datum in datum["metadata"]["composite_metadata"]:
if "composite_index" in inner_datum:
for sub_expdata in composite_expdata:
self.add_data(inner_datum)

I guess you wanted to handle the case where a composite experiment is nested, but the nesting may be deep, e.g. CompositeExperiment(BatchExperiment(BatchExperiment(...))), so this logic is not enough. Actually, you've already managed this case above with

sub_expdata.add_data(sub_data)

where sub_data is the marginalized inner experiment result. If it further contains inner results, they are recursively handled by calling sub_expdata.add_data(). Note that a composite experiment doesn't need experiment results (it just runs the inner analyses), so you don't need to store inner results in the outermost container. I think this reduces the size of the entire object.

qiskit_experiments/framework/experiment_data.py (outdated, resolved)
for inner_datum in marginalized_datum:
new_child.add_data(inner_datum)

self._result_data.append(datum)

You can also remove this line, because composite analysis doesn't require the results data. If some tests fail with this change, you can remove them and describe the new behavior in the release note. You also need to add a new unit test to check that the inner data containers are created immediately.

elif isinstance(datum, Result):
self._add_result_data(datum)
else:
raise TypeError(f"Invalid data type {type(datum)}.")


def _marginalized_component_data(self, composite_data: List[Dict]) -> List[List[Dict]]:

As I mentioned before, with the new implementation the input data length is always 1, so you can at least remove the outer loop.

@@ -142,7 +142,26 @@ def run(
def _run_analysis(self, experiment_data: ExperimentData):
# Return list of experiment data containers for each component experiment
# containing the marginalized data from the composite experiment
component_expdata = self._component_experiment_data(experiment_data)
component_expdata = []

Actually you should remove all of these lines. You've already marginalized the data and bootstrapped the child container preparation (and this is why you needed to append the datum to self._result_data). What you are doing here is:
(1) Prepare child data with add_data, but keep the original (un-marginalized) data in the outer container.
(2) Ignore the prepared child data and prepare child data again in composite analysis.
Indeed this is very inefficient (you just introduced extra overhead of data preparation).

In the _run_analysis, you just need to run

if len(self.analyses) != len(experiment_data.child_data()):
    raise RuntimeError(
        "Number of composite analyses and child results don't match. "
        "Please check your circuit metadata."
    )
for inner_analysis, child_data in zip(self.analyses, experiment_data.child_data()):
    inner_analysis.run(child_data, replace_results=True)

In principle you can remove almost all protected methods of the composite analysis, since this PR delegates the responsibility of inner data preparation to ExperimentData. This simplification helps the refactoring of composite analysis.

@nkanazawa1989

Can we close this PR in favor of #1381?
