
Refactoring _run_analysis in composite_analysis #1268 #1286

Closed

Conversation


@Musa-Sina-Ertugrul Musa-Sina-Ertugrul commented Oct 14, 2023

Summary

Please describe what this PR changes as concisely as possible. Link to the issue(s) that this addresses, if any.
#1268

Details and comments

Some details that should be in this section include:

See #1268 for the full discussion. The issue was created by @nkanazawa1989.

  • Why this change was necessary
  • What alternative solutions were considered and why the current solution was chosen
  • What tests and documentation have been added/updated
  • What do users and developers need to know about this change

Note that this entire PR description field will be used as the commit message upon merge, so please keep it updated along with the PR. Secondary discussions, such as intermediate testing and bug statuses that do not affect the final PR, should be in the PR comments.

PR checklist (delete when all criteria are met)

  • I have read the contributing guide CONTRIBUTING.md.
  • I have added the tests to cover my changes.
  • I have updated the documentation accordingly.
  • I have added a release note file using reno if this change needs to be documented in the release notes.


CLAassistant commented Oct 14, 2023

CLA assistant check
All committers have signed the CLA.

@nkanazawa1989 nkanazawa1989 left a comment

Just some hints to make this work. Thanks for tackling this :)

return [marginalized_data[i] for i in sorted(marginalized_data.keys())]

@staticmethod
def _format_memory(datum: Dict, composite_clbits: List):

Can you also replace this with the marginal_memory function? It is also implemented in Rust, and should be faster.
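For reference, here is a pure-Python sketch of what memory marginalization does. The helper name is hypothetical; the review comment refers to the Rust-backed `marginal_memory` function in Qiskit, which this only illustrates:

```python
def marginalize_memory(memory, indices):
    """Keep only the classical bits at ``indices`` (little-endian) of each shot.

    Illustrative stand-in for the Rust-backed marginal_memory function.
    """
    selected = sorted(indices, reverse=True)  # most-significant kept bit first
    # shot[-1] is clbit 0, shot[-2] is clbit 1, ...
    return ["".join(shot[-(i + 1)] for i in selected) for shot in memory]

# Shots from a 4-clbit parallel experiment; keep clbits 0 and 1
print(marginalize_memory(["0110", "1010", "0011"], [0, 1]))  # ['10', '10', '11']
```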

@@ -795,6 +797,116 @@ def add_data(
self._add_result_data(datum)
else:
raise TypeError(f"Invalid data type {type(datum)}.")

def _add_data(

You need to update this line instead. Having a dedicated method for marginalization is also okay.
https://github.com/Qiskit-Extensions/qiskit-experiments/blob/f3a26684196b8acb2b0ca942cb7b262ce33c9c3c/qiskit_experiments/framework/experiment_data.py#L793

The returned Qiskit Result object is formatted into a canonical dictionary format, which contains experiment data and metadata (the metadata is created by the experiment class). When you run a composite experiment, your experiment object gains a special metadata field
https://github.com/Qiskit-Extensions/qiskit-experiments/blob/f3a26684196b8acb2b0ca942cb7b262ce33c9c3c/qiskit_experiments/framework/composite/composite_experiment.py#L193-L201
The structure looks ugly when you nest composite experiments -- this indicates you need to recursively check the field, e.g. datum["metadata"]["component_metadata"][0]["metadata"]["component_metadata"][0]... -- I want to refactor this at some point, but not for now.

You would check the datum["metadata"]. When these fields appear, you can instantiate component experiment data and set them on the current instance. Then you check the next datum, and create new component data when the corresponding container is missing (for example, if you run a batch experiment of two parallel experiments, the first half of the data contains results for half of the qubits, and the latter half contains the rest). This is just a clue for the implementation, so you can try your own idea if you have one in mind :)
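The recursive check described above might be sketched like this. This is a minimal illustration, not qiskit-experiments API; the helper name and the metadata shape are assumptions:

```python
def iter_component_metadata(metadata):
    """Recursively yield leaf component-metadata dicts from nested
    "component_metadata" fields (illustrative helper, not real API)."""
    for component in metadata.get("component_metadata", []):
        if "component_metadata" in component:
            # Nested composite experiment: descend another level
            yield from iter_component_metadata(component)
        else:
            yield component

nested = {
    "component_metadata": [
        {"component_metadata": [{"physical_qubits": [0]}, {"physical_qubits": [1]}]},
        {"physical_qubits": [2]},
    ]
}
print(list(iter_component_metadata(nested)))
# [{'physical_qubits': [0]}, {'physical_qubits': [1]}, {'physical_qubits': [2]}]
```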

@nkanazawa1989 nkanazawa1989 marked this pull request as draft October 16, 2023 01:50
with self._result_data.lock:
for datum in data:
if isinstance(datum, dict):
if datum["metadata"].get("composite_metadata"):

Suggested change
if datum["metadata"].get("composite_metadata"):
if "composite_metadata" in datum["metadata"]:

Checking the value's truthiness is not necessary; a plain membership test is enough.

for datum in data:
if isinstance(datum, dict):
if datum["metadata"].get("composite_metadata"):
tmp_exp_data = ExperimentData()

You need another conditional branch here. Note that each element in the list contains the result of a composite experiment. Let's assume a nested composite experiment consisting of three inner experiments, where result_expX_circY denotes the result of the Y-th circuit of experiment X. The list of dictionaries you may receive would look like:

[
    {result_expA_circ0, result_expB_circ0},  # loop 0
    {result_expC_circ0},  # loop 1
    {result_expA_circ1, result_expB_circ1},  # loop 2
    {result_expC_circ1},  # loop 3
]

Experiments A and B are combined by a ParallelExperiment in this example. Now you need to create an ExperimentData instance with three child ExperimentData instances for A, B, and C, where each child instance includes two .data() entries for circuit 0 and circuit 1. What you are currently implementing makes a child instance for each loop.

Instead, in each loop, you need to check whether the corresponding child container is already initialized, make a new one if it does not exist or call the existing one otherwise, then add the marginalized data to the corresponding container. More specifically,

loop 0:
  - Create sub-containers for A and B and set them on the outer container
  - Marginalize the counts for A and B, and add the marginalized count data of circuit 0 to each sub-container
loop 1:
  - Create a sub-container for C and set it on the outer container
  - Skip marginalization because there is only one sub-element; add the data to the container above
loop 2:
  - Call the child data for A and B
  - Marginalize and add the data of circuit 1 to the containers above
loop 3:
  - Call the child data for C
  - Add the count data to the container above
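The reuse-or-create bookkeeping in the loops above could be sketched as follows. Plain dicts and lists stand in for ExperimentData, and the marginalization callable is a placeholder; none of the names are real API:

```python
def route_datum(children, datum, marginalize):
    """Add one composite datum to per-component child containers,
    creating a container only on first sight (illustrative sketch)."""
    indices = datum["metadata"]["composite_index"]
    sub_data = marginalize(datum) if len(indices) > 1 else [datum]
    for index, sub_datum in zip(indices, sub_data):
        children.setdefault(index, []).append(sub_datum)

# Two loops over circuits of parallel experiments A (index 0) and B (index 1)
children = {}
fake_marginalize = lambda d: [f"{d['name']}@{i}" for i in d["metadata"]["composite_index"]]
route_datum(children, {"name": "circ0", "metadata": {"composite_index": [0, 1]}}, fake_marginalize)
route_datum(children, {"name": "circ1", "metadata": {"composite_index": [0, 1]}}, fake_marginalize)
print(children)  # {0: ['circ0@0', 'circ1@0'], 1: ['circ0@1', 'circ1@1']}
```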

else:
raise TypeError(f"Invalid data type {type(datum)}.")

def __add_data(

I don't think a subroutine is necessary. You can just recursively call add_data as long as you find composite_metadata in the circuit metadata dictionary.

@@ -792,9 +838,128 @@ def add_data(
if isinstance(datum, dict):
self._result_data.append(datum)
elif isinstance(datum, Result):
self._add_result_data(datum)
if datum["metadata"]:

Result is this object, which doesn't implement the __getitem__ method.

@nkanazawa1989 nkanazawa1989 left a comment

Thanks! This is good progress.

tmp_exp_data._set_child_data(list(experiment_seperator.values()))
self._set_child_data([tmp_exp_data])

return tmp_exp_data

This is creating a new object. Instead of preparing and returning a new tmp_exp_data, you need to update self._result_data. Note that ExperimentData receives a job object from an experiment and extracts the result data by itself. This means the running experiment data instance should mutate itself through the .add_data() method.

While investigating the codebase, I found another place we need to update. Let me explain step by step. When the experiment class creates a Qiskit Job, the ExperimentData receives the result data through the following steps:

  • Call the .add_jobs() method; it creates a future object with a callback ._add_job_future() to wait for result data at this line.
  • This callback further submits another callback ._add_job_data() to the executor, which extracts data from the result.
  • In the ._add_job_data() callback, it calls another callback ._add_result_data() at this line.
  • Finally, in the ._add_result_data() callback, it directly updates self._result_data at this line.

The implementation is quite complicated right now... But please note that the final step should call this .add_data() method with the list of dictionaries, instead of appending data in each loop. This allows ExperimentData to bootstrap result data for component analysis.

if inner_datum[0]["metadata"]["experiment_type"] in experiment_seperator:
experiment_seperator[inner_datum[0]["metadata"]["experiment_type"]].add_data(inner_datum[0])
else:
experiment_seperator[inner_datum[0]["metadata"]["experiment_type"]] = ExperimentData()

Python defaultdict is convenient in this situation.
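For illustration, defaultdict removes the explicit existence check. Here a plain list stands in for the ExperimentData container; with the real class the factory would be ExperimentData instead of list:

```python
from collections import defaultdict

# One container per composite index, created automatically on first access
containers = defaultdict(list)  # with real code: defaultdict(ExperimentData)
for index, datum in [(0, "circ0"), (1, "circ0"), (0, "circ1")]:
    containers[index].append(datum)  # no "if index in containers" branch needed

print(dict(containers))  # {0: ['circ0', 'circ1'], 1: ['circ0']}
```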

if isinstance(datum, dict):
if "composite_metadata" in datum["metadata"]:
composite_flag = True
marginalized_data = self._marginalized_component_data([datum])

Since this call is always with a single element, it's much simpler to assume Dict as an input instead of List[Dict].

marginalized_data = self._marginalized_component_data([datum])
for inner_datum in marginalized_data:
#print(inner_datum)
if "experiment_type" in inner_datum[0]["metadata"]:

You need to consider the case that composite experiment is nested. This experiment_type can be some composite experiment subclass. If you encounter this situation, you need to create another layer in the experiment data tree. I suggest turning the code inside this loop into a separate protected method (or inner function) so that you can call it recursively.

marginalized_datum = self._marginalized_component_data([datum])
for inner_datum in marginalized_datum:
for inner_inner_datum in inner_datum:
experiment_seperator[datum["metadata"]["experiment_type"]].__add_data([inner_inner_datum])

Hmm interesting, did you confirm the code works as expected? This is a private method, and access to it from outside the owning class is limited. See https://stackoverflow.com/questions/9145499/calling-private-function-within-the-same-class-python.
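The restriction the reviewer refers to is Python name mangling of double-underscore attributes, which a small sketch demonstrates (class and method names here are illustrative, not the PR's actual code):

```python
class Container:
    def __add_data(self, datum):  # stored on the class as _Container__add_data
        return datum

c = Container()
# c.__add_data("x") raises AttributeError outside the class body:
# double-underscore names are mangled, which is why calling __add_data
# on another instance does not resolve as written in the PR.
print(c._Container__add_data("x"))  # 'x' -- only the mangled name works
```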


In addition, keying on datum["metadata"]["experiment_type"] will cause conflicts, because we often build a composite (parallel) experiment from the same experiment type with different qubit indices. You should use the composite index as you implemented below.

for inner_datum in marginalized_datum:
for inner_inner_datum in inner_datum:
experiment_seperator[datum["metadata"]["experiment_type"]].__add_data([inner_inner_datum])
elif "composite_metadata" in datum:

When would this evaluation return True? "composite_metadata" is part of the metadata, and when the datum is just metadata there is no count information in the object, so there is nothing to store.


elif isinstance(datum, Result):
if datum["metadata"]:
self._set_child_data(datum["metadata"]._metadata())

As I said before, Result is not a dictionary subclass and doesn't implement __getitem__. I guess this raises an error. You can just remove this if clause.


component_index = self.metadata.get("component_child_index", [])
component_expdata = [self.child_data(i) for i in component_index]
marginalized_data = self._marginalized_component_data(data)

Why do you need to call marginalize multiple times here? You've defined the experiment_separator object, but it seems to be simply discarded.


# Directly add non-job data
with self._result_data.lock:
tmp_exp_data = ExperimentData()

Can you make this more efficient? If the data is not composite type, you don't need to instantiate a new object. Instead of using the experiment_separator dictionary, you can:

  • check if "composite_index" is in the metadata; if not, just append the datum to self._result_data. Otherwise:
    • marginalize the datum
    • zip-loop over the composite_index and the marginalized datum (they should have the same length)
      • get the child experiment data for a composite index; if it does not exist (try-except), set new child data
      • store the marginalized datum in the child data container, i.e. through the add_data method of the child container; this covers the nested composite case because the composite check is done recursively.
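The flow in those bullets might be sketched roughly as follows. Plain lists and dicts stand in for ExperimentData containers, and both the function name and the marginalize callable are illustrative placeholders:

```python
def add_datum(result_data, children, datum, marginalize):
    """Sketch of the proposed flow: fast path for non-composite data,
    reuse-or-create child containers otherwise (illustrative only)."""
    metadata = datum["metadata"]
    if "composite_index" not in metadata:
        result_data.append(datum)            # non-composite: store directly
        return
    for index, sub_datum in zip(metadata["composite_index"], marginalize(datum)):
        try:
            child = children[index]          # reuse existing child container
        except KeyError:
            child = children[index] = []     # create it on first sight
        child.append(sub_datum)              # real code: child.add_data(sub_datum)

result_data, children = [], {}
add_datum(result_data, children, {"metadata": {}}, None)
add_datum(result_data, children, {"metadata": {"composite_index": [0, 1]}}, lambda d: ["a", "b"])
print(result_data, children)  # [{'metadata': {}}] {0: ['a'], 1: ['b']}
```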

@nkanazawa1989 nkanazawa1989 left a comment

Hi @Musa-Sina-Ertugrul we are getting very close! You likely have a mistake in the implementation of composite analysis; let's focus on that in this review round. After you resolve it, the skeleton of the new implementation will be almost good, and we can focus on some details for performance gain and readability.

What I have in mind now is the use of an iterator over child data and the Rust version of the memory formatter. But I'll write about this in the next review. Let's finish step by step :)

Comment on lines 807 to 810
for inner_datum in datum["metadata"]["composite_metadata"]:
if "composite_index" in inner_datum:
for sub_expdata in composite_expdata:
self.add_data(inner_datum)

I guess you wanted to handle the case where a composite experiment is nested, but the nesting may be deep, e.g. CompositeExperiment(BatchExperiment(BatchExperiment(...))), so this logic is not enough. Actually, you've already managed this case above with

sub_expdata.add_data(sub_data)

where sub_data is the marginalized inner experiment result. If it further contains inner results, they are recursively handled by calling sub_expdata.add_data(). Note that a composite experiment doesn't need experiment results (it just runs the inner analyses), so you don't need to store inner results in the outermost container. I think this reduces the size of the entire object.

qiskit_experiments/framework/experiment_data.py (outdated, resolved)
for inner_datum in marginalized_datum:
new_child.add_data(inner_datum)

self._result_data.append(datum)

You can also remove this line, because composite analysis doesn't require the results data. If some tests fail with this change, you can remove them and describe the new behavior in the release note. You also need to add a new unit test to check that the inner data containers are created immediately.

elif isinstance(datum, Result):
self._add_result_data(datum)
else:
raise TypeError(f"Invalid data type {type(datum)}.")


def _marginalized_component_data(self, composite_data: List[Dict]) -> List[List[Dict]]:

As I mentioned before, with the new implementation the input data length is always 1, so you can at least remove the outer loop.

@@ -142,7 +142,26 @@ def run(
def _run_analysis(self, experiment_data: ExperimentData):
# Return list of experiment data containers for each component experiment
# containing the marginalized data from the composite experiment
component_expdata = self._component_experiment_data(experiment_data)
component_expdata = []

Actually you should remove all of these lines. You've already marginalized the data and bootstrapped the child container preparation (and this is why you needed to append the datum to self._result_data). What you are doing here is:
(1) Prepare child data with add_data, but keep the original (un-marginalized) data in the outer container.
(2) Ignore the prepared child data and prepare child data again in composite analysis.
Indeed this is very inefficient (you just introduced extra overhead of data preparation).

In the _run_analysis, you just need to run

if len(self.analyses) != len(experiment_data.child_data()):
    raise RuntimeError(
        "Number of composite analyses and child results don't match. "
        "Please check your circuit metadata."
    )
for inner_analysis, child_data in zip(self.analyses, experiment_data.child_data()):
    inner_analysis.run(child_data, replace_results=True)

In principle you can remove almost all protected methods of the composite analysis, since this PR delegates the responsibility of inner data preparation to ExperimentData. This simplification helps the refactoring of composite analysis.

@nkanazawa1989

Can we close this PR in favor of #1381?
