Refactoring _run_analysis in composite_analysis #1268 #1286
Conversation
Just some hints to make this work. Thanks for tackling this :)
return [marginalized_data[i] for i in sorted(marginalized_data.keys())]

    @staticmethod
    def _format_memory(datum: Dict, composite_clbits: List):
Can you also replace this with the marginal_memory function? This is also implemented in Rust and should be faster.
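For reference, a minimal sketch of what that replacement could look like, assuming qiskit.result.marginal_memory takes the raw single-shot memory list plus the clbit indices to keep (the sample values below are purely illustrative):

from qiskit.result import marginal_memory

# Toy single-shot memory (hex strings, as returned for level-2 measurements)
memory = ["0x9", "0xa", "0x3"]
composite_clbits = [0, 1]  # clbits that belong to this component experiment

# Keep only the component's clbits; the heavy lifting is done in Rust
component_memory = marginal_memory(memory, indices=composite_clbits)
print(component_memory)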
@@ -795,6 +797,116 @@ def add_data(
            self._add_result_data(datum)
        else:
            raise TypeError(f"Invalid data type {type(datum)}.")

    def _add_data(
You need to update this line instead. Having a dedicated method for marginalization is also okay.
https://github.com/Qiskit-Extensions/qiskit-experiments/blob/f3a26684196b8acb2b0ca942cb7b262ce33c9c3c/qiskit_experiments/framework/experiment_data.py#L793
The returned Qiskit Result object is formatted into a canonical dictionary format, which contains experiment data and metadata (the metadata is created by the experiment class). When you run a composite experiment, your experiment object gains a special metadata field:
https://github.com/Qiskit-Extensions/qiskit-experiments/blob/f3a26684196b8acb2b0ca942cb7b262ce33c9c3c/qiskit_experiments/framework/composite/composite_experiment.py#L193-L201
The structure looks ugly when you nest composite experiments; this indicates you need to recursively check the field, e.g. datum["metadata"]["component_metadata"][0]["metadata"]["component_metadata"][0]... (I want to refactor this at some point, but not for now).
You would check datum["metadata"]. When these fields appear, you can instantiate component experiment data and set them to the current instance. Then you check the next datum, and create new component data when the corresponding container is missing (for example, if you run a batch experiment of two parallel experiments, the first half of the data contains results for half of the qubits, and the latter half contains the rest of the qubits). This is just a clue for the implementation, so you can try your own idea if you have one in mind :)
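Just to make the recursion concrete, here is a small hypothetical helper that walks the nesting along the exact path spelled out above (datum["metadata"]["component_metadata"][i]["metadata"]...); the function name and the assumption that every level uses the same two keys are mine, not the library's:

def iter_component_metadata(metadata, level=0):
    # Yield (depth, component-metadata) pairs, recursing whenever an inner
    # experiment is itself a composite experiment.
    for component in metadata.get("component_metadata", []):
        yield level, component
        yield from iter_component_metadata(component.get("metadata", {}), level + 1)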
with self._result_data.lock:
    for datum in data:
        if isinstance(datum, dict):
            if datum["metadata"].get("composite_metadata"):
if datum["metadata"].get("composite_metadata"): | |
if "composite_metadata" in datum["metadata"]: |
if value is not necessary
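The difference shows up when the key is present but its value is falsy (standalone illustration, not code from this PR):

metadata = {"composite_metadata": []}        # key exists, value is empty
print(metadata.get("composite_metadata"))    # [] is falsy, so the branch is skipped
print("composite_metadata" in metadata)      # True, so the branch is taken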
for datum in data:
    if isinstance(datum, dict):
        if datum["metadata"].get("composite_metadata"):
            tmp_exp_data = ExperimentData()
You need another conditional branch here. Note that each element in the list contains the result of a composite experiment. Let's assume a nested composite experiment consisting of three inner experiments, where result_expX_circY indicates the result of the Y-th circuit of experiment X. The list of dictionaries you receive would look like:

[
    {result_expA_circ0, result_expB_circ0},  # loop 0
    {result_expC_circ0},                     # loop 1
    {result_expA_circ1, result_expB_circ1},  # loop 2
    {result_expC_circ1},                     # loop 3
]

Experiments A and B are combined by ParallelExperiment in this example. Now you need to create an ExperimentData instance with three child ExperimentData instances for A, B, and C, and each child instance includes two .data() entries, for circuit 0 and circuit 1. What you are currently implementing creates a child instance for each loop.
Instead, in each loop you need to check whether the corresponding child container is already initialized, create a new one if it doesn't exist or otherwise just call the existing one, and then add the marginalized data to the corresponding container. More specifically (a toy sketch follows this list):

loop 0:
- Create sub-containers for A and B and set them to the outer container
- Marginalize the counts for A and B, and add the marginalized count data of circuit 0 to each sub-container
loop 1:
- Create a sub-container for C and set it to the outer container
- Skip marginalization because the number of sub-elements is 1, and add the data to the above
loop 2:
- Call the child data for A and B
- Marginalize and add the data of circuit 1 to the above
loop 3:
- Call the child data for C
- Add the count data to the above
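A toy version of that bookkeeping, keeping one container per composite index and reusing it across loop iterations (the dict layout of the toy data and the direct use of add_data are simplifications, not the final design):

from qiskit_experiments.framework import ExperimentData

# Two loop iterations of a parallel experiment with components 0 and 1
data = [
    {"metadata": {"composite_index": [0, 1]}, "counts": {"00": 10, "11": 10}},
    {"metadata": {"composite_index": [0, 1]}, "counts": {"01": 10, "10": 10}},
]
containers = {}  # composite index -> child container
for datum in data:
    for index in datum["metadata"]["composite_index"]:
        if index not in containers:              # first encounter: create it
            containers[index] = ExperimentData()
        # a real implementation would add the *marginalized* counts here
        containers[index].add_data({"metadata": {}, "counts": datum["counts"]})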
        else:
            raise TypeError(f"Invalid data type {type(datum)}.")

    def __add_data(
I don't think a subroutine is necessary. You can just call add_data recursively as long as you find composite_metadata in the circuit metadata dictionary.
@@ -792,9 +838,128 @@ def add_data(
        if isinstance(datum, dict):
            self._result_data.append(datum)
        elif isinstance(datum, Result):
            self._add_result_data(datum)
            if datum["metadata"]:
Result is this object, which doesn't implement the __getitem__ method.
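A quick standalone check of that (the constructor arguments are just the minimum needed to build an empty Result):

from qiskit.result import Result

result = Result(backend_name="sim", backend_version="0.0", qobj_id="0",
                job_id="0", success=True, results=[])
print(hasattr(result, "__getitem__"))  # False
# result["metadata"] would raise TypeError: 'Result' object is not subscriptable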
Thanks! This is good progress.
tmp_exp_data._set_child_data(list(experiment_seperator.values()))
self._set_child_data([tmp_exp_data])

return tmp_exp_data
This is creating a new object. Instead of preparing and returning a new tmp_exp_data, you need to update self._result_data. Note that ExperimentData receives the job object from an experiment and extracts the result data by itself. This means the running experiment data instance should mutate itself through the .add_data() method.
While investigating the codebase, I found another place we need to update. Let me explain step by step. When the experiment class creates a Qiskit Job, ExperimentData receives the result data through the following steps:
- The .add_jobs() method is called, and it creates a future object with a callback ._add_job_future() to wait for result data at this line.
- This callback further submits another callback ._add_job_data() to the executor, which extracts data from the result.
- The ._add_job_data() callback calls another callback ._add_result_data() at this line.
- Finally, the ._add_result_data() callback directly updates self._result_data at this line.
The implementation is quite complicated right now... But please note that the final step should call this .add_data() method with the list of dictionaries, instead of appending data in each loop. This allows ExperimentData to bootstrap the result data for component analysis.
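Schematically, that last step might end up looking something like this; the exact location of the circuit metadata inside the Result (here result.results[i].header.metadata) and the dict layout are assumptions for illustration, not the current implementation:

def _add_result_data(self, result):
    # Build one canonical dict per circuit, then hand the whole list to
    # add_data() so composite bootstrapping happens in a single place.
    formatted = []
    for i, _ in enumerate(result.results):
        datum = result.data(i)                                  # counts / memory
        datum["metadata"] = result.results[i].header.metadata   # assumed location
        datum["job_id"] = result.job_id
        formatted.append(datum)
    self.add_data(formatted)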
if inner_datum[0]["metadata"]["experiment_type"] in experiment_seperator:
    experiment_seperator[inner_datum[0]["metadata"]["experiment_type"]].add_data(inner_datum[0])
else:
    experiment_seperator[inner_datum[0]["metadata"]["experiment_type"]] = ExperimentData()
Python's defaultdict is convenient in this situation.
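For example (keeping the variable name from the diff above):

from collections import defaultdict
from qiskit_experiments.framework import ExperimentData

# Missing keys get a fresh ExperimentData automatically, so the explicit
# "does the container already exist?" branch disappears.
experiment_seperator = defaultdict(ExperimentData)
experiment_seperator["T1"].add_data({"metadata": {}, "counts": {"0": 512}})
experiment_seperator["T1"].add_data({"metadata": {}, "counts": {"1": 512}})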
if isinstance(datum, dict):
    if "composite_metadata" in datum["metadata"]:
        composite_flag = True
        marginalized_data = self._marginalized_component_data([datum])
Since this call is always made with a single element, it's much simpler to assume Dict as the input type instead of List[Dict].
marginalized_data = self._marginalized_component_data([datum])
for inner_datum in marginalized_data:
    #print(inner_datum)
    if "experiment_type" in inner_datum[0]["metadata"]:
You need to consider the case where the composite experiment is nested. This experiment_type can be some composite experiment subclass. If you encounter this situation, you need to create another layer in the experiment data tree. I suggest turning the code inside this loop into a separate protected method (or inner function) so that you can call it recursively.
marginalized_datum = self._marginalized_component_data([datum])
for inner_datum in marginalized_datum:
    for inner_inner_datum in inner_datum:
        experiment_seperator[datum["metadata"]["experiment_type"]].__add_data([inner_inner_datum])
Hmm, interesting. Did you confirm the code works as expected? This is a private method, and access from outside the instance is limited. See https://stackoverflow.com/questions/9145499/calling-private-function-within-the-same-class-python.
In addition, keying on datum["metadata"]["experiment_type"] will cause conflicts, because we often build a composite (parallel) experiment of the same experiment type with different qubit indices. You should use the composite index as you implemented below.
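A tiny illustration of the conflict (toy dictionaries, not real result payloads):

component_results = [
    {"experiment_type": "T1", "physical_qubits": [0]},
    {"experiment_type": "T1", "physical_qubits": [1]},
]
by_type = {r["experiment_type"]: r for r in component_results}
print(len(by_type))  # 1 -> the qubit-0 entry was silently overwritten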
    for inner_datum in marginalized_datum:
        for inner_inner_datum in inner_datum:
            experiment_seperator[datum["metadata"]["experiment_type"]].__add_data([inner_inner_datum])
elif "composite_metadata" in datum:
When does this evaluation return True? "composite_metadata" is part of the metadata, and when the datum is just metadata there is no count information in the object, so there is nothing to store.
elif isinstance(datum, Result):
    if datum["metadata"]:
        self._set_child_data(datum["metadata"]._metadata())
As I said before, Result is not a dictionary subclass and doesn't implement __getitem__. I guess this raises an error. You can just remove this if clause.
component_index = self.metadata.get("component_child_index", [])
component_expdata = [self.child_data(i) for i in component_index]
marginalized_data = self._marginalized_component_data(data)
Why do you need to call marginalize multiple times here? You've defined the experiment_seperator object, but it seems like the object is just discarded.
# Directly add non-job data
with self._result_data.lock:
    tmp_exp_data = ExperimentData()
Can you make this more efficient? If the data is not of composite type, you don't need to instantiate a new object, right? Instead of using the experiment_seperator dictionary, you can do the following (a hedged sketch follows this list):
- Check if "composite_index" is in the metadata; if not, just append the datum to self._result_data.
- Marginalize the datum.
- Zip-loop over the composite_index and the marginalized datum (they should have the same length).
- Get the child experiment data for a composite index; if it doesn't exist (try-except), set new child data.
- Store the marginalized datum in the child data container, i.e. through the add_data method of the child container; this covers the nested composite case because the composite check is done recursively.
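A sketch of those bullets; self.child_data, self._set_child_data, self._marginalized_component_data and the metadata keys follow the surrounding discussion, while the child registration line and the assumption that child containers are ordered by composite index are mine:

def add_data(self, data):
    for datum in data:
        metadata = datum.get("metadata", {})
        if "composite_index" not in metadata:
            self._result_data.append(datum)             # plain, non-composite datum
            continue
        marginalized = self._marginalized_component_data(datum)
        for index, sub_datum in zip(metadata["composite_index"], marginalized):
            try:
                child = self.child_data(index)           # reuse existing container
            except Exception:                            # not created yet
                child = ExperimentData()
                self._set_child_data(self.child_data() + [child])  # assumed registration
            child.add_data([sub_datum])                  # recursion handles nesting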
Hi @Musa-Sina-Ertugrul, we are getting very close! You likely have a mistake in the implementation of composite analysis, so let's focus on this in this review round. After you resolve it, the skeleton of the new implementation will be almost good, and we can focus on some details for performance gains and readability.
What I have in mind now is using an iterator over the child data and the Rust version of the memory formatter, but I'll write about this in the next review. Let's finish step by step :)
for inner_datum in datum["metadata"]["composite_metadata"]:
    if "composite_index" in inner_datum:
        for sub_expdata in composite_expdata:
            self.add_data(inner_datum)
I guess you wanted to handle the case where a composite experiment is nested, but it may be nested deeply, i.e. CompositeExperiment(BatchExperiment(BatchExperiment(...))), so this logic is not enough. Actually, you've already managed this case above with
sub_expdata.add_data(sub_data)
where sub_data is the marginalized inner experiment result. If it further contains inner results, they are handled recursively by calling sub_expdata.add_data(). Note that a composite experiment doesn't need experiment results (it just runs the inner analyses), so you don't need to store inner results in the outermost container. I think this reduces the size of the entire object.
for inner_datum in marginalized_datum:
    new_child.add_data(inner_datum)

self._result_data.append(datum)
You can also remove this line, because composite analysis doesn't require result data. If some tests fail with this change, you can remove them and describe the new behavior in the release note. You also need to add a new unit test to check that the inner data containers are created immediately.
        elif isinstance(datum, Result):
            self._add_result_data(datum)
        else:
            raise TypeError(f"Invalid data type {type(datum)}.")

    def _marginalized_component_data(self, composite_data: List[Dict]) -> List[List[Dict]]:
As I mentioned before, with the new implementation the input data length is always 1, so you can at least remove the outer loop.
@@ -142,7 +142,26 @@ def run(
    def _run_analysis(self, experiment_data: ExperimentData):
        # Return list of experiment data containers for each component experiment
        # containing the marginalized data from the composite experiment
        component_expdata = self._component_experiment_data(experiment_data)
        component_expdata = []
Actually, you should remove all of these lines. You've already marginalized the data and bootstrapped the child container preparation (and this is why you needed to append datum to self._result_data). What you are doing here is:
(1) Prepare child data with add_data, but keep the original (un-marginalized) data in the outer container.
(2) Ignore the prepared child data and prepare child data again in composite analysis.
Indeed this is very inefficient (you just introduced the extra overhead of preparing the data twice).
In _run_analysis, you just need to run

if len(self.analyses) != len(experiment_data.child_data()):
    raise RuntimeError(
        "Number of composite analyses and child results don't match. "
        "Please check your circuit metadata."
    )
for inner_analysis, child_data in zip(self.analyses, experiment_data.child_data()):
    inner_analysis.run(child_data, replace_results=True)

In principle you can remove almost all protected methods of the composite analysis, since this PR delegates the responsibility of inner data preparation to ExperimentData. This simplification helps the refactoring of composite analysis.
Co-authored-by: Naoki Kanazawa <nkanazawa1989@gmail.com>
Fix marginalize problems
Can we close this PR in favor of #1381?
Summary
Please describe what this PR changes as concisely as possible. Link to the issue(s) that this addresses, if any.
#1268
Details and comments
Some details that should be in this section include:
#1268: you can check this issue for all the answers. The issue was created by @nkanazawa1989.
Note that this entire PR description field will be used as the commit message upon merge, so please keep it updated along with the PR. Secondary discussions, such as intermediate testing and bug statuses that do not affect the final PR, should be in the PR comments.
PR checklist (delete when all criteria are met)
- I have read the guidelines in CONTRIBUTING.md.
- I have added a release note with reno if this change needs to be documented in the release notes.