Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring _run_analysis in composite_analysis #1268 #1286

Closed
Closed
Show file tree
Hide file tree
Changes from 7 commits
Commits
Show all changes
32 commits
Select commit Hold shift + click to select a range
d526f89
Refactoring _run_analysis in composite_analysis #1268
Musa-Sina-Ertugrul Oct 14, 2023
09a3c99
Updated according to @nkanazawa1989 's suggestion #1268
Musa-Sina-Ertugrul Oct 17, 2023
24b210c
Updated _add_data #1268
Musa-Sina-Ertugrul Oct 24, 2023
cbf544a
Updated add_data for early initialization #1268
Musa-Sina-Ertugrul Oct 24, 2023
e0924af
passed test to revert changes
Musa-Sina-Ertugrul Oct 27, 2023
145ae45
commit before pull
Musa-Sina-Ertugrul Oct 29, 2023
5c59e08
Updated add_data method #1268
Musa-Sina-Ertugrul Oct 29, 2023
cc0b89a
Updated add_data method #1268
Musa-Sina-Ertugrul Nov 4, 2023
2dbba8a
Passed test new start
Musa-Sina-Ertugrul Nov 15, 2023
e7f46c3
Updated add_data tests passed #1268
Musa-Sina-Ertugrul Nov 15, 2023
1c803ae
Merge branch 'Qiskit-Extensions:main' into main
Musa-Sina-Ertugrul Dec 10, 2023
9eb2dba
Updated add_data tests passed #1268
Musa-Sina-Ertugrul Nov 15, 2023
0bd3a18
Updated add_data and deprecated _add_data #1268
Musa-Sina-Ertugrul Dec 10, 2023
5e4b9d2
Updated add_data and _add_result_data, deprecated _add_data #1268
Musa-Sina-Ertugrul Dec 10, 2023
f752a4f
Updated add_data and _add_result_data, deprecated _add_data #1268
Musa-Sina-Ertugrul Dec 10, 2023
dd257a2
Updated add_data #1268
Musa-Sina-Ertugrul Dec 17, 2023
c79e888
Updated add_data, _run_analysis, composite_test #1268
Musa-Sina-Ertugrul Dec 18, 2023
8f21278
commit before second approach
Musa-Sina-Ertugrul Dec 19, 2023
73db5bd
Tests passed , Finished second approach add_data #1268
Musa-Sina-Ertugrul Dec 19, 2023
1ed676e
Updated add_data #1268
Musa-Sina-Ertugrul Dec 20, 2023
745669f
Tests passed second approach, Updated add_data #1268
Musa-Sina-Ertugrul Dec 20, 2023
fd6df6c
Test passed, recursive approach started #1268
Musa-Sina-Ertugrul Dec 20, 2023
0565a1b
Tests passed , Updated recursive approach, Updated add_data #1268
Musa-Sina-Ertugrul Dec 21, 2023
a54a49f
Merge branch 'Qiskit-Extensions:main' into main
Musa-Sina-Ertugrul Dec 22, 2023
814c22a
Update qiskit_experiments/framework/experiment_data.py
Musa-Sina-Ertugrul Dec 28, 2023
229692f
Started on new suggestions, suggestion 1 finished #1268
Musa-Sina-Ertugrul Dec 28, 2023
288d18f
Waiting respond, not bootstrapped exp_data appear suddenly after runn…
Musa-Sina-Ertugrul Dec 28, 2023
a3abf4d
Fix marginalize problems
nkanazawa1989 Jan 5, 2024
02906dc
Merge branch 'Qiskit-Extensions:main' into main
Musa-Sina-Ertugrul Jan 14, 2024
daa36dc
Merge pull request #1 from nkanazawa1989/musa-pr-1286-followup
Musa-Sina-Ertugrul Jan 21, 2024
fdc276a
Merge branch 'Qiskit-Extensions:main' into main
Musa-Sina-Ertugrul Jan 21, 2024
b906461
Merge branch 'Qiskit-Extensions:main' into main
Musa-Sina-Ertugrul Jan 26, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
60 changes: 16 additions & 44 deletions qiskit_experiments/framework/composite/composite_analysis.py
Original file line number Diff line number Diff line change
Expand Up @@ -124,13 +124,27 @@ def run(
def _run_analysis(self, experiment_data: ExperimentData):
# Return list of experiment data containers for each component experiment
# containing the marginalized data from the composite experiment
component_expdata = self._component_experiment_data(experiment_data)
component_expdata = []
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually you should remove all of these lines. You've already marginalized the data and bootstrapped child container preparations (and this is why you needed to append datum in self._results_data). What you are doing here is:
(1) Prepare child data with add_data, but keep the original (un-marginalized) data in the outer container.
(2) Ignore prepared child data and prepare child data again in composite analysis.
Indeed this is very inefficient (you just introduced extra overhead of data preparation).

In the _run_analysis, you just need to run

if len(self.analyses) != len(experiment_data.child_data()):
    raise RuntimeError(
        "Number of composite analyses and child results don't match. "
        "Please check your circuit metadata."
    )
for inner_analysis, child_data in zip(self.analyses, experiment_data.child_data()):
    inner_analysis.run(child_data, replace_results=True)

In principle you can remove almost all protected methods of the composite analysis, since you delegate the responsibility of inner data preparation to ExperimentData with this PR. This simplification helps refactoring of composite analysis.

if not self._flatten_results:
# Retrieve child data for component experiments for updating
component_index = experiment_data.metadata.get("component_child_index", [])
if not component_index:
raise AnalysisError("Unable to extract component child experiment data")
component_expdata = [experiment_data.child_data(i) for i in component_index]
else:
# Initialize temporary ExperimentData containers for
# each component experiment to analysis on. These will
# not be saved but results and figures will be collected
# from them
component_expdata = self._initialize_component_experiment_data(experiment_data)

experiment_data._add_data(experiment_data.child_data(),experiment_data.data())

# Run the component analysis on each component data
for i, sub_expdata in enumerate(component_expdata):
# Since copy for replace result is handled at the parent level
# we always run with replace result on component analysis
self._analyses[i].run(sub_expdata, replace_results=True)
sub_expdata._result_data = self._analyses[i].run(sub_expdata, replace_results=True)._result_data

# Analysis is running in parallel so we add loop to wait
# for all component analysis to finish before returning
Expand All @@ -147,48 +161,6 @@ def _run_analysis(self, experiment_data: ExperimentData):
return analysis_results, figures
return [], []

def _component_experiment_data(self, experiment_data: ExperimentData) -> List[ExperimentData]:
"""Return a list of marginalized experiment data for component experiments.

Args:
experiment_data: a composite experiment data container.

Returns:
The list of analysis-ready marginalized experiment data for each
component experiment.

Raises:
AnalysisError: If the component experiment data cannot be extracted.
"""
if not self._flatten_results:
# Retrieve child data for component experiments for updating
component_index = experiment_data.metadata.get("component_child_index", [])
if not component_index:
raise AnalysisError("Unable to extract component child experiment data")
component_expdata = [experiment_data.child_data(i) for i in component_index]
else:
# Initialize temporary ExperimentData containers for
# each component experiment to analysis on. These will
# not be saved but results and figures will be collected
# from them
component_expdata = self._initialize_component_experiment_data(experiment_data)

# Compute marginalize data for each component experiment
marginalized_data = self._marginalized_component_data(experiment_data.data())

# Add the marginalized component data and component job metadata
# to each component child experiment. Note that this will clear
# any currently stored data in the experiment. Since copying of
# child data is handled by the `replace_results` kwarg of the
# parent container it is safe to always clear and replace the
# results of child containers in this step
for sub_expdata, sub_data in zip(component_expdata, marginalized_data):
# Clear any previously stored data and add marginalized data
sub_expdata._result_data.clear()
sub_expdata.add_data(sub_data)

return component_expdata

def _marginalized_component_data(self, composite_data: List[Dict]) -> List[List[Dict]]:
"""Return marginalized data for component experiments.

Expand Down
167 changes: 166 additions & 1 deletion qiskit_experiments/framework/experiment_data.py
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,9 @@
from matplotlib import pyplot
from matplotlib.figure import Figure as MatplotlibFigure
from qiskit.result import Result
from qiskit.result import marginal_distribution
from qiskit.result.postprocess import format_counts_memory
from qiskit.result.utils import marginal_memory
from qiskit.providers.jobstatus import JobStatus, JOB_FINAL_STATES
from qiskit.exceptions import QiskitError
from qiskit.providers import Job, Backend, Provider
Expand Down Expand Up @@ -767,6 +770,49 @@ def add_data(
) -> None:
"""Add experiment data.

Args:
data: Experiment data to add. Several types are accepted for convenience:

* Result: Add data from this ``Result`` object.
* List[Result]: Add data from the ``Result`` objects.
* Dict: Add this data.
* List[Dict]: Add this list of data.

Raises:
TypeError: If the input data type is invalid.
"""

if any(not future.done() for future in self._analysis_futures.values()):
LOG.warning(
"Not all analysis has finished running. Adding new data may "
"create unexpected analysis results."
)
if not isinstance(data, list):
data = [data]

# Directly add non-job data
with self._result_data.lock:
for datum in data:
if isinstance(datum, dict):
if datum["metadata"].get("composite_metadata"):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if datum["metadata"].get("composite_metadata"):
if "composite_metadata" in datum["metadata"]:

if value is not necessary

tmp_exp_data = ExperimentData()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need another conditional branching here. Note that each element in the list contains the result of composite experiment. Let's assume a nested composite experiment consisting of three inner experiments. Here the result_expX_circY indicates a result of Y-th circuit of experiment X. The list of dictionary you may receive would look like:

[
    {result_expA_circ0, result_expB_circ0},  # loop 0
    {result_expC_circ0},  # loop 1
    {result_expA_circ1, result_expB_circ1},  # loop 2
    {result_expC_circ1},  # loop 3
]

Experiment A and B are batched by ParallelExperiment in this example. Now you need to create an ExperimentData instance with three child ExperimentData instance for A, B, C, and each child instance includes two .data() for circuit0 and circuit 1. What currently you are implementing is making child instance for each loop.

Instead, in each loop, you need to check whether corresponding child container is already initialized, make new one if not exist otherwise just call existing one, then add marginalize data to corresponding container. More specifically,

loop1: 
  - Create sub-container for A and B and set them to outer container
  - Marginalize count for A and B, and add marginalized count data of circuit0 to sub-container
loop2:
  - Create sub-container for C and set this to outer container
  - Skip marginalize because number of sub element is 1, add data to above
loop3:
  - Call child data for A and B
  - Marginalize and add data of circuit 1 to above
loop4:
  - Call child data for C
  - Add count data to above

marginalized_data = self._marginalized_component_data([datum])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this call is always with a single element, it's much simpler to assume Dict as an input instead of List[Dict].

for inner_datum in marginalized_data:
tmp_exp_data.__add_data(inner_datum)
self._set_child_data([tmp_exp_data])
else:
self._result_data.append(datum)
elif isinstance(datum, Result):
self.__add_data(datum)
else:
raise TypeError(f"Invalid data type {type(datum)}.")

def __add_data(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think subroutine is necessary. You can just recursively call add_data as long as you find composite_metadata in the circuit metadata dictionary.

self,
data: Union[Result, List[Result], Dict, List[Dict]],
) -> None:
"""Add experiment data.

Args:
data: Experiment data to add. Several types are accepted for convenience:

Expand All @@ -792,9 +838,128 @@ def add_data(
if isinstance(datum, dict):
self._result_data.append(datum)
elif isinstance(datum, Result):
self._add_result_data(datum)
if datum["metadata"]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Result is this object which doesn't implement __getitem__ method.

self._set_child_data(datum["metadata"]._metadata())
else:
self._add_result_data(datum)
else:
raise TypeError(f"Invalid data type {type(datum)}.")

def _add_data(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to update this line instead. Having a dedicated method for marginalization is also okey.
https://github.com/Qiskit-Extensions/qiskit-experiments/blob/f3a26684196b8acb2b0ca942cb7b262ce33c9c3c/qiskit_experiments/framework/experiment_data.py#L793

The returned Qiskit Result object is formatted into a canonical dictionary format, which contains experiment data and metadata (the metadata is created by the experiment class). When you run a composite experiment, your experiment object gains a special metadata field
https://github.com/Qiskit-Extensions/qiskit-experiments/blob/f3a26684196b8acb2b0ca942cb7b262ce33c9c3c/qiskit_experiments/framework/composite/composite_experiment.py#L193-L201
The structure looks ugly when you nest composite experiment -- this indicates you need to recursively check the field, e.g. datum["metadata"]["component_metadata"][0]["metadata"]["component_metadata"][0]... -- I want to refactor this at some point, but not for now.

You would check the datum["metadata"]. When these fields appear, you can instantiate component experiment data and set them to the current instance. Then, you check next datum, and crate new component data when the corresponding container is missing (for example, if you run batch experiment of two parallel experiments, first half of data contains result for half of qubits, and latter half contains the rest of qubits). This is just a clue for implementation, so you can try your own idea if you have in mind :)

self,
component_expdata: List[ExperimentData],
data: Union[Result, List[Result], Dict, List[Dict]],
) -> None:
"""Add experiment data.

Args:
data: Experiment data to add. Several types are accepted for convenience:

* Result: Add data from this ``Result`` object.
* List[Result]: Add data from the ``Result`` objects.
* Dict: Add this data.
* List[Dict]: Add this list of data.

Raises:
TypeError: If the input data type is invalid.
"""
#TODO: Continue from here

if not isinstance(data, list):
data = [data]

# self._marginalized_component_data()
# Directly add non-job data
marginalized_data = self._marginalized_component_data(data)

with self._result_data.lock:
for sub_expdata, sub_data in zip(component_expdata, marginalized_data):
# Clear any previously stored data and add marginalized data
sub_expdata._result_data.clear()
sub_expdata.__add_data(sub_data)

def _marginalized_component_data(self, composite_data: List[Dict]) -> List[List[Dict]]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned before, with new implementation the input data length is always 1. So you can at least remove outer loop.

"""Return marginalized data for component experiments.

Args:
composite_data: a list of composite experiment circuit data.

Returns:
A List of lists of marginalized circuit data for each component
experiment in the composite experiment.
"""
# Marginalize data
marginalized_data = {}
for datum in composite_data:
metadata = datum.get("metadata", {})

# Add marginalized data to sub experiments
if "composite_clbits" in metadata:
composite_clbits = metadata["composite_clbits"]
else:
composite_clbits = None

# Pre-process the memory if any to avoid redundant calls to format_counts_memory
f_memory = None
if (
"memory" in datum
and composite_clbits is not None
and isinstance(datum["memory"][0], str)
):
f_memory = self._format_memory(datum, composite_clbits)

for i, index in enumerate(metadata["composite_index"]):
if index not in marginalized_data:
# Initialize data list for marginalized
marginalized_data[index] = []
sub_data = {"metadata": metadata["composite_metadata"][i]}
if "counts" in datum:
if composite_clbits is not None:
sub_data["counts"] = marginal_distribution(
counts=datum["counts"],
indices=composite_clbits[i],
)
else:
sub_data["counts"] = datum["counts"]
if "memory" in datum:
if composite_clbits is not None:
# level 2
if f_memory is not None:
idx = slice(
-1 - composite_clbits[i][-1], -composite_clbits[i][0] or None
)
sub_data["memory"] = [shot[idx] for shot in f_memory]
# level 1
else:
mem = np.array(datum["memory"])

# Averaged level 1 data
if len(mem.shape) == 2:
sub_data["memory"] = mem[composite_clbits[i]].tolist()
# Single-shot level 1 data
if len(mem.shape) == 3:
sub_data["memory"] = mem[:, composite_clbits[i]].tolist()
else:
sub_data["memory"] = datum["memory"]
marginalized_data[index].append(sub_data)

# Sort by index
return [marginalized_data[i] for i in sorted(marginalized_data.keys())]

@staticmethod
def _format_memory(datum: Dict, composite_clbits: List):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also replace this with the merginal_memory function? This is also implemented with Rust, and must be faster.

"""A helper method to convert level 2 memory (if it exists) to bit-string format."""
f_memory = None
if (
"memory" in datum
and composite_clbits is not None
and isinstance(datum["memory"][0], str)
):
num_cbits = 1 + max(cbit for cbit_list in composite_clbits for cbit in cbit_list)
header = {"memory_slots": num_cbits}
f_memory = list(format_counts_memory(shot, header) for shot in datum["memory"])

return f_memory

def add_jobs(
self,
Expand Down
2 changes: 1 addition & 1 deletion test/framework/test_composite.py
Original file line number Diff line number Diff line change
Expand Up @@ -651,10 +651,10 @@ def test_composite_count_memory_marginalization(self, memory):
}

test_data.add_data(datum)

sub_data = CompositeAnalysis([], flatten_results=False)._marginalized_component_data(
test_data.data()
)
# print(sub_data)
expected = [
[
{
Expand Down