-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add in qpy the support for Discriminator and Kernel #10327
Conversation
Thank you for opening a new pull request. Before your PR can be merged it will first need to pass continuous integration tests and be reviewed. Sometimes the review process can be slow, so please be patient. While you're waiting, please feel free to review other open PRs. While only a subset of people are authorized to approve pull requests for merging, everyone is encouraged to review open pull requests. Doing reviews helps reduce the burden on the core team and helps make the project's code better for everyone. One or more of the the following people are requested to review this:
|
Pull Request Test Coverage Report for Build 5606590344
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This generally looks good to me. The tests demonstrate the behavior we want. Based on my comments on the nested dict equality function, we might want some tests of Kernel/Discriminator equality.
A qpy expert should review the qpy changes (@jakelishman @mtreinish @nkanazawa1989). A qpy the LIST handling concerns me a bit -- it only works for json-serializable lists. Also, qpy tries to be compact, so falling back to json feels like it is out of character. I wonder if the sequence type could be used? One issue is that the kernel/discriminator parameters are open-ended in schema (I guess whatever the qobj json serializer supports).
qiskit/pulse/configuration.py
Outdated
return _assert_nested_dict_equal(self.__dict__, other.__dict__) | ||
return False | ||
|
||
def __hash__(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was there a need for the hash? I didn't notice the kernel being used in a set or dictionary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If Kernel
is unhashable, the following test fails:
https://github.com/Qiskit/qiskit-terra/blob/b73115f80bc575b5dbc8b98ad003cd3b4ff9c00d/test/python/pulse/test_instructions.py#L68-L101
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I am not sure what the best approach is here.
The current behavior is that the kernel and discriminator use Python's default hashing and equality behavior where every instance gets a unique hash and the only equality is that an instance is equal to itself. So they are hashable but not in a particularly useful way. Overriding the default __eq__
method unsets the default __hash__
method. I wonder if other behavior relies on pulse instructions being hashable? I have not tried to check.
The kernel and discriminator classes are not well-specified -- basically just json serializable structures (extended to support numpy arrays) -- so nested dicts and lists of float, int, str, and ndarrays. That worked fine for qobj but does not fit as well for qpy. They don't need to be mutable so hashing the data should be okay. I think a more proper hash would be to replace lists with tuples and dicts with tuples of key-value tuples recursively, but that is a bit involved, especially if we don't really need hashing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if other behavior relies on pulse instructions being hashable? I have not tried to check.
SymbolicPulse
is not hashable:
https://github.com/Qiskit/qiskit-terra/blob/e9f8b7c50968501e019d0cb426676ac606eb5a10/qiskit/pulse/library/symbolic_pulses.py#L570
So, not all of instructions are actually hashable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a more proper hash would be to replace lists with tuples and dicts with tuples of key-value tuples recursively, but that is a bit involved, especially if we don't really need hashing.
Thank you so much for your suggestion!
Although I am still not sure whether Kernel
and Discriminator
should remain hashable (because the other operand SymbolicPulse
is unhashable and as a result not all of instructions are hashable), as long as we cannot eliminate the possibility that their hashing is used somewhere, I suppose that it is safer to keep them hashable.
I updated the hash methods following your suggestion :) 👉 0afd18d
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree hash is no longer needed. The Kernel
and Discriminator
haven't been really maintained since they were implemented (last update was Apr. 2020!). At the time pulse module didn't take any Parameter
object and all instructions were immutable. I think this is why they implement the hash function. Feel free to drop hash, because they are no longer useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still WIP, so I am guessing the documentation and release notes will follow later, but I do want to raise the question if QPY version should be bumped.
Additionally, the compatibility test should be updated.
qiskit/qpy/common.py
Outdated
@@ -18,7 +18,7 @@ | |||
import io | |||
import struct | |||
|
|||
from qiskit.qpy import formats | |||
from qiskit.qpy import formats, type_keys | |||
|
|||
QPY_VERSION = 7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should QPY version be bumped?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was not sure. I would like to hear from a QPY expert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Version update is not necessary, because this change doesn't break any existing QPY data. However, if you want to be really strict, you can still bump the version so that you can raise a user warning when an acquire instruction is loaded from an old QPY data. Recall that discriminator and kernel are silently dropped before this change; if a user (maybe a client) dumps his/her circuit with QPY v7 and sends it to another user (maybe a host server) using v7+, the kernel and discriminator are accidentally dropped through this communication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This guidance was a little surprising to me at first since we are adding a new data type, but I think it makes sense. Here is how I think about it that makes sense for me:
qpy attempts to provide forwards compatibility for loading previously stored circuits (I think there is still freedom for terra to completely remove something and then it is okay not to load that any more). Since the way data is stored for a specific object might need to change, qpy versions its files and can use the version number to know to load an object stored with the old format. Here we are adding new types to be stored, so the change has no impact on loading previously stored data. The part that was confusing to me is that we could store a kernel in terra 0.25 with qpy version 7 and terra 0.24 would also have qpy version 7 but not be able to load the kernel. However, the terra version is also stored in the qpy data and a separate warning is generated when trying to load qpy data with an earlier version of terra than the one used to generate the data, so I think that covers that case.
qiskit/qpy/binary_io/schedules.py
Outdated
@@ -60,6 +61,22 @@ def _read_waveform(file_obj, version): | |||
) | |||
|
|||
|
|||
def _read_kernel_and_discriminator(file_obj, version, kernel_or_discriminator): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless I am missing a technical reason to have these two objects handled in the same function, I think I would have gone for two separate functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could not find any difference between Kernel
and Discriminator
at their implementations, so I added the common function for them. But I guess having two separate functions may have more benefits such as ease of maintenance, so I do so :) 👉 e19e938
Co-authored-by: Will Shanks <wshaos@posteo.net>
Co-authored-by: Will Shanks <wshaos@posteo.net>
Thank you very much for pointing it out! I did not know it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking pretty good to me.
I do not like the hashing based on string representation and there are some questions about the qpy details, but I think this is looking good otherwise.
qiskit/qpy/type_keys.py
Outdated
if isinstance(obj, list): | ||
return cls.LIST | ||
if isinstance(obj, dict): | ||
return cls.DICT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not dug deeply into the internals of qpy. Why do we need these LIST
and DICT
types? Why can we not just use SEQUENCE
and MAPPING
? (I see that MAPPING
did not support nesting and you extended it with DICT
but could it not be extended with DICT
?).
EDIT: at the end what I meant was "could it be extended using nested MAPPING
rather than adding DICT
?"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. @junnaka51 is there any reason not to use existing types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot find SEQUENCE
and MAPPING
as types in https://github.com/Qiskit/qiskit-terra/blob/main/qiskit/qpy/type_keys.py .
If I understand correctly, we need an identifier of LIST
(or we can call it SEQUENCE
) and that of DICT
(or Mapping
) for reading them out from a binary.
I now realize that we can use the existing functions such as
https://github.com/Qiskit/qiskit-terra/blob/e9f8b7c50968501e019d0cb426676ac606eb5a10/qiskit/qpy/common.py#L203
instead of upgrading qpy.binary_io.value.dumps_value()
and qpy.binary_io.value.loads_value()
. But I think we still need the identifiers. For example, when we want to deal with LIST
as a value in a DICT
, then we may update qpy.common.write_mapping()
as
def write_mapping(file_obj, mapping, serializer, **kwargs):
"""Write a mapping in the file like object.
"""
num_elements = len(mapping)
file_obj.write(struct.pack(formats.SEQUENCE_PACK, num_elements))
for key, datum in mapping.items():
key_bytes = key.encode(ENCODE)
+ if isinstance(datum, list):
+ type_key, datum_bytes = b"l", sequence_to_binary(datum, serializer, **kwargs)
+ else:
+ type_key, datum_bytes = serializer(datum, **kwargs)
- type_key, datum_bytes = serializer(datum, **kwargs)
item_header = struct.pack(formats.MAP_ITEM_PACK, len(key_bytes), type_key, len(datum_bytes))
file_obj.write(item_header)
file_obj.write(key_bytes)
file_obj.write(datum_bytes)
and qpy.common.read_mapping()
as
def read_mapping(file_obj, deserializer, **kwargs):
"""Read a mapping from the file like object.
"""
mapping = {}
data = formats.SEQUENCE._make(
struct.unpack(formats.SEQUENCE_PACK, file_obj.read(formats.SEQUENCE_SIZE))
)
for _ in range(data.num_elements):
map_header = formats.MAP_ITEM._make(
struct.unpack(formats.MAP_ITEM_PACK, file_obj.read(formats.MAP_ITEM_SIZE))
)
key = file_obj.read(map_header.key_size).decode(ENCODE)
+ if map_header.type == b"l":
+ datum = common.read_sequence(file_obj, deserializer, **kwargs)
+ else:
+ datum = deserializer(map_header.type, file_obj.read(map_header.size), **kwargs)
- datum = deserializer(map_header.type, file_obj.read(map_header.size), **kwargs)
mapping[key] = datum
return mapping
, in both of which b"l"
is used as the identifier of LIST
(or SEQUENCE
if we want).
Of course this b"l"
must needs (edited) not be defined in qpy/type_keys.py
. But I thought qpy/type_keys.py
summarizes all the types, so I put it there.
Any suggestion would be very much appreciated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're absolutely right. I think nested dictionary is not a Value
type, and indeed I didn't assume this data structure when I wrote this function. Instead of calling value.dumps_value
, you can write a custom serializer function for discriminator/kernel config, which would look like
def my_custom_serializer(obj, version):
if isinstance(obj, dictionary):
with io.BytesIO() as container:
for key, value in obj.items()
# recursively generate binary data
binary_data = container.getvalue()
return b"D", binary_data
return dumps_value(obj, version)
what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nkanazawa1989 Thank you so much for your suggestion and advice 🙇
Please give me a little more time. It has been taking time to understand it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @junnaka51! By the way, this PR is currently labeled with the 0.25.0 milestone which should be released in about 10 days. There is no pressure to meet the milestone though. We can bump it back to the next milestone if needed. Leaving the milestone on the PR for now means that we can still merge it if it is ready (a newly opened PR would not generally by eligible for 0.25.0 at this point).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wshanks Thank you for the info! :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In any case, I think with your last push this PR might be good now? The one thing I wonder about is the lack of type_keys
for b"D"
and b"l"
-- it feels like names would be clearer than plain strings, but I defer to @nkanazawa1989
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, not yet. Although it works, I suppose how to deal with a nested dictionary is not yet well written.
I am trying to grasp @nkanazawa1989 `s comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nkanazawa1989 Thank you again for your advice and suggestion 🙇
I agree that your approach is more sophisticated. I added a new serializer and deserializer comparable in abstraction to value.dumps_value
and value.loads_value
, respectively, and put write_mapping
and read_mapping
back to their original 👉 cc878e6
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @junnaka51 for the continuous contribution. The PR looks great. Let me add some comments.
qiskit/pulse/configuration.py
Outdated
return _assert_nested_dict_equal(self.__dict__, other.__dict__) | ||
return False | ||
|
||
def __hash__(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree hash is no longer needed. The Kernel
and Discriminator
haven't been really maintained since they were implemented (last update was Apr. 2020!). At the time pulse module didn't take any Parameter
object and all instructions were immutable. I think this is why they implement the hash function. Feel free to drop hash, because they are no longer useful.
|
||
from .channels import PulseChannel, DriveChannel, MeasureChannel | ||
from .exceptions import PulseError | ||
|
||
|
||
def _assert_nested_dict_equal(a, b): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively we can dump configuration data into string with IQX JSON encoder and compare the raw payload data. But maybe this is unnecessary coupling to IBM provider? @wshanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would you do that without making the provider a (circular) dependency of terra?
I think this function is okay. Maybe a comment could be added that the only reason it is needed is because ndarray
overrides __eq__
in an unhelpful way? The other things the JSON encoder handles are complex
and ParameterExpression
which work okay with __eq__
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, basically I agree with you. Unfortunately likely Qiskit doesn't have control of Kernel/Discriminator format, so we should continuously maintain this function to sync with the provider. I was thinking it was bit easier to delegate everything to the provider, but from the dependency viewpoint this approach is bit tough.
I think at some point we should define subclass and use something like pydantic rather than this free-form dictionary. Of course not in this PR :)
qiskit/qpy/binary_io/schedules.py
Outdated
|
||
|
||
def _write_discriminator(file_obj, data): | ||
_write_kernel(file_obj, data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's safe to implement the duplicated code because in future we may update other format.
qiskit/qpy/common.py
Outdated
@@ -18,7 +18,7 @@ | |||
import io | |||
import struct | |||
|
|||
from qiskit.qpy import formats | |||
from qiskit.qpy import formats, type_keys | |||
|
|||
QPY_VERSION = 7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Version update is not necessary, because this change doesn't break any existing QPY data. However, if you want to be really strict, you can still bump the version so that you can raise a user warning when an acquire instruction is loaded from an old QPY data. Recall that discriminator and kernel are silently dropped before this change; if a user (maybe a client) dumps his/her circuit with QPY v7 and sends it to another user (maybe a host server) using v7+, the kernel and discriminator are accidentally dropped through this communication.
qiskit/qpy/common.py
Outdated
@@ -165,6 +171,15 @@ def write_mapping(file_obj, mapping, serializer, **kwargs): | |||
file_obj.write(struct.pack(formats.SEQUENCE_PACK, num_elements)) | |||
for key, datum in mapping.items(): | |||
key_bytes = key.encode(ENCODE) | |||
|
|||
if isinstance(datum, dict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of doing this, you can generate datum_bytes
with this trick by recursively calling write_mapping
with container
in the code below.
https://github.com/Qiskit/qiskit-terra/blob/e9f8b7c50968501e019d0cb426676ac606eb5a10/qiskit/qpy/common.py#L196-L198
qiskit/qpy/type_keys.py
Outdated
if isinstance(obj, list): | ||
return cls.LIST | ||
if isinstance(obj, dict): | ||
return cls.DICT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. @junnaka51 is there any reason not to use existing types?
@@ -93,5 +94,153 @@ def test_get_channel_lo(self): | |||
lo_config.channel_lo(MeasureChannel(1)) | |||
|
|||
|
|||
class TestKernel(QiskitTestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯
self.assertFalse(hash(kernel_a) == hash(kernel_c)) | ||
|
||
|
||
class TestDiscriminator(QiskitTestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯
qiskit/pulse/configuration.py
Outdated
return True | ||
|
||
|
||
def _get_hash(obj: dict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, but following @nkanazawa1989's comment I think just removing hashing might be better since we can't think of a good use case and less code to maintain is always better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all looks totally functional and well structured to me. I don't quite follow why the nested sequence and mapping serialization/deserialization needs to be kept out of the main loads_value/dumps_value functions, but it matches what Naoki suggested.
@nkanazawa1989 Do you want to give final approval and get this merged in time for the next release?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the QPY side this LGTM, we might want some docs on how KERNEL
and DISCRIMINATOR
are serialized. But it's not critical and we can do it post merge. The one thing to point out is that this will likely merge conflict with #10392 but the rebase shouldn't be too bad.
@wshanks Thank you so much for resolving the conflict! 🙇 |
@wshanks @nkanazawa1989 @TsafrirA @mtreinish Thank you so much for review! 🙇♂️ |
* add support for Discriminator and Kernel * make Kernel and Discriminator hashable * correct _assert_nested_dict_equal() Co-authored-by: Will Shanks <wshaos@posteo.net> * ensure b does not have extra keys Co-authored-by: Will Shanks <wshaos@posteo.net> * devide _read_kernel_and_discriminator and _write_kernel_and_discriminator * update docs * add tests testing Kernel/Discriminator equality * add a qpy compatibility test * add releasenote * improve hashing in Kernel/Discriminator * rm hashing from Kernel/Discriminator * use write/read_sequence() to serialize lists * add (de)serializer for dict and list --------- Co-authored-by: Will Shanks <wshaos@posteo.net> Co-authored-by: Will Shanks <willshanks@us.ibm.com>
Summary
This PR intends to solve #10050 by adding in
qpy
the support forDiscriminator
andKernel
.This enables users to serialize
Acquire
instructions withDiscriminator
andKernel
usingqpy
.Details and comments
The following are from #10050 and show that the error is resolved: