Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Efficient representations for constants #1850

Open
zrho opened this issue Jan 8, 2025 · 2 comments
Open

Efficient representations for constants #1850

zrho opened this issue Jan 8, 2025 · 2 comments
Assignees
Labels
enhancement New feature or request perf Performance issue

Comments

@zrho
Copy link
Contributor

zrho commented Jan 8, 2025

The size of JSON encoded HUGR modules has become a pressing problem. The time it takes to serialize and deserialize modules similarly is excessive. The binary serialization format based on hugr-model aims to alleviate this. However, the representation of constants in hugr-model as of #1838 still uses the JSON encoding. In some realistic examples, the size of the constants in this encoding completely dominates the size of the serialized file. Consider the following example:

Guppy code that generates a quantum circuit with two large matrices of constants.
import itertools
import random
from typing import no_type_check

from guppylang import guppy
from guppylang.std.angles import angle
from guppylang.std.builtins import array, py, result
from guppylang.std.qsystem import phased_x, rz, zz_phase
from guppylang.std.quantum import measure_array, qubit

random.seed(2)

N = 100
rounds = N

angles = [
    [[random.uniform(0, 2 * 3.14159) for _ in range(15)] for _ in range(N // 2)]  # noqa: S311
    for _ in range(rounds)
]
qb_ids = [list(range(N)) for _ in range(N)]
for qs in qb_ids:
    random.shuffle(qs)

qb_pairs = [list(itertools.batched(qs, 2)) for qs in qb_ids]


@guppy
@no_type_check
def main() -> None:
    qs = array(qubit() for _ in range(py(N)))

    pair_lists = py(qb_pairs)
    all_angles = py(angles)

    for rond in range(len(all_angles)):
        pairs = pair_lists[rond]
        round_angles = all_angles[rond]

        for pair_id in range(len(pairs)):
            a, b = pairs[pair_id]
            pair_angles_flt = round_angles[pair_id]
            pair_angles = array(angle(a) for a in pair_angles_flt)
            rz(qs[a], pair_angles[0])
            rz(qs[b], pair_angles[1])
            phased_x(qs[a], pair_angles[2], pair_angles[3])
            phased_x(qs[b], pair_angles[4], pair_angles[5])

            zz_phase(qs[a], qs[b], pair_angles[6])
            zz_phase(qs[a], qs[b], pair_angles[7])
            zz_phase(qs[a], qs[b], pair_angles[8])

            phased_x(qs[a], pair_angles[9], pair_angles[10])
            phased_x(qs[b], pair_angles[11], pair_angles[12])
            rz(qs[a], pair_angles[13])
            rz(qs[b], pair_angles[14])

    for b in measure_array(qs):
        result("c", b)


with open("qv_hugr.json", "w") as f:
    f.write(guppy.compile_module().package.to_json())

The HUGR module created by the guppy code contains two large constants. One is a nested array of 100 * 50 float64s, the other a nested array of 2 * 100 * 50 u64s. Together, the two constants with the JSON encoding use roughly 36MB while the size of the rest of the module is a rounding error. The ideal size (ignoring bitsets to account for the fact that guppy arrays can have missing values) is around 120kb. Some encoding overhead is to be expected, but at the moment we are at a factor of about 300 away from the ideal.

A chunk of the size might compress away, but that is a bandaid. The uncompressed size also affects the time that it takes to serialize/deserialize the module. I've observed the roundtrip to be 4.15s for JSON and 2.8s for hugr-model/capnp. The roundtrip should not take more than a single digit number of milliseconds. Adding compression would make the time even worse.

@zrho zrho added enhancement New feature or request perf Performance issue labels Jan 8, 2025
@zrho zrho self-assigned this Jan 8, 2025
@zrho
Copy link
Contributor Author

zrho commented Jan 9, 2025

I am experimenting with different encodings that could bring down the filesize. The sizes are given for the hugr-model/capnp encoding, with only the encoding for the constants being different.

  • Baseline: The baseline is 36MB using the JSON encoding of the constants.
  • Terms: When the constants are encoded as terms, the file size went down to 6.4MB.
  • Flexbuffer: With an encoding as flexbuffers instead, the file size went down again to 1.2MB.
  • Flexbuffer (optionals): Flexbuffer encoding but special cased optionals leads to 724kb.

@zrho
Copy link
Contributor Author

zrho commented Jan 21, 2025

#1857 special cased the encoding for array, f64 and integer constants via the encoding as terms. While this still has substantial overhead over the optimum encoding, it is a big improvement in size and an ever bigger improvement in serialisation roundtrip time (for the example above I once measured 4.2s using the full json encoding vs 150ms with hugr-model/capnp and constants encoded as terms).

In the future we should add term constructors that are able to encode dense tensors so that big constants that just contain a big chunk of primitives can be packed together efficiently. There's some design space here how to make them fit well with guppy since guppy arrays can have missing values.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request perf Performance issue
Projects
None yet
Development

No branches or pull requests

1 participant