Efficient representations for constants #1850

zrho · 2025-01-08T18:42:20Z

The size of JSON encoded HUGR modules has become a pressing problem. The time it takes to serialize and deserialize modules similarly is excessive. The binary serialization format based on hugr-model aims to alleviate this. However, the representation of constants in hugr-model as of #1838 still uses the JSON encoding. In some realistic examples, the size of the constants in this encoding completely dominates the size of the serialized file. Consider the following example:

Guppy code that generates a quantum circuit with two large matrices of constants.

import itertools
import random
from typing import no_type_check

from guppylang import guppy
from guppylang.std.angles import angle
from guppylang.std.builtins import array, py, result
from guppylang.std.qsystem import phased_x, rz, zz_phase
from guppylang.std.quantum import measure_array, qubit

random.seed(2)

N = 100
rounds = N

angles = [
    [[random.uniform(0, 2 * 3.14159) for _ in range(15)] for _ in range(N // 2)]  # noqa: S311
    for _ in range(rounds)
]
qb_ids = [list(range(N)) for _ in range(N)]
for qs in qb_ids:
    random.shuffle(qs)

qb_pairs = [list(itertools.batched(qs, 2)) for qs in qb_ids]


@guppy
@no_type_check
def main() -> None:
    qs = array(qubit() for _ in range(py(N)))

    pair_lists = py(qb_pairs)
    all_angles = py(angles)

    for rond in range(len(all_angles)):
        pairs = pair_lists[rond]
        round_angles = all_angles[rond]

        for pair_id in range(len(pairs)):
            a, b = pairs[pair_id]
            pair_angles_flt = round_angles[pair_id]
            pair_angles = array(angle(a) for a in pair_angles_flt)
            rz(qs[a], pair_angles[0])
            rz(qs[b], pair_angles[1])
            phased_x(qs[a], pair_angles[2], pair_angles[3])
            phased_x(qs[b], pair_angles[4], pair_angles[5])

            zz_phase(qs[a], qs[b], pair_angles[6])
            zz_phase(qs[a], qs[b], pair_angles[7])
            zz_phase(qs[a], qs[b], pair_angles[8])

            phased_x(qs[a], pair_angles[9], pair_angles[10])
            phased_x(qs[b], pair_angles[11], pair_angles[12])
            rz(qs[a], pair_angles[13])
            rz(qs[b], pair_angles[14])

    for b in measure_array(qs):
        result("c", b)


with open("qv_hugr.json", "w") as f:
    f.write(guppy.compile_module().package.to_json())

The HUGR module created by the guppy code contains two large constants. One is a nested array of 100 * 50 float64s, the other a nested array of 2 * 100 * 50 u64s. Together, the two constants with the JSON encoding use roughly 36MB while the size of the rest of the module is a rounding error. The ideal size (ignoring bitsets to account for the fact that guppy arrays can have missing values) is around 120kb. Some encoding overhead is to be expected, but at the moment we are at a factor of about 300 away from the ideal.

A chunk of the size might compress away, but that is a bandaid. The uncompressed size also affects the time that it takes to serialize/deserialize the module. I've observed the roundtrip to be 4.15s for JSON and 2.8s for hugr-model/capnp. The roundtrip should not take more than a single digit number of milliseconds. Adding compression would make the time even worse.

The text was updated successfully, but these errors were encountered:

zrho · 2025-01-09T17:04:59Z

I am experimenting with different encodings that could bring down the filesize. The sizes are given for the hugr-model/capnp encoding, with only the encoding for the constants being different.

Baseline: The baseline is 36MB using the JSON encoding of the constants.
Terms: When the constants are encoded as terms, the file size went down to 6.4MB.
Flexbuffer: With an encoding as flexbuffers instead, the file size went down again to 1.2MB.
Flexbuffer (optionals): Flexbuffer encoding but special cased optionals leads to 724kb.

zrho · 2025-01-21T12:13:40Z

#1857 special cased the encoding for array, f64 and integer constants via the encoding as terms. While this still has substantial overhead over the optimum encoding, it is a big improvement in size and an ever bigger improvement in serialisation roundtrip time (for the example above I once measured 4.2s using the full json encoding vs 150ms with hugr-model/capnp and constants encoded as terms).

In the future we should add term constructors that are able to encode dense tensors so that big constants that just contain a big chunk of primitives can be packed together efficiently. There's some design space here how to make them fit well with guppy since guppy arrays can have missing values.

zrho added enhancement New feature or request perf Performance issue labels Jan 8, 2025

zrho self-assigned this Jan 8, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Efficient representations for constants #1850

Efficient representations for constants #1850

zrho commented Jan 8, 2025 •

edited

Loading

zrho commented Jan 9, 2025 •

edited

Loading

zrho commented Jan 21, 2025

Efficient representations for constants #1850

Efficient representations for constants #1850

Comments

zrho commented Jan 8, 2025 • edited Loading

zrho commented Jan 9, 2025 • edited Loading

zrho commented Jan 21, 2025

zrho commented Jan 8, 2025 •

edited

Loading

zrho commented Jan 9, 2025 •

edited

Loading