The size of JSON-encoded HUGR modules has become a pressing problem. Similarly, the time it takes to serialize and deserialize modules is excessive. The binary serialization format based on hugr-model aims to alleviate this. However, the representation of constants in hugr-model as of #1838 still uses the JSON encoding. In some realistic examples, the size of the constants in this encoding completely dominates the size of the serialized file. Consider the following example:
Guppy code that generates a quantum circuit with two large matrices of constants.
The HUGR module created by the Guppy code contains two large constants. One is a nested array of 100 * 50 float64s, the other a nested array of 2 * 100 * 50 u64s. Together, the two constants with the JSON encoding use roughly 36MB, while the size of the rest of the module is a rounding error. The ideal size (ignoring the bitsets needed to account for the fact that Guppy arrays can have missing values) is around 120kb. Some encoding overhead is to be expected, but at the moment we are a factor of about 300 away from the ideal.
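For reference, the 120kb figure follows directly from the element counts and widths; here is a quick back-of-the-envelope check (a sketch that assumes plain 8-byte elements and ignores the missing-value bitsets):

```python
# Back-of-the-envelope check of the ideal size quoted above.
# Assumes plain 8-byte float64 / u64 elements and ignores the
# per-array bitsets for missing values.
n_floats = 100 * 50              # nested array of float64
n_ints = 2 * 100 * 50            # nested array of u64
ideal_bytes = (n_floats + n_ints) * 8
print(ideal_bytes)               # 120000 bytes, i.e. ~120kb
print(36_000_000 / ideal_bytes)  # ~300x overhead for the JSON encoding
```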
A chunk of the size might compress away, but that is only a band-aid. The uncompressed size also affects the time it takes to serialize/deserialize the module. I've observed the roundtrip to be 4.15s for JSON and 2.8s for hugr-model/capnp. The roundtrip should not take more than a single-digit number of milliseconds. Adding compression would make the time even worse.
I am experimenting with different encodings that could bring down the file size. The sizes are given for the hugr-model/capnp encoding, with only the encoding of the constants varying; a rough illustration of where the JSON overhead comes from follows after the comparison below.
Baseline: 36MB using the JSON encoding of the constants.
Terms: Encoding the constants as terms brings the file size down to 6.4MB.
Flexbuffer: Encoding them as flexbuffers instead brings it down further, to 1.2MB.
Flexbuffer (optionals): Flexbuffer encoding with special-cased optionals brings it down to 724kb.
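To make the gap concrete, here is a stdlib-only sketch comparing a JSON text encoding of the 100 * 50 float64 matrix against a flat packed buffer. This is not the actual hugr-model term or flexbuffer format, just an illustration of the per-element cost of a text encoding; the real JSON constant encoding presumably adds further per-value structure on top, which is where the factor-300 gap comes from.

```python
import json
import random
import struct

# Stdlib-only illustration (not the hugr-model/flexbuffer encodings):
# the 100 * 50 float64 matrix from the example as JSON text vs. a packed buffer.
random.seed(0)
matrix = [[random.random() for _ in range(50)] for _ in range(100)]

json_size = len(json.dumps(matrix).encode("utf-8"))
packed_size = len(struct.pack("<5000d", *(x for row in matrix for x in row)))

# An arbitrary float64 costs roughly 18-20 characters as JSON text,
# but a fixed 8 bytes when packed.
print(json_size, packed_size)  # on the order of 100000 vs exactly 40000 bytes
```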
#1857 special-cased the encoding for array, f64 and integer constants via the encoding as terms. While this still has substantial overhead over the optimal encoding, it is a big improvement in size and an even bigger improvement in serialization roundtrip time (for the example above I once measured 4.2s using the full JSON encoding vs 150ms with hugr-model/capnp and constants encoded as terms).
In the future we should add term constructors that can encode dense tensors, so that big constants that are just a large chunk of primitives can be packed together efficiently. There is some design space here in how to make them fit well with Guppy, since Guppy arrays can have missing values; a rough sketch of one possible layout follows below.
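As a sketch of what such a dense encoding could look like for arrays with missing values (the layout below is purely illustrative, not a proposed hugr-model term constructor): one validity bit per element, followed by a packed payload of the present elements.

```python
import struct

def pack_optional_f64s(values):
    """Pack a list of float-or-None into (validity bitset, packed payload).

    Illustrative layout only: one presence bit per element, then 8 bytes
    per *present* element. Missing values cost a single bit.
    """
    bitset = bytearray((len(values) + 7) // 8)
    present = []
    for i, v in enumerate(values):
        if v is not None:
            bitset[i // 8] |= 1 << (i % 8)
            present.append(v)
    payload = struct.pack(f"<{len(present)}d", *present)
    return bytes(bitset), payload

bits, data = pack_optional_f64s([1.0, None, 2.5])
assert len(bits) == 1 and len(data) == 16  # 3 validity bits, two packed f64s
```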