MPS circuit `apply_gate` implementation & performance #232

vincentmr · 2024-05-14T17:09:26Z

What is your issue?

I would like to use Quimb as a backend for an MPS-based quantum circuit simulator. When running a circuit such as

import quimb.tensor as qtn
import numpy as np

cnot = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
cnot = np.array(cnot)
options = {"contract": "swap+split", "max_bond": 8}

n = 20
n_layers = 2
zero_mps = qtn.MPS_computational_state("0" * n)
circ = qtn.CircuitMPS(psi0=zero_mps)

for l in range(n_layers):
    for i in range(l % 2, n - 1, 2):
        circ.apply_gate(cnot, *(i, i + 1), **options)

with cProfile (python -m cProfile -o 01-prof mps.py) and visualizing with snakeviz, I see that mostly all the time is spent between canonicalize and gate_split. I'm wondering:

why do canonicalize and gate_split take so much time in this example?
is there a way to accelerate such computations with the current state of Quimb?
are there optimizations coming in Quimb that would accelerate such calculations?

The text was updated successfully, but these errors were encountered:

jcmgray · 2024-05-14T18:35:55Z

Maybe you could quantify "so much time"? This code takes about 10ms for me, possibly what you are seeing is some numba JIT compilation time on first run?

For the specific questions:

why do canonicalize and gate_split take so much time in this example?

These are the main two functions of an MPS simulator, you would expect them to take close to 100% of the time, especially as the bond dimensions become large (in fact this example has bond dimension 1 throughout, so in fact the time is almost all overhead, rather than linear algebra that will eventually dominate).

is there a way to accelerate such computations with the current state of Quimb?

For low bond (i.e. very cheap) simulations, acceleration is hard because so little time is spent on the actual numeric operations. This is one area where you expect python to be slower. Once the bond dimensions are big however, it should already be pretty optimized. You can further use a GPU or alternative array backend like torch to futher speeed things up as here: https://quimb.readthedocs.io/en/latest/tensor-circuit-mps.html.

are there optimizations coming in Quimb that would accelerate such calculations?

I am open to suggestions here, especially if you have a comparison point for a MPS simulator that is much faster and how they achieved that, but there is nothing specific planned!

Qottmann · 2024-05-15T07:24:42Z

I am open to suggestions here, especially if you have a comparison point for a MPS simulator that is much faster and how they achieved that, but there is nothing specific planned!

One particular point we had in mind was to use Vidal's form (see eq. (156) in 4.6 Notations and conversions in https://arxiv.org/abs/1008.3477) to represent an MPS. This would allow us to avoid re-canonization and should bring some noticable speed-ups. This is for example what TeNPy are doing, keeping track of both right-orthogonal B tensors and singular values SVs.

With that description you can apply local gates cheaply. For example, applying a 2-qubit gate at sites (i, i+1), you can do with the orthogonality center that is just SVs[i] - B[i] - B[i+1], so one additional multiplication of the singular values from the left, plus another of the inverse after the operation to restore the right-orthogonality of the new B matrix.

If I understand correctly, in the example above we are re-canonicalizing all sites 0, .., i-1 (or i+2,.., N, not sure which convention quimb are using) all involving SVDs. Please correct me if I am wrong here or if there is a better way to apply gates without having to re-canonicalize the MPS. I saw that there is a way to tell quimb where the current orthogonality center is, though this requires some structure of the circuits to be useful.

I understand that this is quite a significant feature request changing the structure of the MPS in quimb. I would be curious to hear your thoughts on this!

I also have a question: Is there a way to do eager contractions in quimb? For MPS there is a canonical, optimal contraction path and I am wondering if one can skip rehearsals and other overheads in MPS simulations. I would also imagine it handy sometimes to have a contrete MPS if you want to re-use it for multiple local expectation values or just want to look at the state itself.

vincentmr · 2024-05-15T13:55:22Z

Maybe you could quantify "so much time"? This code takes about 10ms for me, possibly what you are seeing is some numba JIT compilation time on first run?

Sorry, poor wording choice here. I meant as a percentage of the entire calculation, but since you mentioned this is expected, I would say it's understood and hence all good.

(in fact this example has bond dimension 1 throughout, so in fact the time is almost all overhead, rather than linear algebra that will eventually dominate).

I was trying to come up with the simplest example, but that one may be a bit too simple indeed. Cheating even :)

I'll post the following example for those who might want to play with max_bond (it doubles every two layers).

import quimb.tensor as qtn
import numpy as np

options = {"contract": "swap+split", "max_bond": 64}
n = 40
n_layers = 20
zero_mps = qtn.MPS_computational_state("0" * n)
circ = qtn.CircuitMPS(psi0=zero_mps)
params = np.random.rand(n_layers, n, 3) * 2 * np.pi
for l in range(n_layers):
    for i in range(n):
        circ.apply_gate("rz", params[l, i, 0], i, **options)
        circ.apply_gate("ry", params[l, i, 1], i, **options)
        circ.apply_gate("rz", params[l, i, 2], i, **options)
    for i in range(l % 2, n - 1, 2):
        circ.apply_gate("cnot", *(i, i + 1), **options)

print(circ.local_expectation(np.array([[1,0],[0,-1]]),0))

are there optimizations coming in Quimb that would accelerate such calculations?

I am open to suggestions here, especially if you have a comparison point for a MPS simulator that is much faster and how they achieved that, but there is nothing specific planned!

@Qottmann mentioned Vidal's form might be a good way to reduce overheads and lower the cost of canonicalization.

Significant time is also spent in tensor_core's tensor_split, to split the CNOT gates. I was wondering whether it would be possible to cache the results of such split operations (for static gates in particular) and reuse them on various targets on the MPS.

After further investigation, I think the above doesn't apply.

jcmgray · 2024-05-15T17:34:42Z

Thanks for the various clarifications!

Regarding canonicalization and Vidal gauge. Currently quimb yes passes around the current orthogonality center for 2qb gates so that it only has move minimally, (e.g. if it is around sites (4, 5) and the next 2qb gate is on (6, 7) it will canonicalize only from 4->6). I only recently explicitly added this explicit tracking to the MPS circuit simulator, https://quimb.readthedocs.io/en/latest/changelog.html#v1-8-1-2024-05-06, which you may want to double check if you have upgraded to!

You can monitor it directly with:

print(circ.gate_opts["info"]["cur_orthog"])

For the Vidal form, this is the same as Simple Update, so you could in fact adapt the SimpleUpdateGen implementation in 1D and just call with circuit gates rather than imag time evolution. The key function is gate_simple, long-range gates are not supported yet. I don't know if there are some subtle differences in terms of gauge quality however (if not equilibrating the gauges after rounds of gates), but yes likely worth checking out. Approximately (because of compressions) unitary circuits are relatively well gauged at all times in any case.

Regarding eager contractions - quimb implements a contract_structured method that uses a manual contraction scheme. It is automatically invoked when you call .contract(...) on a 1D like TN - note Ellipsis not all. The general circuit methods are really orientated around around the non-mps TN, so it currently is best to use the MPS object directly for the final computations. In general overriding the circuit methods to use MPS specific routines could use some love - e.g. sampling could be made much more efficient. I don't do much 1D stuff in my research thus the slight lack of development here.

And yes regarding the CNOT splitting - that happens only in the non-MPS case, as it seems you found out!

As a side note, snakeviz can sometimes give confusing results due to this - https://stackoverflow.com/questions/50436881/python-cprofile-snakeviz-cant-handle-multiple-functions-calling-same-function, especially in quimb where many methods route through tensor_split for example. A nice complement is https://github.com/joerick/pyinstrument.

jcmgray · 2024-05-15T17:47:50Z

Here's a implementation using gate_simple_ just to get you going:

import quimb.tensor as qtn
import numpy as np

n = 40
n_layers = 20
params = np.random.rand(n_layers, n, 3) * 2 * np.pi

psi = qtn.MPS_computational_state("0" * n)
gauges = {}

contract_opts = dict(
    max_bond=64,
    cutoff=1e-10,
)

gates = []

for l in range(n_layers):
    for i in range(n):
        for j, label in enumerate(["rz", "ry", "rz"]):
            gates.append(
                qtn.Gate(label, params=(params[l, i, j],), qubits=[i])
            )
    for i in range(l % 2, n - 1, 2):
        gates.append(
            qtn.Gate("cnot", params=(), qubits=[i, i + 1])
        )

for gate in gates:
    psi.gate_simple_(
        G=gate.array, 
        where=gate.qubits, 
        gauges=gauges,
        renorm=False, 
        **contract_opts,
    )

# re-insert gauges
psi.gauge_simple_insert(gauges, remove=True)
# approx fidelity
psi.H @ psi
# 0.3917061075190

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MPS circuit `apply_gate` implementation & performance #232

MPS circuit `apply_gate` implementation & performance #232

vincentmr commented May 14, 2024

jcmgray commented May 14, 2024

Qottmann commented May 15, 2024 •

edited

Loading

vincentmr commented May 15, 2024 •

edited

Loading

jcmgray commented May 15, 2024

jcmgray commented May 15, 2024

MPS circuit apply_gate implementation & performance #232

MPS circuit apply_gate implementation & performance #232

Comments

vincentmr commented May 14, 2024

What is your issue?

jcmgray commented May 14, 2024

Qottmann commented May 15, 2024 • edited Loading

vincentmr commented May 15, 2024 • edited Loading

jcmgray commented May 15, 2024

jcmgray commented May 15, 2024

MPS circuit `apply_gate` implementation & performance #232

MPS circuit `apply_gate` implementation & performance #232

Qottmann commented May 15, 2024 •

edited

Loading

vincentmr commented May 15, 2024 •

edited

Loading