Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MPS circuit apply_gate implementation & performance #232

Open
vincentmr opened this issue May 14, 2024 · 5 comments
Open

MPS circuit apply_gate implementation & performance #232

vincentmr opened this issue May 14, 2024 · 5 comments

Comments

@vincentmr
Copy link

What is your issue?

I would like to use Quimb as a backend for an MPS-based quantum circuit simulator. When running a circuit such as

import quimb.tensor as qtn
import numpy as np

cnot = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
cnot = np.array(cnot)
options = {"contract": "swap+split", "max_bond": 8}

n = 20
n_layers = 2
zero_mps = qtn.MPS_computational_state("0" * n)
circ = qtn.CircuitMPS(psi0=zero_mps)

for l in range(n_layers):
    for i in range(l % 2, n - 1, 2):
        circ.apply_gate(cnot, *(i, i + 1), **options)

with cProfile (python -m cProfile -o 01-prof mps.py) and visualizing with snakeviz, I see that mostly all the time is spent between canonicalize and gate_split. I'm wondering:

  • why do canonicalize and gate_split take so much time in this example?
  • is there a way to accelerate such computations with the current state of Quimb?
  • are there optimizations coming in Quimb that would accelerate such calculations?
@jcmgray
Copy link
Owner

jcmgray commented May 14, 2024

Maybe you could quantify "so much time"? This code takes about 10ms for me, possibly what you are seeing is some numba JIT compilation time on first run?

For the specific questions:

why do canonicalize and gate_split take so much time in this example?

These are the main two functions of an MPS simulator, you would expect them to take close to 100% of the time, especially as the bond dimensions become large (in fact this example has bond dimension 1 throughout, so in fact the time is almost all overhead, rather than linear algebra that will eventually dominate).

is there a way to accelerate such computations with the current state of Quimb?

For low bond (i.e. very cheap) simulations, acceleration is hard because so little time is spent on the actual numeric operations. This is one area where you expect python to be slower. Once the bond dimensions are big however, it should already be pretty optimized. You can further use a GPU or alternative array backend like torch to futher speeed things up as here: https://quimb.readthedocs.io/en/latest/tensor-circuit-mps.html.

are there optimizations coming in Quimb that would accelerate such calculations?

I am open to suggestions here, especially if you have a comparison point for a MPS simulator that is much faster and how they achieved that, but there is nothing specific planned!

@Qottmann
Copy link

Qottmann commented May 15, 2024

I am open to suggestions here, especially if you have a comparison point for a MPS simulator that is much faster and how they achieved that, but there is nothing specific planned!

One particular point we had in mind was to use Vidal's form (see eq. (156) in 4.6 Notations and conversions in https://arxiv.org/abs/1008.3477) to represent an MPS. This would allow us to avoid re-canonization and should bring some noticable speed-ups. This is for example what TeNPy are doing, keeping track of both right-orthogonal B tensors and singular values SVs.

With that description you can apply local gates cheaply. For example, applying a 2-qubit gate at sites (i, i+1), you can do with the orthogonality center that is just SVs[i] - B[i] - B[i+1], so one additional multiplication of the singular values from the left, plus another of the inverse after the operation to restore the right-orthogonality of the new B matrix.

If I understand correctly, in the example above we are re-canonicalizing all sites 0, .., i-1 (or i+2,.., N, not sure which convention quimb are using) all involving SVDs. Please correct me if I am wrong here or if there is a better way to apply gates without having to re-canonicalize the MPS. I saw that there is a way to tell quimb where the current orthogonality center is, though this requires some structure of the circuits to be useful.

I understand that this is quite a significant feature request changing the structure of the MPS in quimb. I would be curious to hear your thoughts on this!

I also have a question: Is there a way to do eager contractions in quimb? For MPS there is a canonical, optimal contraction path and I am wondering if one can skip rehearsals and other overheads in MPS simulations. I would also imagine it handy sometimes to have a contrete MPS if you want to re-use it for multiple local expectation values or just want to look at the state itself.

@vincentmr
Copy link
Author

vincentmr commented May 15, 2024

Maybe you could quantify "so much time"? This code takes about 10ms for me, possibly what you are seeing is some numba JIT compilation time on first run?

Sorry, poor wording choice here. I meant as a percentage of the entire calculation, but since you mentioned this is expected, I would say it's understood and hence all good.

(in fact this example has bond dimension 1 throughout, so in fact the time is almost all overhead, rather than linear algebra that will eventually dominate).

I was trying to come up with the simplest example, but that one may be a bit too simple indeed. Cheating even :)

I'll post the following example for those who might want to play with max_bond (it doubles every two layers).

import quimb.tensor as qtn
import numpy as np

options = {"contract": "swap+split", "max_bond": 64}
n = 40
n_layers = 20
zero_mps = qtn.MPS_computational_state("0" * n)
circ = qtn.CircuitMPS(psi0=zero_mps)
params = np.random.rand(n_layers, n, 3) * 2 * np.pi
for l in range(n_layers):
    for i in range(n):
        circ.apply_gate("rz", params[l, i, 0], i, **options)
        circ.apply_gate("ry", params[l, i, 1], i, **options)
        circ.apply_gate("rz", params[l, i, 2], i, **options)
    for i in range(l % 2, n - 1, 2):
        circ.apply_gate("cnot", *(i, i + 1), **options)

print(circ.local_expectation(np.array([[1,0],[0,-1]]),0))

are there optimizations coming in Quimb that would accelerate such calculations?

I am open to suggestions here, especially if you have a comparison point for a MPS simulator that is much faster and how they achieved that, but there is nothing specific planned!

@Qottmann mentioned Vidal's form might be a good way to reduce overheads and lower the cost of canonicalization.

Significant time is also spent in tensor_core's tensor_split, to split the CNOT gates. I was wondering whether it would be possible to cache the results of such split operations (for static gates in particular) and reuse them on various targets on the MPS.

After further investigation, I think the above doesn't apply.

@jcmgray
Copy link
Owner

jcmgray commented May 15, 2024

Thanks for the various clarifications!

Regarding canonicalization and Vidal gauge. Currently quimb yes passes around the current orthogonality center for 2qb gates so that it only has move minimally, (e.g. if it is around sites (4, 5) and the next 2qb gate is on (6, 7) it will canonicalize only from 4->6). I only recently explicitly added this explicit tracking to the MPS circuit simulator, https://quimb.readthedocs.io/en/latest/changelog.html#v1-8-1-2024-05-06, which you may want to double check if you have upgraded to!

You can monitor it directly with:

print(circ.gate_opts["info"]["cur_orthog"])

For the Vidal form, this is the same as Simple Update, so you could in fact adapt the SimpleUpdateGen implementation in 1D and just call with circuit gates rather than imag time evolution. The key function is gate_simple, long-range gates are not supported yet. I don't know if there are some subtle differences in terms of gauge quality however (if not equilibrating the gauges after rounds of gates), but yes likely worth checking out. Approximately (because of compressions) unitary circuits are relatively well gauged at all times in any case.

Regarding eager contractions - quimb implements a contract_structured method that uses a manual contraction scheme. It is automatically invoked when you call .contract(...) on a 1D like TN - note Ellipsis not all. The general circuit methods are really orientated around around the non-mps TN, so it currently is best to use the MPS object directly for the final computations. In general overriding the circuit methods to use MPS specific routines could use some love - e.g. sampling could be made much more efficient. I don't do much 1D stuff in my research thus the slight lack of development here.

And yes regarding the CNOT splitting - that happens only in the non-MPS case, as it seems you found out!

As a side note, snakeviz can sometimes give confusing results due to this - https://stackoverflow.com/questions/50436881/python-cprofile-snakeviz-cant-handle-multiple-functions-calling-same-function, especially in quimb where many methods route through tensor_split for example. A nice complement is https://github.com/joerick/pyinstrument.

@jcmgray
Copy link
Owner

jcmgray commented May 15, 2024

Here's a implementation using gate_simple_ just to get you going:

import quimb.tensor as qtn
import numpy as np

n = 40
n_layers = 20
params = np.random.rand(n_layers, n, 3) * 2 * np.pi

psi = qtn.MPS_computational_state("0" * n)
gauges = {}

contract_opts = dict(
    max_bond=64,
    cutoff=1e-10,
)

gates = []

for l in range(n_layers):
    for i in range(n):
        for j, label in enumerate(["rz", "ry", "rz"]):
            gates.append(
                qtn.Gate(label, params=(params[l, i, j],), qubits=[i])
            )
    for i in range(l % 2, n - 1, 2):
        gates.append(
            qtn.Gate("cnot", params=(), qubits=[i, i + 1])
        )

for gate in gates:
    psi.gate_simple_(
        G=gate.array, 
        where=gate.qubits, 
        gauges=gauges,
        renorm=False, 
        **contract_opts,
    )

# re-insert gauges
psi.gauge_simple_insert(gauges, remove=True)
# approx fidelity
psi.H @ psi
# 0.3917061075190

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants