---
published: false
title: "Introduction to Quantum Information Science"
author: "[Artur Ekert](https://www.arturekert.org/), [Timothy Hosgood](https://thosgood.com), [Alastair Kay](http://www.ma.rhul.ac.uk/akay/), [Chiara Macchiavello](https://fisica.unipv.it/personale/Persona.php?ID=55)"
date: "Last updated: `r format(Sys.Date(), format='%d %B %Y')`"
description: "An introductory textbook on quantum information science."
favicon: "favicon.ico"
---
\providecommand{\xmapsto}[1]{\overset{#1}{\longmapsto}}
\providecommand{\bra}[1]{\langle#1|}
\providecommand{\ket}[1]{|#1\rangle}
\providecommand{\braket}[2]{\langle#1|#2\rangle}
\providecommand{\proj}[1]{|#1\rangle\langle#1|}
\providecommand{\av}[1]{\langle#1\rangle}
\providecommand{\tr}{\operatorname{tr}}
\providecommand{\id}{\mathbf{1}}
\providecommand{\range}{\operatorname{range}}
\providecommand{\gcd}{\operatorname{gcd}}
\providecommand{\hcf}{\operatorname{hcf}}
\providecommand{\divides}{\mathbin{\vert}}
\providecommand{\NOT}{\texttt{NOT}}
\providecommand{\cNOT}{\texttt{c-NOT}}
\providecommand{\diag}[2]{\begin{bmatrix}#1&0\\0&#2\end{bmatrix}}
\providecommand{\smalldiag}[2]{\left(\begin{smallmatrix}#1&0\\0&#2\end{smallmatrix}\right)}
\providecommand{\mqty}[1]{\begin{matrix}#1\end{matrix}}
\providecommand{\bmqty}[1]{\begin{bmatrix}#1\end{bmatrix}}
\renewcommand{\leq}{\leqslant}
\renewcommand{\geq}{\geqslant}
\renewcommand{\mod}[1]{\ \mathrm{mod}\ #1}
\renewcommand{\Im}{\operatorname{Im}}
\renewcommand{\Re}{\operatorname{Re}}
```{r echo=FALSE,warning=FALSE}
library(knitr)
opts_chunk$set(cache=TRUE,
echo=FALSE,
fig.pos='H',
fig.ext=if (knitr:::is_latex_output()) 'pdf' else 'png',
fig.align='center'
)
```
# Introduction {-}
::: {.todo latex=""}
<!-- TO-DO: write introduction -->
:::
> Although almost complete, this book is still a work-in-progress --- a few sections are missing, but we are constantly updating and filling in the gaps!
> Because of this, external links to specific chapters or sections might break as things move around.
::: {.idea specific_to="html"}
**Most recent changes:**
- *[New content]* Chapter \@ref(decoherence-and-recoherence): correctable channels, decoherence and interference, classical [3,1,3] code, and Shor [[9,1,3]] code; corresponding exercises
- *[New content]* Chapter \@ref(error-correction): Hamming codes, linear codes, CSS codes, logical states and operators, error-correcting conditions, error-protected encoding; corresponding exercises
[Full change history](https://github.com/thosgood/qubit.guide/commits/main)
:::
## Plan and intended audience {-}
In this series of lectures you will learn how inherently quantum phenomena, such as quantum interference and quantum entanglement, can make information processing more efficient and more secure, even in the presence of noise.
There are many introductions to quantum information science, so it seems like a good idea to start with an explanation of why this particular one exists.
When learning such a subject, located somewhere in between mathematics, physics, and computer science, there are many possible approaches, with one main factor being "how far along the scale of informal to formal do I want to be?".
In these notes we take the following philosophy: it can be both interesting and fun to cover lots of ground quickly and see as much as possible on a surface level, but it's also good to know that there is a lot of important stuff that you'll miss by doing this.
In practice, this means that we don't worry too much about high-level mathematics.
That is not to say that we do not use mathematics "properly" --- in these notes you'll find a perfectly formal treatment of e.g. quantum channels via completely positive trace-preserving maps in the language of linear algebra --- but rather than putting too many footnotes with technical caveats and explanations throughout the main text, we opt to collect them all together into one big "warning" here:
::: {.idea latex=""}
The mathematics underlying quantum theory is a vast and in-depth subject, most of which we will never touch upon, some of which we will only allude to, and the rest of which we will cover only in the level of detail necessary for our overarching goal (give or take some interesting mathematical detours).
:::
But this then poses the question of what the overarching goal of this book actually is.
::: {.idea latex=""}
This book aims to help the eager reader understand what quantum information science is all about, and for them to realise which facets of it they would like to study in more detail.
:::
But this does not mean that our treatment is cursory!
In fact, by the end of this book you will have learnt a fair bit more than might usually be covered in a standard quantum information science course, such as one you would find in a mathematics master's degree, for example.
The interdisciplinary nature of this topic, combined with the diverse backgrounds that different readers have, means that some may find certain chapters easy, while others find the same ones difficult --- so if things seem hard to you then don't worry, because the next chapter might feel much easier!
The only real prerequisites are a working knowledge of complex numbers and vectors and matrices; some previous exposure to elementary probability theory and Dirac bra-ket notation (for example) would be helpful, but we provide crash-course introductions to some topics like these at the end of this chapter.
A basic knowledge of quantum mechanics (especially in the simple context of finite dimensional state spaces, e.g. state vectors, composite systems, unitary matrices, Born rule for quantum measurements) and some ideas from classical theoretical computer science (complexity theory, algorithms) would be helpful, but is *not at all* necessary.
Of course, even if you aren't familiar with the formal mathematics of complex numbers and linear algebra, then that shouldn't stop you from reading this book if you want to.
You might be surprised at how much you can [black box](https://en.wikipedia.org/wiki/Black_box) the bits that you don't understand.
The caveat stands, however, that, to *really* get to grips with this subject, at least some knowledge of maths is necessary --- and this is not a bad thing!
On that note, every chapter ends with a section called "*Remarks and exercises*".
You will find the same advice in basically every single mathematical text: even just attempting to do the exercises is almost more important than reading the actual book itself.
For this book, it is doubly true that *you should at least read these sections*, because they contain not just exercises but also worked examples and further fundamental expository content.
Finally, throughout this text you will find some technical asides.
These are *not at all* necessary reading, but are just there to provide the exceptionally eager reader (or perhaps those with a more formal mathematical background) with some extra context, as well as some pointers towards further reading.
They are usually intentionally vague and scarce in detail.
```{r, child = if (knitr::is_html_output()) 'source/_web-instructions.Rmd'}
```
```{r, child = if (knitr::is_latex_output()) 'source/_pdf-instructions.Rmd'}
```
## How to cite this book {-}
BibLaTeX:
```{r,comment=""}
first <- '@online{qubitguide,
author = {Ekert, A and Hosgood, T and Kay, A and Macchiavello, C},
title = {{Introduction to Quantum Information Science}},
url = {https://qubit.guide},
date = {'
last <- '}
}'
cat(first, format(Sys.Date(), format='%Y-%m-%d'), last, sep="")
```
BibTeX:
```{r,comment=""}
first <- '@misc{qubitguide,
author = {Ekert, A and Hosgood, T and Kay, A and Macchiavello, C},
title = {{Introduction to Quantum Information Science}},
howpublished = {\\url{https://qubit.guide}},
date = {'
last <- '}
}'
cat(first, format(Sys.Date(), format='%Y-%m-%d'), last, sep="")
```
## Acknowledgements {-}
We thank the following for their helpful comments and corrections, as well as for catching typos: Zhenyu Cai, Jedrzej Burkat, Maryam Khaqan, Rolando Reiner, Jan Baltussen, Linus Kelsey, Edgardo Deza, Elizabeth Ealing, L.L. Salcedo, Giuseppe Bisicchia, Tom Scruby, Marcello Terra Cunha, Orla Supple, Ali Al-Ali, Garen-Ohan Gregorian.
We also appreciate the work of Yihui Xie in developing the [Bookdown package](https://bookdown.org/yihui/bookdown/) with which this e-book was built.
The creation and hosting of the online book was made possible by the [Centre for Quantum Technologies](https://cqt.quantumlah.org/), the [University of Oxford Mathematical Institute](https://www.maths.ox.ac.uk/), and the [Okinawa Institute of Science and Technology](https://www.oist.jp/).
```{r, out.width='90%'}
knitr::include_graphics('images/sponsors.png')
```
\loadgeometry{tufteish}
\pagestyle{fancy}
# Some mathematical preliminaries {-}
Here we quickly recall most of the fundamental mathematical results that we will rely on in the rest of this book, most importantly *linear algebra over the complex numbers*.
However, we will not introduce everything from the ground up.
Most notably, we will assume that the reader understands what a **matrix** is, and how it represents a **linear transformation**; some prior exposure to **complex numbers** would be helpful.
If an equation like $\tr\ket{a}\bra{b}=\braket{b}{a}$ makes sense to you, and you can find the eigenvalues and eigenvectors of a matrix like
$$
\begin{bmatrix}
0 & 1+i
\\\sqrt{2}e^{-i\pi/4} & 0
\end{bmatrix}
$$
then you can safely skip over this section and get started directly with Chapter \@ref(quantum-interference).
As a small note on notation, we generally write "$x\coloneqq y$" to mean "$x$ is defined to be (equal to) $y$", and "$x\equiv y$" to mean "$x$ is just another name for $y$", but sometimes we simply just write "$x=y$".
## Complex numbers
One of the fundamental ingredients of quantum information science (and, indeed, of quantum physics in general) is the notion of **complex numbers**.
It would be disingenuous to expect that a few paragraphs would suffice to make the reader sufficiently familiar with the subject, but we try our best here to give a speedy overview of the core principles, and end with some exercises that can be a helpful indicator of which things you might want to read up on elsewhere.
The "classical" way of arriving at complex numbers is as follows: start with the **natural numbers** $\mathbb{N}=\{0,1,2,\ldots\}$, which we can add; if we want to be able to invert addition (i.e. subtract), then we end up with the **integers** $\mathbb{Z}=\{\ldots,-2,-1,0,1,2,\ldots\}$, which we can multiply; if we want to be able to invert multiplication (i.e. divide), then we end up with the **rationals** $\mathbb{Q}=\{\frac{p}{q}\mid p,q\in\mathbb{Z}\}$.
In this process of "closure under more and more binary operations", we have passed from a **monoid**, to a **group**, to a **field**.
Algebraically, then, we seem to be done: we can do all the addition and multiplication that we like, and we can invert it whenever it makes sense to do so (e.g. we can divide, as long as it's not by $0$).
But there are lots of numbers that turn up in geometry that are not rational, such as $\sqrt{2}\approx1.414$, $\pi\approx3.14$, and $e\approx2.718$.
To include all of these (and simultaneously make sense of things like infinite sums, and limits), we must do some **real analysis** --- something which we won't touch upon here --- to end up with the **real numbers** $\mathbb{R}$.
These form a field, just like the rationals, but now we don't have any "gaps" in our number line.
So what's left to do?
Well the reals have one big problem: they are not **algebraically closed**.
That is, there exist polynomials with no roots, i.e. equations of the form $a_nx^n+a_{n-1}x^{n-1}+\ldots+a_1x+a_0=0$ (where the $a_i$ are real numbers) that have no solutions.^[To explain why we care so much about polynomials would be the subject of a whole nother book, but one important reason (of the *many*!), for both analysts and geometers alike, is the [**Weierstrass Approximation Theorem**](https://en.wikipedia.org/wiki/Stone%E2%80%93Weierstrass_theorem).]
Somehow the most fundamental such example is the equation $x^2+1=0$, which has no solutions, because the square of any real number must be non-negative, and so $\sqrt{-1}\not\in\mathbb{R}$.
It turns out that if we just throw in this one extra number $i\coloneqq\sqrt{-1}$ to $\mathbb{R}$ then we can solve *any* polynomial --- a theorem so important that it's known as the **fundamental theorem of algebra**.
We call the result of doing this the **complex numbers**, and denote them by $\mathbb{C}$.
This gives us an algebraic way of understanding what a complex number is: it is a real number $x$ plus an **imaginary** number $iy$ (where $y\in\mathbb{R}$).
That is, every complex number $x+iy$ simply corresponds to a pair of real numbers $(x,y)$.
So now we can think geometrically!
We imagine the complex numbers $\mathbb{C}$ as the 2-dimensional Euclidean space $\mathbb{R}^2$, where the $x$-axis corresponds to the real part of a complex number, and the $y$-axis to the imaginary part.
This really is a geometric way of thinking, since now addition (and subtraction) of complex numbers (which is defined by adding their real and imaginary parts separately) is given by vector addition, as shown in Figure \@ref(fig:complex-addition).
(ref:complex-addition-caption) Addition of two complex numbers $z=x+iy$ and $z'=x'+iy'$, where we write $\Re$ (resp. $\Im$) for the real (resp. imaginary) part of a complex number: $\Re(x+iy)=x$ and $\Im(x+iy)=y$. Commutativity of addition corresponds to what is sometimes called the **parallelogram law** for addition of vectors.
```{r complex-addition,engine='tikz',fig.cap='(ref:complex-addition-caption)',fig.width=3}
\usetikzlibrary{arrows.meta}
\definecolor{primary}{RGB}{177,98,78}
\definecolor{secondary}{RGB}{91,132,177}
\begin{tikzpicture}[scale=0.8]
\draw [->] (-0.2,0) to (4,0) node[label={right:$\Re(z)$}]{};
\draw [->] (0,-0.2) to (0,4) node[label={above:$\Im(z)$}]{};
%
\draw [primary,ultra thick,-Latex] (1,2.5) to (4,3.5);
\draw [secondary,ultra thick,-Latex] (3,1) to (4,3.5);
%
\draw [primary,ultra thick,-Latex] (0,0) to (3,1) node[black,label={right:$z$}]{$\bullet$};
\draw [secondary,ultra thick,-Latex] (0,0) to (1,2.5) node[black,label={above:$z'$}]{$\bullet$};
%
\node [label={right:$z+z'$}] at (4,3.5) {$\bullet$};
\end{tikzpicture}
```
But what about multiplication and division?
Following the rules of the game, we can figure out what the product of two complex numbers is by treating the imaginary number $i$ as a "formal variable", i.e. pretending it's just a variable in some polynomial, and then remembering that $i=\sqrt{-1}$ at the very end:
$$
\begin{aligned}
(x+iy)(x'+iy')
&= xx'+ixy'+iyx'+i^2yy'
\\&= xx'+ixy'+iyx'-yy'
\\&= xx'-yy'+i(xy'+yx').
\end{aligned}
$$
Division works similarly --- the simplest example, inverting a complex number $x+iy$, makes sense whenever $x$ and $y$ are not both zero (i.e. whenever $x+iy\neq0$), since then we can use the trick of "multiplying by $1$":
$$
\begin{aligned}
\frac{1}{x+iy}
&= \frac{1}{x+iy}\frac{x-iy}{x-iy}
\\&= \frac{x-iy}{x^2+y^2}
\\&= \frac{x}{x^2+y^2}-i\frac{y}{x^2+y^2}
\end{aligned}
$$
This other complex number $x-iy$ that we used is somehow special because it is exactly the thing we needed to make the denominator real, so we give it a name: the **complex conjugate**^[The more common notation in mathematics is $\bar{z}$ instead of $z^\star$, but physicists tend to like the latter.] of a complex number $z=x+iy$ is the complex number $z^\star\coloneqq x-iy$.
Geometrically, this is just the reflection of the vector $(x,y)\in\mathbb{R}^2$ in the $x$-axis.
The product $zz^\star=x^2+y^2$ is also important: you might recognise (from Pythagoras' theorem) that $\sqrt{x^2+y^2}$ is exactly the length of the vector $(x,y)$, and so we call the real number $|z|\coloneqq\sqrt{zz^\star}$ the **modulus** (or magnitude, norm, or absolute value).
Note then that we can simply write $1/z=z^\star/|z|^2$.
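If you like checking things numerically, these rules are easy to play with in R (the language with which this e-book is built), whose complex arithmetic, `Conj`, and `Mod` are all built in. Here is a minimal sketch, using some arbitrarily chosen numbers.

```r
# Two complex numbers in rectangular form z = x + iy.
z <- 3 + 4i
w <- 1 - 2i

z + w                 # addition is component-wise: 4 + 2i
z * w                 # (xx' - yy') + i(xy' + yx') = 11 - 2i
Conj(z)               # complex conjugate: 3 - 4i
Mod(z)                # modulus |z| = sqrt(z z*) = 5
1 / z                 # multiplicative inverse...
Conj(z) / Mod(z)^2    # ...agrees with z* / |z|^2
```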
Now things are looking somewhat nice, but the story isn't complete.
We have a good geometric intuition for what a complex number is (a vector in $\mathbb{R}^2$) and how to add them (vector addition), as well as what the complex conjugate and the modulus mean (reflection in the $x$-axis, and the length of the vector, respectively); but what about multiplication and division?
To understand these we need to switch from our **rectangular coordinates** $z=x+iy$ to **polar coordinates** --- instead of describing a point $z$ in $\mathbb{R}^2$ as "$x$ units left/right and $y$ units up/down", we describe it as "$r$ units from the origin, at an angle of $\theta$ [**radians**](https://en.wikipedia.org/wiki/Radian)".
We already know, given $(x,y)\in\mathbb{R}^2$, how to calculate its distance $r$ from the origin, since this is exactly the length of the vector: $r=|(x,y)|=\sqrt{x^2+y^2}$.
But what about the angle?
Some trigonometry tells us that $\theta=\arctan(y/x)$, so we now know how to convert rectangular to polar coordinates:
$$
x+iy = (x,y)
\longmapsto (r,\theta) \coloneqq (\sqrt{x^2+y^2},\arctan(y/x)).
$$
It would be nice to know how to go in the other direction though, but this can also be solved with some trigonometry:
$$
(r,\theta)
\longmapsto (r\cos\theta,r\sin\theta).
$$
(ref:converting-planar-and-polar-caption) Expressing a complex number $z$ in both planar and polar forms.
```{r converting-planar-and-polar,engine='tikz',fig.cap='(ref:converting-planar-and-polar-caption)',fig.width=4}
\usetikzlibrary{arrows.meta}
\definecolor{primary}{RGB}{177,98,78}
\definecolor{secondary}{RGB}{91,132,177}
\begin{tikzpicture}
\draw [->] (-0.2,0) to (4,0) node[label={right:$\Re(z)$}]{};
\draw [->] (0,-0.2) to (0,4) node[label={above:$\Im(z)$}]{};
%
\draw [secondary,thick,-Latex] (0,0) to node[sloped,anchor=center,above]{$r=\sqrt{x^2+y^2}$} (3,2.5);
\draw [secondary,thick, bend right] (0:0.9) to (40:0.9);
\node [secondary] at (1.8,0.225) {$\theta=\arctan(y/x)$};
%
\draw [primary,thick,-Latex] (0,0) to (3,0) node[below]{$x=r\cos\theta$};
\draw [primary,thick,dashed] (3,0) to (3,2.5);
\draw [primary,thick,-Latex] (0,0) to (0,2.5) node[left]{$y=r\sin\theta$};
\draw [primary,thick,dashed] (0,2.5) to (3,2.5);
%
\node [label={above:$z$}] at (3,2.5) {$\bullet$};
\end{tikzpicture}
```
Great!
... but what's the point of polar coordinates?
Well, it turns out that they give us a geometric way of understanding multiplication: you can show^[**Exercise.** Prove this!] that $(r,\theta)$ multiplied by $(r',\theta')$ is exactly $(rr',\theta+\theta')$, which says that multiplication by a complex number $(r,\theta)$ is exactly a scaling by a factor of $r$ and a rotation by $\theta$.
This means that we can also easily find the multiplicative inverse of $(r,\theta)$, since it's just $(1/r,-\theta)$.
Finally, complex conjugation just means switching the sign of the angle: $(r,\theta)^\star=(r,-\theta)$.
There is one last ingredient that we should mention, which is the thing that really solidifies the relation between rectangular and polar coordinates.
We know that rectangular coordinates $(x,y)$ can be written as $x+iy$, so is there some more algebraic way of writing polar coordinates $(r,\theta)$?
That way we can avoid any ambiguity that might arise from using pairs of numbers --- if I tell you that I'm thinking of the complex number $z=(0.3,2)$, do I mean the point $0.3+2i$, or the point that is distance $0.3$ from the origin at an angle of $2$ radians?
Given polar coordinates $(r,\theta)$, we know that this is equal to $(r\cos\theta,r\sin\theta)$ in rectangular coordinates.
For simplicity, let's first consider the case where $r=1$.
Then we can write $(1,\theta)$ as $\cos\theta+i\sin\theta$.
Using the [**Taylor series**](https://en.wikipedia.org/wiki/Taylor_series)^[If you don't know about Taylor series, then feel free to just skim this part, but make sure to read the punchline!] of $\sin$ and $\cos$, we can rewrite this as
$$
\begin{aligned}
\cos\theta+i\sin\theta
&= \left(
1-\frac{\theta^2}{2!}+\frac{\theta^4}{4!}-\ldots
\right) + i\left(
\theta-\frac{\theta^3}{3!}+\frac{\theta^5}{5!}-\ldots
\right)
\\&= 1+i\theta-\frac{\theta^2}{2!}-i\frac{\theta^3}{3!}+\frac{\theta^4}{4!}+i\frac{\theta^5}{5!}-\ldots
\\&= 1+i\theta+\frac{i^2\theta^2}{2!}+\frac{i^3\theta^3}{3!}+\frac{i^4\theta^4}{4!}+\frac{i^5\theta^5}{5!}+\ldots
\\&= \exp(i\theta)
\end{aligned}
$$
where at the very end we use the Taylor expansion of the [**exponential function**](https://en.wikipedia.org/wiki/Exponential_function) $\exp(x)=e^x$.
We have just "proved"^[It is very important to point out that this "proof" is not rigorous or formal --- you need to be very *very* careful when rearranging infinite sums! However, this proof *can be made rigorous* by using some real analysis.] one of the most remarkable formulas in mathematics: **Euler's formula**
$$
e^{i\theta} = \cos\theta+i\sin\theta
$$
(a special case of which gives the famous equation $e^{i\pi}+1=0$, uniting five fundamental constants: $0$, $1$, $i$, $e$, and $\pi$).
In summary then, we have two beautiful ways of expressing a complex number $z\in\mathbb{C}$, in either its rectangular/planar form or its polar/Euler form:
$$
z = x+iy = re^{i\theta}.
$$
::: {.idea latex=""}
Addition and subtraction are most neatly expressed in the planar form $x+iy$, and multiplication and division are most neatly expressed in the polar form $re^{i\theta}$; complex conjugation looks nice and tidy in both.
:::
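As a quick numerical illustration of this (again just a sketch in R, with arbitrarily chosen numbers), we can convert to polar form with `Mod` and `Arg`, check that multiplication multiplies the moduli and adds the angles, and check Euler's formula itself.

```r
z <- 3 + 4i
w <- 1 - 2i

# Convert to polar form: z = r * exp(i * theta).
r <- Mod(z); theta <- Arg(z)
r * exp(1i * theta)               # recovers 3 + 4i (up to rounding)

# Multiplication: moduli multiply, angles add.
Mod(z * w) - Mod(z) * Mod(w)      # ~ 0
Arg(z * w) - (Arg(z) + Arg(w))    # ~ 0 (in general, up to a multiple of 2*pi)

# Euler's formula, and the special case e^{i*pi} + 1 = 0.
exp(1i * theta) - (cos(theta) + 1i * sin(theta))   # ~ 0
exp(1i * pi) + 1                                   # ~ 0
```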
::: {.technical title="Addition of polar vectors" latex="{Addition of polar vectors}"}
We know how to perform addition, multiplication, inversion (which is a special case of division), and complex conjugation on complex numbers in planar form, but we've only described how to do the last *three* of these in polar form: we haven't said how to write $re^{i\theta}+r'e^{i\theta'}$ as $se^{i\varphi}$ for some $s$ and $\varphi$.
This is because it is very messy looking:
$$
\begin{aligned}
s
&= \sqrt{r^2+(r')^2+2rr'\cos(\theta'-\theta)}
\\\varphi
&= \theta+\operatorname{atan2}\big(r'\sin(\theta'-\theta),r+r'\cos(\theta'-\theta)\big)
\end{aligned}
$$
and where $\operatorname{atan2}$ is the [2-argument arctangent function](https://en.wikipedia.org/wiki/Atan2).
:::
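If you would like to convince yourself that this messy-looking formula really does describe addition, here is a small numerical check, sketched in R (where `atan2` is exactly the 2-argument arctangent above), with some arbitrarily chosen moduli and angles.

```r
# Two complex numbers given in polar form (r, theta).
r  <- 2;   theta  <- 0.3
rp <- 1.5; thetap <- 2.1

# Their sum, computed directly via the Euler form...
direct <- r * exp(1i * theta) + rp * exp(1i * thetap)

# ...and via the polar-addition formula.
s   <- sqrt(r^2 + rp^2 + 2 * r * rp * cos(thetap - theta))
phi <- theta + atan2(rp * sin(thetap - theta), r + rp * cos(thetap - theta))

Mod(direct) - s      # ~ 0
Arg(direct) - phi    # ~ 0 (up to a multiple of 2*pi)
```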
You do not need to know everything about this whole story of algebraically closed fields and so on, but it helps to know the basics, so here are some exercises that should help you to become more familiar.^[Note that we have not really given you enough information in this section to be able to solve all these exercises, but that is intentional! Sometimes we like to ask questions and not answer them, with the hope that you will enjoy getting to do some research by yourself.]
a. The set $\mathbb{Q}$ of rational numbers and the set $\mathbb{R}$ of real numbers are both fields, but the set $\mathbb{Z}$ of integers is not. Why not?
b. Look up the formal statement of the fundamental theorem of algebra.
c. Evaluate each of the following quantities:
$$
1+e^{-i\pi},
\quad
|1+i|,
\quad
(1+i)^{42},
\quad
\sqrt{i},
\quad
2^i,
\quad
i^i.
$$
d. Here is a simple "proof" that $+1=-1$:
$$
1=\sqrt{1}=\sqrt{(-1)(-1)}=\sqrt{-1}\sqrt{-1}=i^2=-1.
$$
What is wrong with it?
e. Prove that, for any two complex numbers $w,z\in\mathbb{C}$, we always have the inequality^[*Hint: use polar form, draw a diagram, and appeal to the [**triangle inequality**](https://en.wikipedia.org/wiki/Triangle_inequality).*]
$$
|z-w| \geq |z|-|w|.
$$
f. Using the fact that $e^{3i\theta}=(e^{i\theta})^3$, derive a formula for $\cos3\theta$ in terms of $\cos\theta$ and $\sin\theta$.
## Euclidean vectors and vector spaces
We assume that you are familiar with Euclidean vectors --- those arrow-like geometric objects which are used to represent physical quantities, such as trajectories, velocities, or forces.
You know that any two velocities can be added to yield a third, and the multiplication of a "velocity vector" by a real number is another "velocity vector".
So a **linear combination** of vectors is another vector: if $v$ and $w$ are vectors, and $\lambda$ and $\mu$ are numbers (rational, real, or complex, for example), then $\lambda v+\mu w$ is another vector.
Mathematicians have simply taken these properties and defined vectors as _anything_ that we can add and multiply by numbers, as long as everything behaves in a nice enough way.
This is basically what the Italian mathematician Giuseppe Peano (1858--1932) did in a chapter of his 1888 book with an impressive title: _Calcolo geometrico secondo l'Ausdehnungslehre di H. Grassmann preceduto dalle operazioni della logica deduttiva_.
Following Peano, we define a **vector space** as a mathematical structure in which the notion of linear combination "makes sense".
More formally, a **complex vector space** is a set $V$ such that, given any two **vectors** $a$ and $b$ (that is, any two elements of $V$) and any two *complex* numbers $\alpha$ and $\beta$, we can form the linear combination $\alpha a+\beta b$, which is also a vector in $V$.
There are certain "nice properties" that vector spaces must satisfy. Addition of vectors must be commutative and associative, with an identity (the zero vector, which is often written as $\mathbf{0}$) and an inverse for each $v$ (written as $-v$). Multiplication by complex numbers must obey the two distributive laws: $(\alpha+\beta)v = \alpha v+\beta v$ and $\alpha (v+w) = \alpha v+\alpha w$.
::: {.technical title="Modules over a ring" latex="{Modules over a ring}"}
A more succinct way of defining a vector space is as an abelian group endowed with a scalar action of a field.
This showcases vector spaces as a particularly well behaved example of a more general object: **modules over a ring**.
:::
A **subspace** of $V$ is any subset of $V$ which is closed under vector addition and multiplication by complex numbers.
Here we start using the Dirac bra-ket notation and write vectors in a somewhat fancy way as $\ket{\text{label}}$, where the "label" is anything that serves to specify what the vector is.
For example, $\ket{\uparrow}$ and $\ket{\downarrow}$ may refer to an electron with spin up or down along some prescribed direction, and $\ket{0}$ and $\ket{1}$ may describe a quantum bit holding either logical $0$ or $1$.
As a maybe more familiar example, the set of binary strings of length $n$ is a vector space over the field $\mathbb{Z}/2\mathbb{Z}$ of integers mod $2$; in the case $n=2$ we can write down all the vectors in this vector space in this notation: $\ket{00}$, $\ket{01}$, $\ket{10}$, $\ket{11}$, where e.g. $\ket{10}+\ket{11}=\ket{01}$ (addition is taken mod $2$).
These are often called **ket** vectors, or simply **kets**.
(We will deal with "bras" in a moment).
A **basis** in $V$ is a collection of vectors $\ket{e_1},\ket{e_2},\ldots,\ket{e_n}$ such that every vector $\ket{v}$ in $V$ can be written (in _exactly_ one way) as a linear combination of the basis vectors: $\ket{v}=\sum_{i=1}^n v_i\ket{e_i}$, where the $v_i$ are complex numbers.
The number of elements in a basis is called the **dimension** of $V$.^[Showing that this definition is independent of the basis that we choose is a "fun" linear algebra exercise.]
The most common, and prototypical, $n$-dimensional complex vector space (and the space which we will be using most of the time) is the space of ordered $n$-tuples of complex numbers, usually written as column vectors:
$$
\ket{a}
= \begin{bmatrix}a_1\\a_2\\\vdots\\a_n\end{bmatrix}
$$
with a basis given by the column vectors $\ket{e_i}$ that are $0$ in every row except for a $1$ in the $i$-th row:
$$
\ket{e_1}
= \begin{bmatrix}1\\0\\\vdots\\0\end{bmatrix}
\qquad
\ket{e_2}
= \begin{bmatrix}0\\1\\\vdots\\0\end{bmatrix}
\qquad\ldots\qquad
\ket{e_n}
= \begin{bmatrix}0\\0\\\vdots\\1\end{bmatrix}
$$
and where addition of vectors is done **component-wise**, so that
$$
\left(\sum_{i=1}^n v_i\ket{e_i}\right)+\left(\sum_{i=1}^n w_i\ket{e_i}\right)
= \sum_{i=1}^n (v_i+w_i)\ket{e_i}
$$
or, in column vectors,
$$
\begin{gathered}
\ket{v}
= \begin{bmatrix}v_1\\v_2\\\vdots\\v_n\end{bmatrix}
\qquad
\ket{w}
= \begin{bmatrix}w_1\\w_2\\\vdots\\w_n\end{bmatrix}
\\\alpha\ket{v}+\beta\ket{w}
= \begin{bmatrix}\alpha v_1+\beta w_1\\\alpha v_2+\beta w_2\\\vdots\\\alpha v_n+\beta w_n\end{bmatrix}
\end{gathered}
$$
Throughout the course we will deal only with vector spaces of *finite* dimensions.
This is sufficient for all our purposes and we will avoid many mathematical subtleties associated with infinite dimensional spaces, for which we would need the tools of **functional analysis**.
Formally, whenever we say **$n$-dimensional Euclidean space**, we mean the *real* vector space $\mathbb{R}^n$.
## Bras and kets {#bras-and-kets}
An **inner product** on a vector space $V$ (over the complex numbers) is a function that assigns to each pair of vectors $\ket{u},\ket{v}\in V$ a complex number $\braket{u}{v}$, and satisfies the following conditions:
- $\braket{u}{v}=\braket{v}{u}^\star$
- $\braket{v}{v}\geq 0$ for all $\ket{v}$
- $\braket{v}{v}= 0$ if and only if $\ket{v}=0$
where ${}^\star$ denotes complex conjugation (sometimes written as $z\mapsto\bar{z}$ instead).
The inner product must also be _linear_ in the second argument but _antilinear_ in the first argument:
$$
\braket{c_1u_1+c_2u_2}{v} = c_1^\star\braket{u_1}{v}+c_2^\star\braket{u_2}{v}
$$
for any complex constants $c_1$ and $c_2$.
To any physical system we associate^[The question of *how* exactly we construct this associated space is a subtle one in the case of arbitrary physical systems, but we shall see that this is relatively straightforward when working with the types of systems that we consider in this book.] a complex vector space with an inner product, known as a **Hilbert space** $\mathcal{H}$.
The inner product between vectors $\ket{u}$ and $\ket{v}$ in ${\mathcal{H}}$ is written as $\braket{u}{v}$.
::: {.technical title="Finite-dimensional functional analysis" latex="{Finite-dimensional functional analysis}"}
If $V$ is a vector space with an inner product $\langle-,-\rangle$, then this gives us a **norm** on $V$ by defining $\|x\|=\sqrt{\langle x,x\rangle}$ and thus a **metric** by defining $d(x,y)=\|y-x\|$.
We say that a sequence $(x_n)$ in $V$ is **Cauchy** if its elements "eventually always get closer", i.e. if for all $\varepsilon>0$ there exists some $N\in\mathbb{N}$ such that for all $m,n>N$ we have $\|x_n-x_m\|<\varepsilon$.
We say that a normed space is **complete** if *every Cauchy sequence converges in that space*, i.e. if the limits of sequences that *should* exist actually *do* exist.
Now one useful fact is the following: on a *finite dimensional* vector space, all norms are equivalent.
(Note that this does *not* mean that $\|x\|_1=\|x\|_2$ for any two norms $\|-\|_1$ and $\|-\|_2$, but simply that they "induce the same topology" --- feel free to look up the precise definition elsewhere).
This follows from another useful fact: in a *finite dimensional* vector space, the unit ball is compact.
By a short topological argument, we can use these facts to show what we claimed, namely that every *finite dimensional* inner product space is complete (with respect to the norm induced by the inner product, and thus with respect to *any* norm, since all norms are equivalent).
In the infinite dimensional case these facts are *not* true, and we have a special name for those inner product spaces which *are* complete: **Hilbert spaces**.
So working in the finite dimensional case means that "we do not have to worry about analysis", in that the completeness property comes for free the moment we have an inner product, and we can freely refer to inner product spaces as Hilbert spaces.
:::
For example, for column vectors $\ket{u}$ and $\ket{v}$ in $\mathbb{C}^n$ written as
$$
\ket{u}
= \begin{bmatrix}u_1\\u_2\\\vdots\\u_n\end{bmatrix}
\qquad
\ket{v}
= \begin{bmatrix}v_1\\v_2\\\vdots\\v_n\end{bmatrix}
$$
their inner product is defined by
$$
\braket{u}{v}
= u_1^\star v_1 + u_2^\star v_2+\ldots + u_n^\star v_n.
$$
Following Dirac, we may split the inner product into two ingredients:
$$
\braket{u}{v}
\longrightarrow \bra{u}\,\ket{v}.
$$
Here $\ket{v}$ is a ket vector, and $\bra{u}$ is called a **bra** vector, or a **bra**, and can be represented by a row vector:
$$
\bra{u}
= \begin{bmatrix}u_1^\star,&u_2^\star,&\ldots,&u_n^\star\end{bmatrix}.
$$
The inner product can now be viewed as the result of the matrix multiplication:
$$
\begin{aligned}
\braket{u}{v}
&= \begin{bmatrix}u_1^\star,&u_2^\star,&\ldots,&u_n^\star\end{bmatrix}
\cdot \begin{bmatrix}v_1\\v_2\\\vdots\\v_n\end{bmatrix}
\\&= u_1^\star v_1 + u_2^\star v_2 + \ldots + u_n^\star v_n.
\end{aligned}
$$
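Concretely, if we represent kets as column vectors then this is one line of linear algebra. Here is a sketch in R, with arbitrarily chosen vectors, where the bra is obtained as the conjugate transpose `Conj(t(...))`.

```r
# Kets as column vectors in C^3.
u <- matrix(c(1 + 1i, 0, 2i), ncol = 1)
v <- matrix(c(1, 1i, 1 - 1i), ncol = 1)

# The bra <u| is the conjugate transpose, i.e. a row vector.
bra_u <- Conj(t(u))

# The inner product <u|v> as a matrix product (a 1x1 matrix, i.e. a number)...
bra_u %*% v
# ...and written out component-wise as the sum of u_k* v_k.
sum(Conj(u) * v)
```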
Bras are vectors: you can add them, and multiply them by scalars (which, here, are complex numbers), but they are vectors in the space ${\mathcal{H}}^\star$ which is **dual** to $\mathcal{H}$.
Elements of ${\mathcal{H}}^\star$ are **linear functionals**, that is, linear maps from $\mathcal{H}$ to $\mathbb{C}$.
A linear functional $\bra{u}$ acting on a vector $\ket{v}$ in $\mathcal{H}$ gives a complex number $\braket{u}{v}$.
::: {.idea latex=""}
All Hilbert spaces of the same (finite) dimension are isomorphic, so the differences between quantum systems cannot really be understood without additional structure. This structure is provided by a specific algebra of operators acting on $\mathcal{H}$.
:::
## Daggers {#daggers}
Although $\mathcal{H}$ and $\mathcal{H}^\star$ are not identical spaces --- the former is inhabited by kets, and the latter by bras --- they are closely related.
There is a bijective map from one to the other given by $\ket{v}\leftrightarrow \bra{v}$, and denoted by a **dagger**:^["Is this a $\dagger$ which I see before me..."]
$$
\begin{aligned}
\bra{v}
&= (\ket{v})^\dagger
\\\ket{v}
&= (\bra{v})^\dagger.
\end{aligned}
$$
We usually omit the parentheses when it is obvious what the dagger operation applies to.
The dagger operation, also known as **Hermitian conjugation**, is _antilinear_:
$$
\begin{aligned}
(c_1\ket{v_1}+c_2\ket{v_2})^\dagger
&= c_1^\star\bra{v_1} + c_2^\star\bra{v_2}
\\(c_1\bra{v_1}+c_2\bra{v_2})^\dagger
&= c_1^\star\ket{v_1} + c_2^\star\ket{v_2}.
\end{aligned}
$$
Also, when applied twice, the dagger operation is the identity map.
You might already be familiar with Hermitian conjugation under another name: the **conjugate transpose**^[In mathematics texts this operation is often denoted by ${}^\star$ rather than ${}^\dagger$, but we reserve the former for complex conjugation *without* matrix transposition. Note, however, that scalars can be thought of as $(1\times1)$ matrices, and in this special case we have that $\dagger=\star$.] of an $(n\times m)$ matrix $A$ is an $(m\times n)$ matrix $A^\dagger$, obtained by interchanging the rows and columns of $A$ and taking complex conjugates of each entry in $A$, i.e. $A^\dagger_{ij}=A^\star_{ji}$.
In particular then,
$$
\ket{v} = \begin{bmatrix}v_1\\v_2\\\vdots\\v_n\end{bmatrix}
\overset{\dagger}{\longleftrightarrow}
\bra{v} = \begin{bmatrix}v_1^\star,&v_2^\star,&\ldots,&v_n^\star\end{bmatrix}.
$$
We will come back to this $\dagger$ operation on matrices in Section \@ref(operators).
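In concrete matrix terms, taking the dagger is just the conjugate transpose. As a small sketch in R (with an arbitrarily chosen matrix), `Conj(t(...))` does the job.

```r
# A (2x3) complex matrix.
A <- matrix(c(1 + 1i, 0,      2i,
              3,      1 - 1i, 1), nrow = 2, byrow = TRUE)

# Its adjoint: a (3x2) matrix with entries (A^dagger)_{ij} = A*_{ji}.
A_dag <- Conj(t(A))
A_dag

# Applying the dagger twice gets us back to where we started.
max(Mod(Conj(t(A_dag)) - A))   # 0
```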
## Geometry {#geometry}
The inner product brings geometry: the **length**, or **norm**, of $\ket{v}$ is given by $\|v\|=\sqrt{\braket{v}{v}}$, and we say that $\ket{u}$ and $\ket{v}$ are **orthogonal** if $\braket{u}{v}=0$.
Any maximal set of pairwise orthogonal vectors of unit length^[That is, consider sets of vectors $\ket{e_i}$ such that $\braket{e_i}{e_j}=\delta_{ij}$ (where the **Kronecker delta** $\delta_{ij}$ is $0$ if $i\neq j$, and $1$ if $i=j$.), and then pick any of the largest such sets (which must exist, since we assume our vector spaces to be finite dimensional).] forms an orthonormal basis $\{\ket{e_1},\ldots,\ket{e_n}\}$, and so any vector can be expressed as a linear combination of the basis vectors:
$$
\ket{v}
= \sum_i v_i\ket{e_i}
$$
where $v_i=\braket{e_i}{v}$.
Then the bras $\bra{e_i}$ form the **dual basis**
$$
\begin{gathered}
\bra{v}
=\sum_i v_i^\star\bra{e_i}
\\\text{where $v_i^\star=\braket{v}{e_i}$}.
\end{gathered}
$$
To make the notation a bit less cumbersome, we will sometimes label the basis kets as $\ket{i}$ rather than $\ket{e_i}$, and write
$$
\begin{aligned}
\ket{v}
&= \sum_i \ket{i}\braket{i}{v}
\\\bra{v}
&= \sum_i \braket{v}{i}\bra{i}
\end{aligned}
$$
but *do not confuse $\ket{0}$ with the zero vector*!
We *never* write the zero vector as $\ket{0}$, but only ever as $0$, without any bra or ket decorations (so e.g. $\ket{v}+0=\ket{v}$).
Now that we have some notion of geometry, we can explain a bit more about this idea of associating a Hilbert space to a quantum system --- we will use some terminology that we have not yet introduced, but all will be explained in due time.
::: {.idea latex=""}
To any *isolated* quantum system, which can be prepared in $n$ **perfectly distinguishable** states, we can associate a Hilbert space $\mathcal{H}$ of dimension $n$ such that each vector $\ket{v}\in\mathcal{H}$ of unit length $\braket{v}{v}=1$ represents a quantum state of the system.
The overall phase of the vector has no physical significance: $\ket{v}$ and $e^{i\varphi}\ket{v}$ (for any real $\varphi$) both describe the same state.
:::
We note here one more fact that also won't yet make sense, but which won't hurt to have hidden away in the back of your mind.
::: {.idea latex=""}
The inner product $\braket{u}{v}$ is the **probability amplitude** that a quantum system prepared in state $\ket{v}$ will be found in state $\ket{u}$ upon measurement.
This means that states corresponding to orthogonal vectors (i.e. $\braket{u}{v}=0$) are perfectly distinguishable: if we prepare the system in state $\ket{v}$, then it will never be found in state $\ket{u}$, and vice versa.
:::
## Operators {#operators}
A **linear map** between two vector spaces $\mathcal{H}$ and $\mathcal{K}$ is a function $A\colon\mathcal{H}\to\mathcal{K}$ that respects linear combinations:
$$
A(c_1\ket{v_1}+c_2\ket{v_2})=c_1 A\ket{v_1}+c_2 A\ket{v_2}
$$
for any vectors $\ket{v_1},\ket{v_2}$ and any complex numbers $c_1,c_2$.
We will focus mostly on **endomorphisms**, that is, maps from $\mathcal{H}$ to $\mathcal{H}$, and we will call them **operators**.
The symbol $\id$ is reserved for the identity operator that maps every element of $\mathcal{H}$ to itself (i.e. $\id\ket{v}=\ket{v}$ for all $\ket{v}\in\mathcal{H}$).
The product $BA$ of two operators $A$ and $B$ is the operator obtained by first applying $A$ to some ket $\ket{v}$ and then $B$ to the ket which results from applying $A$:
$$
(BA)\ket{v} = B(A\ket{v}).
$$
The order _does_ matter: in general, $BA\neq AB$.
In the exceptional case in which $AB=BA$, one says that these two operators **commute**.
The inverse of $A$, written as $A^{-1}$, is the operator that satisfies $AA^{-1}=\id=A^{-1}A$.
For finite-dimensional spaces, one only needs to check _one_ of these two conditions, since any one of the two implies the other, whereas, on an infinite-dimensional space, _both_ must be checked.
Finally, given a particular basis, an operator $A$ is uniquely determined by the entries of its matrix: $A_{ij}=\bra{i}A\ket{j}$.
The **adjoint**, or **Hermitian conjugate**, of a linear map $A$, denoted by $A^\dagger$, is defined by the relation
$$
\begin{gathered}
\bra{i}A^\dagger\ket{j}
= \bra{j}A\ket{i}^\star
\\\text{for all $\ket{i},\ket{j}\in\mathcal{H}$}
\end{gathered}
$$
and $\dagger$ turns $(n\times m)$ matrices into $(m\times n)$ matrices.
An operator $A$ is said to be
- **normal** if $AA^\dagger = A^\dagger A$
- **unitary** if $A^\dagger=A^{-1}$
- **Hermitian** (or **self-adjoint**) if $A^\dagger = A$.
In particular then, being unitary implies being normal: if $A^\dagger=A^{-1}$, then $AA^\dagger=A^\dagger A$, since both of these are equal to $\id$.
Note also that unitary and Hermitian operators must indeed be *operators*, i.e. they are represented by a *square* matrix.
Any *physically admissible* evolution of an isolated quantum system is represented by a unitary operator.^[This is an *axiom*, justified by experimental evidence, and also by some sort of mathematical intuition. So, in this book, we take this as a *fact* that we do not question. It is, however, very interesting to question it: *why should we assume this to be true?*]
Note that unitary operators preserve the inner product: given a unitary operator $U$ and two kets $\ket{a}$ and $\ket{b}$, and defining $\ket{a'}=U\ket{a}$ and $\ket{b'}=U\ket{b}$, we have that
$$
\begin{gathered}
\bra{a'}=\bra{a}U^\dagger
\\\bra{b'}=\bra{b}U^\dagger
\\\braket{a'}{b'}=\bra{a}U^\dagger U\ket{b}=\bra{a}\id\ket{b}=\braket{a}{b}.
\end{gathered}
$$
Preserving the inner product implies preserving the norm induced by this product, i.e. unit state vectors are mapped to unit state vectors, i.e. _unitary operations are the isometries of the Euclidean norm_.
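Here is a quick numerical sanity check of this, sketched in R with an arbitrarily chosen $(2\times2)$ unitary and two arbitrarily chosen kets.

```r
# A (2x2) unitary matrix: U U^dagger = 1.
U <- matrix(c(1,  1i,
              1i, 1), nrow = 2, byrow = TRUE) / sqrt(2)
max(Mod(U %*% Conj(t(U)) - diag(2)))   # ~ 0, so U is indeed unitary

# Two kets and their images under U.
a <- matrix(c(1 + 2i, 3), ncol = 1)
b <- matrix(c(1i, 1 - 1i), ncol = 1)
a_prime <- U %*% a
b_prime <- U %*% b

# <a'|b'> equals <a|b>.
Conj(t(a_prime)) %*% b_prime
Conj(t(a)) %*% b
```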
::: {.technical title="Dagger compact categories" latex="{Dagger compact categories}"}
This whole package of stuff and properties and structure (i.e. finite dimensional Hilbert spaces with linear maps and the dagger) bundles up into an abstract framework called a [**dagger compact category**](https://en.wikipedia.org/wiki/Dagger_compact_category).
We will not delve into the vast world of category theory in this book, and to reach an understanding of all the ingredients that go into the one single definition of dagger compact categories would take more than a single chapter.
But it's a good idea to be aware that there are researchers in quantum information science who work *entirely* from this approach, known as [**categorical quantum mechanics**](https://en.wikipedia.org/wiki/Categorical_quantum_mechanics).
One particular method within this approach is the use of [**string diagrams**](https://en.wikipedia.org/wiki/String_diagram), which allow for the use of so-called diagrammatic reasoning, with the [**ZX-calculus**](https://en.wikipedia.org/wiki/ZX-calculus) being a particularly successful example.
For an introduction to string diagrams of this flavour, it's maybe a good idea to start with understanding how they can express the linear algebra that you already know.
For example, Pawel Sobocinski's ["Graphical Linear Algebra"](https://graphicallinearalgebra.net) aims to teach linear algebra entirely through the introduction of string diagrams.
:::
## Eigenvalues and eigenvectors
Given an operator $A$, an **eigenvector** is a non-zero vector $\ket{v}$ such that
$$
A\ket{v} = \lambda\ket{v}
$$
for some $\lambda\in\mathbb{C}$ (which is called the corresponding **eigenvalue**).
We call the pair $(\lambda,\ket{v})$ an **eigenpair**, and we call the set of eigenvalues the **spectrum** of $A$, denoted by $\sigma(A)$.
It is a surprising (but incredibly useful) fact that every operator has at least one eigenpair.^[You can prove this for an $(n\times n)$ matrix $A$ by considering the set $\{\ket{v},A\ket{v},A^2\ket{v},\ldots,A^n\ket{v}\}$ of vectors in $\mathbb{C}^n$. Since this has $n+1$ elements, it must be linearly *dependent*, and so (after some lengthy algebra) we can construct an eigenpair.]
Geometrically, an eigenvector of an operator $A$ is a vector upon which $A$ simply acts by "stretching".
Note that eigenvectors can be scaled by an arbitrary non-zero length: if $A\ket{v}=\lambda\ket{v}$ then
$$
\begin{aligned}
A(\mu\ket{v})
&= \mu(A\ket{v})
\\&= \mu\lambda\ket{v}
\\&= \lambda(\mu\ket{v})
\end{aligned}
$$
for any $\mu\neq0$.
Because of this, *we usually assume all eigenvectors to be of length $1$*.
Rewriting the defining property of an eigenpair $(\lambda,\ket{v})$, we see that
$$
(A-\lambda\id)\ket{v} = 0
$$
which tells us that the operator $A-\lambda\id$ has a non-zero kernel, and is thus non-invertible.
This gives a useful characterisation of the spectrum in terms of a determinant:
$$
\sigma(A) = \{\lambda\in\mathbb{C} \mid \det(A-\lambda\id)=0\}.
$$
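Numerically, finding eigenpairs is a one-liner. As a sketch in R, here is the matrix from the start of these preliminaries (recall that $\sqrt{2}e^{-i\pi/4}=1-i$), whose spectrum turns out to be $\{\pm\sqrt{2}\}$.

```r
# The matrix from the start of this chapter.
A <- matrix(c(0,                           1 + 1i,
              sqrt(2) * exp(-1i * pi / 4), 0), nrow = 2, byrow = TRUE)

e <- eigen(A)
e$values    # the spectrum: approximately +sqrt(2) and -sqrt(2)
e$vectors   # unit-length eigenvectors, one per column

# Check the defining property A|v> = lambda|v> for the first eigenpair.
max(Mod(A %*% e$vectors[, 1] - e$values[1] * e$vectors[, 1]))   # ~ 0
```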
The spectrum $\sigma(A)$ allows us to recover information about the operator $A$.
For example, the trace of $A$ is equal to the sum of all its eigenvalues, and the determinant of $A$ is equal to the product of all its eigenvalues.
We can show this easily for normal operators using the fact^[**Exercise.** Prove this! *Hint: start by showing that, if $(\lambda,\ket{v})$ is an eigenpair of $A$, then $(\lambda^\star,\ket{v})$ is an eigenpair of $A^\dagger$.*] that their eigenvectors corresponding to distinct eigenvalues are orthogonal: they satisfy $\braket{v}{w}=0$ whenever the eigenvalues of $\ket{v}$ and $\ket{w}$ differ.
Because eigenvectors can always be rescaled, and because within any single eigenspace we can always pick an orthogonal basis of eigenvectors, this means that we can assume the eigenvectors of a normal operator to be *orthonormal*.
If we write the eigenpairs as $(\lambda_i,\ket{v_i})$, we can define
$$
U
= \sum_i \ket{i}\bra{v_i}
$$
for an orthonormal basis $\{\ket{1},\ldots,\ket{n}\}$, which is a unitary matrix (since the eigenvectors are also assumed to be orthonormal).
Then we see that
$$
\begin{aligned}
UAU^\dagger
&= \sum_{i,j} \ket{i}\bra{v_i} A \ket{v_j}\bra{j}
\\&= \sum_{i,j} \ket{i}\bra{v_i} \lambda_j\ket{v_j}\bra{j}
\\&= \sum_{i,j} \lambda_j\ket{i}(\braket{v_i}{v_j})\bra{j}
\\&= \sum_i \lambda_i \ket{i}\bra{i}
\end{aligned}
$$
(where we again use this hypothesis that the eigenvectors of $A$ are all orthonormal) which is the diagonal matrix $D$ consisting of the eigenvalues $\lambda_i$ of $A$ along its diagonal.
Then, since $UAU^\dagger=D$, we can equally write $A=U^\dagger DU$, which gives us
$$
A
= \sum_i \lambda_i \proj{v_i}.
$$
We call each $\proj{v_i}$ an **eigenspace projector**, where a **projector** is defined to be any operator $P$ that satisfies $P=P^\dagger$ and $P^2=P$.
Note that projectors can only have eigenvalues equal to $0$ or $1$, since if $\ket{v}$ is an eigenvector of $P$ then, using the fact that $P^2=P$,
$$
\begin{aligned}
0
&= (P^2-P)\ket{v}
\\&= (\lambda^2-\lambda)\ket{v}
\\\implies \lambda(\lambda-1)=0
\end{aligned}
$$
and so $\lambda$ must be equal to either $0$ or $1$.
Finally we can now return to the relationship between eigenvalues and the trace and determinant, using this fact that any normal operator $A$ gives a unitary operator $U$ such that $A=U^\dagger DU$ for the diagonal matrix $D$ of eigenvalues of $A$.
Indeed,
$$
\begin{aligned}
\tr(A)
&= \tr(U^\dagger DU)
\\&= \tr(DUU^\dagger)
\\&= \tr(D)
\\&= \sum_i \lambda_i
\end{aligned}
$$
which proves that the trace is equal to the sum of the eigenvalues, and
$$
\begin{aligned}
\det(A)
&= \det(U^\dagger DU)
\\&= \det(U^\dagger)\det(D)\det(U)
\\&= \det(D)\det(U^\dagger)\det(U)
\\&= \det(D)\det(U^\dagger U)
\\&= \det(D)
\\&= \prod_i \lambda_i
\end{aligned}
$$
which proves that the determinant is equal to the product of the eigenvalues.
The eigenspace projectors give us the **spectral decomposition** of $A$, which is where we write
$$
A
= \sum_i \lambda_i\proj{v_i}.
$$
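To see all of this in action, here is a numerical sketch in R for a small, arbitrarily chosen Hermitian (and hence normal) matrix: we rebuild $A$ from its eigenspace projectors and compare the trace and determinant with the eigenvalues.

```r
# A (2x2) Hermitian matrix.
A <- matrix(c(2,      1 - 1i,
              1 + 1i, 3), nrow = 2, byrow = TRUE)

e <- eigen(A)          # orthonormal eigenvectors for a Hermitian matrix
lambda <- e$values
v1 <- e$vectors[, 1, drop = FALSE]
v2 <- e$vectors[, 2, drop = FALSE]

# Spectral decomposition: A = lambda_1 |v1><v1| + lambda_2 |v2><v2|.
A_rebuilt <- lambda[1] * v1 %*% Conj(t(v1)) + lambda[2] * v2 %*% Conj(t(v2))
max(Mod(A_rebuilt - A))                                    # ~ 0

# Trace = sum of eigenvalues; determinant = product of eigenvalues.
sum(diag(A)) - sum(lambda)                                 # ~ 0
(A[1, 1] * A[2, 2] - A[1, 2] * A[2, 1]) - prod(lambda)     # ~ 0 (2x2 determinant, written out by hand)
```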
::: {.technical title="Extending functions to matrices" latex="{Extending functions to matrices}"}
The spectral decomposition of a normal operator gives an effective way of calculating the action of a function on a matrix.
If $f\colon\mathbb{C}\to\mathbb{C}$ is a function, then we can define
$$
f(A)
= \sum_i f(\lambda_i)\proj{v_i}.
$$
For example, if $f(x)=x^2$, then, by this definition, $f(A)=\sum_i \lambda_i^2\proj{v_i}$.
But this is consistent with the definition of $A^2$ that you expect:
$$
\begin{aligned}
A^2
&= \left(\sum_i \lambda_i\proj{v_i}\right)\left(\sum_i \lambda_i\proj{v_i}\right)
\\&= \sum_{i,j} \lambda_i\lambda_j \proj{v_i}\proj{v_j}
\\&= \sum_i \lambda_i^2 \proj{v_i}
\end{aligned}
$$
using the fact that the eigenvectors $\ket{v_i}$ are orthonormal and that projectors $P=\proj{v_i}$ satisfy $P^2=P$.
:::
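As a sketch of this technique in R (using the same arbitrarily chosen Hermitian matrix as above, which has positive eigenvalues), here is how one might compute a matrix square root from the spectral decomposition, and check that it really does square to $A$.

```r
# A Hermitian matrix with positive eigenvalues.
A <- matrix(c(2,      1 - 1i,
              1 + 1i, 3), nrow = 2, byrow = TRUE)
e <- eigen(A)

# f(A) = sum_i f(lambda_i) |v_i><v_i|, here with f(x) = sqrt(x).
f <- sqrt
fA <- matrix(0, 2, 2)
for (i in 1:2) {
  v  <- e$vectors[, i, drop = FALSE]
  fA <- fA + f(e$values[i]) * v %*% Conj(t(v))
}

# Check: f(A) %*% f(A) recovers A.
max(Mod(fA %*% fA - A))   # ~ 0
```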
## Outer products {#outer-products}
Apart from the inner product $\braket{u}{v}$, which is a complex number, we can also form the **outer product** $\ket{u}\bra{v}$, which is a linear map (operator) on $\mathcal{H}$ (or on $\mathcal{H}^\star$, depending how you look at it).
This is what physicists like (and what mathematicians dislike!) about Dirac notation: a certain degree of healthy ambiguity.
- The result of $\ket{u}\bra{v}$ acting on a ket $\ket{x}$ is $\ket{u}\braket{v}{x}$, i.e. the vector $\ket{u}$ multiplied by the complex number $\braket{v}{x}$.
- Similarly, the result of $\ket{u}\bra{v}$ acting on a bra $\bra{y}$ is $\braket{y}{u}\bra{v}$, i.e. the linear functional $\bra{v}$ multiplied by the complex number $\braket{y}{u}$.
The product of two maps, $A=\ket{a}\bra{b}$ followed by $B=\ket{c}\bra{d}$, is a linear map $BA$, which can be written in Dirac notation as
$$
BA = \ket{c}\braket{d}{a}\bra{b} = \braket{d}{a}\ket{c}\bra{b}
$$
i.e. the inner product (complex number) $\braket{d}{a}$ times the outer product (linear map) $\ket{c}\bra{b}$.
Any operator on $\mathcal{H}$ can be expressed as a sum of outer products. Given an orthonormal basis $\{\ket{e_i}\}_{i=1,\ldots,n}$, any operator which maps the basis vectors $\ket{e_i}$ to vectors $\ket{f_i}$ can be written as $\sum_{i=1}^n\ket{f_i}\bra{e_i}$.
If the vectors $\{\ket{f_i}\}$ *also* form an orthonormal basis then the operator simply "rotates" one orthonormal basis into another.
These are unitary operators which preserve the inner product.
In particular, if each $\ket{e_i}$ is mapped to $\ket{e_i}$, then we obtain the identity operator:
$$
\sum_i\ket{e_i}\bra{e_i}=\id.
$$
This relation holds for _any_ orthonormal basis, and it is one of the most ubiquitous and useful formulas in quantum theory, known as **completeness**.^[Not to be confused with "completeness" in the sense of Hilbert spaces.]
For example, for any vector $\ket{v}$ and for any orthonormal basis $\{\ket{e_i}\}$, we have
$$
\begin{aligned}
\ket{v}
&= \id\ket{v}
\\&= \sum_i \ket{e_i}\bra{e_i}\;\ket{v}
\\&= \sum_i \ket{e_i}\;\braket{e_i}{v}
\\&= \sum_i v_i\ket{e_i}
\end{aligned}
$$
where $v_i=\braket{e_i}{v}$ are the components of $\ket{v}$.
Finally, note that calculating the adjoint of an outer product boils down to just swapping the order:
$$
(\ket{a}\bra{b})^\dagger = \ket{b}\bra{a}.
$$
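Here is a short sketch in R (with arbitrarily chosen kets) of the outer product as a matrix, and of the completeness relation for the standard basis of $\mathbb{C}^3$.

```r
# Kets as column vectors.
u <- matrix(c(1, 1i, 0), ncol = 1)
v <- matrix(c(2, 1 - 1i, 3i), ncol = 1)

# The outer product |u><v| is an operator (here, a 3x3 matrix).
P <- u %*% Conj(t(v))

# Acting on a ket |x>, it gives |u> times the number <v|x>.
x <- matrix(c(1i, 1, 1), ncol = 1)
max(Mod(P %*% x - u %*% (Conj(t(v)) %*% x)))   # ~ 0

# Completeness: summing |e_i><e_i| over an orthonormal basis gives the identity.
basis <- diag(3)                               # columns are |e_1>, |e_2>, |e_3>
completeness <- matrix(0, 3, 3)
for (i in 1:3) {
  ei <- basis[, i, drop = FALSE]
  completeness <- completeness + ei %*% Conj(t(ei))
}
completeness                                   # the 3x3 identity matrix
```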
## The trace
The **trace** is an operation which turns outer products into inner products,
$$
\tr\colon \ket{b}\bra{a} \longmapsto \braket{a}{b}.
$$
We have just seen that any linear operator can be written as a sum of outer products, and so we can extend the definition of trace (by linearity) to any operator.
Equivalently, for any square matrix $A$, the trace of $A$ can be defined to be the sum of its diagonal elements:
$$
\tr A = \sum_k \bra{e_k}A\ket{e_k} = \sum_k A_{kk}.
$$
In fact, the trace of $A$ is equal to the sum of the eigenvalues of $A$, even in the case where $A$ is not diagonalisable.
You can show, using this definition or otherwise, that the trace is cyclic^[Note that "cyclic" does not mean the same thing as "permutation invariant"! It is *not* true in general that $\tr(ABC)=\tr(CBA)$, but only that $\tr(ABC)=\tr(BCA)=\tr(CAB)$, i.e. we can only *cyclically* permute the operators.] ($\tr(AB) = \tr(BA)$) and linear ($\tr(\alpha A+\beta B) = \alpha\tr(A)+\beta\tr(B)$, where $A$ and $B$ are square matrices and $\alpha$ and $\beta$ complex numbers).
To recover the first definition from the second, we argue as follows:
$$
\begin{aligned}
\tr\ket{b}\bra{a}
&= \sum_k \braket{e_k}{b}\braket{a}{e_k}
\\&= \sum_k \braket{a}{e_k}\braket{e_k}{b}
\\&= \bra{a}\id\ket{b}
\\&= \braket{a}{b}.
\end{aligned}
$$
Here, the second expression can be viewed both as the sum of the diagonal elements of $\ket{b}\bra{a}$ in the $\ket{e_k}$ basis, and as the sum of the products of two complex numbers $\braket{e_k}{b}$ and $\braket{a}{e_k}$.
We have used the decomposition of the identity, $\sum_k\ket{e_k}\bra{e_k}=\id$.
Given that we can decompose the identity by choosing any orthonormal basis, it is clear that the trace does _not_ depend on the choice of the basis.
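A quick numerical sketch in R of these properties, with arbitrarily chosen matrices and kets:

```r
tr <- function(M) sum(diag(M))

A <- matrix(c(1, 1i, 0, 2), 2, 2)
B <- matrix(c(1 - 1i, 2, 1, 1i), 2, 2)
C <- matrix(c(0, 1, 1 + 1i, 3), 2, 2)

# Cyclicity: tr(ABC) = tr(BCA) = tr(CAB).
tr(A %*% B %*% C)
tr(B %*% C %*% A)
tr(C %*% A %*% B)

# The trace turns outer products into inner products: tr|b><a| = <a|b>.
a <- matrix(c(1 + 1i, 2i), ncol = 1)
b <- matrix(c(3, 1 - 1i), ncol = 1)
tr(b %*% Conj(t(a)))   # equals...
Conj(t(a)) %*% b       # ...the inner product <a|b>
```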
## Some useful identities {#some-useful-identities}
Here is a summary of some particularly useful equalities concerning bras, kets, inner products, outer products, traces, and operators, that we will be using time and time again.
In all of these, $\ket{a},\ket{b}\in\mathcal{H}$ are kets, $A,B,C$ are operators on $\mathcal{H}$, and $\alpha,\beta\in\mathbb{C}$ are scalars.
::: {.idea latex=""}
Dagger for bras and kets:
- $\ket{a}^\dagger = \bra{a}$
- $\bra{a}^\dagger = \ket{a}$
- $(\ket{a}\bra{b})^\dagger = \ket{b}\bra{a}$
- $(\alpha\ket{a}+\beta\ket{b})^\dagger = \alpha^\star\bra{a}+\beta^\star\bra{b}$
:::
::: {.idea latex=""}
Dagger for operators:
- $(AB)^\dagger = B^\dagger A^\dagger$
- $(A^\dagger)^\dagger = A$
- $(\alpha A+\beta B)^\dagger = \alpha^\star A^\dagger+\beta^\star B^\dagger$
:::
::: {.idea latex=""}
Trace:
- $\tr(\alpha A+\beta B) = \alpha \tr(A)+\beta\tr(B)$
- $\tr(ABC) = \tr(CAB) = \tr(BCA)$
- $\tr\ket{a}\bra{b} = \braket{b}{a}$
- $\tr (A\ket{a}\bra{b}) = \bra{b}A\ket{a} = \tr(\ket{a}\bra{b}A)$
:::
## Probabilities
In a sense, the basics of quantum theory boil down to the combination of two bits of mathematics: linear algebra over the complex numbers, and probability theory.
We have just gone over all the linear algebra that we will need, so now let's tackle the other topic (though we will immediately revisit it in Chapter \@ref(quantum-interference)).
Probability theory is a vast and beautiful subject which has undergone many transformations over the centuries.
What started as something understood in terms of gambling odds later evolved into the theory of measure spaces, and can now even be expressed in terms of diagrammatic category theory.
But for our purposes, we only need the very elementary parts of the subject, so we will stick with the first interpretation: probability tells us the odds of something happening.^[What we actually mean by "the odds of something happening" and how we should really interpret probabilities "in the real world" is a profound philosophical problem that we shall completely pass over.]
The setup is always the same: we have some process (rolling some dice, flipping a coin, drawing a card, etc.) that has some possible **outcomes** (getting a 5, heads, or an ace of hearts, etc.) but is realised in some way which means that we cannot be certain which outcome we will see whenever we run the process (or "perform the experiment").
The first thing to define in any such scenario is the **sample space**, usually denoted by $\Omega$, which is the set of all possible outcomes.
Next we have the **event space** $\mathcal{F}$, which is the set of all **events**, where an event is a set of outcomes (this might sound confusing at first, but we'll give some examples shortly).
Whenever we run the process, we will get some outcome, and we say that any event that contains that outcome **occurred**.
Finally, we will have a **probability function**, which assigns to any event a **probability**, which is a number between $0$ and $1$.
This probability function has to satisfy some axioms, but we'll come to these later; for now, let us give some examples.
Here is a table of the sample spaces for some processes.
| **Process** | **Sample space $\Omega$** |
| :- | :- |
| Rolling a six-sided die | $\{1,2,3,4,5,6\}$ |
| Flipping a coin | $\{H,T\}$ |
| Flipping two *distinct* coins | $\{HH,TH,HT,TT\}$ |
| Flipping two *identical* coins | $\{HH,TH,TT\}$ |
And here's a table of some (but not all, except in the case of flipping a single coin) of the events corresponding to these sample spaces.
| **Process** | **Example events $A\in\mathcal{F}$** | **Interpretation** |
| :- | :- | :- |
| Rolling a six-sided die | $\{1\}$ | rolling a 1 |
| | $\{1,3,5\}$ | rolling an odd number |
| | $\{1,2\}$ | rolling a number less than 3 |
| | $\{2,3\}$ | rolling a prime number less than 4 |
| Flipping a coin | $\{H\}$ | getting heads |
| | $\{T\}$ | getting tails |
| | $\{H,T\}$ | any outcome at all |
| Flipping two *distinct* coins | $\{HH\}$ | getting two heads |
| | $\{HH,TH,HT\}$ | getting at least one heads |
| | $\{HH,TT\}$ | getting two the same |
| Flipping two *identical* coins | $\{HH\}$ | getting two heads |
| | $\{HH,TH\}$ | getting at least one heads |
In the table above we can see that, for example, if we rolled a 1 on a six-sided die then many events occurred: we rolled a 1, but we also rolled an odd number, and a number less than 3; but we did not roll a prime number.
Something else that arises in these examples is the notion of **distinguishable outcomes**, which we see when we look at how the sample space of flipping two coins depends on whether or not they are identical.
That is, if we have a gold coin and a silver coin then it makes sense to say that $HT$ is different from $TH$, because the first means that the gold coin was the one that landed on heads, and the second means that it was instead the silver coin.
But if we have two identical coins, then how can we distinguish between $HT$ and $TH$?
We would have to be able to point to one of them, say, the one on the left, and say "that's the coin that's on the left", but if we can do this then by definition we can distinguish between them!^[Another possibility would be to distinguish the coins *in time* instead of space, i.e. to flip one coin first and then the other afterwards. A coin cannot remember what happened the last time it was flipped, so is there really a difference between flipping a single coin twice and flipping two coins once? In the eyes of probability theory, the answer is "no".]
In general, the sample space consists only of distinguishable outcomes, but what counts as distinguishable really depends on the specifics of the experiment that we have set up.
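One way to make the role of distinguishability concrete is to enumerate the sample spaces by brute force. The following sketch (in Python, purely illustrative) lists the outcomes for two distinct coins, and then merges outcomes that differ only by which coin is which, recovering the smaller sample space for identical coins.

```python
from itertools import product

# Two distinct coins: all ordered pairs of faces.
faces = ["H", "T"]
distinct = [x + y for x, y in product(faces, repeat=2)]
print(distinct)       # ['HH', 'HT', 'TH', 'TT']

# Two identical coins: outcomes differing only in the order of the coins
# are indistinguishable, so we keep one representative of each (written
# so that the mixed outcome appears as TH, matching the table above).
identical = sorted({"".join(sorted(outcome, reverse=True)) for outcome in distinct})
print(identical)      # ['HH', 'TH', 'TT']
```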
::: {.technical title="Measure theory" latex="{Measure theory}"}
This approach towards probability, where we think of a "scenario" as such a triple $(\Omega,\mathcal{F},P)$, places us firmly within the setting of [**measure theory**](https://en.wikipedia.org/wiki/Measure_(mathematics)).
What we have described is a **probability measure**, and there are many more general types of measure that exist, but they all describe the same sort of idea: we have a set $\Omega$ that we think of as our "space", some particularly well-behaved collection $\mathcal{F}$ of subsets of $\Omega$, and a function $\mu\colon\mathcal{F}\to\mathbb{R}\sqcup\{\pm\infty\}$ known as the **measure**.
We think of the measure $\mu$ as telling us how big any element of $\mathcal{F}$ is, be it interpreted as geometric size or, in the example of probability measures, the likelihood.
This formalism becomes very useful when we want to do any analysis (which happens as soon as we want to deal with infinite-dimensional spaces, or even just "introduce some geometry"), such as in constructing the [**Lebesgue integral**](https://en.wikipedia.org/wiki/Lebesgue_integration).
Thinking of probability spaces through measure theory allows us to make use of the many powerful tools of real analysis.
:::
Now we can define probability rather succinctly.
::: {.idea latex=""}
In a **fair process**, where all outcomes are equally likely, the **probability** $P(A)$ of an event $A$ is the number of outcomes in $A$ divided by the total number of outcomes, i.e.
$$
P(A)
= \frac{\text{\# of elements of }A}{\text{\# of elements of }\Omega}.
$$
In particular, the probability will always be a number between $0$ and $1$.
:::
Running through some of the above examples of events, we see that this definition of probability agrees with what we might already expect.
| **Event** | **Probability** |
| :- | :- |
| Getting heads on a single coin flip | $1/2$ |
| Rolling a 6 with a single die | $1/6$ |
| Rolling an odd number with a single die | $3/6=1/2$ |
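For a fair process, these probabilities are nothing more than counting, which is easy to automate. The sketch below (illustrative only) reproduces the table above directly from the definition $P(A)=\#A/\#\Omega$.

```python
from fractions import Fraction

def probability(event, sample_space):
    """P(A) = |A| / |Omega| for a fair process."""
    return Fraction(len(event), len(sample_space))

die = {1, 2, 3, 4, 5, 6}
coin = {"H", "T"}

print(probability({"H"}, coin))        # 1/2  (heads on a single coin flip)
print(probability({6}, die))           # 1/6  (rolling a 6)
print(probability({1, 3, 5}, die))     # 1/2  (rolling an odd number)
```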
::: {.technical title="Pascal's triangle" latex="{Pascal's triangle}"}
Flipping a fair coin (or actually, even an unfair one) is a common scenario in discussing probability, because it has just two outcomes --- the smallest number you can have without things becoming purely deterministic.
There are lots of numbers that you will see turn up time and time again in calculations of probability for binary outcome events, and most usually they are [**binomial coefficients**](https://en.wikipedia.org/wiki/Binomial_coefficient).
These are numbers that can be read directly from the rows of [**Pascal's triangle**](https://en.wikipedia.org/wiki/Pascal%27s_triangle) (which, as is often the case in mathematics, is more deserving of being named after a different person: [Al-Karaji](https://en.wikipedia.org/wiki/Al-Karaji), or maybe [Omar Khayyam](https://en.wikipedia.org/wiki/Omar_Khayyam)), and they satisfy many interesting combinatorial patterns.
:::
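As a concrete illustration of where these binomial coefficients turn up, here is a short sketch (ours, not the text's) that counts, by brute force, the number of ways to get exactly $k$ heads in $n$ fair coin flips and compares the answer with $\binom{n}{k}$.

```python
from itertools import product
from math import comb

n = 5
flips = list(product("HT", repeat=n))   # all 2^n equally likely outcomes

for k in range(n + 1):
    ways = sum(1 for outcome in flips if outcome.count("H") == k)
    assert ways == comb(n, k)           # binomial coefficient "n choose k"
    print(f"P({k} heads in {n} flips) = {ways}/{len(flips)}")
```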
Now let's look at what happens when we're interested in more than one event occurring.
We might study the possibility of *either* event $A$ *or* event $B$ happening, where the "or" here can be **exclusive** (we want exactly one of them to happen, but *not* both) or **inclusive** (we want *at least* one of them to happen), or we could study the possibility of *both* event $A$ *and* event $B$ happening.
It turns out that these two notions, which seem somehow opposite, are delicately related.
First of all, let's consider *both* events $A$ and $B$ occurring.
What does this mean?
Well, by our definition of "occurring", it means that the outcome of the process is an element of $A$ and also of $B$, which is equivalent to saying that it is an element of their intersection $A\cap B$.
In other words,
$$
P(A\text{ and }B)
= P(A\cap B).
$$
This lets us define two very important terms.
::: {.idea latex=""}
We say that $A$ and $B$ are **mutually exclusive** if $P(A\cap B)=0$, and **independent** if $P(A\cap B)=P(A)P(B)$.
In words, $A$ and $B$ are mutually exclusive if one of them occurring precludes the other from occurring, i.e. if *at most* one of them can occur, and they are independent if one of them occurring has no effect on the other occurring.
:::
Usually, when we talk of mutually exclusive events we are referring to a single run of an experiment, and for independent events we are referring to multiple runs.
For example, "rolling an even number" and "rolling an odd number" are mutually exclusive events when rolling a single die once, but independent events when rolling a single die twice.^[**Exercise.** Are the events "rolling an even number" and "rolling an odd number" still independent when we think of rolling two die simultaneously?]
Basically, we should be careful when talking about events and make sure to be precise as to what our sample space is, and how the event is actually realised as a subset of this.
We can think of mutually exclusive and independent as extreme ends of a scale: on one side we have events that affect each other so strongly that if one occurs then we know with absolute certainty that the other one did not; on the other we have events that have absolutely no effect on each other whatsoever.
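To see the difference in a concrete setting, the sketch below (again just illustrative) works with the sample space of two die rolls. One precise reading of the example above is to take $A$ to be "the first roll is even" and $B$ to be "the second roll is odd"; the code checks, by counting, that these are independent, whereas "even" and "odd" on a single roll are mutually exclusive.

```python
from fractions import Fraction
from itertools import product

def prob(event, omega):
    """P(A) = |A| / |Omega| for a fair process."""
    return Fraction(len(event), len(omega))

# A single die roll: "even" and "odd" are mutually exclusive.
omega_one = set(range(1, 7))
even = {x for x in omega_one if x % 2 == 0}
odd  = {x for x in omega_one if x % 2 == 1}
print(prob(even & odd, omega_one))     # 0

# Two rolls of a die: A = "first roll even", B = "second roll odd".
omega_two = set(product(range(1, 7), repeat=2))
A = {(x, y) for (x, y) in omega_two if x % 2 == 0}
B = {(x, y) for (x, y) in omega_two if y % 2 == 1}
print(prob(A & B, omega_two) == prob(A, omega_two) * prob(B, omega_two))  # True
```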
One might wonder about what the opposite of mutually exclusive might be, and there are two ideas that seem like they might be interesting: events $A$ and $B$ such that $P(A\cap B)=1$, and events $A$ and $B$ such that, if one occurs, then the other also always occurs.
The first of these two putative definitions is not so interesting: if $P(A\cap B)=1$ then both $A$ and $B$ always occur, so $P(A)=P(B)=1$, which (for a fair process) means that $A=B=\Omega$ is just the event that "anything happens".
The second is a bit less trivial, but also not so deep: saying that $A$ occurs if and only if $B$ occurs is equivalent to saying that $A=B$ as events.
Now let's think about *either* event $A$ or event $B$ occurring, in the inclusive-or sense of the word (we will return to the exclusive one afterwards).
This is equivalent to the event given by the union $A\cup B$ occurring.