---
title: "Linear Algebra"
description: |
  Notes on Linear Algebra.
author:
  - name: Alex Stephenson
output:
  distill::distill_article:
    toc: true
    toc_depth: 3
    highlight: haddock
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
This page provides notes on linear algebra with a bias towards results that are particularly useful for statistics and machine learning. Most of the notation comes from Greene (2017) and Boyd and Vandenberghe (2018).
## Matrices {#top}
### Matrix Definition
First we need some terminology. A **matrix** is a rectangular array of numbers.
$$\begin{aligned}
A &= [a_{ik}] \\
A &= [A]_{ik} \\
A &= \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1k} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nk}
\end{pmatrix}
\end{aligned}$$
We always read a subscripted element $a_{ik}$ of a matrix as "row $i$, column $k$". For example, if
$$\begin{aligned}
A &= \begin{pmatrix} 1 & 2 & 3 \\
4 & 5 & 6 \\
7 & 8 & 9
\end{pmatrix}
\end{aligned}$$
then $A_{11} = 1$.
A matrix is made up of vectors. A vector is an ordered set of numbers arranged either in a row or a column. A column vector can be thought of as a matrix with one column, and a row vector as a matrix with one row.^[Unless otherwise noted, a vector is always a column vector.]
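As a quick illustration in R (a minimal sketch; the matrix and indexing below are just the $3 \times 3$ example above):
```{r}
# The 3 x 3 example matrix, filled by row
A <- matrix(1:9, nrow = 3, byrow = TRUE)
A[1, 1]   # element a_11, read as "row 1, column 1"
A[, 1]    # the first column of A, a column vector
A[1, ]    # the first row of A, a row vector
```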
### Square Matrix
A **square matrix** is a matrix where the number of rows is equal to the number of columns.
### Symmetric Matrix
There are a few matrices that are very common in statistics. The first is the **symmetric matrix**, one where $a_{ik} = a_{ki}, \forall i, k$. For example,
$$\begin{aligned}
\begin{pmatrix}
1 & 2 & 5 \\
2 & 3 & 6 \\
5 & 6 & 4
\end{pmatrix}
\end{aligned}$$ is symmetric.
### Diagonal Matrix
A **diagonal matrix** is a square matrix in which all off-diagonal elements are 0; the only non-zero elements appear on the main diagonal.
$$\begin{aligned}
\begin{pmatrix}
1 & 0 & 0 \\
0 & 3 & 0 \\
0 & 0 & 4
\end{pmatrix}
\end{aligned}$$
### Scalar Matrix
A **scalar matrix** is a diagonal matrix where every diagonal element is the same.
$$\begin{aligned}
\begin{pmatrix}
2 & 0 & 0 \\
0 & 2 & 0 \\
0 & 0 & 2
\end{pmatrix}
\end{aligned}$$
### Triangular
A **triangular matrix** is a square matrix that has only zeros either above or below the main diagonal. If the zeros are above the diagonal, it is *lower triangular.* If the zeros are below the diagonal, it is *upper triangular*. The matrix below is lower triangular.
$$\begin{aligned}
\begin{pmatrix}
1 & 0 & 0 \\
2 & 3 & 0 \\
5 & 6 & 4
\end{pmatrix}
\end{aligned}$$
### Identity Matrix
A special kind of scalar, diagonal, symmetric, and triangular matrix is the **identity matrix**. We always denote this as $\textbf{I}$ with a subscript to denote the number of columns (the order).
$$\begin{aligned}
\textbf{I}_3 = \begin{pmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{pmatrix}
\end{aligned}$$
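In R, `diag()` constructs the identity matrix and other diagonal matrices (a minimal sketch of the special matrices above):
```{r}
diag(3)            # the identity matrix I_3
diag(c(1, 3, 4))   # a diagonal matrix with the given main diagonal
2 * diag(3)        # a scalar matrix: every diagonal element is 2
```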
[Return to Top](#top)
## Matrix Manipulation
Two matrices $\textbf{A}$ and $\textbf{B}$ are equal if and only if they have the same dimensions *and* each element of $\textbf{A}$ equals the corresponding element of $\textbf{B}$.
For example,
$$\begin{aligned}
\textbf{A} &= \begin{pmatrix}
1 & 2 & 5 \\
2 & 3 & 6 \\
5 & 6 & 4
\end{pmatrix} \\
\textbf{B} &= \begin{pmatrix}
1 & 2 & 5 \\
2 & 3 & 6 \\
5 & 6 & 4
\end{pmatrix}
\end{aligned}$$
are equal to each other.
### Transpose
The **transpose** of a matrix, denoted $\textbf{A}^T$ or $\textbf{A}^{'}$, is the matrix we get by interchanging the rows and columns of $\textbf{A}$. Formally, $\textbf{B} = \textbf{A}^T \iff b_{ik} = a_{ki}$.
An immediate implication of this definition is that any symmetric matrix $\textbf{A}$ is equal to $\textbf{A}^T$ *and* that for any matrix, $\textbf{A} = (\textbf{A}^{T})^T$.
$$\begin{aligned}
\textbf{A} &= \begin{pmatrix}
1 & 2 & 3 \\
4 & 5 & 6 \\
7 & 8 & 9
\end{pmatrix} \\
\textbf{A}^T &= \begin{pmatrix}
1 & 4 & 7 \\
2 & 5 & 8 \\
3 & 6 & 9
\end{pmatrix}
\end{aligned}$$
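These properties are easy to verify in R with `t()` (a minimal sketch using the matrices above):
```{r}
A <- matrix(1:9, nrow = 3, byrow = TRUE)
t(A)                   # the transpose A^T
all(A == t(t(A)))      # (A^T)^T = A
S <- matrix(c(1, 2, 5,
              2, 3, 6,
              5, 6, 4), nrow = 3, byrow = TRUE)
all(S == t(S))         # a symmetric matrix equals its transpose
```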
### Matrix Addition
We can add and subtract any two matrices of equal sizes (**conformable** for addition): $\textbf{C} = \textbf{A} + \textbf{B} = [a_{ik} + b_{ik}]$. A special matrix known as the **zero matrix** is one where all elements are zero; it can be of any order and plays the same role that zero does in ordinary addition.
#### Properties of Matrix Addition
1. Matrix addition is commutative. $\textbf{A} + \textbf{B} = \textbf{B} + \textbf{A}$
The proof is straightforward. Suppose $\textbf{A}_{m \times n}$ and $\textbf{B}_{m \times n}$. Let $\textbf{C}_{m \times n}$ be the result of adding $\textbf{A} + \textbf{B}$. Let $\textbf{D}_{m \times n} = \textbf{B} + \textbf{A}$. Then:
$$\begin{aligned}
c_{ij} &= a_{ij} + b_{ij} \\
d_{ij} &= b_{ij} + a_{ij} \\
c_{ij} &= d_{ij}
\end{aligned}$$
2. Matrix addition is associative. $(\textbf{A} + \textbf{B}) + \textbf{C} = \textbf{A} + (\textbf{B} + \textbf{C})$
The proof is slightly more involved, but only slightly. Assume all three matrices are the same order. Let $\textbf{D} = \textbf{B} + \textbf{C}$, $\textbf{E} = \textbf{A} + \textbf{B}$, $\textbf{F} = \textbf{A} + \textbf{D}$, and $\textbf{G} = \textbf{E} + \textbf{C}$. Then:
$$\begin{aligned}
d_{ij} &= b_{ij} + c_{ij} \\
e_{ij} &= a_{ij} + b_{ij} \\
f_{ij} &= a_{ij} + d_{ij} \rightarrow a_{ij} + b_{ij} + c_{ij}\\
g_{ij} &= e_{ij} + c_{ij} \rightarrow a_{ij} + b_{ij} + c_{ij}
\end{aligned}$$
We observe that both the new matrices **F** and **G** are the same order and that their elements are the same, completing the proof.
3. Matrix addition with transposes follows $(\textbf{A} +\textbf{B})^T = \textbf{A}^T + \textbf{B}^T$.
The proof is straightforward. The $(i,j)$ entry of $(\textbf{A} +\textbf{B})^T$ is the $(j,i)$ entry of $\textbf{A} +\textbf{B}$, which is $a_{ji} + b_{ji}$. That is exactly the sum of the $(i,j)$ entries of $\textbf{A}^T$ and $\textbf{B}^T$, so the two matrices agree entry by entry.
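A minimal numeric check of these three addition properties in R, with small matrices chosen only for illustration:
```{r}
A <- matrix(1:4, nrow = 2)
B <- matrix(5:8, nrow = 2)
C <- matrix(9:12, nrow = 2)
all(A + B == B + A)                # commutativity
all((A + B) + C == A + (B + C))    # associativity
all(t(A + B) == t(A) + t(B))       # transpose of a sum
```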
### Inner Product
We multiply matrices and vectors by using the inner product (also known as the dot product). The **inner product** of two vectors **a** and **b** is $a^Tb = a_1b_1 + \cdots + a_nb_n$. Provided the vectors are conformable, it is also the case that $a^Tb = b^Ta$.
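In R the inner product can be computed with `sum()` or with matrix multiplication (a minimal sketch with illustrative vectors):
```{r}
a <- c(1, 2, 3)
b <- c(4, 5, 6)
sum(a * b)         # the inner product a^T b = 32
drop(t(a) %*% b)   # the same value via matrix multiplication
```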
### Matrix Multiplication
For an $n \times k$ matrix **A** and a $k \times m$ matrix **B**, the product matrix $\textbf{C} = \textbf{A}\textbf{B}$ is an $n \times m$ matrix whose $ik^{th}$ element is the inner product of row *i* of **A** and column *k* of **B**.
$$\begin{aligned}
\textbf{C} = \textbf{A}\textbf{B} \rightarrow c_{ik} = a_i^Tb_k
\end{aligned}$$
Matrices can only be multiplied if they are conformable. In simple terms, the number of columns in **A** must be equal to the number of rows in **B**. It is not generally the case that matrix multiplication is commutative. For example, consider the two matrices:
$$\begin{aligned}
A &= \begin{pmatrix}
1 & 2 \\
3 & 4
\end{pmatrix} \\
B &= \begin{pmatrix}
1 & 2 & 3 \\
4 & 5 & 6
\end{pmatrix}
\end{aligned}$$
It is possible to form **AB**, but not **BA**, because the latter has non-conformable arguments. Even when both products exist, it is generally the case that **AB** and **BA** will not be equal. Here's a simple example.
$$\begin{aligned}
AB &= \begin{pmatrix}
1 & 2 \\
3 & 4
\end{pmatrix}
\begin{pmatrix}
5 & 6 \\
7 & 8
\end{pmatrix} = \begin{pmatrix} 19 & 22 \\ 43 &50 \end{pmatrix} \\
BA &= \begin{pmatrix}
5 & 6 \\
7 & 8
\end{pmatrix}
\begin{pmatrix}
1 & 2 \\
3 & 4
\end{pmatrix} = \begin{pmatrix} 23 & 34 \\ 31 & 46 \end{pmatrix}
\end{aligned}$$
Because it is not the case that matrix multiplication is commutative, we have two different ways of multiplying. We can **premultiply** or **postmultiply**. In **AB**, **B** is *premultiplied* by **A** and **A** is *postmultiplied* by **B**.
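R uses `%*%` for matrix multiplication; a minimal check of the $2 \times 2$ example above shows that **AB** and **BA** differ:
```{r}
A <- matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE)
B <- matrix(c(5, 6, 7, 8), nrow = 2, byrow = TRUE)
A %*% B   # B premultiplied by A
B %*% A   # A premultiplied by B: a different matrix
```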
### Scalar Multiplication
For any matrix (or vector), **scalar multiplication** is the operation of multiplying every element by a given scalar: $c\textbf{A} = [ca_{ik}]$.
The product of a matrix and a vector is written $\textbf{c} = \textbf{A}\textbf{b}$. In order to perform this operation, the number of elements in **b** must be equal to the number of columns in **A**. It's the equivalent of multiplying an $n \times k$ matrix by a $k \times 1$ matrix.
#### Properties of Matrix Multiplication
1. Associative Law: **(AB)C = A(BC)**
Proof:
2. Distributive Law: **A(B + C) = AB + AC**
Proof:
3. Transpose of a Product: **(AB)' = B'A'**
Proof:
4. Transpose of an extended product: **(ABC)' = C'B'A'**
Proof:
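While the proofs are omitted here, these four properties can be checked numerically in R (a minimal sketch with small integer matrices):
```{r}
A <- matrix(1:4, nrow = 2)
B <- matrix(5:8, nrow = 2)
C <- matrix(9:12, nrow = 2)
all((A %*% B) %*% C == A %*% (B %*% C))           # associative law
all(A %*% (B + C) == A %*% B + A %*% C)           # distributive law
all(t(A %*% B) == t(B) %*% t(A))                  # transpose of a product
all(t(A %*% B %*% C) == t(C) %*% t(B) %*% t(A))   # extended product
```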
### Vector Sums
There are some common summations that are useful to see in matrix form.
Let **i** be a column vector of ones. Then $\textbf{i}^T\textbf{x} = \sum_{i=1}^nx_i$.
For any constant *c*, $\sum_{i=1}^ncx_i = c\sum_{i=1}^nx_i = c\,\textbf{i}^T\textbf{x}$.
The *sum of squares* of the elements of **x** is $\textbf{x}^T\textbf{x}$.
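A minimal R sketch of these sums, with an arbitrary vector `x`:
```{r}
x <- c(2, 4, 6)
ones <- rep(1, length(x))
drop(t(ones) %*% x)   # i^T x, the sum of the elements of x
drop(t(x) %*% x)      # x^T x, the sum of squares
sum(x); sum(x^2)      # the same quantities with base R sums
```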
### Centering Matrix
A fundamental matrix that we use to transform data to deviations from their mean is the *centering matrix*
$$\begin{aligned}
\textbf{i}\bar{x} &= \textbf{i}\frac{1}{n}\textbf{i}^T\textbf{x} \\
\textbf{i}\bar{x} &= \begin{pmatrix} \bar{x} \\ \bar{x} \\ \vdots \\ \bar{x}\end{pmatrix} \\
\textbf{i}\bar{x} &= \frac{1}{n}\textbf{i}\textbf{i}^T\textbf{x}
\end{aligned}$$
The matrix $\frac{1}{n}\textbf{i}\textbf{i}^T$ is an $n \times n$ matrix with every element equal to $\frac{1}{n}$. The vector of deviations from the mean is therefore $\textbf{x} - \frac{1}{n}\textbf{i}\textbf{i}^T\textbf{x}$.
Since $\textbf{x} = \textbf{I}\textbf{x}$, we can write this as $[\textbf{I} - \frac{1}{n}\textbf{i}\textbf{i}^T]\textbf{x} = \textbf{M}^0\textbf{x}$. $M^0$ is an idempotent matrix, which means that it is equal to its square, and it is symmetric, so $(M^0)^TM^0 = M^0$.
The matrix is particularly useful when computing sums of squared deviations. For example, for a single variable **x** the sum of squared deviations about the mean is $\sum_{i=1}^n(x_i - \bar{x})^2 = (\sum_{i=1}^nx_i^2) - n\bar{x}^2$.
The equivalent in matrix notation is:
$$\begin{aligned}
\sum_{i=1}^n(x_i - \bar{x})^2 &= (\textbf{x} - \bar{x}\textbf{i})^T(\textbf{x} - \bar{x}\textbf{i}) \\
\sum_{i=1}^n(x_i - \bar{x})^2 &= (M^0x)^T(M^0x) \\
\sum_{i=1}^n(x_i - \bar{x})^2 &= x^TM^0M^0x \\
\sum_{i=1}^n(x_i - \bar{x})^2 &= x^TM^0x
\end{aligned}$$
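A minimal R check that the centering matrix reproduces the sum of squared deviations (the data vector here is purely illustrative):
```{r}
x <- c(2, 4, 6, 8)
n <- length(x)
M0 <- diag(n) - matrix(1 / n, n, n)   # M^0 = I - (1/n) i i^T
drop(t(x) %*% M0 %*% x)               # x^T M^0 x
sum((x - mean(x))^2)                  # the same sum of squared deviations
all.equal(M0 %*% M0, M0)              # M^0 is idempotent
```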
## Matrix Geometry
### Vector Spaces
A **vector space** is any set of vectors that is closed under scalar multiplication and addition.
- Closed under scalar multiplication means that every scalar multiple of a vector in the set is also in the set.
- Closed under addition means that the sum of any two vectors in the space is also a vector in the space.
A **basis** is a set of linearly independent vectors in a vector space such that any vector in the space can be written as a linear combination of them.
For example, consider three arbitrary vectors in $\mathbb{R}^2$, **a**,**b**, **c**. If **a** and **b** are a basis, then we can find some numbers $\alpha_1$ and $\alpha_2$ such that $c = \alpha_1 a + \alpha_2 b$. The solutions to this problem turn out to be:
$$\begin{aligned}
\alpha_1 = \frac{b_2c_1 - b_1c_2}{a_1b_2 - b_1a_2} \\
\alpha_2 = \frac{a_1c_2 - a_2c_1}{a_1b_2 - b_1a_2}
\end{aligned}$$
These solutions are unique unless the denominator is zero, which happens exactly when **b** is a multiple of **a**, so that the two vectors do not point in different directions. A basis of a vector space is not unique, but for any specific basis only one linear combination of the basis vectors will produce a particular vector in the vector space.
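In R, the coefficients $\alpha_1$ and $\alpha_2$ can be found with `solve()` (a minimal sketch; the vectors here are arbitrary choices):
```{r}
a <- c(1, 0)
b <- c(1, 2)
c_vec <- c(3, 4)
alpha <- solve(cbind(a, b), c_vec)   # solves alpha_1 * a + alpha_2 * b = c
alpha
```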
### Linear Dependence
A set of $k \geq 2$ vectors is **linearly dependent** if at least one of the vectors in the set can be written as a linear combination of the others.
For example, consider the vectors $a = [1,2]^T, b = [3,3]^T, c = [10,14]^T$. As a whole they are linearly dependent, since $2a + b - 0.5c = 0$, but any two of them are linearly independent.
A set of vectors is **linearly independent** if and only if the only solution to $\alpha_1 a_1 + \cdot\cdot\cdot + \alpha_k a_k = 0$ is where all the coefficients are zero $\alpha_i = 0, \forall i \in 1,..., k$.
A consequence is that a basis for $\mathbb{R}^K$ can equivalently be defined as any set of $K$ linearly independent vectors in the space. Because any $(K+1)$st vector can be written as a linear combination of the $K$ basis vectors, any set of more than $K$ vectors in $\mathbb{R}^K$ must be linearly dependent.
The **span** of a set of vectors is the set of all linear combinations of those vectors; the vectors are said to span that space.
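A minimal R check of the example above: the three vectors together are linearly dependent (rank 2 with 3 vectors), and the stated linear combination is the zero vector.
```{r}
a <- c(1, 2); b <- c(3, 3); c_vec <- c(10, 14)
V <- cbind(a, b, c_vec)
qr(V)$rank              # 2 < 3 columns, so the set is linearly dependent
2 * a + b - 0.5 * c_vec # the dependence relation: the zero vector
```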
### Rank
A matrix is a set of column vectors. The number of columns in the matrix is equivalent to the number of vectors in the set, and the number of rows is the number of coordinates in each column vector.
The **column space** of a matrix is the vector space spanned by its column vectors. The **column rank** of a matrix is the dimension of the vector space spanned by its column vectors.
It turns out to be the case (and an important theorem) that the column rank and the row rank of a matrix are the same, which means that the row space and the column space have the same dimension. We say a matrix has full rank when its columns (or rows) are linearly independent.
A useful fact about rank is that for any matrix $A$, $\text{rank}(A) = \text{rank}(A^TA) = \text{rank}(AA^T)$.
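A minimal R sketch of these rank facts, using `qr()$rank` on an arbitrary $3 \times 2$ matrix:
```{r}
A <- matrix(c(1, 2, 2, 4, 3, 1), nrow = 3)   # an arbitrary 3 x 2 matrix
qr(A)$rank
qr(t(A) %*% A)$rank    # rank(A^T A) equals rank(A)
qr(A %*% t(A))$rank    # rank(A A^T) equals rank(A) as well
```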
### Determinants
The determinant of a square matrix is a scalar function of the elements of the matrix. Importantly, the determinant of a square matrix is nonzero, and the matrix is therefore non-singular, if and only if the matrix has full rank.
Determinants matter in part because a matrix has an inverse if and only if it is nonsingular, which in turn requires that its determinant not be zero.
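In R, `det()` computes the determinant; a minimal sketch with one nonsingular and one singular matrix:
```{r}
A <- matrix(c(2, 1, 1, 3), nrow = 2)
det(A)   # nonzero, so A has full rank and an inverse exists
B <- matrix(c(1, 2, 2, 4), nrow = 2)
det(B)   # zero: the columns of B are linearly dependent
```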
### Norms
The Euclidean length or **norm** of a vector e is given by the Pythagorean theorem: $\lVert e \rVert = \sqrt{e^Te}$.
#### Orthogonal Vectors
Two nonzero vectors **a** and **b** are **orthogonal**, written $a \perp b$, if and only if $a^Tb = b^Ta = 0$.
### The Cosine Law
The angle $\theta$ between two vectors **a** and **b** satisfies $\cos(\theta) = \frac{a^Tb}{\lVert a \rVert \lVert b \rVert}$.
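A minimal R sketch of the norm, orthogonality, and the cosine formula, with illustrative vectors:
```{r}
a <- c(3, 4)
b <- c(-4, 3)
sqrt(sum(a^2))   # the Euclidean norm ||a|| = 5
sum(a * b)       # a^T b = 0, so a and b are orthogonal
cos_theta <- sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
acos(cos_theta)  # the angle in radians (pi/2 here)
```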
## Solving Systems of Linear Equations
### Types of Systems
There are two types of systems of linear equations. A **homogeneous** system is of the form $Ax = 0$. By definition, a nonzero solution to this system exists if and only if the matrix **A** does not have full column rank. A **nonhomogeneous** system of equations is of the form $Ax = b$ where **b** is a nonzero vector.
### Inverse Matrix
To solve a nonhomogeneous system $Ax = b$, we need a square matrix **B** such that $BA = I$. Premultiplying both sides of the system by **B** gives
$$\begin{aligned}
BAx &= Bb \\
Ix &= Bb \\
x &= Bb
\end{aligned}$$
If such a matrix exists, then it is the **inverse** of **A**, denoted $A^{-1}$. If **A** is nonsingular, then the unique solution to the nonhomogeneous system is $x = A^{-1}b$.
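In R, `solve(A)` returns the inverse and `solve(A, b)` solves the system directly (a minimal sketch with an arbitrary nonsingular matrix):
```{r}
A <- matrix(c(2, 1, 1, 3), nrow = 2)
b <- c(3, 5)
solve(A)           # the inverse A^{-1}
solve(A, b)        # the unique solution x = A^{-1} b
A %*% solve(A, b)  # recovers b
```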