Refactor notation page a little.
matthew-brett committed Sep 14, 2023
1 parent 9da53ac commit 7a82bac
Showing 1 changed file with 51 additions and 26 deletions.
77 changes: 51 additions & 26 deletions regression_notation.Rmd
@@ -7,9 +7,9 @@ jupyter:
      extension: .Rmd
      format_name: rmarkdown
      format_version: '1.2'
      jupytext_version: 1.10.3
      jupytext_version: 1.15.0
  kernelspec:
    display_name: Python 3
    display_name: Python 3 (ipykernel)
    language: python
    name: python3
---
@@ -161,61 +161,86 @@ plt.legend()

## Mathematical notation


```{python tags=c("hide-input")}
# Utilities to format LaTeX output.
# You don't need to understand this code - it is to format
# the mathematics in the notebook.
import pandas as pd
from IPython.display import display, Math
def format_vec(x, name, break_every=5):
    vals = [f'{v}, ' for v in x[:-1]] + [f'{x[-1]}']
    indent = rf'\hspace{{1.5em}}'
    for pi, pos in enumerate(range(break_every, n, break_every)):
        vals.insert(pos + pi, r'\\' + f'\n{indent}')
    return rf'\vec{{{name}}} = [%s]' % ''.join(vals)
```
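
The helper above reads `n` from the notebook's global namespace rather than from its argument. As a minimal sketch, assuming `n` is meant to be the number of elements in `x`, a self-contained variant might look like this (the name `format_vec_standalone` is hypothetical, not part of the notebook):

```python
def format_vec_standalone(x, name, break_every=5):
    """Return a LaTeX string for vector `x`, with a line break
    after every `break_every` values."""
    # Assumption: the number of values is the length of `x`.
    n = len(x)
    # Every value except the last is followed by a comma and a space.
    vals = [f'{v}, ' for v in x[:-1]] + [f'{x[-1]}']
    indent = r'\hspace{1.5em}'
    # Each earlier insertion shifts later positions by one, hence `pos + pi`.
    for pi, pos in enumerate(range(break_every, n, break_every)):
        vals.insert(pos + pi, r'\\' + f'\n{indent}')
    return rf'\vec{{{name}}} = [%s]' % ''.join(vals)


example = format_vec_standalone([1, 2, 3, 4], 'x', break_every=2)
```

Because this version depends only on its arguments, it can be tried on a short list without first running the rest of the notebook.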

Our next step is to write out this model more generally and more formally in
mathematical symbols, so we can think about *any* vector (sequence) of x values
$\xvec$, and any sequence (vector) of matching y values $\yvec$. But to start
with, let's think about the actual set of values we have for $\xvec$. We could
write the actual values in mathematical notation as:

$$
\xvec = [ 0.389, 0.2 , 0.241, 0.463, \\
4.585, 1.097, 1.642, 4.972, \\
7.957, 5.585, 5.527, 6.964 ]
$$
```{python tags=c("hide-input")}
display(Math(format_vec(x, 'x', 6)))
```

This means that $\xvec$ is a sequence of these specific 12 values. But we
could write $\xvec$ in a more general way, to be *any* 12 values, like this:

$$
\xvec = [ x_1, x_2, x_3, x_4, x_5, x_6,\\
x_7, x_8, x_9, x_{10}, x_{11}, x_{12} ]
$$
```{python tags=c("hide-input")}
indices = np.arange(1, n + 1)
x_is = [f'x_{{{i}}}' for i in indices]
display(Math(format_vec(x_is, 'x', 6)))
```

This means that $\xvec$ consists of 12 numbers, $x_1, x_2, ..., x_{12}$, where
$x_1$ can be any number, $x_2$ can be any number, and so on.

$x_1$ is the value for the first student, $x_2$ is the value for the second student, and so on.
$x_1$ is the value for the first student, $x_2$ is the value for the second student, etc.
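
One thing to keep in mind when moving between this notation and code: the mathematical subscripts count from 1, whereas Python indices count from 0. The sketch below illustrates this, assuming the 12 values listed above are stored in a NumPy array called `x`; the names `x_1` and `x_12` are only illustrative.

```python
import numpy as np

# The 12 x values written out above.
x = np.array([0.389, 0.2, 0.241, 0.463,
              4.585, 1.097, 1.642, 4.972,
              7.957, 5.585, 5.527, 6.964])
n = len(x)

# Mathematical x_1 is the first value, at Python index 0.
x_1 = x[0]
# Mathematical x_12 (here x_n) is the last value, at Python index n - 1.
x_12 = x[n - 1]
```

In general, the mathematical $x_i$ corresponds to `x[i - 1]` in Python.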

In our *particular case*:
Here's another way of looking at the relationship of the values in our
particular case, and their notation:

$$
x_1 = 0.389 \\
x_2 = 0.2 \\
... \\
x_{12} = 6.964
$$
```{python tags=c("hide-input")}
df_index = pd.Index(indices, name='1-based index')
pd.DataFrame(
    {r'$\vec{x}$ values': x,
     f'$x$ value notation':
     [f'${v}$' for v in x_is]
    },
    index=df_index
)
```

We can make $\xvec$ be even more general, by writing it like this:

$$
\xvec = [ x_1, x_2, ..., x_{n} ]
\xvec = [x_1, x_2, ..., x_{n} ]
$$

This means that $\xvec$ is a sequence of any $n$ numbers, where $n$ can be any
whole number, such as 1, 2, 3, and so on. In our specific case, $n = 12$.
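
To make the role of $n$ concrete, here is a small sketch that builds the subscripted labels $x_1$ to $x_n$ for any whole number `n`. The function name `notation_labels` is hypothetical and chosen only for this illustration.

```python
def notation_labels(name, n):
    """Return LaTeX labels name_1 ... name_n for a vector of n values."""
    # Subscripts run from 1 up to and including n.
    return [f'{name}_{{{i}}}' for i in range(1, n + 1)]


# Our specific case, where n = 12.
labels_12 = notation_labels('x', 12)
# The same notation for a different number of values.
labels_3 = notation_labels('x', 3)
```

The same function handles 3 values or 12; only `n` changes, which is the point of writing the vector in terms of $n$.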

Similarly, for our `psychopathy` ($\yvec$) values, we can write:

$$
\yvec = [ 11.416, 4.514, 12.204, 14.835, \\
8.416, 6.563, 17.343, 13.02, \\
15.19 , 11.902, 22.721, 22.324 ]
$$
```{python tags=c("hide-input")}
display(Math(format_vec(y, 'y', 6)))
```

```{python tags=c("hide-input")}
pd.DataFrame(
    {r'$\vec{y}$ values': y,
     f'$y$ value notation': [f'$y_{{{i}}}$' for i in indices]
    }, index=df_index)
```

More generally we can write $\yvec$ as a vector of any $n$ numbers:

$$
\yvec = [ y_1, y_2, ..., y_{n} ]
\yvec = [y_1, y_2, ..., y_{n} ]
$$
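
Because each $y_i$ belongs to the same student as the matching $x_i$, the two vectors must have the same length $n$. A minimal sketch of that pairing, assuming the values written out above are stored in the lists `x` and `y` (the name `pairs` is illustrative):

```python
# The 12 x values and the 12 matching y (psychopathy) values from above.
x = [0.389, 0.2, 0.241, 0.463, 4.585, 1.097,
     1.642, 4.972, 7.957, 5.585, 5.527, 6.964]
y = [11.416, 4.514, 12.204, 14.835, 8.416, 6.563,
     17.343, 13.02, 15.19, 11.902, 22.721, 22.324]

# Both vectors share the same n.
n = len(x)
assert len(y) == n

# Pair each 1-based index i with its (x_i, y_i) values.
pairs = [(i, x[i - 1], y[i - 1]) for i in range(1, n + 1)]
```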

