
Construction of Trajectory Matrix

One starts SSA with the construction of the trajectory matrix $X$, whose columns are lagged vectors, i.e. each column is a shifted version of the original time series. The main parameters here are $N$, the length of the original time series, and the window size $L$ with $2 \leq L \leq N/2$ and $L \in \mathbb{N}$. Each subseries $[f_i, f_{i+1},...,f_{i+L-1}]$ for $i=0,1,...,N-L$ forms a column vector of the trajectory matrix $X$ (we start indexing with zero). Thus, one obtains a trajectory matrix $X$ of size $L\times (N-L+1)$. The number of column vectors is denoted by $K = N-L+1$.

$$X = \begin{bmatrix} f_0 & f_1 &f_2 & f_3 &\dots & f_{N-L}\\\\ f_1 & f_2 &f_3 & f_4 &\dots & f_{N-L+1}\\\\ f_2 & f_3 &f_4 & f_5 &\dots & f_{N-L+2}\\\\ \vdots & \vdots &\vdots & \vdots &\vdots & \vdots\\\\ f_{L-1}& f_{L} &f_{L+1} & f_{L+2} &\dots & f_{N-1} \end{bmatrix}$$

It is clear that the elements of the anti-diagonals are equal, i.e. $X$ is a Hankel matrix.
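As an illustration, below is a minimal NumPy sketch of this construction (the series `f` and the window size `L` are hypothetical inputs, not part of the py-ssa-lib API):

```python
import numpy as np

def trajectory_matrix(f, L):
    """Build the L x (N - L + 1) trajectory (Hankel) matrix of the series f."""
    f = np.asarray(f, dtype=float)
    N = len(f)
    K = N - L + 1  # number of lagged column vectors
    # column i is the subseries f[i], f[i+1], ..., f[i+L-1]
    return np.column_stack([f[i:i + L] for i in range(K)])

# Example: N = 10 and L = 4 give a 4 x 7 trajectory matrix
X = trajectory_matrix(np.arange(10), L=4)
```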

Decomposition of the Trajectory Matrix X

The second step is to decompose $X$. Several options exist; here we focus on the exact SVD decomposition and its randomized approximation.

  1. SVD Decomposition

This option is used primarily when the time series is not too long, because SVD is a computationally intensive algorithm. This becomes critical especially for MSSA, where the trajectory matrix becomes extremely high-dimensional due to the stacking of the trajectory matrices of each time series. The SVD used in SSA is defined as follows:

$$X=U \Sigma V^T$$

where:

  • $U$ is an $L \times L$ unitary matrix containing the orthonormal set of left singular vectors of $X$ as columns
  • $\Sigma$ is an $L \times K$ rectangular diagonal matrix containing the singular values of $X$ in descending order
  • $V$ is a $K \times K$ unitary matrix containing the orthonormal set of right singular vectors of $X$ as columns (it enters the decomposition as $V^T$).

The SVD of the trajectory matrix can be also formulated as follows:

$$X = \sum^{d-1}_{i=0}\sigma_iU_iV_i^T = \sum^{d-1}_{i=0}{X_i}$$

where:

  • $\{\sigma_i,U_i,V_i\}$ is the $i^{th}$ eigentriple of the SVD

  • $\sigma_i$, the $i^{th}$ singular value, is a scaling factor that determines the relative importance of the eigentriple

  • $U_i$ is a vector representing the $i^{th}$ column of $U$, which spans the column space of $X$

  • $V_i$ is a vector representing the $i^{th}$ column of $V$, which spans the row space of $X$

  • $d$, such that $d \leq L$, is the rank of the trajectory matrix $X$. It can be regarded as the intrinsic dimensionality of the time series' trajectory space

  • $X_i=\sigma_iU_iV_i^T$ is the $i^{th}$ elementary matrix
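The SVD step and the elementary matrices can be sketched with NumPy as follows (a minimal illustration, assuming `X` is the trajectory matrix built above; py-ssa-lib's internal implementation may differ):

```python
import numpy as np

# X is the L x K trajectory matrix from the previous step
U, sigma, Vt = np.linalg.svd(X, full_matrices=False)
d = np.linalg.matrix_rank(X)

# i-th elementary matrix X_i = sigma_i * U_i * V_i^T
X_elem = [sigma[i] * np.outer(U[:, i], Vt[i, :]) for i in range(d)]

# the elementary matrices sum back to the trajectory matrix
assert np.allclose(sum(X_elem), X)
```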

  2. Randomized SVD

    This technique approximates the classic SVD and is used primarily for SSA with large $N$ and for MSSA, where $X$ is high-dimensional because of the stacking of the trajectory matrices of each considered time series. The main purpose of this technique is to reduce the dimensionality of $U$ and $V$.

The computation of the $i^{th}$ elementary matrix $X_i$ is the same as for the SVD decomposition procedure, and $X = \sum^{d-1}_{i=0}{X_i}$ holds as well.
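A minimal sketch of this step could rely on scikit-learn's `randomized_svd` (the truncation rank `r` is a hypothetical choice, and py-ssa-lib may use a different routine):

```python
import numpy as np
from sklearn.utils.extmath import randomized_svd

r = 10  # number of leading eigentriples to approximate (hypothetical choice)
U_r, sigma_r, Vt_r = randomized_svd(X, n_components=r, random_state=0)

# elementary matrices are formed exactly as in the exact SVD case
X_elem = [sigma_r[i] * np.outer(U_r[:, i], Vt_r[i, :]) for i in range(r)]
```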

Eigentriple Grouping and Elementary Matrix Separation

In this step we perform eigentriple grouping to decompose the time series into several components. We make use of one or several of the techniques listed below:

  1. Visual Inspection of Elementary Matrices

    Matrices without repeating patterns are attributed to the trend component, while seasonal and noise components are associated with repeating patterns. The most frequent alternations of the pattern indicate the noise component.

  2. Eigenvalue Contribution and Shape of Eigenvectors $U$

    This combines inspecting the shape of the eigenvectors $U$ and the relative contribution of the eigenvalues. The most important $\sigma_i$-s are associated with the trend component. The corresponding eigenvectors show a clear trend and no periodicity, while eigentriples of the seasonal and noise components have a much smaller contribution and exhibit periodicity.

  3. Weighted Correlation between Components

    This approach is frequently used for automatic eigentriple grouping. It is obvious that the most correlated components $X_i$ have more in common than decorrelated or negatively correlated ones (see the sketch after this list).
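For the third criterion, the weighted (w-)correlation matrix can be sketched as follows. The sketch assumes the reconstructed component series are stored row-wise in an array `F_tilde` (a hypothetical name) and uses the standard weights $w_k = \min(k+1, L, K, N-k)$, i.e. the number of times $f_k$ appears in the trajectory matrix:

```python
import numpy as np

def w_correlation(F_tilde, L):
    """Weighted correlation matrix between reconstructed components.

    F_tilde : array of shape (n_components, N), one reconstructed series per row.
    """
    n, N = F_tilde.shape
    K = N - L + 1
    # w_k = number of anti-diagonal entries of X that contain f_k
    w = np.array([min(k + 1, L, K, N - k) for k in range(N)], dtype=float)
    gram = (F_tilde * w) @ F_tilde.T   # weighted inner products
    norms = np.sqrt(np.diag(gram))     # weighted norms
    return gram / np.outer(norms, norms)
```

Components with a large mutual w-correlation are usually grouped together, while well-separated components have a w-correlation close to zero.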

Time Series Reconstruction

For the reconstruction of the time series from the elementary matrices $X_i$, one needs to perform diagonal averaging. The resulting time series $\tilde{F}_i$ for a chosen $X_i$ is computed as the averages of the corresponding anti-diagonals of the elementary matrix $X_i$.

For this purpose one makes use of the Hankelisation linear operator $\hat{H}$. Application of $\hat{H}$ to $X_i$ yields the Hankel matrix $\widetilde{X_i}$. Thus, we get:

$$\widetilde{X_i} = \hat{H} X_i$$

The entry $\widetilde{x}_{m, n}$ of the hankelised matrix $\widetilde{X_i}$ is defined by the following formula. For the sake of readability we denote $s = m+n$:

$$\widetilde{x}_{m, n}= \begin{cases} \frac{1}{s+1}\sum^{s}_{l=0}x_{l,s-l}, & 0\leq s\leq L - 1\\\\ \frac{1}{L}\sum^{L-1}_{l=0}x_{l,s-l}, & L \leq s\leq K - 1\\\\ \frac{1}{K+L-s-1}\sum^{L-1}_{l=s-K+1}x_{l,s-l}, & K\leq s\leq K+L - 2 \end{cases}$$

Since $\hat{H}$ is a linear operator, the following holds for the trajectory matrix $X$:

$$X = \hat{H} \sum^{d-1}_{i=0} X_i = \sum^{d-1}_{i=0}\hat{H} X_i =\sum^{d-1}_{i=0}\widetilde{X_i}$$

Since the trajectory matrix $X$ is already a Hankel matrix, it can be expressed as the sum of the hankelised elementary matrices. Therefore, it holds

$$X = \sum^{d-1}_{i=0}\widetilde{X_i}$$

Consequently, the sum of the reconstructed time series components $\vec{\tilde{F}_i}$ must be equal to the original time series $F$. That is, it holds:

$$\vec{F} = \sum^{d-1}_{i=0}\vec{\tilde{F_i}}$$

Remember that in Python indexing starts at 0, which is why the sum runs up to $d - 1$. In a programming language with 1-based indexing (e.g. R), it runs up to $d$.
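A minimal sketch of the diagonal averaging described above (here `X_i` stands for one elementary or grouped matrix of shape $L \times K$):

```python
import numpy as np

def diagonal_averaging(X_i):
    """Reconstruct a series of length N = L + K - 1 by averaging anti-diagonals."""
    L, K = X_i.shape
    N = L + K - 1
    F_tilde = np.zeros(N)
    for s in range(N):
        # valid entries x_{l, s-l} lying on the s-th anti-diagonal
        vals = [X_i[l, s - l] for l in range(max(0, s - K + 1), min(L, s + 1))]
        F_tilde[s] = np.mean(vals)
    return F_tilde
```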

Forecasting

There are several options for forecasting future values in SSA. In py-ssa-lib, K-Forecasting and L-Forecasting are implemented.

L-Forecasting

This forecasting method relies on the signal subspace formed by the vectors $\{U_j\}^r_{j=0}$ obtained via SVD for a given time series, where $r\leq d-1$.

To begin with, we denote by $\vec{y}$ the vector consisting of the last $L-1$ values of the reconstructed time series:

$$\vec{y} = (f_{N-L+1},\dots,f_{N-1})$$

Then we select the signal subspace spanned by the vectors $\underline{U_j}$, where $\underline{U_j}$ consists of the first $L-1$ coordinates of the eigenvector $U_j$, while the last ($L$-th) coordinates $\pi_j$ form another vector $\vec{\pi}$. It is important that the sum of squares of the last coordinates (denoted by $\nu^2$) is strictly less than one. That is,

$$\nu^2=\sum^r_{j=0}\pi_j^2<1$$

Subsequently, the definitions above lead us to the cornerstone of L-Forecasting, the vector of Linear Recurrence Relation coefficients, denoted by $R_L \in \mathbb{R}^{L-1}$:

$$R_L=\frac{1}{1 -\nu^2 }\sum^r_{j=0}\pi_j\underline{U_j}$$

Finally, one can make a one-step forecast using the formula below:

$$y_{N+1} = \vec{y} R_L$$

On top of that, one can also predict $M$ future values iteratively. In order to forecast $M$ steps ahead, one slides a window along $\vec{y}$, including the recently predicted values $N+1,\dots,N+M$ in $\vec{y}$ and leaving out the oldest values. This technique is implemented in py-ssa-lib as well.
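A sketch of this recurrence, assuming `U` holds the chosen eigenvectors as columns and `f_rec` is the reconstructed series (both hypothetical names, not the py-ssa-lib API):

```python
import numpy as np

def l_forecast(f_rec, U, r, M):
    """Forecast M future values using the linear recurrence coefficients R_L."""
    L = U.shape[0]
    U_signal = U[:, :r + 1]        # chosen eigenvectors U_0, ..., U_r
    pi = U_signal[-1, :]           # last (L-th) coordinates pi_j
    nu2 = np.sum(pi ** 2)          # nu^2 must be strictly less than one
    R = (U_signal[:-1, :] @ pi) / (1.0 - nu2)   # LRR coefficients, length L - 1
    y = list(f_rec)
    for _ in range(M):
        y.append(np.dot(y[-(L - 1):], R))       # slide the window, append the forecast
    return np.array(y[-M:])
```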

K-Forecasting

The key difference of this forecasting method compared to L-Forecasting is that it makes use of the eigenvectors $\{V_j\}^r_{j=0}$. Let $\underline{V_j} \in \mathbb{R}^{K-1}$ be the vector consisting of all coordinates of $V_{j}$ except the last ($K$-th) one. Then we introduce the vector of the last $K-1$ values of the reconstructed signal:

$$\vec{Z} = (f_{N-K+1},...,f_{N-1})^T$$

Note, however, that in the case of MSSA, $Z$ is a matrix that comprises the last $K-1$ values of each of the $s$ time series.

Also, denote the chosen signal space as a matrix

$$Q=(\underline{V_0}, \underline{V_1}, ..., \underline{V_r})$$

and collect the last coordinates of the given signal space in

$$\vec{W} = (\pi_0,\pi_1,...,\pi_r )$$

In the multidimensional case (MSSA) it is necessary to ensure prior to forecasting that $(I_{ss} - WW^T)$ is invertible and $r\leq(K-1)$, where $W$ is a matrix of size $s \times r$ and $I_{ss}$ is an $s \times s$ identity matrix, with $s$ denoting the number of time series in $Z$. For SSA this simplifies to requiring that the scalar $(1 - WW^T)$ is non-zero. If these conditions are satisfied, the K-Forecasting formula holds:

$$Z_{N+1} = (1 - WW^T)^{-1}WQ^TZ$$

Forecasting $M$ steps ahead is very similar to L-Forecasting: one slides a window along $\vec{Z}$, including the recently predicted values $N+1,\dots,N+M$ in $\vec{Z}$ and leaving out the oldest values. This technique is implemented in py-ssa-lib as well.
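A one-step sketch of the univariate case following the formula above (again, `V` and `f_rec` are hypothetical names; py-ssa-lib's implementation may differ):

```python
import numpy as np

def k_forecast_step(f_rec, V, r):
    """One-step K-forecast for the univariate case."""
    K = V.shape[0]
    V_signal = V[:, :r + 1]           # chosen right singular vectors V_0, ..., V_r
    Q = V_signal[:-1, :]              # first K - 1 coordinates of each V_j
    W = V_signal[-1, :]               # last (K-th) coordinates
    Z = np.asarray(f_rec[-(K - 1):])  # last K - 1 values of the reconstructed series
    return (W @ (Q.T @ Z)) / (1.0 - W @ W)
```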