401-4623-00: Time Series Analysis
Section 7
Forecasting and the Innovation Algorithm
Swiss Federal Institute of Technology Zurich
Eidgenössische Technische Hochschule Zürich
Last Edit Date: 12/22/2024
Disclaimer and Terms of Use:
We do not guarantee the accuracy or completeness of the summary content. Some of the course material may not be included, and some of the content in the summary may not be correct. You should use this file properly and legally. We are not responsible for any results from using this file.
This personal note is adapted from the lectures of Professor Fadoua Balabdaoui and Professor Nicolai Meinshausen. Please contact us to delete this file if you believe your rights have been violated.
This work is licensed under a Creative Commons Attribution 4.0 International License.
Forecasting¶
Let $X_1, \cdots, X_n$ be observations from a stationary time series with mean $\mu_x(t) = \mu \in \mathbb{R}$ and ACVF $\gamma_x$. We want to find the best linear prediction of $X_{n+h}$, $h \ge 1$ based on $X_1, \cdots, X_n$.
Let us denote this prediction by $P_n X_{n+h}$. Then
$$P_n X_{n+h} = \hat{a}_0 + \hat{a}_1 X_n + \cdots + \hat{a}_n X_1$$
where $\hat{a}_0, \cdots, \hat{a}_n$ are such that
$$ \mathbb{E} \left[ (X_{n+h} - P_n X_{n+h})^2 \right] = \min_{(a_0 \cdots a_n)^T \in \mathbb{R}^{n+1}} \mathbb{E} [\underbrace{(X_{n+h} - a_0 - a_1 X_n - \cdots - a_n X_1)^2}_{S(a_0, \cdots, a_n)}] $$
To find $\hat{a}_j, j=0, \cdots, n$, we need to solve the equations
$$\frac{\partial S}{\partial a_j} \Big|_{(a_0, \cdots, a_n) = (\hat{a}_0, \cdots, \hat{a}_n)} =0, 0 \le j \le n.$$
We obtain $(n+1)$ equations
$$ \begin{cases} \mathbb{E}(X_{n+h} - \hat{a}_0 - \hat{a}_1 X_n - \cdots - \hat{a}_n X_1) = 0 \\ \mathbb{E}[X_{n+1-j} (X_{n+h} - \hat{a}_0 - \hat{a}_1 X_n - \cdots - \hat{a}_n X_1)] = 0, \quad j = 1, \cdots, n \end{cases} $$
From the first equation we obtain
$$\hat{a}_0 = \mu \left( 1 - \sum_{i=1}^n \hat{a}_i \right)$$
and
$$ \begin{align} \mathbb{E}[X_{n+1-j}X_{n+h}] &= \hat{a}_0 \mu + \sum_{i=1}^n \hat{a}_i \mathbb{E}[X_{n+1-j}X_{n+1-i}]\\ &= \mu^2 \left( 1 - \sum_{i=1}^n \hat{a}_i \right) + \sum_{i=1}^n \hat{a}_i \mathbb{E}[X_{n+1-j}X_{n+1-i}]\\ \Leftrightarrow \quad \mathbb{E}[X_{n+1-j}X_{n+h}] - \mu^2 &= \sum_{i=1}^{n} \hat{a}_i \left( \mathbb{E}[X_{n+1-j}X_{n+1-i}] - \mu^2 \right)\\ \Leftrightarrow \quad \mathrm{Cov}(X_{n+1-j}, X_{n+h}) &= \sum_{i=1}^{n}\hat{a}_i \,\mathrm{Cov}(X_{n+1-j}, X_{n+1-i})\\ \Leftrightarrow \quad \gamma_x(h+j-1) &= \sum_{i=1}^n \hat{a}_i \gamma_x(i - j) \end{align} $$
for $j = 1, \cdots, n$.
In summary
$$\hat{a}_0 = \mu \left( 1 - \sum_{i=1}^n \hat{a}_i \right)$$
$$\sum_{i=1}^n \hat{a}_i \gamma_x (i-j) = \gamma_x(h+j-1), j = 1, \cdots, n.$$
Let $\Gamma_n = (\gamma_x(i - j))_{1 \le i, j \le n}$, $\hat{a}_n = (\hat{a}_1 \cdots \hat{a}_n)^T$, and $\gamma_n(h) = (\gamma_x(h) \cdots \gamma_x(h+n-1))^T$. If $\Gamma_n$ is invertible, then
$$ \begin{cases} \hat{a}_n = \Gamma_n^{-1} \gamma_n(h) \\ \hat{a}_0 = \mu(1 - \sum_{i=1}^n \hat{a}_i) \end{cases} $$
Then
$$ \begin{align} P_n X_{n+h} &= \mu \left( 1 - \sum_{i=1}^n \hat{a}_i \right) + \hat{a}_1 X_n + \cdots + \hat{a}_n X_1\\ &= \mu + \sum_{j=1}^n \hat{a}_j (X_{n+1-j} - \mu) \end{align} $$
Note that $\mathbb{E}[P_n X_{n+h}] = \mu$.
The mean square prediction error is given by
$$ \begin{align} \mathbb{E}[(X_{n+h}-P_n X_{n+h})^2] &= \mathbb{E}[(X_{n+h} - P_n X_{n+h}) X_{n+h}] - \underbrace{\mathbb{E} [(X_{n+h} - P_n X_{n+h})\, P_n X_{n+h}]}_{=0 \text{ by the orthogonality conditions}}\\ &= \mathbb{E}\left[\left(X_{n+h} - \mu - \sum_{j=1}^{n} \hat{a}_j (X_{n+1-j} - \mu) \right) X_{n+h}\right]\\ &= \underbrace{\mathbb{E}[(X_{n+h} - \mu) X_{n+h}]}_{\mathrm{Var}(X_{n+h})} - \sum_{j=1}^{n}\hat{a}_j \underbrace{\mathbb{E}[(X_{n+1-j} - \mu)X_{n+h}]}_{\mathrm{Cov}(X_{n+1-j}, X_{n+h})}\\ &= \gamma_x(0) - \sum_{j = 1}^n \hat{a}_j \gamma_x(h+j-1)\\ &= \gamma_x(0) - \hat{a}_n^T \gamma_n(h) \end{align} $$
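To make the formulas above concrete, here is a minimal numpy sketch (my own addition, not from the lecture) that solves $\Gamma_n \hat{a}_n = \gamma_n(h)$, forms $P_n X_{n+h} = \mu + \sum_j \hat{a}_j (X_{n+1-j} - \mu)$, and evaluates the mean square prediction error $\gamma_x(0) - \hat{a}_n^T \gamma_n(h)$. The function name `best_linear_prediction` and the AR(1)-type ACVF used at the end are illustrative assumptions.

```python
import numpy as np

def best_linear_prediction(x, mu, acvf, h=1):
    """Best linear h-step prediction P_n X_{n+h} and its mean square error.

    x    : observations X_1, ..., X_n (oldest first)
    mu   : process mean
    acvf : function k -> gamma_x(k)
    """
    n = len(x)
    # Gamma_n[i, j] = gamma_x(i - j)
    Gamma = np.array([[acvf(i - j) for j in range(n)] for i in range(n)])
    # gamma_n(h) = (gamma_x(h), ..., gamma_x(h + n - 1))^T
    gamma_h = np.array([acvf(h + k) for k in range(n)])
    a = np.linalg.solve(Gamma, gamma_h)            # hat{a}_1, ..., hat{a}_n
    # P_n X_{n+h} = mu + sum_j a_j (X_{n+1-j} - mu), with X_{n+1-j} = x[n - j]
    pred = mu + a @ (x[::-1] - mu)
    mse = acvf(0) - a @ gamma_h                    # gamma_x(0) - hat{a}_n^T gamma_n(h)
    return pred, mse

# Illustrative AR(1)-type ACVF: gamma(k) = sigma^2 phi^|k| / (1 - phi^2)
phi, sigma2 = 0.6, 1.0
acvf = lambda k: sigma2 * phi ** abs(k) / (1 - phi ** 2)
x = np.array([0.3, -1.1, 0.8, 0.5])
print(best_linear_prediction(x, mu=0.0, acvf=acvf, h=1))   # prediction approx phi * 0.5, MSE approx sigma^2
```

For this ACVF the solution reduces to $\hat{a}_1 = \phi$ and $\hat{a}_j = 0$ for $j \ge 2$, consistent with the AR($p$) example below.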
Remark:
The prediction $P_n X_{n+h}$ is almost surely unique, being the orthogonal projection of $X_{n+h}$ onto the linear space spanned by $1, X_1, \cdots, X_n$.
The prediction of $X_1$ is $\hat{a}_0 = \mu$.
Suppose $\mu = 0$. Then $P_n X_{n+h}$ is almost surely uniquely determined by the $n$ orthogonality conditions
$$\mathbb{E}[X_{n+1-j} (X_{n+h} - P_n X_{n+h})] = 0, j = 1, \cdots, n.$$
Hence, to determine $P_n X_{n+h}$, it is enough to check that these equations are satisfied by a "good guess".
Example¶
Suppose $\{X_t\}_t \sim AR(p)$ that is causal
$$X_t - \phi_1 X_{t-1} - \cdots - \phi_p X_{t-p} = Z_t$$
with $\{Z_t\}_t \sim WN(0, \sigma^2)$ and $\phi(z) = 1 - \phi_1 z - \cdots - \phi_p z^p$ admitting no zero in the closed unit disk $\{z \in \mathbb{C}: |z| \le 1\}$.
We want to make the best prediction for $X_{n+1}$. We will now show that $P_n X_{n+1} = \phi_1 X_n + \cdots + \phi_p X_{n+1-p}$, $\forall n \ge p$ - the "good guess". Let $j \in \{1, \cdots, n\}$, then
$$\mathbb{E}[X_{n+1-j}(X_{n+1} - \phi_1 X_n - \cdots - \phi_p X_{n+1-p})] = \mathbb{E}[X_{n+1-j} Z_{n+1}] = 0$$
by causality: $X_{n+1-j}$ is a function of $Z_s$, $s \le n+1-j \le n < n+1$, and is therefore uncorrelated with $Z_{n+1}$.
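The same check can be done numerically. The sketch below (my own illustration, not from the course) approximates the ACVF of a hypothetical causal AR(2) through its MA($\infty$) weights $\psi_j$ and verifies that $\Gamma_n^{-1}\gamma_n(1)$ returns approximately $(\phi_1, \phi_2, 0, \cdots, 0)$; the coefficients $0.5, 0.3$ and the helper name `ar_acvf` are made up for illustration.

```python
import numpy as np

def ar_acvf(phi, sigma2, max_lag, J=500):
    """Approximate ACVF of a causal AR(p) via its MA(infinity) representation:
    psi_0 = 1, psi_j = sum_k phi_k psi_{j-k}, gamma(k) = sigma^2 sum_j psi_j psi_{j+k}."""
    p = len(phi)
    psi = np.zeros(J + max_lag + 1)
    psi[0] = 1.0
    for j in range(1, len(psi)):
        psi[j] = sum(phi[k] * psi[j - k - 1] for k in range(min(p, j)))
    return np.array([sigma2 * psi[:J] @ psi[k:k + J] for k in range(max_lag + 1)])

# Hypothetical causal AR(2): X_t = 0.5 X_{t-1} + 0.3 X_{t-2} + Z_t
phi, sigma2, n = [0.5, 0.3], 1.0, 8
g = ar_acvf(phi, sigma2, max_lag=n)                        # gamma(0), ..., gamma(n)
Gamma = np.array([[g[abs(i - j)] for j in range(n)] for i in range(n)])
a = np.linalg.solve(Gamma, g[1:n + 1])                     # Gamma_n^{-1} gamma_n(1)
print(np.round(a, 6))   # approx [0.5, 0.3, 0, ..., 0], i.e. P_n X_{n+1} = phi_1 X_n + phi_2 X_{n-1}
```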
The innovation algorithm¶
This algorithm, among many others, provides a recursive approach to compute $P_n X_{n+1}$: the best linear 1-step prediction based on the data $X_1, \cdots, X_n$.
In the following, we assume that $\{X_t\}_t$ is a 0-mean time series with covariance function $K$: $K(i, j) = \mathrm{Cov}(X_i, X_j)$ for $i, j \in \mathbb{Z}$. If $\{X_t\}_t$ is stationary then we have $K(i, j) = \gamma_x(i - j)$ with $\gamma_x$ as the ACVF of $\{X_t\}_t$.
Let us write
$$ \hat{X}_{n+1} = \begin{cases} 0 & n=0 \\ P_n X_{n+1} & n \ge 1 \end{cases} $$
where $P_n X_{n+1}$ is the so called best linear prediction of $X_{n+1}$ based on $X_1, \cdots, X_n$.
Also, define $\nu_n = \mathbb{E}[(X_{n+1} - \hat{X}_{n+1})^2]$, $n = 0, 1, \cdots$, the mean square prediction error associated with predicting $X_{n+1}$ based on $X_1, \cdots, X_n$.
Finally, let $U_n = X_n - \hat{X}_n$ be the innovation associated with predicting $X_n$.
Proposition:
Let $n \ge 0$, then
$$ \hat{X}_{n+1} = \begin{cases} 0 & n = 0\\ \sum_{j=1}^{n} \theta_{n,j} (\underbrace{X_{n+1-j} - \hat{X}_{n+1-j}}_{U_{n+1-j}}) & n \ge 1 \end{cases} $$
with $\theta_{n,1}, \cdots, \theta_{n,n} \in \mathbb{R}$.
Moreover, $\theta_{n,1}, \cdots, \theta_{n,n}$ can be determined in the following recursive way
$$ \text{The innovation algorithm} \begin{cases} \nu_0 = K(1, 1) \\ \theta_{n, n-k} = \frac{1}{\nu_k} \left( K(n+1, k+1) - \sum_{j=0}^{k-1} \theta_{k,k-j} \theta_{n,n-j} \nu_j \right), \quad k = 0, \cdots, n-1\\ \nu_n = K(n+1, n+1) - \sum_{j=0}^{n-1}\theta_{n,n-j}^2 \nu_j \end{cases} $$
with the convention $\sum_{j=0}^{k-1} \theta_{k, k-j} \theta_{n, n-j} \nu_j = 0$ if $k = 0$.
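A direct transcription of this recursion into Python might look as follows. The function names `innovations` and `one_step_predictions` are my own, and storing $\theta_{n,j}$ as `theta[n][j-1]` is just one possible indexing convention; the covariance function `K` uses the 1-based indices of the proposition.

```python
import numpy as np

def innovations(K, N):
    """Innovation algorithm: returns theta[n] = (theta_{n,1}, ..., theta_{n,n}) and nu[n] for n < N.

    K : covariance function, K(i, j) = Cov(X_i, X_j) with 1-based indices i, j
    """
    nu = np.zeros(N)
    theta = [np.zeros(n) for n in range(N)]
    nu[0] = K(1, 1)
    for n in range(1, N):
        for k in range(n):                                   # k = 0, ..., n-1
            s = sum(theta[k][k - 1 - j] * theta[n][n - 1 - j] * nu[j] for j in range(k))
            theta[n][n - 1 - k] = (K(n + 1, k + 1) - s) / nu[k]   # theta_{n, n-k}
        nu[n] = K(n + 1, n + 1) - sum(theta[n][n - 1 - j] ** 2 * nu[j] for j in range(n))
    return theta, nu

def one_step_predictions(x, K):
    """Compute hat{X}_1, ..., hat{X}_{n+1} from observations x = (X_1, ..., X_n)."""
    n = len(x)
    theta, nu = innovations(K, n + 1)
    xhat = np.zeros(n + 1)                                   # xhat[m] = hat{X}_{m+1}; hat{X}_1 = 0
    for m in range(1, n + 1):
        # hat{X}_{m+1} = sum_{j=1}^m theta_{m,j} (X_{m+1-j} - hat{X}_{m+1-j})
        xhat[m] = sum(theta[m][j - 1] * (x[m - j] - xhat[m - j]) for j in range(1, m + 1))
    return xhat, nu
```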
Example¶
$n = 1$, predicting $X_2$ based on $X_1$
$$\theta_{1, 1} = \frac{1}{\nu_0}(K(2, 1) - 0) = \frac{K(2, 1)}{\nu_0} = \frac{K(2,1)}{K(1,1)}$$
$$\nu_1 = K(2,2) - \theta_{1,1}^2 \nu_0 = K(2,2) - \frac{K(2,1)^2}{K(1,1)}$$
$n = 2$, predicting $X_3$ based on $X_1$ and $X_2$
$$\theta_{2,2} = \frac{K(3,1)}{\nu_0} = \frac{K(3,1)}{K(1,1)}$$
$$\theta_{2,1} = \frac{1}{\nu_1}(K(3,2) - \theta_{1,1} \theta_{2,2} \nu_0)$$
$$\nu_2 = K(3, 3) - \theta_{2,2}^2 \nu_0 - \theta_{2,1}^2 \nu_1$$
$n = 3$, predicting $X_4$ based on $X_1$, $X_2$, and $X_3$
$$\theta_{3,3} = \frac{K(4,1)}{\nu_0} = \frac{K(4,1)}{K(1,1)}$$
$$\theta_{3,2} = \frac{1}{\nu_1} (K(4,2) - \theta_{1,1} \theta_{3,3} \nu_0)$$
$$\theta_{3,1} = \frac{1}{\nu_2} (K(4,3) - \theta_{2,2} \theta_{3,3} \nu_0 - \theta_{2,1} \theta_{3,2} \nu_1)$$
$$\nu_3 = K(4, 4) - \theta_{3,3}^2 \nu_0 - \theta_{3,2}^2 \nu_1 - \theta_{3,1}^2 \nu_2$$
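As a sanity check (my own addition), the snippet below evaluates exactly these $n = 1, 2, 3$ formulas for a hypothetical MA(1) covariance, $K(i, i) = \sigma^2(1 + \theta^2)$, $K(i, i \pm 1) = \sigma^2\theta$, and $K(i, j) = 0$ otherwise; the values $\theta = 0.4$, $\sigma^2 = 1$ are arbitrary. For an MA(1) only $\theta_{n,1}$ is nonzero, and $\nu_n$ decreases towards $\sigma^2$.

```python
# Hypothetical MA(1) covariance: K(i, i) = sigma2 (1 + theta^2), K(i, i±1) = sigma2 * theta, else 0
theta_ma, sigma2 = 0.4, 1.0

def K(i, j):
    if i == j:
        return sigma2 * (1 + theta_ma ** 2)
    return sigma2 * theta_ma if abs(i - j) == 1 else 0.0

nu0 = K(1, 1)
t11 = K(2, 1) / nu0
nu1 = K(2, 2) - t11 ** 2 * nu0
t22 = K(3, 1) / nu0                                   # = 0 for an MA(1)
t21 = (K(3, 2) - t11 * t22 * nu0) / nu1
nu2 = K(3, 3) - t22 ** 2 * nu0 - t21 ** 2 * nu1
t33 = K(4, 1) / nu0                                   # = 0
t32 = (K(4, 2) - t11 * t33 * nu0) / nu1               # = 0
t31 = (K(4, 3) - t22 * t33 * nu0 - t21 * t32 * nu1) / nu2
nu3 = K(4, 4) - t33 ** 2 * nu0 - t32 ** 2 * nu1 - t31 ** 2 * nu2
print(t11, t21, t31)       # the only nonzero theta's
print(nu0, nu1, nu2, nu3)  # 1.16, 1.022..., decreasing towards sigma2 = 1
```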