140  Theoretical Concepts

140.1 Stationary Processes

Assume \(X_t\) is observed for \(t = 1, 2, \ldots, T\).

For time-series analysis it is important to distinguish two stationarity concepts:

  • Strict stationarity: for any \(m \in \mathbb{N}\) and any time points \(t_1,\ldots,t_m\), the joint distribution of \((X_{t_1},\ldots,X_{t_m})\) is identical to that of \((X_{t_1+h},\ldots,X_{t_m+h})\) for every shift \(h\).
  • Weak (second-order) stationarity: \(\text{E}(X_t)=\mu\) is constant, \(\text{Var}(X_t)=\sigma^2<\infty\) is constant, and \(\text{Cov}(X_t,X_{t-k})=\gamma_k\) depends only on lag \(k\) (not on calendar time \(t\)).

In this handbook, Box-Jenkins modeling relies on weak stationarity, achieved where necessary through transformation and differencing.

Normality is not part of the definition of stationarity. Gaussianity can be used as an additional modeling assumption in some likelihood-based inference settings, but stationarity and normality are different concepts.
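
To make the distinction concrete, here is a minimal R sketch (base R only; the seed and sample size are arbitrary) that simulates a random walk, which is not weakly stationary because its variance grows with \(t\), and shows that its first difference behaves like a weakly stationary series:

```r
## Minimal sketch: a random walk is not weakly stationary (its
## variance grows with t), but its first difference is.
set.seed(42)                      # arbitrary seed, for reproducibility
n   <- 500
eps <- rnorm(n)                   # iid N(0, 1) innovations
x   <- cumsum(eps)                # random walk: X_t = X_{t-1} + eps_t
dx  <- diff(x)                    # first difference recovers eps_t

## Variances over the two halves of the sample: roughly equal for dx
## (consistent with weak stationarity), diverging for x.
var(x[1:250]);  var(x[251:500])
var(dx[1:249]); var(dx[250:499])
```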

140.2 White Noise

Two white-noise notions are commonly used:

  • Weak white noise: \(\text{E}(X_t)=0\), \(\text{Var}(X_t)=\sigma^2<\infty\), and \(\text{Cov}(X_t,X_{t-k})=0\) for all \(k \neq 0\).
  • Strong (iid) white noise: observations are independent and identically distributed with mean 0 and constant variance.

Independence implies zero covariance, but zero covariance does not imply independence.

In practice, we want model residuals to behave as white noise. If they do, the residuals contain no remaining systematic information that could be exploited to improve the forecasts.
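
As an illustration, the following R sketch fits an AR(1) model to simulated AR(1) data and applies the Ljung-Box portmanteau test (stats::Box.test) to the residuals; the simulated data and chosen lag are illustrative only:

```r
## Minimal sketch: fit an AR(1) to simulated AR(1) data and test
## whether the residuals behave as white noise (Ljung-Box test).
set.seed(1)                       # arbitrary seed
x   <- arima.sim(model = list(ar = 0.6), n = 300)
fit <- arima(x, order = c(1, 0, 0))
res <- residuals(fit)

## A large p-value gives no evidence against the white-noise
## hypothesis; fitdf = 1 accounts for the estimated AR coefficient.
Box.test(res, lag = 10, type = "Ljung-Box", fitdf = 1)
```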

140.3 Autocorrelation

We define the autocovariance at lag \(k\) as

\[ \gamma_k = \text{E}\left[(X_t - \mu)(X_{t-k} - \mu)\right] \]

and the autocorrelation as

\[ \rho_k = \frac{\gamma_k}{\gamma_0} \]

In practice, we only observe a sample, so \(\mu\) is unknown. We therefore use the sample mean \(\bar{X}\) to define sample autocorrelations:

\[ r_k = \frac{\sum_{t=k+1}^{T}\left[(X_t - \bar{X})(X_{t-k} - \bar{X})\right]}{\sum_{t=1}^{T}(X_t - \bar{X})^2} \]

where \(\bar{X} = \frac{1}{T} \sum_{t=1}^{T} X_t\) and \(k \in \mathbb{N}\).
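
The following R sketch computes \(r_k\) directly from this definition and verifies that it matches the output of stats::acf (the simulated AR(1) series is illustrative only):

```r
## Minimal sketch: compute r_k from the definition and compare with
## stats::acf (the simulated AR(1) series is illustrative).
set.seed(7)
x    <- arima.sim(model = list(ar = 0.5), n = 200)
Tn   <- length(x)
xbar <- mean(x)

r_k <- function(k) {
  sum((x[(k + 1):Tn] - xbar) * (x[1:(Tn - k)] - xbar)) /
    sum((x - xbar)^2)
}

sapply(1:5, r_k)                            # manual r_1, ..., r_5
acf(x, lag.max = 5, plot = FALSE)$acf[2:6]  # same values (index 1 is lag 0)
```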

The autocovariance and autocorrelation matrices can be written as

\[ \Gamma_T = \begin{bmatrix} \gamma_0 & \gamma_1 & \gamma_2 & \cdots & \gamma_{T-1} \\ \gamma_1 & \gamma_0 & \gamma_1 & \cdots & \gamma_{T-2} \\ \gamma_2 & \gamma_1 & \gamma_0 & \cdots & \gamma_{T-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \gamma_{T-1} & \gamma_{T-2} & \gamma_{T-3} & \cdots & \gamma_0 \end{bmatrix} = \sigma_X^2 \begin{bmatrix} 1 & \rho_1 & \rho_2 & \cdots & \rho_{T-1} \\ \rho_1 & 1 & \rho_1 & \cdots & \rho_{T-2} \\ \rho_2 & \rho_1 & 1 & \cdots & \rho_{T-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho_{T-1} & \rho_{T-2} & \rho_{T-3} & \cdots & 1 \end{bmatrix} = \sigma_X^2 P_T \]

This matrix is positive semidefinite (positive definite under non-degeneracy conditions), because any linear combination

\[ S_t = \sum_{i=1}^{T} w_i X_{t-i+1}, \quad w_i \in \mathbb{R} \]

has variance

\[ \text{Var}(S_t) = \sum_{i=1}^{T} \sum_{j=1}^{T} w_i w_j \gamma_{|j-i|} \ge 0 \]

For \(T = 3\), positive definiteness implies

\[ \left|\begin{matrix}1 & \rho_1 \\\rho_1 & 1\end{matrix}\right| > 0,\quad \left|\begin{matrix}1 & \rho_2 \\\rho_2 & 1\end{matrix}\right| > 0,\quad \left|\begin{matrix}1 & \rho_1 & \rho_2 \\\rho_1 & 1 & \rho_1 \\\rho_2 & \rho_1 & 1\end{matrix}\right| > 0 \]

or

\[ \begin{cases} -1 < \rho_1 < 1 \\ -1 < \rho_2 < 1 \\ -1 < \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2} < 1 \end{cases} \]
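
These constraints can be checked numerically: a pair \((\rho_1, \rho_2)\) is admissible exactly when the implied correlation matrix \(P_3\) is positive definite, i.e. all its eigenvalues are positive. A minimal R sketch (the example values are arbitrary):

```r
## Minimal sketch: a pair (rho_1, rho_2) is admissible exactly when
## the implied correlation matrix P_3 is positive definite.
admissible <- function(rho1, rho2) {
  P3 <- matrix(c(1,    rho1, rho2,
                 rho1, 1,    rho1,
                 rho2, rho1, 1), nrow = 3, byrow = TRUE)
  all(eigen(P3, symmetric = TRUE, only.values = TRUE)$values > 0)
}

admissible(0.8,  0.6)  # TRUE:  (rho_2 - rho_1^2)/(1 - rho_1^2) is in (-1, 1)
admissible(0.8, -0.5)  # FALSE: the third constraint is violated
```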

The Bartlett formulas (Bartlett 1946) for the variance of sample autocorrelation coefficients can be written as

\[ \text{V}(r_k) \simeq \frac{1}{T}\sum_{i=-\infty}^{+\infty}\left(\rho_i^2 + \rho_{i+k}\rho_{i-k} - 4\rho_k\rho_i\rho_{i-k} + 2\rho_i^2\rho_k^2\right) \]

which can be reduced to

\[ \text{V}(r_k) \simeq \frac{1}{T}\left(\frac{(1+\kappa^2)(1-\kappa^{2k})}{1-\kappa^2} - 2k\kappa^{2k}\right) \]

provided the autocorrelations decay exponentially, that is,

\[ \rho_k = \kappa^{|k|} \quad \text{for some } \kappa \in (-1, 1) \]

If the autocorrelations vanish beyond lag \(q\) (i.e., \(\rho_i = 0\) for all \(i > q\), \(q \in \mathbb{N}\), as for an MA(\(q\)) process), then

\[ \forall k > q: \text{V}(r_k) \simeq \frac{1}{T}\left(1 + 2\sum_{i=1}^{q}\rho_i^2\right) \]

which is the so-called “large-lag” variance. Many software packages assume white noise and use the approximation

\[ \sqrt{\text{V}(r_k)} \simeq \frac{1}{\sqrt{T}} \]
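
The following R sketch contrasts the large-lag standard error with the white-noise approximation \(1/\sqrt{T}\) for a simulated MA(1) series, where \(\rho_i = 0\) for \(i > q = 1\); sample autocorrelations are substituted for the unknown \(\rho_i\), and the simulated data are illustrative only:

```r
## Minimal sketch: large-lag standard error of r_k versus the
## white-noise approximation 1/sqrt(T), for a simulated MA(1)
## series (rho_i = 0 for i > q = 1); sample autocorrelations are
## substituted for the unknown rho_i.
set.seed(3)
x  <- arima.sim(model = list(ma = 0.7), n = 400)
Tn <- length(x)
r  <- acf(x, lag.max = 10, plot = FALSE)$acf[-1]  # r_1, ..., r_10

q            <- 1                 # MA order, assumed known here
se_large_lag <- sqrt((1 + 2 * sum(r[1:q]^2)) / Tn)  # valid for k > q
se_white     <- 1 / sqrt(Tn)

c(large_lag = se_large_lag, white_noise = se_white)
```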

Bartlett also derived covariances between sample autocorrelations:

\[ \text{cov}(r_k, r_{k+s}) \simeq \frac{1}{T}\sum_{i=-\infty}^{+\infty}\rho_i\rho_{i+s} \]

These covariances imply that neighboring sample autocorrelations are themselves correlated, which can distort the visual pattern of the ACF.

The Partial Autocorrelation Function (PACF) addresses a related problem: it measures the correlation between \(X_t\) and \(X_{t-k}\) after removing the linear influence of the intervening observations \(X_{t-1},\ldots,X_{t-k+1}\).

The PACF coefficient at lag \(k\) for a time series \(X_t\) is defined as the last coefficient, \(\phi_{kk}\), of an autoregression of order \(k\):

\[ x_t = \phi_{k1}x_{t-1} + \phi_{k2}x_{t-2} + \cdots + \phi_{k(k-1)}x_{t-k+1} + \phi_{kk}x_{t-k} + a_t \]

with \(x_t = X_t - \mu\).

A relationship between the ACF and the PACF follows by multiplying the autoregression by \(x_{t-j}\):

\[ x_{t-j}\,x_t = \sum_{i=1}^{k}\phi_{ki}\,x_{t-j}\,x_{t-i} + x_{t-j}\,a_t \]

Taking expectations (noting that \(a_t\) is uncorrelated with past values), dividing by \(\gamma_0\), and letting \(j\) run from \(1\) to \(k\) yields (with \(\rho_{-m}=\rho_m\))

\[ \rho_j = \sum_{i=1}^{k}\phi_{ki}\,\rho_{j-i}, \quad j = 1,\ldots,k \]

which is the Yule-Walker system (Yule 1927; Walker 1931)

\[ \begin{bmatrix} 1 & \rho_1 & \rho_2 & \cdots & \rho_{k-1} \\ \rho_1 & 1 & \rho_1 & \cdots & \rho_{k-2} \\ \rho_2 & \rho_1 & 1 & \cdots & \rho_{k-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho_{k-1} & \rho_{k-2} & \rho_{k-3} & \cdots & 1 \end{bmatrix} \begin{bmatrix} \phi_{k1} \\ \phi_{k2} \\ \vdots \\ \phi_{kk} \end{bmatrix} = \begin{bmatrix} \rho_1 \\ \rho_2 \\ \vdots \\ \rho_k \end{bmatrix} \]

or compactly, writing \(\boldsymbol{\phi}_k = (\phi_{k1},\ldots,\phi_{kk})'\) and \(\boldsymbol{\rho}_k = (\rho_1,\ldots,\rho_k)'\),

\[ P_k \boldsymbol{\phi}_k = \boldsymbol{\rho}_k \]

A practical numerical algorithm for PACF estimation is due to Durbin (Durbin 1960):

\[ \begin{aligned} \hat{\phi}_{11} &= r_1 \\ \hat{\phi}_{ll} &= \frac{r_l - \sum_{j=1}^{l-1} \hat{\phi}_{l-1,j} r_{l-j}}{1 - \sum_{j=1}^{l-1} \hat{\phi}_{l-1,j} r_j}, \quad l=2,3,\ldots,K \end{aligned} \]

with

\[ \hat{\phi}_{lj} = \hat{\phi}_{l-1,j} - \hat{\phi}_{ll}\hat{\phi}_{l-1,l-j}, \quad j = 1,2,\ldots,l-1 \]
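
A minimal R implementation of this recursion (the simulated AR(2) series and the maximum lag \(K\) are illustrative), cross-checked against stats::pacf and against solving the Yule-Walker system directly:

```r
## Minimal sketch: Durbin's recursion applied to the sample ACF,
## cross-checked against stats::pacf and against a direct solution
## of the Yule-Walker system P_k phi_k = rho_k.
set.seed(11)
x <- arima.sim(model = list(ar = c(0.6, -0.3)), n = 500)
K <- 6
r <- acf(x, lag.max = K, plot = FALSE)$acf[-1]   # r_1, ..., r_K

durbin_pacf <- function(r, K) {
  phi <- matrix(0, K, K)
  phi[1, 1] <- r[1]
  for (l in 2:K) {
    j <- 1:(l - 1)
    phi[l, l] <- (r[l] - sum(phi[l - 1, j] * r[l - j])) /
                 (1  - sum(phi[l - 1, j] * r[j]))
    phi[l, j] <- phi[l - 1, j] - phi[l, l] * phi[l - 1, l - j]
  }
  diag(phi)                                      # phi_11, ..., phi_KK
}

durbin_pacf(r, K)
pacf(x, lag.max = K, plot = FALSE)$acf[, 1, 1]   # should agree

## Cross-check phi_KK by solving the Yule-Walker system directly.
P <- toeplitz(c(1, r[1:(K - 1)]))
solve(P, r)[K]
```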

The standard deviation of a partial autocorrelation coefficient for \(k > p\) (where \(p\) is the order of the autoregressive data-generating process) is approximately

\[ \hat{\sigma}(\hat{\phi}_{kk}) \simeq \frac{1}{\sqrt{T}} \]

Bartlett, M. S. 1946. “On the Theoretical Specification and Sampling Properties of Autocorrelated Time-Series.” Journal of the Royal Statistical Society. Series B (Methodological) 8 (1): 27–41.
Durbin, J. 1960. “The Fitting of Time-Series Models.” Revue de l’Institut International de Statistique / Review of the International Statistical Institute 28 (3): 233–44.
Walker, Gilbert Thomas. 1931. “On Periodicity in Series of Related Terms.” Proceedings of the Royal Society of London. Series A 131 (818): 518–32. https://doi.org/10.1098/rspa.1931.0069.
Yule, George Udny. 1927. “On a Method of Investigating Periodicities in Disturbed Series, with Special Reference to Wolfer’s Sunspot Numbers.” Philosophical Transactions of the Royal Society of London. Series A 226: 267–98. https://doi.org/10.1098/rsta.1927.0007.