Table of contents

  • 151.1 AR(1) Model
  • 151.2 AR(2) Model
  • 151.3 MA(1) Model
  • 151.4 MA(2) Model
  • 151.5 ARMA(1,1) Model
  • 151.6 ARMA Identification Summary
  • 151.7 Practical Identification Workflow
  • 151.8 Identifying ARMA Parameters in Practice

151  Identifying ARMA parameters

Any stationary time series can be modelled by AR and MA models as is shown in Wold’s decomposition theorem (Wold 1938). The definitions and properties of (some of) these models are described in the following sections.

151.1 AR(1) Model

The AR(1) process is defined as

\[ \begin{aligned} (1-\phi_1 B) W_t &= e_t \\ W_t - \phi_1 W_{t-1} &= e_t \\ W_t &= \phi_1 W_{t-1} + e_t \end{aligned} \]

where \(W_t\) is a stationary time series and \(e_t\) is a white noise error term; the corresponding forecast function is \(F_t = \phi_1 W_{t-1}\). We now derive the theoretical pattern of the ACF of an AR(1) process for identification purposes.

First, we note that the above expression may be rewritten as follows

\[ \begin{aligned} (1 - \phi_1 B) W_t &= e_t \\ W_t &= (1 - \phi_1 B)^{-1} e_t \\ W_t &= e_t + \phi_1 e_{t-1} + \phi_1^2 e_{t-2} + \phi_1^3 e_{t-3} + \cdots \end{aligned} \]

We multiply the AR(1) process by \(W_{t-k}\) in expectations form

\[ \begin{aligned} W_t W_{t-k} - \phi_1 W_{t-1} W_{t-k} &= e_t W_{t-k} \\ \text{E}(W_t W_{t-k}) - \phi_1 \text{E}(W_{t-1} W_{t-k}) &= \text{E}(e_t W_{t-k}) \\ \gamma_k - \phi_1 \gamma_{k-1} &= \text{E}(e_t W_{t-k}) \end{aligned} \]

For \(k=0\), the right hand side may be rewritten as

\[ \begin{aligned} \text{E}(e_t W_t) &= \text{E} \left[ e_t ( e_t + \phi_1 e_{t-1} + \phi_1^2 e_{t-2} + \cdots ) \right] \\ \text{E}(e_t W_t) &= \text{E}(e_t^2) = \sigma_{e_t}^2 \end{aligned} \]

and for \(k>0\), the right hand side is

\[ \begin{aligned} \text{E}(e_t W_{t-k}) &= \text{E} \left[ e_t ( e_{t-k} + \phi_1 e_{t-k-1} + \phi_1^2 e_{t-k-2} + \cdots ) \right] \\ \text{E}(e_t W_{t-k}) &= 0 \end{aligned} \]

Hence, the left hand side becomes

\[ \gamma_k - \phi_1 \gamma_{k-1} = \begin{cases} \gamma_0 - \phi_1 \gamma_1 = \sigma_{e_t}^2 \text{ for } k = 0 \\ 0 \text{ for } k > 0 \end{cases} \]

Based on the previous expressions, we can investigate the characteristics of the ACF and PACF:

\[ \begin{aligned} \gamma_0 - \phi_1 \gamma_1 &= \sigma_{e_t}^2 \\ \gamma_0 - \phi_1^2 \gamma_0 &= \sigma_{e_t}^2 \\ \gamma_0 &= \frac{\sigma_{e_t}^2}{1 - \phi_1^2} \end{aligned} \]

and

\[ \begin{aligned} \gamma_k - \phi_1 \gamma_{k-1} &= 0 \\ \gamma_k &= \phi_1 \gamma_{k-1} \\ \frac{\gamma_k}{\gamma_0} &= \phi_1 \frac{\gamma_{k-1}}{\gamma_0} \end{aligned} \]

such that \(\rho_k = \phi_1 \rho_{k-1} = \phi_1^2 \rho_{k-2} = \phi_1^3 \rho_{k-3} = \cdots\). We conclude that \(\rho_k = \phi_1^k \rho_0 = \phi_1^k\).
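The geometric decay \(\rho_k = \phi_1^k\) can be verified numerically with R's ARMAacf(), which computes theoretical ACF values; the value \(\phi_1 = 0.7\) below is purely illustrative.

```r
# Theoretical ACF of an AR(1) process with phi_1 = 0.7 (illustrative value)
phi1 <- 0.7
rho  <- ARMAacf(ar = phi1, lag.max = 5)  # returns lags 0..5, with rho_0 = 1
# The derivation above predicts rho_k = phi_1^k
print(rho[-1])        # lags 1..5
print(phi1^(1:5))     # identical up to floating-point error
```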

Figure 151.1: Theoretical ACF/PACF of AR(1) process

Figure 151.1 shows two theoretical patterns that occur in the ACF and PACF when the stationary time series \(W_t\) follows an AR(1) process. Note that the first ACF and PACF coefficients are always equal.

Generally speaking, a linear filter process is stationary if the inverse-filter expansion converges. For AR(1), stationarity requires \(|\phi_1|<1\). Equivalently, the root of \((1-\phi_1 B)=0\) is \(B=1/\phi_1\), which must lie outside the unit circle (absolute value greater than 1). Under this condition the inverse filter has the convergent expansion

\[ \psi(B) = (1 - \phi_1 B)^{-1} = \sum_{j=0}^{\infty} \phi_1^j B^j \]

For a general AR(p) model, the characteristic polynomial can be factored as

\[ \phi(B) = \prod_{i=1}^{p} (1 - \xi_i B) \]

and stationarity requires that

\[ \forall i \in \{1,2,\ldots,p\}: \left|\xi_i\right| < 1 \]

Equivalent statement (often used in textbooks): if the roots are written in the polynomial variable \(z\), so that \(\phi(z)=0\), those roots must lie outside the unit circle. Both formulations describe the same stationarity condition.
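The root condition in the variable \(z\) can be checked numerically with base R's polyroot(), which returns the roots of a polynomial given its coefficients in increasing order; the AR(2) coefficients below are illustrative.

```r
# Stationarity check for an AR(2) model with phi = (0.5, 0.3) (illustrative)
phi <- c(0.5, 0.3)
# polyroot() takes coefficients in increasing order: 1 - phi1*z - phi2*z^2
roots <- polyroot(c(1, -phi))
# Stationary iff all roots of phi(z) = 0 lie outside the unit circle
stationary <- all(Mod(roots) > 1)
print(Mod(roots))
print(stationary)  # TRUE for this choice of coefficients
```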

151.2 AR(2) Model

The AR(2) process is defined as

\[ \begin{align*}(1-\phi_1 B - \phi_2 B^2) W_t &= e_t \\W_t - \phi_1 W_{t-1} - \phi_2 W_{t-2} &= e_t \\W_t - e_t &= \phi_1 W_{t-1} + \phi_2 W_{t-2}\\F_t &= \phi_1 W_{t-1} + \phi_2 W_{t-2}\end{align*} \]

where \(W_t\) is a stationary time series, \(e_t\) is a white noise error term, and \(F_t\) is the forecasting function.

The process can be written in the form

\[ \begin{align*}(1-\phi_1 B - \phi_2 B^2) W_t &= e_t \\W_t &= (1 + \psi_1 B + \psi_2 B^2 + \psi_3 B^3 + \cdots) e_t \\W_t &= \Psi(B) e_t\end{align*} \]

and therefore

\[ \begin{align*}(1 - \phi_1 B - \phi_2 B^2)^{-1} = \Psi(B) &= 1 + \psi_1 B + \psi_2 B^2 + \psi_3 B^3 + \cdots \\(1 - \phi_1 B - \phi_2 B^2)(1 + \psi_1 B + \psi_2 B^2 + \psi_3 B^3 + \cdots) &= 1\end{align*} \]

For this to be valid, it follows that

\[ \psi_1 - \phi_1 = 0 \Leftrightarrow \psi_1 = \phi_1 \]

and that

\[ \psi_2 - \phi_1 \psi_1 - \phi_2 = 0 \Leftrightarrow \psi_2 = \phi_1^2 + \phi_2 \]

and that

\[ \psi_3 - \phi_1 \psi_2 - \phi_2 \psi_1 = 0 \Leftrightarrow \psi_3 = \phi_1^3 + 2 \phi_1 \phi_2 \]

and finally that

\[ \forall i \ge 3: \psi_i = \phi_1 \psi_{i-1} + \phi_2 \psi_{i-2} \]

The model is stationary if the \(\psi_i\) weights converge, which is the case only when certain conditions on \(\phi_1\) and \(\phi_2\) are imposed. These conditions can be found using the roots of the AR(2) polynomial. The so-called characteristic equation is used to find these roots

\[ (1 - \phi_1 B - \phi_2 B^2) = (1 - \xi_1 B)(1 - \xi_2 B) = 0 \]

The solutions of \(\xi_1\) and \(\xi_2\) are

\[ \xi_1, \xi_2 = \frac{\phi_1 \pm \sqrt{\phi_1^2 + 4 \phi_2}}{2} \]

which can be either real or complex. Note that the roots are complex if \(\phi_1^2 + 4 \phi_2 < 0\). When both solutions are smaller than 1 in absolute value, the AR(2) model is stationary.

It can be shown that these conditions are satisfied if \(\phi_1\) and \(\phi_2\) lie inside the Stralkowski triangular region, which is bounded by

\[ \begin{cases} \phi_2 + \phi_1 < 1 \\ \phi_2 - \phi_1 < 1 \\ -1 < \phi_2 < 1 \\\end{cases} \]
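The triangle conditions can be checked directly and agree with the root criterion derived above; the coefficients \(\phi_1 = 0.5\), \(\phi_2 = 0.3\) below are illustrative.

```r
# Check AR(2) stationarity two ways (phi = (0.5, 0.3), illustrative values)
phi1 <- 0.5; phi2 <- 0.3
# (a) Stralkowski triangle conditions
in_triangle <- (phi2 + phi1 < 1) && (phi2 - phi1 < 1) && (phi2 > -1) && (phi2 < 1)
# (b) Roots xi of the factorisation (1 - xi1*B)(1 - xi2*B);
#     as.complex() handles the case phi1^2 + 4*phi2 < 0 (complex roots)
xi <- (phi1 + c(1, -1) * sqrt(as.complex(phi1^2 + 4 * phi2))) / 2
root_check <- all(Mod(xi) < 1)
print(c(triangle = in_triangle, roots = root_check))  # both TRUE
```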

The theoretical ACF and PACF are illustrated below.

Figure 151.2: Theoretical ACF/PACF of AR(2) process with real roots
Figure 151.3: Theoretical ACF/PACF of AR(2) process with complex roots

151.3 MA(1) Model

The MA(1) process is defined as

\[ \begin{align*}W_t &= (1 - \theta_1 B) e_t \\W_t &= e_t - \theta_1 e_{t-1}\end{align*} \]

where \(W_t\) is a stationary time series and \(e_t\) is a white noise error term. The current observation depends on the current and the previous error term.

Note

R’s arima() function uses the MA sign convention \((1 + \theta_1 B)\), which is the opposite of the convention used in this chapter. Therefore, ma = c(-0.8) in R corresponds to \(\theta_1 = 0.8\) here.

We now derive the theoretical ACF of the MA(1) process.

We compute the autocovariance at lag \(k\) by multiplying \(W_t\) by \(W_{t-k}\) in expectations form

\[ \begin{align*}\gamma_k &= \text{E}(W_t W_{t-k}) \\&= \text{E}\left[(e_t - \theta_1 e_{t-1})(e_{t-k} - \theta_1 e_{t-k-1})\right]\end{align*} \]

For \(k=0\)

\[ \gamma_0 = \text{E}(e_t^2) + \theta_1^2 \text{E}(e_{t-1}^2) = (1 + \theta_1^2) \sigma_{e_t}^2 \]

For \(k=1\)

\[ \gamma_1 = \text{E}\left[(e_t - \theta_1 e_{t-1})(e_{t-1} - \theta_1 e_{t-2})\right] = -\theta_1 \sigma_{e_t}^2 \]

For \(k > 1\), all expectations vanish because the error terms are uncorrelated: \(\gamma_k = 0\).

Therefore the ACF is

\[ \rho_k = \frac{\gamma_k}{\gamma_0} = \begin{cases} \dfrac{-\theta_1}{1 + \theta_1^2} & \text{for } k = 1 \\ 0 & \text{for } k > 1 \end{cases} \]

The ACF of an MA(1) process cuts off after lag 1, while the PACF shows an exponential decay (tailing off). This pattern is the mirror image of the AR(1) process where the ACF decays and the PACF cuts off.

The MA(1) model is invertible if the root of \((1 - \theta_1 B) = 0\) is larger than 1 in absolute value, which requires \(|\theta_1| < 1\). Invertibility ensures that the MA representation can be rewritten as an infinite AR process and guarantees uniqueness of the model.
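The closed-form \(\rho_1\) can be cross-checked against ARMAacf(); note the sign flip due to R's MA convention described above (\(\theta_1 = 0.8\) is an illustrative value).

```r
# MA(1) ACF check: rho_1 = -theta_1 / (1 + theta_1^2), zero beyond lag 1
theta1 <- 0.8                              # illustrative value
rho <- ARMAacf(ma = -theta1, lag.max = 4)  # negate for R's sign convention
print(rho["1"])                            # approx -0.8 / 1.64 = -0.4878
print(-theta1 / (1 + theta1^2))
print(rho[c("2", "3", "4")])               # all exactly 0: the ACF cuts off
```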

par(mfrow = c(2, 2), mar = c(4, 4, 3, 1))
# MA(1) with positive theta
acf_vals <- ARMAacf(ma = c(-0.8), lag.max = 12)
pacf_vals <- ARMAacf(ma = c(-0.8), lag.max = 12, pacf = TRUE)
plot(1:12, acf_vals[-1], type = "h", lwd = 3, col = "steelblue",
     xlab = "Lag", ylab = "ACF", main = expression(paste("ACF of MA(1): ", theta[1], " = 0.8")),
     ylim = c(-1, 1))
abline(h = 0)
plot(1:12, pacf_vals, type = "h", lwd = 3, col = "steelblue",
     xlab = "Lag", ylab = "PACF", main = expression(paste("PACF of MA(1): ", theta[1], " = 0.8")),
     ylim = c(-1, 1))
abline(h = 0)
# MA(1) with negative theta
acf_vals2 <- ARMAacf(ma = c(0.8), lag.max = 12)
pacf_vals2 <- ARMAacf(ma = c(0.8), lag.max = 12, pacf = TRUE)
plot(1:12, acf_vals2[-1], type = "h", lwd = 3, col = "coral",
     xlab = "Lag", ylab = "ACF", main = expression(paste("ACF of MA(1): ", theta[1], " = -0.8")),
     ylim = c(-1, 1))
abline(h = 0)
plot(1:12, pacf_vals2, type = "h", lwd = 3, col = "coral",
     xlab = "Lag", ylab = "PACF", main = expression(paste("PACF of MA(1): ", theta[1], " = -0.8")),
     ylim = c(-1, 1))
abline(h = 0)
par(mfrow = c(1, 1))
Figure 151.4: Theoretical ACF/PACF of MA(1) process

151.4 MA(2) Model

The MA(2) process is defined as

\[ \begin{align*}W_t &= (1 - \theta_1 B - \theta_2 B^2) e_t \\W_t &= e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2}\end{align*} \]

Following the same derivation as for the MA(1) model, the autocovariances are

\[ \begin{align*}\gamma_0 &= (1 + \theta_1^2 + \theta_2^2) \sigma_{e_t}^2 \\\gamma_1 &= (-\theta_1 + \theta_1 \theta_2) \sigma_{e_t}^2 \\\gamma_2 &= -\theta_2 \sigma_{e_t}^2 \\\gamma_k &= 0 \quad \text{for } k > 2\end{align*} \]

The ACF of an MA(2) process therefore cuts off after lag 2, while the PACF shows a decay pattern. The invertibility conditions require that the roots of the characteristic equation \((1 - \theta_1 B - \theta_2 B^2) = 0\) lie outside the unit circle, analogously to the stationarity conditions for the AR(2) model.
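The lag-2 cutoff and the closed-form autocorrelations can be confirmed numerically; \(\theta = (0.5, 0.3)\) below is an illustrative choice, again negated for R's sign convention.

```r
# MA(2) ACF: nonzero at lags 1-2, exactly zero beyond lag 2
theta <- c(0.5, 0.3)                       # illustrative values
rho <- ARMAacf(ma = -theta, lag.max = 12)  # negate for R's sign convention
s2 <- 1 + sum(theta^2)                     # gamma_0 / sigma^2
print(unname(rho["1"]))                    # (-theta1 + theta1*theta2) / s2
print((-theta[1] + theta[1] * theta[2]) / s2)
print(unname(rho["2"]))                    # -theta2 / s2
print(all(abs(rho[as.character(3:12)]) < 1e-12))  # TRUE: cutoff after lag 2
```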

par(mfrow = c(2, 2), mar = c(4, 4, 3, 1))
acf_vals <- ARMAacf(ma = c(-0.5, -0.3), lag.max = 12)
pacf_vals <- ARMAacf(ma = c(-0.5, -0.3), lag.max = 12, pacf = TRUE)
plot(1:12, acf_vals[-1], type = "h", lwd = 3, col = "steelblue",
     xlab = "Lag", ylab = "ACF",
     main = expression(paste("ACF of MA(2): ", theta[1], " = 0.5, ", theta[2], " = 0.3")),
     ylim = c(-1, 1))
abline(h = 0)
plot(1:12, pacf_vals, type = "h", lwd = 3, col = "steelblue",
     xlab = "Lag", ylab = "PACF",
     main = expression(paste("PACF of MA(2): ", theta[1], " = 0.5, ", theta[2], " = 0.3")),
     ylim = c(-1, 1))
abline(h = 0)
acf_vals2 <- ARMAacf(ma = c(-1.2, 0.5), lag.max = 12)
pacf_vals2 <- ARMAacf(ma = c(-1.2, 0.5), lag.max = 12, pacf = TRUE)
plot(1:12, acf_vals2[-1], type = "h", lwd = 3, col = "coral",
     xlab = "Lag", ylab = "ACF",
     main = expression(paste("ACF of MA(2): ", theta[1], " = 1.2, ", theta[2], " = -0.5")),
     ylim = c(-1, 1))
abline(h = 0)
plot(1:12, pacf_vals2, type = "h", lwd = 3, col = "coral",
     xlab = "Lag", ylab = "PACF",
     main = expression(paste("PACF of MA(2): ", theta[1], " = 1.2, ", theta[2], " = -0.5")),
     ylim = c(-1, 1))
abline(h = 0)
par(mfrow = c(1, 1))
Figure 151.5: Theoretical ACF/PACF of MA(2) process

151.5 ARMA(1,1) Model

The ARMA(1,1) process combines both an AR(1) and MA(1) component

\[ \begin{align*}(1 - \phi_1 B) W_t &= (1 - \theta_1 B) e_t \\W_t - \phi_1 W_{t-1} &= e_t - \theta_1 e_{t-1}\end{align*} \]

The theoretical ACF of the ARMA(1,1) model has a starting value

\[ \rho_1 = \frac{(1 - \phi_1 \theta_1)(\phi_1 - \theta_1)}{1 + \theta_1^2 - 2 \phi_1 \theta_1} \]

followed by exponential decay: \(\rho_k = \phi_1 \rho_{k-1}\) for \(k > 1\). The PACF also shows a decay pattern. Since both the ACF and PACF decay (neither cuts off cleanly), it is difficult in practice to identify an ARMA model from the ACF/PACF alone. This is precisely why the backward selection approach (Chapter 152) is so useful: rather than trying to identify exact values of \(p\) and \(q\) from theoretical patterns, we start with maximum values and let the estimation procedure eliminate non-significant parameters.
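Both the starting value and the subsequent decay can be checked numerically (\(\phi_1 = 0.7\), \(\theta_1 = 0.4\) are illustrative; the MA coefficient is negated for R's sign convention).

```r
# ARMA(1,1): rho_1 from the closed form, then rho_k = phi_1 * rho_{k-1}
phi1 <- 0.7; theta1 <- 0.4                 # illustrative values
rho <- ARMAacf(ar = phi1, ma = -theta1, lag.max = 5)
rho1 <- (1 - phi1 * theta1) * (phi1 - theta1) / (1 + theta1^2 - 2 * phi1 * theta1)
print(unname(rho["1"]))   # equals rho1 = 0.36
print(rho1)
print(unname(rho["2"]))   # equals phi1 * rho1 = 0.252
```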

par(mfrow = c(2, 2), mar = c(4, 4, 3, 1))
acf_vals <- ARMAacf(ar = c(0.7), ma = c(-0.4), lag.max = 12)
pacf_vals <- ARMAacf(ar = c(0.7), ma = c(-0.4), lag.max = 12, pacf = TRUE)
plot(1:12, acf_vals[-1], type = "h", lwd = 3, col = "steelblue",
     xlab = "Lag", ylab = "ACF",
     main = expression(paste("ACF of ARMA(1,1): ", phi[1], " = 0.7, ", theta[1], " = 0.4")),
     ylim = c(-1, 1))
abline(h = 0)
plot(1:12, pacf_vals, type = "h", lwd = 3, col = "steelblue",
     xlab = "Lag", ylab = "PACF",
     main = expression(paste("PACF of ARMA(1,1): ", phi[1], " = 0.7, ", theta[1], " = 0.4")),
     ylim = c(-1, 1))
abline(h = 0)
acf_vals2 <- ARMAacf(ar = c(0.5), ma = c(0.6), lag.max = 12)
pacf_vals2 <- ARMAacf(ar = c(0.5), ma = c(0.6), lag.max = 12, pacf = TRUE)
plot(1:12, acf_vals2[-1], type = "h", lwd = 3, col = "coral",
     xlab = "Lag", ylab = "ACF",
     main = expression(paste("ACF of ARMA(1,1): ", phi[1], " = 0.5, ", theta[1], " = -0.6")),
     ylim = c(-1, 1))
abline(h = 0)
plot(1:12, pacf_vals2, type = "h", lwd = 3, col = "coral",
     xlab = "Lag", ylab = "PACF",
     main = expression(paste("PACF of ARMA(1,1): ", phi[1], " = 0.5, ", theta[1], " = -0.6")),
     ylim = c(-1, 1))
abline(h = 0)
par(mfrow = c(1, 1))
Figure 151.6: Theoretical ACF/PACF of ARMA(1,1) process

151.6 ARMA Identification Summary

The table below summarises the theoretical ACF and PACF patterns for the most common models. These patterns form the basis of the Box-Jenkins identification step.

Table 151.1: ACF/PACF identification patterns

Model     | ACF pattern                            | PACF pattern
----------|----------------------------------------|---------------------------------------
AR(1)     | Exponential decay                      | Cuts off after lag 1
AR(2)     | Exponential or sinusoidal decay        | Cuts off after lag 2
AR(p)     | Decays (exponential and/or sinusoidal) | Cuts off after lag p
MA(1)     | Cuts off after lag 1                   | Exponential decay
MA(2)     | Cuts off after lag 2                   | Exponential or sinusoidal decay
MA(q)     | Cuts off after lag q                   | Decays (exponential and/or sinusoidal)
ARMA(p,q) | Tails off (no clean cutoff)            | Tails off (no clean cutoff)

In practice, the sample ACF and PACF are noisy estimates of the theoretical patterns, making it hard to distinguish between “cutting off” and “decaying” especially for mixed ARMA models. This is why the ARIMA Backward Selection approach described in the next chapters is recommended as the primary identification strategy.
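To see how noisy the sample estimates are relative to Table 151.1, one can simulate a short AR(1) series and compare its sample ACF with the theoretical \(\phi_1^k\) values (\(\phi_1 = 0.7\), \(n = 100\), and the seed are illustrative choices).

```r
# Sample vs. theoretical ACF for a simulated AR(1) series
set.seed(42)                               # illustrative seed
phi1 <- 0.7
w <- arima.sim(model = list(ar = phi1), n = 100)
sample_acf <- acf(w, lag.max = 5, plot = FALSE)$acf[-1]  # drop lag 0
theoretical <- phi1^(1:5)
print(round(cbind(sample = sample_acf, theoretical), 3))
# The sample values scatter around the theoretical decay; with only
# n = 100 observations the discrepancies can be substantial.
```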

151.7 Practical Identification Workflow

In real datasets, a practical workflow is usually more reliable than pattern matching alone:

  1. Transform and difference first (\(\lambda\), \(d\), \(D\)) so the working series is approximately stationary.
  2. Inspect ACF/PACF of the stationary series and propose small candidate sets for \((p,q,P,Q)\).
  3. Fit a deliberately general candidate (e.g., modest maxima for \(p,q,P,Q\)) and simplify.
  4. Compare candidates using information criteria (AIC/BIC; Akaike (1974); Schwarz (1978)) and residual diagnostics (ACF/PACF, Ljung-Box (Ljung and Box 1978), normality checks).
  5. Keep the most parsimonious model that passes diagnostics and still forecasts well out of sample.

This chapter focuses on Step 2 (theoretical identification patterns). The implementation of Steps 3–5 is provided in Chapter 152.
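Steps 3 and 4 can be sketched in a few lines of base R using arima() and Box.test(); the simulated ARMA(1,1) coefficients, sample size, seed, and candidate orders below are all illustrative.

```r
# Fit a few candidate models to a simulated series and compare AIC
set.seed(123)                                   # illustrative seed
y <- arima.sim(model = list(ar = 0.6, ma = 0.3), n = 200)
# Step 3: fit a small set of candidate orders (illustrative choices)
candidates <- list(c(1, 0, 0), c(0, 0, 1), c(1, 0, 1))
fits <- lapply(candidates, function(ord) arima(y, order = ord))
# Step 4: compare by information criterion ...
aics <- sapply(fits, AIC)
best_i <- which.min(aics)
print(aics)
# ... and run residual diagnostics (Ljung-Box; fitdf = p + q of the model)
ord <- candidates[[best_i]]
print(Box.test(residuals(fits[[best_i]]), lag = 10,
               type = "Ljung-Box", fitdf = ord[1] + ord[3]))
```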

151.8 Identifying ARMA Parameters in Practice

The complete ARIMA(p,d,q)(P,D,Q)-lambda model is defined by the following equation:

\[ (1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p) (1 - \Phi_1 B^s - \Phi_2 B^{2s} - \cdots - \Phi_P B^{Ps}) \nabla^d \nabla_s^D \lambda(Y_t) \]

\[ = (1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q) (1 - \Theta_1 B^s - \Theta_2 B^{2s} - \cdots - \Theta_Q B^{Qs}) e_t \]

Before we can estimate the AR and MA parameters we need to identify appropriate values for \(\lambda\), d, D, p, q, P, and Q. We already know how to determine \(\lambda\), d, and D. The remaining parameters may be identified through careful examination of the ACF and partial ACF (PACF) of the stationary time series, using the theoretical patterns derived above (for instance, those of the AR(1) process).

It requires a lot of experience to identify the values of p, q, P, and Q from the theoretical patterns of AR and MA models. Therefore we introduce an easier method, ARIMA backward selection, which simplifies a general model through a trial-and-error strategy (see the next chapter).

Akaike, Hirotugu. 1974. “A New Look at the Statistical Model Identification.” IEEE Transactions on Automatic Control 19 (6): 716–23. https://doi.org/10.1109/TAC.1974.1100705.
Ljung, Greta M., and George E. P. Box. 1978. “On a Measure of Lack of Fit in Time Series Models.” Biometrika 65 (2): 297–303. https://doi.org/10.1093/biomet/65.2.297.
Schwarz, Gideon. 1978. “Estimating the Dimension of a Model.” The Annals of Statistics 6 (2): 461–64. https://doi.org/10.1214/aos/1176344136.
Wold, Herman. 1938. A Study in the Analysis of Stationary Time Series. Uppsala: Almqvist & Wiksell.

© 2026 Patrick Wessa. Provided as-is, without warranty.
