92  (Partial) Autocorrelation Function

92.1 Definition

The Autocorrelation Function (ACF) of a time series \(Y_t\) relates the serial correlations \(\rho_k = \rho(Y_t, Y_{t-k})\) for \(k = 1, 2, \ldots, K\) (on the y-axis) to the time lag \(k\) (on the x-axis). It is usually presented in graphical form because this allows one to quickly detect patterns that are typical of the dynamical properties of the underlying time series. Note that the time lag \(k\) is sometimes called the “order” of autocorrelation.

The sample autocorrelations are computed by the formula

\[ \hat{\rho}_k = \frac{\sum_{t=k+1}^{T} (Y_t - \bar{Y})(Y_{t-k} - \bar{Y})}{\sum_{t=1}^{T} \left(Y_t - \bar{Y}\right)^2} \]

with

\[ \bar{Y} = \frac{1}{T} \sum_{t=1}^{T} Y_t \]
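
As a sanity check, this estimator can be coded directly and compared with R’s built-in acf function, which uses the same divisor; a minimal sketch on a simulated random walk:

set.seed(1)
y <- 100 + cumsum(rnorm(150))   # simulated random walk, T = 150
n <- length(y); ybar <- mean(y)
rho_hat <- sapply(1:20, function(k)
  sum((y[(k + 1):n] - ybar) * (y[1:(n - k)] - ybar)) / sum((y - ybar)^2))
max(abs(rho_hat - acf(y, 20, plot = FALSE)$acf[-1]))   # ~ 0: the estimates coincide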

The 95% confidence intervals (for the individual autocorrelation coefficients) are displayed as two horizontal lines, based on the approximate standard deviation \(\sigma_{\rho_k} \simeq \tfrac{1}{\sqrt{T}}\) (Bartlett 1946). These bands assume a white-noise null hypothesis (approximately i.i.d. observations/residuals); if that assumption is violated, they are only approximate and may be anti-conservative.
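
For example, with \(T = 150\) observations (as in the script below) the bands sit at approximately \(\pm 1.96/\sqrt{150} \approx \pm 0.16\):

qnorm(0.975) / sqrt(150)   # half-width of the 95% white-noise band: ~0.16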

The Partial Autocorrelation Function (PACF) at lag \(k\) is the correlation between \(Y_t\) and \(Y_{t-k}\) after removing the linear effects of the intermediate lags \(Y_{t-1},\ldots,Y_{t-k+1}\). In practice, PACF isolates the direct lag-\(k\) relationship that is not explained by shorter lags.
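
This regression view can be made concrete with a small sketch on simulated AR(2) data: the PACF at lag \(k\) is approximately the coefficient of \(Y_{t-k}\) in an OLS regression of \(Y_t\) on its first \(k\) lags. (The standard pacf estimate is derived from the sample autocorrelations rather than from OLS, so the two agree closely but not exactly.)

set.seed(1)
y <- arima.sim(model = list(ar = c(0.6, 0.3)), n = 300)  # AR(2) series
k <- 3
X <- embed(as.numeric(y), k + 1)        # columns: Y_t, Y_{t-1}, ..., Y_{t-k}
fit <- lm(X[, 1] ~ X[, -1])             # regress Y_t on its first k lags
c(ols  = unname(coef(fit)[k + 1]),      # coefficient of Y_{t-k}
  pacf = pacf(y, lag.max = k, plot = FALSE)$acf[k])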

92.1.1 Horizontal axis

The horizontal axis displays the time lag \(k\).

92.1.2 Vertical axis

The vertical axis displays the estimated autocorrelation coefficients \(\hat{\rho}_k\) (or, in the PACF plot, the partial autocorrelation coefficients).

92.2 R Module

92.2.1 Public website

The (Partial) Autocorrelation Function is available on the public website:

  • https://compute.wessa.net/rwasp_autocorrelation.wasp

92.2.2 RFC

The (Partial) Autocorrelation Function is also available in RFC (when using the default profile) under the “Time Series / (P)ACF” menu item.

To compute the (Partial) Autocorrelation Function on your local machine, the following script can be used in the R console:

par(mar = c(4, 4, 2, 1))
x <- 100 + cumsum(rnorm(150))   # simulated random walk used as example data
summary(x)
par1 <- 20             # number of time lags
par2 <- 1              # Box-Cox transformation parameter (lambda; 0 = natural log)
par3 <- 0              # degree of non-seasonal differencing (d)
par4 <- 0              # degree of seasonal differencing (D)
par5 <- 12             # seasonality (period)
par6 <- 'White Noise'  # type of confidence interval ('White Noise' or 'MA')
par7 <- 0.95           # confidence level
par8 <- ''             # logarithm base (empty = use the Box-Cox transform instead)
if (par1 == 'Default') {            # the website form may pass the string 'Default'
  par1 <- 10 * log10(length(x))
} else {
  par1 <- as.numeric(par1)
}
par6 <- if (par6 == 'White Noise') 'white' else 'ma'
if (par8 != '') par8 <- as.numeric(par8)
ox <- x                             # keep the original series for plotting
if (par8 == '') {
  if (par2 == 0) {
    x <- log(x)                     # Box-Cox with lambda = 0 is the natural log
  } else {
    x <- (x ^ par2 - 1) / par2      # Box-Cox transformation
  }
} else {
  x <- log(x, base = par8)
}
if (par3 > 0) x <- diff(x, lag = 1, differences = par3)     # non-seasonal differencing
if (par4 > 0) x <- diff(x, lag = par5, differences = par4)  # seasonal differencing
op <- par(mfrow = c(2, 1))
plot(ox, type = 'l', main = 'Original Time Series', xlab = 'time', ylab = 'value')
if (par8 == '') {
  mytitle <- paste('Working Time Series (lambda=', par2, ', d=', par3, ', D=', par4, ')', sep = '')
  mysub <- paste('(lambda=', par2, ', d=', par3, ', D=', par4, ', CI=', par7, ', CI type=', par6, ')', sep = '')
} else {
  mytitle <- paste('Working Time Series (base=', par8, ', d=', par3, ', D=', par4, ')', sep = '')
  mysub <- paste('(base=', par8, ', d=', par3, ', D=', par4, ', CI=', par7, ', CI type=', par6, ')', sep = '')
}
plot(x, type = 'l', main = mytitle, xlab = 'time', ylab = 'value')
par(op)
racf <- acf(x, par1, main = 'Autocorrelation', xlab = 'time lag', ylab = 'ACF',
            ci.type = par6, ci = par7, sub = mysub)
rpacf <- pacf(x, par1, main = 'Partial Autocorrelation', xlab = 'lags', ylab = 'PACF',
              sub = mysub)
df <- data.frame(k = 0:par1, ACF = c(racf$acf), PACF = c(NA, rpacf$acf))
print(df)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  94.62   97.90   99.11   99.65  101.75  105.40 
    k        ACF         PACF
1   0 1.00000000           NA
2   1 0.92244505  0.922445055
3   2 0.85676937  0.039333913
4   3 0.78752943 -0.053673907
5   4 0.71391579 -0.071177562
6   5 0.65667202  0.063758993
7   6 0.58662679 -0.105598022
8   7 0.52356094 -0.010324303
9   8 0.46587407 -0.001963130
10  9 0.40683487 -0.031680865
11 10 0.35960417  0.020227141
12 11 0.32727544  0.085935071
13 12 0.29117677 -0.042410444
14 13 0.25451010 -0.050473886
15 14 0.21285908 -0.057095254
16 15 0.17999680  0.033860294
17 16 0.15224966 -0.002072882
18 17 0.11425926 -0.085751888
19 18 0.07636231 -0.047282307
20 19 0.04828137  0.050807506
21 20 0.01878635 -0.020996578

The R code relies on the standard acf and pacf functions from the stats package to compute the analysis.

92.3 Purpose

In practice the ACF can be used to:

  • describe/summarize autocorrelation at various orders
  • identify non-seasonal and seasonal trends
  • identify various types of typical patterns that correspond to well-known forecasting models
  • check the independence assumption of the residuals of regression and forecasting models (see the sketch after this list)
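
For the last use case, here is a minimal sketch on hypothetical simulated data: a trend regression is fitted to a series whose errors are serially correlated, and the ACF of the residuals reveals the violation:

set.seed(2)
tt <- 1:120
y <- 10 + 0.3 * tt + as.numeric(arima.sim(model = list(ar = 0.6), n = 120))
fit <- lm(y ~ tt)                # trend regression ignoring serial correlation
acf(residuals(fit), main = 'ACF of regression residuals')
# Bars far outside the white-noise bands at low lags signal that the
# independence assumption is violated.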

92.4 Pros & Cons

92.4.1 Pros

The (Partial) Autocorrelation Function has the following advantages:

  • It is relatively easy to interpret and provides a lot of information about the serial correlation of a time series.
  • It can be computed with many software packages (though one should be careful with spreadsheets, which often fail to handle the empty cells that occur in lagged time series properly).
  • The combination of Partial and (ordinary) Autocorrelation Functions allows one to identify how the time series model should be specified (Box and Jenkins 1970).
  • It allows one to check an important assumption of residuals (prediction errors).

92.4.2 Cons

The (Partial) Autocorrelation Function has the following disadvantages:

  • It is sensitive to outliers.
  • The confidence intervals are not always computed correctly in statistical software. There are (at least) two types of confidence intervals: one for testing whether residuals contain autocorrelation[1] and another which should be used to identify the autocorrelation structure of the time series model[2]. The sketch below contrasts the two.
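
A minimal sketch of the difference, using a simulated MA(1) series: the “White Noise” bands are constant at \(\pm 1.96/\sqrt{T}\), whereas the “MA” bands widen with the lag to account for the autocorrelation already found at shorter lags:

set.seed(3)
e <- arima.sim(model = list(ma = 0.8), n = 200)
op <- par(mfrow = c(1, 2))
acf(e, 20, ci.type = 'white', main = 'CI type: White Noise')  # residual testing
acf(e, 20, ci.type = 'ma', main = 'CI type: MA')              # model identification
par(op)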

92.5 Example

Let us consider the Airline Data and apply the ACF analysis. In the first stage we compute the ACF without any differencing, i.e. \(d = D = 0\) (these parameters were introduced in Section 91.1). The analysis shows the ACF of the original time series, which exhibits a pattern that is typical for time series with a non-seasonal trend and strong seasonality:

  • The coefficients of the ACF are positive and slowly decreasing.
  • The seasonal coefficients, i.e. \(\rho_{12}, \rho_{24}, \rho_{36}, \ldots\), are positive and slowly decreasing too.
[Interactive Shiny app: ACF of the Airline Data]

In the next step we set \(d = 1\) in order to apply non-seasonal differencing (the other parameter is left unchanged, \(D = 0\)). Recomputing the ACF with \(d = 1\) reveals an entirely different pattern, as shown in the R module: the long-run trend pattern has disappeared. Only the seasonal autocorrelation coefficients are still clearly positive and slowly decreasing, indicating the presence of a strong seasonal pattern.

Finally, we set \(d = D = 1\) to apply ordinary and seasonal differencing before recomputing the ACF. The R module shows that the combined differencing eliminates both the trend and the seasonal pattern: the ACF no longer exhibits the typical seasonality pattern.
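
These three stages can be reproduced locally; a minimal sketch, assuming the Airline Data is the classic Box-Jenkins airline series that ships with R as AirPassengers:

x <- log(AirPassengers)                 # Box-Cox with lambda = 0
op <- par(mfrow = c(3, 1))
acf(x, 48, main = 'd = 0, D = 0')       # slow decay plus seasonal peaks
acf(diff(x), 48, main = 'd = 1, D = 0') # trend removed, seasonality remains
acf(diff(diff(x), lag = 12), 48, main = 'd = 1, D = 1')  # both removed
par(op)
# Note: for a monthly ts object the lag axis is labelled in years (lag 1.0 = 12 months).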

In later chapters, this information will be used to build a practical forecasting model for the Airline time series.

92.6 Task

Based on the ACF, examine the monthly time series of Divorces and determine whether or not there is a long-run trend and seasonality.

Bartlett, M. S. 1946. “On the Theoretical Specification and Sampling Properties of Autocorrelated Time-Series.” Journal of the Royal Statistical Society. Series B (Methodological) 8 (1): 27–41.
Box, George E. P., and Gwilym M. Jenkins. 1970. Time Series Analysis: Forecasting and Control. San Francisco: Holden-Day.

[1] In this case you should set the field “CI type” equal to “White Noise” in the R module.

[2] This is achieved by setting the field “CI type” to the value “MA”.
