79 Box-Cox Normality Plot

79.1 Definition

The Box-Cox Normality Plot is closely related to the PPCC Plot. It uses the same iterative procedure but with a different objective: instead of identifying which distribution best fits the data, the Box-Cox Normality Plot finds the optimal power transformation parameter \(\lambda\) (based on the Box-Cox transformation (Box and Cox 1964)) that makes the data as close to normally distributed as possible.

The procedure works as follows:

1. the transformation parameter \(\lambda\) is set to an initial value
2. all observations are transformed using the Box-Cox transformation for the current value of \(\lambda\)
3. the Normal Probability Plot (i.e. QQ Plot against the Normal Distribution) is computed for the transformed data
4. the Pearson Correlation Coefficient (of the Normal Probability Plot) is computed and stored together with the current value of \(\lambda\)
5. \(\lambda\) is incremented and steps 2 through 4 are repeated until a (pre-specified) final value is reached for the transformation parameter
6. a plot is generated which shows all Pearson Correlation Coefficients against their respective \(\lambda\) values

The \(\lambda\) value that produces the highest correlation indicates the power transformation that brings the data closest to normality.

79.1.1 Full (Sign-Preserving) Power Transformation

The full (sign-preserving), Box-Cox-like power transformation is defined as

\[ T(Y) = \begin{cases}\text{sign}(Y)\frac{|Y|^\lambda-1}{\lambda} & \text{for } \lambda \neq 0 \\\text{sign}(Y)\ln|Y| & \text{for } \lambda = 0\end{cases} \]

which is the default setting for the Box-Cox Normality Plot in this module. It is not the standard Box-Cox transformation: standard Box-Cox requires strictly positive data (or an additive shift), whereas the use of \(\text{sign}(Y)\) and \(|Y|\) allows this variant to handle negative values. Observations with \(Y=0\) remain a problem when \(\lambda \leq 0\), because neither \(\ln 0\) nor \(0^\lambda\) (for \(\lambda < 0\)) is finite.
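
As a minimal sketch, the transformation above can be written as a small R function. The helper name full_power and the example values are illustrative assumptions and are not part of the module.

```r
# Sign-preserving power transformation (hypothetical helper name)
full_power <- function(y, lambda) {
  if (lambda == 0) {
    sign(y) * log(abs(y))
  } else {
    sign(y) * (abs(y)^lambda - 1) / lambda
  }
}

y <- c(-4, -0.5, 2, 10)   # illustrative values, including negatives
full_power(y, 0.5)        # finite results, also for the negative observations
full_power(y, 0)          # logarithmic case

# The lambda = 0 case is the limit of the general formula as lambda -> 0
gap <- max(abs(full_power(y, 1e-6) - full_power(y, 0)))
gap                       # very small

# A zero observation is not handled when lambda = 0
full_power(0, 0)          # NaN, because 0 * -Inf is undefined
```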

79.1.2 Simple Box-Cox Transformation

The simple Box-Cox transformation is defined as

\[ T(Y) = \begin{cases} Y^\lambda & \text{for } \lambda \neq 0 \\ \ln Y & \text{for } \lambda = 0\end{cases} \]

The simple transformation is more commonly encountered in the literature but requires strictly positive data.

Common values of \(\lambda\) correspond to well-known transformations – Table 79.1 shows the most frequently used values.

Table 79.1: Common Box-Cox lambda values and their corresponding transformations

\(\lambda\)	Transformation
\(\lambda = -1\)	reciprocal: \(T(Y) = 1/Y\)
\(\lambda = -0.5\)	reciprocal square root: \(T(Y) = 1/\sqrt{Y}\)
\(\lambda = 0\)	natural logarithm: \(T(Y) = \ln Y\)
\(\lambda = 0.5\)	square root: \(T(Y) = \sqrt{Y}\)
\(\lambda = 1\)	no transformation: \(T(Y) = Y\) (identity)
\(\lambda = 2\)	square: \(T(Y) = Y^2\)
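
The correspondence in Table 79.1 can be checked numerically. The sketch below assumes a hypothetical helper simple_bc and a small vector of strictly positive example values; neither is part of the module.

```r
# Simple Box-Cox transformation (hypothetical helper name)
simple_bc <- function(y, lambda) {
  stopifnot(all(y > 0))               # strictly positive data required
  if (lambda == 0) log(y) else y^lambda
}

y <- c(0.25, 1, 4, 9)                 # illustrative positive values

# Each comparison returns TRUE when the two sides agree numerically
chk_recip  <- isTRUE(all.equal(simple_bc(y, -1),   1 / y))
chk_rsqrt  <- isTRUE(all.equal(simple_bc(y, -0.5), 1 / sqrt(y)))
chk_log    <- isTRUE(all.equal(simple_bc(y, 0),    log(y)))
chk_sqrt   <- isTRUE(all.equal(simple_bc(y, 0.5),  sqrt(y)))
chk_ident  <- isTRUE(all.equal(simple_bc(y, 1),    y))
chk_square <- isTRUE(all.equal(simple_bc(y, 2),    y * y))
c(chk_recip, chk_rsqrt, chk_log, chk_sqrt, chk_ident, chk_square)
```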

79.2 Horizontal axis

The horizontal axis displays the values of \(\lambda\), which are varied between two pre-specified (minimum and maximum) values.

79.3 Vertical axis

The vertical axis displays the Pearson Correlation Coefficients between the Normal Quantiles and the sorted transformed data.
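
A single point on the vertical axis can be computed as in the sketch below. The data are simulated, and the seed, sample size, and helper name ppcc_normal are assumptions made purely for illustration.

```r
set.seed(42)                      # assumption: arbitrary seed for reproducibility

# Correlation between Normal Quantiles and the sorted data (hypothetical helper)
ppcc_normal <- function(v) cor(qnorm(ppoints(length(v))), sort(v))

x <- rlnorm(500)                  # right-skewed, strictly positive data
r_raw <- ppcc_normal(x)           # simple transformation with lambda = 1
r_log <- ppcc_normal(log(x))      # simple transformation with lambda = 0
c(raw = r_raw, log = r_log)
```

For log-normal data the coefficient at \(\lambda = 0\) should be clearly higher than at \(\lambda = 1\), which is what produces the peak in the plot.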

79.4 R Module

79.4.1 Public website

The Box-Cox Normality Plot can be found on the public website:

https://compute.wessa.net/rwasp_boxcoxnorm.wasp

79.4.2 RFC

The Box-Cox Normality Plot is available in RFC under the “Distributions / Box-Cox Normality Plot” menu item.

To compute the Box-Cox Normality Plot on your local machine, the following script can be used in the R console:

library(car)
x <- rlnorm(500) #right-skewed log-normal data, should result in lambda near 0
par1 <- 'Full Box-Cox transform' #Type of transformation ('Full Box-Cox transform' or 'Simple Box-Cox transform')
par2 <- 200 #abs(minlambda * 100)
par3 <- 200 #maxlambda * 100
numlam <- par2 + par3 + 1
n <- length(x)
c <- array(NA, dim=c(numlam))
l <- array(NA, dim=c(numlam))
mx <- -1
mxli <- -999
for (i in 1:numlam) {
  l[i] <- (i - par2 - 1) / 100
  if (l[i] != 0) {
    if (par1 == 'Full Box-Cox transform') x1 <- sign(x) * (abs(x)^l[i] - 1) / l[i]
    if (par1 == 'Simple Box-Cox transform') x1 <- x^l[i]
  } else {
    if (par1 == 'Full Box-Cox transform') x1 <- sign(x) * log(abs(x))
    if (par1 == 'Simple Box-Cox transform') x1 <- log(x)
  }
  c[i] <- cor(qnorm(ppoints(x), mean=0, sd=1), sort(x1))
  if (mx < c[i]) {
    mx <- c[i]
    mxli <- l[i]
    x1.best <- x1
  }
}
if (mxli != 0) {
  if (par1 == 'Full Box-Cox transform') x1 <- sign(x) * (abs(x)^mxli - 1) / mxli
  if (par1 == 'Simple Box-Cox transform') x1 <- x^mxli
} else {
  if (par1 == 'Full Box-Cox transform') x1 <- sign(x) * log(abs(x))
  if (par1 == 'Simple Box-Cox transform') x1 <- log(x)
}
cat('Maximum correlation:', mx, '\n')
cat('Optimal lambda:', mxli, '\n')
mypT <- powerTransform(x)
summary(mypT)
op <- par(mfrow=c(2,2))
plot(l, c, main='Box-Cox Normality Plot', xlab='Lambda', ylab='Correlation')
mtext(paste('Optimal Lambda =', mxli))
grid()
qqPlot(x, main='QQ Plot - Original Data')
grid()
qqPlot(x1, main='QQ Plot - Transformed Data')
grid()
plot(x, x1, xlab='Original Data', ylab='Transformed Data')
grid()

par(op)

Maximum correlation: 0.9990619 
Optimal lambda: -0.01 
bcPower Transformation to Normality 
  Est Power Rounded Pwr Wald Lwr Bnd Wald Upr Bnd
x   -0.0146           0      -0.0892       0.0601

Likelihood ratio test that transformation parameter is equal to 0
 (log transformation)
                            LRT df    pval
LR test, lambda = (0) 0.1462913  1 0.70211

Likelihood ratio test that no transformation is needed
                          LRT df       pval
LR test, lambda = (1) 677.965  1 < 2.22e-16
[1]  75 157
[1]  75 413

To compute the Box-Cox Normality Plot, the R code iterates over values of \(\lambda\) between \(-2\) and \(2\) with a step size of \(0.01\). For each value of \(\lambda\), the data are transformed and the Pearson Correlation Coefficient is calculated between the Normal Quantiles (obtained via qnorm(ppoints(x))) and the sorted transformed data. The value of \(\lambda\) that produces the highest correlation is the optimal transformation parameter; for the simulated log-normal data in the output above it is \(-0.01\), close to the expected value of \(0\).

In addition to the graphical (correlation-based) approach, the powerTransform function from the car package computes a Maximum Likelihood Estimate (MLE) of \(\lambda\) for a parametric power-transformation model. In this example the data are positive, so the standard Box-Cox MLE is applicable; in general, standard Box-Cox requires strictly positive data (or a prior shift). The MLE approach provides a confidence interval and a formal hypothesis test for whether \(\lambda\) is significantly different from common values such as \(0\) (log transform) or \(1\) (no transform).
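
To make the likelihood-based approach more concrete, the following base-R sketch maximizes the profile log-likelihood of the standard Box-Cox model for strictly positive data and derives the two likelihood ratio statistics. It is not the implementation used by the car package; the function name bc_loglik, the seed, and the search interval are assumptions chosen for illustration.

```r
set.seed(123)                     # assumption: arbitrary seed for reproducibility
x <- rlnorm(500)                  # strictly positive, right-skewed data

# Profile log-likelihood of the standard Box-Cox model (up to a constant):
# -n/2 * log(sigma2_hat(lambda)) + (lambda - 1) * sum(log(y))
bc_loglik <- function(lambda, y) {
  z <- if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
  n <- length(y)
  -n / 2 * log(mean((z - mean(z))^2)) + (lambda - 1) * sum(log(y))
}

# Maximize over the same range as the plot (-2 to 2)
fit <- optimize(bc_loglik, interval = c(-2, 2), y = x, maximum = TRUE)
lambda_hat <- fit$maximum

# Likelihood ratio statistics for lambda = 0 (log) and lambda = 1 (no transform)
lrt0 <- 2 * (fit$objective - bc_loglik(0, x))
lrt1 <- 2 * (fit$objective - bc_loglik(1, x))
p0 <- pchisq(lrt0, df = 1, lower.tail = FALSE)
p1 <- pchisq(lrt1, df = 1, lower.tail = FALSE)

cat('lambda_hat:', lambda_hat, '\n')
cat('LRT lambda = 0:', lrt0, 'p-value:', p0, '\n')
cat('LRT lambda = 1:', lrt1, 'p-value:', p1, '\n')
```

Under these assumptions, lrt0 and lrt1 play the same role as the two likelihood ratio tests in the powerTransform output. The numbers will differ from that output because the data are re-simulated here.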

79.5 Purpose

The Box-Cox Normality Plot serves two main purposes:

It provides a visual and quantitative method to determine which power transformation brings the data closest to a Normal Distribution. This is particularly useful when normality is an assumption of subsequent analysis (e.g. hypothesis testing, linear regression).
It provides an estimate for \(\lambda\) which can be used to transform the data before further analysis. Note that the value of \(\lambda\) is typically rounded to a convenient value (see Table 79.1) if the rounded value falls within the confidence interval of the MLE.
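
The rounding rule in the second point can be sketched as follows, using the estimate and Wald bounds from the powerTransform output in Section 79.4.2. The candidate values follow Table 79.1; the variable names are illustrative assumptions.

```r
# Values copied from the powerTransform output shown in Section 79.4.2
est   <- -0.0146
lower <- -0.0892
upper <-  0.0601

# Convenient values of lambda (Table 79.1)
convenient <- c(-1, -0.5, 0, 0.5, 1, 2)

# Keep the convenient values inside the interval; fall back to the estimate
inside <- convenient[convenient >= lower & convenient <= upper]
lambda_use <- if (length(inside) > 0) {
  inside[which.min(abs(inside - est))]
} else {
  est
}
lambda_use
```

Here only \(\lambda = 0\) lies inside the interval, so the log transformation would be chosen, in agreement with the "Rounded Pwr" column of the output.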

Note: the Box-Cox Normality Plot is also used in time series analysis to induce stationarity of the variance – see Chapter 150 for a detailed discussion.

79.6 Pros & Cons

79.6.1 Pros

The Box-Cox Normality Plot has the following advantages:

it provides a clear visual indication of the optimal transformation parameter
it is easy to interpret: a sharp peak in the plot indicates a well-defined optimal \(\lambda\)
it combines well with the QQ Plot to show the improvement in normality before and after transformation
the MLE approach provides a formal hypothesis test and confidence interval for \(\lambda\)

79.6.2 Cons

The Box-Cox Normality Plot has the following disadvantages:

the Box-Cox transformation mainly addresses skewness; it may reduce tail weight caused by right-skewness, but it typically cannot fix multimodality or intrinsically heavy-tailed symmetric distributions
the simple transformation requires strictly positive data; a constant may need to be added before the analysis is performed
there is no guarantee that any power transformation will achieve normality for a given dataset

79.7 Example

The following analysis shows the Box-Cox Normality Plot for the monthly marriages time series in Belgium. From the Tukey-Lambda PPCC Plot we already know that the distribution of this time series resembles a Uniform Distribution (see Section 78.8). The Box-Cox Normality Plot shows the optimal value of \(\lambda\) to transform the data towards normality.

[Interactive Shiny app: Box-Cox Normality Plot for the monthly marriages time series]

The Box-Cox Normality Plot shows that the optimal \(\lambda\) is close to \(0\) (\(\ln Y\)) for the marriages time series. The QQ Plots confirm that the log-transformed data is closer to the Normal Distribution than the original data. The MLE output from powerTransform provides the exact estimate of \(\lambda\) and its confidence interval, which allows us to determine whether rounding to a convenient value (e.g. \(0\)) is appropriate.

79.8 Task

Compute the Box-Cox Normality Plot for the monthly divorces time series and interpret the results. Compare the optimal \(\lambda\) with the result of the PPCC Plot in Section 78.8. What does this tell you about the relationship between the two methods?