131 Testing Correlations

Like the Arithmetic Mean, correlation coefficients can be the subject of a Hypothesis Test. For the Pearson Correlation we can use a t-Test (as explained in Section 71.4), while the Spearman Rank Order Correlation can be tested with either a t-statistic (using the t-Distribution) or a z-statistic (using the Normal Distribution), as described in Section 72.4 and Section 72.5.
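As a small sketch of the t-Test for the Pearson Correlation, the statistic \(t = r\sqrt{n-2}/\sqrt{1-r^2}\) with \(n-2\) degrees of freedom can be computed by hand and compared with R's `cor.test()`. The built-in `cars` data set (also used in Section 131.4) serves only as an illustration here.

```r
# Sketch: the t-statistic behind the Pearson correlation test,
# computed by hand and compared with cor.test().
data(cars)

x <- cars$speed
y <- cars$dist
n <- length(x)

r <- cor(x, y)                              # sample Pearson correlation
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)   # t-statistic with n - 2 degrees of freedom
p_two_sided <- 2 * pt(-abs(t_stat), df = n - 2)  # two-sided p-value

ct <- cor.test(x, y, method = "pearson")
c(manual_t = t_stat, cor_test_t = unname(ct$statistic))
c(manual_p = p_two_sided, cor_test_p = ct$p.value)
```

Both pairs of numbers should agree, which shows that `cor.test()` with `method = "pearson"` is this t-Test.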

Similarly, Hypothesis Testing can be used for (almost) any type of correlation, including Kendall’s \(\tau\) Rank Correlations, Partial Pearson Correlations, and Autocorrelations.
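For a Partial Pearson Correlation, one common approach (sketched here, not necessarily the method used elsewhere in this text) is to correlate the residuals of two regressions on the control variable(s) and then test with \(n - 2 - k\) degrees of freedom, where \(k\) is the number of control variables. The `mtcars` data set and the choice of `mpg`, `hp`, and `wt` are assumptions made purely for illustration.

```r
# Sketch (assumption: mtcars is used purely for illustration).
# Partial Pearson correlation of mpg and hp, controlling for wt,
# obtained by correlating the residuals of two regressions on wt.
data(mtcars)

res_mpg <- resid(lm(mpg ~ wt, data = mtcars))  # part of mpg not explained by wt
res_hp  <- resid(lm(hp ~ wt, data = mtcars))   # part of hp not explained by wt
r_partial <- cor(res_mpg, res_hp)

n <- nrow(mtcars)
k <- 1                                  # number of control variables (wt)
df_partial <- n - 2 - k                 # degrees of freedom for the t-test
t_partial <- r_partial * sqrt(df_partial / (1 - r_partial^2))
p_partial <- 2 * pt(-abs(t_partial), df = df_partial)  # two-sided p-value

c(r = r_partial, t = t_partial, df = df_partial, p = p_partial)

# Cross-check: the same t-statistic appears for hp in a multiple regression
fit <- lm(mpg ~ wt + hp, data = mtcars)
coef(summary(fit))["hp", "t value"]
```

The cross-check works because the t-statistic of a coefficient in a multiple regression equals the t-statistic of the corresponding partial correlation.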

131.1 Hypotheses

In most practical cases, Hypothesis Tests formulated about correlations are two-sided with a Null value of zero:

\[ \begin{cases}\text{H}_0: \rho = \rho_0 \\\text{H}_A: \rho \neq \rho_0\end{cases} \]

where \(\rho_0 = 0\).
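In base R, `cor.test()` only tests \(\rho_0 = 0\). When a nonzero Null value is of interest, one common option is the Fisher-z approximation discussed under Assumptions: \(Z = \left(\text{atanh}(r) - \text{atanh}(\rho_0)\right)\sqrt{n-3}\) is approximately standard normal under \(\text{H}_0\). The helper `fisher_z_test()` below is hypothetical, written only for this sketch.

```r
# Sketch: two-sided test of H0: rho = rho0 via the Fisher-z approximation.
# fisher_z_test() is a hypothetical helper, not part of base R.
fisher_z_test <- function(r, n, rho0 = 0) {
  z_stat <- (atanh(r) - atanh(rho0)) * sqrt(n - 3)  # approx. N(0, 1) under H0
  p_value <- 2 * pnorm(-abs(z_stat))                # two-sided p-value
  list(statistic = z_stat, p.value = p_value)
}

data(cars)
r <- cor(cars$speed, cars$dist)
fisher_z_test(r, n = nrow(cars), rho0 = 0.5)  # H0: rho = 0.5 vs HA: rho != 0.5
```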

131.2 Analysis based on p-values and confidence intervals

Let us reconsider the example of the Pearson Correlation coefficient between US retail prices and Arabica import prices. The p-value \(p \simeq 0\), shown in the lower left panel of the Pairwise Scatterplots, leads us to reject the Null Hypothesis. If a rank correlation (Spearman or Kendall) is selected instead of Pearson, the p-value is recomputed automatically; in this case the choice does not change the conclusion because all p-values are extremely small.
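The coffee price data are not reproduced here, so the following sketch uses the built-in `cars` data as a stand-in to show how the same comparison of Pearson, Spearman, and Kendall tests might be done with `cor.test()`. Setting `exact = FALSE` (an assumption suited to data with ties) requests the large-sample approximation for the rank-based p-values.

```r
# Sketch: compare Pearson, Spearman and Kendall tests on the same data.
# cars is a stand-in for the coffee price data, which are not available here.
data(cars)

pearson  <- cor.test(cars$speed, cars$dist, method = "pearson")
spearman <- cor.test(cars$speed, cars$dist, method = "spearman", exact = FALSE)
kendall  <- cor.test(cars$speed, cars$dist, method = "kendall",  exact = FALSE)

# Estimates and p-values side by side
data.frame(
  method   = c("pearson", "spearman", "kendall"),
  estimate = unname(c(pearson$estimate, spearman$estimate, kendall$estimate)),
  p.value  = c(pearson$p.value, spearman$p.value, kendall$p.value)
)
```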

Interactive Shiny app.

131.3 Assumptions

The Pearson Correlation coefficient can be computed and used as a purely descriptive tool (even for two binary variables). When we wish to use the Pearson correlation for Hypothesis Testing, however, the observations should be independent and the relationship should be approximately linear. In small samples, the usual tests also rely on a stronger distributional assumption, typically approximate bivariate normality.

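For interval estimation and normal approximations, the Fisher-\(z\) transform (Fisher 1915) is preferred over working with \(r\) directly, because the sampling distribution of \(r\) is skewed when \(\rho\) is far from zero: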

\[ z = \frac{1}{2}\ln\left(\frac{1+r}{1-r}\right) \approx \text{N}\left(\frac{1}{2}\ln\left(\frac{1+\rho}{1-\rho}\right), \frac{1}{n-3}\right) \]
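The approximation can be checked with a small Monte Carlo sketch. Note that \(\frac{1}{2}\ln\left(\frac{1+r}{1-r}\right)\) is the same as R's `atanh(r)`. The values \(\rho = 0.6\), \(n = 30\), and 5000 replications are arbitrary assumptions, and the data are simulated as bivariate normal.

```r
# Sketch: Monte Carlo check of the Fisher-z approximation,
# assuming bivariate normal data with rho = 0.6 and n = 30 (arbitrary choices).
set.seed(1)
rho <- 0.6
n <- 30
n_sim <- 5000

z_values <- replicate(n_sim, {
  x <- rnorm(n)
  y <- rho * x + sqrt(1 - rho^2) * rnorm(n)  # population correlation of x and y is rho
  atanh(cor(x, y))                           # Fisher-z of the sample correlation
})

c(mean_z = mean(z_values), target_mean = atanh(rho))
c(var_z = var(z_values), target_var = 1 / (n - 3))
```

The simulated mean and variance should be close to \(\frac{1}{2}\ln\left(\frac{1+\rho}{1-\rho}\right)\) and \(\frac{1}{n-3}\), respectively.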

The Pearson Correlation coefficient only measures the strength of linear relationships, whereas rank correlations, e.g. Spearman's Rank Order correlation and Kendall's \(\tau\), measure the strength of any monotonic relationship.
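A quick sketch makes the difference concrete: for a perfectly monotonic but strongly nonlinear relationship (here \(y = e^x\), chosen only for illustration), both rank correlations equal 1, while the Pearson correlation is well below 1.

```r
# Sketch: a perfectly monotonic but strongly nonlinear relationship.
x <- 1:20
y <- exp(x)  # increases with x, but far from a straight line

c(
  pearson  = cor(x, y, method = "pearson"),
  spearman = cor(x, y, method = "spearman"),
  kendall  = cor(x, y, method = "kendall")
)
```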

131.4 R code

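data(cars)

# Pearson correlation test (H0: rho = 0, two-sided alternative by default)
ct <- cor.test(cars$speed, cars$dist, method = "pearson")
ct

# Fisher-z confidence interval (equivalent to cor.test output for Pearson)
r <- ct$estimate                                   # sample correlation (named "cor")
n <- sum(complete.cases(cars$speed, cars$dist))    # number of complete pairs
z <- 0.5 * log((1 + r) / (1 - r))                  # Fisher-z transform, same as atanh(r)
se <- 1 / sqrt(n - 3)                              # approximate standard error of z
zcrit <- qnorm(0.975)                              # critical value for a 95% interval
ci_z <- c(z - zcrit * se, z + zcrit * se)          # interval on the z scale
ci_r <- (exp(2 * ci_z) - 1) / (exp(2 * ci_z) + 1)  # back-transform, same as tanh(ci_z)
ci_r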


    Pearson's product-moment correlation

data:  cars$speed and cars$dist
t = 9.464, df = 48, p-value = 1.49e-12
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.6816422 0.8862036
sample estimates:
      cor 
0.8068949 

      cor       cor 
0.6816422 0.8862036