72 Rank Correlation

72.1 Definition of Spearman Rank Order Correlation

The basic idea of Rank Correlations is that we compute the linear association between the rank orders of two variables, rather than the original data values. To define the Spearman Rank Order Correlation (Spearman 1904) we use a computational example based on sample data that are displayed in Table 72.1.

Table 72.1: Student Scores for two exams \(x\) and \(y\)

Student	Score \(x\)	Rank \(x\)	Score \(y\)	Rank \(y\)	\(d_i\)	\(d_i^2\)
A	30	11.0	70	10.5	+0.5	0.25
B	30	11.0	70	10.5	+0.5	0.25
C	25	5.5	68	7.5	-2.0	4.00
D	27	7.5	63	5.0	+2.5	6.25
E	23	3.0	52	2.5	+0.5	0.25
F	21	1.0	50	1.0	+0.0	0.00
G	27	7.5	68	7.5	+0.0	0.00
H	23	3.0	59	4.0	-1.0	1.00
I	23	3.0	52	2.5	+0.5	0.25
J	30	11.0	70	10.5	+0.5	0.25
K	28	9.0	70	10.5	-1.5	2.25
L	25	5.5	64	6.0	-0.5	0.25
Sum						15.00

Note that a “mean rank” is assigned if two or more data values are equal (e.g. students C and L both have a score of 25 for exam \(x\) which corresponds to a mean rank of 5.5 = (5+6)/2). Because ties are present in this example, the no-ties z-test based on \(D=\sum d_i^2\) (see Section 72.5) is not valid for inference here.
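
The ranking step can be reproduced in R. The following is a minimal sketch that assumes the scores of Table 72.1 are typed in by hand; the variable names (`rank_x`, `rank_y`, `d`, `D`) are chosen for illustration only. By default, base R's `rank` function assigns mean ranks to tied values (`ties.method = "average"`), which matches the convention used in the table.

```r
# Exam scores from Table 72.1 (students A to L)
x <- c(30, 30, 25, 27, 23, 21, 27, 23, 23, 30, 28, 25)
y <- c(70, 70, 68, 63, 52, 50, 68, 59, 52, 70, 70, 64)

# rank() assigns mean ranks to ties by default (ties.method = "average")
rank_x <- rank(x)
rank_y <- rank(y)

# Rank differences and the sum of their squares
d <- rank_x - rank_y
D <- sum(d^2)

# Reproduce the columns of Table 72.1
data.frame(student = LETTERS[1:12], x, rank_x, y, rank_y, d, d2 = d^2)
D
```

The last line should show the total of the \(d_i^2\) column, i.e. 15.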

72.2 Uncorrected Spearman Rank Order Correlation

The “uncorrected” Spearman Rank Order Correlation is defined as

\[ r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n-1)(n+1)} = 1 - \frac{6D}{n(n-1)(n+1)} \]

where \(d_i\) is the difference between the two ranks of observation \(i = 1, 2, 3, \ldots, n\), and \(D = \sum_{i=1}^{n} d_i^2\) is the sum of the squared rank differences.

When this definition is applied to our sample data we obtain

\[ r_s = 1 - \frac{6 \times 15}{12 \times 11 \times 13} = 1 - \frac{90}{1716} = 0.947552 \]
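
As a quick check, the same arithmetic can be carried out in R. This sketch simply plugs in \(n = 12\) and \(D = 15\) from Table 72.1; the variable name `r_s_uncorrected` is an illustrative choice.

```r
n <- 12   # number of students in Table 72.1
D <- 15   # sum of squared rank differences from Table 72.1

# Uncorrected Spearman Rank Order Correlation
r_s_uncorrected <- 1 - 6 * D / (n * (n - 1) * (n + 1))
round(r_s_uncorrected, 6)   # 0.947552
```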

72.3 Corrected Spearman Rank Order Correlation

The problem with this definition is that it does not take into account the ties in the rank orders. To compute the “corrected” Spearman Rank Order Correlation we use the following definition

\[ r_s = \frac{\sum x^2 + \sum y^2 - \sum d^2}{2 \sqrt{\sum x^2 \times \sum y^2}} \]

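with the following components (applied to the sample data), where \(t\) is the number of observations that share a tied rank, and \(\sum x^2\) and \(\sum y^2\) denote the tie-corrected sums of squared deviations of the ranks from their mean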

\[ \begin{cases}\sum T_x = \sum \frac{t^3 - t}{12} = \underset{\text{{\tiny 3 ties of rank 3}}}{\frac{3^3 - 3}{12}} + \underset{\text{{\tiny 2 ties of rank 5.5}}}{\frac{2^3 - 2}{12}} + \underset{\text{{\tiny 2 ties of rank 7.5}}}{\frac{2^3 - 2}{12}} + \underset{\text{{\tiny 3 ties of rank 11}}}{\frac{3^3 - 3}{12}} = 5 \\\sum T_y = \sum \frac{t^3 - t}{12} = \underset{\text{{\tiny 2 ties of rank 2.5}}}{\frac{2^3 - 2}{12}} + \underset{\text{{\tiny 2 ties of rank 7.5}}}{\frac{2^3 - 2}{12}} + \underset{\text{{\tiny 4 ties of rank 10.5}}}{\frac{4^3 - 4}{12}} = 6 \\\sum x^2 = \frac{n^3 - n}{12} - \sum T_x = \frac{12^3 - 12}{12} - 5 = 143 - 5 = 138 \\\sum y^2 = \frac{n^3 - n}{12} - \sum T_y = \frac{12^3 - 12}{12} - 6 = 143 - 6 = 137\end{cases} \]

This implies that the “corrected” result is

\[ r_s = \frac{138 + 137 - 15}{2 \sqrt{138 \times 137}} = 0.945461 \]
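
The tie correction can also be computed directly from the data. The sketch below assumes the scores of Table 72.1; `tie_correction` is a hypothetical helper written for this illustration and is not part of base R. It counts the size \(t\) of each group of equal values and sums \((t^3 - t)/12\) over the groups.

```r
# Exam scores from Table 72.1 (students A to L)
x <- c(30, 30, 25, 27, 23, 21, 27, 23, 23, 30, 28, 25)
y <- c(70, 70, 68, 63, 52, 50, 68, 59, 52, 70, 70, 64)
n <- length(x)
D <- sum((rank(x) - rank(y))^2)

# Hypothetical helper: sum of (t^3 - t)/12 over all groups of equal values
tie_correction <- function(v) {
  sizes <- as.vector(table(v))   # group sizes; untied values have size 1
  sum((sizes^3 - sizes) / 12)    # groups of size 1 contribute 0
}

Tx <- tie_correction(x)          # 5
Ty <- tie_correction(y)          # 6
sum_x2 <- (n^3 - n) / 12 - Tx    # 138
sum_y2 <- (n^3 - n) / 12 - Ty    # 137

# Corrected Spearman Rank Order Correlation
r_s_corrected <- (sum_x2 + sum_y2 - D) / (2 * sqrt(sum_x2 * sum_y2))
round(r_s_corrected, 6)

# Cross-check: base R computes the Pearson correlation of the mean ranks
cor(x, y, method = "spearman")
```

Both results should agree, because the corrected formula is algebraically the Pearson correlation applied to the mean ranks.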

72.4 t-Test Statistic

\[ t = r_s \sqrt{\frac{n-2}{1-r_s^2}} \]

When applied to the sample data we obtain

\[ t = 0.945461 \sqrt{\frac{12-2}{1-0.893897}} = 9.178631 \]

which, based on the t-Distribution with \(n - 2 = 10\) degrees of freedom, leads to the one-sided p-value

\[ \text{P}(t \geq 9.178631) = 0.000002 \]
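
The test statistic and its upper-tail probability can be sketched in R as follows. The sketch assumes the rounded corrected correlation 0.945461 from Section 72.3 and \(n - 2 = 10\) degrees of freedom; the variable names are illustrative. Because the rounded correlation is used as input, the last digits of the statistic may differ slightly from the value shown above.

```r
n   <- 12         # sample size
r_s <- 0.945461   # corrected Spearman correlation (rounded, Section 72.3)

# t-Test statistic
t_stat <- r_s * sqrt((n - 2) / (1 - r_s^2))
t_stat

# One-sided (upper tail) p-value from the t-Distribution with n - 2 df
p_one_sided <- pt(t_stat, df = n - 2, lower.tail = FALSE)
p_one_sided
```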

72.5 z-Test Statistic

This z-test based on \(D=\sum d_i^2\) assumes there are no ties in either ranking.

\[ z = \frac{D - \text{E}(D)}{\sqrt{\text{V}(D)}} = \frac{\sum_{i=1}^{n}d_i^2 - \frac{n(n-1)(n+1)}{6}}{\sqrt{\frac{n^2(n-1)(n+1)^2}{36}}} \]

If this no-ties formula is applied to the tied sample data above, we obtain the following value (illustrative only; for tied data use a tie-corrected or software-computed Spearman test for inference):

\[ z = \frac{15 - 286}{\sqrt{7436}} = -3.142676 \]

If one nonetheless plugs this illustrative value into the standard Normal Distribution, it leads to

\[ \text{P}(z \geq -3.142676) = 0.999163 \wedge \text{P}(z < -3.142676) = 0.000837 \]

These probabilities are not valid Spearman inference results for this tied sample.
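
For completeness, the arithmetic of the no-ties formula can be sketched in R. As stated above, the result is illustrative only, because the sample contains ties; the variable names `E_D` and `V_D` are chosen for this example.

```r
n <- 12   # sample size
D <- 15   # sum of squared rank differences from Table 72.1

E_D <- n * (n - 1) * (n + 1) / 6         # expected value of D: 286
V_D <- n^2 * (n - 1) * (n + 1)^2 / 36    # variance of D: 7436

z <- (D - E_D) / sqrt(V_D)
z

# Lower-tail probability under the standard Normal Distribution
pnorm(z)
```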

72.6 Definition of Kendall’s \(\tau\) Rank Order Correlation (Kendall 1938)

\[ \tau = \frac{\text{\# concordant pairs - \# discordant pairs}}{\frac{n(n-1)}{2}} \]

where a pair of observations \((i, j)\) is said to be “concordant” if both \(x_i > x_j\) and \(y_i > y_j\), or if both \(x_i < x_j\) and \(y_i < y_j\). The pair is “discordant” if the two orderings disagree, i.e. if \(x_i > x_j\) and \(y_i < y_j\), or if \(x_i < x_j\) and \(y_i > y_j\).

As is the case with the Spearman Rank Order Correlation, Kendall’s \(\tau\) requires special treatment of ties. This treatment is not discussed in this book; however, the R modules use the corrected formulas.
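
The definition can be illustrated by counting pairs directly. The following sketch uses the small five-observation sample (without ties) that also appears in the R script of Section 72.7.2; the variable names are illustrative, and the brute-force pair count is meant for explanation rather than as an efficient implementation.

```r
# Five observations without ties
x <- c(20, 40, 30, 50, 60)
y <- c(80, 60, 10, 20, 30)
n <- length(x)

# All n(n-1)/2 index pairs (i, j) with i < j, one pair per column
idx <- combn(n, 2)

# Product of the signs of the differences:
# +1 for a concordant pair, -1 for a discordant pair
s <- sign(x[idx[1, ]] - x[idx[2, ]]) * sign(y[idx[1, ]] - y[idx[2, ]])

concordant <- sum(s > 0)   # 4
discordant <- sum(s < 0)   # 6

tau <- (concordant - discordant) / (n * (n - 1) / 2)
tau                        # -0.2

# Cross-check with base R (identical here because there are no ties)
cor(x, y, method = "kendall")
```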

72.7 R Module

72.7.1 Public website

The Spearman Rank Order Correlation for bivariate data is available on the public website:

https://compute.wessa.net/rwasp_spearman.wasp

The Kendall’s \(\tau\) Rank Order Correlation for bivariate data is available on the public website:

https://compute.wessa.net/rwasp_kendall.wasp

The public website also features an R module that computes Pearson Correlations, Spearman Rank Order Correlations, and Kendall’s \(\tau\) Rank Order Correlations for all possible pairs of variables in a multivariate dataset:

https://compute.wessa.net/rwasp_pairs.wasp

72.7.2 RFC

When using the default profile in RFC, these R modules can be found under “Descriptive / Multivariate Descriptive Statistics”.

The R code to compute Correlation Matrices is shown in Section 71.5.2. To compute the bivariate Spearman and Kendall \(\tau\) Rank Order Correlation on your local machine, the following script can be used in the R console:

#Sample data
y <- c(80,60,10,20,30)
x <- c(20,40,30,50,60)
ylab <- 'y'
xlab <- 'x'
#Scatterplot of the original values
plot(x,y,main='Scatterplot',xlab=xlab,ylab=ylab)
grid()

#Scatterplot of the ranks
plot(rank(x),rank(y),main='Scatterplot of Ranks',xlab=paste('rank of',xlab),ylab=paste('rank of',ylab))
grid()

#Kendall's tau with base R
k <- cor.test(x,y,method='kendall')
#tau
k$estimate
#2-sided p-value
k$p.value
#Spearman's rho with base R
s <- cor.test(x,y,method='spearman')
#rho
s$estimate
#2-sided p-value
s$p.value

 tau 
-0.2 
[1] 0.8166667
 rho 
-0.3 
[1] 0.6833333

To compute the Spearman or Kendall \(\tau\) Rank Order Correlation, the R code uses the cor.test function which features a method parameter (method can have the values ‘pearson’, ‘spearman’, and ‘kendall’). Alternatively, there are several external libraries (such as the Kendall package) that can be used.
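
When the data contain ties, as in Table 72.1, an exact p-value cannot be computed for the Spearman test, and `cor.test` falls back on an approximation. The sketch below requests the approximation explicitly through the `exact = FALSE` argument and uses `alternative = "greater"` for a one-sided test; the object names `res_s` and `res_k` are illustrative. The reported estimate should correspond to the corrected Spearman correlation of Section 72.3.

```r
# Exam scores from Table 72.1 (students A to L)
x <- c(30, 30, 25, 27, 23, 21, 27, 23, 23, 30, 28, 25)
y <- c(70, 70, 68, 63, 52, 50, 68, 59, 52, 70, 70, 64)

# Spearman test with ties: request the approximate (non-exact) p-value
res_s <- cor.test(x, y, method = "spearman",
                  alternative = "greater", exact = FALSE)
res_s$estimate   # tie-corrected Spearman correlation
res_s$p.value    # one-sided p-value

# Kendall test with ties, again with the approximate p-value
res_k <- cor.test(x, y, method = "kendall",
                  alternative = "greater", exact = FALSE)
res_k$estimate
res_k$p.value
```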

72.8 Purpose

Rank Order Correlations are used to identify associations between pairs of variables. Because the computations are based on ranks (rather than the original values), the associated hypothesis tests (see Hypothesis Testing) do not rely on assumptions about the distribution of the variables (i.e. Rank Order Correlations are non-parametric).

72.9 Pros & Cons

72.9.1 Pros

Rank Order Correlations have the following advantages:

There are no distributional assumptions when these correlations are used to test hypotheses. In other words, there is no requirement that the variables have a Normal Distribution.
These correlations are robust (not sensitive to outliers).
They are easily computed with many software packages.

72.9.2 Cons

Rank Order Correlations have the following disadvantages:

Some software packages may not correct for ties (software documentation does not always describe whether a correction for ties is applied, or how it is done).
While most readers know that Rank Order Correlations exist, they often do not know why they are used and how they differ from Pearson Correlations.

72.10 Example 1

The following analysis shows the Spearman Rank Order Correlation for two types of intrinsic motivation scores. Since both variables have a discrete (non-normal) distribution, this rank-based computation is suitable for hypothesis testing.

Interactive Shiny app (click to load).

The correlation coefficient is 0.599 which is evidence for a positive association between both types of intrinsic motivation (i.e. students with high scores for IM.Know also tend to have high scores for IM.Accomplishment).

72.11 Example 2

The Kendall \(\tau\) correlation can be computed by selecting “Kendall” in the “Type of Correlations” drop-down box. The output shows that the correlation is 0.463, which is considerably smaller than the Spearman correlation from the previous example. This is often (though not always) the case: for the same data, Kendall’s \(\tau\) tends to be more conservative (smaller in absolute value) than Spearman’s correlation.