109 Statistical Test of the difference between Means -- Dependent/Paired Samples
109.1 Theory
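We define a first population \(X_1 \sim \text{N}\left( \mu_1, \sigma_1^2 \right)\) from which a simple random sample of size \(n\) is drawn, with sample mean \(\bar{x}_1 = \frac{1}{n} \sum_{i=1}^{n}x_{1i}\).
We also define a second population \(X_2 \sim \text{N}\left( \mu_2, \sigma_2^2 \right)\) from which a sample of the same size \(n\) is drawn, with sample mean \(\bar{x}_2 = \frac{1}{n} \sum_{i=1}^{n}x_{2i}\). The two samples are dependent: each \(x_{2i}\) is paired with \(x_{1i}\) (for example, two measurements taken on the same subject), so the analysis works with the paired differences \(d_i = x_{1i} - x_{2i}\).
To test the difference between the population means \(\mu_1\) and \(\mu_2\), we compute the following test statistic, the mean of the paired differences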
\[ \begin{align*}\bar{d} &= \bar{x}_1 - \bar{x}_2 \\&= \frac{1}{n} \sum_{i=1}^{n} \left( x_{1i} - x_{2i} \right) \\&= \frac{1}{n} \sum_{i=1}^{n} d_i\end{align*} \]
which has the following distribution
\[ \bar{d} \sim \text{N} \left( D, \frac{\sigma_D^2}{n} \right) \]
where \(D = \mu_1 - \mu_2\) and \(\sigma_D^2\) is the population variance of the differences \(X_1 - X_2\).
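A short derivation shows where the variance of \(\bar{d}\) comes from. It assumes, beyond what is stated above, that the \(n\) pairs are independent of each other and that \(X_1\) and \(X_2\) have covariance \(\sigma_{12}\) within a pair (\(\sigma_{12}\) is a symbol introduced only for this sketch); \(\sigma_D^2\) denotes the variance of a single difference \(X_1 - X_2\).
\[ \begin{align*}\sigma_D^2 &= \text{Var}\left( X_1 - X_2 \right) = \sigma_1^2 + \sigma_2^2 - 2\sigma_{12} \\\text{Var}\left( \bar{d} \right) &= \frac{1}{n^2} \sum_{i=1}^{n} \text{Var}\left( d_i \right) = \frac{n \sigma_D^2}{n^2} = \frac{\sigma_D^2}{n}\end{align*} \]
When the two measurements in a pair are positively correlated (\(\sigma_{12} > 0\)), \(\sigma_D^2\) is smaller than \(\sigma_1^2 + \sigma_2^2\), which is why pairing can make the test more precise than treating the samples as independent.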
Standardizing \(\bar{d}\) transforms the test statistic into a convenient form
\[ u = \frac{\bar{d} - D}{\frac{\sigma_D}{\sqrt{n}}} \sim \text{N}(0,1) \]
which can be used with the Standard Normal Table of Appendix E.
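As an illustration, the standardized statistic \(u\) can be computed with a few lines of Python. This is a minimal sketch assuming \(\sigma_D\) is known: the function name `paired_z_statistic` is hypothetical, the data are the paired samples of the example in Section 109.2, and the value \(\sigma_D = 1\) in the usage line is made up purely for illustration.

```python
import math

def paired_z_statistic(x1, x2, sigma_d, d0=0.0):
    """Standardized mean difference u for paired samples, assuming the
    population standard deviation of the differences (sigma_d) is known."""
    if len(x1) != len(x2):
        raise ValueError("paired samples must have the same length")
    n = len(x1)
    d = [a - b for a, b in zip(x1, x2)]   # paired differences d_i
    d_bar = sum(d) / n                    # mean difference
    return (d_bar - d0) / (sigma_d / math.sqrt(n))

# Hypothetical usage: sigma_d = 1.0 is an assumed value, not taken from the text.
u = paired_z_statistic([106, 98, 123, 97, 88], [102, 94, 118, 91, 83], sigma_d=1.0)
print(round(u, 4))
```

The resulting \(u\) is then compared with the Standard Normal Table.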
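If the population variance of the differences is unknown, \(\sigma_D\) is replaced by the sample standard deviation of the differences \(s_d\), and the statistic follows a Student t distribution with \(n-1\) degrees of freedom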
\[ t = \frac{\bar{d} - D}{\frac{s_d}{\sqrt{n}}} \sim t_{n-1} \]
with
\[ s_d^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left[ \left( x_{1i} - x_{2i} \right) - \bar{d} \right]^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( d_i - \bar{d} \right)^2 \]
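The unknown-variance statistic can be sketched in Python as follows. The function name `paired_t_statistic` is hypothetical; it computes the differences, their mean, the unbiased standard deviation \(s_d\) (divisor \(n-1\)) and \(t\), using only the standard library.

```python
import math

def paired_t_statistic(x1, x2, d0=0.0):
    """Return (d_bar, s_d, t, df) for a paired-samples t statistic."""
    if len(x1) != len(x2):
        raise ValueError("paired samples must have the same length")
    n = len(x1)
    if n < 2:
        raise ValueError("at least two pairs are needed to estimate s_d")
    d = [a - b for a, b in zip(x1, x2)]                  # d_i = x_1i - x_2i
    d_bar = sum(d) / n                                   # mean difference
    s_d = math.sqrt(sum((di - d_bar) ** 2 for di in d) / (n - 1))  # divisor n - 1
    t = (d_bar - d0) / (s_d / math.sqrt(n))              # standardized with s_d
    return d_bar, s_d, t, n - 1

d_bar, s_d, t, df = paired_t_statistic([106, 98, 123, 97, 88], [102, 94, 118, 91, 83])
print(d_bar, round(s_d, 4), round(t, 3), df)
```

With the example data this gives \(s_d \approx 0.8367\) and \(t \approx 12.829\); the hand calculation in the example obtains 12.828 because it rounds \(\sqrt{5}\) and \(s_d\) first.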
109.2 Example
Consider the following paired samples (\(n = 5\)), together with the differences \(d_i\) and their squares:
| Observation | Sample 1 | Sample 2 | \(d_i\) | \(d_i^2\) |
|---|---|---|---|---|
| 1 | 106 | 102 | 4 | 16 |
| 2 | 98 | 94 | 4 | 16 |
| 3 | 123 | 118 | 5 | 25 |
| 4 | 97 | 91 | 6 | 36 |
| 5 | 88 | 83 | 5 | 25 |
| Sum | 512 | 488 | 24 | 118 |
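The \(d_i\) and \(d_i^2\) columns and their totals can be reproduced with a short Python sketch using plain lists (the variable names are illustrative):

```python
sample1 = [106, 98, 123, 97, 88]
sample2 = [102, 94, 118, 91, 83]

d = [a - b for a, b in zip(sample1, sample2)]   # column d_i
d_sq = [di ** 2 for di in d]                    # column d_i^2

# Print one row per observation, then the column totals
for i, (a, b, di, di2) in enumerate(zip(sample1, sample2, d, d_sq), start=1):
    print(i, a, b, di, di2)
print("Sum", sum(sample1), sum(sample2), sum(d), sum(d_sq))
```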
Now we can compute
\[ \bar{d} = \frac{24}{5} = 4.8 \text{ or } \bar{d} = \bar{x}_1 - \bar{x}_2 = 102.4 - 97.6 = 4.8 \]
and, based on the alternative formulation of the Variance (Section 66.5.2), we compute the biased variance of the differences (divisor \(n\))
\[ \tilde{s}_d^2 = \frac{1}{n} \sum_{i=1}^{n} \left( d_i - \bar{d} \right)^2 = \frac{1}{n} \sum_{i=1}^{n} d_i^2 - \bar{d}^2 = \frac{118}{5} - (4.8)^2 = 0.56 \]
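A quick Python check that the definition and the alternative formulation give the same biased variance for the differences of the example:

```python
d = [4, 4, 5, 6, 5]
n = len(d)
d_bar = sum(d) / n

# Definition: average squared deviation from the mean (divisor n)
var_def = sum((di - d_bar) ** 2 for di in d) / n
# Alternative formulation: mean of the squares minus the squared mean
var_alt = sum(di ** 2 for di in d) / n - d_bar ** 2

print(round(var_def, 4), round(var_alt, 4))
```

Both print 0.56. The shortcut formula is convenient by hand, but in floating point it can lose precision when the values are large relative to their spread, because it subtracts two nearly equal numbers.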
An unbiased estimate for the variance of the differences is given by
\[ s_d^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( d_i - \bar{d} \right)^2 = \tilde{s}_d^2 \times \frac{n}{n-1} = 0.56 \times \frac{5}{5-1} = 0.7 \]
which results in
\[ s_d = \sqrt{0.7} = 0.8367 \]
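Python's standard `statistics` module provides both divisors directly: `pvariance` divides by \(n\) and `variance` divides by \(n-1\). A short sketch with the example differences:

```python
import statistics

d = [4, 4, 5, 6, 5]

biased = statistics.pvariance(d)    # divisor n      -> tilde s_d^2
unbiased = statistics.variance(d)   # divisor n - 1  -> s_d^2
s_d = statistics.stdev(d)           # square root of the unbiased variance

print(biased, unbiased, round(s_d, 4))
```

The ratio `unbiased / biased` equals \(n/(n-1) = 5/4\), matching the correction factor used above.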
109.2.1 Critical Value (Region)
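We test the following hypotheses at significance level \(\alpha = 0.05\)
\[ \begin{cases}\text{H}_0: D_0 = \mu_1 - \mu_2 = 0 \\\text{H}_A: D > D_0 = 0\end{cases} \]
Because the alternative is right-sided, the critical region is of the form \(\bar{d} \geq c\), where \(c\) is chosen such that, under H\(_0\),
\[ \text{P} (\bar{d} \geq c) = 0.05 \]
Standardizing both sides of the inequality gives
\[ \text{P} \left( \frac{\bar{d} - D}{\frac{s_d}{\sqrt{n}}} \geq \frac{c - D}{\frac{s_d}{\sqrt{n}}} \right) = 0.05 \]
and since \(D = D_0 = 0\) under H\(_0\), we can write
\[ \text{P} \left( t \geq \frac{c}{\frac{s_d}{\sqrt{n}}} \right) = 0.05 \]
where \(t\) follows a Student t distribution with \(n - 1 = 4\) degrees of freedom. The Student t table gives 2.132 as the value exceeded with probability 0.05 for 4 degrees of freedom, from which it follows that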
\[ \begin{align*}\frac{c \sqrt{n}}{s_d} &= 2.132 \\c &= 2.132 \times \frac{s_d}{\sqrt{n}} \\&= 2.132 \times \frac{0.8367}{2.236} \\&= 0.7978\end{align*} \]
We conclude that, under H\(_0\),
\[ \text{P}\left( \bar{d} \geq 0.7978 \right) = 0.05 \]
Since \(\bar{d} = 4.8\) is larger than the critical value \(c=0.7978\) we reject the Null Hypothesis H\(_0: D_0 = \mu_1 - \mu_2 = 0\) and accept the Alternative Hypothesis.
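The table value 2.132 and the critical value \(c\) can also be computed. The Python sketch below is an illustration under stated assumptions: the helper names `t_upper_tail` and `t_critical` are hypothetical, the Student t tail probability is obtained by numerically integrating the density with Simpson's rule, and the quantile is found by bisection; only the standard library is used.

```python
import math

def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for Student's t with df degrees of freedom, for t >= 0.

    Integrates the density over [0, t] with composite Simpson's rule
    (steps must be even) and uses the symmetry P(T >= 0) = 0.5.
    """
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    area = pdf(0.0) + pdf(t)
    for k in range(1, steps):
        area += (4 if k % 2 else 2) * pdf(k * h)
    return 0.5 - area * h / 3

def t_critical(alpha, df, lo=0.0, hi=100.0):
    """Right-tail critical value t* with P(T >= t*) = alpha, found by bisection."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid   # tail probability still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

n, s_d, alpha = 5, math.sqrt(0.7), 0.05
t_star = t_critical(alpha, n - 1)   # right-tail value for 4 degrees of freedom
c = t_star * s_d / math.sqrt(n)     # critical value on the d-bar scale
d_bar = 4.8
print(round(t_star, 3), round(c, 4), "reject H0" if d_bar >= c else "do not reject H0")
```

This gives \(t \approx 2.132\) and \(c \approx 0.7977\); the last digit differs from 0.7978 above only because the hand calculation rounds 2.132, 0.8367 and 2.236 before multiplying.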
109.2.2 P-Value
Under H\(_0\) (\(D = D_0 = 0\)) the observed value of the test statistic is
\[ t = \frac{\bar{d}}{\frac{s_d}{\sqrt{n}}} = \frac{\bar{d}\sqrt{n}}{s_d} = \frac{4.8 \times 2.236}{0.8367} = 12.828 \]
With \(\text{df} = n - 1 = 5 - 1 = 4\) degrees of freedom, the p-value is
\[ \text{P}(t_{4} \geq 12.828) \approx 0.0001 \]
Since the probability 0.0001 is smaller than \(\alpha = 0.05\) we reject the Null Hypothesis H\(_0: D_0 = \mu_1 - \mu_2 = 0\) and accept the Alternative Hypothesis.
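The whole right-sided test can be sketched end to end in Python. The function names are hypothetical, and the tail probability is again obtained by Simpson integration of the t density (an assumption of this sketch, not the table lookup used in the text). If SciPy is available, `scipy.stats.ttest_rel(x1, x2, alternative="greater")` performs the same paired test.

```python
import math

def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for Student's t, t >= 0, via composite Simpson's rule on [0, t]."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    area = pdf(0.0) + pdf(t)
    for k in range(1, steps):
        area += (4 if k % 2 else 2) * pdf(k * h)
    return 0.5 - area * h / 3   # symmetry: P(T >= 0) = 0.5

def paired_t_test_right(x1, x2, d0=0.0):
    """Right-sided paired t test (H_A: D > d0); returns (t, df, p)."""
    n = len(x1)
    d = [a - b for a, b in zip(x1, x2)]
    d_bar = sum(d) / n
    s_d = math.sqrt(sum((di - d_bar) ** 2 for di in d) / (n - 1))
    t = (d_bar - d0) / (s_d / math.sqrt(n))
    # For negative t use symmetry: P(T >= t) = 1 - P(T >= -t)
    p = t_upper_tail(t, n - 1) if t >= 0 else 1 - t_upper_tail(-t, n - 1)
    return t, n - 1, p

t, df, p = paired_t_test_right([106, 98, 123, 97, 88], [102, 94, 118, 91, 83])
print(round(t, 3), df, f"{p:.6f}", "reject H0" if p < 0.05 else "do not reject H0")
```

Without intermediate rounding this gives \(t \approx 12.829\) and \(p \approx 0.000106\), which rounds to the 0.0001 obtained above.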