129 Repeated Measures ANOVA

Repeated Measures ANOVA is a natural extension of the Paired Two Sample t-Test (Chapter 116) and is used when the same subjects are measured under three or more conditions or time points. In a within-subjects design, each participant serves as their own control, which reduces the effect of individual differences and increases statistical power.

129.1 Hypotheses

The Hypothesis Test can be written as follows:

\[ \begin{cases}\text{H}_0: \mu_1 = \mu_2 = \mu_3 = \ldots = \mu_k \\\text{H}_A: \exists\; i \neq j: \mu_i \neq \mu_j\end{cases} \]

where \(k\) is the number of conditions or time points and \(\mu_i\) is the population mean of condition \(i\).

In other words, we test whether the mean response is the same across all conditions. If the Null Hypothesis is rejected, we conclude that at least two conditions have different population means. The test does not indicate which conditions differ – for that, post-hoc pairwise comparisons are needed.

Within-subjects design: Unlike One-way ANOVA (Chapter 126), where different subjects are assigned to each group, Repeated Measures ANOVA measures the same subjects multiple times. This design is common in:

Longitudinal studies (measurements at baseline, 3 months, 6 months)
Crossover trials (each patient receives all treatments in sequence)
Learning experiments (performance measured across multiple sessions)

129.2 Analysis based on p-values and confidence intervals

129.2.1 Software

The Repeated Measures ANOVA can be computed in RFC under the “Hypotheses / Empirical Tests” menu item (select “Repeated Measures ANOVA” from the ANOVA type dropdown), or by using the R code shown below.

129.2.2 Data & Parameters

The data for Repeated Measures ANOVA can be organized in two formats:

Wide format: Each row represents one subject, and each column represents a condition/time point. This format is convenient for multivariate formulations and diagnostics, such as the lm(cbind(...) ~ 1) model passed to car::Anova() for Mauchly’s test in Section 129.2.3.1.
Long format: Each row represents one observation, with separate columns for the subject identifier, the condition, and the response variable. This is the format commonly used with aov(... + Error(subject/condition)).

The key parameters are:

Response variable: the quantitative measurement of interest
Within-subjects factor: a categorical variable identifying the condition or time point
Subject identifier: a variable that identifies which measurements belong to the same subject
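
As an illustration of the two layouts and the three parameters listed above, the following sketch converts a small wide table to long format with base R's reshape(). The data values and the column names (subject, none, moderate, high, reaction_time) are made up for this example and are not taken from the analysis in this chapter.

```r
# Toy wide-format data (hypothetical values): one row per subject
wide <- data.frame(
  subject  = factor(1:3),
  none     = c(350, 362, 341),
  moderate = c(322, 335, 318),
  high     = c(301, 310, 299)
)

# Convert to long format: one row per subject-condition observation
long <- reshape(
  wide,
  direction = "long",
  varying   = list(c("none", "moderate", "high")),  # columns holding the response
  v.names   = "reaction_time",                      # response variable
  timevar   = "condition",                          # within-subjects factor
  times     = c("none", "moderate", "high"),
  idvar     = "subject"                             # subject identifier
)
long$condition <- factor(long$condition, levels = c("none", "moderate", "high"))
rownames(long) <- NULL
long
```

The resulting data frame has one column for each of the three key parameters and can be passed directly to aov(reaction_time ~ condition + Error(subject/condition), data = long).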

129.2.3 Output

Consider the problem of measuring the reaction time (in milliseconds) of 12 subjects under three different conditions: no caffeine, moderate caffeine, and high caffeine. Each subject is tested under all three conditions. The results from the Repeated Measures ANOVA analysis are shown below.

Interactive Shiny app (click to load).

The output includes the within-subjects F-test, Mauchly’s test for sphericity (with Greenhouse-Geisser and Huynh-Feldt corrections), and post-hoc pairwise paired t-tests with Bonferroni correction. The plots show box plots per condition and subject profile (spaghetti) plots.

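A comparable analysis can be run with R code. Note that the script below simulates data for 10 subjects (n_subjects <- 10), so the degrees of freedom and test results refer to that simulated sample: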

# Simulated reaction time data (wide format)
set.seed(42)
n_subjects <- 10
no_caffeine <- rnorm(n_subjects, mean = 350, sd = 30)
moderate_caffeine <- rnorm(n_subjects, mean = 320, sd = 30)
high_caffeine <- rnorm(n_subjects, mean = 300, sd = 30)

# Create long-format data frame for aov(... + Error(subject/condition))
reaction_data <- data.frame(
  subject = factor(rep(1:n_subjects, 3)),
  condition = factor(rep(c("None", "Moderate", "High"), each = n_subjects),
                     levels = c("None", "Moderate", "High")),
  reaction_time = c(no_caffeine, moderate_caffeine, high_caffeine)
)

# Fit repeated measures ANOVA using aov() in long format with Error term
rm_aov <- aov(reaction_time ~ condition + Error(subject/condition), data = reaction_data)
summary(rm_aov)


Error: subject
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  9  10648    1183               

Error: subject:condition
          Df Sum Sq Mean Sq F value  Pr(>F)   
condition  2  27338   13669    8.99 0.00196 **
Residuals 18  27367    1520                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The ANOVA table shows the within-subjects F-test for the “condition” factor. The F-statistic is the ratio of the mean square of the condition effect to the residual mean square (after removing the between-subjects variance). Here F = 8.99 on 2 and 18 degrees of freedom, with a p-value of 0.00196. Since the p-value is smaller than the chosen type I error rate \(\alpha = 0.05\), we reject the Null Hypothesis and conclude that the caffeine condition has a significant effect on reaction time in this simulated sample.
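
As a check, the F-statistic can be reproduced from the sums of squares and degrees of freedom printed in the “Error: subject:condition” stratum (values rounded as in the output):

\[ F = \frac{MS_{\text{condition}}}{MS_{\text{residual}}} = \frac{27338 / 2}{27367 / 18} \approx \frac{13669}{1520} \approx 8.99 \]

The numerator has \(k - 1 = 2\) degrees of freedom and the denominator has \((k - 1)(n - 1) = 2 \times 9 = 18\), where \(k = 3\) is the number of conditions and \(n = 10\) is the number of subjects in the simulated sample.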

129.2.3.1 Sphericity

A critical assumption of Repeated Measures ANOVA is sphericity, which requires that the variances of the differences between all pairs of conditions are equal. Mauchly’s test (Mauchly 1940) is used to assess this assumption:

# Wide format for Mauchly's test
wide_data <- data.frame(
  none = no_caffeine,
  moderate = moderate_caffeine,
  high = high_caffeine
)

# Using a multivariate approach for Mauchly's test
idata <- data.frame(condition = factor(c("None", "Moderate", "High"),
                                        levels = c("None", "Moderate", "High")))
mlm <- lm(cbind(none, moderate, high) ~ 1, data = wide_data)
library(car)
rm_anova <- Anova(mlm, idata = idata, idesign = ~condition, type = "III")
summary(rm_anova, multivariate = FALSE, univariate = TRUE)


Univariate Type III Repeated-Measures ANOVA Assuming Sphericity

             Sum Sq num Df Error SS den Df   F value    Pr(>F)    
(Intercept) 3176378      1    10648      9 2684.8061 1.866e-12 ***
condition     27338      2    27368     18    8.9903  0.001963 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Mauchly Tests for Sphericity

          Test statistic p-value
condition        0.57711 0.11092


Greenhouse-Geisser and Huynh-Feldt Corrections
 for Departure from Sphericity

           GG eps Pr(>F[GG])   
condition 0.70279    0.00652 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

             HF eps  Pr(>F[HF])
condition 0.7937342 0.004505291
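
As an informal complement to Mauchly's test, the variances of the pairwise difference scores can be computed directly, since sphericity requires them to be equal in the population. The sketch below re-creates the simulated data from the example; the object name diff_vars is chosen here for illustration. It is a descriptive check only and does not replace the formal test.

```r
# Re-create the simulated data from the example above
set.seed(42)
n_subjects <- 10
no_caffeine <- rnorm(n_subjects, mean = 350, sd = 30)
moderate_caffeine <- rnorm(n_subjects, mean = 320, sd = 30)
high_caffeine <- rnorm(n_subjects, mean = 300, sd = 30)

# Sample variances of the three pairwise difference scores
diff_vars <- c(
  none_vs_moderate = var(no_caffeine - moderate_caffeine),
  none_vs_high     = var(no_caffeine - high_caffeine),
  moderate_vs_high = var(moderate_caffeine - high_caffeine)
)
diff_vars

# Ratio of the largest to the smallest variance (1 would mean exactly equal)
max(diff_vars) / min(diff_vars)
```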

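In the output above, Mauchly’s test gives a p-value of 0.11092, which is larger than \(\alpha = 0.05\), so sphericity is not rejected in this example and the uncorrected F-test can be used. When sphericity is violated (Mauchly’s test p-value \(< \alpha\)), two corrections are available: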

Greenhouse-Geisser correction (Greenhouse and Geisser 1959): More conservative, recommended when the sphericity estimate \(\hat{\varepsilon}\) is less than 0.75.
Huynh-Feldt correction (Huynh and Feldt 1976): Less conservative, recommended when \(\hat{\varepsilon} > 0.75\).

Both corrections multiply the numerator and denominator degrees of freedom of the F-test by \(\hat{\varepsilon} \leq 1\) while leaving the F-statistic unchanged, resulting in a larger (more conservative) p-value.
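
The effect of the corrections can be sketched by hand with pf(), using the F-statistic and epsilon estimates copied (rounded) from the output above. The variable names are illustrative, and the results should match the printed p-values only up to rounding of the inputs.

```r
# Values copied (rounded) from the car output above
f_value <- 8.9903
df1 <- 2          # k - 1
df2 <- 18         # (k - 1) * (n - 1)
eps_gg <- 0.70279     # Greenhouse-Geisser epsilon
eps_hf <- 0.7937342   # Huynh-Feldt epsilon

# Same F-statistic, degrees of freedom scaled by epsilon
p_uncorrected <- pf(f_value, df1, df2, lower.tail = FALSE)
p_gg <- pf(f_value, eps_gg * df1, eps_gg * df2, lower.tail = FALSE)
p_hf <- pf(f_value, eps_hf * df1, eps_hf * df2, lower.tail = FALSE)

c(uncorrected = p_uncorrected, GG = p_gg, HF = p_hf)
```

Smaller values of epsilon shrink the degrees of freedom more strongly, which is why the Greenhouse-Geisser p-value is the largest of the three here.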

129.2.3.2 Post-hoc pairwise comparisons

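If the overall F-test is significant, post-hoc comparisons identify which specific conditions differ. The Bonferroni correction is used to control the family-wise type I error rate. Because paired = TRUE pairs observations by their position within each condition, the rows must be in the same subject order in every condition, as they are in reaction_data. In the output that follows, only the None vs. High comparison is significant at \(\alpha = 0.05\) after adjustment (p = 0.0002); None vs. Moderate (p = 0.0883) and Moderate vs. High (p = 1.0000) are not: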

# Pairwise paired t-tests with Bonferroni correction
pairwise.t.test(reaction_data$reaction_time, reaction_data$condition,
                paired = TRUE, p.adjust.method = "bonferroni")


    Pairwise comparisons using paired t tests 

data:  reaction_data$reaction_time and reaction_data$condition 

         None   Moderate
Moderate 0.0883 -       
High     0.0002 1.0000  

P value adjustment method: bonferroni
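
The Bonferroni adjustment itself is simple: each raw p-value is multiplied by the number of comparisons and capped at 1. The sketch below applies this by hand to the three paired t-tests and compares the result with p.adjust(). It re-creates the simulated data from the example; the names raw_p, manual_bonf and builtin_bonf are illustrative.

```r
# Re-create the simulated data from the example above
set.seed(42)
n_subjects <- 10
no_caffeine <- rnorm(n_subjects, mean = 350, sd = 30)
moderate_caffeine <- rnorm(n_subjects, mean = 320, sd = 30)
high_caffeine <- rnorm(n_subjects, mean = 300, sd = 30)

# Unadjusted p-values of the three paired t-tests
raw_p <- c(
  none_vs_moderate = t.test(no_caffeine, moderate_caffeine, paired = TRUE)$p.value,
  none_vs_high     = t.test(no_caffeine, high_caffeine, paired = TRUE)$p.value,
  moderate_vs_high = t.test(moderate_caffeine, high_caffeine, paired = TRUE)$p.value
)

# Bonferroni by hand: multiply by the number of comparisons, cap at 1
m <- length(raw_p)
manual_bonf <- pmin(m * raw_p, 1)

# Built-in equivalent
builtin_bonf <- p.adjust(raw_p, method = "bonferroni")

round(cbind(raw = raw_p, manual = manual_bonf, builtin = builtin_bonf), 4)
```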

129.3 R code

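To compute the Repeated Measures ANOVA on your local machine, the following script can be used in the R console. Note that the sleep dataset contains only two conditions, so the F-test is equivalent to a paired t-test (the F-statistic equals the squared t-statistic and the p-values coincide), and the Bonferroni adjustment has no effect because there is only one pairwise comparison: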

# Example with the built-in sleep dataset
# (extra = increase in hours of sleep, group = drug, ID = subject)
data(sleep)

# Repeated measures ANOVA using aov() with Error term
rm_model <- aov(extra ~ group + Error(ID/group), data = sleep)
summary(rm_model)

# Post-hoc pairwise comparisons (Bonferroni-corrected)
pairwise.t.test(sleep$extra, sleep$group,
                paired = TRUE, p.adjust.method = "bonferroni")


Error: ID
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  9  58.08   6.453               

Error: ID:group
          Df Sum Sq Mean Sq F value  Pr(>F)   
group      1 12.482  12.482    16.5 0.00283 **
Residuals  9  6.808   0.756                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Pairwise comparisons using paired t tests 

data:  sleep$extra and sleep$group 

  1     
2 0.0028

P value adjustment method: bonferroni

Note that the Error(ID/group) term in the aov() formula partitions the total variability into a between-subjects stratum (Error: ID) and a within-subjects stratum (Error: ID:group). In repeated-measures settings, every condition is measured within each subject (conditions are crossed with subjects), and this structure removes between-subject variability from the error term of the within-subject test.
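
The partition can be reproduced by hand for the sleep data. The sketch below computes the subject, group, and residual sums of squares from the cell means and rebuilds the F-statistic; the variable names are illustrative, and the formulas assume a balanced design with one observation per subject and condition.

```r
data(sleep)

n <- nlevels(sleep$ID)      # number of subjects
k <- nlevels(sleep$group)   # number of conditions

grand_mean    <- mean(sleep$extra)
subject_means <- tapply(sleep$extra, sleep$ID, mean)
group_means   <- tapply(sleep$extra, sleep$group, mean)

# Sums of squares
ss_total   <- sum((sleep$extra - grand_mean)^2)
ss_subject <- k * sum((subject_means - grand_mean)^2)  # "Error: ID" stratum
ss_group   <- n * sum((group_means - grand_mean)^2)    # condition effect
ss_resid   <- ss_total - ss_subject - ss_group         # "Error: ID:group" residual

# F-statistic of the within-subjects test
f_stat <- (ss_group / (k - 1)) / (ss_resid / ((k - 1) * (n - 1)))

round(c(ss_subject = ss_subject, ss_group = ss_group,
        ss_resid = ss_resid, F = f_stat), 3)
```

The between-subjects sum of squares is removed before the residual is formed, which is what makes the within-subjects test more sensitive than a between-groups comparison of the same data.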

129.4 Assumptions

The Repeated Measures ANOVA makes the following assumptions:

Normality: The response variable should be approximately normally distributed within each condition. In practice this is assessed on the model residuals or on the differences between conditions, for example with a QQ plot (Chapter 76).
Sphericity: The variances of the differences between all pairs of conditions should be equal. This is tested using Mauchly’s test (Mauchly 1940). When violated, use the Greenhouse-Geisser (Greenhouse and Geisser 1959) or Huynh-Feldt (Huynh and Feldt 1976) correction.
No significant outliers: Extreme values can distort the results. Outliers can be identified using a box plot (Chapter 69) of each condition.
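
The graphical checks mentioned above can be sketched as follows for the simulated reaction time data. Because aov() with an Error() term does not return a single residual vector, this sketch takes residuals from an additive lm() fit with subject and condition effects, which is one reasonable choice for a balanced design and not the only one.

```r
# Re-create the simulated long-format data from the example above
set.seed(42)
n_subjects <- 10
no_caffeine <- rnorm(n_subjects, mean = 350, sd = 30)
moderate_caffeine <- rnorm(n_subjects, mean = 320, sd = 30)
high_caffeine <- rnorm(n_subjects, mean = 300, sd = 30)
reaction_data <- data.frame(
  subject = factor(rep(1:n_subjects, 3)),
  condition = factor(rep(c("None", "Moderate", "High"), each = n_subjects),
                     levels = c("None", "Moderate", "High")),
  reaction_time = c(no_caffeine, moderate_caffeine, high_caffeine)
)

# Residuals after removing subject and condition effects (assumed approach)
fit <- lm(reaction_time ~ subject + condition, data = reaction_data)
res <- residuals(fit)

# Normality: QQ plot of the residuals
qqnorm(res)
qqline(res)

# Outliers: box plot of the response per condition
boxplot(reaction_time ~ condition, data = reaction_data,
        ylab = "Reaction time (ms)")
```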

129.5 Alternatives

Friedman test (Chapter 130): A non-parametric alternative that does not require normality or sphericity assumptions. Based on ranks rather than raw values.
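Linear mixed-effects models: A more flexible approach that can handle missing data, unbalanced designs, and complex correlation structures. Available through the lme4 package in R; the nlme package additionally supports structured residual correlations such as autoregressive errors.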
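
Both alternatives can be sketched on the built-in sleep dataset used in Section 129.3. The Friedman test uses base R; the mixed model with a random intercept per subject is only fitted if the lme4 package happens to be installed. The random-intercept specification is an assumption made for this illustration, not a recommendation for every design.

```r
data(sleep)

# Friedman test: response ~ condition | subject (blocks)
fr <- friedman.test(extra ~ group | ID, data = sleep)
fr

# Linear mixed-effects model with a random intercept per subject
# (runs only if lme4 is available)
if (requireNamespace("lme4", quietly = TRUE)) {
  mm <- lme4::lmer(extra ~ group + (1 | ID), data = sleep)
  print(summary(mm))
}
```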