Repeated Measures ANOVA is a natural extension of the Paired Two Sample t-Test (Chapter 116) and is used when the same subjects are measured under three or more conditions or time points. In a within-subjects design, each participant serves as their own control, which reduces the effect of individual differences and increases statistical power.
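The Null Hypothesis states that all condition means are equal:

\[
H_0: \mu_1 = \mu_2 = \cdots = \mu_k \qquad \text{versus} \qquad H_1: \mu_i \neq \mu_j \text{ for at least one pair } i \neq j,
\]

where \(k\) is the number of conditions or time points and \(\mu_i\) is the population mean of condition \(i\).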
In other words, we test whether the mean response is the same across all conditions. If the Null Hypothesis is rejected, we conclude that at least two condition means differ. The test does not indicate which conditions differ; for that, post-hoc pairwise comparisons are needed.
Within-subjects design: Unlike One-way ANOVA (Chapter 126), where different subjects are assigned to each group, Repeated Measures ANOVA measures the same subjects multiple times. This design is common in:
Longitudinal studies (measurements at baseline, 3 months, 6 months)
Crossover trials (each patient receives all treatments in sequence)
Learning experiments (performance measured across multiple sessions)
129.2 Analysis based on p-values and confidence intervals
129.2.1 Software
The Repeated Measures ANOVA can be computed in RFC under the “Hypotheses / Empirical Tests” menu item (select “Repeated Measures ANOVA” from the ANOVA type dropdown), or by using the R code shown below.
129.2.2 Data & Parameters
The data for Repeated Measures ANOVA can be organized in two formats:
Wide format: Each row represents one subject, and each column represents a condition/time point. This format is used by the multivariate formulation of the test (e.g. lm(cbind(...) ~ 1) combined with car::Anova()), which also provides Mauchly's test and the sphericity corrections.
Long format: Each row represents one observation, with separate columns for the subject identifier, the condition, and the response variable. This is the format commonly used with aov(... + Error(subject/condition)).
The key parameters are:
Response variable: the quantitative measurement of interest
Within-subjects factor: a categorical variable identifying the condition or time point
Subject identifier: a variable that identifies which measurements belong to the same subject
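When the data arrive in wide format, base R's reshape() can convert them to the long format expected by aov(). The following is a minimal sketch; the data frame wide and its reaction times for four subjects are hypothetical values chosen only for illustration.

```r
# Hypothetical wide-format data: one row per subject, one column per condition
wide <- data.frame(
  subject  = factor(1:4),
  None     = c(352, 340, 371, 329),
  Moderate = c(318, 325, 340, 301),
  High     = c(301, 310, 322, 295)
)
# Convert to long format: one row per subject x condition observation
long <- reshape(
  wide,
  direction = "long",
  varying   = c("None", "Moderate", "High"),  # columns holding the repeated measures
  v.names   = "reaction_time",                # name of the response column
  timevar   = "condition",                    # name of the within-subjects factor
  times     = c("None", "Moderate", "High"),  # factor values, in column order
  idvar     = "subject"                       # subject identifier
)
# Keep the conditions in their natural order rather than alphabetical order
long$condition <- factor(long$condition, levels = c("None", "Moderate", "High"))
rownames(long) <- NULL
long
```

The resulting columns correspond one-to-one to the three key parameters: subject identifier, within-subjects factor, and response variable.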
129.2.3 Output
Consider the problem of measuring the reaction time (in milliseconds) of 10 subjects under three different conditions: no caffeine, moderate caffeine, and high caffeine. Each subject is tested under all three conditions. The results of the Repeated Measures ANOVA are shown below.
The output includes the within-subjects F-test, Mauchly’s test for sphericity (with Greenhouse-Geisser and Huynh-Feldt corrections), and post-hoc pairwise paired t-tests with Bonferroni correction. The plots show box plots per condition and subject profile (spaghetti) plots.
The same analysis can also be replicated with R code:
# Simulated reaction time data (wide format)
set.seed(42)
n_subjects <- 10
no_caffeine <- rnorm(n_subjects, mean = 350, sd = 30)
moderate_caffeine <- rnorm(n_subjects, mean = 320, sd = 30)
high_caffeine <- rnorm(n_subjects, mean = 300, sd = 30)
# Create long-format data frame for aov(... + Error(subject/condition))
reaction_data <- data.frame(
  subject = factor(rep(1:n_subjects, 3)),
  condition = factor(rep(c("None", "Moderate", "High"), each = n_subjects),
                     levels = c("None", "Moderate", "High")),
  reaction_time = c(no_caffeine, moderate_caffeine, high_caffeine)
)
# Fit repeated measures ANOVA using aov() in long format with Error term
rm_aov <- aov(reaction_time ~ condition + Error(subject/condition), data = reaction_data)
summary(rm_aov)
Error: subject
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 9 10648 1183
Error: subject:condition
Df Sum Sq Mean Sq F value Pr(>F)
condition 2 27338 13669 8.99 0.00196 **
Residuals 18 27367 1520
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The ANOVA table shows the within-subjects F-test for the “condition” factor in the “Error: subject:condition” stratum. The F-statistic is the ratio of the condition mean square to the residual mean square of that stratum (13669 / 1520 ≈ 8.99, with 2 and 18 degrees of freedom). The between-subjects variability appears separately in the “Error: subject” stratum and is therefore removed from the test. Since the p-value (0.00196) is smaller than the chosen type I error rate \(\alpha = 0.05\), we reject the Null Hypothesis and conclude that caffeine intake has a significant effect on reaction time.
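To see where the numbers in the table come from, the sketch below regenerates the same simulated data and partitions the total sum of squares by hand into a subject part, a condition part and a residual (subject x condition) part. The variable names (y, ss_subject, ss_error, ...) are our own; the final lines cross-check the hand-computed F-statistic against aov().

```r
# Regenerate the simulated reaction times used above
set.seed(42)
n <- 10   # subjects
k <- 3    # conditions
no_caffeine <- rnorm(n, mean = 350, sd = 30)
moderate_caffeine <- rnorm(n, mean = 320, sd = 30)
high_caffeine <- rnorm(n, mean = 300, sd = 30)
y <- cbind(None = no_caffeine, Moderate = moderate_caffeine, High = high_caffeine)

# Partition of the total sum of squares
grand <- mean(y)
ss_total     <- sum((y - grand)^2)
ss_subject   <- k * sum((rowMeans(y) - grand)^2)      # goes to the "Error: subject" stratum
ss_condition <- n * sum((colMeans(y) - grand)^2)      # condition effect
ss_error     <- ss_total - ss_subject - ss_condition  # subject:condition residual

# Within-subjects F-test
df_condition <- k - 1
df_error     <- (n - 1) * (k - 1)
F_value <- (ss_condition / df_condition) / (ss_error / df_error)
p_value <- pf(F_value, df_condition, df_error, lower.tail = FALSE)

# Cross-check against aov() with the Error(subject/condition) term
long <- data.frame(subject = factor(rep(1:n, k)),
                   condition = factor(rep(colnames(y), each = n)),
                   reaction_time = as.vector(y))
aov_table <- summary(aov(reaction_time ~ condition + Error(subject/condition), data = long))
aov_F <- aov_table[["Error: subject:condition"]][[1]][["F value"]][1]
c(F_manual = F_value, F_aov = aov_F, p = p_value)
```

Subtracting ss_subject before forming the error term is exactly what distinguishes this test from a One-way ANOVA on the same 30 values.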
129.2.3.1 Sphericity
A critical assumption of Repeated Measures ANOVA is sphericity, which requires that the variances of the differences between all pairs of conditions are equal. Mauchly’s test (Mauchly 1940) is used to assess this assumption:
# Requires the car package: install.packages("car")
library(car)
# Wide format for Mauchly's test: one column per condition
wide_data <- data.frame(none = no_caffeine,
                        moderate = moderate_caffeine,
                        high = high_caffeine)
# Within-subjects design: one level of 'condition' per response column
idata <- data.frame(condition = factor(c("None", "Moderate", "High"),
                                       levels = c("None", "Moderate", "High")))
# Multivariate approach: intercept-only linear model for the three responses
mlm <- lm(cbind(none, moderate, high) ~ 1, data = wide_data)
rm_anova <- Anova(mlm, idata = idata, idesign = ~condition, type = "III")
summary(rm_anova, multivariate = FALSE, univariate = TRUE)
Univariate Type III Repeated-Measures ANOVA Assuming Sphericity
Sum Sq num Df Error SS den Df F value Pr(>F)
(Intercept) 3176378 1 10648 9 2684.8061 1.866e-12 ***
condition 27338 2 27368 18 8.9903 0.001963 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Mauchly Tests for Sphericity
Test statistic p-value
condition 0.57711 0.11092
Greenhouse-Geisser and Huynh-Feldt Corrections
for Departure from Sphericity
GG eps Pr(>F[GG])
condition 0.70279 0.00652 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
HF eps Pr(>F[HF])
condition 0.7937342 0.004505291
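Sphericity is a statement about the variances of the pairwise differences, so these can be inspected directly. The sketch below regenerates the simulated data and computes the three variances (the names in diff_var are our own). Because the three conditions were simulated independently with the same standard deviation (30), every difference has population variance \(2 \cdot 30^2 = 1800\), so sphericity holds by construction here; the sample variances merely scatter around that value, which is consistent with the non-significant Mauchly test.

```r
# Regenerate the simulated reaction times used above
set.seed(42)
n <- 10
none <- rnorm(n, mean = 350, sd = 30)
moderate <- rnorm(n, mean = 320, sd = 30)
high <- rnorm(n, mean = 300, sd = 30)

# Sample variance of each pairwise difference; under sphericity the
# corresponding population variances are all equal
diff_var <- c(
  none_vs_moderate = var(none - moderate),
  none_vs_high     = var(none - high),
  moderate_vs_high = var(moderate - high)
)
round(diff_var, 1)
```

Each variance equals var(a) + var(b) - 2 cov(a, b), which is why positively correlated repeated measures tend to produce smaller difference variances than independent samples.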
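When sphericity is violated (Mauchly’s test p-value \(< \alpha\)), one of two corrections is applied:
Greenhouse-Geisser correction (Greenhouse and Geisser 1959): more conservative; recommended when the sphericity estimate \(\hat{\varepsilon}\) is less than 0.75.
Huynh-Feldt correction (Huynh and Feldt 1976): less conservative; recommended when \(\hat{\varepsilon} > 0.75\).
Both corrections multiply the numerator and denominator degrees of freedom of the F-test by \(\hat{\varepsilon}\), which lies between \(1/(k-1)\) and 1 (the Huynh-Feldt estimate is capped at 1). This results in a larger, more conservative p-value. In the example above, Mauchly’s test is not significant (p = 0.111), so sphericity is not rejected and the uncorrected F-test can be reported; the corrected p-values (0.0065 for Greenhouse-Geisser, 0.0045 for Huynh-Feldt) lead to the same conclusion.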
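The corrections can be computed from the sample covariance matrix. As a sketch, assuming a single-group design with \(n\) subjects and \(k\) conditions, the Greenhouse-Geisser estimate is \(\hat{\varepsilon}_{GG} = \operatorname{tr}(S_c)^2 / \big((k-1)\operatorname{tr}(S_c^2)\big)\), where \(S_c\) is the covariance matrix projected onto \(k-1\) orthonormal contrasts (here R's contr.poly()), and the classic Huynh-Feldt estimate is \(\hat{\varepsilon}_{HF} = \big(n(k-1)\hat{\varepsilon}_{GG} - 2\big) / \big((k-1)(n - 1 - (k-1)\hat{\varepsilon}_{GG})\big)\). On the regenerated data this should reproduce the GG eps and HF eps reported by car above, up to rounding.

```r
# Regenerate the simulated reaction times used above
set.seed(42)
n <- 10
none <- rnorm(n, mean = 350, sd = 30)
moderate <- rnorm(n, mean = 320, sd = 30)
high <- rnorm(n, mean = 300, sd = 30)
y <- cbind(none, moderate, high)
k <- ncol(y)
p <- k - 1

# Covariance of the repeated measures, projected onto k - 1 orthonormal contrasts
C  <- contr.poly(k)
Sc <- t(C) %*% cov(y) %*% C

# Greenhouse-Geisser epsilon (sum(Sc * Sc) equals tr(Sc %*% Sc) for symmetric Sc)
eps_gg <- sum(diag(Sc))^2 / (p * sum(Sc * Sc))
# Classic Huynh-Feldt epsilon derived from it
eps_hf <- (n * p * eps_gg - 2) / (p * (n - 1 - p * eps_gg))

# Uncorrected within-subjects F-test
grand <- mean(y)
resid <- sweep(sweep(y, 1, rowMeans(y)), 2, colMeans(y)) + grand
ss_condition <- n * sum((colMeans(y) - grand)^2)
df1 <- p
df2 <- (n - 1) * p
F_value <- (ss_condition / df1) / (sum(resid^2) / df2)

# Corrected p-values: both degrees of freedom are multiplied by epsilon (HF capped at 1)
p_unc <- pf(F_value, df1, df2, lower.tail = FALSE)
p_gg  <- pf(F_value, eps_gg * df1, eps_gg * df2, lower.tail = FALSE)
p_hf  <- pf(F_value, min(1, eps_hf) * df1, min(1, eps_hf) * df2, lower.tail = FALSE)
round(c(eps_gg = eps_gg, eps_hf = eps_hf, p_unc = p_unc, p_gg = p_gg, p_hf = p_hf), 5)
```

The F-statistic itself is unchanged by either correction; only the reference distribution becomes more spread out.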
129.2.3.2 Post-hoc pairwise comparisons
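If the overall F-test is significant, post-hoc comparisons identify which specific conditions differ. Here, paired t-tests are computed for all three pairs of conditions, and the Bonferroni correction (each p-value multiplied by the number of comparisons and capped at 1) is used to control the family-wise type I error rate. In this example only the difference between no caffeine and high caffeine remains significant at \(\alpha = 0.05\):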
Pairwise comparisons using paired t tests
data: reaction_data$reaction_time and reaction_data$condition
None Moderate
Moderate 0.0883 -
High 0.0002 1.0000
P value adjustment method: bonferroni
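The data line of this output indicates that it comes from pairwise.t.test() applied to the long-format data frame. The sketch below reproduces that call and, for the None vs High pair, applies the Bonferroni adjustment by hand to show what the adjusted p-value means. It assumes the rows are sorted by subject within each condition, which pairwise.t.test() relies on when paired = TRUE.

```r
# Regenerate the long-format data used above
set.seed(42)
n_subjects <- 10
no_caffeine <- rnorm(n_subjects, mean = 350, sd = 30)
moderate_caffeine <- rnorm(n_subjects, mean = 320, sd = 30)
high_caffeine <- rnorm(n_subjects, mean = 300, sd = 30)
reaction_data <- data.frame(
  subject = factor(rep(1:n_subjects, 3)),
  condition = factor(rep(c("None", "Moderate", "High"), each = n_subjects),
                     levels = c("None", "Moderate", "High")),
  reaction_time = c(no_caffeine, moderate_caffeine, high_caffeine)
)
# Bonferroni-adjusted paired t-tests for all pairs of conditions
pw <- pairwise.t.test(reaction_data$reaction_time, reaction_data$condition,
                      paired = TRUE, p.adjust.method = "bonferroni")
pw
# Same adjustment by hand for None vs High: raw p-value times 3 comparisons, capped at 1
raw_p  <- t.test(no_caffeine, high_caffeine, paired = TRUE)$p.value
bonf_p <- min(1, 3 * raw_p)
bonf_p
```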
129.3 R code
To compute the Repeated Measures ANOVA on your local machine, the following script can be used in the R console:
# Example with the built-in sleep dataset
# (extra = increase in hours of sleep, group = drug, ID = subject)
data(sleep)
# Repeated measures ANOVA using aov() with Error term
rm_model <- aov(extra ~ group + Error(ID/group), data = sleep)
summary(rm_model)
# Post-hoc pairwise comparisons (Bonferroni-corrected)
pairwise.t.test(sleep$extra, sleep$group,
                paired = TRUE, p.adjust.method = "bonferroni")
Error: ID
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 9 58.08 6.453
Error: ID:group
Df Sum Sq Mean Sq F value Pr(>F)
group 1 12.482 12.482 16.5 0.00283 **
Residuals 9 6.808 0.756
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Pairwise comparisons using paired t tests
data: sleep$extra and sleep$group
1
2 0.0028
P value adjustment method: bonferroni
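Note that the Error(ID/group) term in the aov() formula splits the variability into two strata: between subjects (“Error: ID”) and within subjects (“Error: ID:group”). Because every subject is measured under every condition (conditions are crossed with subjects), the drug effect is tested against the within-subject residual mean square, so between-subject variability does not inflate the error term. With only two conditions, as in the sleep data, the Repeated Measures ANOVA is equivalent to the Paired Two Sample t-Test (Chapter 116): \(F = t^2\), and both tests give the same p-value (here 0.00283).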
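The equivalence with the paired t-test can be checked directly on the sleep data. The sketch below extracts the F value from the within-subject stratum of the aov() summary and compares it with the square of the paired t-statistic; it relies on the rows of sleep being ordered by ID within each group, which is the case for the built-in dataset.

```r
data(sleep)
rm_model <- aov(extra ~ group + Error(ID/group), data = sleep)
# F value of 'group' in the within-subject ("Error: ID:group") stratum
F_value <- summary(rm_model)[["Error: ID:group"]][[1]][["F value"]][1]
# Paired t-test on the same data (subjects in the same order in both groups)
drug1 <- sleep$extra[sleep$group == "1"]
drug2 <- sleep$extra[sleep$group == "2"]
t_value <- unname(t.test(drug2, drug1, paired = TRUE)$statistic)
c(F = F_value, t_squared = t_value^2)  # both about 16.5
```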
129.4 Assumptions
The Repeated Measures ANOVA makes the following assumptions:
Normality: The response should be approximately normally distributed within each condition; more precisely, the model residuals (and hence the differences between conditions) should be approximately normal. This can be checked using a QQ plot (Chapter 76) of the residuals.
Sphericity: The variances of the differences between all pairs of conditions should be equal. This is tested using Mauchly’s test (Mauchly 1940). When violated, use the Greenhouse-Geisser (Greenhouse and Geisser 1959) or Huynh-Feldt (Huynh and Feldt 1976) correction.
No significant outliers: Extreme values can distort the results. Outliers can be identified using a box plot (Chapter 69) of each condition.
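Both checks can be sketched in base R on the simulated reaction time data. In a balanced repeated measures design, the residuals of the additive model lm(reaction_time ~ subject + condition) coincide with the residuals of the within-subject (subject:condition) stratum, so they can be used for the QQ plot. The Shapiro-Wilk test is included only as a rough supplement, since these residuals are not independent (they sum to zero within each subject). Outliers are flagged with boxplot.stats(), which uses the same 1.5 x IQR rule as the box plot.

```r
# Regenerate the long-format data used above
set.seed(42)
n_subjects <- 10
reaction_data <- data.frame(
  subject = factor(rep(1:n_subjects, 3)),
  condition = factor(rep(c("None", "Moderate", "High"), each = n_subjects),
                     levels = c("None", "Moderate", "High")),
  reaction_time = c(rnorm(n_subjects, 350, 30), rnorm(n_subjects, 320, 30),
                    rnorm(n_subjects, 300, 30))
)
# Within-subject residuals via the additive subject + condition model
fit <- lm(reaction_time ~ subject + condition, data = reaction_data)
res <- residuals(fit)
qqnorm(res)
qqline(res)
shapiro.test(res)  # rough check only: residuals are not independent
# Box plot per condition and the points it would flag as outliers
boxplot(reaction_time ~ condition, data = reaction_data)
tapply(reaction_data$reaction_time, reaction_data$condition,
       function(x) boxplot.stats(x)$out)
```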
129.5 Alternatives
Friedman test (Chapter 130): A non-parametric alternative that does not require the normality or sphericity assumptions. It is based on the ranks of the responses within each subject rather than on the raw values.
Linear mixed-effects models: A more flexible approach that can handle missing data, unbalanced designs, and complex correlation structures. Available through the lme4 package in R.
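Both alternatives can be sketched in a few lines. friedman.test() (base R) accepts a formula of the form response ~ condition | subject. For the mixed model, this sketch uses nlme::lme() instead of lme4, because nlme ships with standard R installations and its anova() method reports p-values; lme4::lmer(extra ~ group + (1 | ID), data = sleep) fits the same random-intercept model, but its anova() method does not report p-values. The sleep data are used for the mixed model because the between-subject mean square exceeds the residual mean square there, so the subject variance estimate is positive. In that balanced situation the F-test for group should match the aov() result in the R code section (F ≈ 16.5).

```r
# Regenerate the long-format reaction time data used above
set.seed(42)
n_subjects <- 10
reaction_data <- data.frame(
  subject = factor(rep(1:n_subjects, 3)),
  condition = factor(rep(c("None", "Moderate", "High"), each = n_subjects),
                     levels = c("None", "Moderate", "High")),
  reaction_time = c(rnorm(n_subjects, 350, 30), rnorm(n_subjects, 320, 30),
                    rnorm(n_subjects, 300, 30))
)
# Friedman test: ranks the conditions within each subject
fr <- friedman.test(reaction_time ~ condition | subject, data = reaction_data)
fr

# Random-intercept linear mixed-effects model (nlme, shipped with R)
library(nlme)
data(sleep)
lme_fit <- lme(extra ~ group, random = ~ 1 | ID, data = sleep)
anova(lme_fit)
```

Unlike aov() with an Error term, lme() would still run if some subjects had missing measurements, which is the practical advantage mentioned above.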
Greenhouse, Samuel W., and Seymour Geisser. 1959. “On Methods in the Analysis of Profile Data.” Psychometrika 24 (2): 95–112. https://doi.org/10.1007/BF02289823.
Huynh, Huynh, and Leonard S. Feldt. 1976. “Estimation of the Box Correction for Degrees of Freedom from Sample Data in Randomized Block and Split-Plot Designs.” Journal of Educational Statistics 1 (1): 69–82. https://doi.org/10.3102/10769986001001069.
Mauchly, John W. 1940. “Significance Test for Sphericity of a Normal \(n\)-Variate Distribution.” The Annals of Mathematical Statistics 11 (2): 204–9. https://doi.org/10.1214/aoms/1177731915.