Table of contents

  • 129.1 Hypotheses
  • 129.2 Analysis based on p-values and confidence intervals
    • 129.2.1 Software
    • 129.2.2 Data & Parameters
    • 129.2.3 Output
  • 129.3 R code
  • 129.4 Assumptions
  • 129.5 Alternatives

129  Repeated Measures ANOVA

Repeated Measures ANOVA is a natural extension of the Paired Two Sample t-Test (Chapter 116) and is used when the same subjects are measured under three or more conditions or time points. In a within-subjects design, each participant serves as their own control, which reduces the effect of individual differences and increases statistical power.

129.1 Hypotheses

The Hypothesis Test can be written as follows:

\[ \begin{cases}\text{H}_0: \mu_1 = \mu_2 = \mu_3 = \ldots = \mu_k \\\text{H}_A: \exists\; i \neq j: \mu_i \neq \mu_j\end{cases} \]

where \(k\) is the number of conditions or time points and \(\mu_i\) is the population mean of condition \(i\).

In other words, we test whether the mean response is the same across all conditions. If the Null Hypothesis is rejected, at least two conditions have significantly different means. The test does not indicate which conditions differ; for that, post-hoc pairwise comparisons are needed.

Within-subjects design: Unlike One-way ANOVA (Chapter 126) where different subjects are assigned to each group, Repeated Measures ANOVA uses the same subjects measured multiple times. This design is common in:

  • Longitudinal studies (measurements at baseline, 3 months, 6 months)
  • Crossover trials (each patient receives all treatments in sequence)
  • Learning experiments (performance measured across multiple sessions)

129.2 Analysis based on p-values and confidence intervals

129.2.1 Software

The Repeated Measures ANOVA can be computed in RFC under the “Hypotheses / Empirical Tests” menu item (select “Repeated Measures ANOVA” from the ANOVA type dropdown), or by using the R code shown below.

129.2.2 Data & Parameters

The data for Repeated Measures ANOVA can be organized in two formats:

  • Wide format: Each row represents one subject, and each column represents a condition/time point. This format is convenient for multivariate formulations of the repeated-measures model and for diagnostics.
  • Long format: Each row represents one observation, with separate columns for the subject identifier, the condition, and the response variable. This is the format commonly used with aov(... + Error(subject/condition)).
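The two formats are interchangeable. As a sketch (with a small hypothetical dataset and base R's reshape(); the column names are illustrative), converting wide to long looks like this:

```r
# Hypothetical wide-format data: one row per subject, one column per time point
wide <- data.frame(
  subject = factor(1:4),
  t1 = c(5.1, 4.8, 5.5, 5.0),
  t2 = c(5.9, 5.2, 6.1, 5.4),
  t3 = c(6.4, 5.9, 6.8, 6.0)
)

# Convert to long format: one row per observation, with columns for
# the subject identifier, the condition, and the response
long <- reshape(wide,
                direction = "long",
                varying   = c("t1", "t2", "t3"),
                v.names   = "response",
                timevar   = "condition",
                times     = c("t1", "t2", "t3"),
                idvar     = "subject")
long$condition <- factor(long$condition)
head(long)
```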

The key parameters are:

  • Response variable: the quantitative measurement of interest
  • Within-subjects factor: a categorical variable identifying the condition or time point
  • Subject identifier: a variable that identifies which measurements belong to the same subject

129.2.3 Output

Consider the problem of measuring the reaction time (in milliseconds) of 10 subjects under three different conditions: no caffeine, moderate caffeine, and high caffeine. Each subject is tested under all three conditions. The results from the Repeated Measures ANOVA analysis are shown below.


The output includes the within-subjects F-test, Mauchly’s test for sphericity (with Greenhouse-Geisser and Huynh-Feldt corrections), and post-hoc pairwise paired t-tests with Bonferroni correction. The plots show box plots per condition and subject profile (spaghetti) plots.

The same analysis can also be replicated with R code:

# Simulated reaction time data (wide format)
set.seed(42)
n_subjects <- 10
no_caffeine <- rnorm(n_subjects, mean = 350, sd = 30)
moderate_caffeine <- rnorm(n_subjects, mean = 320, sd = 30)
high_caffeine <- rnorm(n_subjects, mean = 300, sd = 30)

# Create long-format data frame for aov(... + Error(subject/condition))
reaction_data <- data.frame(
  subject = factor(rep(1:n_subjects, 3)),
  condition = factor(rep(c("None", "Moderate", "High"), each = n_subjects),
                     levels = c("None", "Moderate", "High")),
  reaction_time = c(no_caffeine, moderate_caffeine, high_caffeine)
)

# Fit repeated measures ANOVA using aov() in long format with Error term
rm_aov <- aov(reaction_time ~ condition + Error(subject/condition), data = reaction_data)
summary(rm_aov)

Error: subject
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  9  10648    1183               

Error: subject:condition
          Df Sum Sq Mean Sq F value  Pr(>F)   
condition  2  27338   13669    8.99 0.00196 **
Residuals 18  27367    1520                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The ANOVA table shows the within-subjects F-test for the “condition” factor. The F-statistic is the ratio of the variance explained by the condition effect to the residual variance (after removing the between-subjects variance). If the p-value is smaller than the chosen type I error \(\alpha = 0.05\), we reject the Null Hypothesis and conclude that caffeine intake has a significant effect on reaction time.
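As a sanity check, the F-statistic and p-value can be reproduced by hand from the sums of squares in the Error: subject:condition stratum above:

```r
# Values taken from the Error: subject:condition stratum of the output
ss_condition <- 27338; df_condition <- 2
ss_residual  <- 27367; df_residual  <- 18

ms_condition <- ss_condition / df_condition   # mean square of the effect
ms_residual  <- ss_residual  / df_residual    # mean square error

f_value <- ms_condition / ms_residual         # ratio of mean squares
p_value <- pf(f_value, df_condition, df_residual, lower.tail = FALSE)

round(f_value, 2)   # matches the 8.99 in the table
round(p_value, 5)   # matches the reported Pr(>F) of about 0.002
```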

129.2.3.1 Sphericity

A critical assumption of Repeated Measures ANOVA is sphericity, which requires that the variances of the differences between all pairs of conditions are equal. Mauchly’s test (Mauchly 1940) is used to assess this assumption:

# Wide format for Mauchly's test
wide_data <- data.frame(
  none = no_caffeine,
  moderate = moderate_caffeine,
  high = high_caffeine
)

# Using a multivariate approach for Mauchly's test
idata <- data.frame(condition = factor(c("None", "Moderate", "High"),
                                        levels = c("None", "Moderate", "High")))
mlm <- lm(cbind(none, moderate, high) ~ 1, data = wide_data)
library(car)
rm_anova <- Anova(mlm, idata = idata, idesign = ~condition, type = "III")
summary(rm_anova, multivariate = FALSE, univariate = TRUE)

Univariate Type III Repeated-Measures ANOVA Assuming Sphericity

             Sum Sq num Df Error SS den Df   F value    Pr(>F)    
(Intercept) 3176378      1    10648      9 2684.8061 1.866e-12 ***
condition     27338      2    27368     18    8.9903  0.001963 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Mauchly Tests for Sphericity

          Test statistic p-value
condition        0.57711 0.11092


Greenhouse-Geisser and Huynh-Feldt Corrections
 for Departure from Sphericity

           GG eps Pr(>F[GG])   
condition 0.70279    0.00652 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

             HF eps  Pr(>F[HF])
condition 0.7937342 0.004505291

When sphericity is violated (Mauchly’s test p-value \(< \alpha\)), two corrections are available:

  • Greenhouse-Geisser correction (Greenhouse and Geisser 1959): More conservative, recommended when the sphericity estimate \(\hat{\varepsilon}\) is less than 0.75.
  • Huynh-Feldt correction (Huynh and Feldt 1976): Less conservative, recommended when \(\hat{\varepsilon} > 0.75\).

Both corrections adjust the degrees of freedom of the F-test downward, resulting in a larger (more conservative) p-value.
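The correction itself is simple: the F-statistic is unchanged, and both degrees of freedom are multiplied by the estimated \(\hat{\varepsilon}\) before the p-value is looked up. A sketch using the values reported above:

```r
# Uncorrected within-subjects test from the output above
f_value <- 8.9903
df1 <- 2; df2 <- 18

# Greenhouse-Geisser epsilon reported by car::Anova()
gg_eps <- 0.70279

# Corrected p-value: same F, epsilon-scaled degrees of freedom
p_gg <- pf(f_value, gg_eps * df1, gg_eps * df2, lower.tail = FALSE)
round(p_gg, 5)   # close to the 0.00652 reported above
```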

129.2.3.2 Post-hoc pairwise comparisons

If the overall F-test is significant, post-hoc comparisons identify which specific conditions differ. The Bonferroni correction is used to control the family-wise type I error:

# Pairwise paired t-tests with Bonferroni correction
pairwise.t.test(reaction_data$reaction_time, reaction_data$condition,
                paired = TRUE, p.adjust.method = "bonferroni")

    Pairwise comparisons using paired t tests 

data:  reaction_data$reaction_time and reaction_data$condition 

         None   Moderate
Moderate 0.0883 -       
High     0.0002 1.0000  

P value adjustment method: bonferroni 
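The Bonferroni adjustment simply multiplies each raw p-value by the number of comparisons (three pairs here) and caps the result at 1; that cap is why the Moderate vs. High comparison is reported as exactly 1.0000. With hypothetical raw p-values:

```r
# Hypothetical raw p-values for the three pairwise comparisons
p_raw <- c(none_vs_moderate = 0.0294,
           none_vs_high     = 0.00007,
           moderate_vs_high = 0.62)

# Bonferroni by hand: multiply by the number of tests, cap at 1
p_manual <- pmin(1, p_raw * length(p_raw))

# p.adjust() gives the same result
p_adj <- p.adjust(p_raw, method = "bonferroni")
all.equal(p_manual, p_adj)   # TRUE
```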

129.3 R code

To compute the Repeated Measures ANOVA on your local machine, the following script can be used in the R console:

# Example with the built-in sleep dataset
# (extra = increase in hours of sleep, group = drug, ID = subject)
data(sleep)

# Repeated measures ANOVA using aov() with Error term
rm_model <- aov(extra ~ group + Error(ID/group), data = sleep)
summary(rm_model)

# Post-hoc pairwise comparisons (Bonferroni-corrected)
pairwise.t.test(sleep$extra, sleep$group,
                paired = TRUE, p.adjust.method = "bonferroni")

Error: ID
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  9  58.08   6.453               

Error: ID:group
          Df Sum Sq Mean Sq F value  Pr(>F)   
group      1 12.482  12.482    16.5 0.00283 **
Residuals  9  6.808   0.756                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Pairwise comparisons using paired t tests 

data:  sleep$extra and sleep$group 

  1     
2 0.0028

P value adjustment method: bonferroni 

Note that the Error(ID/group) term in the aov() formula splits the variability into a between-subjects stratum (ID) and a within-subjects stratum (ID:group). Although the notation reads as nesting, the conditions are crossed with subjects in a repeated-measures design; the Error term removes between-subject variability from the within-subject test, which is what gives the design its extra power.

129.4 Assumptions

The Repeated Measures ANOVA makes the following assumptions:

  • Normality: The distribution of the response variable should be approximately normal within each condition, or equivalently, the differences between conditions should be approximately normally distributed. This can be checked using a QQ plot (Chapter 76) of the residuals.
  • Sphericity: The variances of the differences between all pairs of conditions should be equal. This is tested using Mauchly’s test (Mauchly 1940). When violated, use the Greenhouse-Geisser (Greenhouse and Geisser 1959) or Huynh-Feldt (Huynh and Feldt 1976) correction.
  • No significant outliers: Extreme values can distort the results. Outliers can be identified using a box plot (Chapter 69) of each condition.
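The normality assumption can also be checked numerically, e.g. with a Shapiro-Wilk test on each pairwise condition difference (a sketch using the simulated reaction-time data from above; a small p-value flags a departure from normality):

```r
# Recreate the simulated reaction-time data from the output example
set.seed(42)
n_subjects <- 10
no_caffeine       <- rnorm(n_subjects, mean = 350, sd = 30)
moderate_caffeine <- rnorm(n_subjects, mean = 320, sd = 30)
high_caffeine     <- rnorm(n_subjects, mean = 300, sd = 30)

# Shapiro-Wilk test on each pairwise difference between conditions
diffs <- cbind(
  none_moderate = no_caffeine - moderate_caffeine,
  moderate_high = moderate_caffeine - high_caffeine,
  none_high     = no_caffeine - high_caffeine
)
p_values <- apply(diffs, 2, function(d) shapiro.test(d)$p.value)
round(p_values, 3)
```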

129.5 Alternatives

  • Friedman test (Chapter 130): A non-parametric alternative that does not require normality or sphericity assumptions. Based on ranks rather than raw values.
  • Linear mixed-effects models: A more flexible approach that can handle missing data, unbalanced designs, and complex correlation structures. Available through the lme4 package in R.
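As a sketch of the mixed-model route (here using the nlme package, which ships with R, rather than lme4, applied to the sleep data from Section 129.3), a random intercept per subject reproduces the within-subjects F-test in this balanced design:

```r
library(nlme)
data(sleep)

# Random intercept for each subject; fixed effect of drug (group)
lme_model <- lme(extra ~ group, random = ~ 1 | ID, data = sleep)
anova(lme_model)   # group: F ~ 16.5 on 1 and 9 df, matching aov() above
```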
Greenhouse, Samuel W., and Seymour Geisser. 1959. “On Methods in the Analysis of Profile Data.” Psychometrika 24 (2): 95–112. https://doi.org/10.1007/BF02289823.
Huynh, Huynh, and Leonard S. Feldt. 1976. “Estimation of the Box Correction for Degrees of Freedom from Sample Data in Randomized Block and Split-Plot Designs.” Journal of Educational Statistics 1 (1): 69–82. https://doi.org/10.3102/10769986001001069.
Mauchly, John W. 1940. “Significance Test for Sphericity of a Normal \(n\)-Variate Distribution.” The Annals of Mathematical Statistics 11 (2): 204–9. https://doi.org/10.1214/aoms/1177731915.

© 2026 Patrick Wessa. Provided as-is, without warranty.
