Table of contents

  • 124.1 Pearson Chi-Squared Test
    • 124.1.1 Hypotheses
    • 124.1.2 Analysis based on p-values -- Software
    • 124.1.3 Analysis based on p-values -- Data & Parameters
    • 124.1.4 Analysis based on p-values -- Output
  • 124.2 Assumptions
  • 124.3 Alternatives
  • 124.4 Monte Carlo (Simulation-Based) Pearson Chi-Squared Test
    • 124.4.1 Hypotheses
    • 124.4.2 Analysis based on p-values
  • 124.5 Assumptions
  • 124.6 Alternatives
  • 124.7 Fisher’s Exact Test
    • 124.7.1 When to Use
    • 124.7.2 Hypotheses
    • 124.7.3 How It Works
    • 124.7.4 Analysis based on p-values – Software
    • 124.7.5 R Code
    • 124.7.6 Assumptions
    • 124.7.7 When to Prefer Over Pearson Chi-Squared
  • 124.8 Association vs. Marginal Homogeneity vs. Symmetry
    • 124.8.1 Choosing the test
  • 124.9 Worked Examples
    • 124.9.1 Example A (2×2, unpaired association): Smoking vs low birth weight
    • 124.9.2 Example B (2×2, paired outcomes): same students in Maths vs Stats
    • 124.9.3 Example C (\(k\times k\), paired): marginal homogeneity via Stuart-Maxwell
    • 124.9.4 Example D (\(k\times k\), paired): symmetry via Bowker (McNemar-Bowker)

124  Chi-Squared Tests for Count Data

124.1 Pearson Chi-Squared Test

The Chi-Squared Tests for Count Data that fall under the category of Pearson Chi-Squared Tests (Pearson 1900) can be subdivided into two types: a goodness-of-fit test and an independence test. The first type is based on a one-dimensional test statistic and is typically used to test whether the observed frequencies differ from a theoretical distribution (e.g. a normality test). The second type is computed for two-dimensional contingency tables and tests whether the variable represented in the rows is independent of the variable shown in the columns of the contingency table.

For continuous data, an alternative goodness-of-fit test is the Kolmogorov-Smirnov test (Chapter 125), which does not require binning the data into categories and preserves all information in the sample. When parameters are estimated from the same sample, use a Lilliefors-corrected KS test (nortest::lillie.test) or the Anderson-Darling test, because standard KS critical values are invalid in that case.

The goodness of fit test statistic is defined as

\[ \chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \]

where \(O_i\) denotes the number of observations in category \(i\), \(E_i\) is the expected/theoretical frequency for category \(i\), and \(k\) is the number of categories. This test statistic follows a Chi-Squared Distribution with \(k - p - 1\) degrees of freedom, where \(p\) is the number of parameters estimated to define the theoretical distribution (e.g. for the Normal Distribution \(p = 2\)).
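
As a minimal sketch of this formula (with made-up counts, not taken from the chapter's data), the statistic can be computed by hand and checked against R's built-in chisq.test:

```r
# Hypothetical counts from 60 rolls of a die; H0: all faces equally likely
O <- c(8, 12, 9, 11, 6, 14)
E <- rep(sum(O) / 6, 6)      # expected frequency per face under H0

chi2 <- sum((O - E)^2 / E)   # Pearson goodness-of-fit statistic
df   <- length(O) - 1        # no parameters estimated, so df = k - 0 - 1
p    <- pchisq(chi2, df = df, lower.tail = FALSE)
c(statistic = chi2, df = df, p.value = p)

# The built-in test reproduces the same statistic and p-value
chisq.test(O, p = rep(1/6, 6))
```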

The independence test statistic for a two-dimensional contingency table is defined as

\[ \chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \]

where \(r\) is the number of rows, \(c\) is the number of columns, and \(O_{ij}\) and \(E_{ij}\) are the observed and expected frequencies in row \(i\) and column \(j\) of the contingency table. This test statistic follows a Chi-Squared Distribution with \((r-1)(c-1)\) degrees of freedom.
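
The expected frequencies are products of the margins, \(E_{ij} = (\text{row } i \text{ total}) \times (\text{column } j \text{ total}) / n\). The following sketch (with a hypothetical 2×3 table, not from the chapter's data) computes them and the statistic by hand and verifies the result against chisq.test:

```r
# Hypothetical 2x3 contingency table
O <- matrix(c(20, 30, 10,
              25, 15, 20), nrow = 2, byrow = TRUE)

n <- sum(O)
E <- outer(rowSums(O), colSums(O)) / n   # E_ij = row total * column total / n

chi2 <- sum((O - E)^2 / E)
df   <- (nrow(O) - 1) * (ncol(O) - 1)
p    <- pchisq(chi2, df = df, lower.tail = FALSE)
c(statistic = chi2, df = df, p.value = p)

# Same result (no Yates correction) from the built-in test
chisq.test(O, correct = FALSE)
```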

Note that the Pearson Chi-Squared value is closely related to the Pearson Phi Coefficient or Matthews Correlation (Section 71.7), which is typically used for the Confusion Matrix of Binomial Classification problems (Chapter 58). The relationship can be formulated as follows:

\[ \phi^2 = \frac{\chi^2}{n} \]

where \(n\) is the total number of observations.

124.1.1 Hypotheses

The Null Hypothesis states that observed frequencies match expected frequencies under the model:

  • Goodness-of-fit setting: H\(_0\): the specified distribution fits the data.
  • Independence setting: H\(_0\): the variables are independent.

The Alternative Hypothesis states that the distribution does not fit (goodness-of-fit) or that the variables are associated (independence).

124.1.2 Analysis based on p-values -- Software

The Chi-Squared Test R module can be found on the publicly available website:

  • https://compute.wessa.net/rwasp_chi_squared_tests.wasp

The R Module is also available in RFC under the “Hypotheses / Empirical Tests” menu item.

124.1.3 Analysis based on p-values -- Data & Parameters

This R module contains the following fields:

  • Data X: a multivariate dataset containing quantitative data
  • Names of X columns: a space delimited list of names (one name for each column)
  • Factor 1: a positive integer value of the column in the multivariate dataset which corresponds to the first sample
  • Factor 2: a positive integer value of the column in the multivariate dataset which corresponds to the second sample
  • Type of test to use. This parameter can be set to the following values:
    • Pearson Chi-Squared
    • Monte Carlo (simulation-based) Pearson Chi-Squared (labeled “Exact Pearson Chi-Squared by Simulation” in the module)
    • McNemar Chi-Squared
    • Stuart-Maxwell Marginal Homogeneity
    • Bowker Symmetry (McNemar-Bowker)
    • Fisher Exact Test

124.1.4 Analysis based on p-values -- Output

Consider the case where we wish to investigate the association between smoking and low birth weight of infants. Both variables are coded as binary numbers, i.e.:

  • low birth weight (\(< 2.5\) kg) corresponds to low = 1 (0 otherwise)
  • when the mother smokes cigarettes then smoke = 1 (0 otherwise)

The total number of observations and the expected cell frequencies are sufficiently large for the Pearson Chi-Squared Test to be used. The p-value is 3.958%, which is (for most researchers) small enough to reject the Null Hypothesis. We conclude that smoking and low birth weight are associated.

For reporting, include an association effect size in addition to the p-value. For an \(r \times c\) table:

\[ V = \sqrt{\frac{\chi^2}{N\,\min(r-1, c-1)}} \]

where \(V\) is Cramer’s \(V\) (Cramér 1946) (for \(2\times2\) tables, this reduces to \(\phi\)).
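
Both \(\phi\) and Cramér's \(V\) follow directly from the uncorrected Chi-Squared statistic. As a sketch, here they are computed for the smoking / low birth weight data (MASS::birthwt) used in this section; for a \(2\times2\) table the two measures coincide:

```r
library(MASS)   # birthwt data, as in the script for this section

tab  <- table(birthwt$smoke, birthwt$low)
n    <- sum(tab)
chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)

phi <- sqrt(chi2 / n)                        # phi^2 = chi^2 / n
V   <- sqrt(chi2 / (n * min(dim(tab) - 1)))  # Cramer's V

c(chi.squared = chi2, phi = phi, cramers.V = V)
```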

To compute the Chi-Squared Test on your local machine, the following script can be used in the R console:

library(MASS)
library(vcd)
x <- birthwt
x$smoke <- factor(x$smoke, levels = c(0, 1), labels = c("no", "yes"))
x$low <- factor(x$low, levels = c(0, 1), labels = c("normal", "low"))
par3 = 'Pearson Chi-Squared' #Type of test to use
main = 'Association Plot'
simulate.p.value=FALSE
B = 2000
if (par3 %in% c('Monte Carlo (simulation-based) Pearson Chi-Squared',
                'Exact Pearson Chi-Squared by Simulation')) simulate.p.value=TRUE
cat("Actual frequencies\n")
(table1 <- table(x$smoke, x$low))
assoc(table1, shade = TRUE)

#Pearson's Chi-squared test
(cst<-chisq.test(table1, simulate.p.value=simulate.p.value, B=B))
cat("\nExpected frequencies\n")
print(cst$exp)
if (par3 == 'McNemar Chi-Squared') {
  mcnemar.test(table1)
}
if (par3=='Fisher Exact Test') {
  fisher.test(table1)
}
Actual frequencies
     
      normal low
  no      86  29
  yes     44  30

    Pearson's Chi-squared test with Yates' continuity correction

data:  table1
X-squared = 4.2359, df = 1, p-value = 0.03958


Expected frequencies
     
        normal      low
  no  79.10053 35.89947
  yes 50.89947 23.10053

124.2 Assumptions

The following assumptions are made:

  • The observations are obtained from a simple random sample.
  • The sample size must be sufficiently large.
  • For the Chi-Squared approximation to be reliable, no more than 20% of expected frequencies should be below 5 and none should be below 1; otherwise use a simulation-based Pearson test or Fisher’s Exact Test.
  • The observations are independent of each other. This also implies that the samples must be unpaired.

The main advantage of the Chi-Squared Tests is that they can be used with qualitative (i.e. categorical) data. If one employs these tests with quantitative data, one must first define categories (which implies that information is lost).

124.3 Alternatives

If expected cell counts are too small, use Fisher’s Exact Test or the simulation-based Pearson test (both discussed below). For paired binary data, use McNemar’s Test. For paired square tables with more than two categories, use Stuart-Maxwell (marginal homogeneity) and/or Bowker (symmetry), depending on the hypothesis of interest.

If originally quantitative data were categorized, consider using methods on the original scale (e.g. correlation or regression) to avoid information loss.

124.4 Monte Carlo (Simulation-Based) Pearson Chi-Squared Test

124.4.1 Hypotheses

The hypotheses are identical to those of the Pearson Chi-Squared Test (Section 124.1).

124.4.2 Analysis based on p-values

Consider student survey results collected many years ago. We wish to test whether female and male students differ in their favorite choice of Internet browser. In the R module, select the AMS dataset in the Input tab, click on Chi-squared Test, select Browser as X variable, select gender as Y variable, and choose the Monte Carlo (simulation-based) Pearson method (labeled “Exact Pearson Chi-Squared by Simulation” in the app) in the “Type of test to use” box.


The expected cell frequencies in the output table are too low for the Pearson Chi-Squared Test to be used.¹ Therefore, it is necessary to use a Monte Carlo simulation method to obtain a reliable approximate p-value.

Using Monte Carlo simulation (\(B = 2000\) replicates), the p-value is approximately 3.298%, which is small enough to reject the Null Hypothesis; we conclude that the choice of browser differs between males and females. Results vary slightly across runs due to Monte Carlo simulation error. This contrast also shows how anti-conservative the Chi-squared approximation can become when expected cell counts are too low: if you select the ordinary Pearson Chi-squared method, the p-value drops to 0.1777%, so we could have reached the opposite conclusion had the chosen type I error been 3% instead of 5%.

To compute the Chi-Squared Test by Simulation on your local machine, the following parameter should be changed in the previous script:

par3 = 'Monte Carlo (simulation-based) Pearson Chi-Squared'
B = 2000
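
For intuition, the simulation method can also be sketched by hand: r2dtable draws random tables with the observed margins, and the p-value is the fraction of simulated statistics at least as extreme as the observed one (with the +1 correction that chisq.test also applies). This sketch reuses the birthwt table from Section 124.1.4, since the AMS survey data are not reproduced here:

```r
library(MASS)   # birthwt data from Section 124.1.4

tab <- table(birthwt$smoke, birthwt$low)
obs <- unname(chisq.test(tab, correct = FALSE)$statistic)

set.seed(1)
B <- 2000
# Random tables with the same row and column totals as the observed table
sim  <- r2dtable(B, rowSums(tab), colSums(tab))
stat <- sapply(sim, function(m) {
  E <- outer(rowSums(m), colSums(m)) / sum(m)
  sum((m - E)^2 / E)
})

# Monte Carlo p-value with the usual +1 correction
(p_mc <- (1 + sum(stat >= obs)) / (B + 1))
```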

124.5 Assumptions

The assumptions are the same as for the Pearson Chi-Squared Test except for the requirement that the expected cell frequencies are sufficiently large.

124.6 Alternatives

The alternatives are the same as for the Pearson Chi-Squared Test (Section 124.3).

124.7 Fisher’s Exact Test

Fisher’s Exact Test (Fisher 1935) is used to test the association between two categorical variables in a contingency table when sample sizes are small or when expected cell counts fall below 5. Unlike the Pearson Chi-Squared test, which relies on a large-sample approximation, Fisher’s Exact Test computes the exact p-value using the hypergeometric distribution.

124.7.1 When to Use

Fisher’s Exact Test is preferred over the Pearson Chi-Squared Test when:

  • One or more expected cell frequencies are less than 5
  • You want an exact p-value rather than an asymptotic approximation

For large samples with adequate expected cell counts, the Pearson Chi-Squared Test and Fisher’s Exact Test give virtually identical results. However, Fisher’s Exact Test is always valid regardless of sample size, while the Pearson approximation may be unreliable for small samples.

124.7.2 Hypotheses

The hypotheses are the same as for the Pearson Chi-Squared independence test (Section 124.1):

\[ H_0: \text{The two variables are independent (no association)} \]

\[ H_A: \text{The two variables are not independent (there is an association)} \]

124.7.3 How It Works

For a 2×2 contingency table, given fixed row and column totals (marginals), the probability of observing any particular table configuration follows the hypergeometric distribution:

\[ P = \frac{\binom{R_1}{a} \binom{R_2}{c}}{\binom{N}{C_1}} \]

where \(a\) is the observed count in the first cell, \(c = C_1-a\) is the observed count in the second row/first column cell, \(R_1\) and \(R_2\) are the row totals, \(C_1\) is the first column total, and \(N\) is the grand total.

The p-value is computed as the sum of probabilities of all table configurations that are as extreme as, or more extreme than, the observed table.
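
This enumeration takes only a few lines of R with dhyper; the margins below are those of the small Drug/Placebo example table used in the R Code section of this chapter (\(a = 3\), \(R_1 = 4\), \(R_2 = 6\), \(C_1 = 4\)):

```r
a_obs <- 3; R1 <- 4; R2 <- 6; C1 <- 4

# All admissible values of the first cell given the fixed margins
a_all <- max(0, C1 - R2):min(R1, C1)

# Hypergeometric probability of each table configuration
p_tab <- dhyper(a_all, R1, R2, C1)

# Two-sided p-value: sum over tables no more probable than the observed one
# (the same rule, including the small tolerance, that fisher.test uses)
p_obs <- dhyper(a_obs, R1, R2, C1)
(p_two <- sum(p_tab[p_tab <= p_obs * (1 + 1e-7)]))

# Agrees with the built-in test
fisher.test(matrix(c(3, 1, 1, 5), nrow = 2))$p.value
```

Both give p \(\approx\) 0.1905, matching the fisher.test output shown in the R Code section.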

124.7.4 Analysis based on p-values – Software

Fisher’s Exact Test is available in the same R module described in Section 124.1.2 by setting the “Type of test to use” parameter to “Fisher Exact Test”.

124.7.5 R Code

To compute Fisher’s Exact Test on your local machine:

# Create a 2x2 contingency table
# Example: Treatment outcome by group
treatment_table <- matrix(c(3, 1, 1, 5), nrow = 2,
                          dimnames = list(Treatment = c("Drug", "Placebo"),
                                         Outcome = c("Improved", "Not improved")))
cat("Contingency Table:\n")
print(treatment_table)

cat("\nExpected frequencies:\n")
print(chisq.test(treatment_table)$expected)

cat("\nFisher's Exact Test:\n")
fisher.test(treatment_table)
Contingency Table:
         Outcome
Treatment Improved Not improved
  Drug           3            1
  Placebo        1            5

Expected frequencies:
         Outcome
Treatment Improved Not improved
  Drug         1.6          2.4
  Placebo      2.4          3.6

Fisher's Exact Test:

    Fisher's Exact Test for Count Data

data:  treatment_table
p-value = 0.1905
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
   0.40161 930.24692
sample estimates:
odds ratio 
  10.29391 

Note that several expected cell frequencies are below 5, making the Pearson Chi-Squared approximation unreliable. Fisher’s Exact Test provides a valid p-value in this situation.

124.7.6 Assumptions

  • The observations are independent
  • The row and column totals (marginals) are fixed by the study design, or treated as fixed for the purpose of inference
  • The data are categorical (nominal or ordinal)

124.7.7 When to Prefer Over Pearson Chi-Squared

Table 124.1: Choosing between Pearson Chi-Squared and Fisher’s Exact Test

  • All expected counts \(\geq\) 5: Pearson Chi-Squared (faster, same result)
  • Any expected count \(<\) 5: Fisher’s Exact Test
  • Large, sparse contingency table: Fisher’s Exact Test (or simulation-based Pearson)

124.8 Association vs. Marginal Homogeneity vs. Symmetry

For count data, three related but distinct null hypotheses are common:

\[ H_0^{\text{ind}}: X \perp Y \quad \text{(independence / association question)} \]

\[ H_0^{\text{mh}}: \pi_{i+} = \pi_{+i}\ \forall i \quad \text{(marginal homogeneity in paired square tables)} \]

\[ H_0^{\text{sym}}: \pi_{ij} = \pi_{ji}\ \forall i \neq j \quad \text{(symmetry in paired square tables)} \]

In plain language:

  • Independence asks whether row and column variables are associated in an unpaired table.
  • Marginal homogeneity asks whether pre/post (or rater A/rater B) marginal distributions are equal in a paired square table.
  • Symmetry asks whether opposite transition cells are equal: \((i \rightarrow j)\) versus \((j \rightarrow i)\).

124.8.1 Choosing the test

  • 2×2 unpaired; null hypothesis: independence; recommended test: Pearson Chi-Squared / Fisher Exact
  • 2×2 paired; null hypothesis: marginal homogeneity (equivalent to symmetry for 2×2); recommended test: McNemar
  • \(k\times k\) paired (\(k>2\)); null hypothesis: marginal homogeneity; recommended test: Stuart-Maxwell (optionally Bhapkar)
  • \(k\times k\) paired (\(k>2\)); null hypothesis: symmetry; recommended test: Bowker (McNemar-Bowker)

Key nuance:

  • For 2×2 paired tables, McNemar, marginal homogeneity, and symmetry coincide.
  • For \(k>2\) paired tables, Bowker (symmetry) and Stuart-Maxwell/Bhapkar (marginal homogeneity) are different hypotheses and can lead to different conclusions.

Warning: Common Pitfalls
  • Using Pearson independence on paired data and calling it a paired alternative to McNemar.
  • Treating Bowker symmetry and Stuart-Maxwell marginal homogeneity as interchangeable for \(k>2\).
  • Ignoring small expected counts in unpaired 2×2 tables (use Fisher or simulation when needed).
  • Interpreting Yates-corrected and uncorrected Pearson outputs as “contradictory” instead of approximation choices.
  • Reporting only p-values without an effect-size scale (e.g., odds ratio, risk difference, Cramer’s \(V\)).
  • Interpreting non-rejection of \(H_0\) as proof that assumptions hold exactly.

124.9 Worked Examples

124.9.1 Example A (2×2, unpaired association): Smoking vs low birth weight

# Explicit 2x2 table (rows = smoking, cols = low birth weight)
tabA <- matrix(c(30, 44,
                 29, 86),
               nrow = 2, byrow = TRUE,
               dimnames = list(Smoking = c("Yes", "No"),
                               LowBirthWeight = c("Low", "Normal")))
tabA

# Pearson (with and without Yates correction)
chisq.test(tabA, correct = TRUE)
chisq.test(tabA, correct = FALSE)

# Simulation-based Pearson p-value
set.seed(1)
chisq.test(tabA, simulate.p.value = TRUE, B = 1e5)

# Fisher exact test
fisher.test(tabA)

# Effect size: odds ratio (with CI from Fisher) and risk difference
or_fit <- fisher.test(tabA, conf.level = 0.95)
risk_smoke <- tabA["Yes", "Low"] / sum(tabA["Yes", ])
risk_no    <- tabA["No", "Low"] / sum(tabA["No", ])
rd <- risk_smoke - risk_no

cat("\nOdds ratio estimate (smoking vs non-smoking, low birth weight):",
    unname(or_fit$estimate), "\n")
cat("95% CI for odds ratio:",
    paste(round(or_fit$conf.int, 4), collapse = " to "), "\n")
cat("Risk difference (P(low|smoke) - P(low|no smoke)):", round(rd, 4), "\n")
       LowBirthWeight
Smoking Low Normal
    Yes  30     44
    No   29     86

    Pearson's Chi-squared test with Yates' continuity correction

data:  tabA
X-squared = 4.2359, df = 1, p-value = 0.03958


    Pearson's Chi-squared test

data:  tabA
X-squared = 4.9237, df = 1, p-value = 0.02649


    Pearson's Chi-squared test with simulated p-value (based on 1e+05
    replicates)

data:  tabA
X-squared = 4.9237, df = NA, p-value = 0.03717


    Fisher's Exact Test for Count Data

data:  tabA
p-value = 0.03618
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 1.028780 3.964904
sample estimates:
odds ratio 
  2.014137 


Odds ratio estimate (smoking vs non-smoking, low birth weight): 2.014137 
95% CI for odds ratio: 1.0288 to 3.9649 
Risk difference (P(low|smoke) - P(low|no smoke)): 0.1532 

124.9.2 Example B (2×2, paired outcomes): same students in Maths vs Stats

tabB <- matrix(c(52, 8,
                 18, 42),
               nrow = 2, byrow = TRUE,
               dimnames = list(Maths = c("Pass", "Fail"),
                               Stats = c("Pass", "Fail")))
tabB

# Correct paired test (McNemar)
mcnemar.test(tabB, correct = TRUE)

# Exact McNemar via Binomial on discordant pairs
b <- tabB["Pass", "Fail"]
c <- tabB["Fail", "Pass"]
cat("\nDiscordant pairs: b =", b, ", c =", c, "\n")
binom.test(x = b, n = b + c, p = 0.5, alternative = "two.sided")

# Independence test on the same table (different question)
chisq.test(tabB, correct = FALSE)
cat("\nNote: Pearson independence here tests association, not paired change asymmetry.\n")
      Stats
Maths  Pass Fail
  Pass   52    8
  Fail   18   42

    McNemar's Chi-squared test with continuity correction

data:  tabB
McNemar's chi-squared = 3.1154, df = 1, p-value = 0.07756


Discordant pairs: b = 8 , c = 18 

    Exact binomial test

data:  b and b + c
number of successes = 8, number of trials = 26, p-value = 0.07552
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.1432600 0.5178964
sample estimates:
probability of success 
             0.3076923 


    Pearson's Chi-squared test

data:  tabB
X-squared = 39.634, df = 1, p-value = 3.063e-10


Note: Pearson independence here tests association, not paired change asymmetry.

124.9.3 Example C (\(k\times k\), paired): marginal homogeneity via Stuart-Maxwell

tabC <- matrix(c(150, 54, 21,
                 27, 138, 42,
                 12, 48, 108),
               nrow = 3, byrow = TRUE,
               dimnames = list(Pre = c("Low", "Medium", "High"),
                               Post = c("Low", "Medium", "High")))
tabC

stuart_maxwell_manual <- function(tab) {
  # Stuart-Maxwell statistic:
  # X^2 = d' V^{-1} d
  # d_i = row_i - col_i (for i = 1,...,k-1)
  # V_ii = row_i + col_i - 2*n_ii
  # V_ij = -(n_ij + n_ji), i != j
  tab <- as.matrix(tab)
  stopifnot(nrow(tab) == ncol(tab), nrow(tab) >= 2)
  k <- nrow(tab)
  r <- rowSums(tab)
  c <- colSums(tab)
  d <- r - c
  idx <- seq_len(k - 1)
  V <- matrix(0, nrow = k - 1, ncol = k - 1)
  for(i in seq_len(k - 1)) {
    ii <- idx[i]
    V[i, i] <- r[ii] + c[ii] - 2 * tab[ii, ii]
    for(j in seq_len(k - 1)) {
      if(i == j) next
      jj <- idx[j]
      V[i, j] <- - (tab[ii, jj] + tab[jj, ii])
    }
  }
  d_red <- d[idx]
  stat <- as.numeric(t(d_red) %*% solve(V) %*% d_red)
  df <- k - 1
  p <- pchisq(stat, df = df, lower.tail = FALSE)
  list(statistic = stat, df = df, p.value = p, margins = cbind(row = r, col = c, diff = d))
}

sm <- stuart_maxwell_manual(tabC)
sm$margins
cat(sprintf("\nStuart-Maxwell: X-squared = %.4f, df = %d, p-value = %.6g\n",
            sm$statistic, sm$df, sm$p.value))

if(requireNamespace("DescTools", quietly = TRUE)) {
  cat("\nValidation (DescTools::StuartMaxwellTest):\n")
  print(DescTools::StuartMaxwellTest(tabC))
}
        Post
Pre      Low Medium High
  Low    150     54   21
  Medium  27    138   42
  High    12     48  108
       row col diff
Low    225 189   36
Medium 207 240  -33
High   168 171   -3

Stuart-Maxwell: X-squared = 11.8539, df = 2, p-value = 0.00266665

124.9.4 Example D (\(k\times k\), paired): symmetry via Bowker (McNemar-Bowker)

tabD <- matrix(c(80, 30, 5,
                 10, 70, 22,
                 25, 2, 75),
               nrow = 3, byrow = TRUE,
               dimnames = list(Pre = c("Low", "Medium", "High"),
                               Post = c("Low", "Medium", "High")))
tabD

bowker_manual <- function(tab) {
  # Bowker symmetry statistic:
  # X^2 = sum_{i<j} (n_ij - n_ji)^2 / (n_ij + n_ji)
  tab <- as.matrix(tab)
  stopifnot(nrow(tab) == ncol(tab), nrow(tab) >= 2)
  stat <- 0
  df <- 0
  for(i in seq_len(nrow(tab) - 1)) {
    for(j in (i + 1):ncol(tab)) {
      den <- tab[i, j] + tab[j, i]
      if(den <= 0) next
      stat <- stat + (tab[i, j] - tab[j, i])^2 / den
      df <- df + 1
    }
  }
  p <- pchisq(stat, df = df, lower.tail = FALSE)
  list(statistic = stat, df = df, p.value = p)
}

smD <- stuart_maxwell_manual(tabD)
bwD <- bowker_manual(tabD)

cat(sprintf("Stuart-Maxwell (marginal homogeneity): X-squared = %.4f, df = %d, p-value = %.6g\n",
            smD$statistic, smD$df, smD$p.value))
cat(sprintf("Bowker (symmetry): X-squared = %.4f, df = %d, p-value = %.6g\n",
            bwD$statistic, bwD$df, bwD$p.value))
cat("\nInterpretation: margins can be equal while symmetry fails.\n")

if(requireNamespace("DescTools", quietly = TRUE) &&
   "BowkerTest" %in% getNamespaceExports("DescTools")) {
  cat("\nValidation (DescTools::BowkerTest):\n")
  print(DescTools::BowkerTest(tabD))
}
        Post
Pre      Low Medium High
  Low     80     30    5
  Medium  10     70   22
  High    25      2   75
Stuart-Maxwell (marginal homogeneity): X-squared = 0.0000, df = 2, p-value = 1
Bowker (symmetry): X-squared = 40.0000, df = 3, p-value = 1.06551e-08

Interpretation: margins can be equal while symmetry fails.

In this example, row and column margins are equal (Stuart-Maxwell does not reject), but opposite off-diagonal transitions are not equal (Bowker rejects). This demonstrates explicitly that for \(k>2\), marginal homogeneity and symmetry are different hypotheses.

Cramér, Harald. 1946. Mathematical Methods of Statistics. Princeton Mathematical Series 9. Princeton: Princeton University Press.
Fisher, Ronald A. 1935. The Design of Experiments. Edinburgh: Oliver; Boyd.
Pearson, Karl. 1900. “On the Criterion That a Given System of Deviations from the Probable in the Case of a Correlated System of Variables Is Such That It Can Be Reasonably Supposed to Have Arisen from Random Sampling.” The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 50 (302): 157–75. https://doi.org/10.1080/14786440009463897.

¹ Note that there are three columns for gender because one respondent left the gender field blank.

© 2026 Patrick Wessa. Provided as-is, without warranty.
