The Chi-Squared Tests for Count Data, which fall under the category of Pearson Chi-Squared Tests (Pearson 1900), can be subdivided into two types: a goodness-of-fit test and an independence test. The first type is based on a one-dimensional test statistic and is typically used to test whether or not the observed frequencies differ from a theoretical distribution (e.g. a normality test). The second type is computed for two-dimensional contingency tables and tests whether or not the variable represented in the rows is independent of the variable shown in the columns of the contingency table.
For continuous data, an alternative goodness-of-fit test is the Kolmogorov-Smirnov test (Chapter 125), which does not require binning the data into categories and preserves all information in the sample. When parameters are estimated from the same sample, use a Lilliefors-corrected KS test (nortest::lillie.test) or the Anderson-Darling test, because standard KS critical values are invalid in that case.
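A brief sketch of why the correction matters, assuming the nortest package is installed (the simulated sample is hypothetical):

```r
# Naive KS vs the Lilliefors correction on a hypothetical simulated sample
set.seed(1)
x <- rnorm(100, mean = 5, sd = 2)
# Naive KS against a fitted normal: this p-value is NOT valid, because
# mean(x) and sd(x) were estimated from the same sample
ks.test(x, "pnorm", mean = mean(x), sd = sd(x))
# Lilliefors-corrected KS test accounts for the estimated parameters
if (requireNamespace("nortest", quietly = TRUE)) {
  print(nortest::lillie.test(x))
}
```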
The goodness-of-fit test statistic is defined as
\[
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
\]
where \(O_i\) denotes the number of observations in category \(i\), \(E_i\) is the expected/theoretical frequency of type \(i\), and \(k\) is the number of categories. The test statistic of this type follows a Chi-Squared Distribution with degrees of freedom equal to \(k - p - 1\), where \(p\) is the number of parameters estimated from the data to define the theoretical distribution (e.g. \(p = 2\) for the Normal Distribution).
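A minimal goodness-of-fit sketch with hypothetical counts (a fair six-sided die, so the model is fully specified and \(p = 0\)):

```r
# Hypothetical rolls of a six-sided die, n = 60: is the die fair?
observed <- c(12, 8, 11, 9, 10, 10)   # counts for faces 1..6
probs <- rep(1/6, 6)                  # fully specified model: p = 0, df = k - 1 = 5
chisq.test(observed, p = probs)
# The same statistic by hand: sum((O - E)^2 / E)
E <- sum(observed) * probs            # expected count 10 per face
sum((observed - E)^2 / E)             # equals 1, on df = 5
```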
The independence test statistic for a two-dimensional contingency table is defined as
\[
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
\]
where \(r\) is the number of rows, \(c\) is the number of columns, \(O_{ij}\) is the observed frequency in the \(i\)-th row of column \(j\), and \(E_{ij}\) is the expected frequency in the \(i\)-th row of column \(j\) of the contingency table. Under independence the expected frequencies are \(E_{ij} = R_i C_j / n\), where \(R_i\) and \(C_j\) are the row and column totals. The test statistic of this type follows a Chi-Squared Distribution with degrees of freedom equal to \((r-1)(c-1)\).
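The expected frequencies and the statistic can be verified from first principles; the counts below are the smoking / low-birth-weight table used later in this chapter:

```r
# Pearson independence statistic by hand (no continuity correction)
O <- matrix(c(86, 29,
              44, 30), nrow = 2, byrow = TRUE)   # birthwt counts used later on
n <- sum(O)
E <- outer(rowSums(O), colSums(O)) / n           # E_ij = R_i * C_j / n
E                                                # matches the expected frequencies below
chi2 <- sum((O - E)^2 / E)
df <- (nrow(O) - 1) * (ncol(O) - 1)
c(statistic = chi2, df = df,
  p.value = pchisq(chi2, df, lower.tail = FALSE))
```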
Note that the Pearson Chi-Squared value is closely related to the Pearson Phi Coefficient or Matthews Correlation (Section 71.7) which is typically used for the Confusion Matrix of Binomial Classification problems (Chapter 58). The relationship can be formulated as follows:
\[
\phi^2 = \frac{\chi^2}{n}
\]
where \(n\) is the total number of observations.
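The identity can be checked numerically; the 2×2 counts below are a hypothetical confusion matrix:

```r
# Numerical check of phi^2 = chi^2 / n on a hypothetical 2x2 table
tab <- matrix(c(20, 5,
                10, 25), nrow = 2, byrow = TRUE)
n <- sum(tab)
chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)  # uncorrected Pearson
# Matthews correlation / phi coefficient computed directly
phi <- (tab[1, 1] * tab[2, 2] - tab[1, 2] * tab[2, 1]) /
  sqrt(prod(rowSums(tab)) * prod(colSums(tab)))
all.equal(phi^2, chi2 / n)   # TRUE
```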
124.1.1 Hypotheses
The Null Hypothesis states that observed frequencies match expected frequencies under the model.
For a goodness-of-fit setting: H\(_0\): the specified distribution fits the data.
For an independence setting: H\(_0\): the variables are independent.
The Alternative Hypothesis states that the distribution does not fit (goodness-of-fit) or that variables are associated (independence).
124.1.2 Analysis based on p-values -- Software
The Chi-Squared Test R module can be found on the publicly available website:
The R Module is also available in RFC under the “Hypotheses / Empirical Tests” menu item.
124.1.3 Analysis based on p-values -- Data & Parameters
This R module contains the following fields:
Data X: a multivariate dataset containing quantitative data
Names of X columns: a space delimited list of names (one name for each column)
Factor 1: a positive integer value of the column in the multivariate dataset which corresponds to the first sample
Factor 2: a positive integer value of the column in the multivariate dataset which corresponds to the second sample
Type of test to use. This parameter can be set to the following values:
Pearson Chi-Squared
Monte Carlo (simulation-based) Pearson Chi-Squared (labeled “Exact Pearson Chi-Squared by Simulation” in the module)
McNemar Chi-Squared
Stuart-Maxwell Marginal Homogeneity
Bowker Symmetry (McNemar-Bowker)
Fisher Exact Test
124.1.4 Analysis based on p-values -- Output
Consider the case where we wish to investigate the association between smoking and low birth weight of infants. Both variables are coded as binary numbers, i.e.:
low birth weight (\(< 2.5\) kg) corresponds to low = 1 (0 otherwise)
when the mother smokes cigarettes then smoke = 1 (0 otherwise)
The total number of observations and the expected cell frequencies are sufficiently large for the Pearson Chi-Squared Test to be used. The p-value is 3.958% which is (for most researchers) small enough to reject the Null Hypothesis. We conclude that smoking and low birth weight are associated.
For reporting, include an association effect size in addition to the p-value. For an \(r \times c\) table:
\[
V = \sqrt{\frac{\chi^2}{N\,\min(r-1, c-1)}}
\]
where \(V\) is Cramér’s \(V\) (Cramér 1946); for \(2\times2\) tables, \(V\) reduces to \(\phi\).
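A minimal sketch for computing Cramér’s \(V\) directly from a table (the 2×3 counts are hypothetical):

```r
# Cramer's V from the uncorrected chi-squared statistic
cramers_v <- function(tab) {
  chi2 <- unname(suppressWarnings(chisq.test(tab, correct = FALSE)$statistic))
  sqrt(chi2 / (sum(tab) * min(nrow(tab) - 1, ncol(tab) - 1)))
}
tab <- matrix(c(10, 20, 30,
                25, 15, 20), nrow = 2, byrow = TRUE)
round(cramers_v(tab), 4)
```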
To compute the Chi-Squared Test on your local machine, the following script can be used in the R console:
library(MASS)
library(vcd)
x <- birthwt
x$smoke <- factor(x$smoke, levels = c(0, 1), labels = c("no", "yes"))
x$low <- factor(x$low, levels = c(0, 1), labels = c("normal", "low"))
par3 = 'Pearson Chi-Squared'  # Type of test to use
main = 'Association Plot'
simulate.p.value = FALSE
B = 2000
if (par3 %in% c('Monte Carlo (simulation-based) Pearson Chi-Squared',
                'Exact Pearson Chi-Squared by Simulation')) simulate.p.value = TRUE
cat("Actual frequencies\n")
(table1 <- table(x$smoke, x$low))
assoc(table1, shade = TRUE)
# Run the selected test and print the expected frequencies
(test1 <- chisq.test(table1, simulate.p.value = simulate.p.value, B = B))
cat("Expected frequencies\n")
test1$expected
Actual frequencies
normal low
no 86 29
yes 44 30
Pearson's Chi-squared test with Yates' continuity correction
data: table1
X-squared = 4.2359, df = 1, p-value = 0.03958
Expected frequencies
normal low
no 79.10053 35.89947
yes 50.89947 23.10053
124.2 Assumptions
The following assumptions are made:
The observations are obtained from a simple random sample.
The sample size must be sufficiently large.
For the Chi-Squared approximation to be reliable, no more than 20% of expected frequencies should be below 5 and none should be below 1; otherwise use a simulation-based Pearson test or Fisher’s Exact Test.
The observations are independent of each other. This also implies that the samples must be unpaired.
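The expected-count rule of thumb can be checked directly from the expected frequencies that chisq.test() reports (the small 2×2 counts below are hypothetical):

```r
# Checking the rule of thumb on a small hypothetical table
tab <- matrix(c(8, 3,
                2, 7), nrow = 2, byrow = TRUE)
E <- suppressWarnings(chisq.test(tab, correct = FALSE)$expected)
E
mean(E < 5)   # share of cells with expected count below 5 (should be <= 0.2)
any(E < 1)    # any expected count below 1?
```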
The main advantage of the Chi-Squared Tests is that they can be used with qualitative (i.e. categorical) data. If one employs these tests with quantitative data, one must define categories first (which implies that information is lost).
124.3 Alternatives
If expected cell counts are too small, use Fisher’s Exact Test or the simulation-based Pearson test (both discussed below). For paired binary data, use McNemar’s Test. For paired square tables with more than two categories, use Stuart-Maxwell (marginal homogeneity) and/or Bowker (symmetry), depending on the hypothesis of interest.
If originally quantitative data were categorized, consider using methods on the original scale (e.g. correlation or regression) to avoid information loss.
124.4 Monte Carlo (Simulation-Based) Pearson Chi-Squared Test
124.4.1 Hypotheses
The hypotheses are identical to those of the Pearson Chi-Squared Test (Section 124.1).
124.4.2 Analysis based on p-values
Consider the survey results from many years ago. We wish to test whether there is a difference between female and male students when it comes to their favorite choice of Internet browser. In the R module shown below, select the AMS dataset in the Input tab, click on Chi-squared Test, select Browser as X variable, select gender as Y variable, and specify that you want the Monte Carlo (simulation-based) Pearson method (labeled “Exact Pearson Chi-Squared by Simulation” in the app) in the “Type of test to use” box.
The expected cell frequencies in the output Table are too low for the Pearson Chi-Squared Test to be used1. Therefore, it is necessary to use a Monte Carlo simulation method to obtain a reliable approximate p-value.
Using Monte Carlo simulation (\(B = 2000\) replicates), the p-value is approximately 3.298%, which is small enough to reject the Null Hypothesis; we conclude that the choice of browser differs between males and females. Results vary slightly across runs due to Monte Carlo simulation error. The contrast also shows how anti-conservative the Chi-Squared approximation can become when expected cell counts are too low: if you select the ordinary Pearson Chi-Squared method instead, the p-value drops to 0.1777%. Had the chosen type I error been 3% instead of 5%, the two methods would have led to opposite conclusions.
To compute the Chi-Squared Test by Simulation on your local machine, the following parameter should be changed in the previous script:
par3 = 'Monte Carlo (simulation-based) Pearson Chi-Squared'
B = 2000
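Conceptually, the simulated p-value is obtained by generating many tables with the observed margins and comparing their Pearson statistics to the observed one. A self-contained sketch using base R's r2dtable() (the 2×2 counts are hypothetical):

```r
# Monte Carlo p-value sketch: resample tables with fixed margins
pearson_stat <- function(t) {
  E <- outer(rowSums(t), colSums(t)) / sum(t)
  sum((t - E)^2 / E)
}
tab <- matrix(c(6, 1,
                2, 7), nrow = 2, byrow = TRUE)
obs <- pearson_stat(tab)
set.seed(42)
B <- 2000
sims <- vapply(r2dtable(B, rowSums(tab), colSums(tab)), pearson_stat, numeric(1))
(1 + sum(sims >= obs)) / (B + 1)   # simulated p-value, as in chisq.test()
```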
124.5 Assumptions
The assumptions are the same as for the Pearson Chi-Squared Test except for the requirement that the expected cell frequencies are sufficiently large.
124.6 Alternatives
This is the same as for the Pearson Chi-Squared Test.
124.7 Fisher’s Exact Test
Fisher’s Exact Test (Fisher 1935) is used to test the association between two categorical variables in a contingency table when sample sizes are small or when expected cell counts fall below 5. Unlike the Pearson Chi-Squared test, which relies on a large-sample approximation, Fisher’s Exact Test computes the exact p-value using the hypergeometric distribution.
124.7.1 When to Use
Fisher’s Exact Test is preferred over the Pearson Chi-Squared Test when:
One or more expected cell frequencies are less than 5
You want an exact p-value rather than an asymptotic approximation
For large samples with adequate expected cell counts, the Pearson Chi-Squared Test and Fisher’s Exact Test give virtually identical results. However, Fisher’s Exact Test is always valid regardless of sample size, while the Pearson approximation may be unreliable for small samples.
124.7.2 Hypotheses
The hypotheses are the same as for the Pearson Chi-Squared independence test (Section 124.1):
\[
H_0: \text{The two variables are independent (no association)}
\]
\[
H_A: \text{The two variables are not independent (there is an association)}
\]
124.7.3 How It Works
For a 2×2 contingency table, given fixed row and column totals (marginals), the probability of observing any particular table configuration follows the hypergeometric distribution:
\[
P = \frac{\binom{R_1}{a} \binom{R_2}{c}}{\binom{N}{C_1}}
\]
where \(a\) is the observed count in the first cell, \(c = C_1-a\) is the observed count in the second row/first column cell, \(R_1\) and \(R_2\) are the row totals, \(C_1\) is the first column total, and \(N\) is the grand total.
The p-value is computed as the sum of probabilities of all table configurations that are as extreme as, or more extreme than, the observed table.
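This enumeration can be sketched in base R with dhyper(); the counts match the drug/placebo example printed below:

```r
# Fisher's exact p-value by direct enumeration
treatment_table <- matrix(c(3, 1,
                            1, 5), nrow = 2, byrow = TRUE,
                          dimnames = list(Treatment = c("Drug", "Placebo"),
                                          Outcome = c("Improved", "Not improved")))
R1 <- sum(treatment_table[1, ])      # first row total
C1 <- sum(treatment_table[, 1])      # first column total
N  <- sum(treatment_table)
a_obs  <- treatment_table[1, 1]
a_vals <- max(0, C1 - (N - R1)):min(R1, C1)          # admissible first cells
probs  <- dhyper(a_vals, m = R1, n = N - R1, k = C1) # hypergeometric weights
# Two-sided p: sum over tables no more probable than the observed one
p_obs   <- dhyper(a_obs, m = R1, n = N - R1, k = C1)
p_exact <- sum(probs[probs <= p_obs * (1 + 1e-7)])   # small tolerance, as in fisher.test
p_exact
fisher.test(treatment_table)$p.value                 # agrees
```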
Contingency Table:
Outcome
Treatment Improved Not improved
Drug 3 1
Placebo 1 5
Expected frequencies:
Outcome
Treatment Improved Not improved
Drug 1.6 2.4
Placebo 2.4 3.6
Fisher's Exact Test:
Fisher's Exact Test for Count Data
data: treatment_table
p-value = 0.1905
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.40161 930.24692
sample estimates:
odds ratio
10.29391
Note that several expected cell frequencies are below 5, making the Pearson Chi-Squared approximation unreliable. Fisher’s Exact Test provides a valid p-value in this situation.
124.7.6 Assumptions
The observations are independent
The row and column totals (marginals) are fixed by the study design, or treated as fixed for the purpose of inference
The data are categorical (nominal or ordinal)
124.7.7 When to Prefer Over Pearson Chi-Squared
Table 124.1: Choosing between Pearson Chi-Squared and Fisher’s Exact Test
| Situation | Recommended Test |
|---|---|
| All expected counts \(\geq 5\) | Pearson Chi-Squared (faster, same result) |
| Any expected count \(< 5\) | Fisher’s Exact Test |
| Large, sparse contingency table | Fisher’s Exact Test (or simulation-based Pearson) |
124.8 Association vs. Marginal Homogeneity vs. Symmetry
For count data, three related but distinct null hypotheses are common:
\[
H_0^{\text{ind}}: X \perp Y \quad \text{(independence / association question)}
\]
\[
H_0^{\text{mh}}: \pi_{i+} = \pi_{+i}\ \forall i \quad \text{(marginal homogeneity in paired square tables)}
\]
\[
H_0^{\text{sym}}: \pi_{ij} = \pi_{ji}\ \forall i \neq j \quad \text{(symmetry in paired square tables)}
\]
In plain language:
Independence asks whether row and column variables are associated in an unpaired table.
Marginal homogeneity asks whether pre/post (or rater A/rater B) marginal distributions are equal in a paired square table.
Symmetry asks whether opposite transition cells are equal: \((i \rightarrow j)\) versus \((j \rightarrow i)\).
124.8.1 Choosing the test
| Data structure | Main null hypothesis | Recommended test |
|---|---|---|
| 2×2 unpaired | Independence | Pearson Chi-Squared / Fisher Exact |
| 2×2 paired | Marginal homogeneity (equiv. symmetry for 2×2) | McNemar |
| \(k\times k\) paired (\(k>2\)) | Marginal homogeneity | Stuart-Maxwell (optionally Bhapkar) |
| \(k\times k\) paired (\(k>2\)) | Symmetry | Bowker (McNemar-Bowker) |
Key nuance:
For 2×2 paired tables, McNemar, marginal homogeneity, and symmetry coincide.
For \(k>2\) paired tables, Bowker (symmetry) and Stuart-Maxwell/Bhapkar (marginal homogeneity) are different hypotheses and can lead to different conclusions.
Warning: Common Pitfalls
Using Pearson independence on paired data and calling it a paired alternative to McNemar.
Treating Bowker symmetry and Stuart-Maxwell marginal homogeneity as interchangeable for \(k>2\).
Ignoring small expected counts in unpaired 2×2 tables (use Fisher or simulation when needed).
Interpreting Yates-corrected and uncorrected Pearson outputs as “contradictory” instead of approximation choices.
Reporting only p-values without an effect-size scale (e.g., odds ratio, risk difference, Cramer’s \(V\)).
Interpreting non-rejection of \(H_0\) as proof that assumptions hold exactly.
124.9 Worked Examples
124.9.1 Example A (2×2, unpaired association): Smoking vs low birth weight
# Explicit 2x2 table (rows = smoking, cols = low birth weight)
tabA <- matrix(c(30, 44,
                 29, 86),
               nrow = 2, byrow = TRUE,
               dimnames = list(Smoking = c("Yes", "No"),
                               LowBirthWeight = c("Low", "Normal")))
tabA
# Pearson (with and without Yates correction)
chisq.test(tabA, correct = TRUE)
chisq.test(tabA, correct = FALSE)
# Simulation-based Pearson p-value
set.seed(1)
chisq.test(tabA, simulate.p.value = TRUE, B = 1e5)
# Fisher exact test
fisher.test(tabA)
# Effect size: odds ratio (with CI from Fisher) and risk difference
or_fit <- fisher.test(tabA, conf.level = 0.95)
risk_smoke <- tabA["Yes", "Low"] / sum(tabA["Yes", ])
risk_no <- tabA["No", "Low"] / sum(tabA["No", ])
rd <- risk_smoke - risk_no
cat("\nOdds ratio estimate (smoking vs non-smoking, low birth weight):",
    unname(or_fit$estimate), "\n")
cat("95% CI for odds ratio:",
    paste(round(or_fit$conf.int, 4), collapse = " to "), "\n")
cat("Risk difference (P(low|smoke) - P(low|no smoke)):", round(rd, 4), "\n")
LowBirthWeight
Smoking Low Normal
Yes 30 44
No 29 86
Pearson's Chi-squared test with Yates' continuity correction
data: tabA
X-squared = 4.2359, df = 1, p-value = 0.03958
Pearson's Chi-squared test
data: tabA
X-squared = 4.9237, df = 1, p-value = 0.02649
Pearson's Chi-squared test with simulated p-value (based on 1e+05
replicates)
data: tabA
X-squared = 4.9237, df = NA, p-value = 0.03717
Fisher's Exact Test for Count Data
data: tabA
p-value = 0.03618
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
1.028780 3.964904
sample estimates:
odds ratio
2.014137
Odds ratio estimate (smoking vs non-smoking, low birth weight): 2.014137
95% CI for odds ratio: 1.0288 to 3.9649
Risk difference (P(low|smoke) - P(low|no smoke)): 0.1532
124.9.2 Example B (2×2, paired outcomes): same students in Maths vs Stats
tabB <- matrix(c(52, 8,
                 18, 42),
               nrow = 2, byrow = TRUE,
               dimnames = list(Maths = c("Pass", "Fail"),
                               Stats = c("Pass", "Fail")))
tabB
# Correct paired test (McNemar)
mcnemar.test(tabB, correct = TRUE)
# Exact McNemar via Binomial on discordant pairs
b <- tabB["Pass", "Fail"]
c <- tabB["Fail", "Pass"]
cat("\nDiscordant pairs: b =", b, ", c =", c, "\n")
binom.test(x = b, n = b + c, p = 0.5, alternative = "two.sided")
# Independence test on the same table (different question)
chisq.test(tabB, correct = FALSE)
cat("\nNote: Pearson independence here tests association, not paired change asymmetry.\n")
Stats
Maths Pass Fail
Pass 52 8
Fail 18 42
McNemar's Chi-squared test with continuity correction
data: tabB
McNemar's chi-squared = 3.1154, df = 1, p-value = 0.07756
Discordant pairs: b = 8 , c = 18
Exact binomial test
data: b and b + c
number of successes = 8, number of trials = 26, p-value = 0.07552
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.1432600 0.5178964
sample estimates:
probability of success
0.3076923
Pearson's Chi-squared test
data: tabB
X-squared = 39.634, df = 1, p-value = 3.063e-10
Note: Pearson independence here tests association, not paired change asymmetry.
124.9.3 Example C (\(k\times k\), paired): marginal homogeneity via Stuart-Maxwell
tabC <- matrix(c(150, 54, 21,
                 27, 138, 42,
                 12, 48, 108),
               nrow = 3, byrow = TRUE,
               dimnames = list(Pre = c("Low", "Medium", "High"),
                               Post = c("Low", "Medium", "High")))
tabC

stuart_maxwell_manual <- function(tab) {
  # Stuart-Maxwell statistic:
  # X^2 = d' V^{-1} d
  # d_i = row_i - col_i (for i = 1,...,k-1)
  # V_ii = row_i + col_i - 2*n_ii
  # V_ij = -(n_ij + n_ji), i != j
  tab <- as.matrix(tab)
  stopifnot(nrow(tab) == ncol(tab), nrow(tab) >= 2)
  k <- nrow(tab)
  r <- rowSums(tab)
  c <- colSums(tab)
  d <- r - c
  idx <- seq_len(k - 1)
  V <- matrix(0, nrow = k - 1, ncol = k - 1)
  for (i in seq_len(k - 1)) {
    ii <- idx[i]
    V[i, i] <- r[ii] + c[ii] - 2 * tab[ii, ii]
    for (j in seq_len(k - 1)) {
      if (i == j) next
      jj <- idx[j]
      V[i, j] <- -(tab[ii, jj] + tab[jj, ii])
    }
  }
  d_red <- d[idx]
  stat <- as.numeric(t(d_red) %*% solve(V) %*% d_red)
  df <- k - 1
  p <- pchisq(stat, df = df, lower.tail = FALSE)
  list(statistic = stat, df = df, p.value = p,
       margins = cbind(row = r, col = c, diff = d))
}

sm <- stuart_maxwell_manual(tabC)
sm$margins
cat(sprintf("\nStuart-Maxwell: X-squared = %.4f, df = %d, p-value = %.6g\n",
            sm$statistic, sm$df, sm$p.value))
if (requireNamespace("DescTools", quietly = TRUE)) {
  cat("\nValidation (DescTools::StuartMaxwellTest):\n")
  print(DescTools::StuartMaxwellTest(tabC))
}
Post
Pre Low Medium High
Low 150 54 21
Medium 27 138 42
High 12 48 108
row col diff
Low 225 189 36
Medium 207 240 -33
High 168 171 -3
Stuart-Maxwell: X-squared = 11.8539, df = 2, p-value = 0.00266665
124.9.4 Example D (\(k\times k\), paired): symmetry via Bowker (McNemar-Bowker)
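A sketch reproducing the output below; in R, mcnemar.test() applied to a \(k\times k\) table computes the McNemar-Bowker symmetry statistic:

```r
# Hypothetical paired 3x3 table with equal margins but asymmetric transitions
tabD <- matrix(c(80, 30,  5,
                 10, 70, 22,
                 25,  2, 75),
               nrow = 3, byrow = TRUE,
               dimnames = list(Pre = c("Low", "Medium", "High"),
                               Post = c("Low", "Medium", "High")))
rowSums(tabD)
colSums(tabD)   # identical margins, so the Stuart-Maxwell statistic is 0
mcnemar.test(tabD)   # Bowker symmetry: X-squared = 40, df = 3
# By hand: sum over i < j of (n_ij - n_ji)^2 / (n_ij + n_ji)
(30 - 10)^2 / 40 + (5 - 25)^2 / 30 + (22 - 2)^2 / 24   # 40
```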
Post
Pre Low Medium High
Low 80 30 5
Medium 10 70 22
High 25 2 75
Stuart-Maxwell (marginal homogeneity): X-squared = 0.0000, df = 2, p-value = 1
Bowker (symmetry): X-squared = 40.0000, df = 3, p-value = 1.06551e-08
Interpretation: margins can be equal while symmetry fails.
In this example, row and column margins are equal (Stuart-Maxwell does not reject), but opposite off-diagonal transitions are not equal (Bowker rejects). This demonstrates explicitly that for \(k>2\), marginal homogeneity and symmetry are different hypotheses.
Cramér, Harald. 1946. Mathematical Methods of Statistics. Princeton Mathematical Series 9. Princeton: Princeton University Press.
Fisher, Ronald A. 1935. The Design of Experiments. Edinburgh: Oliver & Boyd.
Pearson, Karl. 1900. “On the Criterion That a Given System of Deviations from the Probable in the Case of a Correlated System of Variables Is Such That It Can Be Reasonably Supposed to Have Arisen from Random Sampling.” The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 50 (302): 157–75. https://doi.org/10.1080/14786440009463897.
Note that there are three columns for gender because one respondent left the gender field blank.↩︎