57 Contingency Table

57.1 Definition

The Contingency Table is a two-dimensional Frequency Table used when two qualitative variables are studied jointly. It is a core descriptive tool before inferential methods such as Chi-Squared Tests for Count Data (to be discussed in Hypothesis Testing) and before model diagnostics such as the Confusion Matrix for Classification Models (Chapter 59).

In notation, the cell counts of a two-way table are written as \(n_{ij}\), where \(i\) indexes the row categories and \(j\) indexes the column categories. The row totals \(n_{i\cdot}\) and the column totals \(n_{\cdot j}\) are called marginal totals, and the grand total is \(n\).
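
Written out, the marginal totals are sums of the cell counts along a row or a column. The symbols \(r\) (number of rows) and \(c\) (number of columns) are introduced here only for illustration and are not part of the original notation:

\[
n_{i\cdot} = \sum_{j=1}^{c} n_{ij}, \qquad
n_{\cdot j} = \sum_{i=1}^{r} n_{ij}, \qquad
n = \sum_{i=1}^{r} \sum_{j=1}^{c} n_{ij} = \sum_{i=1}^{r} n_{i\cdot} = \sum_{j=1}^{c} n_{\cdot j}.
\]

For instance, a row with cell counts 5 and 5 has the row total \(n_{i\cdot} = 5 + 5 = 10\).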

A practical reading sequence is:

1. Start with absolute counts \(n_{ij}\).
2. Check row percentages \(100 \cdot n_{ij}/n_{i\cdot}\) to compare category composition within each row.
3. Check column percentages \(100 \cdot n_{ij}/n_{\cdot j}\) to compare category composition within each column.

If row or column percentages differ strongly across groups, this is descriptive evidence of association between the variables.
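
One way to make "differ strongly" concrete is to compare each row profile with the overall column distribution \(n_{\cdot j}/n\). The following is a minimal sketch, not a formal test: it assumes the Drive Train by Origin counts from the Cars93 example in this chapter (typed in by hand so the block is self-contained), and the names `row_profiles`, `overall`, and `deviation` are illustrative.

```r
# Counts copied from the Drive Train by Origin table of the Cars93 example
tab <- matrix(
  c(5, 34, 9,    # USA column: 4WD, Front, Rear
    5, 33, 7),   # non-USA column: 4WD, Front, Rear
  nrow = 3,
  dimnames = list(DriveTrain = c("4WD", "Front", "Rear"),
                  Origin     = c("USA", "non-USA"))
)

row_profiles <- prop.table(tab, 1)        # n_ij / n_i. (each row sums to 1)
overall      <- colSums(tab) / sum(tab)   # n_.j / n (column shares in the whole sample)

# Subtract the overall column shares from every row profile
deviation <- sweep(row_profiles, 2, overall, FUN = "-")

round(100 * deviation, 1)                 # deviations in percentage points
```

With these counts, no row profile deviates from the overall distribution by more than about 5 percentage points, so the descriptive evidence of association is weak in this example.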

57.2 Example

The analysis below shows the Contingency Table of “Drive Train” by “Origin” for the Cars93 dataset. The rows represent the three drive train types (4WD, Front, Rear), while the columns correspond to the origin of the car (USA, non-USA). Each cell contains the absolute frequency of cars in the category of its row and column: for instance, there are 9 rear-wheel drive cars from the USA.

From a descriptive point of view, the table already shows where differences lie: for example, front-wheel drive accounts for 70.8% of USA cars and 73.3% of non-USA cars (see the column proportions in the R Companion). To test whether such differences are statistically significant rather than sampling noise, continue with the Chi-Squared Tests for Count Data in Hypothesis Testing.

57.3 R Companion (Optional)

You can build the same outputs in R as follows:

library(MASS)                                    # provides the Cars93 dataset
tab <- table(Cars93$DriveTrain, Cars93$Origin)   # rows: drive train, columns: origin

tab                      # absolute counts n_ij

       
        USA non-USA
  4WD     5       5
  Front  34      33
  Rear    9       7

addmargins(tab)          # adds row/column/grand totals

       
        USA non-USA Sum
  4WD     5       5  10
  Front  34      33  67
  Rear    9       7  16
  Sum    48      45  93

round(prop.table(tab, 1), 3)  # row proportions (multiply by 100 for row percentages)

       
          USA non-USA
  4WD   0.500   0.500
  Front 0.507   0.493
  Rear  0.562   0.438

round(prop.table(tab, 2), 3)  # column proportions (multiply by 100 for column percentages)

       
          USA non-USA
  4WD   0.104   0.111
  Front 0.708   0.733
  Rear  0.188   0.156

57.4 Strengths and Limitations

Strength: very transparent summary of joint categorical structure.
Strength: direct bridge to inference via the chi-squared framework.
Limitation: counts alone do not quantify statistical uncertainty.
Limitation: sparse cells can make interpretation unstable and can violate chi-squared test assumptions.
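
The sparse-cell limitation can be checked before any test is run by computing the expected counts under independence, \(n_{i\cdot} n_{\cdot j} / n\). The sketch below again uses the Cars93 counts from the example, typed in by hand. The threshold of 5 is a common rule of thumb and an assumption here, not something this chapter prescribes; the names `expected` and `sparse` are illustrative.

```r
# Counts copied from the Drive Train by Origin table of the Cars93 example
tab <- matrix(
  c(5, 34, 9,    # USA column: 4WD, Front, Rear
    5, 33, 7),   # non-USA column: 4WD, Front, Rear
  nrow = 3,
  dimnames = list(DriveTrain = c("4WD", "Front", "Rear"),
                  Origin     = c("USA", "non-USA"))
)

# Expected counts under independence: (row total * column total) / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
round(expected, 2)

# Flag cells below the assumed rule-of-thumb threshold of 5
sparse <- expected < 5
which(sparse, arr.ind = TRUE)   # row/column positions of sparse cells
```

For these counts, exactly one cell (4WD, non-USA) has an expected count slightly below 5, so the small 4WD row is the part of the table to interpret with the most caution.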