The Contingency Table is a two-dimensional Frequency Table used when two qualitative variables are studied jointly. It is a core descriptive tool and a natural first step before inferential methods such as the Chi-Squared Tests for Count Data (discussed in Hypothesis Testing) and before model diagnostics such as the Confusion Matrix for Classification Models (Chapter 59).
In notation, the count in cell \((i, j)\) of a two-way table is usually written \(n_{ij}\), where \(i\) indexes the row categories and \(j\) indexes the column categories. The row totals \(n_{i\cdot}\) and column totals \(n_{\cdot j}\) are called marginal totals, and the grand total is \(n\).
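Written out, the marginal totals and the grand total are sums of cell counts. The symbols \(r\) (number of row categories) and \(c\) (number of column categories) are introduced here only for illustration:

\[
n_{i\cdot} = \sum_{j=1}^{c} n_{ij}, \qquad
n_{\cdot j} = \sum_{i=1}^{r} n_{ij}, \qquad
n = \sum_{i=1}^{r} \sum_{j=1}^{c} n_{ij} = \sum_{i=1}^{r} n_{i\cdot} = \sum_{j=1}^{c} n_{\cdot j}.
\]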
A practical reading sequence is:
1. Start with absolute counts \(n_{ij}\).
2. Check row percentages \(n_{ij}/n_{i\cdot}\) to compare category composition within each row.
3. Check column percentages \(n_{ij}/n_{\cdot j}\) to compare category composition within each column.
4. If row or column percentages differ strongly across groups, this is descriptive evidence of association between the variables.
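The reading sequence above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library. The counts, the group labels `A` and `B`, the category labels `x`, `y`, `z`, and the function names are all hypothetical and do not come from any dataset in this chapter.

```python
# Hypothetical 2x3 table of counts n_ij (rows: groups A, B; columns: x, y, z).
counts = {
    "A": {"x": 20, "y": 30, "z": 50},
    "B": {"x": 40, "y": 40, "z": 20},
}


def row_percentages(table):
    """Return n_ij / n_i. for every cell, so each row sums to 1."""
    result = {}
    for row, cells in table.items():
        row_total = sum(cells.values())  # marginal total n_i.
        result[row] = {col: n / row_total for col, n in cells.items()}
    return result


def column_percentages(table):
    """Return n_ij / n_.j for every cell, so each column sums to 1."""
    col_totals = {}  # marginal totals n_.j
    for cells in table.values():
        for col, n in cells.items():
            col_totals[col] = col_totals.get(col, 0) + n
    return {
        row: {col: n / col_totals[col] for col, n in cells.items()}
        for row, cells in table.items()
    }


row_pct = row_percentages(counts)
col_pct = column_percentages(counts)

# Print the row composition as percentages, one line per group.
for row, cells in row_pct.items():
    print(row, {col: round(100 * p, 1) for col, p in cells.items()})
```

In this made-up table the two rows have clearly different compositions (for example, category `z` accounts for half of group `A` but only a fifth of group `B`), which is the kind of pattern that suggests an association. The sketch assumes that every row and column total is positive; an empty row or column would need separate handling.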
57.2 Example
The analysis shown below contains the Contingency Table for “Drive Train” by “Origin” for the Cars93 dataset. The rows represent the three types of drive train, while the columns correspond to the origin of the car. Each cell of the Contingency Table contains the absolute frequency of the cars that fall into both its row category and its column category: for instance, there are 9 rear-wheel drive cars from the US.
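To make the construction of such a table concrete, the following sketch cross-tabulates a list of paired observations using Python's standard library. The records are made up for illustration and are not the Cars93 data. The function name `contingency_table` and the origin labels are likewise assumptions.

```python
from collections import Counter

# Made-up (drive train, origin) pairs for illustration; NOT the Cars93 data.
records = [
    ("Front", "USA"), ("Front", "non-USA"), ("Rear", "USA"),
    ("Front", "USA"), ("4WD", "non-USA"), ("Rear", "USA"),
    ("Front", "non-USA"), ("4WD", "USA"),
]


def contingency_table(pairs):
    """Count how often each (row category, column category) pair occurs."""
    cell_counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # A Counter returns 0 for unseen pairs, so empty cells get explicit zeros.
    return {r: {c: cell_counts[(r, c)] for c in cols} for r in rows}


table = contingency_table(records)
for row, cells in table.items():
    print(row, cells)
```

Each printed line corresponds to one row of the table, and each value is the absolute frequency of one combination of categories. Combinations that never occur in the data appear as zero cells rather than being omitted.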
From a descriptive point of view, the table already tells you where differences are concentrated (for example, a larger share of front-wheel drive in some origins). To test whether these differences are statistically significant rather than sampling noise, continue with the Chi-Squared Tests for Count Data in Hypothesis Testing.