where \(K\) = number of categories, \(p_k\) = probability of category \(k\), \(p_k \ge 0\), \(\sum_{k=1}^K p_k = 1\), \(n\) = number of independent draws, and \(X_k\) = number of draws that fall in category \(k\), so that \(\sum_{k=1}^K X_k = n\).
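For reference, the probability mass function built from these symbols is \(P(X_1 = x_1, \ldots, X_K = x_K) = \frac{n!}{x_1! \cdots x_K!} \prod_{k=1}^K p_k^{x_k}\) for non-negative integers with \(\sum_k x_k = n\). The minimal sketch below evaluates it directly and cross-checks it against base R's `dmultinom()`. The helper name `multinom_pmf` is hypothetical, and the counts and probabilities are illustrative assumptions.

```r
# Hypothetical helper: multinomial pmf evaluated from its definition,
# n! / (x_1! ... x_K!) * prod(p_k^x_k), computed on the log scale to avoid overflow
multinom_pmf <- function(x, p) {
  stopifnot(length(x) == length(p), all(x >= 0), all(p >= 0),
            abs(sum(p) - 1) < 1e-12)
  n <- sum(x)
  # categories with x_k = 0 contribute p_k^0 = 1, even when p_k = 0
  log_terms <- ifelse(x == 0, 0, x * log(p))
  exp(lfactorial(n) - sum(lfactorial(x)) + sum(log_terms))
}

x <- c(2, 1, 1)        # illustrative counts, n = 4 draws
p <- c(0.5, 0.3, 0.2)  # illustrative category probabilities

multinom_pmf(x, p)     # 4!/(2! 1! 1!) * 0.5^2 * 0.3 * 0.2 = 12 * 0.015 = 0.18
dmultinom(x, prob = p) # base R's implementation gives the same value
```

Working with `lfactorial()` keeps the coefficient finite even when \(n\) is large enough that `factorial(n)` would overflow.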
The Multinomial Distribution generalises the Bernoulli and Binomial Distributions:
when \(K = 2\) and \(n = 1\), it is equivalent to the Bernoulli Distribution;
when \(K = 2\) and \(n > 1\), it reduces to the Binomial Distribution, with \(X_1 \sim \text{Binomial}(n, p_1)\) and \(X_2 = n - X_1\).
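Both reductions can be checked numerically with base R's `dmultinom()` and `dbinom()`. This is a small sketch; the probability \(p_1 = 0.3\) and the counts used are illustrative assumptions.

```r
p1 <- 0.3   # illustrative probability of category 1 (category 2 has 1 - p1)

# K = 2, n = 1: Multinomial(1, (p1, 1 - p1)) at (1, 0) equals Bernoulli(p1) at 1
bern_multi <- dmultinom(c(1, 0), prob = c(p1, 1 - p1))
bern_binom <- dbinom(1, size = 1, prob = p1)

# K = 2, n = 10: P(X1 = 3, X2 = 7) equals the Binomial(10, p1) probability at 3
bin_multi <- dmultinom(c(3, 7), prob = c(p1, 1 - p1))
bin_binom <- dbinom(3, size = 10, prob = p1)

c(bern_multi, bern_binom)
c(bin_multi, bin_binom)
```

Because the second count is fully determined by the first (\(X_2 = n - X_1\)), the two-category multinomial carries no more information than the Binomial count \(X_1\).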
For \(i \neq j\), the covariance \(\text{Cov}(X_i, X_j) = -n p_i p_j\) is negative because the counts are constrained to sum to \(n\): if one category gets more counts, at least one other category must get fewer.
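A quick simulation makes the sum constraint and the resulting negative covariance visible. This sketch assumes illustrative values \(n = 50\) and \(\mathbf{p} = (0.5, 0.3, 0.2)\), and compares the empirical covariance matrix of `rmultinom()` draws with \(n(\text{diag}(\mathbf{p}) - \mathbf{p}\mathbf{p}^\top)\), whose diagonal is \(n p_k (1 - p_k)\) and whose off-diagonal entries are \(-n p_i p_j\).

```r
set.seed(1)                                   # reproducible illustration
n <- 50
p <- c(0.5, 0.3, 0.2)                         # illustrative category probabilities

# rmultinom() returns a K x reps matrix: one column per simulated experiment
draws <- rmultinom(20000, size = n, prob = p)

# every column sums to n, which is what forces the negative covariance
stopifnot(all(colSums(draws) == n))

emp_cov  <- cov(t(draws))                     # empirical K x K covariance of the counts
theo_cov <- n * (diag(p) - outer(p, p))       # n * (diag(p) - p p^T)

round(emp_cov, 2)
round(theo_cov, 2)
```

The transpose in `cov(t(draws))` is needed because `cov()` treats columns as variables, while `rmultinom()` stores categories in rows.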
Within this handbook, the Multinomial Distribution has multiple practical uses:
Multi-class event modeling: whenever each trial falls into exactly one of several categories (e.g. support ticket outcomes, customer response classes, defect types).
Bridge from Binomial to multi-category data: the Binomial model (Chapter 13) is the special case \(K=2\); the Multinomial model extends it to \(K>2\).
Foundational model for count-based classification: the Multinomial Naive Bayes Classifier directly uses this distribution for token/count features (Chapter 9).
Expected-vs-observed category diagnostics: expected counts \(n p_k\) from the multinomial model connect naturally to the Pearson chi-squared framework (Section 124.1, Chapter 124).
Contingency-table interpretation: multinomial logic underlies how row/column category counts are interpreted in contingency tables (Chapter 57) and in classification summaries such as confusion matrices (Chapter 59).
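The expected-vs-observed diagnostic can be sketched in a few lines. The ticket categories echo the support-ticket example, but the counts and reference probabilities below are hypothetical, chosen so that follow-up and escalation exceed their expected counts while first-contact resolution falls short. Base R's `chisq.test()` with a `p` argument is used only to confirm the hand computation.

```r
# Hypothetical ticket-outcome counts and hypothetical reference probabilities p_k
observed <- c(first_contact = 52, follow_up = 31, escalation = 17)
p0       <- c(first_contact = 0.60, follow_up = 0.28, escalation = 0.12)

n        <- sum(observed)                           # 100 tickets
expected <- n * p0                                  # expected counts n * p_k: 60, 28, 12
resid    <- (observed - expected) / sqrt(expected)  # Pearson residual per category
x2       <- sum(resid^2)                            # Pearson chi-squared statistic

round(resid, 3)   # negative: fewer than expected; positive: more than expected
x2

# chisq.test() with given probabilities reports the same expected counts and residuals
gof <- chisq.test(observed, p = p0)
gof$expected
gof$residuals
```

The per-category residuals show where the discrepancy comes from, while their sum of squares gives the single summary statistic.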
17.7 R Module
The Multinomial Probabilities app is available in the handbook menu:
The observed follow-up and escalation counts are above their expected values, while the first-contact resolution count is below its expected value.
This may indicate a temporary spike in ticket complexity (harder tickets), a staffing mismatch, or a process bottleneck.
The app’s chi-squared statistic is useful as a quick discrepancy indicator; for formal inferential testing and p-values, continue with the Pearson chi-squared test framework in Section 124.1. For a goodness-of-fit test with \(K\) categories and no parameters estimated from the data, the reference chi-squared distribution has \(K-1\) degrees of freedom; each parameter estimated from the data removes one further degree of freedom.
Exact multinomial probability:
[1] 0.005733821
Expected counts under Hardy-Weinberg proportions:
AA Aa aa
58.8 50.4 10.8
Pearson chi-squared statistic (descriptive):
[1] 1.042139
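The printed results above can be reproduced with a short sketch. The inputs are not shown in this section, so the genotype counts AA = 64, Aa = 45, aa = 11 (\(n = 120\)) and the allele frequency \(p = 0.7\) are assumptions; with these values the code returns the expected counts, exact probability and chi-squared statistic printed above. The last step adds an upper-tail p-value with \(K - 1 = 2\) degrees of freedom, which is appropriate only if \(p\) was fixed in advance rather than estimated from the counts.

```r
# Assumed inputs (not shown in this section): genotype counts and a fixed allele frequency
obs   <- c(AA = 64, Aa = 45, aa = 11)      # n = 120 individuals
p_A   <- 0.7                               # assumed frequency of allele A, not estimated from obs
probs <- c(AA = p_A^2, Aa = 2 * p_A * (1 - p_A), aa = (1 - p_A)^2)  # Hardy-Weinberg proportions

p_exact  <- dmultinom(obs, prob = probs)            # exact probability of this table
expected <- sum(obs) * probs                        # n * p_k
x2       <- sum((obs - expected)^2 / expected)      # Pearson statistic (descriptive)

# Reference distribution: K - 1 = 2 degrees of freedom because p_A is fixed;
# estimating p_A from obs would remove one more degree of freedom
p_value <- pchisq(x2, df = length(obs) - 1, lower.tail = FALSE)

p_exact
expected
x2
p_value
```

The exact probability of one particular table is typically small even when the fit is good, which is why the comparison is made through the Pearson statistic and its reference distribution.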
17.10 Related Distributions 1: Dirichlet Distribution
The Dirichlet distribution is the conjugate prior for the Multinomial likelihood. If \(\boldsymbol{\theta} \sim \text{Dir}(\boldsymbol{\alpha})\) and \(\mathbf{n} \sim \text{Multinomial}(N, \boldsymbol{\theta})\), then the posterior is \(\boldsymbol{\theta} \mid \mathbf{n} \sim \text{Dir}(\alpha_1 + n_1, \ldots, \alpha_K + n_K)\) (see Chapter 44).
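The conjugate update amounts to adding the observed counts to the prior parameters. This minimal sketch assumes a uniform \(\text{Dir}(1, 1, 1)\) prior and hypothetical counts \((12, 5, 3)\); since base R has no Dirichlet sampler, the hypothetical helper `rdirichlet_sketch` uses the standard construction of normalising independent Gamma\((\alpha_k, 1)\) draws.

```r
alpha  <- c(1, 1, 1)     # Dir(1, 1, 1): uniform prior on the probability simplex
counts <- c(12, 5, 3)    # hypothetical observed category counts, N = 20

alpha_post <- alpha + counts                 # conjugate update: alpha_k + n_k
post_mean  <- alpha_post / sum(alpha_post)   # E[theta_k | n] = (alpha_k + n_k) / (sum(alpha) + N)

# Hypothetical helper: m Dirichlet(a) draws via normalised independent Gamma(a_k, 1) variables
rdirichlet_sketch <- function(m, a) {
  # rgamma recycles the shape vector, so row i of g holds shapes a_1, ..., a_K
  g <- matrix(rgamma(m * length(a), shape = a), nrow = m, byrow = TRUE)
  g / rowSums(g)         # each row now sums to 1
}

set.seed(42)
theta_draws <- rdirichlet_sketch(10000, alpha_post)
colMeans(theta_draws)    # Monte Carlo estimate, close to post_mean
post_mean
```

The posterior mean \((\alpha_k + n_k) / (\sum_j \alpha_j + N)\) shows how the prior parameters act as pseudo-counts that are added to the observed data.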