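Suppose a finite population has size \(N\), with \(M\) “success” items and \(N-M\) “failure” items. If we draw \(n\) items without replacement and let \(X\) be the number of successes in the sample, then \(X\) has a hypergeometric distribution with probability mass function
\[
\text{P}(X = x) = \frac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}}, \qquad \max\bigl(0,\, n-(N-M)\bigr) \le x \le \min(n, M),
\]
and mean and variance
\[
\text{E}(X) = n\frac{M}{N}, \qquad \text{Var}(X) = n\frac{M}{N}\left(1-\frac{M}{N}\right)\frac{N-n}{N-1}.
\]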
The factor \(\frac{N-n}{N-1}\) is the finite-population correction (FPC). It is strictly less than 1 when \(n>1\), so the variance is smaller than that of a binomial model with the same success probability \(M/N\). Intuitively, sampling without replacement induces negative dependence among draws: each success removed from the population makes later successes slightly less likely.
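The closed-form mean and variance can be checked numerically. The sketch below borrows the audit-example values \(N=1200\), \(M=90\), \(n=80\) purely for illustration, sums over the support with `dhyper()`, and compares the results with \(nM/N\) and \(n\frac{M}{N}\left(1-\frac{M}{N}\right)\frac{N-n}{N-1}\), alongside the binomial variance that omits the FPC.

```r
# Illustrative values (borrowed from the audit example later in this chapter)
N <- 1200; M <- 90; n <- 80
p <- M / N

# Support of X: lower end is max(0, n - (N - M)), which is 0 here
x  <- 0:min(n, M)
px <- dhyper(x, m = M, n = N - M, k = n)   # phyper/dhyper: m = successes, n = failures, k = sample size

# Moments by direct summation over the PMF
mean_direct <- sum(x * px)
var_direct  <- sum((x - mean_direct)^2 * px)

# Closed forms
fpc        <- (N - n) / (N - 1)            # finite-population correction
mean_exact <- n * p
var_exact  <- n * p * (1 - p) * fpc
var_binom  <- n * p * (1 - p)              # binomial variance with the same p

cat("mean:", mean_direct, "vs", mean_exact, "\n")
cat("variance:", var_direct, "vs", var_exact, "\n")
cat("binomial variance:", var_binom, " FPC:", fpc, "\n")
```

The direct sums agree with the closed forms, and the hypergeometric variance is the binomial variance scaled down by the FPC.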
The hypergeometric distribution is the standard exact model for the number of successes when sampling without replacement from a finite population. Typical applications include:
- Audit and compliance sampling: the number of problematic records in a fixed audit sample.
- Quality control: defect counts when inspecting items from a finite lot.
- Inventory and logistics checks: category counts from finite stock pulls.
- Lot acceptance sampling: classical quality-control acceptance/rejection decisions for finite lots.
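For lot acceptance sampling, the probability of accepting a lot is a hypergeometric lower-tail probability. The sketch below uses a hypothetical single-sampling plan (lot size 500, sample size 50, acceptance number \(c = 1\)); these numbers and the helper `accept_prob()` are illustrative assumptions, not part of the handbook's examples.

```r
# Hypothetical single-sampling plan (illustrative numbers only):
# inspect n = 50 items from a lot of N = 500 without replacement,
# accept the lot if at most c_accept = 1 defective item is found.
N <- 500; n <- 50; c_accept <- 1

# P(accept | D defectives in the lot) = P(X <= c_accept),
# X ~ Hypergeometric(N, D, n); phyper is vectorised over D
accept_prob <- function(D) phyper(c_accept, m = D, n = N - D, k = n)

D_values <- c(1, 5, 10, 25, 50)
oc <- data.frame(defectives = D_values, P_accept = accept_prob(D_values))
print(oc)
```

The acceptance probability falls as the number of defectives in the lot grows; plotting it against \(D\) gives the plan's operating-characteristic curve.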
Relation to binomial: when the sampling fraction \(n/N\) is small (rule of thumb: \(n/N < 0.05\)), hypergeometric probabilities are usually close to the corresponding binomial probabilities with \(p = M/N\) (Chapter 13). For larger sampling fractions the approximation deteriorates, and the hypergeometric model, which is exact for finite populations, should be used.
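To see the rule of thumb in action, the sketch below compares the exact tail \(\text{P}(X \ge 15)\) with its binomial approximation for two hypothetical populations that share \(p = M/N = 0.1\) and \(n = 100\) but differ in sampling fraction (0.02 versus 0.5). The helper `compare_tail()` and all parameter values are assumptions chosen for illustration.

```r
# Exact hypergeometric upper tail vs. the Binomial(n, M/N) approximation
compare_tail <- function(N, M, n, x) {
  hyper <- phyper(x - 1, m = M, n = N - M, k = n, lower.tail = FALSE)  # P(X >= x)
  binom <- pbinom(x - 1, size = n, prob = M / N, lower.tail = FALSE)   # P(Y >= x)
  c(fraction = n / N, hypergeometric = hyper, binomial = binom)
}

small <- compare_tail(N = 5000, M = 500, n = 100, x = 15)  # n/N = 0.02
large <- compare_tail(N = 200,  M = 20,  n = 100, x = 15)  # n/N = 0.5
print(rbind(small, large))
```

With \(n/N = 0.5\) the FPC shrinks the variance substantially, so the binomial model overstates the upper tail; with \(n/N = 0.02\) the two probabilities are close.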
16.10 R Module
The Hypergeometric Probabilities app is available in the handbook menu.
An organization has \(N = 1200\) procurement records for a quarter. Based on risk profiling, \(M = 90\) are flagged as high-risk. An internal audit samples \(n = 80\) records without replacement.
Let \(X\) be the number of high-risk records in the sample. A key escalation metric is:
\[
\text{P}(X \ge 10)
\]
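```r
N <- 1200  # procurement records in the quarter
M <- 90    # records flagged as high-risk
n <- 80    # records sampled without replacement
# In phyper(): m = successes, n = failures (N - M), k = sample size
# P(X >= 10) = 1 - P(X <= 9); lower.tail = FALSE returns the upper tail directly
cat("P(X >= 10) =", phyper(9, m = M, n = N - M, k = n, lower.tail = FALSE), "\n")
# P(5 <= X <= 12) = P(X <= 12) - P(X <= 4)
cat("P(5 <= X <= 12) =", phyper(12, m = M, n = N - M, k = n) - phyper(4, m = M, n = N - M, k = n), "\n")
```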
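As a sanity check on the audit calculation, the exact tail probability can be compared with a Monte Carlo estimate from `rhyper()`. The seed and the number of replications `B` are arbitrary choices for this sketch.

```r
set.seed(2024)                  # arbitrary seed for reproducibility
N <- 1200; M <- 90; n <- 80     # audit example values
B <- 100000                     # number of simulated audits (arbitrary)

# rhyper(nn, m, n, k): B simulated counts of high-risk records in a sample of size n
sims <- rhyper(B, m = M, n = N - M, k = n)

exact_tail <- phyper(9, m = M, n = N - M, k = n, lower.tail = FALSE)
mc_tail    <- mean(sims >= 10)
cat("exact P(X >= 10):    ", exact_tail, "\n")
cat("simulated P(X >= 10):", mc_tail, "\n")
```

With this many replications the simulated proportion should agree with the exact value to about two or three decimal places.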
16.12 Additional Academic Example: Ecology Field Sampling
In a conservation study, a habitat has \(N=500\) tagged plants, of which \(M=80\) belong to a rare species.
A team samples \(n=40\) plants without replacement and records the number \(X\) of rare plants.
A monitoring question is:
\[
\text{P}(X \ge 10).
\]
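```r
N_field  <- 500  # tagged plants in the habitat
M_rare   <- 80   # plants of the rare species
n_sample <- 40   # plants sampled without replacement
# P(X >= 10) = 1 - P(X <= 9), computed directly as an upper tail
cat("P(X >= 10) =", phyper(9, m = M_rare, n = N_field - M_rare, k = n_sample, lower.tail = FALSE), "\n")
```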
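The hypergeometric model can also be checked against a direct simulation of the field protocol: represent the 500 tagged plants as a logical vector (an illustrative encoding, with `TRUE` marking the 80 rare plants), draw 40 without replacement using `sample()`, and count the rare plants. The seed and replication count are arbitrary.

```r
set.seed(42)                                    # arbitrary seed
N_field <- 500; M_rare <- 80; n_sample <- 40

# Population vector: TRUE = rare species, FALSE = other species
population <- c(rep(TRUE, M_rare), rep(FALSE, N_field - M_rare))

# Each replicate draws n_sample plants without replacement and counts rare ones
B <- 20000
counts <- replicate(B, sum(sample(population, n_sample, replace = FALSE)))

exact <- phyper(9, m = M_rare, n = N_field - M_rare, k = n_sample, lower.tail = FALSE)
cat("exact P(X >= 10):    ", exact, "\n")
cat("simulated P(X >= 10):", mean(counts >= 10), "\n")
cat("simulated mean:", mean(counts), " theoretical mean:", n_sample * M_rare / N_field, "\n")
```

Because the simulation literally samples without replacement from a finite population, its agreement with `phyper()` illustrates why the hypergeometric model, rather than the binomial, fits this design (here \(n/N = 0.08\), above the 0.05 rule of thumb).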