Observe your data
↓
Is it discrete (counts) or continuous (measurements)?
- Discrete: Bernoulli, Binomial, Geometric, Negative Binomial, Hypergeometric, Poisson, Multinomial
- Continuous: Uniform, Normal (plus t, χ², F, which appear in hypothesis testing)
One thread: identify the process that generated the data, then choose the distribution.
Pre-reading check
Hands up: true or false?
1. “Counting events in a fixed time window at a constant rate → use the Binomial distribution” → false (that is the Poisson setting)
2. “The Poisson distribution has the property that its mean equals its variance” → true
3. “The Normal distribution extends to ±∞ and has no finite maximum” → true
What is a probability distribution?
A probability distribution is a complete description of all possible outcomes and their probabilities.
Discrete distributions assign probability to countable outcomes: P(X = k) for k = 0, 1, 2, …
Continuous distributions assign probability to intervals: P(a ≤ X ≤ b) = area under f(x)
Two things to always specify: 1. Support — what values can X take? 2. Parameters — which numbers control the shape?
Every distribution in today’s session answers: “what is the data-generating process?”
How to choose a distribution

| When you observe… | Distribution | Parameters |
|---|---|---|
| One trial, binary outcome | Bernoulli | p |
| n fixed trials, count successes | Binomial | n, p |
| Count failures before 1st success | Geometric | p |
| Count failures before r-th success | Negative Binomial | r, p |
| Sample n items from N without replacement | Hypergeometric | N, M, n |
| Count events in fixed time/space at rate λ | Poisson | λ |
| Count items in K ≥ 3 categories | Multinomial | n, p₁…p_K |
| Equal probability over interval [a, b] | Uniform | a, b |
| Measurement or sum of many independent factors | Normal | μ, σ² |
Bernoulli — the building block
The Bernoulli distribution is the simplest possible: one trial, two outcomes.
\[P(X = 1) = p \quad\quad P(X = 0) = 1 - p\]
X = 1 (“success”), X = 0 (“failure”)
Single parameter: p = P(success)
Mean = p; Variance = p(1−p)
Every Binomial trial is a Bernoulli trial. Binomial(n, p) = sum of n independent Bernoulli(p) variables.
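This additivity is easy to check by simulation; a minimal stdlib-only sketch (the values n = 45, p = 0.5 are taken from the hospital example below):

```python
import random

random.seed(0)

n, p = 45, 0.5

def binomial_draw(n, p):
    """One Binomial(n, p) draw as a sum of n independent Bernoulli(p) trials."""
    return sum(random.random() < p for _ in range(n))

draws = [binomial_draw(n, p) for _ in range(20_000)]
mean = sum(draws) / len(draws)
print(round(mean, 1))  # theory predicts np = 22.5
```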
Story: Binomial — hospital births
Setting: A large maternity hospital records an average of 45 births per day. Historically, about 50% of births are recorded as male.
Question: On a given day with 45 births, what is the probability of at most 27 males?
Why Binomial?
- Fixed number of trials: n = 45
- Each birth independently classified as male or not: binary outcome
- Constant probability: p = 0.5
\[X \sim \text{Binomial}(n = 45,\ p = 0.5)\]
Handbook result: P(X ≤ 27) ≈ 0.9324
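The handbook value can be reproduced directly from the PMF; a short sketch using only Python's standard library (no statistics package assumed):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Hospital births: X ~ Binomial(45, 0.5); P(at most 27 males)
p_at_most_27 = sum(binom_pmf(k, 45, 0.5) for k in range(28))
print(round(p_at_most_27, 4))  # matches the handbook value ≈ 0.9324
```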
Binomial — annotated PMF
\[P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}\]
| Symbol | Meaning |
|---|---|
| n | number of trials |
| k | number of successes (the outcome we’re asking about) |
| p | P(success) per trial |
| 1 − p | P(failure) per trial |
| \(\binom{n}{k}\) | number of ways to arrange k successes among n trials |

Mean = np; Variance = np(1−p)
Variance is maximal when p = 0.5 and approaches 0 as p → 0 or p → 1.
Binomial app
Task
1. Set n = 45, p = 0.5. Compute P(X ≤ 27). Does the result match the handbook?
2. Increase n to 200, keep p = 0.5. How does the shape change?
3. Set p = 0.05, n = 100. What does the distribution look like? (Compare to Poisson with λ = 5.)
4. Predict: what happens to variance when p = 0.5 vs p = 0.1? Verify.
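Task 3's comparison can also be done numerically; a sketch printing Binomial(100, 0.05) probabilities next to Poisson(λ = np = 5) probabilities, stdlib only:

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

# With small p and large n, the two columns should nearly agree
for k in range(0, 11, 2):
    print(k, round(binom_pmf(k, 100, 0.05), 4), round(poisson_pmf(k, 5), 4))
```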
Multinomial — K categories

\[P(X_1 = k_1, X_2 = k_2, \ldots, X_K = k_K) = \frac{n!}{k_1!\, k_2! \cdots k_K!}\; p_1^{k_1} p_2^{k_2} \cdots p_K^{k_K}\]

where \(k_1 + k_2 + \cdots + k_K = n\) and \(p_1 + p_2 + \cdots + p_K = 1\).
Example: A SaaS support team classifies tickets into K = 4 categories: billing (30%), technical (45%), account (15%), other (10%). In a batch of n = 40 tickets, what is P(billing ≥ 15, technical ≥ 20)?
Each margin is Binomial: \(X_j \sim \text{Binomial}(n, p_j)\).
Binomial is the special case K = 2.
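The joint probability in the ticket example needs the full joint PMF or a simulation, but the margin property gives a quick partial check; a sketch for P(billing ≥ 15) alone, using the Binomial margin:

```python
from math import comb

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# Ticket example margin: billing count ~ Binomial(n = 40, p = 0.30)
p_billing_15_plus = binom_sf(15, 40, 0.30)
print(round(p_billing_15_plus, 4))
```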
Story: Poisson — SOC alerts (λ = 6.2)
Setting: A security operations center (SOC) receives an average of λ = 6.2 high-priority alerts per hour. Operations policy requires immediate escalation when 10 or more alerts arrive in one hour.
Question: What is the probability of triggering escalation in any given hour?
Why Poisson?
- No fixed number of trials (alerts can arrive at any moment)
- Events occur independently at a constant average rate λ
- Two alerts cannot arrive at exactly the same instant
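The escalation probability is one minus the Poisson CDF evaluated at 9; a stdlib-only sketch:

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

# SOC alerts: X ~ Poisson(6.2); escalation triggers when X >= 10
p_escalation = 1 - sum(poisson_pmf(k, 6.2) for k in range(10))
print(round(p_escalation, 3))  # ≈ 0.098: escalation in roughly one hour in ten
```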
From discrete to continuous
Discrete distributions assign probability to individual points: P(X = k).
Continuous distributions assign probability to intervals: P(a ≤ X ≤ b) = area under f(x).
The shift happens when:
- Outcomes are measurements, not counts
- The possible values form a continuum (every real number in a range)
Key rule: For continuous distributions, P(X = x) = 0 for any single value x.
Probability lives in area, not in height.
Uniform — equal probability
When to use: every value in the interval [a, b] is equally likely.
\[f(x) = \frac{1}{b-a} \quad\quad a \leq x \leq b\]
| Symbol | Meaning |
|---|---|
| a | lower bound |
| b | upper bound |
| 1/(b − a) | constant density (height of the rectangle) |

Mean = (a + b)/2; Variance = (b − a)²/12
Applications:
- Random number generation (computers generate U(0,1) first, then transform)
- Models for “I have no information about which value in [a, b] is more likely”
- Rounding errors: if a value is rounded to the nearest integer, the rounding error ~ U(−0.5, 0.5)
P(c ≤ X ≤ d) = (d − c)/(b − a) — just the fraction of the interval.
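Both facts on this slide, the U(0,1) transform and the interval-fraction rule, can be checked by simulation; a sketch with illustrative bounds a = 2, b = 10 (chosen here, not from the text):

```python
import random

random.seed(42)

a, b = 2.0, 10.0
# U(a, b) by transforming U(0, 1): X = a + (b - a) * U
draws = [a + (b - a) * random.random() for _ in range(100_000)]

# P(4 <= X <= 7) should be (7 - 4) / (10 - 2) = 0.375
frac = sum(4 <= x <= 7 for x in draws) / len(draws)
print(round(frac, 3))
```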
Story: Normal — sum of many small effects
Why does the Normal distribution appear everywhere?
Human height is the result of hundreds of genetic and environmental factors, each contributing a tiny amount. When many independent small effects add up, the sum follows a Normal distribution — regardless of the original distributions of the individual effects.
This is the Central Limit Theorem: the sample mean \(\bar{X}\) of n independent draws converges to a Normal distribution as n → ∞, for any original distribution with finite variance.
Common Normal applications:
- Measurement errors (many tiny sources of error summing up)
- Test scores (sum of many item responses)
- Financial returns over a day (many trades contributing)
- Biological measurements: height, weight, blood pressure
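A quick CLT illustration, assuming an Exponential source distribution (strongly skewed, chosen purely for illustration): means of n = 50 draws cluster around the population mean 1 with spread close to σ/√n ≈ 0.14.

```python
import random
import statistics

random.seed(1)

def sample_mean(n):
    """Mean of n draws from Exponential(mean 1), a skewed distribution."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

means = [sample_mean(50) for _ in range(5000)]
# Theory: centre ≈ 1.0, spread ≈ 1/sqrt(50) ≈ 0.14, shape close to Normal
print(round(statistics.fmean(means), 2), round(statistics.stdev(means), 2))
```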
Derived distributions: t, χ², F
You do not choose these distributions — the test procedure selects them automatically based on which assumptions are met.
They all have heavier tails than the Normal because of added uncertainty from estimating σ. As ν → ∞, t → N(0,1) and χ²/ν → 1.
Key ideas — distribution selection table

| Topic | What to retain |
|---|---|
| Bernoulli | one trial, binary; building block for Binomial |
| Binomial | n fixed trials, count successes; mean = np; variance = np(1−p) |
| Geometric / Neg. Binomial | count failures before the 1st / r-th success; Geometric is memoryless |
| Hypergeometric | Binomial without replacement; use when n/N ≥ 0.1 |
| Poisson | events at rate λ; mean = variance = λ (diagnostic) |
| Multinomial | K ≥ 3 categories; each margin is Binomial |
| Uniform | equal probability on [a, b]; base for simulation |
| Normal | sum of many factors; 68-95-99.7 rule; parameters μ, σ² |
| t, χ², F | derived from Normal; chosen by the test, not by you |
Exit problem (pairs, 5 min)
A factory produces chips. Defects occur at an average rate of 2.5 per hour. An inspector samples 20 chips from today’s batch of 400, which contains 30 defectives.
1. Which distribution describes the hourly defect count? Name two parameters.
2. Which distribution describes the inspector’s sample? Why is it NOT the same as (1)?
3. A colleague says “use Binomial for (2).” When would this approximation be acceptable?
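A numerical check for question (3): comparing the exact Hypergeometric probabilities with the Binomial approximation at p = M/N. Here n/N = 20/400 = 0.05, below the 0.1 threshold from the summary table, so the two should nearly agree (stdlib-only sketch):

```python
from math import comb

def hypergeom_pmf(k, N, M, n):
    """P(k defectives in a sample of n from N items containing M defectives)."""
    return comb(M, k) * comb(N - M, n - k) / comb(N, n)

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

N, M, n = 400, 30, 20   # batch size, defectives, sample size
p = M / N               # 0.075
for k in range(4):
    print(k, round(hypergeom_pmf(k, N, M, n), 4), round(binom_pmf(k, n, p), 4))
```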