14 Geometric Distribution

14.1 Definition

Let \(X\) be the number of failures before the first success in independent Bernoulli trials with success probability \(p\). Then \(X\) follows a geometric distribution:

\[ X \sim \text{Geom}(p), \quad p \in (0,1), \quad X \in \{0,1,2,\dots\} \]

with probability mass function

\[ \text{P}(X = k) = (1-p)^k p, \quad k = 0,1,2,\dots \]

and cumulative distribution function

\[ \text{P}(X \le k) = 1 - (1-p)^{k+1}, \quad k = 0,1,2,\dots \]
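As a quick check, the PMF and CDF above can be evaluated directly and compared with R's dgeom and pgeom. This is a minimal sketch. The value \(p = 0.3\) and the range of \(k\) are arbitrary illustrative choices.

```r
p <- 0.3          # arbitrary success probability for the check
k <- 0:10         # number of failures before the first success

pmf_formula <- (1 - p)^k * p            # P(X = k) from the PMF
cdf_formula <- 1 - (1 - p)^(k + 1)      # P(X <= k) from the CDF

# Compare with R's built-ins, which use the same "failures" parameterization
all.equal(pmf_formula, dgeom(k, prob = p))
all.equal(cdf_formula, pgeom(k, prob = p))

# The CDF is also the running sum of the PMF
all.equal(cumsum(pmf_formula), cdf_formula)
```

Each comparison should return TRUE, up to all.equal's default numerical tolerance.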

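Note on parameterization: this chapter uses the same definition as R’s dgeom and pgeom functions (number of failures before the first success). Some texts instead count the number of trials up to and including the first success, which equals \(X+1\) and has support \(\{1,2,3,\dots\}\).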

14.2 Mean

\[ \text{E}(X) = \frac{1-p}{p} \]

14.3 Variance

\[ \text{V}(X) = \frac{1-p}{p^2} \]
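The mean and variance formulas can be checked numerically by summing over the PMF. The sketch below assumes \(p = 0.25\) and truncates the infinite sums at \(k = 2000\). This truncation is harmless here because the neglected tail mass \((1-p)^{2001}\) is far below machine precision.

```r
p <- 0.25                     # illustrative success probability
k <- 0:2000                   # truncation point for the infinite sums
f <- dgeom(k, prob = p)       # PMF values P(X = k)

mean_series <- sum(k * f)                       # E(X) = sum of k P(X = k)
var_series  <- sum((k - mean_series)^2 * f)     # V(X) = sum of (k - mu)^2 P(X = k)

c(series = mean_series, formula = (1 - p) / p)      # both approximately 3
c(series = var_series,  formula = (1 - p) / p^2)    # both approximately 12
```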

14.4 Moment Generating Function

\[ M_X(t) = \frac{p}{1-(1-p)e^t}, \quad t < -\ln(1-p) \]
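A numerical check of the MGF, assuming the illustrative values \(p = 0.4\) and \(t_0 = 0.2\) (which satisfies \(t_0 < -\ln(0.6) \approx 0.511\)): the closed form is compared with a truncated version of the series \(\text{E}(e^{t_0X}) = \sum_k e^{t_0k}(1-p)^k p\), and a central difference at \(t = 0\) approximates \(M_X'(0) = \text{E}(X)\). The truncation at \(k = 500\) keeps \(e^{t_0k}\) finite in double precision.

```r
p <- 0.4                                         # illustrative success probability
mgf <- function(t) p / (1 - (1 - p) * exp(t))    # closed form, valid for t < -log(1 - p)

t0 <- 0.2                                        # satisfies t0 < -log(0.6), about 0.511
k <- 0:500                                       # truncation; exp(t0 * k) stays finite
series <- sum(exp(t0 * k) * dgeom(k, prob = p))  # E(exp(t0 * X)) by direct summation
c(series = series, closed_form = mgf(t0))

# Central difference at t = 0 approximates M'(0) = E(X) = (1 - p) / p = 1.5
h <- 1e-5
deriv0 <- (mgf(h) - mgf(-h)) / (2 * h)
deriv0
```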

14.5 Mode

\[ \text{Mo}(X) = 0 \]

14.6 Median

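Taking the median as the smallest integer \(m\) with \(\text{P}(X \le m)\ge 0.5\), the CDF condition \(1-(1-p)^{m+1}\ge 0.5\) is equivalent to \(m+1 \ge \ln(0.5)/\ln(1-p)\) (the inequality flips because \(\ln(1-p)<0\)), so the median is: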

\[ \text{Med}(X)=\left\lceil \frac{\ln(0.5)}{\ln(1-p)} \right\rceil - 1 \]
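The closed-form median can be compared with R's qgeom(0.5, prob), which returns the smallest \(x\) with \(\text{P}(X \le x) \ge 0.5\), the same definition used above. The grid of \(p\) values in this sketch is arbitrary.

```r
# Median from the closed-form expression above
median_formula <- function(p) ceiling(log(0.5) / log(1 - p)) - 1

p_values <- c(0.05, 0.12, 0.18, 0.3, 0.9)   # illustrative success probabilities
cbind(p       = p_values,
      formula = median_formula(p_values),
      qgeom   = qgeom(0.5, prob = p_values))
```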

14.7 Coefficient of Skewness

\[ g_1 = \frac{2-p}{\sqrt{1-p}} \]

14.8 Coefficient of Kurtosis

\[ g_2 = 9 + \frac{p^2}{1-p} \]

The corresponding excess kurtosis is \(6 + \frac{p^2}{1-p}\).
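Skewness and kurtosis can be checked the same way, by computing the third and fourth central moments from the PMF. The sketch below assumes \(p = 0.2\) and truncates the sums at \(k = 3000\). Note that the kurtosis computed here is the non-excess version, matching \(g_2\) above.

```r
p <- 0.2                     # illustrative success probability
k <- 0:3000                  # truncation point for the infinite sums
f <- dgeom(k, prob = p)      # PMF values

mu <- sum(k * f)             # mean
m2 <- sum((k - mu)^2 * f)    # 2nd central moment (variance)
m3 <- sum((k - mu)^3 * f)    # 3rd central moment
m4 <- sum((k - mu)^4 * f)    # 4th central moment

rbind(
  skewness = c(series = m3 / m2^1.5, formula = (2 - p) / sqrt(1 - p)),
  kurtosis = c(series = m4 / m2^2,   formula = 9 + p^2 / (1 - p))
)
```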

14.9 Parameter Estimation

For observations \(x_1,\dots,x_n\) (failures before first success) with sample mean \(\bar x\), the maximum-likelihood estimator of \(p\) is

\[ \hat p = \frac{1}{1+\bar x}. \]
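The closed form follows from the log-likelihood \(\ell(p) = n\ln p + \left(\sum_i x_i\right)\ln(1-p)\). Setting \(\ell'(p) = n/p - \sum_i x_i/(1-p)\) to zero gives \(p = n/(n+\sum_i x_i) = 1/(1+\bar x)\). The sketch below checks this on simulated data (a hypothetical sample of 500 draws with true \(p = 0.25\)) by maximizing the log-likelihood numerically with optimize. Agreement is limited by optimize's default tolerance of roughly \(10^{-4}\).

```r
set.seed(1)                               # reproducible simulated sample
x <- rgeom(500, prob = 0.25)              # hypothetical data: failures before first success

p_hat_closed <- 1 / (1 + mean(x))         # closed-form MLE

# Log-likelihood: sum of log P(X = x_i) = n log p + sum(x) log(1 - p)
loglik <- function(p) sum(dgeom(x, prob = p, log = TRUE))
p_hat_numeric <- optimize(loglik, interval = c(1e-6, 1 - 1e-6), maximum = TRUE)$maximum

c(closed_form = p_hat_closed, numeric = p_hat_numeric)
```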

14.10 Memoryless Property

The geometric distribution is memoryless:

\[ \text{P}(X \ge m+n \mid X \ge m)=\text{P}(X \ge n) \]

because \(\text{P}(X \ge k)=(1-p)^k\) and therefore

\[ \frac{\text{P}(X \ge m+n)}{\text{P}(X \ge m)}= \frac{(1-p)^{m+n}}{(1-p)^m}=(1-p)^n=\text{P}(X \ge n). \]
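The memoryless identity can be verified with the upper tail of pgeom. Since pgeom(q, prob, lower.tail = FALSE) returns \(\text{P}(X > q)\), the survival probability \(\text{P}(X \ge k)\) corresponds to \(q = k - 1\). The values \(p = 0.3\), \(m = 4\), \(n = 6\) in this sketch are illustrative.

```r
p <- 0.3                                                          # illustrative value
surv <- function(k) pgeom(k - 1, prob = p, lower.tail = FALSE)    # P(X >= k) = P(X > k - 1)

m <- 4
n <- 6
lhs <- surv(m + n) / surv(m)      # P(X >= m + n | X >= m)
rhs <- surv(n)                    # P(X >= n)
c(conditional = lhs, unconditional = rhs, closed_form = (1 - p)^n)
```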

Among discrete distributions on \(\{0,1,2,\dots\}\), the geometric distribution is uniquely memoryless.

14.11 Purpose

The geometric distribution is useful when the key quantity is “how many failed attempts occur before the first success”:

Sales and conversion processes: number of unsuccessful contacts before the first signed deal.
Reliability and maintenance: number of non-failure periods before the first failure event.
Queueing and service systems: number of empty time slots between consecutive arrivals in discrete-time models.
Clinical screening workflows: number of negative tests before the first positive case.
Bridge to negative binomial modeling: the geometric distribution is the special case of the negative binomial with target successes \(r=1\) (see Chapter 15).
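The negative binomial link can be illustrated in R: dnbinom(k, size, prob) counts failures before the size-th success, so with size = 1 it should reproduce dgeom. A sketch with the arbitrary value \(p = 0.18\):

```r
p <- 0.18        # illustrative success probability
k <- 0:15        # failures before the first success

# Negative binomial with r = 1 target success versus the geometric distribution
all.equal(dnbinom(k, size = 1, prob = p), dgeom(k, prob = p))
all.equal(pnbinom(k, size = 1, prob = p), pgeom(k, prob = p))
```

Both comparisons should return TRUE.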

14.12 R Module

The Geometric Probabilities app is available in the handbook menu:

Distributions / Geometric Probabilities

It is also accessible directly at:

https://shiny.wessa.net/geometric/

14.13 Business Example: Outbound Sales Contact Strategy

A B2B team estimates that each qualified outreach attempt independently succeeds (books a product demo) with probability \(p = 0.18\). Let \(X\) be the number of unsuccessful attempts before the first booked demo.

We can evaluate the chance of obtaining a demo within the first four attempts:

\[ \text{P}(X \le 3) \]

For a direct PMF calculation, the probability of exactly 3 failures before the first success is:

\[ \text{P}(X=3)=(1-p)^3p=(1-0.18)^3(0.18)\approx 0.0992 \]

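p <- 0.18  # success probability per outreach attempt

cat("P(X = 3) =", dgeom(3, prob = p), "\n")       # exactly 3 failures, demo on attempt 4
cat("P(X <= 3) =", pgeom(3, prob = p), "\n")      # demo booked within the first 4 attempts
cat("P(X >= 8) =", 1 - pgeom(7, prob = p), "\n")  # P(X >= 8) = 1 - P(X <= 7)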

P(X = 3) = 0.09924624 
P(X <= 3) = 0.5478782 
P(X >= 8) = 0.2044141

The third quantity, \(\text{P}(X \ge 8)\), is the probability of at least eight failures before the first success, i.e. no demo in the first eight attempts (a useful tail-risk indicator for workload planning).
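The three R results above can also be reproduced from the closed forms in Sections 14.1 and 14.10, which makes the link between the formulas and dgeom/pgeom explicit. A minimal sketch:

```r
p <- 0.18  # success probability per outreach attempt
probs <- c(
  pmf_3  = (1 - p)^3 * p,   # P(X = 3)  = (1 - p)^3 p          (PMF, Section 14.1)
  cdf_3  = 1 - (1 - p)^4,   # P(X <= 3) = 1 - (1 - p)^(3 + 1)  (CDF, Section 14.1)
  tail_8 = (1 - p)^8        # P(X >= 8) = (1 - p)^8            (survival, Section 14.10)
)
probs
```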

The same setup can also be explored in the Geometric Probabilities app described in Section 14.12.

14.14 Additional Academic Example: Screening Program

In a diagnostic screening program, each test independently identifies a positive case with probability \(p=0.12\). Let \(X\) be the number of negative tests before the first positive result.

Two useful quantities are:

\[ \text{P}(X \le 4) \quad \text{and} \quad \text{P}(X \ge 10). \]

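p_screen <- 0.12  # probability that a single test is positive

cat("P(X <= 4) =", pgeom(4, prob = p_screen), "\n")        # positive found within 5 tests
cat("P(X >= 10) =", 1 - pgeom(9, prob = p_screen), "\n")   # 10 or more negatives first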

P(X <= 4) = 0.4722681 
P(X >= 10) = 0.278501

Interpretation: the first value is the chance of finding a positive case quickly (within the first five tests), while the second quantifies the tail risk of a long negative streak (ten or more negative tests before the first detection).
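A short Monte Carlo sketch can confirm the screening probabilities: simulate many hypothetical screening sequences with rgeom and compare the empirical proportions with the exact values. The seed and the number of replications (100,000) are arbitrary choices, so the simulated proportions should match the exact ones only up to sampling error on the order of 0.002.

```r
set.seed(42)                                  # reproducible simulation
p_screen <- 0.12                              # probability that a single test is positive
x_sim <- rgeom(1e5, prob = p_screen)          # simulated negative tests before first positive

c(sim_le_4  = mean(x_sim <= 4),  exact_le_4  = pgeom(4, prob = p_screen),
  sim_ge_10 = mean(x_sim >= 10), exact_ge_10 = 1 - pgeom(9, prob = p_screen))
```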