Table of contents

  • 15.1 Definition
  • 15.2 Mean
  • 15.3 Variance
  • 15.4 Mode
  • 15.5 Median
  • 15.6 Coefficient of Skewness
  • 15.7 Coefficient of Kurtosis
  • 15.8 Moment Generating Function
  • 15.9 Gamma-Poisson Mixture Link
  • 15.10 Purpose
  • 15.11 R Module
  • 15.12 Business Example: Conversion Pipeline Completion Risk
  • 15.13 Additional Academic Example: Seed Germination Screening

15  Negative Binomial Distribution

15.1 Definition

Let \(X\) be the number of failures before the \(r\)-th success in independent Bernoulli trials with success probability \(p\). Then \(X\) follows a negative binomial distribution:

\[ X \sim \text{NegBin}(r,p), \quad r \in \mathbb{N},\; p \in (0,1),\; X \in \{0,1,2,\dots\} \]

with probability mass function

\[ \text{P}(X = k) = \binom{k+r-1}{k}(1-p)^k p^r, \quad k = 0,1,2,\dots \]

and cumulative distribution function

\[ \text{P}(X \le k) = \sum_{i=0}^{k} \binom{i+r-1}{i}(1-p)^i p^r \]

This chapter uses the same parameterization as R’s dnbinom and pnbinom (failures before \(r\) successes).

Setting \(r=1\) recovers the geometric PMF:

\[ \text{P}(X=k)=\binom{k+1-1}{k}(1-p)^k p^1=(1-p)^k p. \]
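As a quick sanity check, the PMF formula above can be compared directly against R's dnbinom, and the \(r=1\) case against dgeom. The parameter values \(r=3\), \(p=0.4\) are illustrative choices, not taken from the text:

```r
# Compare the chapter's PMF formula with dnbinom
# (failures-before-r-successes parameterization).
r <- 3; p <- 0.4; k <- 0:10
manual <- choose(k + r - 1, k) * (1 - p)^k * p^r
stopifnot(isTRUE(all.equal(manual, dnbinom(k, size = r, prob = p))))

# With r = 1 the negative binomial PMF reduces to the geometric PMF.
stopifnot(isTRUE(all.equal(dnbinom(k, size = 1, prob = p),
                           dgeom(k, prob = p))))
```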

15.2 Mean

\[ \text{E}(X) = \frac{r(1-p)}{p} \]

15.3 Variance

\[ \text{V}(X) = \frac{r(1-p)}{p^2} \]

Since \(0<p<1\), we have \(\frac{1}{p}>1\), so

\[ \text{V}(X)=\frac{\text{E}(X)}{p}>\text{E}(X), \]

which explains why the negative binomial naturally supports overdispersed count data relative to Poisson.
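The relationship \(\text{V}(X)=\text{E}(X)/p\) is easy to verify numerically; the values \(r=5\), \(p=0.3\) below are illustrative:

```r
# V(X) = E(X)/p, hence V(X) > E(X) whenever 0 < p < 1.
r <- 5; p <- 0.3
m <- r * (1 - p) / p     # E(X)
v <- r * (1 - p) / p^2   # V(X)
stopifnot(isTRUE(all.equal(v, m / p)), v > m)
```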

15.4 Mode

\[ \text{Mo}(X)= \begin{cases} 0, & r=1,\\ \left\lfloor\frac{(r-1)(1-p)}{p}\right\rfloor, & r>1. \end{cases} \]

15.5 Median

There is no simple closed-form expression for the median. In applications, it is usually computed numerically via the CDF.
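In R this numerical inversion is done by qnbinom, which returns the smallest \(k\) with \(\text{P}(X \le k) \ge 0.5\). The values \(r=6\), \(p=0.25\) are illustrative:

```r
# Median as the smallest k whose CDF value reaches 0.5.
r <- 6; p <- 0.25
med <- qnbinom(0.5, size = r, prob = p)
stopifnot(pnbinom(med,     size = r, prob = p) >= 0.5,
          pnbinom(med - 1, size = r, prob = p) <  0.5)
```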

15.6 Coefficient of Skewness

\[ g_1 = \frac{2-p}{\sqrt{r(1-p)}} \]

15.7 Coefficient of Kurtosis

\[ g_2 = 3 + \frac{6}{r} + \frac{p^2}{r(1-p)} \]

The corresponding excess kurtosis is \(\frac{6}{r}+\frac{p^2}{r(1-p)}\).

15.8 Moment Generating Function

\[ M_X(t)=\left(\frac{p}{1-(1-p)e^t}\right)^r, \quad t<-\ln(1-p) \]
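The closed form can be checked against the truncated series \(\sum_k e^{tk}\,\text{P}(X=k)\) for a \(t\) inside the convergence region. The values \(r=4\), \(p=0.5\), \(t=0.2\) (with \(-\ln(1-p)\approx 0.693\)) are illustrative:

```r
# Truncated-series evaluation of E[e^{tX}] versus the closed-form MGF.
r <- 4; p <- 0.5; t <- 0.2   # t < -log(1 - p), so the series converges
k <- 0:500                   # tail terms are negligible at this range
series <- sum(exp(t * k) * dnbinom(k, size = r, prob = p))
closed <- (p / (1 - (1 - p) * exp(t)))^r
stopifnot(isTRUE(all.equal(series, closed, tolerance = 1e-8)))
```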

15.9 Gamma-Poisson Mixture Link

The negative binomial can be derived as a Poisson-gamma mixture:

\[ X \mid \Lambda=\lambda \sim \text{Pois}(\lambda), \qquad \Lambda \sim \text{Gamma}\!\left(r,\ \text{rate}=\frac{p}{1-p}\right). \]

Integrating out \(\Lambda\) yields

\[ X \sim \text{NegBin}(r,p). \]

This mechanism explains overdispersion: the latent rate variation (gamma mixing) inflates marginal variance beyond the marginal mean.
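A short simulation illustrates the mixture. With the illustrative values \(r=3\), \(p=0.4\), the marginal counts should have moments close to the NegBin mean \(4.5\) and variance \(11.25\):

```r
# Draw latent gamma rates, then Poisson counts given those rates;
# the marginal distribution of the counts is NegBin(r, p).
set.seed(1)
r <- 3; p <- 0.4; n <- 1e5
lambda <- rgamma(n, shape = r, rate = p / (1 - p))
x <- rpois(n, lambda)
mean(x)   # theoretical E(X) = r(1-p)/p   = 4.5
var(x)    # theoretical V(X) = r(1-p)/p^2 = 11.25
```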

15.10 Purpose

The negative binomial model is useful when events are counted until a target number of successes is reached:

  • Campaign execution: number of failed contacts before achieving a fixed number of conversions.
  • Quality assurance: number of nonconforming units observed before a target number of conforming outcomes.
  • Overdispersed count modeling: compared with Poisson (Chapter 18), the negative binomial can accommodate larger variance relative to the mean.
  • Generalization of geometric: when \(r=1\), the negative binomial reduces to the geometric distribution (Chapter 14).

15.11 R Module

The Negative Binomial Probabilities app is available in the handbook menu:

  • Distributions / Negative Binomial Probabilities

It is also accessible directly at:

  • https://shiny.wessa.net/negativebinomial/

15.12 Business Example: Conversion Pipeline Completion Risk

A sales team needs \(r = 6\) signed contracts to complete a quarterly target tranche. For each qualified lead, the estimated close probability is \(p = 0.25\). Let \(X\) denote the number of failed leads before reaching 6 signed contracts.

Two useful planning quantities are:

\[ \text{P}(X \le 12) \quad \text{and} \quad \text{P}(X \ge 20), \]

the probability of reaching the target with at most 12 failed leads, and the risk of needing 20 or more failed leads.

r <- 6     # signed contracts required
p <- 0.25  # per-lead close probability

# Probability of reaching the target with at most 12 failed leads
cat("P(X <= 12) =", pnbinom(12, size = r, prob = p), "\n")
# Tail risk: 20 or more failed leads (complement of at most 19)
cat("P(X >= 20) =", 1 - pnbinom(19, size = r, prob = p), "\n")
P(X <= 12) = 0.2825492 
P(X >= 20) = 0.3782785 

You can reproduce this setup with the preconfigured Shiny app listed in the R Module section above.

15.13 Additional Academic Example: Seed Germination Screening

In a plant-science pilot, researchers monitor germination attempts until they observe \(r=4\) successful germinations.
If each attempt succeeds with probability \(p=0.35\), let \(X\) be the number of failed attempts before the fourth success.

Two useful planning probabilities are:

\[ \text{P}(X \le 6) \quad \text{and} \quad \text{P}(X \ge 12). \]

r_germ <- 4     # successful germinations required
p_germ <- 0.35  # per-attempt germination probability

# Probability of at most 6 failed attempts before the 4th success
cat("P(X <= 6) =", pnbinom(6, size = r_germ, prob = p_germ), "\n")
# Probability of 12 or more failed attempts (complement of at most 11)
cat("P(X >= 12) =", 1 - pnbinom(11, size = r_germ, prob = p_germ), "\n")
P(X <= 6) = 0.486173 
P(X >= 12) = 0.1726965 

© 2026 Patrick Wessa. Provided as-is, without warranty.

