137  Generalized Linear Models

137.1 The GLM Framework

Generalized Linear Models (GLMs) extend ordinary linear regression to handle response variables that follow distributions other than the normal distribution (Nelder and Wedderburn 1972). A GLM consists of three components:

  1. Random component: The probability distribution of the response variable \(Y\) (e.g., Normal, Poisson, Binomial)
  2. Systematic component: The linear predictor \(\eta = \beta_0 + \beta_1 X_1 + ... + \beta_k X_k\)
  3. Link function: A function \(g(\cdot)\) that relates the expected value of \(Y\) to the linear predictor: \(g(\mu) = \eta\)

Ordinary linear regression (Chapter 134) is a special case of the GLM framework with a Gaussian (normal) family and identity link function. Logistic regression (Chapter 136) is another special case with a binomial family and logit link function.
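
The first special case can be checked numerically. The sketch below fits the same simulated data with lm() and with glm() using the Gaussian family and identity link; the data and the names dat, ols_fit, and glm_fit are made up for illustration.

```r
# Simulated data (illustrative assumption, not taken from the text)
set.seed(1)
dat <- data.frame(x = rnorm(50))
dat$y <- 2 + 3 * dat$x + rnorm(50)

# Ordinary least squares
ols_fit <- lm(y ~ x, data = dat)

# The same model written as a GLM: Gaussian family, identity link
glm_fit <- glm(y ~ x, family = gaussian(link = "identity"), data = dat)

# The coefficient estimates agree up to numerical tolerance
print(cbind(lm = coef(ols_fit), glm = coef(glm_fit)))
same_coefs <- isTRUE(all.equal(coef(ols_fit), coef(glm_fit)))
print(same_coefs)
```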

137.1.1 Common GLM Families

Table 137.1: Common GLM families and their properties

| Family | Link function | Response type | Example |
| --- | --- | --- | --- |
| Gaussian | Identity: \(g(\mu) = \mu\) | Continuous, symmetric | Height, temperature |
| Binomial | Logit: \(g(\mu) = \log\frac{\mu}{1-\mu}\) | Binary (0/1) or proportions | Disease status, pass/fail |
| Poisson | Log: \(g(\mu) = \log(\mu)\) | Counts (0, 1, 2, …) | Number of accidents, species count |
| Quasipoisson | Log: \(g(\mu) = \log(\mu)\) | Overdispersed counts | Counts with extra variability |
| Negative Binomial | Log: \(g(\mu) = \log(\mu)\) | Overdispersed counts | Species abundance |
| Quasibinomial | Logit: \(g(\mu) = \log\frac{\mu}{1-\mu}\) | Overdispersed proportions | Test scores as proportions |
| Gamma | Inverse: \(g(\mu) = 1/\mu\) | Positive continuous, right-skewed | Insurance claims, waiting times |

Notes:

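  • Negative binomial regression is widely taught alongside GLMs, but it is a GLM in the strict sense only when the dispersion parameter \(\theta\) is held fixed. In practice \(\theta\) is estimated from the data together with the regression coefficients, which takes the model outside the narrow GLM framework.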
  • For Gamma models, the inverse link is canonical, while the log link is often preferred in applied work because coefficient interpretation is more straightforward.
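
To illustrate the second note, the following sketch fits a Gamma GLM with a log link to simulated data. The data-generating values (intercept 1, slope 0.5, shape 2) and all variable names are assumptions chosen for the example.

```r
# Simulated positive, right-skewed response (illustrative assumption)
set.seed(2)
n <- 300
x <- rnorm(n)
mu <- exp(1 + 0.5 * x)                            # true mean on the response scale
shape <- 2
y <- rgamma(n, shape = shape, rate = shape / mu)  # mean = shape / rate = mu

# R's default link for the Gamma family is the canonical inverse link
default_link <- Gamma()$link
print(default_link)

# Log link: exp(coefficient) is a multiplicative effect on the mean
gamma_log_fit <- glm(y ~ x, family = Gamma(link = "log"))
print(exp(coef(gamma_log_fit)))
```

With the log link, the exponentiated slope reads as the factor by which the expected response changes per unit of x, which is the interpretive convenience the note refers to.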

137.1.2 The General GLM Equation

For a GLM with link function \(g\):

\[ g(E[Y | X]) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + ... + \beta_k X_k \]

The inverse of the link function maps the linear predictor back to the scale of the response:

\[ E[Y | X] = g^{-1}(\beta_0 + \beta_1 X_1 + ... + \beta_k X_k) \]
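
In R, family objects carry both \(g\) and \(g^{-1}\) as the components linkfun and linkinv. The short sketch below applies them to a few hypothetical values of the linear predictor.

```r
# Family objects in base R store the link g and its inverse g^{-1}
pois_fam <- poisson(link = "log")
binom_fam <- binomial(link = "logit")

eta <- c(-2, 0, 1.5)  # hypothetical linear predictor values

# Inverse link: linear predictor -> mean of the response
mu_pois <- pois_fam$linkinv(eta)    # exp(eta)
mu_binom <- binom_fam$linkinv(eta)  # 1 / (1 + exp(-eta))

# Link: mean -> linear predictor (recovers eta)
eta_back <- binom_fam$linkfun(mu_binom)

print(rbind(eta, mu_pois, mu_binom, eta_back))
```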

137.2 Poisson Regression

137.2.1 When to Use

Poisson regression is used when the response variable represents count data – the number of times an event occurs in a fixed period of time, area, or other unit of observation. Examples include:

  • Number of customer complaints per month
  • Number of species observed in a habitat
  • Number of accidents at an intersection per year
  • Number of goals scored per match

The key assumption is that, conditional on the predictors, the variance of the counts is approximately equal to their mean (equidispersion).

137.2.2 The Model

The Poisson regression model uses a log link function:

\[ \log(\mu) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + ... + \beta_k X_k \]

where \(\mu = E[Y | X]\) is the expected count. Equivalently:

\[ \mu = e^{\beta_0 + \beta_1 X_1 + ... + \beta_k X_k} \]

137.2.3 Offsets for Exposure

When observations have different exposure times (e.g., person-years, machine-hours), a Poisson model should include an offset:

\[ \log(\mu_i) = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_k X_{ik} + \log(\text{exposure}_i) \]

In R:

glm(count ~ x1 + x2 + offset(log(exposure)), family = poisson, data = df)

This models event rates rather than raw counts, which is essential when exposure differs across units.
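
A minimal simulated sketch of the offset in use follows. The exposure range, the true coefficients (-1 and 0.5), and the names dat, rate_fit, and new_units are assumptions made for illustration.

```r
# Simulated counts observed over unequal exposure (illustrative assumption)
set.seed(1)
n <- 500
exposure <- runif(n, min = 0.5, max = 5)  # e.g. years of observation per unit
x <- rnorm(n)
rate <- exp(-1 + 0.5 * x)                 # true event rate per unit of exposure
count <- rpois(n, lambda = rate * exposure)
dat <- data.frame(count, x, exposure)

# The offset enters the linear predictor with its coefficient fixed at 1,
# so only the intercept and the slope of x are estimated
rate_fit <- glm(count ~ x + offset(log(exposure)), family = poisson, data = dat)
print(coef(rate_fit))

# Predicted rate per unit of exposure: set exposure to 1 so log(exposure) = 0
new_units <- data.frame(x = c(-1, 0, 1), exposure = 1)
pred_rate <- predict(rate_fit, newdata = new_units, type = "response")
print(pred_rate)
```

Setting exposure to 1 in the prediction data is one convenient way to report rates; predictions for a unit with a different exposure scale proportionally.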

137.2.4 Interpretation of Coefficients

Because of the log link, the coefficients have a multiplicative interpretation:

  • Holding the other predictors constant, a one-unit increase in \(X_j\) multiplies the expected count by \(e^{\beta_j}\)
  • The quantity \(e^{\beta_j}\) is called the incidence rate ratio (IRR)
  • \(e^{\beta_j} > 1\): the predictor increases the expected count
  • \(e^{\beta_j} = 1\): no effect (equivalent to \(\beta_j = 0\))
  • \(e^{\beta_j} < 1\): the predictor decreases the expected count
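
The multiplicative reading can be verified from fitted values. In the sketch below (simulated data, hypothetical names), the ratio of two predicted counts that differ by one unit in x1, with x2 held fixed, equals the exponentiated coefficient of x1.

```r
# Simulated Poisson data with two predictors (illustrative assumption)
set.seed(3)
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- rpois(n, lambda = exp(0.2 + 0.4 * x1 - 0.3 * x2))
fit <- glm(y ~ x1 + x2, family = poisson)

# Expected counts at x1 = 1 and x1 = 2, with x2 held fixed at 0.5
mu_a <- predict(fit, newdata = data.frame(x1 = 1, x2 = 0.5), type = "response")
mu_b <- predict(fit, newdata = data.frame(x1 = 2, x2 = 0.5), type = "response")

# The ratio of the two expected counts is the incidence rate ratio of x1
ratio <- unname(mu_b / mu_a)
irr_x1 <- unname(exp(coef(fit)["x1"]))
print(c(ratio_of_means = ratio, IRR = irr_x1))
```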

137.2.5 Analysis based on p-values – Software

The Poisson GLM is available in the Manual Model Building application under the GLM tab by selecting “family = poisson”.

137.2.6 R Code

# Example: Number of awards by math score and program type
set.seed(42)
n <- 200
math_score <- round(rnorm(n, mean = 52, sd = 10))
program <- sample(c("General", "Academic", "Vocational"), n, replace = TRUE,
                  prob = c(0.4, 0.35, 0.25))

# True relationship
log_mean <- -3 + 0.04 * math_score + ifelse(program == "Academic", 0.5,
                                              ifelse(program == "Vocational", -0.3, 0))
num_awards <- rpois(n, lambda = exp(log_mean))

df <- data.frame(num_awards, math_score, program = factor(program))

cat("Summary of award counts:\n")
print(table(num_awards))

# Fit Poisson regression
pois_model <- glm(num_awards ~ math_score + program, family = poisson, data = df)
summary(pois_model)
Summary of award counts:
num_awards
  0   1   2   3 
129  50  20   1 

Call:
glm(formula = num_awards ~ math_score + program, family = poisson, 
    data = df)

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)       -3.20136    0.64460  -4.966 6.82e-07 ***
math_score         0.04916    0.01105   4.450 8.59e-06 ***
programGeneral    -0.23964    0.22653  -1.058   0.2901    
programVocational -0.58520    0.30504  -1.918   0.0551 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 204.47  on 199  degrees of freedom
Residual deviance: 179.57  on 196  degrees of freedom
AIC: 342.84

Number of Fisher Scoring iterations: 5
# Incidence rate ratios (IRR)
cat("\nIncidence Rate Ratios (IRR) with 95% CI:\n")
irr <- exp(cbind(IRR = coef(pois_model), confint(pois_model)))
print(irr)

Incidence Rate Ratios (IRR) with 95% CI:
                         IRR      2.5 %    97.5 %
(Intercept)       0.04070667 0.01125596 0.1409211
math_score        1.05039279 1.02795544 1.0734627
programGeneral    0.78690986 0.50427827 1.2300309
programVocational 0.55699706 0.29703951 0.9914422

137.2.7 Checking for Overdispersion

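A critical assumption of Poisson regression is that the conditional variance equals the conditional mean. When the variance exceeds the mean, we have overdispersion. A quick estimate of the dispersion parameter is:

\[ \hat{\phi} = \frac{\text{Residual deviance}}{\text{Residual degrees of freedom}} \]

If \(\hat{\phi} \approx 1\), the Poisson model is appropriate. If \(\hat{\phi} \gg 1\), overdispersion is present and the standard errors from the Poisson model will be too small, leading to inflated significance. An alternative estimator divides the Pearson statistic (the sum of squared Pearson residuals) by the residual degrees of freedom; this is the value R reports for quasi-families, and it is generally the more reliable of the two when many expected counts are small.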

# Check dispersion
dispersion <- pois_model$deviance / pois_model$df.residual
cat("Estimated dispersion parameter:", round(dispersion, 3), "\n")
cat("If this value is much larger than 1, consider quasipoisson or negative binomial.\n")
Estimated dispersion parameter: 0.916 
If this value is much larger than 1, consider quasipoisson or negative binomial.
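
As a complement, the sketch below computes both the deviance-based ratio and the Pearson-based estimate of the dispersion on simulated overdispersed counts, and compares the latter with the value reported by summary() for a quasipoisson fit. The data and the names pois_chk, phi_deviance, phi_pearson, and phi_quasi are assumptions for illustration.

```r
# Simulated overdispersed counts (illustrative assumption)
set.seed(4)
n <- 200
x <- rnorm(n, mean = 5, sd = 2)
y <- rnbinom(n, mu = exp(0.5 + 0.3 * x), size = 2)

pois_chk <- glm(y ~ x, family = poisson)

# Deviance-based ratio
phi_deviance <- pois_chk$deviance / pois_chk$df.residual

# Pearson-based estimate: sum of squared Pearson residuals / residual df
phi_pearson <- sum(residuals(pois_chk, type = "pearson")^2) / pois_chk$df.residual

# Dispersion reported by summary() for the quasipoisson fit
phi_quasi <- summary(glm(y ~ x, family = quasipoisson))$dispersion

print(c(deviance = phi_deviance, pearson = phi_pearson, quasi = phi_quasi))
```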

137.2.8 Assumptions

  • The response variable is a count (non-negative integer)
  • Observations are independent
  • The mean equals the variance (equidispersion)
  • The log of the mean is a linear function of the predictors
  • No excess zeros beyond what the Poisson distribution predicts

137.3 Overdispersion and Alternatives

When the variance of the count data exceeds the mean (overdispersion), the standard Poisson model is inappropriate because it underestimates the standard errors. Two common alternatives address this problem.

137.3.1 Quasipoisson Regression

Quasipoisson regression handles overdispersion by introducing a dispersion parameter \(\phi\) that scales the variance:

\[ \text{Var}(Y) = \phi \cdot \mu \]

When \(\phi = 1\), this reduces to Poisson regression. When \(\phi > 1\), the standard errors are inflated by \(\sqrt{\phi}\), correctly accounting for the extra variability.

Key properties:

  • Same point estimates for coefficients as Poisson regression
  • Larger standard errors whenever \(\hat{\phi} > 1\) (more conservative p-values)
  • The dispersion parameter \(\phi\) is estimated from the data
  • AIC is not available for quasi-families (use F-tests for model comparison)

137.3.1.1 R Code

# Simulate overdispersed count data
set.seed(42)
n <- 200
x <- rnorm(n, mean = 5, sd = 2)
# Overdispersed counts: add extra variability
lambda <- exp(0.5 + 0.3 * x)
y_overdispersed <- rnbinom(n, mu = lambda, size = 2)  # Negative binomial generates overdispersion

df_od <- data.frame(y = y_overdispersed, x = x)

# Standard Poisson (will underestimate SE)
pois_fit <- glm(y ~ x, family = poisson, data = df_od)

# Quasipoisson (corrects SE)
qpois_fit <- glm(y ~ x, family = quasipoisson, data = df_od)

cat("=== Poisson Model ===\n")
cat("Dispersion:", round(pois_fit$deviance / pois_fit$df.residual, 3), "\n")
cat("SE for x:", round(summary(pois_fit)$coefficients["x", "Std. Error"], 4), "\n\n")

cat("=== Quasipoisson Model ===\n")
cat("Dispersion:", round(summary(qpois_fit)$dispersion, 3), "\n")
cat("SE for x:", round(summary(qpois_fit)$coefficients["x", "Std. Error"], 4), "\n")
=== Poisson Model ===
Dispersion: 4.555 
SE for x: 0.013 

=== Quasipoisson Model ===
Dispersion: 4.854 
SE for x: 0.0287 

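Notice that the quasipoisson model produces larger standard errors, reflecting the true uncertainty more accurately: 0.0287 \(\approx\) 0.013 \(\times \sqrt{4.854}\). The two reported dispersion values differ because the Poisson line shows the deviance-based ratio (4.555), whereas summary() of the quasipoisson fit reports the Pearson-based estimate (4.854).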
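
The \(\sqrt{\phi}\) scaling can be confirmed for every coefficient at once. The following sketch uses freshly simulated data with hypothetical names (p_fit, q_fit) and compares the quasipoisson standard errors with the rescaled Poisson ones.

```r
# Simulated overdispersed counts (illustrative assumption)
set.seed(5)
n <- 200
x <- rnorm(n, mean = 5, sd = 2)
y <- rnbinom(n, mu = exp(0.5 + 0.3 * x), size = 2)

p_fit <- glm(y ~ x, family = poisson)
q_fit <- glm(y ~ x, family = quasipoisson)

se_pois <- summary(p_fit)$coefficients[, "Std. Error"]
se_quasi <- summary(q_fit)$coefficients[, "Std. Error"]
phi_hat <- summary(q_fit)$dispersion

# Quasipoisson SE = Poisson SE * sqrt(estimated dispersion)
print(cbind(se_pois, se_quasi, rescaled = se_pois * sqrt(phi_hat)))
```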

137.3.2 Negative Binomial Regression

Negative binomial regression is an alternative approach to overdispersed count data. Instead of adjusting the standard errors post-hoc (as quasipoisson does), it explicitly models the extra variation through a dispersion parameter \(\theta\):

\[ \text{Var}(Y) = \mu + \frac{\mu^2}{\theta} \]

As \(\theta \to \infty\), the negative binomial converges to the Poisson distribution.
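
A small simulation can make the variance function concrete. The sketch below draws from a negative binomial distribution with assumed values \(\mu = 4\) and \(\theta = 2\) (the size argument of rnbinom() plays the role of \(\theta\)) and compares the sample variance with \(\mu + \mu^2/\theta\).

```r
# Assumed parameter values for illustration
set.seed(6)
mu <- 4
theta <- 2
draws <- rnbinom(1e5, mu = mu, size = theta)

var_theory <- mu + mu^2 / theta
var_sample <- var(draws)
print(c(mean = mean(draws), var_sample = var_sample, var_theory = var_theory))

# Larger theta moves the variance towards the Poisson value (variance = mean)
theta_grid <- c(1, 10, 100, 1000)
print(data.frame(theta = theta_grid, variance = mu + mu^2 / theta_grid))
```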

Key properties:

  • Estimated coefficients may differ slightly from Poisson
  • Models the overdispersion mechanism explicitly
  • AIC is available for model comparison
  • Fitted in R with glm.nb() from the MASS package

137.3.2.1 R Code

# Negative binomial regression
library(MASS)
nb_fit <- glm.nb(y ~ x, data = df_od)
summary(nb_fit)

Call:
glm.nb(formula = y ~ x, data = df_od, init.theta = 2.205366332, 
    link = log)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.63124    0.16083   3.925 8.68e-05 ***
x            0.27417    0.02914   9.409  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Negative Binomial(2.2054) family taken to be 1)

    Null deviance: 311.22  on 199  degrees of freedom
Residual deviance: 220.96  on 198  degrees of freedom
AIC: 1194.5

Number of Fisher Scoring iterations: 1

              Theta:  2.205 
          Std. Err.:  0.296 

 2 x log-likelihood:  -1188.486 
# Compare all three models
cat("=== Model Comparison ===\n\n")
cat("Poisson AIC:", round(AIC(pois_fit), 1), "\n")
cat("Negative Binomial AIC:", round(AIC(nb_fit), 1), "\n")
cat("(Lower AIC is better; AIC not available for quasipoisson)\n\n")

cat("Coefficient for x:\n")
cat("  Poisson:           ", round(coef(pois_fit)["x"], 4),
    " (SE:", round(summary(pois_fit)$coefficients["x", "Std. Error"], 4), ")\n")
cat("  Quasipoisson:      ", round(coef(qpois_fit)["x"], 4),
    " (SE:", round(summary(qpois_fit)$coefficients["x", "Std. Error"], 4), ")\n")
cat("  Negative Binomial: ", round(coef(nb_fit)["x"], 4),
    " (SE:", round(summary(nb_fit)$coefficients["x", "Std. Error"], 4), ")\n")
=== Model Comparison ===

Poisson AIC: 1599.3 
Negative Binomial AIC: 1194.5 
(Lower AIC is better; AIC not available for quasipoisson)

Coefficient for x:
  Poisson:            0.2653  (SE: 0.013 )
  Quasipoisson:       0.2653  (SE: 0.0287 )
  Negative Binomial:  0.2742  (SE: 0.0291 )

137.3.3 Quasipoisson vs. Negative Binomial: When to Use Which

Table 137.2: Quasipoisson vs. Negative Binomial comparison

| Aspect | Quasipoisson | Negative Binomial |
| --- | --- | --- |
| Variance structure | \(\text{Var} = \phi \mu\) (linear in mean) | \(\text{Var} = \mu + \mu^2/\theta\) (quadratic in mean) |
| AIC available | No | Yes |
| Model comparison | F-tests only | AIC, likelihood ratio tests |
| Point estimates | Same as Poisson | May differ from Poisson |
| When to prefer | Mild overdispersion; want simplicity | Moderate to strong overdispersion; need AIC |

137.4 Quasibinomial Regression

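Quasibinomial regression extends logistic regression (Chapter 136) to handle overdispersed proportions and grouped binary data (successes out of a known number of trials). It uses the same logit link function as standard logistic regression but allows the variance to exceed what the binomial distribution predicts. Ungrouped 0/1 responses cannot show overdispersion on their own, because the variance of a Bernoulli variable is fully determined by its mean.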

137.4.1 When to Use

  • The response is a proportion (between 0 and 1) rather than a binary outcome
  • Grouped binary data (successes out of \(n\) trials) show more variability than the binomial distribution allows
  • Cluster-level proportions where within-cluster correlation creates overdispersion

137.4.2 R Code

# Example: Proportion of seeds germinating under different conditions
set.seed(42)
n_plots <- 30
temperature <- rnorm(n_plots, mean = 20, sd = 5)
moisture <- rnorm(n_plots, mean = 50, sd = 10)
seeds_planted <- rep(100, n_plots)

# True germination probability
logit_p <- -2 + 0.1 * temperature + 0.02 * moisture
p <- 1 / (1 + exp(-logit_p))

# Overdispersed binomial (beta-binomial simulation)
alpha_param <- p * 5
beta_param <- (1 - p) * 5
actual_p <- rbeta(n_plots, alpha_param, beta_param)
seeds_germinated <- rbinom(n_plots, seeds_planted, actual_p)

df_germ <- data.frame(
  germinated = seeds_germinated,
  total = seeds_planted,
  temperature = temperature,
  moisture = moisture
)

# Standard binomial
binom_fit <- glm(cbind(germinated, total - germinated) ~ temperature + moisture,
                 family = binomial, data = df_germ)

# Quasibinomial (accounts for overdispersion)
qbinom_fit <- glm(cbind(germinated, total - germinated) ~ temperature + moisture,
                  family = quasibinomial, data = df_germ)

cat("=== Standard Binomial ===\n")
cat("Dispersion:", round(binom_fit$deviance / binom_fit$df.residual, 3), "\n")
printCoefmat(summary(binom_fit)$coefficients, digits = 3)

cat("\n=== Quasibinomial ===\n")
cat("Dispersion:", round(summary(qbinom_fit)$dispersion, 3), "\n")
printCoefmat(summary(qbinom_fit)$coefficients, digits = 3)
=== Standard Binomial ===
Dispersion: 19.216 
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -2.88608    0.25632  -11.26   <2e-16 ***
temperature  0.05698    0.00660    8.63   <2e-16 ***
moisture     0.04921    0.00391   12.59   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

=== Quasibinomial ===
Dispersion: 17.655 
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  -2.8861     1.0770   -2.68   0.0124 * 
temperature   0.0570     0.0277    2.05   0.0498 * 
moisture      0.0492     0.0164    3.00   0.0058 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

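The quasibinomial model leaves the coefficient estimates unchanged but multiplies each standard error by \(\sqrt{17.655} \approx 4.2\) (for the intercept, 0.2563 \(\times\) 4.2 \(\approx\) 1.077) and uses t-tests instead of z-tests. It therefore produces wider confidence intervals and more conservative p-values when overdispersion is present.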
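
Proportion data can be passed to glm() in two equivalent ways: as a two-column matrix of successes and failures, or as a proportion with the number of trials supplied as weights. The sketch below, on simulated data with hypothetical names, shows that both forms lead to the same quasibinomial fit.

```r
# Simulated grouped data with extra-binomial variation (illustrative assumption)
set.seed(7)
n_groups <- 40
trials <- rep(50, n_groups)
x <- rnorm(n_groups)
p_true <- plogis(-0.5 + 0.8 * x)
# Beta-distributed group probabilities create the overdispersion
p_group <- rbeta(n_groups, p_true * 4, (1 - p_true) * 4)
successes <- rbinom(n_groups, size = trials, prob = p_group)

# Form 1: two-column response (successes, failures)
fit_counts <- glm(cbind(successes, trials - successes) ~ x, family = quasibinomial)

# Form 2: proportion as response, number of trials as prior weights
prop <- successes / trials
fit_prop <- glm(prop ~ x, family = quasibinomial, weights = trials)

print(rbind(counts = coef(fit_counts), proportion = coef(fit_prop)))
print(c(summary(fit_counts)$dispersion, summary(fit_prop)$dispersion))
```

The weights matter in the second form: without them, every proportion would be treated as if it came from a single trial.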

137.5 Model Selection

137.5.1 AIC Comparison

For true likelihood-based models (Poisson, negative binomial, binomial, Gaussian), the Akaike Information Criterion (AIC) can be used to compare models:

\[ \text{AIC} = -2 \ell(\hat{\beta}) + 2k \]

where \(\ell\) is the log-likelihood and \(k\) is the number of parameters. Lower AIC indicates a better balance between fit and complexity.

Important: AIC is not available for quasi-families (quasipoisson, quasibinomial) because they do not specify a full likelihood. For these models, use F-tests or compare with their non-quasi counterparts.
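
The AIC formula can be reproduced by hand from the log-likelihood, and the same call shows what R returns for a quasi-family. The data and names below are assumptions for illustration.

```r
# Simulated Poisson data (illustrative assumption)
set.seed(8)
n <- 150
x <- rnorm(n)
y <- rpois(n, lambda = exp(0.3 + 0.5 * x))

pois_aic_fit <- glm(y ~ x, family = poisson)
ll <- logLik(pois_aic_fit)
k <- attr(ll, "df")                          # number of estimated parameters
aic_by_hand <- -2 * as.numeric(ll) + 2 * k
print(c(by_hand = aic_by_hand, AIC = AIC(pois_aic_fit)))

# Quasi-families define no likelihood, so AIC() returns NA
quasi_aic <- AIC(glm(y ~ x, family = quasipoisson))
print(quasi_aic)
```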

137.5.2 Likelihood Ratio Tests

Nested models can be compared using likelihood ratio tests:

# Compare nested models (df is the awards data frame from Section 137.2.6)
full_model <- glm(num_awards ~ math_score + program, family = poisson, data = df)
reduced_model <- glm(num_awards ~ math_score, family = poisson, data = df)
anova(reduced_model, full_model, test = "Chisq")

For quasi-families, use test = "F" instead:

# For quasi-families (df is the awards data frame from Section 137.2.6)
full_model <- glm(num_awards ~ math_score + program, family = quasipoisson, data = df)
reduced_model <- glm(num_awards ~ math_score, family = quasipoisson, data = df)
anova(reduced_model, full_model, test = "F")
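
For a Poisson model, the likelihood ratio statistic is the drop in residual deviance between the nested models, referred to a chi-squared distribution. The sketch below computes it by hand on simulated data (hypothetical names) and compares the p-value with the one from anova().

```r
# Simulated Poisson data with two predictors (illustrative assumption)
set.seed(9)
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- rpois(n, lambda = exp(0.2 + 0.4 * x1 + 0.3 * x2))

reduced <- glm(y ~ x1, family = poisson)
full <- glm(y ~ x1 + x2, family = poisson)

# Likelihood ratio statistic = difference in residual deviance
lr_stat <- reduced$deviance - full$deviance
lr_df <- reduced$df.residual - full$df.residual
p_by_hand <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# Same test via the analysis-of-deviance table
lrt_table <- anova(reduced, full, test = "Chisq")
p_anova <- lrt_table[2, "Pr(>Chi)"]
print(c(statistic = lr_stat, df = lr_df, p_by_hand = p_by_hand, p_anova = p_anova))
```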

137.5.3 Residual Diagnostics for GLMs

GLM residuals differ from ordinary residuals in linear regression. The two most useful types are:

  • Deviance residuals: Analogous to residuals in linear regression; for moderate/large counts they are often roughly symmetric around zero, but perfect normality is not expected
  • Pearson residuals: Standardized by the expected variance; useful for detecting overdispersion
par(mfrow = c(2, 2))

# Deviance residuals vs fitted
plot(predict(pois_model, type = "response"), residuals(pois_model, type = "deviance"),
     xlab = "Fitted values", ylab = "Deviance residuals",
     main = "Deviance Residuals vs Fitted")
abline(h = 0, lty = 2, col = "red")

# QQ plot of deviance residuals
qqnorm(residuals(pois_model, type = "deviance"), main = "QQ Plot of Deviance Residuals")
qqline(residuals(pois_model, type = "deviance"), col = "red")

# Scale-location plot
plot(predict(pois_model, type = "response"),
     sqrt(abs(residuals(pois_model, type = "deviance"))),
     xlab = "Fitted values", ylab = "√|Deviance residuals|",
     main = "Scale-Location")

# Residuals vs leverage
plot(hatvalues(pois_model), residuals(pois_model, type = "deviance"),
     xlab = "Leverage", ylab = "Deviance residuals",
     main = "Residuals vs Leverage")
abline(h = 0, lty = 2, col = "red")

par(mfrow = c(1, 1))
Figure 137.1: Diagnostic plots for Poisson regression model

137.6 Assumptions (GLM General)

All GLMs share the following assumptions:

  1. Correct specification of the link function: The chosen link function correctly relates the mean response to the linear predictor
  2. Correct distributional family: The response variable follows the specified distribution (or quasi-distribution)
  3. Independence of observations: Each observation is independent of the others
  4. No perfect multicollinearity (identification): Predictors cannot be exact linear combinations of each other
  5. Linearity on the link scale: The relationship between each predictor and the transformed mean is linear

High (imperfect) multicollinearity is a practical estimation concern because it inflates standard errors and can destabilize coefficient estimates.

Additionally, for Poisson models: the conditional variance should equal the conditional mean (unless using quasipoisson or negative binomial).

137.7 Zero-Inflated and Hurdle Extensions

If count data contain many zeros, standard Poisson/negative binomial models can underfit the zero mass. Two common extensions are:

  • Zero-inflated models (ZIP/ZINB): combine a point-mass-at-zero component with a count component.
  • Hurdle models: model zero vs positive counts first, then positive counts separately.

In R, these can be fit with functions such as pscl::zeroinfl() and pscl::hurdle().
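
Before reaching for these extensions, it can help to compare the observed number of zeros with the number a fitted Poisson model expects. The sketch below does this in base R on simulated data in which an assumed 30% of units are structural zeros; all names are hypothetical.

```r
# Simulated zero-inflated counts (illustrative assumption)
set.seed(10)
n <- 500
x <- rnorm(n)
structural_zero <- rbinom(n, size = 1, prob = 0.3)  # 30% always-zero units
y <- ifelse(structural_zero == 1, 0L, rpois(n, lambda = exp(0.5 + 0.3 * x)))

zero_chk_fit <- glm(y ~ x, family = poisson)

observed_zeros <- sum(y == 0)
# Under the fitted Poisson model, P(Y_i = 0) = exp(-mu_i)
expected_zeros <- sum(dpois(0, lambda = fitted(zero_chk_fit)))
print(c(observed = observed_zeros, expected = round(expected_zeros, 1)))
```

A clear surplus of observed zeros over the expected count is a signal that a zero-inflated or hurdle model may be worth fitting.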

137.8 Pros & Cons

137.8.1 Pros

  • Unified framework that encompasses many common regression models
  • Handles non-normal response distributions naturally
  • Coefficients have interpretable meanings (odds ratios, incidence rate ratios)
  • Well-established theory for inference (confidence intervals, hypothesis tests)
  • Flexible: can accommodate various response types by choosing the appropriate family

137.8.2 Cons

  • Assumes a specific parametric relationship between predictors and response
  • Quasi-families do not support AIC-based model comparison
  • Overdispersion must be detected and addressed explicitly
  • Cannot handle zero-inflated data without extensions (zero-inflated Poisson or hurdle models)
  • Requires careful model checking; a poor choice of family or link function leads to invalid inference

137.9 Task

  1. Using the warpbreaks dataset in R (data(warpbreaks)), fit a Poisson regression model predicting the number of breaks by wool type and tension level. Check for overdispersion. If present, refit using quasipoisson and negative binomial regression and compare the results.

  2. Generate overdispersed count data and compare the standard errors from Poisson, quasipoisson, and negative binomial models. Which model gives the most honest standard errors?

  3. Fit a quasibinomial model to proportion data of your choice. Compare the results with a standard binomial model and discuss how overdispersion affects inference.

Nelder, John A., and Robert W. M. Wedderburn. 1972. “Generalized Linear Models.” Journal of the Royal Statistical Society: Series A (General) 135 (3): 370–84. https://doi.org/10.2307/2344614.

© 2026 Patrick Wessa. Provided as-is, without warranty.
