137 Generalized Linear Models

137.1 The GLM Framework

Generalized Linear Models (GLMs) extend ordinary linear regression to handle response variables that follow distributions other than the normal distribution (Nelder and Wedderburn 1972). A GLM consists of three components:

Random component: The probability distribution of the response variable \(Y\) (e.g., Normal, Poisson, Binomial)
Systematic component: The linear predictor \(\eta = \beta_0 + \beta_1 X_1 + ... + \beta_k X_k\)
Link function: A function \(g(\cdot)\) that relates the expected value of \(Y\) to the linear predictor: \(g(\mu) = \eta\)

Ordinary linear regression (Chapter 134) is a special case of the GLM framework with a Gaussian (normal) family and identity link function. Logistic regression (Chapter 136) is another special case with a binomial family and logit link function.

137.1.1 Common GLM Families

Table 137.1: Common GLM families and their properties

Family	Link Function	Response Type	Example
Gaussian	Identity: \(g(\mu) = \mu\)	Continuous, symmetric	Height, temperature
Binomial	Logit: \(g(\mu) = \log\frac{\mu}{1-\mu}\)	Binary (0/1) or proportions	Disease status, pass/fail
Poisson	Log: \(g(\mu) = \log(\mu)\)	Counts (0, 1, 2, …)	Number of accidents, species count
Quasipoisson	Log: \(g(\mu) = \log(\mu)\)	Overdispersed counts	Counts with extra variability
Negative Binomial	Log: \(g(\mu) = \log(\mu)\)	Overdispersed counts	Species abundance
Quasibinomial	Logit: \(g(\mu) = \log\frac{\mu}{1-\mu}\)	Overdispersed proportions	Test scores as proportions
Gamma	Inverse: \(g(\mu) = 1/\mu\)	Positive continuous, right-skewed	Insurance claims, waiting times

Notes:

Negative binomial regression is widely taught alongside GLMs but is not a strict canonical GLM in the narrow sense because an extra dispersion parameter is estimated.
For Gamma models, the inverse link is canonical, while the log link is often preferred in applied work because coefficient interpretation is more straightforward.

137.1.2 The General GLM Equation

For a GLM with link function \(g\):

\[ g(E[Y | X]) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + ... + \beta_k X_k \]

The inverse of the link function maps the linear predictor back to the scale of the response:

\[ E[Y | X] = g^{-1}(\beta_0 + \beta_1 X_1 + ... + \beta_k X_k) \]

137.2 Poisson Regression

137.2.1 When to Use

Poisson regression is used when the response variable represents count data – the number of times an event occurs in a fixed period of time, area, or other unit of observation. Examples include:

Number of customer complaints per month
Number of species observed in a habitat
Number of accidents at an intersection per year
Number of goals scored per match

The key assumption is that the variance of the counts is approximately equal to the mean (equidispersion).

137.2.2 The Model

The Poisson regression model uses a log link function:

\[ \log(\mu) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + ... + \beta_k X_k \]

where \(\mu = E[Y | X]\) is the expected count. Equivalently:

\[ \mu = e^{\beta_0 + \beta_1 X_1 + ... + \beta_k X_k} \]

137.2.3 Offsets for Exposure

When observations have different exposure times (e.g., person-years, machine-hours), a Poisson model should include an offset:

\[ \log(\mu_i) = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_k X_{ik} + \log(\text{exposure}_i) \]

In R:

glm(count ~ x1 + x2 + offset(log(exposure)), family = poisson, data = df)

This models event rates rather than raw counts, which is essential when exposure differs across units.

137.2.4 Interpretation of Coefficients

Because of the log link, the coefficients have a multiplicative interpretation:

A one-unit increase in \(X_j\) multiplies the expected count by \(e^{\beta_j}\)
The quantity \(e^{\beta_j}\) is called the incidence rate ratio (IRR)
\(e^{\beta_j} > 1\): the predictor increases the expected count
\(e^{\beta_j} = 1\): no effect (equivalent to \(\beta_j = 0\))
\(e^{\beta_j} < 1\): the predictor decreases the expected count

137.2.5 Analysis based on p-values – Software

The Poisson GLM is available in the Manual Model Building application under the GLM tab by selecting “family = poisson”:

Interactive Shiny app (click to load).

Open in new tab

137.2.6 R Code

# Example: Number of awards by math score and program type
set.seed(42)
n <- 200
math_score <- round(rnorm(n, mean = 52, sd = 10))
program <- sample(c("General", "Academic", "Vocational"), n, replace = TRUE,
                  prob = c(0.4, 0.35, 0.25))

# True relationship
log_mean <- -3 + 0.04 * math_score + ifelse(program == "Academic", 0.5,
                                              ifelse(program == "Vocational", -0.3, 0))
num_awards <- rpois(n, lambda = exp(log_mean))

df <- data.frame(num_awards, math_score, program = factor(program))

cat("Summary of award counts:\n")
print(table(num_awards))

# Fit Poisson regression
pois_model <- glm(num_awards ~ math_score + program, family = poisson, data = df)
summary(pois_model)

Summary of award counts:
num_awards
  0   1   2   3 
129  50  20   1 

Call:
glm(formula = num_awards ~ math_score + program, family = poisson, 
    data = df)

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)       -3.20136    0.64460  -4.966 6.82e-07 ***
math_score         0.04916    0.01105   4.450 8.59e-06 ***
programGeneral    -0.23964    0.22653  -1.058   0.2901    
programVocational -0.58520    0.30504  -1.918   0.0551 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 204.47  on 199  degrees of freedom
Residual deviance: 179.57  on 196  degrees of freedom
AIC: 342.84

Number of Fisher Scoring iterations: 5

# Incidence rate ratios (IRR)
cat("\nIncidence Rate Ratios (IRR) with 95% CI:\n")
irr <- exp(cbind(IRR = coef(pois_model), confint(pois_model)))
print(irr)


Incidence Rate Ratios (IRR) with 95% CI:
                         IRR      2.5 %    97.5 %
(Intercept)       0.04070667 0.01125596 0.1409211
math_score        1.05039279 1.02795544 1.0734627
programGeneral    0.78690986 0.50427827 1.2300309
programVocational 0.55699706 0.29703951 0.9914422

137.2.7 Checking for Overdispersion

A critical assumption of Poisson regression is that the variance equals the mean. When the variance exceeds the mean, we have overdispersion. The dispersion parameter can be estimated as:

\[ \hat{\phi} = \frac{\text{Residual deviance}}{\text{Residual degrees of freedom}} \]

If \(\hat{\phi} \approx 1\), the Poisson model is appropriate. If \(\hat{\phi} \gg 1\), overdispersion is present and the standard errors from the Poisson model will be too small, leading to inflated significance.

# Check dispersion
dispersion <- pois_model$deviance / pois_model$df.residual
cat("Estimated dispersion parameter:", round(dispersion, 3), "\n")
cat("If this value is much larger than 1, consider quasipoisson or negative binomial.\n")

Estimated dispersion parameter: 0.916 
If this value is much larger than 1, consider quasipoisson or negative binomial.

137.2.8 Assumptions

The response variable is a count (non-negative integer)
Observations are independent
The mean equals the variance (equidispersion)
The log of the mean is a linear function of the predictors
No excess zeros beyond what the Poisson distribution predicts

137.3 Overdispersion and Alternatives

When the variance of the count data exceeds the mean (overdispersion), the standard Poisson model is inappropriate because it underestimates the standard errors. Two common alternatives address this problem.

137.3.1 Quasipoisson Regression

Quasipoisson regression handles overdispersion by introducing a dispersion parameter \(\phi\) that scales the variance:

\[ \text{Var}(Y) = \phi \cdot \mu \]

When \(\phi = 1\), this reduces to Poisson regression. When \(\phi > 1\), the standard errors are inflated by \(\sqrt{\phi}\), correctly accounting for the extra variability.

Key properties:

Same point estimates for coefficients as Poisson regression
Larger standard errors (more conservative p-values)
The dispersion parameter \(\phi\) is estimated from the data
AIC is not available for quasi-families (use F-tests for model comparison)

137.3.1.1 R Code

# Simulate overdispersed count data
set.seed(42)
n <- 200
x <- rnorm(n, mean = 5, sd = 2)
# Overdispersed counts: add extra variability
lambda <- exp(0.5 + 0.3 * x)
y_overdispersed <- rnbinom(n, mu = lambda, size = 2)  # Negative binomial generates overdispersion

df_od <- data.frame(y = y_overdispersed, x = x)

# Standard Poisson (will underestimate SE)
pois_fit <- glm(y ~ x, family = poisson, data = df_od)

# Quasipoisson (corrects SE)
qpois_fit <- glm(y ~ x, family = quasipoisson, data = df_od)

cat("=== Poisson Model ===\n")
cat("Dispersion:", round(pois_fit$deviance / pois_fit$df.residual, 3), "\n")
cat("SE for x:", round(summary(pois_fit)$coefficients["x", "Std. Error"], 4), "\n\n")

cat("=== Quasipoisson Model ===\n")
cat("Dispersion:", round(summary(qpois_fit)$dispersion, 3), "\n")
cat("SE for x:", round(summary(qpois_fit)$coefficients["x", "Std. Error"], 4), "\n")

=== Poisson Model ===
Dispersion: 4.555 
SE for x: 0.013 

=== Quasipoisson Model ===
Dispersion: 4.854 
SE for x: 0.0287

Notice that the quasipoisson model produces larger standard errors, reflecting the true uncertainty more accurately.

137.3.2 Negative Binomial Regression

Negative binomial regression is an alternative approach to overdispersed count data. Instead of adjusting the standard errors post-hoc (as quasipoisson does), it explicitly models the extra variation through a dispersion parameter \(\theta\):

\[ \text{Var}(Y) = \mu + \frac{\mu^2}{\theta} \]

As \(\theta \to \infty\), the negative binomial converges to the Poisson distribution.

Key properties:

Estimated coefficients may differ slightly from Poisson
Models the overdispersion mechanism explicitly
AIC is available for model comparison
Requires the MASS package in R

137.3.2.1 R Code

# Negative binomial regression
library(MASS)
nb_fit <- glm.nb(y ~ x, data = df_od)
summary(nb_fit)


Call:
glm.nb(formula = y ~ x, data = df_od, init.theta = 2.205366332, 
    link = log)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.63124    0.16083   3.925 8.68e-05 ***
x            0.27417    0.02914   9.409  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Negative Binomial(2.2054) family taken to be 1)

    Null deviance: 311.22  on 199  degrees of freedom
Residual deviance: 220.96  on 198  degrees of freedom
AIC: 1194.5

Number of Fisher Scoring iterations: 1

              Theta:  2.205 
          Std. Err.:  0.296 

 2 x log-likelihood:  -1188.486

# Compare all three models
cat("=== Model Comparison ===\n\n")
cat("Poisson AIC:", round(AIC(pois_fit), 1), "\n")
cat("Negative Binomial AIC:", round(AIC(nb_fit), 1), "\n")
cat("(Lower AIC is better; AIC not available for quasipoisson)\n\n")

cat("Coefficient for x:\n")
cat("  Poisson:           ", round(coef(pois_fit)["x"], 4),
    " (SE:", round(summary(pois_fit)$coefficients["x", "Std. Error"], 4), ")\n")
cat("  Quasipoisson:      ", round(coef(qpois_fit)["x"], 4),
    " (SE:", round(summary(qpois_fit)$coefficients["x", "Std. Error"], 4), ")\n")
cat("  Negative Binomial: ", round(coef(nb_fit)["x"], 4),
    " (SE:", round(summary(nb_fit)$coefficients["x", "Std. Error"], 4), ")\n")

=== Model Comparison ===

Poisson AIC: 1599.3 
Negative Binomial AIC: 1194.5 
(Lower AIC is better; AIC not available for quasipoisson)

Coefficient for x:
  Poisson:            0.2653  (SE: 0.013 )
  Quasipoisson:       0.2653  (SE: 0.0287 )
  Negative Binomial:  0.2742  (SE: 0.0291 )

137.3.3 Quasipoisson vs. Negative Binomial: When to Use Which

Table 137.2: Quasipoisson vs. Negative Binomial comparison

Aspect	Quasipoisson	Negative Binomial
Variance structure	\(\text{Var} = \phi \mu\) (linear in mean)	\(\text{Var} = \mu + \mu^2/\theta\) (quadratic in mean)
AIC available	No	Yes
Model comparison	F-tests only	AIC, likelihood ratio tests
Point estimates	Same as Poisson	May differ from Poisson
When to prefer	Mild overdispersion; want simplicity	Moderate to strong overdispersion; need AIC

137.4 Quasibinomial Regression

Quasibinomial regression extends logistic regression (Chapter 136) to handle overdispersed binary or proportional data. It uses the same logit link function as standard logistic regression but allows the variance to exceed what the binomial distribution predicts.

137.4.1 When to Use

The response is a proportion (between 0 and 1) rather than a binary outcome
Binary data shows more variability than the binomial distribution allows
Cluster-level proportions where within-cluster correlation creates overdispersion

137.4.2 R Code

# Example: Proportion of seeds germinating under different conditions
set.seed(42)
n_plots <- 30
temperature <- rnorm(n_plots, mean = 20, sd = 5)
moisture <- rnorm(n_plots, mean = 50, sd = 10)
seeds_planted <- rep(100, n_plots)

# True germination probability
logit_p <- -2 + 0.1 * temperature + 0.02 * moisture
p <- 1 / (1 + exp(-logit_p))

# Overdispersed binomial (beta-binomial simulation)
alpha_param <- p * 5
beta_param <- (1 - p) * 5
actual_p <- rbeta(n_plots, alpha_param, beta_param)
seeds_germinated <- rbinom(n_plots, seeds_planted, actual_p)

df_germ <- data.frame(
  germinated = seeds_germinated,
  total = seeds_planted,
  temperature = temperature,
  moisture = moisture
)

# Standard binomial
binom_fit <- glm(cbind(germinated, total - germinated) ~ temperature + moisture,
                 family = binomial, data = df_germ)

# Quasibinomial (accounts for overdispersion)
qbinom_fit <- glm(cbind(germinated, total - germinated) ~ temperature + moisture,
                  family = quasibinomial, data = df_germ)

cat("=== Standard Binomial ===\n")
cat("Dispersion:", round(binom_fit$deviance / binom_fit$df.residual, 3), "\n")
printCoefmat(summary(binom_fit)$coefficients, digits = 3)

cat("\n=== Quasibinomial ===\n")
cat("Dispersion:", round(summary(qbinom_fit)$dispersion, 3), "\n")
printCoefmat(summary(qbinom_fit)$coefficients, digits = 3)

=== Standard Binomial ===
Dispersion: 19.216 
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -2.88608    0.25632  -11.26   <2e-16 ***
temperature  0.05698    0.00660    8.63   <2e-16 ***
moisture     0.04921    0.00391   12.59   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

=== Quasibinomial ===
Dispersion: 17.655 
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  -2.8861     1.0770   -2.68   0.0124 * 
temperature   0.0570     0.0277    2.05   0.0498 * 
moisture      0.0492     0.0164    3.00   0.0058 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The quasibinomial model produces wider confidence intervals and more conservative p-values when overdispersion is present.

137.5 Model Selection

137.5.1 AIC Comparison

For true likelihood-based models (Poisson, negative binomial, binomial, Gaussian), the Akaike Information Criterion (AIC) can be used to compare models:

\[ \text{AIC} = -2 \ell(\hat{\beta}) + 2k \]

where \(\ell\) is the log-likelihood and \(k\) is the number of parameters. Lower AIC indicates a better balance between fit and complexity.

Important: AIC is not available for quasi-families (quasipoisson, quasibinomial) because they do not specify a full likelihood. For these models, use F-tests or compare with their non-quasi counterparts.

137.5.2 Likelihood Ratio Tests

Nested models can be compared using likelihood ratio tests:

# Compare nested models
full_model <- glm(y ~ x1 + x2, family = poisson, data = df)
reduced_model <- glm(y ~ x1, family = poisson, data = df)
anova(reduced_model, full_model, test = "Chisq")

For quasi-families, use test = "F" instead:

# For quasi-families
full_model <- glm(y ~ x1 + x2, family = quasipoisson, data = df)
reduced_model <- glm(y ~ x1, family = quasipoisson, data = df)
anova(reduced_model, full_model, test = "F")

137.5.3 Residual Diagnostics for GLMs

GLM residuals differ from ordinary residuals in linear regression. The two most useful types are:

Deviance residuals: Analogous to residuals in linear regression; for moderate/large counts they are often roughly symmetric around zero, but perfect normality is not expected
Pearson residuals: Standardized by the expected variance; useful for detecting overdispersion

Code

par(mfrow = c(2, 2))

# Deviance residuals vs fitted
plot(predict(pois_model, type = "response"), residuals(pois_model, type = "deviance"),
     xlab = "Fitted values", ylab = "Deviance residuals",
     main = "Deviance Residuals vs Fitted")
abline(h = 0, lty = 2, col = "red")

# QQ plot of deviance residuals
qqnorm(residuals(pois_model, type = "deviance"), main = "QQ Plot of Deviance Residuals")
qqline(residuals(pois_model, type = "deviance"), col = "red")

# Scale-location plot
plot(predict(pois_model, type = "response"),
     sqrt(abs(residuals(pois_model, type = "deviance"))),
     xlab = "Fitted values", ylab = "√|Deviance residuals|",
     main = "Scale-Location")

# Residuals vs leverage
plot(hatvalues(pois_model), residuals(pois_model, type = "deviance"),
     xlab = "Leverage", ylab = "Deviance residuals",
     main = "Residuals vs Leverage")
abline(h = 0, lty = 2, col = "red")

par(mfrow = c(1, 1))

Figure 137.1: Diagnostic plots for Poisson regression model

137.6 Assumptions (GLM General)

All GLMs share the following assumptions:

Correct specification of the link function: The chosen link function correctly relates the mean response to the linear predictor
Correct distributional family: The response variable follows the specified distribution (or quasi-distribution)
Independence of observations: Each observation is independent of the others
No perfect multicollinearity (identification): Predictors cannot be exact linear combinations of each other
Linearity on the link scale: The relationship between each predictor and the transformed mean is linear

High (imperfect) multicollinearity is a practical estimation concern because it inflates standard errors and can destabilize coefficient estimates.

Additionally, for Poisson models: the mean should equal the variance (unless using quasipoisson or negative binomial).

137.7 Zero-Inflated and Hurdle Extensions

If count data contain many zeros, standard Poisson/negative binomial models can underfit the zero mass. Two common extensions are:

Zero-inflated models (ZIP/ZINB): combine a point-mass-at-zero component with a count component.
Hurdle models: model zero vs positive counts first, then positive counts separately.

In R, these can be fit with functions such as pscl::zeroinfl() and pscl::hurdle().

137.8 Pros & Cons

137.8.1 Pros

Unified framework that encompasses many common regression models
Handles non-normal response distributions naturally
Coefficients have interpretable meanings (odds ratios, incidence rate ratios)
Well-established theory for inference (confidence intervals, hypothesis tests)
Flexible: can accommodate various response types by choosing the appropriate family

137.8.2 Cons

Assumes a specific parametric relationship between predictors and response
Quasi-families do not support AIC-based model comparison
Overdispersion must be detected and addressed explicitly
Cannot handle zero-inflated data without extensions (zero-inflated Poisson or hurdle models)
Requires careful model checking; a poor choice of family or link function leads to invalid inference

137.9 Task

Using the warpbreaks dataset in R (data(warpbreaks)), fit a Poisson regression model predicting the number of breaks by wool type and tension level. Check for overdispersion. If present, refit using quasipoisson and negative binomial regression and compare the results.
Generate overdispersed count data and compare the standard errors from Poisson, quasipoisson, and negative binomial models. Which model gives the most honest standard errors?
Fit a quasibinomial model to proportion data of your choice. Compare the results with a standard binomial model and discuss how overdispersion affects inference.