32 Pareto Distribution

The Pareto distribution captures the heavy-tail power-law phenomenon seen in income, city sizes, and internet traffic. Its tail decays as \(x^{-\alpha}\), far more slowly than any exponential tail, so rare, extremely large values receive much more probability than under a Normal distribution.

Formally, a random variate \(X\) with support \([x_m, \infty)\) is said to have a Pareto distribution, written \(X \sim \text{Pareto}(x_m, \alpha)\), with minimum value \(x_m > 0\) and shape parameter \(\alpha > 0\). Base R has no built-in functions for the Pareto distribution, so custom density and distribution functions are used.

32.1 Probability Density Function

\[ f(x) = \frac{\alpha\, x_m^\alpha}{x^{\alpha+1}}, \quad x \geq x_m \]

The figure below shows examples of the Pareto Probability Density Function for different shape values with \(x_m = 1\).

dpareto <- function(x, xm, alpha) {
  ifelse(x >= xm, alpha * xm^alpha / x^(alpha + 1), 0)
}

par(mfrow = c(2, 2))
x <- seq(1, 6, length.out = 500)

plot(x, dpareto(x, 1, 0.5), type = "l", lwd = 2, col = "blue",
     xlab = "x", ylab = "f(x)", main = expression(paste(x[m] == 1, ",  ", alpha == 0.5)),
     ylim = c(0, 3))

plot(x, dpareto(x, 1, 1), type = "l", lwd = 2, col = "blue",
     xlab = "x", ylab = "f(x)", main = expression(paste(x[m] == 1, ",  ", alpha == 1)))

plot(x, dpareto(x, 1, 2), type = "l", lwd = 2, col = "blue",
     xlab = "x", ylab = "f(x)", main = expression(paste(x[m] == 1, ",  ", alpha == 2)))

plot(x, dpareto(x, 1, 4), type = "l", lwd = 2, col = "blue",
     xlab = "x", ylab = "f(x)", main = expression(paste(x[m] == 1, ",  ", alpha == 4)))

par(mfrow = c(1, 1))

Figure 32.1: Pareto Probability Density Function for various shape values (xm = 1)
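
As a quick sanity check, a valid density should integrate to one over its support. The sketch below reuses the custom `dpareto` function together with base R's `integrate` to verify this numerically for the four shape values shown in the figure. It is an illustration only; the comparison tolerance implied by the rounding is an arbitrary choice.

```r
# Custom Pareto density (same definition as in the text)
dpareto <- function(x, xm, alpha) {
  ifelse(x >= xm, alpha * xm^alpha / x^(alpha + 1), 0)
}

# Integrate the density over [xm, Inf) for each shape value in the figure
shapes <- c(0.5, 1, 2, 4)
areas <- sapply(shapes, function(a) {
  integrate(dpareto, lower = 1, upper = Inf, xm = 1, alpha = a)$value
})

# Each area should be numerically close to 1
print(round(areas, 4))
```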

32.2 Purpose

The Pareto distribution models extreme inequality: the “vital few” observations dominate the total, while the “trivial many” contribute little — the 80/20 rule. Its power-law tail makes it the standard choice for phenomena where the largest values are orders of magnitude larger than typical values. Common applications include:

Income and wealth distributions: a small fraction of individuals hold most wealth
City population sizes: a few mega-cities dwarf the many smaller cities
Internet traffic: a small number of files or users account for most data transfer
Insurance claims: rare catastrophic losses dominate aggregate loss portfolios
Earthquake magnitudes, solar flare intensities, and other natural extreme events

Relation to the discrete setting. The Pareto distribution is the continuous analog of the Zeta (Riemann) distribution, which assigns probability proportional to \(k^{-s}\), with \(s > 1\), to the positive integers \(k\) and is the theoretical basis of Zipf’s law for word and rank frequencies. The exponent \(s\) plays the role of the Pareto density exponent \(\alpha + 1\).

32.3 Distribution Function

\[ F(x) = 1 - \left(\frac{x_m}{x}\right)^\alpha, \quad x \geq x_m \]

The figure below shows the Pareto Distribution Function for \(x_m = 1\) and \(\alpha = 2\).

ppareto <- function(x, xm, alpha) {
  ifelse(x >= xm, 1 - (xm / x)^alpha, 0)
}

x <- seq(1, 8, length.out = 500)
plot(x, ppareto(x, 1, 2), type = "l", lwd = 2, col = "blue",
     xlab = "x", ylab = "F(x)", main = "Pareto Distribution Function",
     sub = expression(paste(x[m] == 1, ",  ", alpha == 2)))

Figure 32.2: Pareto Distribution Function (xm = 1, alpha = 2)
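
Inverting the distribution function gives the quantile function \(Q(p) = x_m (1 - p)^{-1/\alpha}\) for \(0 \leq p < 1\). The sketch below uses a hypothetical helper name, `qpareto`, which is not part of base R and is defined here only for illustration. It computes the quartiles for \(x_m = 1\), \(\alpha = 2\) and confirms that applying the distribution function to them returns the original probabilities.

```r
# Distribution function from the text
ppareto <- function(x, xm, alpha) {
  ifelse(x >= xm, 1 - (xm / x)^alpha, 0)
}

# Hypothetical quantile function: solves F(x) = p for x
qpareto <- function(p, xm, alpha) {
  stopifnot(all(p >= 0 & p < 1))
  xm * (1 - p)^(-1 / alpha)
}

# Quartiles for xm = 1, alpha = 2
p <- c(0.25, 0.5, 0.75)
q <- qpareto(p, xm = 1, alpha = 2)
print(q)

# Round trip: F(Q(p)) should return p
print(ppareto(q, xm = 1, alpha = 2))
```

Setting \(p = 0.5\) in the quantile function reproduces the median \(x_m \cdot 2^{1/\alpha}\).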

32.4 Moment Generating Function

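The moment generating function \(\text{E}(e^{tX})\) of the Pareto distribution does not exist for any \(t > 0\): the factor \(e^{tx}\) grows faster than the power-law density \(x^{-(\alpha+1)}\) decays, so the defining integral diverges and all exponential moments are infinite.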

32.5 1st Uncentered Moment

\[ \mu_1' = \frac{\alpha\, x_m}{\alpha - 1}, \quad \alpha > 1 \]

The first moment is infinite for \(\alpha \leq 1\).

32.6 2nd Uncentered Moment

\[ \mu_2' = \frac{\alpha\, x_m^2}{\alpha - 2}, \quad \alpha > 2 \]

32.7 3rd Uncentered Moment

\[ \mu_3' = \frac{\alpha\, x_m^3}{\alpha - 3}, \quad \alpha > 3 \]

32.8 4th Uncentered Moment

\[ \mu_4' = \frac{\alpha\, x_m^4}{\alpha - 4}, \quad \alpha > 4 \]

In general: \(\mu_n' = \dfrac{\alpha\, x_m^n}{\alpha - n}\) for \(n < \alpha\).
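
The general formula comes from integrating \(x^n f(x)\) over the support. As an illustrative check, the sketch below compares the closed form with numerical integration for arbitrarily chosen values \(x_m = 2\) and \(\alpha = 5\), so that the first four moments are all finite. The helper name `raw_moment` is an assumption made for this example.

```r
dpareto <- function(x, xm, alpha) {
  ifelse(x >= xm, alpha * xm^alpha / x^(alpha + 1), 0)
}

# Closed-form n-th uncentered moment; infinite when n >= alpha
raw_moment <- function(n, xm, alpha) {
  if (n >= alpha) Inf else alpha * xm^n / (alpha - n)
}

xm <- 2; alpha <- 5
closed <- sapply(1:4, raw_moment, xm = xm, alpha = alpha)

# Numerical integration of x^n * f(x) over [xm, Inf)
numerical <- sapply(1:4, function(n) {
  integrate(function(x) x^n * dpareto(x, xm, alpha),
            lower = xm, upper = Inf)$value
})

print(rbind(closed, numerical))
```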

32.9 2nd Centered Moment

\[ \mu_2 = \frac{x_m^2\,\alpha}{(\alpha-1)^2(\alpha-2)}, \quad \alpha > 2 \]

32.10 3rd Centered Moment

\[ \mu_3 = \frac{2x_m^3\,\alpha(\alpha+1)}{(\alpha-1)^3(\alpha-2)(\alpha-3)}, \quad \alpha > 3 \]

32.11 4th Centered Moment

\[ \mu_4 = \frac{3\alpha\,x_m^4(3\alpha^2+\alpha+2)}{(\alpha-1)^4(\alpha-2)(\alpha-3)(\alpha-4)}, \quad \alpha > 4 \]
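
The centered moments follow from the uncentered ones through the usual binomial expansions around the mean, for example \(\mu_2 = \mu_2' - \mu_1'^2\). The sketch below checks the three closed forms above against those expansions for an arbitrary choice of \(x_m = 1\) and \(\alpha = 5\).

```r
xm <- 1; alpha <- 5

# Uncentered moments m[n] = alpha * xm^n / (alpha - n), n = 1..4
m <- alpha * xm^(1:4) / (alpha - 1:4)

# Centered moments via binomial expansion around the mean m[1]
mu2 <- m[2] - m[1]^2
mu3 <- m[3] - 3 * m[1] * m[2] + 2 * m[1]^3
mu4 <- m[4] - 4 * m[1] * m[3] + 6 * m[1]^2 * m[2] - 3 * m[1]^4

# Closed forms from the text
mu2_cf <- xm^2 * alpha / ((alpha - 1)^2 * (alpha - 2))
mu3_cf <- 2 * xm^3 * alpha * (alpha + 1) /
  ((alpha - 1)^3 * (alpha - 2) * (alpha - 3))
mu4_cf <- 3 * alpha * xm^4 * (3 * alpha^2 + alpha + 2) /
  ((alpha - 1)^4 * (alpha - 2) * (alpha - 3) * (alpha - 4))

# Side-by-side comparison: expansion vs. closed form
print(rbind(expansion = c(mu2, mu3, mu4),
            closed    = c(mu2_cf, mu3_cf, mu4_cf)))
```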

32.12 Expected Value

\[ \text{E}(X) = \frac{\alpha\, x_m}{\alpha - 1}, \quad \alpha > 1 \]

The mean is infinite for \(\alpha \leq 1\).

32.13 Variance

\[ \text{V}(X) = \frac{x_m^2\,\alpha}{(\alpha-1)^2(\alpha-2)}, \quad \alpha > 2 \]

The variance is infinite for \(\alpha \leq 2\).

32.14 Median

\[ \text{Med}(X) = x_m \cdot 2^{1/\alpha} \]

32.15 Mode

\[ \text{Mo}(X) = x_m \]

The density is strictly decreasing on \([x_m, \infty)\), so the mode is always at the left boundary of the support.

32.16 Coefficient of Skewness

\[ g_1 = \frac{2(1+\alpha)}{\alpha-3}\sqrt{\frac{\alpha-2}{\alpha}}, \quad \alpha > 3 \]

The Pareto distribution is always positively skewed.

32.17 Coefficient of Kurtosis

\[ g_2 = 3\,\frac{(\alpha-2)(3\alpha^2+\alpha+2)}{\alpha(\alpha-3)(\alpha-4)}, \quad \alpha > 4 \]
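
As written, \(g_2\) is the ordinary (non-excess) kurtosis \(\mu_4 / \mu_2^2\); subtracting 3 gives the excess kurtosis. The brief sketch below, with the hypothetical helper names `pareto_skew` and `pareto_kurt`, evaluates both coefficients for a few shape values. As \(\alpha\) grows they approach 2 and 9, which are the skewness and kurtosis of an exponential distribution.

```r
# Closed-form skewness (requires alpha > 3)
pareto_skew <- function(alpha) {
  stopifnot(all(alpha > 3))
  2 * (1 + alpha) / (alpha - 3) * sqrt((alpha - 2) / alpha)
}

# Closed-form (non-excess) kurtosis (requires alpha > 4)
pareto_kurt <- function(alpha) {
  stopifnot(all(alpha > 4))
  3 * (alpha - 2) * (3 * alpha^2 + alpha + 2) /
    (alpha * (alpha - 3) * (alpha - 4))
}

alphas <- c(5, 10, 100, 1e6)
print(data.frame(alpha    = alphas,
                 skewness = pareto_skew(alphas),
                 kurtosis = pareto_kurt(alphas)))
```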

32.18 Parameter Estimation

The maximum likelihood estimators (MLE) have exact closed forms:

\[ \hat{x}_m = x_{(1)} = \min_i x_i, \qquad \hat{\alpha} = \frac{n}{\displaystyle\sum_{i=1}^n \ln(x_i/\hat{x}_m)} \]
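
These estimators follow from the log-likelihood of an i.i.d. sample \(x_1, \dots, x_n\), sketched here for reference:

\[ \ell(x_m, \alpha) = n \ln \alpha + n \alpha \ln x_m - (\alpha + 1) \sum_{i=1}^n \ln x_i, \quad x_m \leq \min_i x_i \]

Since \(\ell\) is increasing in \(x_m\), it is maximized at the largest admissible value, \(\hat{x}_m = \min_i x_i\). Setting the derivative with respect to \(\alpha\) to zero then gives

\[ \frac{\partial \ell}{\partial \alpha} = \frac{n}{\alpha} + n \ln \hat{x}_m - \sum_{i=1}^n \ln x_i = 0 \quad \Longrightarrow \quad \hat{\alpha} = \frac{n}{\displaystyle\sum_{i=1}^n \ln(x_i/\hat{x}_m)} \]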

# Example: income data above a threshold (thousands USD)
set.seed(42)
xm_true <- 20; alpha_true <- 2.5
u <- runif(100)
income <- xm_true / u^(1/alpha_true)

# MLE
xm_hat <- min(income)
alpha_hat <- length(income) / sum(log(income / xm_hat))
cat("MLE xm:   ", round(xm_hat, 4), "\n")
cat("MLE alpha:", round(alpha_hat, 4), "\n")
cat("True xm:", xm_true, "  True alpha:", alpha_true, "\n")

MLE xm:    20.0896 
MLE alpha: 2.3655 
True xm: 20   True alpha: 2.5

32.19 R Module

32.19.1 RFC

The Pareto Distribution module is available in RFC under the menu “Distributions / Pareto Distribution”.

32.19.2 Direct app link

https://shiny.wessa.net/pareto/

32.19.3 R Code

The following code demonstrates Pareto probability calculations:

xm <- 20; alpha <- 2.5

# Custom functions (no base R built-in)
dpareto <- function(x, xm, alpha) ifelse(x >= xm, alpha * xm^alpha / x^(alpha + 1), 0)
ppareto <- function(x, xm, alpha) ifelse(x >= xm, 1 - (xm / x)^alpha, 0)

# Probability density at x = 25
dpareto(25, xm, alpha)

# P(X <= 50): distribution function
ppareto(50, xm, alpha)

# P(X > 50): survival function
1 - ppareto(50, xm, alpha)

# Mean income (alpha > 1)
alpha * xm / (alpha - 1)

[1] 0.05724334
[1] 0.8988071
[1] 0.1011929
[1] 33.33333

32.20 Example

Annual household income (in thousands of USD) above a minimum of \(x_m = 20\)k is modelled as \(X \sim \text{Pareto}(x_m = 20, \alpha = 2.5)\). The mean income is \(\alpha x_m / (\alpha - 1) = 2.5 \times 20 / 1.5 \approx 33.3\)k.

xm <- 20; alpha <- 2.5
ppareto <- function(x, xm, alpha) ifelse(x >= xm, 1 - (xm / x)^alpha, 0)

# P(X > 50): income exceeds 50k
cat("P(income > 50k):", 1 - ppareto(50, xm, alpha), "\n")

# Mean income
cat("Mean income (k USD):", alpha * xm / (alpha - 1), "\n")

# Median income
cat("Median income (k USD):", xm * 2^(1/alpha), "\n")

P(income > 50k): 0.1011929 
Mean income (k USD): 33.33333 
Median income (k USD): 26.39016

32.21 Random Number Generator

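Pareto random variates are generated via the inverse-CDF method. Setting \(U = F(X) = 1 - (x_m/X)^\alpha\) and solving for \(X\) gives \(X = x_m (1 - U)^{-1/\alpha}\). Because \(1 - U\) is itself uniform on \((0, 1)\), it can be replaced by \(U\):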

\[ X = x_m\, U^{-1/\alpha} \sim \text{Pareto}(x_m, \alpha) \quad \text{when } U \sim \text{U}(0,1) \]

set.seed(123)
n <- 1000
xm <- 20; alpha <- 2.5

# Inverse-transform method
u <- runif(n)
x_inv <- xm * u^(-1/alpha)

cat("Simulated mean:", round(mean(x_inv), 4), "\n")
cat("Theoretical mean:", alpha * xm / (alpha - 1), "\n")
cat("Simulated median:", round(median(x_inv), 4), "\n")
cat("Theoretical median:", xm * 2^(1/alpha), "\n")

Simulated mean: 33.5105 
Theoretical mean: 33.33333 
Simulated median: 26.6054 
Theoretical median: 26.39016
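
The same transform can be wrapped in a reusable generator. The sketch below uses a hypothetical helper name, `rpareto` (not a base R function), and compares the empirical tail fraction \(P(X > 50)\) with the theoretical survival probability \((x_m/50)^\alpha\). The sample size and seed are arbitrary choices.

```r
# Hypothetical generator based on the inverse-transform method
rpareto <- function(n, xm, alpha) {
  xm * runif(n)^(-1 / alpha)
}

set.seed(2024)
xm <- 20; alpha <- 2.5
x <- rpareto(1e5, xm, alpha)

# Empirical vs. theoretical tail probability P(X > 50)
emp_tail <- mean(x > 50)
theo_tail <- (xm / 50)^alpha
cat("Empirical P(X > 50):  ", round(emp_tail, 4), "\n")
cat("Theoretical P(X > 50):", round(theo_tail, 4), "\n")
```

Comparing a tail probability rather than the mean is a deliberate choice here: tail fractions are simple proportions and stabilize quickly, whereas sample means of heavy-tailed data are sensitive to a few extreme draws.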

32.22 Property 1: Power-Law Survival Function

The survival (complementary CDF) function has an exact power-law form:

\[ P(X > x) = \left(\frac{x_m}{x}\right)^\alpha, \quad x \geq x_m \]

This means that doubling \(x\) reduces the tail probability by a factor of \(2^\alpha\), regardless of the starting value \(x \geq x_m\). This is a key signature of scale invariance.
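
To illustrate, the sketch below evaluates the ratio \(P(X > 2x) / P(X > x)\) at several starting values. Under the Pareto model the ratio is constant and equal to \(2^{-\alpha}\). The helper name `spareto` is an assumption made for this example.

```r
# Survival function P(X > x); equals 1 below the minimum xm
spareto <- function(x, xm, alpha) {
  ifelse(x >= xm, (xm / x)^alpha, 1)
}

xm <- 20; alpha <- 2.5
x0 <- c(20, 50, 100, 1000)

# Ratio of tail probabilities after doubling x
ratio <- spareto(2 * x0, xm, alpha) / spareto(x0, xm, alpha)
print(ratio)        # same value at every starting point
print(2^(-alpha))   # theoretical constant
```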

32.23 Property 2: Finite Moments Depend on Shape

Mean exists only for \(\alpha > 1\)
Variance exists only for \(\alpha > 2\)
Skewness is defined only for \(\alpha > 3\)
Kurtosis is defined only for \(\alpha > 4\)

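Heavy-tailed phenomena with \(\alpha \leq 2\) are common in practice. For these the variance is infinite, so sample variances do not settle down as the sample grows, and sample means converge slowly (or, for \(\alpha \leq 1\), not at all).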

32.24 Property 3: The 80/20 Rule

Under a Pareto model for wealth distribution, the “80/20 rule” (80% of outcomes come from 20% of causes) corresponds to shape \(\alpha = \log 5 / \log 4 \approx 1.161\).
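
This value can be derived from the share of the total held by the top fraction \(p\) of the population, which for \(\alpha > 1\) equals \(p^{1 - 1/\alpha}\) (a standard consequence of the Pareto Lorenz curve, stated here without proof). Setting \(p = 0.2\) and the share to 0.8, then solving for \(\alpha\), gives \(\log 5 / \log 4\). A minimal sketch, using the hypothetical helper `top_share`:

```r
# Share of the total held by the top fraction p (requires alpha > 1)
top_share <- function(p, alpha) {
  stopifnot(alpha > 1)
  p^(1 - 1 / alpha)
}

alpha_8020 <- log(5) / log(4)
print(alpha_8020)

# Share held by the top 20% under the 80/20 shape value
print(top_share(0.2, alpha_8020))

# For comparison: a lighter tail concentrates less in the top 20%
print(top_share(0.2, 2.5))
```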

32.25 Related Distributions 1: Exponential Distribution

32.26 Related Distributions 2: Beta Prime Distribution

32.27 Related Distributions 3: Fréchet Distribution