30 Beta Distribution

The Beta distribution is designed for quantities bounded between zero and one: proportions, probabilities, and rates. It is the standard choice whenever a fraction or probability is itself uncertain, and it is the natural conjugate prior for Binomial data in Bayesian analysis.

Formally, a random variate \(X\) defined on the range \(0 \leq X \leq 1\) is said to have a Beta distribution, written \(X \sim \text{Beta}(\alpha, \beta)\), with shape parameters \(\alpha > 0\) and \(\beta > 0\).

In R, the two shape parameters are referred to as shape1 (\(= \alpha\)) and shape2 (\(= \beta\)). The use of the Beta distribution as the conjugate prior for the Binomial and Bernoulli likelihoods in Bayesian inference is discussed in Chapter 7 and Chapter 113.

30.1 Probability Density Function

\[ f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{\text{B}(\alpha, \beta)}, \quad 0 \leq x \leq 1 \]

where \(\text{B}(\alpha, \beta) = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)\) is the Beta function.
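
As a quick check of the definition above, the density can be evaluated directly from the formula and compared with R's built-in dbeta(). This is only a sketch: the helper name beta_pdf and the evaluation points are chosen here for illustration and are not part of the original text.

```r
# Beta density written out from its definition; beta() is base R's Beta function
beta_pdf <- function(x, a, b) {
  x^(a - 1) * (1 - x)^(b - 1) / beta(a, b)
}

x       <- c(0.1, 0.25, 0.5, 0.9)
manual  <- beta_pdf(x, a = 2, b = 5)
builtin <- dbeta(x, shape1 = 2, shape2 = 5)

# The two columns should agree up to floating-point error
print(cbind(x, manual, builtin))

# The Beta function expressed through Gamma functions
print(c(beta(2, 5), gamma(2) * gamma(5) / gamma(2 + 5)))
```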

The figure below shows examples of the Beta Probability Density Function for different parameter combinations.


par(mfrow = c(2, 2))
x <- seq(0, 1, length.out = 500)

plot(x, dbeta(x, shape1 = 0.5, shape2 = 0.5), type = "l", lwd = 2, col = "blue",
     xlab = "x", ylab = "f(x)", main = expression(paste(alpha == 0.5, ",  ", beta == 0.5)))

plot(x, dbeta(x, shape1 = 1, shape2 = 1), type = "l", lwd = 2, col = "blue",
     xlab = "x", ylab = "f(x)", main = expression(paste(alpha == 1, ",  ", beta == 1)))

plot(x, dbeta(x, shape1 = 2, shape2 = 5), type = "l", lwd = 2, col = "blue",
     xlab = "x", ylab = "f(x)", main = expression(paste(alpha == 2, ",  ", beta == 5)))

plot(x, dbeta(x, shape1 = 5, shape2 = 2), type = "l", lwd = 2, col = "blue",
     xlab = "x", ylab = "f(x)", main = expression(paste(alpha == 5, ",  ", beta == 2)))

par(mfrow = c(1, 1))

Figure 30.1: Beta Probability Density Function for various parameter combinations

30.2 Purpose

The Beta distribution is used whenever the quantity of interest is a proportion, probability, or rate, that is, a quantity constrained to \([0, 1]\). Its two shape parameters allow it to take a wide variety of shapes: symmetric or skewed in either direction, bell-shaped or U-shaped, concentrated near a boundary or spread across the full interval. Common applications include:

Modelling click-through rates, conversion rates, and defect proportions
Bayesian posterior for an unknown success probability (conjugate prior for Binomial data)
Prior and posterior distributions for proportions in A/B testing
Representing subjective uncertainty about an unknown probability
Distribution of order statistics from the Uniform distribution on \([0,1]\)

Relation to the discrete setting. The Beta distribution is the continuous counterpart of the Binomial distribution in a precise Bayesian sense: if the unknown success probability \(p\) is given a \(\text{Beta}(\alpha, \beta)\) prior and \(k\) successes are observed in \(n\) trials, the posterior is \(\text{Beta}(\alpha + k,\, \beta + n - k)\). The Binomial models discrete counts given a fixed probability; the Beta models uncertainty about that probability itself. In addition, \(\text{Beta}(\alpha, \beta)\) with positive-integer parameters is the distribution of the \(\alpha\)-th smallest order statistic from \((\alpha + \beta - 1)\) i.i.d. \(\text{U}(0,1)\) draws, which links it to the continuous Uniform distribution.
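
The order-statistic relation can be illustrated by simulation. The sketch below assumes \(\alpha = 2\) and \(\beta = 5\) (so 6 uniform draws per replicate); the seed and the number of replicates are arbitrary choices made for this illustration.

```r
set.seed(1)
a <- 2
b <- 5
n_draws <- a + b - 1   # number of U(0,1) draws per replicate

# a-th smallest value among n_draws uniforms, repeated many times
os <- replicate(20000, sort(runif(n_draws))[a])

# Simulated mean versus the Beta(a, b) mean a / (a + b)
print(c(simulated = mean(os), theoretical = a / (a + b)))

# Kolmogorov-Smirnov comparison against the Beta(a, b) distribution function
print(ks.test(os, "pbeta", shape1 = a, shape2 = b)$p.value)
```

A large p-value is consistent with the simulated order statistics following \(\text{Beta}(2, 5)\); it does not prove the relation.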

30.3 Distribution Function

\[ F(x) = I_x(\alpha, \beta), \quad 0 \leq x \leq 1 \]

where \(I_x(\alpha, \beta) = \text{B}(x;\, \alpha, \beta)/\text{B}(\alpha, \beta)\) is the regularized incomplete beta function. It is computed by pbeta() in R.
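
To make the definition concrete, the incomplete beta function \(\text{B}(x;\, \alpha, \beta) = \int_0^x u^{\alpha-1}(1-u)^{\beta-1}\,du\) can be evaluated by numerical integration and divided by \(\text{B}(\alpha, \beta)\). The following sketch uses illustrative values (\(\alpha = 2\), \(\beta = 5\), \(x = 0.3\)) that are assumptions of this example.

```r
a <- 2
b <- 5
x <- 0.3

# Incomplete beta function B(x; a, b) by numerical integration
inc_beta <- integrate(function(u) u^(a - 1) * (1 - u)^(b - 1),
                      lower = 0, upper = x)$value

cdf_manual  <- inc_beta / beta(a, b)               # regularized form I_x(a, b)
cdf_builtin <- pbeta(x, shape1 = a, shape2 = b)

print(c(manual = cdf_manual, builtin = cdf_builtin))
```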

The figure below shows the Beta Distribution Function for \(\alpha = 2\) and \(\beta = 5\).


x <- seq(0, 1, length.out = 500)
plot(x, pbeta(x, shape1 = 2, shape2 = 5), type = "l", lwd = 2, col = "blue",
     xlab = "x", ylab = "F(x)", main = "Beta Distribution Function",
     sub = expression(paste(alpha == 2, ",  ", beta == 5)))

Figure 30.2: Beta Distribution Function (alpha = 2, beta = 5)

30.4 Moment Generating Function

The moment generating function of the Beta distribution is expressed as a confluent hypergeometric series:

\[ M_X(t) = 1 + \sum_{k=1}^{\infty} \left(\prod_{r=0}^{k-1} \frac{\alpha + r}{\alpha + \beta + r}\right) \frac{t^k}{k!} \]

There is no simple closed form. All moments of the Beta distribution are finite.
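
Although there is no closed form, the series converges quickly and can be truncated in practice. The sketch below compares a truncated series with direct numerical integration of \(\text{E}(e^{tX})\); the function name beta_mgf, the truncation at 60 terms, and the test point \(t = 1.5\) are assumptions made for this illustration.

```r
# Truncated series for the Beta moment generating function
beta_mgf <- function(t, a, b, terms = 60) {
  k <- seq_len(terms)
  # cumprod gives prod_{r=0}^{k-1} (a + r) / (a + b + r) for k = 1, ..., terms
  coef <- cumprod((a + k - 1) / (a + b + k - 1))
  1 + sum(coef * t^k / factorial(k))
}

mgf_series  <- beta_mgf(1.5, a = 2, b = 5)
mgf_numeric <- integrate(function(x) exp(1.5 * x) * dbeta(x, 2, 5),
                         lower = 0, upper = 1)$value

print(c(series = mgf_series, numeric = mgf_numeric))
```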

30.5 1st Uncentered Moment

\[ \mu_1' = \frac{\alpha}{\alpha + \beta} \]

30.6 2nd Uncentered Moment

\[ \mu_2' = \frac{\alpha(\alpha+1)}{(\alpha+\beta)(\alpha+\beta+1)} \]

30.7 3rd Uncentered Moment

\[ \mu_3' = \frac{\alpha(\alpha+1)(\alpha+2)}{(\alpha+\beta)(\alpha+\beta+1)(\alpha+\beta+2)} \]

30.8 4th Uncentered Moment

\[ \mu_4' = \frac{\alpha(\alpha+1)(\alpha+2)(\alpha+3)}{(\alpha+\beta)(\alpha+\beta+1)(\alpha+\beta+2)(\alpha+\beta+3)} \]

The general formula is \(\mu_n' = \prod_{i=0}^{n-1} \frac{\alpha+i}{\alpha+\beta+i}\).
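
The general product formula is easy to code. The following sketch (the helper name beta_raw_moment is chosen here for illustration) evaluates the first four uncentered moments of \(\text{Beta}(2, 5)\) and checks them against numerical integration of \(x^n f(x)\).

```r
# n-th uncentered moment of Beta(a, b) from the product formula
beta_raw_moment <- function(n, a, b) {
  i <- 0:(n - 1)
  prod((a + i) / (a + b + i))
}

raw_formula <- sapply(1:4, beta_raw_moment, a = 2, b = 5)

# Same moments by numerical integration of x^n * f(x)
raw_numeric <- sapply(1:4, function(n) {
  integrate(function(x) x^n * dbeta(x, 2, 5), lower = 0, upper = 1)$value
})

print(rbind(raw_formula, raw_numeric))
```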

30.9 2nd Centered Moment

\[ \mu_2 = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)} \]

30.10 3rd Centered Moment

\[ \mu_3 = \frac{2\alpha\beta(\beta - \alpha)}{(\alpha+\beta)^3(\alpha+\beta+1)(\alpha+\beta+2)} \]

30.11 4th Centered Moment

\[ \mu_4 = \mu_4' - 4\mu_1'\mu_3' + 6(\mu_1')^2\mu_2' - 3(\mu_1')^4 \]

where the uncentered moments \(\mu_1', \ldots, \mu_4'\) are given by the formulas above. An equivalent expression in terms of the variance and kurtosis is \(\mu_4 = g_2 \cdot \mu_2^2\), where \(g_2\) is defined in Section 30.17.
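
As a numerical illustration, the expansion in uncentered moments can be compared with a direct integration of \((x - \mu_1')^4 f(x)\). The sketch below assumes \(\alpha = 2\) and \(\beta = 5\); the ratio \(\mu_4 / \mu_2^2\) printed at the end is the kurtosis implied by these values.

```r
a <- 2
b <- 5

# Uncentered moments mu'_1, ..., mu'_4 from the product formula
m <- sapply(1:4, function(n) prod((a + 0:(n - 1)) / (a + b + 0:(n - 1))))

# 4th centered moment from the expansion in uncentered moments
mu4_expansion <- m[4] - 4 * m[1] * m[3] + 6 * m[1]^2 * m[2] - 3 * m[1]^4

# 4th centered moment by direct numerical integration
mu4_numeric <- integrate(function(x) (x - m[1])^4 * dbeta(x, a, b),
                         lower = 0, upper = 1)$value

# Variance, and the kurtosis implied by mu4 / mu2^2
mu2 <- a * b / ((a + b)^2 * (a + b + 1))
print(c(expansion = mu4_expansion, numeric = mu4_numeric,
        implied_kurtosis = mu4_expansion / mu2^2))
```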

30.12 Expected Value

\[ \text{E}(X) = \frac{\alpha}{\alpha + \beta} \]

30.13 Variance

\[ \text{V}(X) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)} \]

30.14 Median

The median of the Beta distribution has no general closed form. It is computed numerically in R:

# Median for Beta(2, 5)
qbeta(0.5, shape1 = 2, shape2 = 5)

[1] 0.26445

A well-known approximation is \(\text{Med}(X) \approx (\alpha - 1/3)/(\alpha + \beta - 2/3)\) for \(\alpha, \beta > 1\), but direct computation via qbeta is preferred.
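
To give a feel for the quality of the approximation, the sketch below compares it with qbeta() for \(\text{Beta}(2, 5)\). The helper name median_approx is an assumption of this example.

```r
# Closed-form approximation to the Beta median (intended for a, b > 1)
median_approx <- function(a, b) (a - 1/3) / (a + b - 2/3)

med_exact  <- qbeta(0.5, shape1 = 2, shape2 = 5)
med_approx <- median_approx(2, 5)

print(c(exact = med_exact, approx = med_approx,
        abs_error = abs(med_exact - med_approx)))
```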

30.15 Mode

\[ \text{Mo}(X) = \frac{\alpha - 1}{\alpha + \beta - 2} \quad \text{for } \alpha > 1 \text{ and } \beta > 1 \]

Special cases:

If \(\alpha = 1\) and \(\beta = 1\): any value in \([0,1]\) is a mode (Uniform distribution).
If \(\alpha < 1\) and \(\beta < 1\): the density is U-shaped with modes at 0 and 1.
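If \(\alpha \leq 1 \leq \beta\) with \(\alpha \neq \beta\): the density is non-increasing and the mode is at the boundary 0.
If \(\beta \leq 1 \leq \alpha\) with \(\alpha \neq \beta\): the density is non-decreasing and the mode is at the boundary 1.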
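
For the interior case \(\alpha > 1\) and \(\beta > 1\), the mode formula can be checked by maximizing the density numerically. This is a minimal sketch with \(\alpha = 2\) and \(\beta = 5\) chosen for illustration; optimize() returns the maximizer only up to its default tolerance.

```r
a <- 2
b <- 5

mode_formula <- (a - 1) / (a + b - 2)

# Numerical maximization of the density over [0, 1]
mode_numeric <- optimize(function(x) dbeta(x, shape1 = a, shape2 = b),
                         interval = c(0, 1), maximum = TRUE)$maximum

print(c(formula = mode_formula, numeric = mode_numeric))
```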

30.16 Coefficient of Skewness

\[ g_1 = \frac{2(\beta - \alpha)\sqrt{\alpha+\beta+1}}{(\alpha+\beta+2)\sqrt{\alpha\beta}} \]

The distribution is symmetric when \(\alpha = \beta\), right-skewed when \(\alpha < \beta\), and left-skewed when \(\alpha > \beta\).

30.17 Coefficient of Kurtosis

\[ g_2 = 3 + \frac{6\left[(\alpha-\beta)^2(\alpha+\beta+1) - \alpha\beta(\alpha+\beta+2)\right]}{\alpha\beta(\alpha+\beta+2)(\alpha+\beta+3)} \]

When \(\alpha = \beta = 1\) (Uniform distribution), the kurtosis is \(g_2 = 9/5\).
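
The skewness and kurtosis formulas can be verified against standardized moments obtained by numerical integration. In the sketch below the function names beta_skew, beta_kurt, and std_moment are introduced for illustration only.

```r
# Coefficient of skewness g1 and coefficient of kurtosis g2 from the formulas
beta_skew <- function(a, b) {
  2 * (b - a) * sqrt(a + b + 1) / ((a + b + 2) * sqrt(a * b))
}
beta_kurt <- function(a, b) {
  3 + 6 * ((a - b)^2 * (a + b + 1) - a * b * (a + b + 2)) /
    (a * b * (a + b + 2) * (a + b + 3))
}

# r-th standardized moment E[((X - mu) / sigma)^r] by numerical integration
std_moment <- function(r, a, b) {
  mu    <- a / (a + b)
  sigma <- sqrt(a * b / ((a + b)^2 * (a + b + 1)))
  integrate(function(x) ((x - mu) / sigma)^r * dbeta(x, a, b),
            lower = 0, upper = 1)$value
}

print(c(g1_formula = beta_skew(2, 5), g1_numeric = std_moment(3, 2, 5)))
print(c(g2_formula = beta_kurt(2, 5), g2_numeric = std_moment(4, 2, 5)))
print(beta_kurt(1, 1))   # Uniform case
```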

30.18 Parameter Estimation

The maximum likelihood estimators of \(\alpha\) and \(\beta\) require numerical optimization. Method-of-moments starting values are:

\[ \tilde{\alpha} = \bar{x}\left(\frac{\bar{x}(1-\bar{x})}{s^2} - 1\right), \quad \tilde{\beta} = (1-\bar{x})\left(\frac{\bar{x}(1-\bar{x})}{s^2} - 1\right) \]

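where \(\bar{x}\) is the sample mean and \(s^2\) is the sample variance. These starting values are positive only when \(s^2 < \bar{x}(1-\bar{x})\). The fitdistr function from the MASS package in R computes the MLE numerically.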
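
The method-of-moments formulas translate directly into a small helper. The sketch below is illustrative: the function name beta_mom, the simulated data, and the sample size are assumptions of this example. The result can be passed as the start argument of fitdistr.

```r
# Method-of-moments estimates from a mean m and a variance v
beta_mom <- function(m, v) {
  stopifnot(v > 0, v < m * (1 - m))   # otherwise the estimates are not positive
  k <- m * (1 - m) / v - 1
  list(shape1 = m * k, shape2 = (1 - m) * k)
}

# Exact mean and variance of Beta(2, 5) recover the parameters
print(unlist(beta_mom(m = 2 / 7, v = 10 / 392)))

# Simulated proportions with known parameters
set.seed(7)
x <- rbeta(5000, shape1 = 5, shape2 = 45)
start_values <- beta_mom(mean(x), var(x))
print(unlist(start_values))
```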

30.19 R Module

30.19.1 RFC

The Beta Distribution module is available in RFC under the menu “Distributions / Beta Distribution”.

30.19.2 Direct app link

https://shiny.wessa.net/beta/

30.19.3 R Code

The following code demonstrates Beta probability calculations:

# Probability density function: f(x)
dbeta(x = 0.1, shape1 = 5, shape2 = 45)

# Distribution function: P(X <= x)
pbeta(q = 0.1, shape1 = 5, shape2 = 45)

# Quantile function
qbeta(p = 0.5, shape1 = 5, shape2 = 45)

# Generate random Beta numbers
set.seed(42)
rbeta(n = 10, shape1 = 5, shape2 = 45)

[1] 9.24623
[1] 0.5503091
[1] 0.09467489
 [1] 0.07488839 0.11969882 0.13693222 0.12216502 0.09484010 0.21831656
 [7] 0.09538634 0.09692234 0.19461753 0.04600264

To fit a Beta distribution to observed data:

library(MASS)

# Example: click-through rate data (proportions)
set.seed(7)
ctr_data <- rbeta(100, shape1 = 5, shape2 = 45)

fit <- fitdistr(ctr_data, "beta",
                start = list(shape1 = 2, shape2 = 10))
print(fit)

     shape1       shape2  
   6.1332559   52.5077774 
 ( 0.8449713) ( 7.5016311)

30.20 Example

A website’s click-through rate (CTR) is modelled using Bayesian inference. We observe 5 clicks out of 50 impressions and combine this with a weak prior \(\text{Beta}(1, 1)\) (Uniform, expressing no prior knowledge). Bayesian updating with a Binomial likelihood gives the posterior:

\[ \text{Beta}(1 + 5,\; 1 + 45) = \text{Beta}(6,\; 46) \]

Alternatively, starting from a slightly informative prior \(\text{Beta}(3, 27)\) representing past experience of a ~10% CTR:

\[ \text{Prior: } \text{Beta}(3, 27) \quad \longrightarrow \quad \text{Posterior: } \text{Beta}(3+5,\; 27+45) = \text{Beta}(8,\; 72) \]

# Prior: Beta(3, 27)  (encoding prior belief of ~10% CTR)
# Likelihood: 5 clicks in 50 impressions
# Posterior: Beta(3+5, 27+45) = Beta(8, 72)

a_post <- 3 + 5
b_post <- 27 + 45

cat("Posterior mean CTR:", round(a_post / (a_post + b_post), 4), "\n")
cat("Posterior mode CTR:", round((a_post - 1) / (a_post + b_post - 2), 4), "\n")
cat("95% credible interval: [",
    round(qbeta(0.025, a_post, b_post), 4), ",",
    round(qbeta(0.975, a_post, b_post), 4), "]\n")

Posterior mean CTR: 0.1 
Posterior mode CTR: 0.0897 
95% credible interval: [ 0.0447 , 0.1741 ]


30.21 Random Number Generator

Beta random variates can be generated from two independent Gamma variates. If \(Y_1 \sim \text{Gamma}(\alpha, 1)\) and \(Y_2 \sim \text{Gamma}(\beta, 1)\) are independent, then:

\[ X = \frac{Y_1}{Y_1 + Y_2} \sim \text{Beta}(\alpha, \beta) \]

set.seed(123)
n     <- 1000
alpha <- 2
beta_ <- 5

# Gamma-ratio method
y1 <- rgamma(n, shape = alpha, rate = 1)
y2 <- rgamma(n, shape = beta_,  rate = 1)
x_ratio <- y1 / (y1 + y2)

# Built-in function
x_rbeta <- rbeta(n, shape1 = alpha, shape2 = beta_)

cat("Gamma-ratio: mean =", round(mean(x_ratio), 4),
    "  var =", round(var(x_ratio), 4), "\n")
cat("rbeta():     mean =", round(mean(x_rbeta), 4),
    "  var =", round(var(x_rbeta), 4), "\n")
cat("Theoretical: mean =", alpha/(alpha+beta_),
    "  var =", round(alpha*beta_/((alpha+beta_)^2*(alpha+beta_+1)), 4), "\n")

Gamma-ratio: mean = 0.2786   var = 0.0246 
rbeta():     mean = 0.2845   var = 0.0258 
Theoretical: mean = 0.2857143   var = 0.0255


set.seed(123)
x <- rbeta(1000, shape1 = 2, shape2 = 5)
hist(x, breaks = 35, col = "steelblue", freq = FALSE,
     xlab = "x", main = "Beta Random Numbers (n = 1000, alpha = 2, beta = 5)")
curve(dbeta(x, shape1 = 2, shape2 = 5), add = TRUE, col = "red", lwd = 2)
legend("topright", legend = "Theoretical density", col = "red", lwd = 2)

Figure 30.3: Histogram of simulated Beta random numbers (n = 1000, alpha = 2, beta = 5)


30.22 Property 1: Uniform as Special Case

The Uniform distribution on \([0,1]\) is the special case \(\alpha = \beta = 1\) (see Chapter 19):

\[ \text{Beta}(1, 1) = \text{U}(0, 1) \]

30.23 Property 2: Reflection Symmetry

If \(X \sim \text{Beta}(\alpha, \beta)\) then \(1 - X \sim \text{Beta}(\beta, \alpha)\). This reflects the symmetry of the density: swapping the two shape parameters is equivalent to reflecting the distribution about \(x = 1/2\).
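
The reflection property implies \(f(1 - x;\, \alpha, \beta) = f(x;\, \beta, \alpha)\) for the density and \(F(x;\, \alpha, \beta) = 1 - F(1 - x;\, \beta, \alpha)\) for the distribution function. A short numerical check, with parameter values and grid chosen for illustration:

```r
a <- 2
b <- 5
x <- seq(0.05, 0.95, by = 0.05)

# Density of Beta(a, b) at 1 - x equals density of Beta(b, a) at x
density_check <- all.equal(dbeta(1 - x, a, b), dbeta(x, b, a))

# Distribution functions are linked by F(x; a, b) = 1 - F(1 - x; b, a)
cdf_check <- all.equal(pbeta(x, a, b), 1 - pbeta(1 - x, b, a))

print(c(density = density_check, cdf = cdf_check))
```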

30.24 Property 3: Conjugate Prior for Bernoulli and Binomial

The Beta distribution is the conjugate prior for the success probability \(\theta\) in a Bernoulli or Binomial model. If the prior is \(\theta \sim \text{Beta}(\alpha, \beta)\) and \(k\) successes are observed in \(n\) trials, the posterior is:

\[ \theta \mid k \sim \text{Beta}(\alpha + k,\; \beta + n - k) \]

This closed-form Bayesian updating rule is the basis of the Bayesian approach to proportion estimation (see Chapter 7 and Chapter 113).
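
The updating rule can be checked without any algebra by multiplying prior and likelihood on a grid and normalizing. The sketch below reuses the numbers of the click-through example (prior \(\text{Beta}(3, 27)\), 5 successes in 50 trials); the grid resolution is an arbitrary choice for this illustration.

```r
a <- 3
b <- 27
k <- 5
n <- 50

h     <- 0.001
theta <- seq(h / 2, 1 - h / 2, by = h)   # midpoints of a grid on (0, 1)

# Unnormalized posterior: prior density times Binomial likelihood
unnorm <- dbeta(theta, a, b) * dbinom(k, size = n, prob = theta)

# Normalize so that the grid approximation integrates to 1
post_grid  <- unnorm / sum(unnorm * h)
post_exact <- dbeta(theta, a + k, b + n - k)

# Largest pointwise difference between grid and closed-form posterior
print(max(abs(post_grid - post_exact)))
```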
