Bayesian inference is often a more natural decision framework than mechanical one-alpha testing because it reports probabilities directly in decision terms: the posterior probability of the claim, and the posterior probability of being wrong if we act now.
In many applied settings, decision makers do not ask:
“Is the p-value below 5%?”
They ask:
“Given the data, how likely is the claim?”
“How much risk of being wrong remains if we act now?”
“What decision threshold is appropriate for this business context?”
Bayesian inference answers these directly through posterior probabilities.
113.2 Beta-Binomial Update (Core Model)
Suppose an unknown event probability is denoted by \(p\) (for example: fraud rate, conversion rate, default rate).
We observe \(n\) trials and count how many of them are events.
The notation is:
| Symbol | Meaning |
|---|---|
| \(p\) | the unknown underlying event probability in the population or process |
| \(n\) | the number of observed trials |
| \(Y\) | the random variable for the number of events in those \(n\) trials, before the data are observed |
| \(y\) | the realized observed number of events, after the data are collected |
So \(y / n\) is the observed sample proportion, not the true probability itself. It is a data-based estimate of \(p\), but it is not equal to \(p\).
For Binomial data with \(y\) observed events in \(n\) trials:
\[
Y \mid p \sim \text{Binomial}(n, p)
\]
and with a Beta prior:
\[
p \sim \text{Beta}(\alpha_0, \beta_0),
\]
the posterior is:
\[
p \mid y \sim \text{Beta}(\alpha_0 + y,\; \beta_0 + n - y).
\]
A prior is called conjugate for a likelihood when the posterior stays in the same distribution family after updating with data.
In this chapter:
prior: \(p \sim \text{Beta}(\alpha_0,\beta_0)\)
likelihood: \(Y\mid p \sim \text{Binomial}(n,p)\)
posterior: \(p\mid y \sim \text{Beta}(\alpha_0+y,\beta_0+n-y)\)
This matters because updating becomes transparent, fast, and easy to explain in teaching and practice.
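As a minimal sketch of this closed-form update (the numbers here are illustrative, not the chapter's fraud data):

```r
# Conjugate Beta-Binomial update: the posterior is again a Beta distribution,
# so updating is just parameter arithmetic. Numbers are illustrative.
alpha0 <- 2
beta0  <- 8    # prior Beta(2, 8), prior mean 0.2
n <- 50
y <- 12        # 12 events observed in 50 trials

alpha_post <- alpha0 + y      # 2 + 12 = 14
beta_post  <- beta0 + n - y   # 8 + 50 - 12 = 46
post_mean  <- alpha_post / (alpha_post + beta_post)

cat("Posterior: Beta(", alpha_post, ",", beta_post, "), mean =",
    round(post_mean, 4), "\n")
```

No integration or simulation is required: the prior parameters simply absorb the event and non-event counts.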
113.2.2 Choosing a Beta prior in practice
The Beta prior is not just a technical convenience. It is also easy to interpret:
\[
E(p) = \frac{\alpha_0}{\alpha_0 + \beta_0}.
\]
So:
the ratio \(\alpha_0 / (\alpha_0 + \beta_0)\) sets your best prior guess for the event rate,
the total \(\alpha_0 + \beta_0\) controls how strongly that prior pulls on the posterior.
In practice, you can think of \(\alpha_0 + \beta_0\) as the prior strength in pseudo-observations. A prior such as \(\text{Beta}(3,97)\) says: “before seeing the new data, I expect about 3% events, and I hold that view with roughly the weight of 100 prior observations.”
This is exactly why \(\text{Beta}(3,97)\), \(\text{Beta}(30,970)\), and \(\text{Beta}(300,9700)\) are not equivalent even though they all have prior mean \(0.03\):
\(\text{Beta}(3,97)\) has prior strength \(100\),
\(\text{Beta}(30,970)\) has prior strength \(1000\),
\(\text{Beta}(300,9700)\) has prior strength \(10000\).
All three priors say “I expect about 3% events,” but they differ in how stubborn that belief is. A stronger prior is harder for the new data to move. It is therefore useful to separate two ideas:
the mean tells you where the prior is centered,
the strength tells you how strongly it resists being updated.
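A short sketch can make that stubbornness visible. With hypothetical data that conflict with the prior (60 events in 1000 trials, an observed rate of 6%), the three priors above pull the posterior mean toward 3% with very different force:

```r
# Three priors with the same mean 0.03 but different strength,
# updated with the same hypothetical data: y = 60 events in n = 1000 trials.
y <- 60
n <- 1000

priors <- data.frame(alpha0 = c(3, 30, 300), beta0 = c(97, 970, 9700))
priors$strength  <- priors$alpha0 + priors$beta0
priors$post_mean <- (priors$alpha0 + y) / (priors$strength + n)

print(priors)
# The weakest prior lands nearest the observed 6%;
# the strongest stays nearest the prior mean of 3%.
```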
113.3 Decision Rule in Posterior Terms
Define a business-relevant claim, for example \(p < p_0\) or \(p > p_0\).
Here \(p_0\) is the decision threshold that separates acceptable from unacceptable values of the event probability. For example, if management says “the fraud rate must be below 2%,” then \(p_0 = 0.02\).
Then compute the posterior probability of that claim:
\[
q = P(\text{claim} \mid \text{data}).
\]
Choose a decision threshold \(\tau\) by context (confirmatory, balanced, diagnostic):
\[
\text{Support claim if } q \ge \tau.
\]
This makes the threshold explicit and interpretable.
113.4 Point-Null Comparison and Bayes Factors
For a point-null comparison, we can compare two competing statistical statements:
\(H_0\): the event probability is fixed exactly at the threshold value, \(p = p_0\),
\(H_1\): the event probability is not fixed at one value; instead, many plausible values of \(p\) are allowed, weighted by a Beta prior.
So when this chapter says “\(H_1\) with a Beta prior,” it means that under \(H_1\) we do not commit to one single value of \(p\). We spread prior belief across a whole range of possible values of \(p\) using a Beta distribution.
The Bayes factor
\[
\text{BF}_{10} = \frac{P(\text{data} \mid H_1)}{P(\text{data} \mid H_0)}
\]
quantifies relative evidence between the two models, while \(P(p < p_0 \mid \text{data})\) answers a directional posterior claim about the parameter. These are related but distinct: \(\text{BF}_{10}\) compares the point null \(H_0: p = p_0\) to the full \(H_1\) prior (all plausible values of \(p\)), so the two quantities can point in different directions.
113.4.2 How large should BF10 be?
Students often ask for a single universal cutoff (for example: \(\text{BF}_{10} > 3\) or \(\text{BF}_{10} > 10\)).
This is usually the wrong first question for the same reason that one fixed alpha is problematic.
A better workflow is:
specify the decision context and the costs of errors,
report the Bayes factor and posterior model probability,
show sensitivity to alternative reasonable priors,
then decide whether evidence is sufficient for the concrete decision.
So Bayes-factor thresholds are context guides, not universal constants.
For orientation, however, students often benefit from a first reading scale:
| \(\text{BF}_{10}\) | Rough classroom interpretation |
|---|---|
| about 1 | little separation between \(H_1\) and \(H_0\) |
| 3 to 10 | moderate support for \(H_1\) |
| greater than 10 | strong support for \(H_1\) |
| below 1 | evidence tilts toward \(H_0\) instead |
This table is only a starting point. It should never replace context, prior sensitivity, and a clear statement of the actual decision threshold.
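For classroom use, the scale could be wrapped in a small helper; `bf_label` below is a hypothetical convenience function, not a standard API:

```r
# Hypothetical helper: map a BF10 value onto the rough classroom scale above.
bf_label <- function(bf10) {
  if (bf10 < 1)   return("evidence tilts toward H0")
  if (bf10 <= 3)  return("little separation between H1 and H0")
  if (bf10 <= 10) return("moderate support for H1")
  "strong support for H1"
}

bf_label(0.43)   # evidence tilts toward H0
bf_label(15)     # strong support for H1
```

Such a helper should only ever label output for discussion; the decision itself still depends on context and prior sensitivity.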
113.4.3 Bayes Error Rate (Theoretical vs Operational)
In binary decision settings, two different quantities are often called “Bayes error”:
Theoretical Bayes error rate: minimum achievable misclassification probability under the true class-conditional distributions (not directly computable in practical finite-data settings).
Operational posterior decision error: threshold-dependent local decision error computed from posterior class probabilities.
The reason both appear in the literature is that they answer different questions:
the theoretical quantity is a benchmark or lower bound,
the operational quantity is what you can actually compute from model-based posterior scores in practice.
For the operational quantity, once posterior class probabilities are available, each case has a local decision error probability:
if the case is classified as positive: \(1 - P(\text{positive}\mid x)\),
if the case is classified as negative: \(P(\text{positive}\mid x)\).
The average of these local error probabilities over evaluated cases provides a practical, threshold-dependent decision error summary.
R snippet 5 (Section 113.5.5) reports this operational quantity.
113.5 Business Example: Fraud-Rate Decision
Assume a payment team deploys a new rule and wants to support the operational claim:
\[
p < 0.02
\]
where \(p\) is the underlying fraud rate after deployment.
Prior belief from historical process knowledge:
\[
p \sim \text{Beta}(3, 97)
\]
(prior mean \(= \alpha_0 / (\alpha_0 + \beta_0) = 3 / 100 = 3\%\)).
Observed in a recent batch: \(y=7\) fraud events out of \(n=400\).
The observed fraud rate in the batch is
\[
\hat{p} = \frac{7}{400} = 0.0175,
\]
which is below the target threshold of 2%. Even so, the posterior probability of the claim will not automatically be high, because the prior mean was 3% and 400 observations do not completely overwhelm that prior.
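One way to see this prior pull is that the posterior mean is a weighted average of the prior mean and the observed rate, weighted by prior strength and sample size:

```r
# Posterior mean as a weighted average of prior mean and observed rate:
# E(p | y) = (prior_strength * prior_mean + n * observed_rate) / (prior_strength + n)
prior_strength <- 3 + 97          # Beta(3, 97) carries 100 pseudo-observations
prior_mean     <- 3 / 100         # 0.03
n <- 400
observed_rate  <- 7 / 400         # 0.0175

post_mean <- (prior_strength * prior_mean + n * observed_rate) /
  (prior_strength + n)
post_mean   # exactly 0.02 here: 100 pseudo-observations at 3% + 400 real ones at 1.75%
```

The posterior mean landing exactly on the 2% threshold is why the decision below is close rather than clear-cut.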
113.5.1 R snippet 1: posterior update and decision summary
```r
# Prior and data
alpha0 <- 3
beta0  <- 97
n   <- 400
y   <- 7
p0  <- 0.02
tau <- 0.90  # decision threshold for P(p < p0 | data)

# Posterior
alpha_post <- alpha0 + y
beta_post  <- beta0 + n - y

prior_mean    <- alpha0 / (alpha0 + beta0)
observed_rate <- y / n
post_mean     <- alpha_post / (alpha_post + beta_post)

post_prob_claim <- pbeta(p0, shape1 = alpha_post, shape2 = beta_post)  # P(p < p0 | data)
ci90 <- qbeta(c(0.05, 0.95), shape1 = alpha_post, shape2 = beta_post)

decision <- if (post_prob_claim >= tau) "Support claim p < 0.02" else "Do not support claim yet"

if (post_prob_claim >= tau) {
  risk_label <- "Probability of decision error if we support the claim now"
  risk_value <- 1 - post_prob_claim
} else {
  risk_label <- "Probability the claim is true even though we do not support it yet"
  risk_value <- post_prob_claim
}

summary_tab <- data.frame(
  quantity = c(
    "Prior mean",
    "Observed rate",
    "Posterior mean",
    "P(p < 0.02 | data)",
    "90% credible interval",
    "Decision",
    risk_label
  ),
  value = c(
    sprintf("%.4f", prior_mean),
    sprintf("%.4f", observed_rate),
    sprintf("%.4f", post_mean),
    sprintf("%.4f", post_prob_claim),
    sprintf("[%.4f, %.4f]", ci90[1], ci90[2]),
    decision,
    sprintf("%.4f", risk_value)
  )
)

knitr::kable(summary_tab, col.names = c("Summary", "Value"))
```
| Summary | Value |
|---|---|
| Prior mean | 0.0300 |
| Observed rate | 0.0175 |
| Posterior mean | 0.0200 |
| P(p < 0.02 \| data) | 0.5408 |
| 90% credible interval | [0.0109, 0.0313] |
| Decision | Do not support claim yet |
| Probability the claim is true even though we do not support it yet | 0.5408 |
Prior and posterior Beta densities for the fraud-rate example. The vertical line marks the operational claim threshold p = 0.02.
The shaded posterior area to the left of \(p_0 = 0.02\) is exactly the probability used in the decision rule:
\[
P(p < 0.02 \mid \text{data}) \approx 0.54.
\]
That makes the decision logic easier to interpret: even though the observed rate is below 2%, the posterior mass below 2% is still only about 54%, far below the decision threshold \(\tau = 0.90\).
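The shaded-area probability can be reproduced directly, since it is just the posterior Beta CDF evaluated at the threshold:

```r
# P(p < 0.02 | data) as the posterior Beta CDF at the threshold.
alpha_post <- 3 + 7          # alpha0 + y
beta_post  <- 97 + 400 - 7   # beta0 + n - y

pbeta(0.02, shape1 = alpha_post, shape2 = beta_post)   # about 0.54
```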
113.5.3 R snippet 3: sensitivity to alternative reasonable priors
This table is not an argument for changing priors until a desired decision appears. It is a reminder that Bayesian decisions should be accompanied by reasonable prior sensitivity checks.
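The sensitivity check could be sketched as follows; the alternative priors here (a flat Beta(1, 1) and a stronger Beta(30, 970)) are illustrative choices, not the only reasonable ones:

```r
# Sensitivity of P(p < 0.02 | data) to alternative reasonable priors,
# using the same batch: y = 7 events in n = 400 trials.
n  <- 400
y  <- 7
p0 <- 0.02

sens <- data.frame(
  prior  = c("Beta(1, 1) flat", "Beta(3, 97) main", "Beta(30, 970) strong"),
  alpha0 = c(1, 3, 30),
  beta0  = c(1, 97, 970)
)
sens$post_prob_claim <- pbeta(p0,
                              shape1 = sens$alpha0 + y,
                              shape2 = sens$beta0 + n - y)
print(sens)
```

The stronger the prior mass at 3%, the lower \(P(p < 0.02 \mid \text{data})\) falls, which is exactly the kind of dependence a sensitivity table should expose.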
113.5.4 R snippet 4: Bayes factor for a point-null comparison
```r
# Bayes factor H0: p = p0 versus H1: p ~ Beta(alpha0, beta0)
log_m1 <- lchoose(n, y) + lbeta(alpha_post, beta_post) - lbeta(alpha0, beta0)
log_m0 <- lchoose(n, y) + y * log(p0) + (n - y) * log(1 - p0)
bf10 <- exp(log_m1 - log_m0)

cat("BF10 (H1 vs H0: p = 0.02) =", round(bf10, 6), "\n")
```
BF10 (H1 vs H0: p = 0.02) = 0.428271
Here the Bayes factor is slightly below 1, so it tilts mildly toward the point null \(H_0: p = 0.02\). That does not contradict the posterior claim probability above. The two quantities answer different questions:
the posterior claim probability asks whether \(p\) is below 0.02,
the Bayes factor compares one exact point null against the whole alternative prior.
113.5.5 R snippet 5: practical Bayes decision error on scored cases
```r
# The fraud-rate example above concerned one population parameter p.
# In a scoring system, we instead work with posterior probabilities for
# individual cases. The values below are illustrative case-level scores;
# they are not computed from the Beta-Binomial example above.
post_fraud <- c(0.03, 0.08, 0.12, 0.19, 0.27, 0.44, 0.61, 0.74, 0.86, 0.93)
threshold <- 0.25

pred_label  <- ifelse(post_fraud >= threshold, "fraud", "legit")
local_error <- ifelse(pred_label == "fraud", 1 - post_fraud, post_fraud)

cat("Average local decision error probability =", round(mean(local_error), 5), "\n")
```
113.6 Related Chapters
For threshold selection across contexts: Chapter 112.
For posterior analysis in two-sample settings: Chapter 122.
113.7 Practical Takeaway
Bayesian inference does not remove the need for threshold choice. It improves the workflow by making thresholds explicit in posterior probability terms and by reporting decision risk directly.
113.8 Practical Exercises
Recompute the fraud example with \(\text{Beta}(1,1)\) instead of \(\text{Beta}(3,97)\). How much does \(P(p < 0.02 \mid \text{data})\) change?
Keep the prior \(\text{Beta}(3,97)\) but change the batch to \(y = 20\) frauds out of \(n = 1000\). Does the claim \(p < 0.02\) now reach a 90% decision threshold?
In the scored-case example, raise the threshold from 0.25 to 0.60. What happens to the average local decision error?
Explain in your own words why \(\text{BF}_{10}\) and \(P(p < p_0 \mid \text{data})\) are not the same quantity, even when they are based on the same observed data.