
162  Hyperparameter Optimization Strategies

Many model settings are not estimated automatically from the data. They must be fixed before fitting begins. Those settings are hyperparameters.

Examples from this handbook include:

Method                        Hyperparameter examples
----------------------------  ---------------------------------
Ridge / lasso / elastic net   lambda, alpha
Conditional random forest     ntree, mtry, mincriterion
Naive Bayes                   Laplace correction, kernel choice
k-nearest neighbors           k
Smoothing methods             bandwidth or window width
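As a minimal illustration (assuming the recommended `class` package, which ships with R), the k in k-nearest neighbors is fixed in the call itself rather than estimated from the data:

```r
library(class)  # provides knn() with an explicit k argument

data(iris)
set.seed(7)
idx <- sample(nrow(iris), 100)  # illustrative split, not an honest one

# k = 5 is chosen by the analyst before any fitting happens.
pred5 <- knn(train = iris[idx, 1:4], test = iris[-idx, 1:4],
             cl = iris$Species[idx], k = 5)

# A different k gives a different classifier on the same data.
pred1 <- knn(train = iris[idx, 1:4], test = iris[-idx, 1:4],
             cl = iris$Species[idx], k = 1)
```

Nothing in the data tells us whether k = 1 or k = 5 is better; only validation can.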

The central rule is simple: hyperparameters should be chosen by validation, not by convenience.

162.1 Why Hyperparameter Tuning Is a Modeling Stage

Hyperparameter optimization is not an optional technical afterthought. It is one of the main modeling decisions.

If you make the model too flexible, it overfits. If you make it too rigid, it underfits. Tuning is the search for a defensible middle ground.

That is why the tuning stage belongs inside the larger workflow described in Section 158.1:

  1. define the goal,
  2. split the data honestly,
  3. search over candidate settings,
  4. compare held-out performance,
  5. choose a setting,
  6. only then report final performance on untouched data if such a split has been reserved.
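The honest split in steps 2 and 6 can be sketched as follows (a minimal sketch with stand-in data; the 60/20/20 proportions are an illustrative choice, not a rule from this handbook):

```r
set.seed(1)
d <- data.frame(x = rnorm(120), y = rnorm(120))  # stand-in data frame

n <- nrow(d)
shuffled <- sample(n)

# 60/20/20 split: candidates are fit on train, compared on validation,
# and the test rows stay untouched until one setting has been chosen.
train_idx <- shuffled[1:72]
valid_idx <- shuffled[73:96]
test_idx  <- shuffled[97:120]

# Sanity check: no observation leaks between validation and test.
length(intersect(valid_idx, test_idx))  # should be 0
```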

162.2 Common Search Strategies

  • Manual tuning: try a few values by judgment. Strength: fast for teaching and small problems. Weakness: can miss good regions entirely.
  • Grid search: define a small grid and evaluate every combination. Strength: transparent and reproducible. Weakness: becomes expensive as dimensions grow.
  • Random search: sample settings at random from a sensible range. Strength: more efficient when many hyperparameters exist. Weakness: less exhaustive, can feel less intuitive.
  • Adaptive search: use previous results to guide the next trial. Strength: efficient for expensive models. Weakness: harder to explain and audit.

For this handbook, grid search is the best teaching device because students can see every candidate value and every comparison. In larger real-world problems, random or adaptive search often becomes more practical.

162.3 A Tuning Workflow That Stays Honest

The search itself can cause optimism if the same data are reused carelessly. The safe pattern is:

  • use the training data to fit the candidate settings,
  • use validation or repeated holdout to compare those settings,
  • keep a separate outer test or locked final test for one last report if the workflow is confirmatory.

This is the same discipline used in Chapter 161. The object being tuned changes; the logic does not.

162.4 Worked Example: Tuning cforest on Pima.tr

The manual app lets you fit cforest directly, but it does not expose a full search grid. The point of this chapter is therefore to show the search explicitly in R.

We tune two forest hyperparameters:

  • ntree: how many trees are averaged,
  • mtry: how many predictors are considered at each split.

To keep the example readable, the grid is deliberately small and each setting is evaluated across five repeated holdout splits.

library(MASS)
library(party)

data("Pima.tr", package = "MASS")

# Rank-based AUC: equivalent to the normalized Mann-Whitney U statistic.
auc_manual <- function(actual, score) {
  r <- rank(score)
  n1 <- sum(actual == 1)
  n0 <- sum(actual == 0)
  (sum(r[actual == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Locate the probability column for the positive class ("Yes").
yes_col <- function(mat) {
  nm <- colnames(mat)
  if ("Yes" %in% nm) return("Yes")
  hit <- grep("Yes$", nm, value = TRUE)
  if (length(hit) > 0) return(hit[1])
  stop("No positive-class column found.")
}

set.seed(123)
forest_grid <- expand.grid(
  ntree = c(50, 100, 200),
  mtry = c(2, 3, 4)
)

forest_grid$mean_auc <- NA_real_
forest_grid$sd_auc <- NA_real_

# Score every grid setting over five repeated 80/20 holdout splits.
for (g in seq_len(nrow(forest_grid))) {
  aucs <- numeric(5)

  for (r in 1:5) {
    idx <- sample(seq_len(nrow(Pima.tr)), size = floor(0.8 * nrow(Pima.tr)))
    train_r <- Pima.tr[idx, ]
    test_r <- Pima.tr[-idx, ]

    fit_r <- cforest(
      type ~ .,
      data = train_r,
      controls = cforest_unbiased(
        ntree = forest_grid$ntree[g],
        mtry = forest_grid$mtry[g]
      )
    )

    prob_r <- do.call(
      rbind,
      predict(fit_r, newdata = test_r, OOB = FALSE, type = "prob")
    )

    aucs[r] <- auc_manual(
      as.integer(test_r$type == "Yes"),
      prob_r[, yes_col(prob_r)]
    )
  }

  forest_grid$mean_auc[g] <- mean(aucs)
  forest_grid$sd_auc[g] <- sd(aucs)
}

forest_grid <- forest_grid[order(-forest_grid$mean_auc, forest_grid$sd_auc), ]

knitr::kable(
  transform(
    forest_grid,
    mean_auc = round(mean_auc, 3),
    sd_auc = round(sd_auc, 3)
  ),
  caption = "Repeated-holdout AUC for a small cforest hyperparameter grid"
)
Repeated-holdout AUC for a small cforest hyperparameter grid

      ntree   mtry   mean_auc   sd_auc
  6     200      3      0.881    0.048
  8     100      4      0.855    0.041
  1      50      2      0.822    0.061
  2     100      2      0.820    0.051
  3     200      2      0.819    0.025
  5     100      3      0.817    0.057
  4      50      3      0.805    0.063
  7      50      4      0.792    0.034
  9     200      4      0.792    0.047

162.4.1 Visualizing Accuracy and Reliability Together

The next figure uses the same tuning results in two ways:

  • the left panel shows mean AUC on the grid,
  • the right panel shows the tradeoff between average AUC and variability across resamples.

ntree_vals <- sort(unique(forest_grid$ntree))
mtry_vals <- sort(unique(forest_grid$mtry))

mean_mat <- matrix(
  NA_real_,
  nrow = length(mtry_vals),
  ncol = length(ntree_vals),
  dimnames = list(mtry_vals, ntree_vals)
)

for (i in seq_len(nrow(forest_grid))) {
  row_idx <- match(forest_grid$mtry[i], mtry_vals)
  col_idx <- match(forest_grid$ntree[i], ntree_vals)
  mean_mat[row_idx, col_idx] <- forest_grid$mean_auc[i]
}

zvals <- as.vector(mean_mat)
zbreaks <- seq(min(zvals) - 1e-6, max(zvals) + 1e-6, length.out = 11)
zcols <- gray.colors(length(zbreaks) - 1, start = 0.95, end = 0.45)

par(mfrow = c(1, 2), mar = c(4, 4, 2, 1))

plot.new()
plot.window(xlim = c(0.5, length(ntree_vals) + 0.5),
            ylim = c(0.5, length(mtry_vals) + 0.5))

for (i in seq_along(ntree_vals)) {
  for (j in seq_along(mtry_vals)) {
    val <- mean_mat[j, i]
    col_idx <- findInterval(val, zbreaks, all.inside = TRUE)
    rect(i - 0.5, j - 0.5, i + 0.5, j + 0.5,
         col = zcols[col_idx], border = "white")
    text(i, j, labels = sprintf("%.3f", val), cex = 0.9)
  }
}

axis(1, at = seq_along(ntree_vals), labels = ntree_vals)
axis(2, at = seq_along(mtry_vals), labels = mtry_vals)
box()
title(main = "Mean AUC on the search grid",
      xlab = "ntree",
      ylab = "mtry")

best_idx <- which.max(forest_grid$mean_auc)

# Reversed y-axis: more reliable settings (lower SD) appear higher up.
plot(forest_grid$mean_auc, forest_grid$sd_auc,
     xlab = "Mean AUC",
     ylab = "SD across resamples",
     main = "Accuracy versus reliability",
     pch = 19, col = "grey40",
     xlim = c(min(forest_grid$mean_auc) - 0.01, max(forest_grid$mean_auc) + 0.01),
     ylim = c(max(forest_grid$sd_auc) + 0.005, min(forest_grid$sd_auc) - 0.005))

text(forest_grid$mean_auc, forest_grid$sd_auc,
     labels = paste0("(", forest_grid$ntree, ", ", forest_grid$mtry, ")"),
     pos = 3, cex = 0.8)

points(forest_grid$mean_auc[best_idx], forest_grid$sd_auc[best_idx],
       pch = 17, col = "firebrick", cex = 1.2)
abline(v = max(forest_grid$mean_auc), lty = 3, col = "grey70")
abline(h = min(forest_grid$sd_auc), lty = 3, col = "grey70")

Hyperparameter tuning for cforest on Pima.tr. Left: mean AUC on a small ntree × mtry grid. Right: average AUC versus variability across resamples.

The right-hand panel is especially important. The point with the highest mean AUC is not always the point with the lowest variability.

That gives two different questions:

  1. Which setting is best on average?
  2. Which setting is most reliable across resamples?

If one setting is clearly better on both, the decision is easy. If one setting is slightly better on average but much more variable, the decision becomes a judgment call rather than a mechanical rule.

162.5 How to Read Near Ties

Hyperparameter search almost never ends with one magical number. More often, several settings are practically close.

When settings are close, use a rule like this:

  • prefer the higher average performance,
  • but do not ignore variability,
  • and if the gap in mean performance is tiny, prefer the simpler or cheaper setting.

For a forest, “simpler or cheaper” can mean:

  • fewer trees,
  • smaller mtry,
  • or a setting whose performance is almost as good but clearly more stable.
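One way to make this near-tie rule concrete is a tolerance-based selection (a sketch, not the handbook's official rule; the tolerance of 0.01 AUC and the tie-break by cost are illustrative choices):

```r
# Tuning results in the same shape as forest_grid in the worked example.
res <- data.frame(
  ntree    = c(200, 100, 50),
  mtry     = c(3, 4, 2),
  mean_auc = c(0.881, 0.876, 0.822),
  sd_auc   = c(0.048, 0.041, 0.061)
)

tol <- 0.01  # treat settings within 0.01 AUC of the best as tied
near_best <- res[res$mean_auc >= max(res$mean_auc) - tol, ]

# Among near ties, prefer the cheaper forest (fewer trees, smaller mtry).
pick <- near_best[order(near_best$ntree, near_best$mtry), ][1, ]
pick  # the 100-tree setting wins the tie against the 200-tree one
```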

This is the same reasoning used in the predictive-stability plots of the Guided Model Building app. The vocabulary changes from chapter to chapter, but the workflow principle stays the same: average accuracy and reliability are both part of model quality.

162.6 Beyond Grid Search

The toy grid above is useful because it is small enough to inspect completely. In larger problems, full grids become inefficient.

That is where random search becomes attractive. If a model has several hyperparameters, a coarse grid can waste many fits on unimportant directions, while random search can explore more of the space with the same computational budget.

So the practical progression is:

  • use grid search when the teaching goal is transparency,
  • use random or adaptive search when the real-world problem becomes too large for exhaustive grids.
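A random search over the same two forest hyperparameters could be sketched as follows (the ranges and the budget `n_trials` are illustrative assumptions; each candidate row would then be scored with the same repeated-holdout loop used for the grid):

```r
set.seed(42)
n_trials <- 8  # illustrative computational budget

# Draw candidate settings at random instead of enumerating a full grid.
cand <- data.frame(
  ntree = sample(seq(50, 500, by = 50), n_trials, replace = TRUE),
  mtry  = sample(2:5, n_trials, replace = TRUE)
)

# With the same budget, random search covers 8 points scattered over a
# 10 x 4 space that a full grid would need 40 fits to enumerate.
cand
```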

162.7 Try Hyperparameter Tuning in the Apps

The apps now expose two different tuning styles:

  • the app in the menu Models / Manual Model Building gives you direct control over specific hyperparameters such as ntree, mtry, and the regularization mode,
  • the Guided Model Building app automates compact training-only searches for regularized coefficient models and then reports the selected tuning summary inside the workflow.

162.7.1 Manual Model Building: Explicit cforest Controls

In the manual app, the Tree tab now lets you switch between ctree and cforest, choose a capped number of trees, set mtry, and fit the forest asynchronously. If another heavy forest fit is already running, the app places the request in a simple queue and tells the learner to wait instead of hammering the fit button repeatedly.

[Interactive Shiny app: Manual Model Building, embedded in the online version.]

That is not a full grid search, but it is a genuine hyperparameter exercise: you can change the forest size, change mtry, refit, and compare how the confusion matrix, threshold-sensitive metrics, ROC behavior, and forest importance output respond.

162.7.2 Guided Model Building: Automated Shrinkage Tuning

The guided app does not expose a large open search grid. Instead, it automates a compact search over ridge, elastic-net, and lasso style penalties for its regularized coefficient candidates and reports the resulting summary in the fitted model details.

Warning: full-screen use

The Guided Model Building app still works best in a new tab, but the embedded session below opens directly on a regularized Cars93 workflow so you can inspect the tuned search table and the predictive-stability plots.

[Interactive Shiny app: Guided Model Building session, embedded in the online version.]

The key point is methodological:

  • the manual app lets you turn the knobs yourself,
  • the guided app keeps the search smaller and more structured, but still shows which hyperparameter choice won and how stable that choice looks across validation splits.

What the apps still do not do is let the learner launch a very large free-form search across many model families at once. This chapter therefore remains useful because it shows the larger logic of search grids, repeated evaluation, and the tradeoff between average performance and reliability.

162.8 Practical Reading Rule

Treat hyperparameter optimization as part of the model, not as a cosmetic setting panel.

  • Never tune on the final test data.
  • Prefer validation summaries over training scores.
  • Read both average performance and variability.
  • When two settings are practically tied, prefer the simpler or cheaper one.

162.9 Practical Exercises

  1. Extend the cforest grid by adding ntree = 300. Does the best mean AUC improve enough to justify the extra computation?
  2. Change the evaluation metric from AUC to accuracy at threshold 0.50. Does the ranking of the grid change?
  3. Repeat the same tuning exercise with only mtry = 2 and many different ntree values. At what point does adding more trees stop helping much?
  4. Write one paragraph explaining why a final untouched test set is still useful even after a large validation-based search.

© 2026 Patrick Wessa. Provided as-is, without warranty.
