Table of contents

  • 163.1 Open the App Full Screen
  • 163.2 The Workflow Used in This Chapter
  • 163.3 Three Workflow Controls That Matter More Than They First Appear
  • 163.4 Worked Example 1: Explanatory Regression with Cars93
    • 163.4.1 What the Handbook Session Already Sets Up
    • 163.4.2 What the Audit Teaches
    • 163.4.3 What the Strategy Step Adds
    • 163.4.4 What the Model Comparison Shows
  • 163.5 Worked Example 2: Predictive Classification with PimaIndiansDiabetes2
    • 163.5.1 What the Handbook Session Already Sets Up
    • 163.5.2 What the Fitted Comparison Teaches
    • 163.5.3 Why the Goal Matters Here
  • 163.6 Exports, Reports, and Session Logic
  • 163.7 Practical Exercises

163  Guided Model Building in Practice

The purpose of this chapter is practical: you will open the Guided Model Building app, work through concrete datasets, and see how audit, strategy, model fitting, diagnostics, and export are connected inside one scientific workflow.

The app keeps every methodological choice visible so that you can review, revise, and defend your reasoning at each stage.

163.1 Open the App Full Screen

Warning: Full-screen use

The Guided Model Building app is not well suited for the narrow handbook column. Open it in a new tab so that the data step, model comparison step, and diagnostics all remain readable.

  • Open the blank Guided Model Building app
  • Open the Cars93 handbook session
  • Open the Pima handbook session

The handbook links above do not reopen an old learner session. They load a read-only chapter template on the server and create a fresh working session from it, so you always start from the same clean state.

163.2 The Workflow Used in This Chapter

The app organizes model building into a small number of stages:

  • Start: choose processing mode and begin a session (makes retention and replay explicit)
  • Data: choose the dataset, target, goal, predictors, optional group variable, and any prediction-time availability exceptions (ensures modeling begins with a clear research question and a realistic deployment story)
  • Audit: inspect warnings before fitting (connects model building to Chapter 63)
  • Strategy: review preprocessing and candidate workflows (makes the initial path visible and challengeable)
  • Models: fit a small candidate set (emphasizes comparison over single-model commitment)
  • Diagnostics: inspect residual, ROC, calibration, or forecasting behavior (turns criticism into part of the workflow)
  • Export: download the report and R script (makes the reasoning portable and reproducible)

The most important practical distinction appears early: the user must choose between prediction and explanation / confirmation (see Section 158.2). That one decision changes how redundancy, validation, interpretability, and diagnostics are weighted.

163.3 Three Workflow Controls That Matter More Than They First Appear

The current version of the app adds three controls that are easy to overlook if you only focus on the model list:

  • Group / entity variable (Data step): keeps repeated rows from the same unit in the same split, so the model is not trained on one row from a patient and tested on another
  • Prediction-time availability exceptions (Data step): separate variables that are immediately available, delayed, retrospective-only, or post-outcome
  • Locked final test set (confirmatory tabular workflows): reserves a hidden final split for one last check after model choice and revision

These controls are not decorative. They change what counts as a scientifically defensible workflow. A model can have good coefficients or strong validation metrics and still be unacceptable if it leaks information across rows, uses a variable that would not exist at prediction time, or keeps reusing the same test evidence after many revisions.

For students, the most important practical reading rule is this:

  • use the Data, Audit, Strategy, Models, and Diagnostics steps to build and challenge the workflow,
  • use the locked final test only at the end, when you are ready for one last confirmatory check in Export.
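The effect of the group / entity variable can be sketched in a few lines of base R. This is an illustration of the splitting idea only, with hypothetical patient data, not the app's internal code:

```r
# Hypothetical data: three repeated measurements per patient
set.seed(42)
d <- data.frame(
  patient = rep(sprintf("P%02d", 1:20), each = 3),
  glucose = rnorm(60, mean = 100, sd = 15)
)

# Naive row-wise split: one patient's rows can land on both sides
naive_idx <- sample(nrow(d), 30)

# Group-aware split: sample patients, then take all of their rows
train_patients <- sample(unique(d$patient), 10)
train <- d[d$patient %in% train_patients, ]
test  <- d[!d$patient %in% train_patients, ]

# No patient appears on both sides of the group-aware split
length(intersect(train$patient, test$patient))  # 0
```

With the naive index, a patient's first measurement can end up in training while the second is held out, which silently inflates validation scores.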

163.4 Worked Example 1: Explanatory Regression with Cars93

Cars93 is a cross-sectional dataset on passenger cars sold in the early 1990s. It contains technical characteristics, size variables, equipment indicators, and price information. In this chapter, the target is Price, and the goal is not just to predict price mechanically but to study how a guided workflow reacts when explanatory regression is confronted with skewness and outlying values.

Code
library(MASS)

cars93_app <- subset(
  Cars93,
  select = c(Price, Horsepower, Fuel.tank.capacity, EngineSize, AirBags)
)

knitr::kable(
  head(cars93_app, 8),
  caption = "Variables used in the handbook session for Cars93"
)
Variables used in the handbook session for Cars93
Price Horsepower Fuel.tank.capacity EngineSize AirBags
15.9 140 13.2 1.8 None
33.9 200 18.0 3.2 Driver & Passenger
29.1 172 16.9 2.8 Driver only
37.7 172 21.1 2.8 Driver & Passenger
30.0 208 21.1 3.5 Driver only
15.7 110 16.4 2.2 Driver only
20.8 170 18.0 3.8 Driver only
23.7 180 23.0 5.7 Driver only
Code
numeric_cars93 <- cars93_app[sapply(cars93_app, is.numeric)]

knitr::kable(
  round(cor(numeric_cars93, use = "pairwise.complete.obs"), 2),
  caption = "Numeric association structure for the Cars93 example"
)
Numeric association structure for the Cars93 example
Price Horsepower Fuel.tank.capacity EngineSize
Price 1.00 0.79 0.62 0.60
Horsepower 0.79 1.00 0.71 0.73
Fuel.tank.capacity 0.62 0.71 1.00 0.76
EngineSize 0.60 0.73 0.76 1.00

163.4.1 What the Handbook Session Already Sets Up

The prepared Cars93 session opens with:

  • target: Price
  • predictors: Horsepower, Fuel.tank.capacity, EngineSize, AirBags
  • goal: Explanation / Confirmation
  • interpretability priority: High

This configuration forces the app to balance coefficient-style interpretation against target outliers and skewness, and it places the analysis in an explanatory setting: the user wants a defensible account of price differences across cars, not merely the smallest possible prediction error.

163.4.2 What the Audit Teaches

In this handbook session, the audit fires two important warnings:

  • R016_target_outlier_warning — the target contains values far from the bulk of the data (see Chapter 69 for how outliers are identified),
  • R010_positive_skew_log — the target distribution is right-skewed (see Section 67.5 for the formal definition).

At this stage the question is no longer “Which regression model do I like?” but rather:

  1. Is the target scale stable enough for ordinary least squares?
  2. Would a transform change the interpretation in a useful way?
  3. Does a robust regression path deserve comparison?

The audit output directly shapes which models and transforms are worth considering next.
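Both warnings correspond to checks you can reproduce directly on the Cars93 target. The sketch below uses the textbook skewness coefficient and the 1.5 IQR boxplot fence; the app's exact estimators and cutoffs may differ:

```r
library(MASS)  # Cars93

price <- Cars93$Price

# Right-skew check: moment-based skewness coefficient (see Section 67.5)
skew <- mean((price - mean(price))^3) / sd(price)^3
skew > 0  # TRUE: the price distribution is right-skewed

# Outlier check: the 1.5 * IQR upper-fence rule from the boxplot (Chapter 69)
q <- quantile(price, c(0.25, 0.75))
upper_fence <- unname(q[2] + 1.5 * diff(q))
sum(price > upper_fence)  # a handful of expensive cars lie above the fence
```

A positive skewness coefficient together with upper-fence outliers is exactly the combination that triggers R010 and R016 in the prepared session.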

163.4.3 What the Strategy Step Adds

For this session the app proposes a strategy that combines:

  • log transformation of the target (see Chapter 79 for the general family of power transforms),
  • rare-level pooling (combining categorical levels that appear too few times into a single __OTHER__ group, so that the model does not try to estimate a separate effect from a handful of rows),
  • winsorization of numeric predictors (clipping extreme values to the 1st and 99th percentiles of the training set, as introduced for the winsorized mean in Chapter 66),
  • target encoding where appropriate (discussed in detail in Section 165.5),
  • a compact candidate set of lm, Huber, stepwise AIC, stepwise BIC, ctree, and cforest (Chapter 142).

The current guided app keeps this first working slice intentionally compact, but it now includes regularized linear regression and regularized logistic regression as coefficient-based shrinkage candidates. Those paths tune ridge, elastic-net, and lasso style penalties on the training rows only and then report the selected penalty family, alpha, lambda.min, and lambda.1se in the model details. What the app still does not expose is a large free-form search grid across many model families. The method background for those tuning decisions is therefore treated separately in Chapter 161 and Chapter 162.

Stepwise AIC and stepwise BIC are automated variable-selection procedures. They start from the full predictor set and remove or add predictors one at a time, guided by the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC). AIC tends to keep more predictors; BIC penalizes model size more heavily and usually produces a smaller formula.
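The stepwise idea can be reproduced with MASS::stepAIC on the same Cars93 variables; BIC is obtained by setting the penalty k to log(n). This is a sketch of the generic procedure, not the app's exact search configuration:

```r
library(MASS)  # Cars93 and stepAIC

d <- Cars93[, c("Price", "Horsepower", "Fuel.tank.capacity",
                "EngineSize", "AirBags")]

full <- lm(log(Price) ~ ., data = d)

# AIC uses penalty k = 2 (the default); BIC is obtained with k = log(n)
fit_aic <- stepAIC(full, trace = FALSE)
fit_bic <- stepAIC(full, trace = FALSE, k = log(nrow(d)))

# The heavier BIC penalty keeps at most as many coefficients here
length(coef(fit_bic)) <= length(coef(fit_aic))
```

Setting trace = TRUE prints the elimination path, which is instructive the first time you run it.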

Preprocessing refers to any transformation applied to the data before model fitting: rescaling, imputation, encoding, or outlier treatment. This step shows that preprocessing and feature treatment are already part of the scientific argument, not afterthoughts to be handled separately.
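Winsorization, for example, can be written as a small helper whose clip points are learned on the training rows only and then reused unchanged on new rows. A minimal sketch of that idea, using the percentiles described above:

```r
# Learn clip points on the training rows only (1st and 99th percentiles)
winsor_fit <- function(x, lower = 0.01, upper = 0.99) {
  quantile(x, c(lower, upper), na.rm = TRUE, names = FALSE)
}

# Apply the learned clip points to any rows, training or new
winsor_apply <- function(x, fences) {
  pmin(pmax(x, fences[1]), fences[2])
}

set.seed(1)
train_x <- c(rnorm(98), 25, -30)        # two artificial extreme values
fences  <- winsor_fit(train_x)
clipped <- winsor_apply(train_x, fences)

range(clipped)  # the extremes are pulled in to the learned percentiles
```

Keeping the fit and apply steps separate is what prevents information from validation rows leaking into the preprocessing.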

163.4.4 What the Model Comparison Shows

In the prepared session the default selected model is Huber robust regression (introduced in Section 165.6). The repeated-holdout summary attached to the selected path reports:

  • mean held-out RMSE: 3.4147
  • mean held-out MAE: 2.6794
  • mean held-out \(R^2\): 0.8083 (see Chapter 135 for the interpretation of \(R^2\))

Repeated holdout (see Section 160.3) means that the data are randomly split into a training set and a held-out test set multiple times, the model is fitted on each training set, and the error metrics are averaged across splits.
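The mechanics of repeated holdout can be sketched in a few lines. This illustration uses a plain lm and 20 random 80/20 splits; the app's candidate models, split counts, and seeds differ, so the numbers will not match the summary above:

```r
library(MASS)  # Cars93

d <- Cars93[, c("Price", "Horsepower", "Fuel.tank.capacity",
                "EngineSize", "AirBags")]

set.seed(123)
rmse <- replicate(20, {
  idx  <- sample(nrow(d), round(0.8 * nrow(d)))   # 80/20 split
  fit  <- lm(Price ~ ., data = d[idx, ])
  pred <- predict(fit, newdata = d[-idx, ])
  sqrt(mean((d$Price[-idx] - pred)^2))
})

c(mean = mean(rmse), sd = sd(rmse))  # average error and split-to-split spread
```

Reporting the spread alongside the mean is what distinguishes repeated holdout from a single lucky or unlucky split.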

The point is not that Huber regression is always superior. The point is that, under an explanatory goal plus target outlier warnings, a robust coefficient-based model is often more defensible than pretending that the ordinary least squares assumptions were never stressed.
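A minimal comparison of ordinary least squares and Huber robust regression can be run with MASS::rlm on the same variables; this sketch omits the app's preprocessing pipeline, so it illustrates the estimator contrast only:

```r
library(MASS)  # Cars93 and rlm

d <- Cars93[, c("Price", "Horsepower", "Fuel.tank.capacity",
                "EngineSize", "AirBags")]

fit_ols   <- lm(Price ~ ., data = d)
fit_huber <- rlm(Price ~ ., data = d)   # Huber M-estimation is the default

# OLS weighs every car equally; Huber downweights large-residual cars
w <- fit_huber$w
summary(w)   # final IWLS weights lie in (0, 1]
sum(w < 1)   # number of downweighted (outlying) cars
```

Inspecting which cars receive small weights is itself a diagnostic: they are typically the same expensive models flagged by the audit.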

This is exactly where Chapter 76 becomes relevant. Residual shape and outlier sensitivity are not decorative diagnostics. They change which model deserves to be taken seriously.

If a tree-based regression alternative becomes competitive in this kind of workflow, the next interpretive question is not only where the tree splits, but also how reliable the terminal nodes look internally. That is the specific topic of Chapter 141.

In confirmatory tabular workflows, the current app also reserves a locked final test split before model fitting begins. That split is hidden while the analyst compares candidates and tests revisions. It is revealed only at the end, when the user wants one last explicit confirmation that the selected path still holds up on untouched data.

So the workflow for this kind of session is not “fit once, then immediately read the final result.” It is:

  1. build the model on the analysis subset,
  2. revise it if necessary,
  3. commit to a final path,
  4. only then reveal the locked final test evaluation.

163.5 Worked Example 2: Predictive Classification with PimaIndiansDiabetes2

PimaIndiansDiabetes2 is a medical screening dataset in which the target records whether diabetes is present. The predictors are clinical measurements such as glucose, body mass, insulin, triceps skinfold thickness, pedigree, and age. Here the goal is explicitly predictive: the task is to classify future cases as well as possible while still keeping the workflow transparent.

Code
if (requireNamespace("mlbench", quietly = TRUE)) {
  data("PimaIndiansDiabetes2", package = "mlbench")
  pima_app <- PimaIndiansDiabetes2

  pima_missing <- data.frame(
    variable = names(pima_app),
    missing = colSums(is.na(pima_app))
  )

  knitr::kable(
    pima_missing,
    caption = "Missing-value counts in PimaIndiansDiabetes2"
  )
} else {
  knitr::kable(
    data.frame(note = "Package 'mlbench' was not available while rendering this chapter."),
    caption = "PimaIndiansDiabetes2 availability"
  )
}
Missing-value counts in PimaIndiansDiabetes2
variable missing
pregnant 0
glucose 5
pressure 35
triceps 227
insulin 374
mass 11
pedigree 0
age 0
diabetes 0
Code
if (exists("pima_app")) {
  diabetes_share <- round(prop.table(table(pima_app$diabetes)), 3)
  knitr::kable(
    data.frame(class = names(diabetes_share), share = as.numeric(diabetes_share)),
    caption = "Outcome shares in PimaIndiansDiabetes2"
  )
}
Outcome shares in PimaIndiansDiabetes2
class share
neg 0.651
pos 0.349

163.5.1 What the Handbook Session Already Sets Up

The prepared PimaIndiansDiabetes2 session opens with:

  • target: diabetes
  • goal: Prediction
  • predictors: glucose, mass, age, insulin, triceps, pregnant, pedigree

This specification reflects a typical screening problem: several biologically plausible predictors are available, the target is binary, and missing values need to be handled before model comparison begins.

The preprocessing choices are also part of the lesson:

  • missing-value imputation is enabled (numeric columns are filled with the training-set median; categorical columns with the training-set mode),
  • numeric winsorization is enabled,
  • target encoding is disabled,
  • the app evaluates a compact classifier set: logistic regression (Chapter 136), Gaussian Naive Bayes, conditional inference trees (Chapter 140), and conditional random forests (Chapter 142).
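The imputation rule deserves a concrete sketch, because the essential point is easy to miss: the fill values are computed on the training rows and then reused, unchanged, on validation rows. A hypothetical miniature data frame makes this visible:

```r
# Hypothetical training rows with missing values
train <- data.frame(
  glucose = c(95, NA, 110, 140, NA, 88),
  group   = factor(c("a", "a", NA, "b", "a", "b"))
)

# Learn the fill values on the training rows only
fill_median <- median(train$glucose, na.rm = TRUE)    # 102.5
fill_mode   <- names(which.max(table(train$group)))   # "a"

impute <- function(d) {
  d$glucose[is.na(d$glucose)] <- fill_median
  d$group[is.na(d$group)]     <- fill_mode
  d
}

# A new row is filled with the *training* statistics, not its own
new_row <- data.frame(glucose = NA_real_,
                      group = factor(NA, levels = levels(train$group)))
impute(new_row)
```

Recomputing the median on the full dataset, validation rows included, would be a mild but real form of leakage.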

163.5.2 What the Fitted Comparison Teaches

In the prepared session the default selected model is logistic regression. Its validation summary reports:

  • mean held-out accuracy: 0.7695
  • mean held-out AUC: 0.8311

This is the right place to connect the workflow app back to the method chapters:

  • coefficient interpretation from Chapter 136,
  • classification summaries from Chapter 59,
  • discrimination from Chapter 60,
  • tree-based alternatives from Chapter 140,
  • ensemble tree benchmarks from Chapter 142.
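Discrimination, as summarized by the AUC above, has a compact rank interpretation: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. A sketch with hypothetical classifier scores:

```r
set.seed(9)
score_pos <- rnorm(40, mean = 1)   # model scores for true positives
score_neg <- rnorm(60, mean = 0)   # model scores for true negatives

# AUC = P(score of a random positive > score of a random negative),
# estimated over all positive-negative pairs
auc <- mean(outer(score_pos, score_neg, ">"))
auc  # well above 0.5, since positives tend to score higher
```

This pairwise view explains why AUC is threshold-free: it depends only on the ordering of the scores, not on any cutoff.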

The app adds something new: it makes those ideas compete inside one concrete workflow rather than teaching them as isolated tools.

The diagnostics now also make a distinction that students often miss at first reading:

  • the automatic default is based on the mean repeated-validation AUC,
  • the center line in the predictive-stability boxplot is the median repeated-validation AUC,
  • the diamond shows the current held-out split rather than the repeated average.

This means a model can have the highest median but still lose on the mean if the resample distribution is skewed. The app therefore asks you to think about two questions at once:

  1. Which model is strongest on average?
  2. Which model is most reliable across resamples?

That is why the stability section now includes both a distribution plot and a mean-versus-variability plot.
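The mean-versus-median distinction is easy to reproduce with hypothetical resample AUCs: a model that is usually strong but occasionally collapses can win on the median and still lose on the mean.

```r
# Hypothetical repeated-validation AUCs for two candidate models
auc_a <- c(0.86, 0.86, 0.87, 0.87, 0.55)  # usually strong, one collapsed split
auc_b <- c(0.84, 0.84, 0.85, 0.85, 0.85)  # consistently decent

median(auc_a) > median(auc_b)  # TRUE: model A wins on the median
mean(auc_a) > mean(auc_b)      # FALSE: the bad split drags A's mean down
```

Which model you should prefer depends on whether occasional failures are tolerable in deployment, which is precisely the question the stability plots pose.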

163.5.3 Why the Goal Matters Here

In a predictive classification exercise, the scientifically relevant question is not merely:

Which model can be interpreted most elegantly?

It is also:

Which model maintains useful discrimination on unseen cases?

That is why the app keeps threshold-aware ROC diagnostics in the workflow. A logistic model can remain preferable even when a tree is more visually intuitive, provided the out-of-sample discrimination is better and the predictive goal is explicit.

The app also shows AUCPR as a complementary metric. AUC remains the automatic ranking criterion, but AUCPR becomes useful when the analyst cares especially about precision among the positive predictions and wants to judge whether a model that looks good on ROC-based discrimination is still attractive when false positives are more costly.
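Why precision-based summaries react to class imbalance while sensitivity and specificity do not can be shown with a two-line calculation (illustrative numbers, not the app's output): at fixed sensitivity and specificity, precision collapses as the positive class becomes rare.

```r
# Fixed classifier quality: 80% sensitivity, 80% specificity
sens <- 0.80
spec <- 0.80

precision_at <- function(prevalence) {
  tp <- sens * prevalence                 # true-positive probability mass
  fp <- (1 - spec) * (1 - prevalence)     # false-positive probability mass
  tp / (tp + fp)
}

precision_at(0.50)  # 0.80: balanced classes
precision_at(0.05)  # about 0.17: most flagged cases are false alarms
```

Since AUCPR averages precision over recall levels, it inherits this sensitivity to prevalence, which is exactly what makes it informative when false positives are costly.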

163.6 Exports, Reports, and Session Logic

Every serious workflow should end with files you can save, share, and reuse. The app therefore allows you to export:

  • an HTML report,
  • a reproducible R script,
  • a saved session that you can reopen later.

This is especially useful for teaching. A learner can:

  1. run the workflow,
  2. export the report,
  3. justify the model choice,
  4. return later and resume the session.

The chapter-session links extend that idea: each link opens a prepared workflow state so that you and the handbook are looking at the same starting point.

163.7 Practical Exercises

  1. Open the blank app and rebuild the Cars93 example from scratch instead of using the prepared session. Write down which audit warnings appear before you fit anything.
  2. Open the Cars93 handbook session and explain why Huber regression is preferred to ordinary least squares in this particular workflow.
  3. Open the PimaIndiansDiabetes2 handbook session and compare logistic regression with the tree model. Does the better predictive model also produce the clearest scientific explanation?
  4. In the PimaIndiansDiabetes2 session, inspect the predictive-stability plots. Which model looks strongest on average, and which looks most reliable across resamples?
  5. Export both handbook sessions and inspect the generated R scripts. Which parts are full reproductions of the chosen path, and which parts are still simplifications?

The next chapter, Chapter 164, continues from this point and focuses on the revision loop: when to test a new path, how to compare it, and when it is justified to promote it.


© 2026 Patrick Wessa. Provided as-is, without warranty.
