The purpose of this chapter is practical: you will open the Guided Model Building app, work through concrete datasets, and see how audit, strategy, model fitting, diagnostics, and export are connected inside one scientific workflow.
The app keeps every methodological choice visible so that you can review, revise, and defend your reasoning at each stage.
163.1 Open the App Full Screen
Warning: full-screen use
The Guided Model Building app is not well suited for the narrow handbook column. Open it in a new tab so that the data step, model comparison step, and diagnostics all remain readable.
The handbook links above do not reopen an old learner session. They load a read-only chapter template on the server and create a fresh working session from it, so you always start from the same clean state.
163.2 The Workflow Used in This Chapter
The app organizes model building into a small number of stages:
| Step | What the user does | Role in the workflow |
|---|---|---|
| Start | choose processing mode and begin a session | makes retention and replay explicit |
| Data | choose the dataset, target, goal, predictors, optional group variable, and any prediction-time availability exceptions | ensures modeling begins with a clear research question and a realistic deployment story |
| Models | compare the fitted candidates | emphasizes comparison over single-model commitment |
| Diagnostics | inspect residual, ROC, calibration, or forecasting behavior | turns criticism into part of the workflow |
| Export | download the report and R script | makes the reasoning portable and reproducible |
The most important practical distinction appears early: the user must choose between prediction and explanation / confirmation (see Section 158.2). That one decision changes how redundancy, validation, interpretability, and diagnostics are weighted.
163.3 Three Workflow Controls That Matter More Than They First Appear
The current version of the app adds three controls that are easy to overlook if you only focus on the model list:
| Control | Where it appears | Why it matters |
|---|---|---|
| Group / entity variable | Data step | keeps repeated rows from the same unit in the same split, so the model is not trained on one row from a patient and tested on another |
| Prediction-time availability exceptions | Data step | separates variables that are immediately available, delayed, retrospective-only, or post-outcome |
| Locked final test set | confirmatory tabular workflows | reserves a hidden final split for one last check after model choice and revision |
These controls are not decorative. They change what counts as a scientifically defensible workflow. A model can have good coefficients or strong validation metrics and still be unacceptable if it leaks information across rows, uses a variable that would not exist at prediction time, or keeps reusing the same test evidence after many revisions.
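The first of these controls, the group/entity variable, can be made concrete with a few lines of base R. The sketch below is an illustrative helper with made-up variable names, not the app's internal code: all rows from the same entity end up on one side of the split.

```r
# Group-aware train/test split: split at the level of unique patients,
# not individual rows, so no patient contributes to both sides.
set.seed(42)

toy <- data.frame(
  patient = rep(paste0("p", 1:6), each = 3),  # 6 patients, 3 rows each
  x       = rnorm(18),
  y       = rnorm(18)
)

patients   <- unique(toy$patient)
train_pat  <- sample(patients, size = 4)           # 4 patients for training
train_rows <- toy[toy$patient %in% train_pat, ]
test_rows  <- toy[!toy$patient %in% train_pat, ]

# No patient appears in both splits
intersect(unique(train_rows$patient), unique(test_rows$patient))  # character(0)
```

A row-level `sample(nrow(toy), ...)` split would scatter each patient's rows across both sides, which is exactly the leakage the control prevents.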
For students, the most important practical reading rule is this:
- use the Data, Audit, Strategy, Models, and Diagnostics steps to build and challenge the workflow;
- use the locked final test only at the end, when you are ready for one last confirmatory check in Export.
163.4 Worked Example 1: Explanatory Regression with Cars93
Cars93 is a cross-sectional dataset on passenger cars sold in the early 1990s. It contains technical characteristics, size variables, equipment indicators, and price information. In this chapter, the target is Price, and the goal is not just to predict price mechanically but to study how a guided workflow reacts when explanatory regression is confronted with skewness and outlying values.
Code
```r
library(MASS)

cars93_app <- subset(
  Cars93,
  select = c(Price, Horsepower, Fuel.tank.capacity, EngineSize, AirBags)
)

knitr::kable(
  head(cars93_app, 8),
  caption = "Variables used in the handbook session for Cars93"
)
```
Variables used in the handbook session for Cars93

| Price | Horsepower | Fuel.tank.capacity | EngineSize | AirBags |
|---|---|---|---|---|
| 15.9 | 140 | 13.2 | 1.8 | None |
| 33.9 | 200 | 18.0 | 3.2 | Driver & Passenger |
| 29.1 | 172 | 16.9 | 2.8 | Driver only |
| 37.7 | 172 | 21.1 | 2.8 | Driver & Passenger |
| 30.0 | 208 | 21.1 | 3.5 | Driver only |
| 15.7 | 110 | 16.4 | 2.2 | Driver only |
| 20.8 | 170 | 18.0 | 3.8 | Driver only |
| 23.7 | 180 | 23.0 | 5.7 | Driver only |
Code
```r
numeric_cars93 <- cars93_app[sapply(cars93_app, is.numeric)]

knitr::kable(
  round(cor(numeric_cars93, use = "pairwise.complete.obs"), 2),
  caption = "Numeric association structure for the Cars93 example"
)
```
Numeric association structure for the Cars93 example
This configuration forces the app to balance coefficient-style interpretation against target outliers and skewness, and it places the analysis in an explanatory setting: the user wants a defensible account of price differences across cars, not merely the smallest possible prediction error.
163.4.2 What the Audit Teaches
In this handbook session, the audit fires two important warnings:
- R016_target_outlier_warning — the target contains values far from the bulk of the data (see Chapter 69 for how outliers are identified),
- R010_positive_skew_log — the target distribution is right-skewed (see Section 67.5 for the formal definition).
At this stage the question is no longer “Which regression model do I like?” but rather:
- Is the target scale stable enough for ordinary least squares?
- Would a transform change the interpretation in a useful way?
- Does a robust regression path deserve comparison?
The audit output directly shapes which models and transforms are worth considering next.
163.4.3 What the Strategy Step Adds
For this session the app proposes a strategy that combines:
- log transformation of the target (see Chapter 79 for the general family of power transforms),
- rare-level pooling (combining categorical levels that appear too few times into a single __OTHER__ group, so that the model does not try to estimate a separate effect from a handful of rows),
- winsorization of numeric predictors (clipping extreme values to the 1st and 99th percentiles of the training set, as introduced for the winsorized mean in Chapter 66),
- target encoding where appropriate (discussed in detail in Section 165.5),
- a compact candidate set of lm, Huber, stepwise AIC, stepwise BIC, ctree, and cforest (Chapter 142).
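The winsorization step above is easy to sketch directly. The helpers below are hypothetical (`winsorize_fit` and `winsorize_apply` are made-up names, not the app's own functions); the key point is that the percentile bounds are learned on the training rows only and then reapplied to any new data.

```r
# Learn clipping bounds from the training data only
winsorize_fit <- function(x, probs = c(0.01, 0.99)) {
  q <- quantile(x, probs = probs, na.rm = TRUE)
  list(lower = q[[1]], upper = q[[2]])
}

# Apply the learned bounds (also usable on test/prediction rows)
winsorize_apply <- function(x, fit) {
  pmin(pmax(x, fit$lower), fit$upper)
}

set.seed(1)
train_x <- c(rnorm(98), 50, -50)   # two artificial extremes
fit     <- winsorize_fit(train_x)
clipped <- winsorize_apply(train_x, fit)

range(clipped)   # the extremes are pulled in to the 1st/99th percentile bounds
```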
The current guided app keeps this first working slice intentionally compact, but it now includes regularized linear regression and regularized logistic regression as coefficient-based shrinkage candidates. Those paths tune ridge, elastic-net, and lasso-style penalties on the training rows only and then report the selected penalty family, alpha, lambda.min, and lambda.1se in the model details. What the app still does not expose is a large free-form search grid across many model families. The method background for those tuning decisions is therefore treated separately in Chapter 161 and Chapter 162.
Stepwise AIC and stepwise BIC are automated variable-selection procedures. They start from the full predictor set and remove or add predictors one at a time, guided by the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC). AIC tends to keep more predictors; BIC penalizes model size more heavily and usually produces a smaller formula.
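Both procedures can be tried directly in base R with `step()`. The call below is a sketch on the Cars93 subset, not the app's exact configuration; the only difference between the two runs is the penalty `k` (2 for AIC, `log(n)` for BIC).

```r
library(MASS)

cars93_app <- subset(
  Cars93,
  select = c(Price, Horsepower, Fuel.tank.capacity, EngineSize, AirBags)
)

# Start from the full predictor set
full <- lm(Price ~ ., data = cars93_app)

# AIC uses the default penalty k = 2
step_aic <- step(full, direction = "both", trace = 0)

# BIC penalizes model size more heavily: k = log(n)
step_bic <- step(full, direction = "both", trace = 0,
                 k = log(nrow(cars93_app)))

formula(step_aic)
formula(step_bic)
```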
Preprocessing refers to any transformation applied to the data before model fitting: rescaling, imputation, encoding, or outlier treatment. This step shows that preprocessing and feature treatment are already part of the scientific argument, not afterthoughts to be handled separately.
163.4.4 What the Model Comparison Shows
In the prepared session the default selected model is Huber robust regression (introduced in Section 165.6). The repeated-holdout summary attached to the selected path reports:
- mean held-out RMSE: 3.4147
- mean held-out MAE: 2.6794
- mean held-out \(R^2\): 0.8083 (see Chapter 135 for the interpretation of \(R^2\))
Repeated holdout (see Section 160.3) means that the data are randomly split into a training set and a held-out test set multiple times, the model is fitted on each training set, and the error metrics are averaged across splits.
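The protocol can be sketched as follows; the number of repeats, the 80/20 split ratio, and the seed here are assumptions for illustration, not the app's exact settings, and plain `lm` stands in for the selected model.

```r
library(MASS)

cars93_app <- na.omit(subset(
  Cars93,
  select = c(Price, Horsepower, Fuel.tank.capacity, EngineSize, AirBags)
))

set.seed(123)
rmse <- replicate(20, {
  # New random 80/20 split on every repeat
  idx  <- sample(nrow(cars93_app), size = round(0.8 * nrow(cars93_app)))
  fit  <- lm(Price ~ ., data = cars93_app[idx, ])
  pred <- predict(fit, newdata = cars93_app[-idx, ])
  sqrt(mean((cars93_app$Price[-idx] - pred)^2))
})

mean(rmse)   # average held-out RMSE across the 20 splits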
The point is not that Huber regression is always superior. The point is that, under an explanatory goal plus target outlier warnings, a robust coefficient-based model is often more defensible than pretending that the ordinary least squares assumptions were never stressed.
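For comparison outside the app, a Huber fit is available through `MASS::rlm`. The snippet below contrasts its coefficients with ordinary least squares on two Cars93 predictors (an illustrative pairing, not the session's full formula); the robust fit downweights observations with large residuals instead of letting them dominate.

```r
library(MASS)

ols <- lm(Price ~ Horsepower + Fuel.tank.capacity, data = Cars93)
hub <- rlm(Price ~ Horsepower + Fuel.tank.capacity, data = Cars93,
           psi = psi.huber)

# Side-by-side coefficients: differences indicate outlier influence on OLS
cbind(OLS = coef(ols), Huber = coef(hub))
```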
This is exactly where Chapter 76 becomes relevant. Residual shape and outlier sensitivity are not decorative diagnostics. They change which model deserves to be taken seriously.
If a tree-based regression alternative becomes competitive in this kind of workflow, the next interpretive question is not only where the tree splits, but also how reliable the terminal nodes look internally. That is the specific topic of Chapter 141.
In confirmatory tabular workflows, the current app also reserves a locked final test split before model fitting begins. That split is hidden while the analyst compares candidates and tests revisions. It is revealed only at the end, when the user wants one last explicit confirmation that the selected path still holds up on untouched data.
So the workflow for this kind of session is not “fit once, then immediately read the final result.” It is:
1. build the model on the analysis subset,
2. revise it if necessary,
3. commit to a final path,
4. only then reveal the locked final test evaluation.
163.5 Worked Example 2: Predictive Classification with PimaIndiansDiabetes2
PimaIndiansDiabetes2 is a medical screening dataset in which the target records whether diabetes is present. The predictors are clinical measurements such as glucose, body mass, insulin, triceps skinfold thickness, pedigree, and age. Here the goal is explicitly predictive: the task is to classify future cases as well as possible while still keeping the workflow transparent.
Code
```r
if (requireNamespace("mlbench", quietly = TRUE)) {
  data("PimaIndiansDiabetes2", package = "mlbench")
  pima_app <- PimaIndiansDiabetes2
  pima_missing <- data.frame(
    variable = names(pima_app),
    missing  = colSums(is.na(pima_app))
  )
  knitr::kable(
    pima_missing,
    caption = "Missing-value counts in PimaIndiansDiabetes2"
  )
} else {
  knitr::kable(
    data.frame(note = "Package 'mlbench' was not available while rendering this chapter."),
    caption = "PimaIndiansDiabetes2 availability"
  )
}
```
Missing-value counts in PimaIndiansDiabetes2

| variable | missing |
|---|---|
| pregnant | 0 |
| glucose | 5 |
| pressure | 35 |
| triceps | 227 |
| insulin | 374 |
| mass | 11 |
| pedigree | 0 |
| age | 0 |
| diabetes | 0 |
Code
```r
if (exists("pima_app")) {
  diabetes_share <- round(prop.table(table(pima_app$diabetes)), 3)
  knitr::kable(
    data.frame(class = names(diabetes_share), share = as.numeric(diabetes_share)),
    caption = "Outcome shares in PimaIndiansDiabetes2"
  )
}
```
Outcome shares in PimaIndiansDiabetes2

| class | share |
|---|---|
| neg | 0.651 |
| pos | 0.349 |
163.5.1 What the Handbook Session Already Sets Up
The prepared PimaIndiansDiabetes2 session opens with a specification typical of a screening problem: several biologically plausible predictors are available, the target is binary, and missing values need to be handled before model comparison begins.
The preprocessing choices are also part of the lesson:
- missing-value imputation is enabled (numeric columns are filled with the training-set median; categorical columns with the training-set mode),
- numeric winsorization is enabled,
- target encoding is disabled,
- the app evaluates a compact classifier set: logistic regression (Chapter 136), Gaussian Naive Bayes, conditional inference trees (Chapter 140), and conditional random forests (Chapter 142).
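The median/mode imputation rule from the first bullet can be sketched as a pair of hypothetical helpers (made-up names, not the app's internals): the fill values are learned on the training rows only and then applied to any rows with gaps.

```r
# Learn one fill value per column from the training data
impute_fit <- function(df) {
  lapply(df, function(col) {
    if (is.numeric(col)) {
      median(col, na.rm = TRUE)                # numeric: training-set median
    } else {
      names(which.max(table(col)))             # categorical: training-set mode
    }
  })
}

# Replace missing entries with the learned fill values
impute_apply <- function(df, fills) {
  for (nm in names(fills)) {
    miss <- is.na(df[[nm]])
    df[[nm]][miss] <- fills[[nm]]
  }
  df
}

train <- data.frame(
  glucose = c(90, NA, 120),
  airbags = factor(c("None", "Driver only", NA))
)
fills   <- impute_fit(train)
imputed <- impute_apply(train, fills)

anyNA(imputed)   # FALSE
```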
163.5.2 What the Fitted Comparison Teaches
In the prepared session the default selected model is logistic regression. Its validation summary reports:
- mean held-out accuracy: 0.7695
- mean held-out AUC: 0.8311
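For reference, AUC can be computed in base R via the rank (Mann-Whitney) formula, without any extra packages. This is a minimal sketch, not the app's implementation:

```r
# AUC as the normalized Mann-Whitney statistic: the probability that a
# random positive case receives a higher score than a random negative case.
auc <- function(labels, scores) {
  r  <- rank(scores)
  n1 <- sum(labels == 1)
  n0 <- sum(labels == 0)
  (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

auc(c(0, 0, 1, 1), c(0.1, 0.2, 0.8, 0.9))   # perfect separation: 1
auc(c(1, 1, 0, 0), c(0.1, 0.2, 0.8, 0.9))   # perfectly inverted: 0
```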
This is the right place to connect the workflow app back to the method chapters. The app adds something new: it makes those ideas compete inside one concrete workflow rather than teaching them as isolated tools.
The diagnostics now also make a distinction that students often miss at first reading:
- the automatic default is based on the mean repeated-validation AUC,
- the center line in the predictive-stability boxplot is the median repeated-validation AUC,
- the diamond shows the current held-out split rather than the repeated average.
This means a model can have the highest median but still lose on the mean if the resample distribution is skewed. The app therefore asks you to think about two questions at once:
- Which model is strongest on average?
- Which model is most reliable across resamples?
That is why the stability section now includes both a distribution plot and a mean-versus-variability plot.
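The mean-versus-median distinction is easy to see with a toy resample vector (invented numbers, purely for illustration):

```r
# Most splits score 0.84, but two weak splits drag the mean down
auc_resamples <- c(rep(0.84, 8), 0.60, 0.58)

mean(auc_resamples)     # 0.79 — sensitive to the two weak resamples
median(auc_resamples)   # 0.84 — reflects the typical split
```

A model with this profile can top the median ranking yet lose an automatic mean-based comparison, which is exactly why both views are shown.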
163.5.3 Why the Goal Matters Here
In a predictive classification exercise, the scientifically relevant question is not merely:
Which model can be interpreted most elegantly?
It is also:
Which model maintains useful discrimination on unseen cases?
That is why the app keeps threshold-aware ROC diagnostics in the workflow. A logistic model can remain preferable even when a tree is more visually intuitive, provided the out-of-sample discrimination is better and the predictive goal is explicit.
The app also shows AUCPR as a complementary metric. AUC remains the automatic ranking criterion, but AUCPR becomes useful when the analyst cares especially about precision among the positive predictions and wants to judge whether a model that looks good on ROC-based discrimination is still attractive when false positives are more costly.
163.6 Exports, Reports, and Session Logic
Every serious workflow should end with files you can save, share, and reuse. The app therefore allows you to export:
- an HTML report,
- a reproducible R script,
- a saved session that you can reopen later.
This is especially useful for teaching. A learner can:
- run the workflow,
- export the report,
- justify the model choice,
- return later and resume the session.
The chapter-session links extend that idea: each link opens a prepared workflow state so that you and the handbook are looking at the same starting point.
163.7 Practical Exercises
1. Open the blank app and rebuild the Cars93 example from scratch instead of using the prepared session. Write down which audit warnings appear before you fit anything.
2. Open the Cars93 handbook session and explain why Huber regression is preferred to ordinary least squares in this particular workflow.
3. Open the PimaIndiansDiabetes2 handbook session and compare logistic regression with the tree model. Does the better predictive model also produce the clearest scientific explanation?
4. In the PimaIndiansDiabetes2 session, inspect the predictive-stability plots. Which model looks strongest on average, and which looks most reliable across resamples?
5. Export both handbook sessions and inspect the generated R scripts. Which parts are full reproductions of the chosen path, and which parts are still simplifications?
The next chapter, Chapter 164, continues from this point and focuses on the revision loop: when to test a new path, how to compare it, and when it is justified to promote it.