Appendix A — Method Selection Guide

This guide helps you choose the appropriate statistical method for your situation. Unlike typical “decision trees” that jump straight to data types and sample sizes, we start with the most important question: What are you trying to accomplish?

A.1 How to Use This Guide

Start with the Purpose Hub below, then follow the flowchart for your path. Click any destination box (marked with →) to jump directly to that section.

%%{init: {'theme': 'base', 'securityLevel': 'loose', 'flowchart': {'htmlLabels': true}, 'themeVariables': { 'primaryColor': '#E3F2FD', 'primaryTextColor': '#1565C0', 'primaryBorderColor': '#1565C0', 'lineColor': '#666666', 'secondaryColor': '#E8F5E9', 'tertiaryColor': '#FFF3E0', 'fontSize': '14px'}}}%%
flowchart LR
    S1["<b>STEP 1: What is your PURPOSE?</b><br/><i>Choose your primary goal</i>"]
    S1 --> Describe["🔵 Describe / Explore"]
    S1 --> Test["🟢 Test hypothesis"]
    S1 --> Model["🟠 Build model / Predict"]

    Describe --> APath["→ <b>Path A</b><br/>Description & Exploration"]
    Test --> BPath["→ <b>Path B</b><br/>Hypothesis Testing"]
    Model --> MSplit["<b>STEP 2: Data STRUCTURE?</b><br/><i>Modeling</i>"]
    MSplit --> CPath["→ <b>Path C</b><br/>Modeling (Non-sequential)"]
    MSplit --> DPath["→ <b>Path D</b><br/>Modeling (Time Series)"]

    classDef stepBox fill:#F5F5F5,stroke:#424242,stroke-width:2px,color:#333
    classDef blueBox fill:#E3F2FD,stroke:#1565C0,stroke-width:2px,color:#1565C0
    classDef greenBox fill:#E8F5E9,stroke:#2E7D32,stroke-width:2px,color:#2E7D32
    classDef orangeBox fill:#FFF3E0,stroke:#EF6C00,stroke-width:2px,color:#EF6C00
    classDef purpleBox fill:#F3E5F5,stroke:#7B1FA2,stroke-width:2px,color:#7B1FA2
    classDef destBox fill:#fff,stroke:#333,stroke-width:2px,color:#333,font-weight:bold

    class S1 stepBox
    class Describe,APath blueBox
    class Test,BPath greenBox
    class Model,MSplit,CPath orangeBox
    class DPath purpleBox

    click APath "#sec-flowchart-path-a"
    click BPath "#sec-flowchart-path-b"
    click CPath "#sec-flowchart-path-c"
    click DPath "#sec-flowchart-path-d"

Figure A.1: Purpose Hub. Click a destination to jump to the relevant path.

A.1.1 Quick-Reference Table

If you already know your research question, use this table to jump directly to the right method.

Research Question	Method	Chapter	App
What does my data look like?	Descriptive statistics (mean, SD, quantiles)	📖	🔗
Are there measurement or data quality issues?	Data quality forensics	📖	🔗
Predict a single value (no covariates)	Mean / median / trimmed / geometric / harmonic / midrange	📖	🔗
Test/CI for central tendency (no model)	Bootstrap / Blocked bootstrap	📖	🔗
Is my data normally distributed?	QQ plot, Shapiro-Wilk, K-S test (with Lilliefors correction)	📖	🔗
Are there outliers?	Box plot, Z-scores	📖	🔗
Is the mean different from a value?	One-sample t-test	📖	🔗
Are two group means different?	Two-sample t-test / Welch’s	📖	🔗
Are paired measurements different?	Paired t-test	📖	🔗
Are two groups equivalent?	TOST equivalence test	📖	🔗
Are 3+ group means different?	ANOVA	📖	🔗
Are two categorical variables associated?	Chi-squared test	📖	🔗
Is there a linear relationship?	Correlation test (Pearson / Spearman)	📖	🔗
Does X predict Y (continuous)?	Linear regression	📖	🔗
Does X predict Y (binary)?	Logistic regression	📖	🔗
Classify cases into categories?	Logistic regression / ctree / Naive Bayes	📖	🔗
Forecast a time series?	ARIMA / Holt-Winters	📖	🔗
Does an external event affect a time series?	Intervention analysis	📖	🔗
Does X dynamically influence Y over time?	Transfer function / CCF	📖	🔗
Does data fit a specific distribution?	K-S test / Chi-squared GoF	📖	🔗

A.1.2 Path A: Description & Exploration

%%{init: {'theme': 'base', 'securityLevel': 'loose', 'flowchart': {'htmlLabels': true}, 'themeVariables': { 'primaryColor': '#E3F2FD', 'primaryTextColor': '#1565C0', 'primaryBorderColor': '#1565C0', 'lineColor': '#666666', 'fontSize': '14px'}}}%%
flowchart LR
    AStart["<b>Describe or Explore?</b>"]
    AStart --> ADescribe["🔵 Describe"]
    AStart --> AExplore["🔵 Explore"]

    ADescribe --> A1["→ <b>A1</b><br/>Pure Description"]
    AExplore --> A2["→ <b>A2</b><br/>Exploratory Analysis"]

    classDef blueBox fill:#E3F2FD,stroke:#1565C0,stroke-width:2px,color:#1565C0
    classDef destBox fill:#fff,stroke:#333,stroke-width:2px,color:#333,font-weight:bold

    class AStart,ADescribe,AExplore blueBox
    class A1,A2 destBox

    click A1 "#a1-pure-description"
    click A2 "#a2-exploratory-data-analysis-eda"

Data Quality Forensics App

For a more comprehensive diagnostic workflow (beyond the examples in Path A), use the app: https://shiny.wessa.net/dataqualityforensics/.

A.1.3 Path B: Hypothesis Testing

%%{init: {'theme': 'base', 'securityLevel': 'loose', 'flowchart': {'htmlLabels': true}, 'themeVariables': { 'primaryColor': '#E8F5E9', 'primaryTextColor': '#2E7D32', 'primaryBorderColor': '#2E7D32', 'lineColor': '#666666', 'fontSize': '14px'}}}%%
flowchart LR
    BStart["<b>STEP 2: Data TYPE?</b><br/><i>Hypothesis testing</i>"]
    BStart --> BNum["🟢 Numeric"]
    BStart --> BCat["🟢 Categorical"]
    BStart --> BDist["🟢 Distribution fit"]

    BCat --> B5["→ <b>B5</b><br/>Association Tests"]
    BDist --> B6["→ <b>B6</b><br/>Goodness-of-Fit"]

    BNum --> BGroups["<b>STEP 3: How MANY groups?</b>"]
    BGroups --> B1["🟢 1 sample"]
    BGroups --> B2g["🟢 2 groups"]
    BGroups --> B3g["🟢 3+ groups"]

    B1 --> B2["→ <b>B2</b><br/>One-Sample Tests"]
    B2g --> B3["→ <b>B3</b><br/>Two-Sample Tests"]
    B3g --> B4["→ <b>B4</b><br/>Multi-Group Tests"]

    classDef greenBox fill:#E8F5E9,stroke:#2E7D32,stroke-width:2px,color:#2E7D32
    classDef destBox fill:#fff,stroke:#333,stroke-width:2px,color:#333,font-weight:bold

    class BStart,BNum,BCat,BDist,BGroups,B1,B2g,B3g greenBox
    class B2,B3,B4,B5,B6 destBox

    click B2 "#b2-one-sample-tests"
    click B3 "#b3-two-sample-tests"
    click B4 "#b4-more-than-two-groups"
    click B5 "#b5-association-and-correlation-tests"
    click B6 "#b6-goodness-of-fit-tests"

Central Tendency Without a Model

If your goal is to test or quantify a single central tendency (no predictors), use the Bootstrap Plot for CIs/tests of mean, median, or trimmed mean. For dependent or time‑ordered data, use the Blocked Bootstrap Plot. Apps: https://shiny.wessa.net/bootstrap/.
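As a rough illustration of what the Bootstrap Plot does, the following base R sketch computes percentile bootstrap confidence intervals for the median and a trimmed mean. The data are simulated, and the helper `boot_ci()` is a hypothetical name introduced here, not a function from the apps.

```r
# Percentile bootstrap CI for a statistic (base R only)
set.seed(123)                        # reproducible resampling
x <- rexp(40, rate = 1 / 10)         # simulated right-skewed sample (illustrative)

# boot_ci() is a hypothetical helper written for this sketch
boot_ci <- function(x, stat = median, B = 2000, level = 0.95) {
  # Recompute the statistic on B resamples drawn with replacement
  reps  <- replicate(B, stat(sample(x, replace = TRUE)))
  alpha <- 1 - level
  quantile(reps, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

ci_median <- boot_ci(x, stat = median)
ci_trim   <- boot_ci(x, stat = function(v) mean(v, trim = 0.1))
ci_median
ci_trim
```

This sketch resamples individual observations, which assumes independence. For dependent or time-ordered data, blocks of consecutive observations would be resampled instead, as in the Blocked Bootstrap Plot.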

A.1.4 Path C: Modeling & Prediction

%%{init: {'theme': 'base', 'securityLevel': 'loose', 'flowchart': {'htmlLabels': true}, 'themeVariables': { 'primaryColor': '#FFF3E0', 'primaryTextColor': '#EF6C00', 'primaryBorderColor': '#EF6C00', 'lineColor': '#666666', 'fontSize': '14px'}}}%%
flowchart LR
    CStart["<b>STEP 2: Outcome TYPE?</b><br/><i>Modeling</i>"]
    CStart --> CCont["🟠 Continuous"]
    CStart --> CBin["🟠 Binary"]
    CStart --> CCount["🟠 Count"]
    CStart --> CAny["🟠 Any type"]

    CCont --> C2Reg["→ <b>C2</b><br/>Linear Regression"]
    CBin --> C2Log["→ <b>C2</b><br/>Logistic Regression"]
    CCount --> C2Pois["→ <b>C2</b><br/>Poisson/GLM"]
    CAny --> C2Tree["→ <b>C2</b><br/>ctree - flexible"]

    C2Reg --> C6["→ <b>C6</b><br/>Model Diagnostics"]
    C2Log --> C6
    C2Pois --> C6
    C2Tree --> C6

    classDef orangeBox fill:#FFF3E0,stroke:#EF6C00,stroke-width:2px,color:#EF6C00
    classDef destBox fill:#fff,stroke:#333,stroke-width:2px,color:#333,font-weight:bold

    class CStart,CCont,CBin,CCount,CAny orangeBox
    class C2Reg,C2Log,C2Pois,C2Tree,C6 destBox

    click C2Reg "#c2-choosing-a-regression-model"
    click C2Log "#c2-choosing-a-regression-model"
    click C2Pois "#c2-choosing-a-regression-model"
    click C2Tree "#c2-choosing-a-regression-model"
    click C6 "#sec-c6-model-diagnostics"

Baseline Prediction (No Covariates)

When you only need a single‑number forecast, choose a central‑tendency estimator that matches your loss function or distribution:

- Mean: symmetric/normal‑like data; minimizes squared error.
- Median: robust to outliers; minimizes absolute error (L1 loss).
- Trimmed mean: robustness-efficiency compromise; not an L1 minimizer.
- Geometric mean: multiplicative or log‑normal processes.
- Harmonic mean: rates or ratios (e.g., speed, productivity).
- Midrange: uniform processes (center of min/max).

Use Bootstrap Plot for uncertainty; use Blocked Bootstrap for dependent data.
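The estimators listed above can all be computed in a few lines of base R. The sketch below uses a small made-up vector with one large value so that the differences between the estimators are visible. The geometric and harmonic means assume strictly positive data.

```r
x <- c(2, 4, 4, 5, 7, 9, 40)   # illustrative positive data with one large value

estimates <- c(
  mean      = mean(x),
  median    = median(x),
  trimmed   = mean(x, trim = 0.2),       # trims 20% per tail (here one value each)
  geometric = exp(mean(log(x))),         # requires x > 0
  harmonic  = 1 / mean(1 / x),           # requires x > 0
  midrange  = (min(x) + max(x)) / 2
)
round(estimates, 2)
```

For positive data the harmonic mean never exceeds the geometric mean, which never exceeds the arithmetic mean, so the three can differ substantially when the data are skewed.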

A.1.5 Path D: Time Series

%%{init: {'theme': 'base', 'securityLevel': 'loose', 'flowchart': {'htmlLabels': true}, 'themeVariables': { 'primaryColor': '#F3E5F5', 'primaryTextColor': '#7B1FA2', 'primaryBorderColor': '#7B1FA2', 'lineColor': '#666666', 'fontSize': '14px'}}}%%
flowchart LR
    DStart["<b>Sequential / Time-Ordered Data?</b>"]
    DStart --> DEDA["🔵 Time Series EDA"]
    DStart --> DModel["🟣 Time Series Models"]
    DEDA --> A2TS["→ <b>A2</b><br/>Time Series EDA"]
    DModel --> C7["→ <b>C7</b><br/>Time Series Models"]

    classDef purpleBox fill:#F3E5F5,stroke:#7B1FA2,stroke-width:2px,color:#7B1FA2
    classDef destBox fill:#fff,stroke:#333,stroke-width:2px,color:#333,font-weight:bold

    class DStart,DEDA,DModel purpleBox
    class A2TS,C7 destBox

    click A2TS "#a2-exploratory-data-analysis-eda"
    click C7 "#c7-time-series-models"

A.2 Define Your Purpose

Before choosing any method, answer this question clearly:

What do I want to achieve with this analysis?

Table A.1: Primary purposes of statistical analysis

Purpose	Description	Example	Go to
Describe	Summarize the data at hand	“What is the average income in our sample?”	Part A
Explore	Discover patterns, problems, or questions	“Are there unusual subgroups in the data?”	Part A
Test	Evaluate a specific claim or hypothesis	“Is the new drug better than placebo?”	Part B
Estimate	Quantify an unknown parameter	“What is the population mean, with uncertainty?”	Part B
Predict	Forecast new or future observations	“What will sales be next quarter?”	Part C
Classify	Assign cases to categories	“Is this email spam or not?”	Part C

Your purpose determines not just which method to use, but how to interpret the results.

A.3 Part A: Descriptive and Exploratory Analysis

A.3.1 A1: Pure Description

Goal: Summarize the data without generalizing beyond it.

When to use: You have a complete population (not a sample), or you simply want to characterize the data you have.

Example: A company wants to describe the salaries of its 50 employees.

Table A.2: Descriptive methods by data type

Data Type	Methods	Chapter	App
One numeric variable	Mean / median / SD / quantiles	📖	🔗
One numeric variable	Stem-and-leaf plot	📖	🔗
One categorical variable	Frequency table / proportions / bar plot	📖	🔗
Two numeric variables	Correlation / scatter plot	📖	🔗
Two categorical variables	Contingency table / mosaic plot	📖	🔗
Distribution shape	Histogram / box plot	📖	🔗
Distribution shape	Skewness / kurtosis / moments	📖	🔗
Concentration / inequality	Concentration curve / Gini	📖	🔗

Key distinction: Descriptive statistics tell you about this specific dataset. They do not tell you about a larger population or what would happen with different data.

A.3.2 A2: Exploratory Data Analysis (EDA)

Goal: Discover unexpected patterns, anomalies, or new questions.

When to use: Before formal analysis, when you don’t know what to expect, or when checking data quality.

Example: A researcher receives a new dataset and wants to understand what’s in it before testing any hypotheses.

EDA Checklist:

Check data quality
- Missing values: How many? Random or systematic?
- Outliers: Are extreme values errors or genuine?
- Data types: Are variables coded correctly?
- Digit preference: Do terminal digits follow a uniform distribution? (Chapter 63)
- Value repetition: Do certain round numbers appear far more often than expected? (Section 63.4.1)
- Cross-variable consistency: Do relationships between variables match domain knowledge? (Section 63.8)
Examine distributions
- Histogram for each numeric variable
- Bar chart for each categorical variable
- Look for: skewness, bimodality, gaps, unusual values
Look for relationships
- Scatter plots for pairs of numeric variables
- Box plots of numeric variable by group
- Correlation matrix for multiple variables
Check for subgroups
- Are there natural clusters?
- Do patterns differ across subgroups?
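A few items from the checklist above can be sketched in base R. The example uses the built-in `airquality` dataset purely for illustration. The z-score cutoff of 3 is a common convention rather than a rule from this guide.

```r
data(airquality)   # built-in dataset that contains missing values

# 1. Data quality: missing values per variable
n_missing <- colSums(is.na(airquality))

# 2. Outlier screening for one variable: z-scores and the box plot rule
ozone     <- airquality$Ozone[!is.na(airquality$Ozone)]
z         <- (ozone - mean(ozone)) / sd(ozone)
z_flags   <- ozone[abs(z) > 3]
box_flags <- boxplot.stats(ozone)$out    # beyond 1.5 * IQR from the hinges

# 3. Relationships: pairwise correlations on complete cases
cor_mat <- cor(airquality[, c("Ozone", "Solar.R", "Wind", "Temp")],
               use = "complete.obs")

n_missing
z_flags
box_flags
round(cor_mat, 2)
```

Flagged values are candidates for investigation, not automatic deletions: the checklist question of whether extreme values are errors or genuine still has to be answered from domain knowledge.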

Table A.3: EDA methods for cross-sectional data

EDA Goal	Method(s)	Chapter	App
Find outliers	Box plot	📖	🔗
Find outliers	Scatter plot	📖	🔗
Find outliers	Z-scores	📖	🔗
Check normality	QQ plot	📖	🔗
Check normality	Normal probability plot	📖	🔗
Check normality	PPCC plot	📖	🔗
Check normality	Histogram	📖	🔗
Check normality	Box-Cox normality plot	📖	🔗
Check normality	Skewness & kurtosis tests	📖	🔗
Check normality	Skewness-kurtosis plot	📖	🔗
Check normality	Kernel density	📖	🔗
Distribution fit	ML fitting (distribution)	📖	🔗
Distribution fit	Cullen-Frey graph	📖	🔗
Find clusters	Scatter plot	📖	🔗
Find clusters	Kernel density	📖	🔗
Examine joint density	Bivariate KDE	📖	🔗
Detect relationships	Correlation matrix	📖	🔗
Detect relationships	Scatterplot matrix	📖	🔗
Compare rankings	Rank order comparison	📖	🔗
Check reliability	Cronbach alpha	📖	🔗
Check data quality	Terminal digit analysis / Benford’s law	📖	🔗

A.3.2.1 Time Series EDA (Sequential Data)

If your data are ordered in time, use these additional EDA tools:

Table A.4: EDA methods for time series data

Goal	Method	What to Look For	Chapter	App
Visualize patterns	Time series plot	Trend, seasonality, outliers, level shifts	📖	🔗
Separate components	Decomposition	Separate trend, seasonal, residual components	📖	🔗
Check seasonal structure	Mean Plot	Seasonal stability across subseries	📖	🔗
Mean-SD structure	SMP	Variability vs. level relationship	📖	🔗
Choose differencing	VRM	Select d and D for stationarity	📖	🔗
Choose differencing	ACF / PACF	Significant spikes indicate dependence structure	📖	🔗
Choose differencing	Spectral analysis / periodogram	Hidden periodicities	📖	🔗
Resampling with dependence	Blocked bootstrap	Preserve autocorrelation structure	📖	🔗
Check autocorrelation	ACF / PACF	Significant spikes indicate dependence structure	📖	🔗
Detect cycles	Spectral analysis / periodogram	Hidden periodicities	📖	🔗
Check stationarity	ADF / stationarity	Constant mean/variance over time?	📖	🔗
Identify lead/lag relationships	Cross-Correlation Function (CCF)	Lead/lag between two series	📖
Test temporal causation	Granger causality test	Does X help predict Y?	📖

Key insight: Time series EDA is essential before building time series models (C7). The patterns you discover here determine which model to use.
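Several rows of Table A.4 can be approximated in base R. The sketch below, which uses the built-in `AirPassengers` series as an illustrative example, tabulates the variance after different combinations of regular (d) and seasonal (D) differencing in the spirit of a variance reduction matrix, then computes the ACF and a classical decomposition. It is a simplified stand-in for the apps, not their implementation.

```r
y <- log(AirPassengers)    # built-in monthly series; log stabilizes the variance

# Variance of the series after d regular and D seasonal (lag 12) differences
vrm <- sapply(0:2, function(D) sapply(0:2, function(d) {
  z <- y
  if (D > 0) z <- diff(z, lag = 12, differences = D)
  if (d > 0) z <- diff(z, differences = d)
  var(as.numeric(z))
}))
dimnames(vrm) <- list(paste0("d=", 0:2), paste0("D=", 0:2))
signif(vrm, 3)

# Autocorrelations of the differenced series (values only, no plot)
z        <- diff(diff(y, lag = 12))
acf_vals <- acf(z, lag.max = 24, plot = FALSE)$acf

# Classical decomposition into trend, seasonal, and random components
parts <- decompose(y)
```

Small variances in the matrix point to candidate values of d and D. Calling `plot(parts)` or `acf(z)` draws the corresponding graphs.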

Key distinction from description: EDA is about discovery — you’re looking for things you didn’t know to look for. Description summarizes what you already expect to report.

A.3.3 A3: Residual Analysis

Goal: Check whether a fitted model is appropriate.

When to use: After fitting any model (regression, time series, etc.).

Example: After fitting a linear regression, you want to check if the assumptions hold.

What to check:

Table A.5: Residual diagnostics

Assumption	Diagnostic	What to Look For	Chapter
Linearity	Residuals vs fitted plot	No curved pattern	📖
Normality	QQ plot of residuals	Points on the line	📖
Constant variance	Residuals vs fitted plot	No funnel shape	📖
Independence	Residual time plot, ACF	No pattern over time	📖
No outliers	Standardized residuals	Values within ±3	📖

Key distinction from EDA: Residual analysis checks model assumptions. EDA explores the raw data before modeling.

See Section A.5.6 for a worked example of regression residual diagnostics.

A.4 Part B: Hypothesis Testing

A.4.1 B1: Before Choosing a Test — Critical Questions

Question 1: What is the decision structure of your research?

This question is crucial and often overlooked. Set the significance level before data collection based on error costs and decision context, not on a preferred outcome.

Table A.6: Approach by research goal

Situation	Decision Structure	Approach	Reasoning
Testing a new drug	Superiority (difference from placebo)	Standard test with pre-specified α (often 0.05)	Protect against false claims of efficacy
Showing products are equivalent	Equivalence with pre-defined margin	TOST equivalence test (Chapter 120)	In TOST, H₀ states “not equivalent”
Safety monitoring	Signal detection (prioritize sensitivity)	Pre-specified higher α (e.g., 0.10)	Be sensitive to potential safety issues
Confirming no meaningful difference	Equivalence with pre-defined margin	TOST equivalence test (Chapter 120)	Standard tests cannot prove equivalence

Important note on equivalence testing: In the Two One-Sided Tests (TOST) procedure (Chapter 120), the null hypothesis states that the products are not equivalent, that is, that their difference lies outside a pre-defined margin. Rejecting this null hypothesis provides evidence of equivalence. This reverses the usual setup, in which the null hypothesis states that there is no difference.

Example — Standard superiority test:

A pharmaceutical company tests whether their new drug is better than placebo. Before looking at data, they pre-specify α = 0.05 (or α = 0.01 in high-risk contexts) to control false efficacy claims.

Example — Equivalence test (different null hypothesis):

A manufacturer wants to show that a generic drug is equivalent to the brand-name drug. In TOST, H₀ states the drugs are not equivalent. By rejecting H₀, they demonstrate equivalence. A standard t-test cannot prove equivalence — a non-significant result only means “we couldn’t detect a difference.”
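The logic of TOST can be sketched with two one-sided calls to `t.test()`. The data are simulated and the equivalence margin of 5 units is a hypothetical value chosen for illustration. In practice the margin must be justified before data collection. Note that `t.test()` uses the Welch (unequal variances) version by default.

```r
set.seed(42)
brand   <- rnorm(30, mean = 100,   sd = 5)   # simulated data (illustrative)
generic <- rnorm(30, mean = 100.5, sd = 5)
margin  <- 5                                 # hypothetical equivalence margin

# H0a: difference <= -margin   vs.   H1a: difference > -margin
p_lower <- t.test(generic, brand, mu = -margin, alternative = "greater")$p.value
# H0b: difference >= +margin   vs.   H1b: difference < +margin
p_upper <- t.test(generic, brand, mu =  margin, alternative = "less")$p.value

# Equivalence is claimed only if BOTH one-sided tests reject
p_tost     <- max(p_lower, p_upper)
equivalent <- p_tost < 0.05
c(p_lower = p_lower, p_upper = p_upper, p_tost = p_tost)
```

Taking the larger of the two p-values reflects that the combined null hypothesis ("not equivalent") is rejected only when the difference is shown to be both above the lower margin and below the upper margin.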

Question 2: What is the consequence of each type of error?

Table A.7: Types of errors

Error Type	Description	Example Consequence
Type I (false positive)	Reject H₀ when it’s true	Approve an ineffective drug
Type II (false negative)	Fail to reject H₀ when it’s false	Miss an effective treatment

Balance these risks based on your context:

Medical screening: Prefer false positives (Type I) over missed diseases (Type II)
Criminal trials: Prefer false negatives (acquit guilty) over false positives (convict innocent)
Quality control: Balance depends on cost of defective products vs. cost of discarding good ones
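The trade-off between the two error types can be made concrete with `power.t.test()`. In the sketch below, the sample size, effect size, and standard deviation are arbitrary illustration values. Only the significance level changes between the two calls.

```r
# Power of a two-sample t-test for a fixed design at two significance levels
p05 <- power.t.test(n = 30, delta = 0.5, sd = 1, sig.level = 0.05)$power
p10 <- power.t.test(n = 30, delta = 0.5, sd = 1, sig.level = 0.10)$power

c(alpha_0.05 = p05, alpha_0.10 = p10)          # power = 1 - P(Type II error)
c(beta_0.05 = 1 - p05, beta_0.10 = 1 - p10)    # Type II error rates
```

Raising α from 0.05 to 0.10 accepts more false positives in exchange for fewer false negatives, which is the rationale for the higher α in the safety monitoring row of Table A.6.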

A.4.2 B2: One-Sample Tests

Question: Does my sample come from a population with a specific parameter value?

Variable is NUMERIC — Testing LOCATION / DISTRIBUTION:

Data approximately normal OR n ≥ 30 (rough guideline; for highly skewed or heavy-tailed data, prefer larger samples or bootstrap/Wilcoxon)?
- Yes → One-sample t-test (Chapter 114)
  - Shiny App: Hypotheses / Empirical Tests
- No, small sample, non-normal → Wilcoxon signed-rank test (Chapter 117)
  - Shiny App: Hypotheses / Empirical Tests
  - Interpretation note: nonparametric conclusions are about location/distribution shifts (median under symmetry), not means.
Testing equivalence to a value (not just difference)?
- Use one-sample equivalence test

If your estimand is the mean under non-normality, consider bootstrap methods for the mean.

Variable is NUMERIC — Testing the VARIANCE:

Chi-squared test for variance (Chapter 105)
- Shiny App: Hypotheses / Empirical Tests
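Base R has no dedicated function for the one-sample variance test, but the statistic is easy to compute by hand. The sketch below uses simulated data and a hypothesized standard deviation of 2. It assumes the data are normally distributed, to which this test is sensitive.

```r
# One-sample chi-squared test for a variance (assumes normal data)
set.seed(1)
x      <- rnorm(20, mean = 50, sd = 2)   # simulated measurements (illustrative)
sigma0 <- 2                              # hypothesized standard deviation

n     <- length(x)
stat  <- (n - 1) * var(x) / sigma0^2     # chi-squared with n - 1 df under H0
p_low <- pchisq(stat, df = n - 1)        # P(chi-squared <= stat)
p_two <- 2 * min(p_low, 1 - p_low)       # two-sided p-value

# Matching 95% confidence interval for the variance
ci_var <- (n - 1) * var(x) / qchisq(c(0.975, 0.025), df = n - 1)

c(statistic = stat, p_value = p_two)
ci_var
```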

Variable is NUMERIC — Testing the DISTRIBUTION:

Against a specific distribution → K-S test (Chapter 125)
For normality → Shapiro-Wilk test (Section 125.10), K-S test (use Lilliefors correction when parameters are estimated), or QQ plot (Chapter 76)
- Shiny App: Descriptive / Normal QQ Plot
Against expected frequencies → Chi-squared goodness-of-fit (Chapter 124)
- Shiny App: Hypotheses / Empirical Tests

Variable is CATEGORICAL — Testing a PROPORTION:

One-sample proportion test (binomial test) (Chapter 106)
- Shiny App: Hypotheses / Empirical Tests

Example — One-sample t-test:

A manufacturer claims their batteries last 500 hours on average. You test 25 batteries and want to check this claim.

battery_life <- c(495, 502, 498, 510, 485, 492, 505, 488, 515, 499,
                  478, 503, 497, 506, 494, 501, 489, 508, 496, 502,
                  493, 507, 490, 504, 500)

t.test(battery_life, mu = 500)


    One Sample t-test

data:  battery_life
t = -1.0109, df = 24, p-value = 0.3222
alternative hypothesis: true mean is not equal to 500
95 percent confidence interval:
 494.7683 501.7917
sample estimates:
mean of x 
   498.28

A.4.3 B3: Two-Sample Tests

Question: Do two groups differ on some measure?

First: Are the samples PAIRED or INDEPENDENT?

Table A.8: Paired vs. Independent samples

Paired Example	Independent Example
Same subject measured before and after treatment	Different people in treatment and control groups
Twins or matched pairs	Random assignment to two groups
Left eye vs. right eye of same person	Men vs. women
Same item measured by two methods	Two different products

PAIRED samples — Numeric variable:

Differences approximately normal → Paired t-test (Chapter 116)
- Shiny App: Hypotheses / Empirical Tests
Non-normal differences → Wilcoxon signed-rank test (Chapter 117)
- Shiny App: Hypotheses / Empirical Tests

PAIRED samples — Categorical variable:

McNemar’s test (Section 124.9.2)
- Shiny App: Hypotheses / Empirical Tests

INDEPENDENT samples — Testing LOCATION / DISTRIBUTION:

Both groups approximately normal?
- Equal variances → Two-sample t-test (Chapter 118)
  - Shiny App: Hypotheses / Empirical Tests
- Unequal variances → Welch’s t-test (Chapter 119)
  - Shiny App: Hypotheses / Empirical Tests
Non-normal or ordinal → Mann-Whitney U test (Chapter 121)
- Shiny App: Hypotheses / Empirical Tests
- Interpretation note: this test targets distribution/location differences (median under additional assumptions), not mean differences.

If your estimand is the mean under non-normality, consider bootstrap or permutation methods for mean differences.
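The two-sample choices above map onto a handful of base R calls. The following sketch uses simulated groups. The means and standard deviations are arbitrary illustration values.

```r
set.seed(7)
control   <- rnorm(25, mean = 50, sd = 5)    # simulated groups (illustrative)
treatment <- rnorm(25, mean = 54, sd = 9)

welch  <- t.test(treatment, control)                    # default: unequal variances
pooled <- t.test(treatment, control, var.equal = TRUE)  # classic two-sample t-test
mw     <- wilcox.test(treatment, control)               # Mann-Whitney U (rank-based)

# Paired design: the same 25 subjects measured before and after
before <- rnorm(25, mean = 50, sd = 5)
after  <- before + rnorm(25, mean = 2, sd = 3)
paired <- t.test(after, before, paired = TRUE)

c(welch = welch$p.value, pooled = pooled$p.value,
  mann_whitney = mw$p.value, paired = paired$p.value)
```

The Welch test reports fractional degrees of freedom that are smaller than the pooled test's n₁ + n₂ − 2 whenever the sample variances differ.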

INDEPENDENT samples — Testing EQUIVALENCE:

TOST procedure (Chapter 120)
- Shiny App: Hypotheses / Equivalence Testing

INDEPENDENT samples — Testing VARIANCES:

F-test or Levene’s test (Chapter 110)
- Shiny App: Hypotheses / Empirical Tests

INDEPENDENT samples — Categorical variable:

2×2 table → Chi-squared (Chapter 124) or Fisher’s exact test (Section 124.7)
Larger table → Chi-squared test (Chapter 124)
- Shiny App: Hypotheses / Empirical Tests
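For a 2×2 table, the chi-squared and Fisher tests can be run side by side. The counts below are hypothetical.

```r
# Hypothetical 2x2 table: group by outcome
tab <- matrix(c(18,  7,
                 9, 16),
              nrow = 2, byrow = TRUE,
              dimnames = list(group   = c("treated", "control"),
                              outcome = c("improved", "not improved")))

chi <- chisq.test(tab)    # applies Yates' continuity correction for 2x2 tables
fis <- fisher.test(tab)   # exact test; often preferred when expected counts are small

chi$expected              # expected counts under independence
c(chi_squared = chi$p.value, fisher = fis$p.value)
```

A common rule of thumb is to prefer Fisher's exact test when any expected count falls below about 5.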

A.4.4 B4: More Than Two Groups

INDEPENDENT groups — Numeric variable:

Normality and equal variances → One-way ANOVA (Chapter 126)
- Shiny App: Hypotheses / Empirical Tests
Normality holds, variances unequal → Welch’s one-way ANOVA (oneway.test(..., var.equal = FALSE))
- Shiny App: Hypotheses / Empirical Tests
Two factors (grouping variables) → Two-way ANOVA (Chapter 128)
- Shiny App: Hypotheses / Empirical Tests
Non-normal or ordinal → Kruskal-Wallis test (Chapter 127)
- Shiny App: Hypotheses / Multivariate (pair-wise) Testing
- Interpretation note: nonparametric conclusions are about rank/location/distribution differences, not mean differences.
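The three options for independent groups can be compared on the same data. The sketch below simulates three groups, one of them with a larger spread. All values are illustrative.

```r
set.seed(11)
d <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 15)),
  score = c(rnorm(15, 50, 4), rnorm(15, 53, 4), rnorm(15, 58, 8))  # simulated
)

anova_fit <- aov(score ~ group, data = d)                             # equal variances
welch_fit <- oneway.test(score ~ group, data = d, var.equal = FALSE)  # Welch's ANOVA
kw_fit    <- kruskal.test(score ~ group, data = d)                    # rank-based

summary(anova_fit)
c(welch = welch_fit$p.value, kruskal_wallis = kw_fit$p.value)
```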

RELATED groups (repeated measures) — Numeric variable:

Sphericity holds → Repeated measures ANOVA (Chapter 129)
Sphericity violated, approximately normal → Repeated measures ANOVA with Greenhouse-Geisser or Huynh-Feldt correction (Chapter 129)
Non-normal or ordinal (whether or not sphericity holds) → Friedman test (Chapter 130)

Categorical variable:

Chi-squared test for independence (Chapter 124)
- Shiny App: Hypotheses / Empirical Tests

A.4.5 B5: Association and Correlation Tests

Both variables NUMERIC:

Linear relationship, bivariate normal → Pearson correlation test (Chapter 131)
- Shiny App: Descriptive / Multivariate Descriptive Statistics
Partial Pearson correlation → Partial correlation test (Chapter 73)
- Shiny App: Descriptive / Multivariate Descriptive Statistics
Monotonic but non-linear, or ordinal → Spearman or Kendall correlation (Section 72.4)
- Shiny App: Descriptive / Multivariate Descriptive Statistics

Both variables CATEGORICAL:

Chi-squared test of independence (Chapter 124)
- Shiny App: Hypotheses / Empirical Tests

One NUMERIC, one CATEGORICAL:

Compare means across categories → t-test (2 groups) or ANOVA (3+ groups)
- Shiny App: Hypotheses / Empirical Tests
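For two numeric variables, `cor.test()` covers all three correlation coefficients through its `method` argument. The sketch below simulates a monotonic but non-linear relationship, a case where rank-based coefficients are often more informative than Pearson's.

```r
set.seed(3)
x <- runif(40, 0, 5)
y <- exp(x) + rnorm(40, sd = 2)    # simulated monotonic, non-linear relationship

pearson  <- cor.test(x, y, method = "pearson")
spearman <- cor.test(x, y, method = "spearman")
kendall  <- cor.test(x, y, method = "kendall")

c(pearson  = unname(pearson$estimate),
  spearman = unname(spearman$estimate),
  kendall  = unname(kendall$estimate))
```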

A.4.6 B6: Goodness-of-Fit Tests

Table A.9: Goodness-of-fit tests

Situation	Test	Chapter	App
Against a fully specified distribution	Kolmogorov-Smirnov test	📖	🔗
Testing normality (parameters estimated)	Shapiro-Wilk (Section 125.10) or Lilliefors test (Section 125.9)	📖	🔗
Comparing observed to expected frequencies	Chi-squared goodness-of-fit	📖	🔗
Comparing two samples’ distributions	Two-sample K-S test	📖	🔗
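Three of the four rows of Table A.9 have direct base R counterparts: the fully specified K-S test, Shapiro-Wilk, the chi-squared goodness-of-fit test, and the two-sample K-S test. The Lilliefors test is not part of base R and is omitted here. The data and die-roll counts are made up for illustration.

```r
set.seed(5)
x <- rnorm(60, mean = 10, sd = 2)    # simulated sample (illustrative)

# Fully specified distribution: parameters fixed in advance, not estimated from x
ks_fit <- ks.test(x, "pnorm", mean = 10, sd = 2)

# Normality with parameters estimated from the data
sw_fit <- shapiro.test(x)

# Observed vs. expected frequencies: is a die fair? (hypothetical counts)
rolls <- c(8, 12, 9, 11, 6, 14)
gof   <- chisq.test(rolls, p = rep(1 / 6, 6))

# Two samples, possibly from different distributions
y      <- rexp(60, rate = 1 / 10)
ks_two <- ks.test(x, y)

c(ks = ks_fit$p.value, shapiro = sw_fit$p.value,
  chisq_gof = gof$p.value, ks_two_sample = ks_two$p.value)
```

Passing the sample mean and standard deviation to `ks.test()` would invalidate its p-value, which is why the table lists Shapiro-Wilk or Lilliefors for the estimated-parameter case.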

A.5 Part C: Regression and Modeling

A.5.1 C1: The Critical Question — Inference or Prediction?

Before building any model, answer: Why am I building this model?

Table A.10: Inference vs. Prediction

Aspect	Inference Goal	Prediction Goal
Primary aim	Understand relationships between variables	Forecast outcomes for new cases
Focus	Test hypotheses about coefficients	Minimize prediction error
Interpretation	Interpret the effect of each predictor	Accuracy matters more than interpretability
Key output	p-values and confidence intervals are important	Out-of-sample performance is key
Complexity	Simpler models may be preferred for clarity	Complex models are acceptable if they predict well

Example — Inference focus:

A health researcher wants to know: “Does smoking increase the risk of heart disease, after controlling for age and exercise?”

Primary interest: The coefficient for smoking (odds ratio)
Needs: Statistical significance, confidence interval
Model: Logistic regression with interpretable coefficients
Key output: “Smokers have 2.3 times the odds of heart disease (95% CI: 1.8-2.9, p < 0.001)”

Example — Prediction focus:

A hospital wants to predict which patients are likely to be readmitted within 30 days.

Primary interest: Accurate identification of high-risk patients
Needs: Good sensitivity and specificity on new patients
Model: Could use logistic regression, decision trees, or any method that predicts well
Key output: “The model correctly identifies 75% of patients who will be readmitted (AUC = 0.82)”
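The same logistic regression can serve either goal. What changes is how it is fitted and reported. The sketch below uses simulated data with arbitrary coefficients, so its numbers do not correspond to the quoted examples. It reports Wald intervals via `confint.default()` and uses a single random train/test split as the simplest form of out-of-sample evaluation.

```r
set.seed(21)
n <- 400
d <- data.frame(smoker = rbinom(n, 1, 0.3), age = rnorm(n, 55, 10))
# Simulated outcome; the coefficients are arbitrary illustration values
d$disease <- rbinom(n, 1, plogis(-4 + 0.8 * d$smoker + 0.05 * d$age))

# Inference: fit on all data, report odds ratios with Wald 95% intervals
fit         <- glm(disease ~ smoker + age, data = d, family = binomial)
odds_ratios <- exp(cbind(OR = coef(fit), confint.default(fit)))

# Prediction: fit on a training split, judge accuracy on held-out cases
train    <- sample(n, size = 300)
fit_tr   <- glm(disease ~ smoker + age, data = d[train, ], family = binomial)
p_test   <- predict(fit_tr, newdata = d[-train, ], type = "response")
accuracy <- mean((p_test > 0.5) == (d$disease[-train] == 1))

round(odds_ratios, 2)
accuracy
```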

A.5.2 C2: Choosing a Regression Model

CONTINUOUS outcome (numeric):

One predictor, linear relationship → Simple linear regression (Chapter 134)
- Shiny App: Models / Manual Model Building (Regression tab)
One predictor, non-linear relationship → Polynomial or transformed regression
- Shiny App: Models / Manual Model Building (Regression tab)
Multiple predictors → Multiple linear regression (Chapter 135)
- Shiny App: Models / Manual Model Building (Regression tab)

BINARY outcome (yes/no, success/failure):

Logistic regression (Chapter 136)
- Shiny App: Models / Manual Model Building (GLM tab, family = binomial)

COUNT outcome (0, 1, 2, 3, …):

Variance ≈ Mean → Poisson regression (Section 137.2)
- Shiny App: Models / Manual Model Building (GLM tab, family = poisson)
Variance > Mean (overdispersion) → Quasipoisson (Section 137.3.1) or Negative binomial regression (Section 137.3.2)
- Shiny App: Models / Manual Model Building (GLM tab, family = quasipoisson)
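A quick way to check the variance-mean condition is to fit the Poisson model first and compute the Pearson dispersion estimate. The sketch below simulates overdispersed counts. The data-generating values are arbitrary.

```r
set.seed(9)
x <- runif(200)
# Simulated overdispersed counts (negative binomial with mean exp(1 + x))
y <- rnbinom(200, mu = exp(1 + x), size = 1.5)

pois <- glm(y ~ x, family = poisson)
# Pearson dispersion estimate: values well above 1 suggest overdispersion
dispersion <- sum(residuals(pois, type = "pearson")^2) / df.residual(pois)

quasi <- glm(y ~ x, family = quasipoisson)

dispersion
# Same coefficients, but the quasipoisson standard errors are inflated
cbind(poisson_se = summary(pois)$coefficients[, "Std. Error"],
      quasi_se   = summary(quasi)$coefficients[, "Std. Error"])
```

The quasipoisson fit keeps the Poisson coefficient estimates and scales the standard errors by the square root of the dispersion. A negative binomial model (for example `MASS::glm.nb()`) instead changes the likelihood and can yield different coefficients.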

PROPORTIONS (between 0 and 1):

Quasibinomial regression (Section 137.4)
- Shiny App: Models / Manual Model Building (GLM tab, family = quasibinomial)

CATEGORICAL outcome (more than two unordered categories):

Multinomial logistic regression (Section 138.1)

ORDINAL outcome (ordered categories):

Ordinal logistic regression (Section 138.2)

TIME-TO-EVENT (survival data):

Cox proportional hazards regression (Chapter 139)

FLEXIBLE ALTERNATIVE — Conditional Inference Trees:

The ctree() function (Chapter 140) can handle almost any outcome type — continuous, binary, categorical, count, or ordinal — making it a versatile choice when you’re unsure which regression model to use or when you want an interpretable tree-based model.

Shiny App: Models / Manual Model Building (Tree tab)

Important: The outcome variable must have the correct R data type. For example, if heartattack is coded as 0/1, you must convert it to a factor (factor(heartattack)) for ctree() to treat it as a classification problem rather than regression.
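The conversion can be sketched as follows. The data are simulated, and the sketch assumes `ctree()` is taken from the partykit package, which the text above does not specify. The tree is fitted only if that package is installed.

```r
set.seed(2)
d <- data.frame(age = round(runif(200, 30, 80)))
d$heartattack <- rbinom(200, 1, plogis(-6 + 0.1 * d$age))   # simulated 0/1 coding

is.factor(d$heartattack)   # FALSE: a tree would treat this as a regression problem
d$heartattack <- factor(d$heartattack, levels = c(0, 1), labels = c("no", "yes"))
is.factor(d$heartattack)   # TRUE: now a classification problem

# Assumption: ctree() comes from partykit; skipped if the package is unavailable
if (requireNamespace("partykit", quietly = TRUE)) {
  tree <- partykit::ctree(heartattack ~ age, data = d)
  print(tree)
}
```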

A.5.3 C3: Classification Methods

Table A.11: Classification methods comparison

Method	When to Use	Pros	Cons	Chapter	App
Logistic regression	Binary outcome, want interpretable model	Coefficients are odds ratios, well-understood	Assumes linear relationship in log-odds	📖	🔗
Conditional inference tree	Any outcome, want visual rules	Easy to interpret, handles interactions	May overfit, less stable	📖	🔗
Naive Bayes	Many categorical predictors	Fast, handles many features	Assumes independence of predictors	📖	🔗

A.5.4 C4: Evaluating Classification Models

After building a classifier, how do you evaluate it?

Table A.12: Classification evaluation metrics

Metric	Use When	Chapter	App
Accuracy	Classes are balanced	📖	🔗
Sensitivity (Recall)	Missing positives is costly	📖	🔗
Specificity	False positives are costly	📖	🔗
AUC	Overall discrimination ability	📖	🔗
Precision	False positives are very costly	📖	🔗
Binomial classification metrics	Binary classification summaries	📖	🔗

Example — Choosing the right metric:

Cancer screening: Missing a cancer (false negative) is much worse than a false alarm. Prioritize sensitivity.

Spam filter: Losing an important email (false positive) is worse than seeing some spam. Prioritize specificity or precision.

Credit scoring: Need to balance defaults (false negatives) and rejected good customers (false positives). Use AUC and examine the ROC curve (Chapter 60) to choose an appropriate threshold.
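The metrics in Table A.12 all derive from the four cells of the confusion matrix. A minimal sketch with hypothetical labels and predictions:

```r
# Hypothetical labels and predictions (1 = positive class)
actual    <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
predicted <- c(1, 1, 1, 0, 0, 0, 0, 0, 1, 1)

tp <- sum(predicted == 1 & actual == 1)   # true positives
fn <- sum(predicted == 0 & actual == 1)   # false negatives
fp <- sum(predicted == 1 & actual == 0)   # false positives
tn <- sum(predicted == 0 & actual == 0)   # true negatives

metrics <- c(
  accuracy    = (tp + tn) / length(actual),
  sensitivity = tp / (tp + fn),
  specificity = tn / (tn + fp),
  precision   = tp / (tp + fp)
)
round(metrics, 3)
```

Specificity and precision both penalize false positives but use different denominators: specificity is relative to all actual negatives, precision to all predicted positives. With rare positives, precision can be low even when specificity is high.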

A.5.5 C5: Choosing a Classification Threshold

For probabilistic classifiers (logistic regression, etc.), you must choose a threshold to convert probabilities to class predictions.

Table A.13: Threshold selection methods

Method	When to Use	Chapter	App
Default 0.5	Classes are balanced, costs are equal	📖	🔗
Youden’s index	Maximize sensitivity + specificity	📖	🔗
Cost-optimal threshold	Different costs for false positives vs. false negatives	📖	🔗
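Both data-driven rules in Table A.13 can be evaluated on a grid of candidate thresholds. In the sketch below the predicted probabilities are simulated, and the 5:1 cost ratio is a hypothetical choice.

```r
set.seed(13)
actual <- rbinom(300, 1, 0.3)
# Simulated predicted probabilities: higher on average for true positives
prob <- plogis(rnorm(300, mean = ifelse(actual == 1, 1, -1)))

thresholds <- seq(0.05, 0.95, by = 0.05)
perf <- t(sapply(thresholds, function(th) {
  pred <- as.integer(prob >= th)
  c(threshold   = th,
    sensitivity = mean(pred[actual == 1] == 1),
    specificity = mean(pred[actual == 0] == 0),
    fp = sum(pred == 1 & actual == 0),
    fn = sum(pred == 0 & actual == 1))
}))

# Youden's index J = sensitivity + specificity - 1
youden    <- perf[, "sensitivity"] + perf[, "specificity"] - 1
th_youden <- thresholds[which.max(youden)]

# Hypothetical costs: a false negative costs 5 times as much as a false positive
cost    <- 1 * perf[, "fp"] + 5 * perf[, "fn"]
th_cost <- thresholds[which.min(cost)]

c(youden = th_youden, cost_optimal = th_cost)
```

When false negatives are weighted more heavily, the cost-optimal threshold moves downward, so more cases are classified as positive.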

A.5.6 C6: Model Diagnostics — A Critical Checkpoint

After fitting any model, you must check whether the model is appropriate. This step is often skipped but is essential for valid inference and reliable predictions.

A.5.6.1 Minimum Diagnostic Set (Do This First)

If you only do a few checks, do these:

Residuals vs. fitted (or predicted vs. observed for classifiers)
QQ plot of residuals (if residuals exist)
Residuals vs. observation order (independence)
One influence check (Cook’s distance / leverage)

A.5.6.2 Quick Workflow (Order Matters)

Check fit structure: residuals vs. fitted (or predicted vs. observed)
Check distribution: QQ plot / histogram of residuals
Check independence: residuals vs. time/order, ACF
Check leverage: influence diagnostics
If problems appear: apply fixes, then re-check
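The workflow above can be followed numerically as well as graphically. The sketch below fits an illustrative model to the built-in `mtcars` data and computes one numeric companion for each step. The row order of `mtcars` has no time meaning, so the independence step is shown only to demonstrate the call.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)   # built-in data; illustrative model
res <- residuals(fit)

# 1. Fit structure: regress residuals on a quadratic in the fitted values;
#    a clear quadratic term points to missed nonlinearity
quad_check <- lm(res ~ poly(fitted(fit), 2))

# 2. Distribution: a formal companion to the QQ plot
normality <- shapiro.test(res)

# 3. Independence: residual autocorrelations in observation order
res_acf   <- acf(res, lag.max = 10, plot = FALSE)$acf[-1]   # drop lag 0
acf_limit <- 2 / sqrt(length(res))                          # approximate 95% bound

# 4. Influence: Cook's distance for each observation
cooks            <- cooks.distance(fit)
most_influential <- names(which.max(cooks))

summary(quad_check)$coefficients
normality$p.value
which(abs(res_acf) > acf_limit)
most_influential
```

For the graphical versions, `plot(fit, which = 1:2)` draws the residuals vs. fitted and normal QQ plots, and `plot(fit, which = 4)` shows Cook's distance.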

A.5.6.3 Inference vs. Prediction (Different Priorities)

Inference models: assumptions (normality, homoscedasticity, independence) matter most for valid p-values and confidence intervals.
Prediction models: calibration and out-of-sample performance matter most; assumptions matter mainly if they harm predictive accuracy.

A.5.6.4 What “Good” Looks Like (At a Glance)

Residuals vs. fitted: random cloud, no curve or funnel
QQ plot: points roughly on the line
Residuals vs. order: no trend or cycles
ACF: no large spikes
Influence: no single point dominates

A.5.6.5 Key Insight: A1/A2 Methods Applied to Residuals

Almost all the descriptive and exploratory methods from Parts A1 and A2 can be reused as model diagnostics — simply apply them to the residuals (or other model outputs) instead of the raw data. If your model is appropriate, the residuals should look like random noise with no structure.

Table A.14: A1/A2 methods as model diagnostics

A1/A2 Method	Applied to Residuals	What It Checks	Problem Signs	Chapter
Histogram (A1)	Histogram of residuals	Normality assumption	Skewed, bimodal, or heavy-tailed	Chapter 62
Box plot (A1)	Box plot of residuals	Outliers, symmetry	Many outliers, asymmetric distribution	Chapter 69
QQ plot (A2)	QQ plot of residuals	Normality assumption	Points deviate from diagonal line	Chapter 76
Scatter plot (A1)	Residuals vs. fitted values	Linearity, homoscedasticity	Curved pattern, funnel shape	Chapter 70
Scatter plot (A1)	Residuals vs. each predictor	Missed nonlinearity	Curved pattern for any predictor	Chapter 70
Time series plot (A2)	Residuals vs. observation order	Independence	Trends, cycles, or autocorrelation	Chapter 146
ACF/PACF (A2)	ACF of residuals	Independence, autocorrelation	Significant spikes at nonzero lags	Section 92.6
Kernel density (A2)	Density of residuals	Distribution shape	Non-normal shape	Section 80.12
Summary statistics (A1)	Mean, SD of residuals	Model fit	Mean ≠ 0, large SD relative to response	Section 65.2, Section 66.6
Correlation (A1)	Residuals vs. predictors	Missed relationships	Significant correlation (should be ~0)	Section 71.3

The principle: If residuals show any systematic pattern detectable by A1/A2 methods, your model is missing something.
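
As a small illustration of this principle, the sketch below fits a straight line to data that were simulated with a quadratic term and then applies two descriptive tools (standard deviation and a correlation test) to the residuals. The data and the names `fit_linear` and `fit_quad` are invented for the example.

```r
set.seed(2)
x <- seq(-3, 3, length.out = 120)
y <- 1 + 0.5 * x + x^2 + rnorm(120, sd = 0.5)

fit_linear <- lm(y ~ x)            # misses the curvature
fit_quad   <- lm(y ~ x + I(x^2))   # includes it

# Summary statistics: with an intercept the residual mean is ~0 by
# construction, so compare the residual spread instead
c(sd_linear = sd(resid(fit_linear)), sd_quad = sd(resid(fit_quad)))

# Correlation between the residuals and a candidate term that was left out
ct_linear <- cor.test(resid(fit_linear), x^2)
ct_quad   <- cor.test(resid(fit_quad), x^2)
c(p_linear = ct_linear$p.value, p_quad = ct_quad$p.value)
```

One caveat: least-squares residuals are uncorrelated with every predictor already in the model by construction, so the correlation check is informative mainly for omitted variables or omitted transformations such as `x^2` here.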

Note on ctree: Conditional inference trees do not produce residuals in the traditional sense (they predict class labels or means for terminal nodes). For ctree diagnostics, focus on comparing predicted vs. actual values, examining terminal node purity, and evaluating out-of-sample performance rather than residual analysis.

A.5.6.6 Decision Table: If You See This, Do That

Table A.15: Diagnostic problem-fix guide

Problem	Most Common Fix	Chapter
Curved residual pattern	Add nonlinear terms, transform variables	Chapter 134
Funnel shape (heteroscedasticity)	Transform response, use robust SE, WLS	Chapter 134
Strong autocorrelation	Use time series model or add lagged terms	Chapter 148
Non-normal residuals	Robust methods or transformation	Chapter 76
High influence point	Investigate, refit with/without, robust regression	Chapter 134
Poor calibration (classification)	Recalibrate, try different model	Chapter 136
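
To make one row of the table concrete, the following sketch treats a funnel shape with weighted least squares. It assumes, purely for illustration, that the error standard deviation is proportional to `x`, which is how the data are simulated; in practice the weights have to be justified from the data or the measurement process.

```r
set.seed(3)
x <- seq(1, 100, length.out = 200)
y <- 5 + 2 * x + rnorm(200, sd = 0.5 * x)   # error SD grows with x (funnel)

fit_ols <- lm(y ~ x)
fit_wls <- lm(y ~ x, weights = 1 / x^2)     # assumes Var(error) proportional to x^2

# Ratio of residual spread: upper half of x vs. lower half of x
upper <- x > median(x)
spread_ratio <- function(r) sd(r[upper]) / sd(r[!upper])

ratio_ols <- spread_ratio(resid(fit_ols))
# Weighted residuals are resid * sqrt(weights); they should have even spread
ratio_wls <- spread_ratio(weighted.residuals(fit_wls))
c(ols = ratio_ols, wls = ratio_wls)
```

A ratio far from 1 indicates uneven spread; after reweighting, the ratio should move toward 1 if the assumed variance structure is roughly right.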

Shiny Apps for residual diagnostics:

Descriptive / Histogram & Frequency Table — histogram of residuals
Descriptive / Normal QQ Plot — normality check
Descriptive / Kernel Density Estimation — residual distribution shape
Time Series / (P)ACF — autocorrelation in residuals

If you only do two plots in Shiny: residuals vs. fitted (scatter plot) and QQ plot.

A.5.6.7 For Linear Regression Models

Table A.16: Regression diagnostics checklist

What to Check	Diagnostic	Problem Signs	If Violated	Chapter
Linearity	Residuals vs. fitted plot	Curved pattern	Transform variables, add polynomial terms, or use nonlinear model	📖
Normality of residuals	QQ plot, Shapiro-Wilk test	Points deviate from line	May affect inference (CI, p-values); consider robust methods or transformations	📖
Constant variance	Residuals vs. fitted plot	Funnel/fan shape (heteroscedasticity)	Use weighted least squares or robust standard errors	📖
Independence	Residuals vs. order plot, Durbin-Watson test	Pattern over time	Use time series methods or add lagged terms	📖
Multicollinearity	Variance Inflation Factor (VIF)	VIF > 5 or 10	Remove or combine correlated predictors	📖
Influential points	Cook’s distance, leverage	Cook’s D > 1 or high leverage	Investigate outliers; consider robust regression	📖

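# Quick diagnostic example for linear regression (simulated data)
set.seed(42)
x <- 1:50
y <- 2 + 0.5*x + rnorm(50, sd = 3)
model <- lm(y ~ x)

# Standard diagnostic plots: residuals vs. fitted, normal QQ,
# scale-location, and residuals vs. leverage
op <- par(mfrow = c(2, 2))
plot(model)

# Restore the previous plotting layout
par(op)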
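
The plots can be complemented with numerical checks from Table A.16. The sketch below uses only base R on simulated data with two deliberately correlated predictors; the VIF is computed by hand from its definition. The Durbin-Watson test is not part of base R and is left out here (add-on packages such as lmtest provide it).

```r
set.seed(42)
n  <- 100
x1 <- rnorm(n)
x2 <- 0.9 * x1 + rnorm(n, sd = 0.3)   # deliberately correlated with x1
y  <- 1 + 2 * x1 - x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

# Normality of residuals
shapiro.test(resid(fit))

# Influential points: Cook's distance against the D > 1 rule of thumb
cd <- cooks.distance(fit)
any(cd > 1)

# Multicollinearity: VIF_j = 1 / (1 - R^2_j), where R^2_j comes from
# regressing predictor j on the other predictors
vif_x1 <- 1 / (1 - summary(lm(x1 ~ x2))$r.squared)
vif_x2 <- 1 / (1 - summary(lm(x2 ~ x1))$r.squared)
c(x1 = vif_x1, x2 = vif_x2)
```

With exactly two predictors the two VIF values coincide, because each is a function of the same pairwise correlation.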

A.5.6.8 Common Mistakes in Diagnostics

Checking only R² and ignoring residuals
Declaring success because p-values are small
Removing outliers without documenting why
Ignoring multicollinearity when coefficients are unstable

A.5.6.9 For Logistic Regression and Classification Models

Table A.17: Classification diagnostics checklist

What to Check	Diagnostic	Problem Signs	If Violated	Chapter
Calibration	Calibration plot (observed vs. predicted probabilities)	Predicted probabilities don’t match observed rates	Recalibrate or use different model	📖
Discrimination	ROC curve, AUC	AUC near 0.5	Model has no predictive power; add features or try different model	📖
Influential observations	dfbetas, Cook’s distance analog	Large influence on coefficients	Investigate; consider robust methods	📖
Goodness of fit	Hosmer-Lemeshow test	Significant p-value	Model doesn’t fit well; consider interactions or nonlinear terms	📖
Separation	Inspect coefficient estimates and standard errors	Very large estimates or standard errors (perfect or quasi-separation)	Use penalized regression (Firth) or exact methods	📖
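
Two of the checks in this table, discrimination and calibration, can be sketched in base R without extra packages. The example below simulates a binary outcome, computes the AUC through its rank (Mann-Whitney) form, and builds a simple ten-bin calibration table. The data and object names are illustrative, and the choice of ten bins is an assumption.

```r
set.seed(4)
n <- 500
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))
fit <- glm(y ~ x, family = binomial)
p_hat <- fitted(fit)

# Discrimination: AUC via the rank (Mann-Whitney) formula
n1 <- sum(y == 1); n0 <- sum(y == 0)
auc <- (sum(rank(p_hat)[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
auc

# Calibration: mean predicted probability vs. observed rate in 10 bins
bins <- cut(p_hat, breaks = quantile(p_hat, probs = seq(0, 1, 0.1)),
            include.lowest = TRUE)
calib <- data.frame(
  predicted = as.numeric(tapply(p_hat, bins, mean)),
  observed  = as.numeric(tapply(y, bins, mean))
)
calib

plot(calib$predicted, calib$observed, xlim = c(0, 1), ylim = c(0, 1),
     xlab = "Mean predicted probability", ylab = "Observed rate")
abline(0, 1, lty = 2)   # reference line for perfect calibration
```

These are in-sample figures; for an honest assessment the same quantities would be computed on held-out data.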

A.5.6.10 For Conditional Inference Trees

Table A.18: Tree model diagnostics

What to Check	Diagnostic	Problem Signs	If Violated
Overfitting	Compare training vs. test performance	Large gap between training and test accuracy	Increase `mincriterion`, prune tree
Tree too simple	Count terminal nodes	Only 1-2 terminal nodes; real patterns not captured	Decrease `mincriterion`, check data quality
Instability	Refit the tree on bootstrap samples	Tree structure changes dramatically	Consider ensemble methods (random forest)
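
The first two rows can be checked with a few lines of R. This sketch assumes the partykit package is installed (the code is skipped otherwise) and uses the built-in `iris` data with an arbitrary 100/50 train/test split; the helper `acc` is a made-up name.

```r
if (requireNamespace("partykit", quietly = TRUE)) {
  set.seed(5)
  idx   <- sample(nrow(iris), 100)
  train <- iris[idx, ]
  test  <- iris[-idx, ]

  tree <- partykit::ctree(
    Species ~ ., data = train,
    control = partykit::ctree_control(mincriterion = 0.95)
  )

  # Accuracy on a data set: share of correctly predicted classes
  acc <- function(data) mean(predict(tree, newdata = data) == data$Species)

  print(c(train = acc(train), test = acc(test)))  # a large gap suggests overfitting
  print(partykit::width(tree))                    # number of terminal nodes
}
```

Rerunning the block with different seeds or bootstrap resamples of `train` gives a rough impression of how stable the tree structure is.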

A.5.6.11 Decision: When to Revisit Your Model Choice

Diagnostics reveal problems?
│
├── YES: Assumptions badly violated
│   ├── Try transformation (log, sqrt)
│   ├── Try nonparametric alternative
│   ├── Try robust methods
│   └── Consider different model family
│
├── YES: Poor fit or discrimination
│   ├── Add/remove predictors
│   ├── Add interaction terms
│   ├── Try flexible model (ctree, GAM)
│   └── Collect better data
│
└── NO: Diagnostics look acceptable
    └── Proceed with inference or prediction

Key principle: A model with good diagnostics and moderate fit is more trustworthy than a model with excellent fit but violated assumptions.

Important: Passing diagnostics does not prove a causal relationship; it only supports that the model is not obviously misspecified.

A.5.7 C7: Time Series Models

Time series modeling is for sequential data where observations are ordered in time and typically depend on previous values. This is fundamentally different from cross-sectional modeling, where observations are assumed to be independent.

When to use time series models instead of regression:

Data are collected over time at regular intervals
You want to forecast future values
Observations are autocorrelated (today’s value depends on yesterday’s)
There’s trend, seasonality, or other temporal patterns

A.5.7.1 Time Series EDA (before modeling)

Before building a time series model, explore the data using the methods in Table A.4 (Section A2 above).

A.5.7.2 Choosing a Time Series Model

Table A.19: Time series model selection

Situation	Model	When to Use	Chapter	App
Simple trend, no seasonality	Exponential smoothing (Holt)	Short-term forecasts, smooth data	📖	🔗
Trend + seasonality	Holt-Winters	Clear seasonal pattern	📖	🔗
Complex autocorrelation	ARIMA	Flexible, handles many patterns	📖	🔗
Seasonal + complex	Seasonal ARIMA (SARIMA)	Seasonal with ARIMA errors	📖	🔗
Known event / structural break	Intervention analysis	ARIMA + pulse/step dummy	📖	🔗
External input influences Y	Transfer function / ARIMAX	Input series available	📖	🔗
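
As an example of the "Trend + seasonality" row, the sketch below fits a Holt-Winters model with base R to `co2`, a monthly series that ships with R. The choice of data set and the 12-month horizon are assumptions made for illustration.

```r
# co2: monthly atmospheric CO2 concentrations (trend + seasonality)
hw <- HoltWinters(co2)

# Estimated smoothing parameters for level, trend, and season
c(alpha = hw$alpha, beta = hw$beta, gamma = hw$gamma)

# 12-month forecast with 95% prediction bounds
fc <- predict(hw, n.ahead = 12, prediction.interval = TRUE, level = 0.95)
head(fc, 3)

# Observed series, fitted values, and forecast
plot(hw, fc)
```

For a series with trend but no seasonality, the same function can be called with `gamma = FALSE`, which drops the seasonal component and gives Holt's method.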

A.5.7.3 Time Series Model Identification (Box-Jenkins approach)

1. Plot the series → Check for trend, seasonality, variance changes
   │
2. Make stationary → Differencing (d), seasonal differencing (D)
   │                  Transform if variance changes (log, sqrt)
   │
3. Examine ACF/PACF → Identify p, q (and P, Q for seasonal)
   │                   ACF cuts off → MA(q)
   │                   PACF cuts off → AR(p)
   │                   Both decay → ARMA(p,q)
   │
4. Fit model → Estimate parameters
   │
5. Diagnostics → Check residuals (should be white noise)
   │              Ljung-Box test, residual ACF
   │
6. Forecast → Generate forecasts with prediction intervals
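
The six steps can be walked through in base R. The sketch below uses the built-in `AirPassengers` series; the order ARIMA(0,1,1)(0,1,1)[12] is one plausible candidate chosen for illustration, not the result of a full identification.

```r
# 1. Plot the series: trend, seasonality, variance growing with the level
plot(AirPassengers)

# 2. Make stationary: log to stabilise variance, then d = 1 and D = 1
y  <- log(AirPassengers)
dy <- diff(diff(y, lag = 12))

# 3. Examine ACF/PACF of the differenced series
op <- par(mfrow = c(1, 2))
acf(dy, lag.max = 36); pacf(dy, lag.max = 36)
par(op)

# 4. Fit a candidate model (order assumed for this example)
fit <- arima(y, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

# 5. Diagnostics: residuals should resemble white noise
lb <- Box.test(resid(fit), lag = 24, type = "Ljung-Box", fitdf = 2)
lb$p.value          # a small p-value signals leftover autocorrelation
acf(resid(fit))

# 6. Forecast 12 months ahead with approximate 95% prediction limits,
#    back-transformed from the log scale
fc    <- predict(fit, n.ahead = 12)
lower <- exp(fc$pred - 1.96 * fc$se)
upper <- exp(fc$pred + 1.96 * fc$se)
cbind(forecast = exp(fc$pred), lower, upper)
```

`fitdf = 2` accounts for the two estimated MA coefficients. If the Ljung-Box test or the residual ACF shows remaining structure, return to step 3 and try a different order.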

A.5.7.4 Time Series Diagnostics

After fitting a time series model, check the residuals:

Table A.20: Time series diagnostics

What to Check	Method	Problem Signs	If Violated
No autocorrelation	Residual ACF, Ljung-Box test	Significant spikes in ACF	Increase model order or add seasonal terms
Normality	QQ plot, histogram of residuals	Non-normal shape	May affect prediction intervals for forecasts
Constant variance	Plot residuals over time	Funnel shape, changing spread	Consider GARCH or transformation
No pattern	Residuals vs. fitted	Systematic pattern	Model is missing something

Shiny App: Time Series / ARIMA (includes residual diagnostics)

A.6 Part D: Common Mistakes and How to Avoid Them

A.6.1 Mistake 1: Using the Wrong Test for Your Purpose

Wrong: Using a standard t-test to show two products are equivalent.

A non-significant p-value does NOT prove equivalence — it only means you failed to detect a difference. This could be because there is no difference, OR because your sample was too small.

Right: Use equivalence testing (TOST, Chapter 120) when you want to demonstrate that two things are practically the same.
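
A minimal sketch of TOST with two one-sided Welch t-tests in base R is shown below. The data are simulated and the equivalence margin `delta = 0.5` is a hypothetical value; in a real study the margin must come from subject-matter considerations.

```r
set.seed(6)
a <- rnorm(400, mean = 10, sd = 1)   # product A
b <- rnorm(400, mean = 10, sd = 1)   # product B
delta <- 0.5                         # equivalence margin (hypothetical)

# H0: mean(a) - mean(b) <= -delta   vs.   H1: difference > -delta
p_lower <- t.test(a, b, mu = -delta, alternative = "greater")$p.value
# H0: mean(a) - mean(b) >= +delta   vs.   H1: difference < +delta
p_upper <- t.test(a, b, mu =  delta, alternative = "less")$p.value

# Equivalence is supported only if BOTH one-sided tests reject
p_tost <- max(p_lower, p_upper)
p_tost
```

Rejecting both one-sided tests at level α corresponds to the (1 − 2α) confidence interval for the difference lying entirely inside (−delta, +delta).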

A.6.2 Mistake 2: Ignoring the Purpose When Evaluating Models

Wrong: Choosing a regression model based on R² when your goal is prediction.

R² measures fit to the training data. A model with high R² might overfit and predict poorly on new data.

Right: For prediction, use cross-validation or a held-out test set to evaluate performance on new data.
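
The contrast between in-sample R² and out-of-sample error can be seen in a small simulation. The sketch below compares a straight line with a degree-10 polynomial on data whose true relationship is linear; `cv_rmse` is a helper written for this example (it assumes the response column is named `y`), and 5 folds is an arbitrary choice.

```r
set.seed(7)
n <- 60
dat <- data.frame(x = runif(n, 0, 10))
dat$y <- 2 + 0.8 * dat$x + rnorm(n, sd = 2)   # the true relationship is linear

# k-fold cross-validated RMSE for a model formula (response must be `y`)
cv_rmse <- function(formula, data, k = 5) {
  fold <- sample(rep(seq_len(k), length.out = nrow(data)))
  sq_err <- numeric(0)
  for (i in seq_len(k)) {
    fit  <- lm(formula, data = data[fold != i, ])
    pred <- predict(fit, newdata = data[fold == i, ])
    sq_err <- c(sq_err, (data$y[fold == i] - pred)^2)
  }
  sqrt(mean(sq_err))
}

f_simple   <- y ~ x
f_flexible <- y ~ poly(x, 10)

# In-sample R^2 never decreases when terms are added
r2 <- c(simple   = summary(lm(f_simple, dat))$r.squared,
        flexible = summary(lm(f_flexible, dat))$r.squared)

# Out-of-sample error is what matters for prediction
rmse <- c(simple   = cv_rmse(f_simple, dat),
          flexible = cv_rmse(f_flexible, dat))
r2
rmse
```

The flexible model always wins on R² because it nests the simple one; whether it also wins on cross-validated RMSE is the question that matters for prediction.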

A.6.3 Mistake 3: Confusing Description with Inference

Wrong: Calculating a confidence interval when you have the entire population.

If you surveyed all 50 employees in a company, the mean salary is the population mean — there’s nothing to infer.

Right: Confidence intervals and p-values are for inference from samples to populations. If you have the whole population, just report the descriptive statistics.

A.6.4 Mistake 4: Choosing α Without Considering Consequences

Wrong: Always using α = 0.05 because “that’s what everyone does.”

The 5% significance level is arbitrary. In some contexts, 1% is more appropriate; in others, 10% makes sense.

Right: Consider the costs of Type I and Type II errors. Set α to balance these costs appropriately.

A.6.5 Mistake 5: Testing Many Hypotheses Without Correction

Wrong: Testing 20 different relationships and reporting only the significant ones.

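With 20 tests at α = 0.05, you expect about 1 false positive (20 × 0.05) by chance alone even if every null hypothesis is true, and for independent tests the probability of at least one false positive is about 64% (1 − 0.95²⁰).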

Right: Use multiple testing corrections (Bonferroni, FDR) or pre-register your primary hypothesis.
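
The arithmetic and the two corrections named above can be reproduced with base R. The sketch simulates 20 two-sample t-tests in which no real effect exists; sample sizes and the seed are arbitrary.

```r
m <- 20; alpha <- 0.05

m * alpha            # expected false positives if every null is true: 1
1 - (1 - alpha)^m    # chance of at least one, for independent tests: about 0.64

# Simulate 20 tests where no real effect exists
set.seed(8)
p_raw <- replicate(m, t.test(rnorm(30), rnorm(30))$p.value)

p_bonf <- p.adjust(p_raw, method = "bonferroni")  # controls family-wise error rate
p_bh   <- p.adjust(p_raw, method = "BH")          # controls false discovery rate

# Number of "significant" results before and after correction
c(raw        = sum(p_raw  < alpha),
  bonferroni = sum(p_bonf < alpha),
  BH         = sum(p_bh   < alpha))
```

Bonferroni is the more conservative of the two; the Benjamini-Hochberg (`"BH"`) adjustment trades some strictness for power when many hypotheses are tested.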

A.7 Quick Reference Tables

A.7.1 By Research Question

Table A.21: Methods by research question

Question	Method	Requirements	App
Is the mean different from a specific value?	One-sample t-test	Numeric, approximately normal	🔗
Are two means different?	Two-sample t-test or Welch’s	Numeric, independent groups	🔗
Are paired measurements different?	Paired t-test	Numeric, matched pairs	🔗
Are two groups equivalent?	TOST	Numeric, specify equivalence margin	🔗
Are three or more means different?	ANOVA	Numeric, independent groups	🔗
Are two categorical variables associated?	Chi-squared test	Categorical, sufficient counts	🔗
Do the data follow a specified distribution?	K-S test	Continuous data, distribution specified in advance	🔗
Is there a linear relationship?	Correlation test	Two numeric variables	🔗
Does X predict Y?	Regression	Depends on Y type	🔗

A.7.2 By Variable Type

Table A.22: Methods by variable types

Outcome	Predictor(s)	Method	App
Numeric	None (one group)	One-sample t-test	🔗
Numeric	Categorical (2 groups)	Two-sample t-test	🔗
Numeric	Categorical (3+ groups)	ANOVA	🔗
Numeric	Numeric (1 predictor)	Simple linear regression	🔗
Numeric	Mixed (multiple)	Multiple linear regression	🔗
Binary	Mixed	Logistic regression (GLM, binomial)	🔗
Categorical	Categorical	Chi-squared test	🔗
Count	Mixed	Poisson regression (GLM, poisson)	🔗
Time series	Time	ARIMA, exponential smoothing	🔗
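
For readers working in R, most rows of the table map onto base R functions. The sketch below uses the built-in `mtcars` data purely to show the calls; the variable choices are arbitrary and carry no substantive meaning.

```r
cars <- mtcars   # built-in data set, used only to demonstrate the calls

t.test(cars$mpg, mu = 20)                         # numeric, one group
t.test(mpg ~ am, data = cars)                     # numeric ~ 2 groups (Welch by default)
summary(aov(mpg ~ factor(cyl), data = cars))      # numeric ~ 3+ groups
fit_slr <- lm(mpg ~ wt, data = cars)              # numeric ~ one numeric predictor
fit_mlr <- lm(mpg ~ wt + hp + factor(am), data = cars)  # numeric ~ mixed predictors
fit_bin <- glm(am ~ hp + wt, family = binomial, data = cars)   # binary outcome
chisq.test(table(cars$am, cars$vs))               # categorical vs. categorical
fit_pois <- glm(carb ~ wt + hp, family = poisson, data = cars) # count outcome

summary(fit_bin)$coefficients
```

Each fitted object can then be passed through the diagnostics described earlier in this appendix before any conclusions are drawn.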

For an interactive version of this guide, try the Method Selection Tool — select your constraints (purpose, data type, assumptions) and it will show matching methods.

Use this guide in conjunction with the introductory chapter (Chapter 4) to ensure your statistical analysis matches your research goals.