Appendix A — Method Selection Guide

This guide helps you choose the appropriate statistical method for your situation. Unlike typical “decision trees” that jump straight to data types and sample sizes, we start with the most important question: What are you trying to accomplish?

A.1 How to Use This Guide

Start with the Purpose Hub below, then follow the flowchart for your path. Click any destination box (marked with →) to jump directly to that section.

%%{init: {'theme': 'base', 'securityLevel': 'loose', 'flowchart': {'htmlLabels': true}, 'themeVariables': { 'primaryColor': '#E3F2FD', 'primaryTextColor': '#1565C0', 'primaryBorderColor': '#1565C0', 'lineColor': '#666666', 'secondaryColor': '#E8F5E9', 'tertiaryColor': '#FFF3E0', 'fontSize': '14px'}}}%%
flowchart LR
    S1["<b>STEP 1: What is your PURPOSE?</b><br/><i>Choose your primary goal</i>"]
    S1 --> Describe["🔵 Describe / Explore"]
    S1 --> Test["🟢 Test hypothesis"]
    S1 --> Model["🟠 Build model / Predict"]

    Describe --> APath["→ <b>Path A</b><br/>Description & Exploration"]
    Test --> BPath["→ <b>Path B</b><br/>Hypothesis Testing"]
    Model --> MSplit["<b>STEP 2: Data STRUCTURE?</b><br/><i>Modeling</i>"]
    MSplit --> CPath["→ <b>Path C</b><br/>Modeling (Non-sequential)"]
    MSplit --> DPath["→ <b>Path D</b><br/>Modeling (Time Series)"]

    classDef stepBox fill:#F5F5F5,stroke:#424242,stroke-width:2px,color:#333
    classDef blueBox fill:#E3F2FD,stroke:#1565C0,stroke-width:2px,color:#1565C0
    classDef greenBox fill:#E8F5E9,stroke:#2E7D32,stroke-width:2px,color:#2E7D32
    classDef orangeBox fill:#FFF3E0,stroke:#EF6C00,stroke-width:2px,color:#EF6C00
    classDef purpleBox fill:#F3E5F5,stroke:#7B1FA2,stroke-width:2px,color:#7B1FA2
    classDef destBox fill:#fff,stroke:#333,stroke-width:2px,color:#333,font-weight:bold

    class S1 stepBox
    class Describe,APath blueBox
    class Test,BPath greenBox
    class Model,MSplit,CPath orangeBox
    class DPath purpleBox

    click APath "#sec-flowchart-path-a"
    click BPath "#sec-flowchart-path-b"
    click CPath "#sec-flowchart-path-c"
    click DPath "#sec-flowchart-path-d"
Figure A.1: Purpose Hub. Click a destination to jump to the relevant path.

A.1.1 Quick-Reference Table

If you already know your research question, use this table to jump directly to the right method.

| Research Question | Method | Chapter | App |
|---|---|---|---|
| What does my data look like? | Descriptive statistics (mean, SD, quantiles) | 📖 | 🔗 |
| Are there measurement or data quality issues? | Data quality forensics | 📖 | 🔗 |
| Predict a single value (no covariates) | Mean / median / trimmed / geometric / harmonic / midrange | 📖 | 🔗 |
| Test/CI for central tendency (no model) | Bootstrap / Blocked bootstrap | 📖 | 🔗 |
| Is my data normally distributed? | QQ plot, Shapiro-Wilk, K-S test (with Lilliefors correction) | 📖 | 🔗 |
| Are there outliers? | Box plot, Z-scores | 📖 | 🔗 |
| Is the mean different from a value? | One-sample t-test | 📖 | 🔗 |
| Are two group means different? | Two-sample t-test / Welch’s | 📖 | 🔗 |
| Are paired measurements different? | Paired t-test | 📖 | 🔗 |
| Are two groups equivalent? | TOST equivalence test | 📖 | 🔗 |
| Are 3+ group means different? | ANOVA | 📖 | 🔗 |
| Are two categorical variables associated? | Chi-squared test | 📖 | 🔗 |
| Is there a linear relationship? | Correlation test (Pearson / Spearman) | 📖 | 🔗 |
| Does X predict Y (continuous)? | Linear regression | 📖 | 🔗 |
| Does X predict Y (binary)? | Logistic regression | 📖 | 🔗 |
| Classify cases into categories? | Logistic regression / ctree / Naive Bayes | 📖 | 🔗 |
| Forecast a time series? | ARIMA / Holt-Winters | 📖 | 🔗 |
| Does an external event affect a time series? | Intervention analysis | 📖 | 🔗 |
| Does X dynamically influence Y over time? | Transfer function / CCF | 📖 | 🔗 |
| Does data fit a specific distribution? | K-S test / Chi-squared GoF | 📖 | 🔗 |

A.1.2 Path A: Description & Exploration

%%{init: {'theme': 'base', 'securityLevel': 'loose', 'flowchart': {'htmlLabels': true}, 'themeVariables': { 'primaryColor': '#E3F2FD', 'primaryTextColor': '#1565C0', 'primaryBorderColor': '#1565C0', 'lineColor': '#666666', 'fontSize': '14px'}}}%%
flowchart LR
    AStart["<b>Describe or Explore?</b>"]
    AStart --> ADescribe["🔵 Describe"]
    AStart --> AExplore["🔵 Explore"]

    ADescribe --> A1["→ <b>A1</b><br/>Pure Description"]
    AExplore --> A2["→ <b>A2</b><br/>Exploratory Analysis"]

    classDef blueBox fill:#E3F2FD,stroke:#1565C0,stroke-width:2px,color:#1565C0
    classDef destBox fill:#fff,stroke:#333,stroke-width:2px,color:#333,font-weight:bold

    class AStart,ADescribe,AExplore blueBox
    class A1,A2 destBox

    click A1 "#a1-pure-description"
    click A2 "#a2-exploratory-data-analysis-eda"

Note: Data Quality Forensics App

For a more comprehensive diagnostic workflow (beyond the examples in Path A), use the app: https://shiny.wessa.net/dataqualityforensics/.

A.1.3 Path B: Hypothesis Testing

%%{init: {'theme': 'base', 'securityLevel': 'loose', 'flowchart': {'htmlLabels': true}, 'themeVariables': { 'primaryColor': '#E8F5E9', 'primaryTextColor': '#2E7D32', 'primaryBorderColor': '#2E7D32', 'lineColor': '#666666', 'fontSize': '14px'}}}%%
flowchart LR
    BStart["<b>STEP 2: Data TYPE?</b><br/><i>Hypothesis testing</i>"]
    BStart --> BNum["🟢 Numeric"]
    BStart --> BCat["🟢 Categorical"]
    BStart --> BDist["🟢 Distribution fit"]

    BCat --> B5["→ <b>B5</b><br/>Association Tests"]
    BDist --> B6["→ <b>B6</b><br/>Goodness-of-Fit"]

    BNum --> BGroups["<b>STEP 3: How MANY groups?</b>"]
    BGroups --> B1["🟢 1 sample"]
    BGroups --> B2g["🟢 2 groups"]
    BGroups --> B3g["🟢 3+ groups"]

    B1 --> B2["→ <b>B2</b><br/>One-Sample Tests"]
    B2g --> B3["→ <b>B3</b><br/>Two-Sample Tests"]
    B3g --> B4["→ <b>B4</b><br/>Multi-Group Tests"]

    classDef greenBox fill:#E8F5E9,stroke:#2E7D32,stroke-width:2px,color:#2E7D32
    classDef destBox fill:#fff,stroke:#333,stroke-width:2px,color:#333,font-weight:bold

    class BStart,BNum,BCat,BDist,BGroups,B1,B2g,B3g greenBox
    class B2,B3,B4,B5,B6 destBox

    click B2 "#b2-one-sample-tests"
    click B3 "#b3-two-sample-tests"
    click B4 "#b4-more-than-two-groups"
    click B5 "#b5-association-and-correlation-tests"
    click B6 "#b6-goodness-of-fit-tests"

Note: Central Tendency Without a Model

If your goal is to test or quantify a single central tendency (no predictors), use the Bootstrap Plot for CIs/tests of mean, median, or trimmed mean. For dependent or time‑ordered data, use the Blocked Bootstrap Plot. Apps: https://shiny.wessa.net/bootstrap/.
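The idea behind a bootstrap interval for a central tendency can be sketched in a few lines of base R. This is a minimal percentile-interval sketch under stated assumptions, not the implementation used by the app: the data values, the number of resamples `B`, and the helper `boot_stat()` are all hypothetical.

```r
set.seed(123)  # reproducible resampling

# Hypothetical sample (illustrative values only)
x <- c(495, 502, 498, 510, 485, 492, 505, 488, 515, 499,
       478, 503, 497, 506, 494, 501, 489, 508, 496, 502)

B <- 2000  # number of bootstrap resamples (assumed)

# Resample with replacement and recompute the statistic each time
boot_stat <- function(x, stat, B) {
  replicate(B, stat(sample(x, replace = TRUE)))
}

boot_mean   <- boot_stat(x, mean, B)
boot_median <- boot_stat(x, median, B)

# Percentile 95% confidence intervals
ci_mean   <- quantile(boot_mean,   c(0.025, 0.975))
ci_median <- quantile(boot_median, c(0.025, 0.975))

ci_mean
ci_median
```

Resampling individual observations assumes independence. For time-ordered data, a blocked bootstrap resamples contiguous blocks instead so that the dependence within each block is preserved.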

A.1.4 Path C: Modeling & Prediction

%%{init: {'theme': 'base', 'securityLevel': 'loose', 'flowchart': {'htmlLabels': true}, 'themeVariables': { 'primaryColor': '#FFF3E0', 'primaryTextColor': '#EF6C00', 'primaryBorderColor': '#EF6C00', 'lineColor': '#666666', 'fontSize': '14px'}}}%%
flowchart LR
    CStart["<b>STEP 2: Outcome TYPE?</b><br/><i>Modeling</i>"]
    CStart --> CCont["🟠 Continuous"]
    CStart --> CBin["🟠 Binary"]
    CStart --> CCount["🟠 Count"]
    CStart --> CAny["🟠 Any type"]

    CCont --> C2Reg["→ <b>C2</b><br/>Linear Regression"]
    CBin --> C2Log["→ <b>C2</b><br/>Logistic Regression"]
    CCount --> C2Pois["→ <b>C2</b><br/>Poisson/GLM"]
    CAny --> C2Tree["→ <b>C2</b><br/>ctree - flexible"]

    C2Reg --> C6["→ <b>C6</b><br/>Model Diagnostics"]
    C2Log --> C6
    C2Pois --> C6
    C2Tree --> C6

    classDef orangeBox fill:#FFF3E0,stroke:#EF6C00,stroke-width:2px,color:#EF6C00
    classDef destBox fill:#fff,stroke:#333,stroke-width:2px,color:#333,font-weight:bold

    class CStart,CCont,CBin,CCount,CAny orangeBox
    class C2Reg,C2Log,C2Pois,C2Tree,C6 destBox

    click C2Reg "#c2-choosing-a-regression-model"
    click C2Log "#c2-choosing-a-regression-model"
    click C2Pois "#c2-choosing-a-regression-model"
    click C2Tree "#c2-choosing-a-regression-model"
    click C6 "#sec-c6-model-diagnostics"

Note: Baseline Prediction (No Covariates)

When you only need a single‑number forecast, choose a central‑tendency estimator that matches your loss function or distribution:

  • Mean: symmetric/normal‑like data; minimizes squared error.
  • Median: robust to outliers; minimizes absolute error (L1 loss).
  • Trimmed mean: robustness-efficiency compromise; not an L1 minimizer.
  • Geometric mean: multiplicative or log‑normal processes.
  • Harmonic mean: rates or ratios (e.g., speed, productivity).
  • Midrange: uniform processes (center of min/max).

Use Bootstrap Plot for uncertainty; use Blocked Bootstrap for dependent data.
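The estimators listed in this note can all be computed in base R. The following is a small sketch on a hypothetical vector `x`; the 20% trimming fraction is an arbitrary choice for illustration.

```r
x <- c(2, 4, 4, 8, 16)  # hypothetical positive data

estimates <- c(
  mean      = mean(x),
  median    = median(x),
  trimmed   = mean(x, trim = 0.2),      # drops 20% of observations from each tail
  geometric = exp(mean(log(x))),        # requires x > 0
  harmonic  = 1 / mean(1 / x),          # requires x > 0
  midrange  = (min(x) + max(x)) / 2     # center of the observed range
)

round(estimates, 3)
```

On right-skewed data like this, the estimators disagree noticeably, which is why the choice should follow from the loss function or the data-generating process rather than from habit.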

A.1.5 Path D: Time Series

%%{init: {'theme': 'base', 'securityLevel': 'loose', 'flowchart': {'htmlLabels': true}, 'themeVariables': { 'primaryColor': '#F3E5F5', 'primaryTextColor': '#7B1FA2', 'primaryBorderColor': '#7B1FA2', 'lineColor': '#666666', 'fontSize': '14px'}}}%%
flowchart LR
    DStart["<b>Sequential / Time-Ordered Data?</b>"]
    DStart --> DEDA["🔵 Time Series EDA"]
    DStart --> DModel["🟣 Time Series Models"]
    DEDA --> A2TS["→ <b>A2</b><br/>Time Series EDA"]
    DModel --> C7["→ <b>C7</b><br/>Time Series Models"]

    classDef purpleBox fill:#F3E5F5,stroke:#7B1FA2,stroke-width:2px,color:#7B1FA2
    classDef destBox fill:#fff,stroke:#333,stroke-width:2px,color:#333,font-weight:bold

    class DStart,DEDA,DModel purpleBox
    class A2TS,C7 destBox

    click A2TS "#a2-exploratory-data-analysis-eda"
    click C7 "#c7-time-series-models"


A.2 Define Your Purpose

Before choosing any method, answer this question clearly:

What do I want to achieve with this analysis?

Table A.1: Primary purposes of statistical analysis

| Purpose | Description | Example | Go to |
|---|---|---|---|
| Describe | Summarize the data at hand | “What is the average income in our sample?” | Part A |
| Explore | Discover patterns, problems, or questions | “Are there unusual subgroups in the data?” | Part A |
| Test | Evaluate a specific claim or hypothesis | “Is the new drug better than placebo?” | Part B |
| Estimate | Quantify an unknown parameter | “What is the population mean, with uncertainty?” | Part B |
| Predict | Forecast new or future observations | “What will sales be next quarter?” | Part C |
| Classify | Assign cases to categories | “Is this email spam or not?” | Part C |

Your purpose determines not just which method to use, but how to interpret the results.


A.3 Part A: Descriptive and Exploratory Analysis

A.3.1 A1: Pure Description

Goal: Summarize the data without generalizing beyond it.

When to use: You have a complete population (not a sample), or you simply want to characterize the data you have.

Example: A company wants to describe the salaries of its 50 employees.

Table A.2: Descriptive methods by data type

| Data Type | Methods | Chapter | App |
|---|---|---|---|
| One numeric variable | Mean / median / SD / quantiles | 📖 | 🔗 |
| One numeric variable | Stem-and-leaf plot | 📖 | 🔗 |
| One categorical variable | Frequency table / proportions / bar plot | 📖 | 🔗 |
| Two numeric variables | Correlation / scatter plot | 📖 | 🔗 |
| Two categorical variables | Contingency table / mosaic plot | 📖 | 🔗 |
| Distribution shape | Histogram / box plot | 📖 | 🔗 |
| Distribution shape | Skewness / kurtosis / moments | 📖 | 🔗 |
| Concentration / inequality | Concentration curve / Gini | 📖 | 🔗 |

Key distinction: Descriptive statistics tell you about this specific dataset. They do not tell you about a larger population or what would happen with different data.
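As an illustration of pure description, the sketch below summarizes a hypothetical salary vector and computes a Gini coefficient. The salary values and the helper `gini()` are assumptions for illustration; the helper uses one common rank-based formula without a small-sample correction, which may differ from the definition used in the Concentration chapter.

```r
# Hypothetical salaries (in thousands); illustrative values only
salary <- c(32, 35, 38, 40, 41, 45, 47, 52, 60, 110)

summary(salary)                  # min, quartiles, median, mean, max
sd(salary)                       # sample standard deviation
quantile(salary, c(0.1, 0.9))    # selected quantiles

# Gini coefficient from sorted values (0 = perfect equality)
gini <- function(x) {
  x <- sort(x)
  n <- length(x)
  sum((2 * seq_len(n) - n - 1) * x) / (n * sum(x))
}

gini(salary)
```

These numbers describe the listed salaries only; nothing here supports a statement about other employees or other companies.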

A.3.2 A2: Exploratory Data Analysis (EDA)

Goal: Discover unexpected patterns, anomalies, or new questions.

When to use: Before formal analysis, when you don’t know what to expect, or when checking data quality.

Example: A researcher receives a new dataset and wants to understand what’s in it before testing any hypotheses.

EDA Checklist:

  1. Check data quality
    • Missing values: How many? Random or systematic?
    • Outliers: Are extreme values errors or genuine?
    • Data types: Are variables coded correctly?
    • Digit preference: Do terminal digits follow a uniform distribution? (Chapter 63)
    • Value repetition: Do certain round numbers appear far more often than expected? (Section 63.4.1)
    • Cross-variable consistency: Do relationships between variables match domain knowledge? (Section 63.8)
  2. Examine distributions
    • Histogram for each numeric variable
    • Bar chart for each categorical variable
    • Look for: skewness, bimodality, gaps, unusual values
  3. Look for relationships
    • Scatter plots for pairs of numeric variables
    • Box plots of numeric variable by group
    • Correlation matrix for multiple variables
  4. Check for subgroups
    • Are there natural clusters?
    • Do patterns differ across subgroups?

Table A.3: EDA methods for cross-sectional data

| EDA Goal | Method(s) | Chapter | App |
|---|---|---|---|
| Find outliers | Box plot | 📖 | 🔗 |
| Find outliers | Scatter plot | 📖 | 🔗 |
| Find outliers | Z-scores | 📖 | 🔗 |
| Check normality | QQ plot | 📖 | 🔗 |
| Check normality | Normal probability plot | 📖 | 🔗 |
| Check normality | PPCC plot | 📖 | 🔗 |
| Check normality | Histogram | 📖 | 🔗 |
| Check normality | Box-Cox normality plot | 📖 | 🔗 |
| Check normality | Skewness & kurtosis tests | 📖 | 🔗 |
| Check normality | Skewness-kurtosis plot | 📖 | 🔗 |
| Check normality | Kernel density | 📖 | 🔗 |
| Distribution fit | ML fitting (distribution) | 📖 | 🔗 |
| Distribution fit | Cullen-Frey graph | 📖 | 🔗 |
| Find clusters | Scatter plot | 📖 | 🔗 |
| Find clusters | Kernel density | 📖 | 🔗 |
| Examine joint density | Bivariate KDE | 📖 | 🔗 |
| Detect relationships | Correlation matrix | 📖 | 🔗 |
| Detect relationships | Scatterplot matrix | 📖 | 🔗 |
| Compare rankings | Rank order comparison | 📖 | 🔗 |
| Check reliability | Cronbach alpha | 📖 | 🔗 |
| Check data quality | Terminal digit analysis / Benford’s law | 📖 | 🔗 |
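The first checklist item (data quality) can be sketched with base R. The vector below is hypothetical, with one missing value and one suspicious reading inserted on purpose; the box plot rule used is the default 1.5 × IQR criterion of `boxplot.stats()`.

```r
# Hypothetical measurements with one missing value and one suspicious reading
x <- c(495, 502, 498, 510, 485, 492, 505, 488, 515, 499, NA, 650)

n_missing <- sum(is.na(x))             # how many values are missing?
x_obs     <- x[!is.na(x)]

# Values beyond the box plot whiskers (1.5 x IQR rule)
box_flags <- boxplot.stats(x_obs)$out

# Terminal (last) digit frequencies; strong deviations from uniformity
# can point to rounding or digit preference
digits      <- x_obs %% 10
digit_table <- table(factor(digits, levels = 0:9))

n_missing
box_flags
digit_table
```

A flagged value is a prompt to investigate, not an instruction to delete: whether 650 is an entry error or a genuine observation is a domain question.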

A.3.2.1 Time Series EDA (Sequential Data)

If your data are ordered in time, use these additional EDA tools:

Table A.4: EDA methods for time series data

| Goal | Method | What to Look For | Chapter | App |
|---|---|---|---|---|
| Visualize patterns | Time series plot | Trend, seasonality, outliers, level shifts | 📖 | 🔗 |
| Separate components | Decomposition | Separate trend, seasonal, residual components | 📖 | 🔗 |
| Check seasonal structure | Mean Plot | Seasonal stability across subseries | 📖 | 🔗 |
| Mean-SD structure | SMP | Variability vs. level relationship | 📖 | 🔗 |
| Choose differencing | VRM | Select d and D for stationarity | 📖 | 🔗 |
| Choose differencing | ACF / PACF | Significant spikes indicate dependence structure | 📖 | 🔗 |
| Choose differencing | Spectral analysis / periodogram | Hidden periodicities | 📖 | 🔗 |
| Resampling with dependence | Blocked bootstrap | Preserve autocorrelation structure | 📖 | 🔗 |
| Check autocorrelation | ACF / PACF | Significant spikes indicate dependence structure | 📖 | 🔗 |
| Detect cycles | Spectral analysis / periodogram | Hidden periodicities | 📖 | 🔗 |
| Check stationarity | ADF / stationarity | Constant mean/variance over time? | 📖 | 🔗 |
| Identify lead/lag relationships | Cross-Correlation Function (CCF) | Lead/lag between two series | 📖 | |
| Test temporal causation | Granger causality test | Does X help predict Y? | 📖 | |

Key insight: Time series EDA is essential before building time series models (C7). The patterns you discover here determine which model to use.
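As one example of how time series EDA feeds model choice, the sketch below builds a simple variance reduction matrix: the variance of the series after d regular and D seasonal differences. It is a rough sketch under stated assumptions (the built-in `AirPassengers` data, a log transform, seasonal period 12, and d and D limited to small values), not the implementation used in the VRM chapter or app.

```r
y <- log(AirPassengers)   # built-in monthly series; log stabilizes the variance

# Variance after d regular and D seasonal (lag 12) differences
vrm <- sapply(0:1, function(D) {
  sapply(0:2, function(d) {
    z <- y
    if (d > 0) z <- diff(z, differences = d)
    if (D > 0) z <- diff(z, lag = 12, differences = D)
    var(as.numeric(z))
  })
})
dimnames(vrm) <- list(paste0("d=", 0:2), paste0("D=", 0:1))

round(vrm, 4)
```

A combination of d and D with a small variance is a candidate for the differencing orders of an ARIMA model; it should be confirmed with the ACF/PACF and a time series plot of the differenced series.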

Key distinction from description: EDA is about discovery — you’re looking for things you didn’t know to look for. Description summarizes what you already expect to report.

A.3.3 A3: Residual Analysis

Goal: Check whether a fitted model is appropriate.

When to use: After fitting any model (regression, time series, etc.).

Example: After fitting a linear regression, you want to check if the assumptions hold.

What to check:

Table A.5: Residual diagnostics

| Assumption | Diagnostic | What to Look For | Chapter |
|---|---|---|---|
| Linearity | Residuals vs fitted plot | No curved pattern | 📖 |
| Normality | QQ plot of residuals | Points on the line | 📖 |
| Constant variance | Residuals vs fitted plot | No funnel shape | 📖 |
| Independence | Residual time plot, ACF | No pattern over time | 📖 |
| No outliers | Standardized residuals | Values within ±3 | 📖 |

Key distinction from EDA: Residual analysis checks model assumptions. EDA explores the raw data before modeling.
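The checks in Table A.5 map onto a handful of base R calls. The sketch below uses the built-in `cars` data purely as a stand-in for a fitted regression; it is a quick illustration, not a complete diagnostic workflow.

```r
fit <- lm(dist ~ speed, data = cars)   # built-in example data

res  <- residuals(fit)
rstd <- rstandard(fit)                 # standardized residuals

# Linearity and constant variance: look for curvature or a funnel shape
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normality: points should follow the reference line
qqnorm(res)
qqline(res)

# Independence: only meaningful when the rows have a natural order
acf(res)

# Candidate outliers: standardized residuals outside +/- 3
which(abs(rstd) > 3)
```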

See Section A.5.6 for a worked example of regression residual diagnostics.


A.4 Part B: Hypothesis Testing

A.4.1 B1: Before Choosing a Test — Critical Questions

Question 1: What is the decision structure of your research?

This question is crucial and often overlooked. Set the significance level before data collection based on error costs and decision context, not on a preferred outcome.

Table A.6: Approach by research goal

| Situation | Decision Structure | Approach | Reasoning |
|---|---|---|---|
| Testing a new drug | Superiority (difference from placebo) | Standard test with pre-specified α (often 0.05) | Protect against false claims of efficacy |
| Showing products are equivalent | Equivalence with pre-defined margin | TOST equivalence test (Chapter 120) | In TOST, H₀ states “not equivalent” |
| Safety monitoring | Signal detection (prioritize sensitivity) | Pre-specified higher α (e.g., 0.10) | Be sensitive to potential safety issues |
| Confirming no meaningful difference | Equivalence with pre-defined margin | TOST equivalence test (Chapter 120) | Standard tests cannot prove equivalence |

Important note on equivalence testing: In the Two One-Sided Tests (TOST) procedure (Chapter 120), the null hypothesis states that the products are not equivalent, that is, that they differ by at least the pre-defined margin. Rejecting this null hypothesis provides evidence of equivalence. This reverses the roles of the hypotheses in a standard test, where H₀ states that there is no difference.

Example — Standard superiority test:

A pharmaceutical company tests whether their new drug is better than placebo. Before looking at data, they pre-specify α = 0.05 (or α = 0.01 in high-risk contexts) to control false efficacy claims.

Example — Equivalence test (different null hypothesis):

A manufacturer wants to show that a generic drug is equivalent to the brand-name drug. In TOST, H₀ states the drugs are not equivalent. By rejecting H₀, they demonstrate equivalence. A standard t-test cannot prove equivalence — a non-significant result only means “we couldn’t detect a difference.”
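The logic of TOST can be sketched with two one-sided Welch t-tests in base R. Everything specific here is an assumption for illustration: the measurements, the equivalence margin of 1 unit, and the variable names. Chapter 120 may use a different implementation.

```r
# Hypothetical measurements for the two products
generic <- c(100.2, 99.8, 101.1, 100.5, 99.6, 100.9, 100.1, 99.9, 100.4, 100.3)
brand   <- c(100.0, 100.6, 99.7, 100.8, 100.2, 99.5, 100.7, 100.1, 100.3, 99.9)

margin <- 1   # equivalence margin (assumed); must be fixed before seeing the data

# H0a: difference <= -margin   vs   H1a: difference > -margin
p_lower <- t.test(generic, brand, mu = -margin, alternative = "greater")$p.value

# H0b: difference >= +margin   vs   H1b: difference < +margin
p_upper <- t.test(generic, brand, mu = margin, alternative = "less")$p.value

# Equivalence is concluded only if BOTH one-sided tests reject
p_tost <- max(p_lower, p_upper)

# Standard two-sided test of "no difference", for comparison
p_diff <- t.test(generic, brand)$p.value

c(p_tost = p_tost, p_diff = p_diff)
```

With these values the TOST p-value is small (evidence of equivalence within the margin), while the standard test is non-significant. The non-significant standard test on its own would not have justified a claim of equivalence.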

Question 2: What is the consequence of each type of error?

Table A.7: Types of errors

| Error Type | Description | Example Consequence |
|---|---|---|
| Type I (false positive) | Reject H₀ when it’s true | Approve an ineffective drug |
| Type II (false negative) | Fail to reject H₀ when it’s false | Miss an effective treatment |

Balance these risks based on your context:

  • Medical screening: Prefer false positives (Type I) over missed diseases (Type II)
  • Criminal trials: Prefer false negatives (acquit guilty) over false positives (convict innocent)
  • Quality control: Balance depends on cost of defective products vs. cost of discarding good ones
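The trade-off between the two error types can be made concrete with `power.t.test()`. The planning values below (20 observations per group, a true difference of 1, a standard deviation of 1.5) are hypothetical.

```r
# Power of a two-sample t-test under assumed planning values
power_05 <- power.t.test(n = 20, delta = 1, sd = 1.5, sig.level = 0.05)$power
power_10 <- power.t.test(n = 20, delta = 1, sd = 1.5, sig.level = 0.10)$power

# Type II error rate (beta) is 1 - power
rbind(
  power = c(alpha_0.05 = power_05,     alpha_0.10 = power_10),
  beta  = c(alpha_0.05 = 1 - power_05, alpha_0.10 = 1 - power_10)
)
```

For a fixed sample size and effect, accepting a higher Type I error rate lowers the Type II error rate, and vice versa. Which side to favor depends on the consequences listed above.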

A.4.2 B2: One-Sample Tests

Question: Does my sample come from a population with a specific parameter value?

Variable is NUMERIC — Testing LOCATION / DISTRIBUTION:

  • Data approximately normal OR n ≥ 30 (rough guideline; for highly skewed or heavy-tailed data, prefer larger samples or bootstrap/Wilcoxon)?
    • Yes → One-sample t-test (Chapter 114)
      • Shiny App: Hypotheses / Empirical Tests
    • No, small sample, non-normal → Wilcoxon signed-rank test (Chapter 117)
      • Shiny App: Hypotheses / Empirical Tests
      • Interpretation note: nonparametric conclusions are about location/distribution shifts (median under symmetry), not means.
  • Testing equivalence to a value (not just difference)?
    • Use a one-sample equivalence test

If your estimand is the mean under non-normality, consider bootstrap methods for inference about the mean (see the Bootstrap Plot).

Variable is NUMERIC — Testing the VARIANCE:

  • Chi-squared test for variance (Chapter 105)
    • Shiny App: Hypotheses / Empirical Tests
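Base R has no dedicated function for the one-sample chi-squared test of a variance, but the statistic is easy to compute by hand. The sketch below reuses the battery lifetimes from the t-test example in this section; the hypothesized variance of 100 (a standard deviation of 10 hours) is an assumption for illustration, and the test relies on normally distributed data.

```r
battery_life <- c(495, 502, 498, 510, 485, 492, 505, 488, 515, 499,
                  478, 503, 497, 506, 494, 501, 489, 508, 496, 502,
                  493, 507, 490, 504, 500)

sigma0_sq <- 100   # hypothesized variance under H0 (assumed)

n    <- length(battery_life)
stat <- (n - 1) * var(battery_life) / sigma0_sq   # chi-squared with n - 1 df under H0

# Two-sided p-value from the chi-squared distribution
p_lower     <- pchisq(stat, df = n - 1)
p_two_sided <- 2 * min(p_lower, 1 - p_lower)

c(statistic = stat, df = n - 1, p_value = p_two_sided)
```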

Variable is NUMERIC — Testing the DISTRIBUTION:

  • Against a specific distribution → K-S test (Chapter 125)
  • For normality → Shapiro-Wilk test (Section 125.10), K-S test (use Lilliefors correction when parameters are estimated), or QQ plot (Chapter 76)
    • Shiny App: Descriptive / Normal QQ Plot
  • Against expected frequencies → Chi-squared goodness-of-fit (Chapter 124)
    • Shiny App: Hypotheses / Empirical Tests
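A short sketch of the normality checks, on hypothetical data. Note the caveat in the comments: calling `ks.test()` with parameters estimated from the same data gives a p-value that is too large, which is exactly why the Lilliefors correction is recommended. To my knowledge the add-on package `nortest` offers `lillie.test()` for this, but it is not used here so that the example stays in base R.

```r
# Hypothetical measurements (no ties, so ks.test() can compute an exact p-value)
x <- c(495.2, 502.1, 498.4, 510.3, 485.7, 492.6, 505.9, 488.0,
       515.5, 499.3, 478.8, 503.4, 497.1, 506.6, 494.9)

# Shapiro-Wilk test of normality
sw <- shapiro.test(x)

# Visual check: points should lie close to the reference line
qqnorm(x)
qqline(x)

# K-S test against a normal with ESTIMATED mean and SD.
# Without the Lilliefors correction this p-value is conservative (too large).
ks <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))

c(shapiro_p = sw$p.value, ks_p_uncorrected = ks$p.value)
```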

Variable is CATEGORICAL — Testing a PROPORTION:

  • One-sample proportion test (binomial test) (Chapter 106)
    • Shiny App: Hypotheses / Empirical Tests
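A one-sample proportion test takes a single call. The counts below (18 successes in 50 trials) and the null value of 0.5 are hypothetical.

```r
# Exact binomial test of H0: p = 0.5
bt <- binom.test(x = 18, n = 50, p = 0.5)

bt$estimate    # observed proportion
bt$conf.int    # exact (Clopper-Pearson) 95% confidence interval
bt$p.value

# Large-sample alternative with continuity correction
pt <- prop.test(x = 18, n = 50, p = 0.5)
pt$p.value
```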

Example — One-sample t-test:

A manufacturer claims their batteries last 500 hours on average. You test 25 batteries and want to check this claim.

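# Observed lifetimes (hours) of the 25 tested batteries
battery_life <- c(495, 502, 498, 510, 485, 492, 505, 488, 515, 499,
                  478, 503, 497, 506, 494, 501, 489, 508, 496, 502,
                  493, 507, 490, 504, 500)

# Two-sided one-sample t-test of H0: mean lifetime = 500 hours
t.test(battery_life, mu = 500)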

    One Sample t-test

data:  battery_life
t = -1.0109, df = 24, p-value = 0.3222
alternative hypothesis: true mean is not equal to 500
95 percent confidence interval:
 494.7683 501.7917
sample estimates:
mean of x 
   498.28 

Interpretation: the p-value (0.32) is well above 0.05 and the 95% confidence interval (494.8 to 501.8 hours) contains 500, so these data provide no evidence against the manufacturer's claim.

A.4.3 B3: Two-Sample Tests

Question: Do two groups differ on some measure?

First: Are the samples PAIRED or INDEPENDENT?

Table A.8: Paired vs. Independent samples

| Paired Example | Independent Example |
|---|---|
| Same subject measured before and after treatment | Different people in treatment and control groups |
| Twins or matched pairs | Random assignment to two groups |
| Left eye vs. right eye of same person | Men vs. women |
| Same item measured by two methods | Two different products |

PAIRED samples — Numeric variable:

  • Differences approximately normal → Paired t-test (Chapter 116)
    • Shiny App: Hypotheses / Empirical Tests
  • Non-normal differences → Wilcoxon signed-rank test (Chapter 117)
    • Shiny App: Hypotheses / Empirical Tests
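For paired numeric data, both options operate on the within-pair differences. The before/after values below are hypothetical.

```r
# Hypothetical measurements on the same 8 subjects
before <- c(142, 138, 150, 145, 160, 155, 148, 152)
after  <- c(138, 135, 144, 146, 152, 150, 146, 145)

# Paired t-test: assumes the differences are approximately normal
paired <- t.test(after, before, paired = TRUE)

# Equivalent formulation: one-sample t-test on the differences
same <- t.test(after - before, mu = 0)

# Nonparametric alternative: Wilcoxon signed-rank test on the differences
wsr <- wilcox.test(after, before, paired = TRUE)

c(paired_t = paired$p.value, one_sample_on_diffs = same$p.value,
  wilcoxon = wsr$p.value)
```

The first two p-values are identical, which is a useful reminder that the normality assumption applies to the differences, not to the two original variables.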

PAIRED samples — Categorical variable:

  • McNemar’s test (Section 124.9.2)
    • Shiny App: Hypotheses / Empirical Tests

INDEPENDENT samples — Testing LOCATION / DISTRIBUTION:

  • Both groups approximately normal?
    • Equal variances → Two-sample t-test (Chapter 118)
      • Shiny App: Hypotheses / Empirical Tests
    • Unequal variances → Welch’s t-test (Chapter 119)
      • Shiny App: Hypotheses / Empirical Tests
  • Non-normal or ordinal → Mann-Whitney U test (Chapter 121)
    • Shiny App: Hypotheses / Empirical Tests
    • Interpretation note: this test targets distribution/location differences (median under additional assumptions), not mean differences.

If your estimand is the mean under non-normality, consider bootstrap or permutation methods for mean differences.
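
One way to test a mean difference without a normality assumption is a permutation test. The sketch below is illustrative only: the two skewed samples are simulated, and the number of permutations (5000) is an arbitrary choice.

```r
set.seed(123)  # reproducible permutations

# Hypothetical right-skewed data for two independent groups
group_a <- rexp(30, rate = 1 / 10)
group_b <- rexp(30, rate = 1 / 14)

observed_diff <- mean(group_a) - mean(group_b)

# Pool the data and repeatedly shuffle the group labels
pooled <- c(group_a, group_b)
n_a <- length(group_a)
n_perm <- 5000

perm_diffs <- replicate(n_perm, {
  shuffled <- sample(pooled)
  mean(shuffled[1:n_a]) - mean(shuffled[-(1:n_a)])
})

# Two-sided p-value: share of permuted differences at least as extreme as observed
# (the +1 terms count the observed arrangement itself)
p_value <- (sum(abs(perm_diffs) >= abs(observed_diff)) + 1) / (n_perm + 1)
p_value
```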

INDEPENDENT samples — Testing EQUIVALENCE:

  • TOST procedure (Chapter 120)
    • Shiny App: Hypotheses / Equivalence Testing
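
As a rough sketch, TOST can be assembled from two one-sided t.test() calls in base R. The data are simulated and the equivalence margin of 3 units is an assumption chosen purely for illustration; in practice the margin must come from subject-matter knowledge.

```r
set.seed(1)

# Hypothetical measurements from two products
product_a <- rnorm(40, mean = 100, sd = 5)
product_b <- rnorm(40, mean = 100.5, sd = 5)

margin <- 3  # assumed equivalence margin: differences within +/- 3 count as negligible

# Test 1: H0: difference <= -margin  vs.  H1: difference > -margin
lower <- t.test(product_a, product_b, mu = -margin, alternative = "greater")

# Test 2: H0: difference >= +margin  vs.  H1: difference < +margin
upper <- t.test(product_a, product_b, mu = margin, alternative = "less")

# Equivalence is claimed only if BOTH one-sided tests reject,
# so the overall TOST p-value is the larger of the two
tost_p <- max(lower$p.value, upper$p.value)
tost_p
```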

INDEPENDENT samples — Testing VARIANCES:

  • F-test or Levene’s test (Chapter 110)
    • Shiny App: Hypotheses / Empirical Tests

INDEPENDENT samples — Categorical variable:

  • 2×2 table → Chi-squared (Chapter 124) or Fisher’s exact test (Section 124.7)
  • Larger table → Chi-squared test (Chapter 124)
    • Shiny App: Hypotheses / Empirical Tests
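
A short sketch of the 2×2 case with invented counts. Inspecting the expected counts is a common way to decide between the chi-squared approximation and Fisher's exact test; the "expected count below 5" cutoff is a rule of thumb, not a strict requirement.

```r
# Hypothetical 2x2 table: group (rows) by outcome (columns)
counts <- matrix(c(18, 12,
                    7, 23),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(group = c("treatment", "control"),
                                 outcome = c("improved", "not improved")))

# Chi-squared test (applies Yates' continuity correction for 2x2 tables by default)
chi_res <- chisq.test(counts)
chi_res$expected   # expected counts under independence

# Fisher's exact test, often preferred when expected counts are small
fisher_res <- fisher.test(counts)

chi_res$p.value
fisher_res$p.value
```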

A.4.4 B4: More Than Two Groups

INDEPENDENT groups — Numeric variable:

  • Normality and equal variances → One-way ANOVA (Chapter 126)
    • Shiny App: Hypotheses / Empirical Tests
  • Normality holds, variances unequal → Welch’s one-way ANOVA (oneway.test(..., var.equal = FALSE))
    • Shiny App: Hypotheses / Empirical Tests
  • Two factors (grouping variables) → Two-way ANOVA (Chapter 128)
    • Shiny App: Hypotheses / Empirical Tests
  • Non-normal or ordinal → Kruskal-Wallis test (Chapter 127)
    • Shiny App: Hypotheses / Multivariate (pair-wise) Testing
    • Interpretation note: nonparametric conclusions are about rank/location/distribution differences, not mean differences.
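
The three one-factor options above can be compared side by side in base R. The data are simulated (three groups of 20, with the third group given a larger spread on purpose), so the numbers themselves carry no meaning.

```r
set.seed(7)

# Hypothetical outcome for three independent groups of 20
scores <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 20)),
  value = c(rnorm(20, mean = 50, sd = 5),
            rnorm(20, mean = 53, sd = 5),
            rnorm(20, mean = 56, sd = 10))
)

# Classical one-way ANOVA (assumes normality and equal variances)
anova_fit <- aov(value ~ group, data = scores)
summary(anova_fit)

# Welch's one-way ANOVA (drops the equal-variance assumption)
welch_res <- oneway.test(value ~ group, data = scores, var.equal = FALSE)

# Kruskal-Wallis rank-based test (non-normal or ordinal data)
kw_res <- kruskal.test(value ~ group, data = scores)

c(welch = welch_res$p.value, kruskal = kw_res$p.value)
```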

RELATED groups (repeated measures) — Numeric variable:

  • Sphericity holds → Repeated measures ANOVA (Chapter 129)
  • Sphericity violated, approximately normal → Repeated measures ANOVA with Greenhouse-Geisser or Huynh-Feldt correction (Chapter 129)
  • Non-normal or ordinal data → Friedman test (Chapter 130); as a rank-based test it does not rely on the sphericity assumption
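
For the rank-based branch, a minimal sketch with base R's friedman.test(). The ratings matrix is invented; rows are subjects (blocks) and columns are the repeated conditions.

```r
# Hypothetical ratings: 8 subjects (rows) each rate 3 conditions (columns)
ratings <- matrix(c(7, 5, 6,
                    8, 6, 7,
                    6, 4, 6,
                    9, 7, 8,
                    7, 6, 5,
                    8, 5, 7,
                    6, 5, 4,
                    9, 6, 8),
                  nrow = 8, byrow = TRUE,
                  dimnames = list(paste0("s", 1:8), c("A", "B", "C")))

# Friedman test: ranks the conditions within each subject
fr <- friedman.test(ratings)
fr$statistic
fr$p.value
```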

Categorical variable:

  • Chi-squared test for independence (Chapter 124)
    • Shiny App: Hypotheses / Empirical Tests

A.4.5 B5: Association and Correlation Tests

Both variables NUMERIC:

  • Linear relationship, bivariate normal → Pearson correlation test (Chapter 131)
    • Shiny App: Descriptive / Multivariate Descriptive Statistics
  • Partial Pearson correlation → Partial correlation test (Chapter 73)
    • Shiny App: Descriptive / Multivariate Descriptive Statistics
  • Non-linear or ordinal → Spearman or Kendall correlation (Section 72.4)
    • Shiny App: Descriptive / Multivariate Descriptive Statistics
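
To see how the three coefficients can differ, the sketch below simulates a monotone but clearly non-linear relationship (an assumption made for illustration) and runs cor.test() with each method.

```r
set.seed(11)

# Hypothetical data: y grows exponentially in x, plus noise
x <- runif(40, 0, 10)
y <- exp(0.4 * x) + rnorm(40, sd = 2)

pearson  <- cor.test(x, y, method = "pearson")   # linear association
spearman <- cor.test(x, y, method = "spearman")  # monotone association (ranks)
kendall  <- cor.test(x, y, method = "kendall")   # concordance of pairs

estimates <- c(pearson  = unname(pearson$estimate),
               spearman = unname(spearman$estimate),
               kendall  = unname(kendall$estimate))
round(estimates, 3)
```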

Both variables CATEGORICAL:

  • Chi-squared test of independence (Chapter 124)
    • Shiny App: Hypotheses / Empirical Tests

One NUMERIC, one CATEGORICAL:

  • Compare means across categories → t-test (2 groups) or ANOVA (3+ groups)
    • Shiny App: Hypotheses / Empirical Tests

A.4.6 B6: Goodness-of-Fit Tests

Table A.9: Goodness-of-fit tests
Situation | Test | Chapter | App
Against a fully specified distribution | Kolmogorov-Smirnov test | 📖 | 🔗
Testing normality (parameters estimated) | Shapiro-Wilk (Section 125.10) or Lilliefors test (Section 125.9) | 📖 | 🔗
Comparing observed to expected frequencies | Chi-squared goodness-of-fit | 📖 | 🔗
Comparing two samples’ distributions | Two-sample K-S test | 📖 | 🔗
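
The four situations in the table map onto base R calls roughly as follows. All samples and counts are simulated or invented for illustration.

```r
set.seed(3)
sample_a <- rnorm(100, mean = 10, sd = 2)
sample_b <- rexp(100, rate = 0.1)

# K-S test against a fully specified distribution
# (mean and sd are fixed in advance, not estimated from the data)
ks_one <- ks.test(sample_a, "pnorm", mean = 10, sd = 2)

# Shapiro-Wilk normality test (no parameters need to be specified)
sw <- shapiro.test(sample_b)

# Two-sample K-S test: do the two samples share a distribution?
ks_two <- ks.test(sample_a, sample_b)

# Chi-squared goodness-of-fit: hypothetical counts from 120 die rolls vs. a fair die
observed <- c(22, 17, 25, 16, 20, 20)
gof <- chisq.test(observed, p = rep(1 / 6, 6))

c(ks_one = ks_one$p.value, shapiro = sw$p.value,
  ks_two = ks_two$p.value, chisq_gof = gof$p.value)
```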

A.5 Part C: Regression and Modeling

A.5.1 C1: The Critical Question — Inference or Prediction?

Before building any model, answer: Why am I building this model?

Table A.10: Inference vs. Prediction
Aspect | Inference Goal | Prediction Goal
Primary aim | Understand relationships between variables | Forecast outcomes for new cases
Focus | Test hypotheses about coefficients | Minimize prediction error
Interpretation | Interpret the effect of each predictor | Accuracy matters more than interpretability
Key output | p-values and confidence intervals are important | Out-of-sample performance is key
Complexity | Simpler models may be preferred for clarity | Complex models are acceptable if they predict well

Example — Inference focus:

A health researcher wants to know: “Does smoking increase the risk of heart disease, after controlling for age and exercise?”

  • Primary interest: The coefficient for smoking (odds ratio)
  • Needs: Statistical significance, confidence interval
  • Model: Logistic regression with interpretable coefficients
  • Key output: “Smokers have 2.3 times the odds of heart disease (95% CI: 1.8-2.9, p < 0.001)”
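
An inference-focused analysis of this kind might be sketched as below. Everything here is simulated: the variable names (smoker, age, exercise, heart_disease) and the effect sizes are assumptions for illustration, so the resulting odds ratios will not match the quoted example.

```r
set.seed(2024)
n <- 500

# Simulated data; names and effect sizes are illustrative assumptions
patients <- data.frame(
  smoker   = rbinom(n, 1, 0.3),
  age      = round(rnorm(n, mean = 55, sd = 10)),
  exercise = rbinom(n, 1, 0.5)
)
log_odds <- -5 + 0.8 * patients$smoker + 0.07 * patients$age - 0.5 * patients$exercise
patients$heart_disease <- rbinom(n, 1, plogis(log_odds))

# Logistic regression: effect of smoking, controlling for age and exercise
fit <- glm(heart_disease ~ smoker + age + exercise,
           family = binomial, data = patients)

# Exponentiate coefficients to get odds ratios with Wald 95% confidence intervals
odds_ratios <- exp(cbind(OR = coef(fit), confint.default(fit)))
round(odds_ratios, 2)
```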

Example — Prediction focus:

A hospital wants to predict which patients are likely to be readmitted within 30 days.

  • Primary interest: Accurate identification of high-risk patients
  • Needs: Good sensitivity and specificity on new patients
  • Model: Could use logistic regression, decision trees, or any method that predicts well
  • Key output: “The model correctly identifies 75% of patients who will be readmitted (AUC = 0.82)”

A.5.2 C2: Choosing a Regression Model

CONTINUOUS outcome (numeric):

  • One predictor, linear relationship → Simple linear regression (Chapter 134)
    • Shiny App: Models / Manual Model Building (Regression tab)
  • One predictor, non-linear relationship → Polynomial or transformed regression
    • Shiny App: Models / Manual Model Building (Regression tab)
  • Multiple predictors → Multiple linear regression (Chapter 135)
    • Shiny App: Models / Manual Model Building (Regression tab)

BINARY outcome (yes/no, success/failure):

  • Logistic regression (Chapter 136)
    • Shiny App: Models / Manual Model Building (GLM tab, family = binomial)

COUNT outcome (0, 1, 2, 3, …):

  • Variance ≈ Mean → Poisson regression (Section 137.2)
    • Shiny App: Models / Manual Model Building (GLM tab, family = poisson)
  • Variance > Mean (overdispersion) → Quasipoisson (Section 137.3.1) or Negative binomial regression (Section 137.3.2)
    • Shiny App: Models / Manual Model Building (GLM tab, family = quasipoisson)
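
A quick way to check the "Variance ≈ Mean" condition is the Pearson dispersion statistic of a fitted Poisson model. The sketch below simulates overdispersed counts (an assumption for illustration) and shows that the quasi-Poisson fit keeps the coefficients but estimates the dispersion.

```r
set.seed(99)
n <- 200
exposure <- runif(n, 0, 2)

# Simulated negative binomial counts: variance exceeds the mean by construction
counts <- rnbinom(n, mu = exp(0.5 + 0.8 * exposure), size = 1.5)

pois_fit <- glm(counts ~ exposure, family = poisson)

# Pearson dispersion statistic: values well above 1 suggest overdispersion
dispersion <- sum(residuals(pois_fit, type = "pearson")^2) / df.residual(pois_fit)
dispersion

# Quasi-Poisson: same coefficients, standard errors scaled by the dispersion
quasi_fit <- glm(counts ~ exposure, family = quasipoisson)
summary(quasi_fit)$dispersion
```

Negative binomial regression is commonly fitted with glm.nb() from the MASS package, which is not used here to keep the sketch in base R.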

PROPORTIONS (between 0 and 1):

  • Quasibinomial regression (Section 137.4)
    • Shiny App: Models / Manual Model Building (GLM tab, family = quasibinomial)

CATEGORICAL outcome (more than two unordered categories):

  • Multinomial logistic regression (Section 138.1)

ORDINAL outcome (ordered categories):

  • Ordinal logistic regression (Section 138.2)

TIME-TO-EVENT (survival data):

  • Cox proportional hazards regression (Chapter 139)

FLEXIBLE ALTERNATIVE — Conditional Inference Trees:

The ctree() function (Chapter 140) can handle almost any outcome type — continuous, binary, categorical, count, or ordinal — making it a versatile choice when you’re unsure which regression model to use or when you want an interpretable tree-based model.

  • Shiny App: Models / Manual Model Building (Tree tab)

Important: The outcome variable must have the correct R data type. For example, if heartattack is coded as 0/1, you must convert it to a factor (factor(heartattack)) for ctree() to treat it as a classification problem rather than regression.

A.5.3 C3: Classification Methods

Table A.11: Classification methods comparison
Method | When to Use | Pros | Cons | Chapter | App
Logistic regression | Binary outcome, want interpretable model | Exponentiated coefficients are odds ratios, well-understood | Assumes linear relationship in log-odds | 📖 | 🔗
Conditional inference tree | Any outcome, want visual rules | Easy to interpret, handles interactions | May overfit, less stable | 📖 | 🔗
Naive Bayes | Many categorical predictors | Fast, handles many features | Assumes predictors are conditionally independent given the class | 📖 | 🔗

A.5.4 C4: Evaluating Classification Models

After building a classifier, how do you evaluate it?

Table A.12: Classification evaluation metrics
Metric | Use When | Chapter | App
Accuracy | Classes are balanced | 📖 | 🔗
Sensitivity (Recall) | Missing positives is costly | 📖 | 🔗
Specificity | False positives are costly | 📖 | 🔗
AUC | Overall discrimination ability | 📖 | 🔗
Precision | False positives are very costly | 📖 | 🔗
Binomial classification metrics | Binary classification summaries | 📖 | 🔗

Example — Choosing the right metric:

Cancer screening: Missing a cancer (false negative) is much worse than a false alarm. Prioritize sensitivity.

Spam filter: Losing an important email (false positive) is worse than seeing some spam. Prioritize specificity or precision.

Credit scoring: Need to balance defaults (false negatives) and rejected good customers (false positives). Use AUC and examine the ROC curve (Chapter 60) to choose an appropriate threshold.
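
To make the metrics concrete, here is a small sketch that computes them from a confusion matrix. The ten labels and predictions are invented for illustration.

```r
# Hypothetical true labels and predicted labels (1 = positive, 0 = negative)
actual    <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
predicted <- c(1, 1, 1, 0, 0, 0, 0, 0, 1, 1)

tp <- sum(predicted == 1 & actual == 1)  # true positives
fn <- sum(predicted == 0 & actual == 1)  # false negatives (missed positives)
tn <- sum(predicted == 0 & actual == 0)  # true negatives
fp <- sum(predicted == 1 & actual == 0)  # false positives (false alarms)

metrics <- c(
  accuracy    = (tp + tn) / length(actual),
  sensitivity = tp / (tp + fn),   # share of actual positives that were caught
  specificity = tn / (tn + fp),   # share of actual negatives correctly cleared
  precision   = tp / (tp + fp)    # share of predicted positives that are real
)
round(metrics, 3)
```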

A.5.5 C5: Choosing a Classification Threshold

For probabilistic classifiers (logistic regression, etc.), you must choose a threshold to convert probabilities to class predictions.

Table A.13: Threshold selection methods
Method | When to Use | Chapter | App
Default 0.5 | Classes are balanced, costs are equal | 📖 | 🔗
Youden’s index | Maximize sensitivity + specificity | 📖 | 🔗
Cost-optimal threshold | Different costs for false positives vs. false negatives | 📖 | 🔗
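
A sketch of two of these rules on simulated scores. Youden's index is J = sensitivity + specificity − 1, maximized over a grid of candidate thresholds. The cost-based rule shown here assumes well-calibrated probabilities and zero cost for correct decisions; the costs (1 for a false positive, 4 for a false negative) are hypothetical.

```r
set.seed(5)
n <- 300
actual <- rbinom(n, 1, 0.3)

# Hypothetical predicted probabilities: higher on average for true positives
prob <- plogis(rnorm(n, mean = ifelse(actual == 1, 1, -1)))

# Youden's J for each candidate threshold
thresholds <- seq(0.05, 0.95, by = 0.05)
youden <- sapply(thresholds, function(th) {
  pred <- as.integer(prob >= th)
  sens <- sum(pred == 1 & actual == 1) / sum(actual == 1)
  spec <- sum(pred == 0 & actual == 0) / sum(actual == 0)
  sens + spec - 1
})
best_threshold <- thresholds[which.max(youden)]
best_threshold

# Cost-based rule: predict positive when p > cost_fp / (cost_fp + cost_fn)
cost_fp <- 1   # assumed cost of a false positive
cost_fn <- 4   # assumed cost of a false negative
cost_threshold <- cost_fp / (cost_fp + cost_fn)
cost_threshold
```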

A.5.6 C6: Model Diagnostics — A Critical Checkpoint

After fitting any model, you must check whether the model is appropriate. This step is often skipped but is essential for valid inference and reliable predictions.

A.5.6.1 Minimum Diagnostic Set (Do This First)

If you only do a few checks, do these:

  1. Residuals vs. fitted (or predicted vs. observed for classifiers)
  2. QQ plot of residuals (if residuals exist)
  3. Residuals vs. observation order (independence)
  4. One influence check (Cook’s distance / leverage)

A.5.6.2 Quick Workflow (Order Matters)

  1. Check fit structure: residuals vs. fitted (or predicted vs. observed)
  2. Check distribution: QQ plot / histogram of residuals
  3. Check independence: residuals vs. time/order, ACF
  4. Check leverage: influence diagnostics
  5. If problems appear: apply fixes, then re-check

A.5.6.3 Inference vs. Prediction (Different Priorities)

  • Inference models: assumptions (normality, homoscedasticity, independence) matter most for valid p-values and confidence intervals.
  • Prediction models: calibration and out-of-sample performance matter most; assumptions matter mainly if they harm predictive accuracy.

A.5.6.4 What “Good” Looks Like (At a Glance)

  • Residuals vs. fitted: random cloud, no curve or funnel
  • QQ plot: points roughly on the line
  • Residuals vs. order: no trend or cycles
  • ACF: no large spikes
  • Influence: no single point dominates

A.5.6.5 Key Insight: A1/A2 Methods Applied to Residuals

Almost all the descriptive and exploratory methods from Parts A1 and A2 can be reused as model diagnostics — simply apply them to the residuals (or other model outputs) instead of the raw data. If your model is appropriate, the residuals should look like random noise with no structure.

Table A.14: A1/A2 methods as model diagnostics
A1/A2 Method | Applied to Residuals | What It Checks | Problem Signs | Chapter
Histogram (A1) | Histogram of residuals | Normality assumption | Skewed, bimodal, or heavy-tailed | Chapter 62
Box plot (A1) | Box plot of residuals | Outliers, symmetry | Many outliers, asymmetric distribution | Chapter 69
QQ plot (A2) | QQ plot of residuals | Normality assumption | Points deviate from diagonal line | Chapter 76
Scatter plot (A1) | Residuals vs. fitted values | Linearity, homoscedasticity | Curved pattern, funnel shape | Chapter 70
Scatter plot (A1) | Residuals vs. each predictor | Missed nonlinearity | Curved pattern for any predictor | Chapter 70
Time series plot (A2) | Residuals vs. observation order | Independence | Trends, cycles, or autocorrelation | Chapter 146
ACF/PACF (A2) | ACF of residuals | Independence, autocorrelation | Significant spikes at any lag | Section 92.6
Kernel density (A2) | Density of residuals | Distribution shape | Non-normal shape | Section 80.12
Summary statistics (A1) | Mean, SD of residuals | Model fit | Mean ≠ 0, large SD relative to response | Section 65.2, Section 66.6
Correlation (A1) | Residuals vs. predictors | Missed relationships | Significant correlation (should be ~0) | Section 71.3

The principle: If residuals show any systematic pattern detectable by A1/A2 methods, your model is missing something.

Note on ctree: Conditional inference trees do not produce residuals in the traditional sense (they predict class labels or means for terminal nodes). For ctree diagnostics, focus on comparing predicted vs. actual values, examining terminal node purity, and evaluating out-of-sample performance rather than residual analysis.

A.5.6.6 Decision Table: If You See This, Do That

Table A.15: Diagnostic problem-fix guide
Problem | Most Common Fix | Chapter
Curved residual pattern | Add nonlinear terms, transform variables | Chapter 134
Funnel shape (heteroscedasticity) | Transform response, use robust SE, WLS | Chapter 134
Strong autocorrelation | Use time series model or add lagged terms | Chapter 148
Non-normal residuals | Robust methods or transformation | Chapter 76
High influence point | Investigate, refit with/without, robust regression | Chapter 134
Poor calibration (classification) | Recalibrate, try different model | Chapter 136

Shiny Apps for residual diagnostics:

  • Descriptive / Histogram & Frequency Table — histogram of residuals
  • Descriptive / Normal QQ Plot — normality check
  • Descriptive / Kernel Density Estimation — residual distribution shape
  • Time Series / (P)ACF — autocorrelation in residuals

If you only do two plots in Shiny: residuals vs. fitted (scatter plot) and QQ plot.

A.5.6.7 For Linear Regression Models

Table A.16: Regression diagnostics checklist
What to Check | Diagnostic | Problem Signs | If Violated | Chapter
Linearity | Residuals vs. fitted plot | Curved pattern | Transform variables, add polynomial terms, or use nonlinear model | 📖
Normality of residuals | QQ plot, Shapiro-Wilk test | Points deviate from line | May affect inference (CI, p-values); consider robust methods or transformations | 📖
Constant variance | Residuals vs. fitted plot | Funnel/fan shape (heteroscedasticity) | Use weighted least squares or robust standard errors | 📖
Independence | Residuals vs. order plot, Durbin-Watson test | Pattern over time | Use time series methods or add lagged terms | 📖
Multicollinearity | Variance Inflation Factor (VIF) | VIF > 5 or 10 | Remove or combine correlated predictors | 📖
Influential points | Cook’s distance, leverage | Cook’s D > 1 or high leverage | Investigate outliers; consider robust regression | 📖
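
# Quick diagnostic example for linear regression
set.seed(42)                          # reproducible noise
x <- 1:50
y <- 2 + 0.5 * x + rnorm(50, sd = 3)  # true line (intercept 2, slope 0.5) plus Gaussian noise
model <- lm(y ~ x)

# Standard diagnostic plots in a 2x2 grid:
# residuals vs. fitted, normal QQ, scale-location, residuals vs. leverage
old_par <- par(mfrow = c(2, 2))
plot(model)

par(old_par)  # restore the previous plotting layout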
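
The plots can be complemented with a few numeric checks from the checklist. This sketch refits the same simulated model so that it runs on its own; the Cook's distance cutoff of 1 is the rule of thumb from Table A.16, not a strict threshold.

```r
# Refit the simulated example so this block is self-contained
set.seed(42)
x <- 1:50
y <- 2 + 0.5 * x + rnorm(50, sd = 3)
model <- lm(y ~ x)

res <- residuals(model)

# Normality of residuals
shapiro.test(res)

# Independence: lag-1 autocorrelation of residuals (element 1 is lag 0)
lag1_acf <- acf(res, plot = FALSE)$acf[2]
lag1_acf

# Influence: Cook's distance and leverage (hat values)
cooks <- cooks.distance(model)
leverage <- hatvalues(model)
which(cooks > 1)   # observations flagged by the rule of thumb
```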

A.5.6.8 Common Mistakes in Diagnostics

  • Checking only R^2 and ignoring residuals
  • Declaring success because p-values are small
  • Removing outliers without documenting why
  • Ignoring multicollinearity when coefficients are unstable

A.5.6.9 For Logistic Regression and Classification Models

Table A.17: Classification diagnostics checklist
What to Check | Diagnostic | Problem Signs | If Violated | Chapter
Calibration | Calibration plot (observed vs. predicted probabilities) | Predicted probabilities don’t match observed rates | Recalibrate or use different model | 📖
Discrimination | ROC curve, AUC | AUC near 0.5 | Model has no predictive power; add features or try different model | 📖
Influential observations | dfbetas, Cook’s distance analog | Large influence on coefficients | Investigate; consider robust methods | 📖
Goodness of fit | Hosmer-Lemeshow test | Significant p-value | Model doesn’t fit well; consider interactions or nonlinear terms | 📖
Separation | Coefficient estimates very large | Perfect or quasi-separation | Use penalized regression (Firth) or exact methods | 📖

A.5.6.10 For Conditional Inference Trees

Table A.18: Tree model diagnostics
What to Check | Diagnostic | Problem Signs | If Violated
Overfitting | Compare training vs. test performance | Large gap between training and test accuracy | Increase mincriterion, prune tree
Tree too simple | Only 1-2 terminal nodes | Not capturing real patterns | Decrease mincriterion, check data quality
Instability | Different trees from bootstrap samples | Tree structure changes dramatically | Consider ensemble methods (random forest)

A.5.6.11 Decision: When to Revisit Your Model Choice

Diagnostics reveal problems?
│
├── YES: Assumptions badly violated
│   ├── Try transformation (log, sqrt)
│   ├── Try nonparametric alternative
│   ├── Try robust methods
│   └── Consider different model family
│
├── YES: Poor fit or discrimination
│   ├── Add/remove predictors
│   ├── Add interaction terms
│   ├── Try flexible model (ctree, GAM)
│   └── Collect better data
│
└── NO: Diagnostics look acceptable
    └── Proceed with inference or prediction

Key principle: A model with good diagnostics and moderate fit is more trustworthy than a model with excellent fit but violated assumptions.

Important: Passing diagnostics does not prove a causal relationship; it only supports that the model is not obviously misspecified.

A.5.7 C7: Time Series Models

Time series modeling is for sequential data where observations are ordered in time and typically depend on previous values. This is fundamentally different from cross-sectional modeling, where observations are assumed to be independent.

When to use time series models instead of regression:

  • Data are collected over time at regular intervals
  • You want to forecast future values
  • Observations are autocorrelated (today’s value depends on yesterday’s)
  • There’s trend, seasonality, or other temporal patterns

A.5.7.1 Time Series EDA (before modeling)

Before building a time series model, explore the data using the methods in Table A.4 (Section A2 above).

A.5.7.2 Choosing a Time Series Model

Table A.19: Time series model selection
Situation | Model | When to Use | Chapter | App
Simple trend, no seasonality | Exponential smoothing (Holt) | Short-term forecasts, smooth data | 📖 | 🔗
Trend + seasonality | Holt-Winters | Clear seasonal pattern | 📖 | 🔗
Complex autocorrelation | ARIMA | Flexible, handles many patterns | 📖 | 🔗
Seasonal + complex | Seasonal ARIMA (SARIMA) | Seasonal with ARIMA errors | 📖 | 🔗
Known event / structural break | Intervention analysis | ARIMA + pulse/step dummy | 📖 | 🔗
External input influences Y | Transfer function / ARIMAX | Input series available | 📖 | 🔗

A.5.7.3 Time Series Model Identification (Box-Jenkins approach)

1. Plot the series → Check for trend, seasonality, variance changes
   │
2. Make stationary → Differencing (d), seasonal differencing (D)
   │                  Transform if variance changes (log, sqrt)
   │
3. Examine ACF/PACF → Identify p, q (and P, Q for seasonal)
   │                   ACF cuts off → MA(q)
   │                   PACF cuts off → AR(p)
   │                   Both decay → ARMA(p,q)
   │
4. Fit model → Estimate parameters
   │
5. Diagnostics → Check residuals (should be white noise)
   │              Ljung-Box test, residual ACF
   │
6. Forecast → Generate predictions with prediction intervals
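
The six steps above can be sketched in base R as follows. The series is simulated as an ARIMA(1,1,0) process, so the "identified" order is known in advance; with real data, steps 2 and 3 require judgment. The lag of 10 for the Ljung-Box test and the 12-step horizon are arbitrary choices.

```r
set.seed(10)

# Step 1: a simulated non-stationary series (integrated AR(1)); plot(series) to inspect
series <- ts(cumsum(arima.sim(model = list(ar = 0.6), n = 200)))

# Step 2: difference once to remove the stochastic trend
diffed <- diff(series)

# Step 3: ACF/PACF of the differenced series (PACF cutting off at lag 1 suggests AR(1))
acf_vals  <- acf(diffed, plot = FALSE)
pacf_vals <- pacf(diffed, plot = FALSE)

# Step 4: fit ARIMA(1,1,0)
fit <- arima(series, order = c(1, 1, 0))

# Step 5: residuals should be white noise; fitdf = 1 accounts for the one AR parameter
lb <- Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 1)
lb$p.value

# Step 6: 12-step-ahead forecasts with approximate 95% prediction intervals
fc <- predict(fit, n.ahead = 12)
lower <- fc$pred - 1.96 * fc$se
upper <- fc$pred + 1.96 * fc$se
```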

A.5.7.4 Time Series Diagnostics

After fitting a time series model, check the residuals:

Table A.20: Time series diagnostics
What to Check | Method | Problem Signs | If Violated
No autocorrelation | Residual ACF, Ljung-Box test | Significant spikes in ACF | Increase model order or add seasonal terms
Normality | QQ plot, histogram of residuals | Non-normal shape | May affect confidence intervals
Constant variance | Plot residuals over time | Funnel shape, changing spread | Consider GARCH or transformation
No pattern | Residuals vs. fitted | Systematic pattern | Model is missing something

Shiny App: Time Series / ARIMA (includes residual diagnostics)


A.6 Part D: Common Mistakes and How to Avoid Them

A.6.1 Mistake 1: Using the Wrong Test for Your Purpose

Wrong: Using a standard t-test to show two products are equivalent.

A non-significant p-value does NOT prove equivalence — it only means you failed to detect a difference. This could be because there is no difference, OR because your sample was too small.

Right: Use equivalence testing (TOST, Chapter 120) when you want to demonstrate that two things are practically the same.

A.6.2 Mistake 2: Ignoring the Purpose When Evaluating Models

Wrong: Choosing a regression model based on R² when your goal is prediction.

R² measures fit to the training data. A model with high R² might overfit and predict poorly on new data.

Right: For prediction, use cross-validation or a held-out test set to evaluate performance on new data.
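
The contrast between training R² and out-of-sample error can be sketched with a simple 5-fold cross-validation. The data are simulated from a straight line, and the degree-10 polynomial is a deliberately over-flexible competitor; both choices are assumptions for illustration.

```r
set.seed(321)
n <- 60
dat <- data.frame(x = runif(n, -2, 2))
dat$y <- 1 + 2 * dat$x + rnorm(n, sd = 1)   # the true relationship is linear

simple_formula  <- y ~ x
complex_formula <- y ~ poly(x, 10)          # far more flexible than needed

# Assign each observation to one of k folds at random
k <- 5
folds <- sample(rep(1:k, length.out = n))

# Cross-validated RMSE: fit on k-1 folds, predict the held-out fold
cv_rmse <- function(formula) {
  fold_mse <- sapply(1:k, function(i) {
    fit  <- lm(formula, data = dat[folds != i, ])
    pred <- predict(fit, newdata = dat[folds == i, ])
    mean((dat$y[folds == i] - pred)^2)
  })
  sqrt(mean(fold_mse))
}

r2 <- c(simple  = summary(lm(simple_formula, data = dat))$r.squared,
        complex = summary(lm(complex_formula, data = dat))$r.squared)
cv <- c(simple  = cv_rmse(simple_formula),
        complex = cv_rmse(complex_formula))

r2   # training R^2 never decreases when terms are added to a nested model
cv   # cross-validated error is what matters for prediction
```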

A.6.3 Mistake 3: Confusing Description with Inference

Wrong: Calculating a confidence interval when you have the entire population.

If you surveyed all 50 employees in a company, the mean salary is the population mean — there’s nothing to infer.

Right: Confidence intervals and p-values are for inference from samples to populations. If you have the whole population, just report the descriptive statistics.

A.6.4 Mistake 4: Choosing α Without Considering Consequences

Wrong: Always using α = 0.05 because “that’s what everyone does.”

The 5% significance level is arbitrary. In some contexts, 1% is more appropriate; in others, 10% makes sense.

Right: Consider the costs of Type I and Type II errors. Set α to balance these costs appropriately.

A.6.5 Mistake 5: Testing Many Hypotheses Without Correction

Wrong: Testing 20 different relationships and reporting only the significant ones.

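With 20 tests at α = 0.05 and every null hypothesis true, you expect about 1 false positive by chance alone (20 × 0.05 = 1); if the tests are independent, the probability of at least one false positive is about 64% (1 − 0.95^20).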

Right: Use multiple testing corrections (Bonferroni, FDR) or pre-register your primary hypothesis.
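
The arithmetic and the two corrections can be checked with base R's p.adjust(). The ten p-values below are invented for illustration.

```r
alpha <- 0.05
n_tests <- 20

expected_fp <- n_tests * alpha            # expected false positives if all nulls are true
fwer <- 1 - (1 - alpha)^n_tests           # P(at least one false positive), independent tests
c(expected_fp = expected_fp, fwer = round(fwer, 3))

# Hypothetical raw p-values from 10 tests
p_values <- c(0.001, 0.008, 0.012, 0.030, 0.041, 0.049, 0.120, 0.200, 0.350, 0.600)

bonf <- p.adjust(p_values, method = "bonferroni")  # controls the family-wise error rate
bh   <- p.adjust(p_values, method = "BH")          # controls the false discovery rate

# Number of "significant" results before and after correction
c(raw = sum(p_values < alpha),
  bonferroni = sum(bonf < alpha),
  fdr_bh = sum(bh < alpha))
```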


A.7 Quick Reference Tables

A.7.1 By Research Question

Table A.21: Methods by research question
Question | Method | Requirements | App
Is the mean different from a specific value? | One-sample t-test | Numeric, approximately normal | 🔗
Are two means different? | Two-sample t-test or Welch’s | Numeric, independent groups | 🔗
Are paired measurements different? | Paired t-test | Numeric, matched pairs | 🔗
Are two groups equivalent? | TOST | Numeric, specify equivalence margin | 🔗
Are three or more means different? | ANOVA | Numeric, independent groups | 🔗
Are two categorical variables associated? | Chi-squared test | Categorical, sufficient counts | 🔗
Does data follow a distribution? | K-S test | Continuous data | 🔗
Is there a linear relationship? | Correlation test | Two numeric variables | 🔗
Does X predict Y? | Regression | Depends on Y type | 🔗

A.7.2 By Variable Type

Table A.22: Methods by variable types
Outcome | Predictor(s) | Method | App
Numeric | None (one group) | One-sample t-test | 🔗
Numeric | Categorical (2 groups) | Two-sample t-test | 🔗
Numeric | Categorical (3+ groups) | ANOVA | 🔗
Numeric | Numeric (1 predictor) | Simple linear regression | 🔗
Numeric | Mixed (multiple) | Multiple linear regression | 🔗
Binary | Mixed | Logistic regression (GLM, binomial) | 🔗
Categorical | Categorical | Chi-squared test | 🔗
Count | Mixed | Poisson regression (GLM, poisson) | 🔗
Time series | Time | ARIMA, exponential smoothing | 🔗

For an interactive version of this guide, try the Method Selection Tool — select your constraints (purpose, data type, assumptions) and it will show matching methods.


Use this guide in conjunction with the introductory chapter (Chapter 4) to ensure your statistical analysis matches your research goals.


© 2026 Patrick Wessa. Provided as-is, without warranty.
