Descriptive Statistics & Exploratory Data Analysis
This part is the handbook’s diagnostic layer. Its purpose is to help you understand what your data look like before running formal tests or predictive models.
You can work through this entire part using only the Shiny apps. R and RStudio are optional and are mainly useful if you want fully reproducible scripts or custom extensions.
How This Part Is Structured
The chapters are organized as a progression from basic data understanding to deeper pattern discovery:
| Block | Purpose | Main chapters |
|---|---|---|
| Data foundations | Identify variable types and data layout | Types of Data, Datasheets |
| Categorical summaries | Summarize counts and evaluate binary classification outputs | Frequency Plot, Frequency Table, Contingency Table, Binomial Classification Metrics, Confusion Matrix, ROC Analysis |
| Univariate numeric exploration | Describe center, spread, shape, and outliers | Stem-and-Leaf Plot, Histogram, Quantiles, Central Tendency, Variability, Skewness & Kurtosis, Boxplot |
| Bivariate and multivariate structure | Study relationships between variables | Scatterplot, Pearson/Rank/Partial Correlation, Simple Linear Regression, Moments, KDE, Bivariate Density |
| Distribution diagnostics and transformations | Check distributional assumptions and improve fit | QQ Plot, Normal Probability Plot, PPCC Plot, Box-Cox Normality Plot |
| Reliability and ranking tools | Evaluate score consistency and ordered comparisons | Survey Scores Rank Order Comparison, Cronbach Alpha |
| Time-ordered exploratory diagnostics | Detect trend, seasonality, dependence, and spectral structure | Equi-distant Time Series, Time Series Plot, Mean Plot, Blocked Bootstrap Plot, Standard Deviation-Mean Plot, Variance Reduction Matrix, (Partial) Autocorrelation Function, Periodogram & Cumulative Periodogram |
| Integrated practice | Apply methods end-to-end | Problems |
How To Use It
If your data are categorical, start with the categorical-summaries block.
If your data are numeric and cross-sectional, start with histogram/quantiles/boxplot and then move to relationship chapters.
If your data are time-indexed, start directly at the time-ordered diagnostics block.
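For readers who take the optional R route, the three starting rules above can be sketched as a small helper. This is a minimal illustration, not part of the handbook's apps: `suggest_block()` and its `time_ordered` argument are hypothetical names, the fallback to the data-foundations block is an assumption, and the example data are R's built-in `mtcars` and `AirPassengers` datasets.

```r
# Hypothetical helper: suggest a starting block for one variable.
# `x` is a vector; `time_ordered` says whether the rows follow a time index.
# Assumption: a time index takes priority over the variable's type.
suggest_block <- function(x, time_ordered = FALSE) {
  if (time_ordered) {
    return("Time-ordered exploratory diagnostics")
  }
  if (is.factor(x) || is.character(x) || is.logical(x)) {
    return("Categorical summaries")
  }
  if (is.numeric(x)) {
    return("Univariate numeric exploration")
  }
  # Assumption: anything else goes back to the data-foundations block.
  "Data foundations"
}

# Built-in example data: mtcars is cross-sectional, AirPassengers is monthly.
cyl <- factor(mtcars$cyl)
suggest_block(cyl)
suggest_block(mtcars$mpg)
suggest_block(as.numeric(AirPassengers), time_ordered = TRUE)

# First-pass summaries that match each starting point.
table(cyl)             # frequency table for a categorical variable
quantile(mtcars$mpg)   # quantiles of a numeric variable
hist(mtcars$mpg)       # histogram
boxplot(mtcars$mpg)    # boxplot
plot(AirPassengers)    # time series plot
acf(AirPassengers)     # autocorrelation function
```

The helper only picks a starting point. The relationship chapters and the distribution diagnostics still follow once the first summaries are in hand.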
Use this part to produce a clear exploratory summary: what patterns are present, which assumptions look plausible, and what to analyze next. For navigation, combine it with the purpose framing in The Big Picture: Why We Analyze Data (Chapter 4) and the method map in the Method Selection Guide (Appendix A).