82 Conditional EDA: Panel Diagnostics
82.1 Definition
Conditional exploratory data analysis (Conditional EDA) studies how the distribution of one quantitative outcome changes across subgroups. The subgroups are defined either by:
- one categorical variable (1D panels), or
- two categorical variables (2D panels).
Each panel displays diagnostics for the same outcome variable, so shape differences can be compared directly across groups.
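The conditioning step can be sketched in a few lines of Python. This is a minimal illustration, not the module's implementation: the helper name split_into_panels is hypothetical, and the records are invented toy values that only reuse the variable names from the examples later in this chapter.

```python
from collections import defaultdict

# Toy records invented for illustration only (not taken from the heart data).
records = [
    {"sexLabel": "female", "chestpainLabel": "typical", "cholesterolNum": 250.0},
    {"sexLabel": "female", "chestpainLabel": "atypical", "cholesterolNum": 310.0},
    {"sexLabel": "male", "chestpainLabel": "typical", "cholesterolNum": 230.0},
    {"sexLabel": "male", "chestpainLabel": "atypical", "cholesterolNum": 215.0},
    {"sexLabel": "male", "chestpainLabel": "typical", "cholesterolNum": 245.0},
]


def split_into_panels(rows, outcome, factors):
    """Group the outcome values by the level combination of one or two factors."""
    if len(factors) not in (1, 2):
        raise ValueError("panels are conditioned on one or two factors")
    panels = defaultdict(list)
    for row in rows:
        key = tuple(row[f] for f in factors)  # one key per panel
        panels[key].append(row[outcome])
    return dict(panels)


# 1D panels: one panel per level of sexLabel.
panels_1d = split_into_panels(records, "cholesterolNum", ["sexLabel"])
# 2D panels: one panel per observed (sexLabel, chestpainLabel) combination.
panels_2d = split_into_panels(
    records, "cholesterolNum", ["sexLabel", "chestpainLabel"]
)
```

Every panel holds values of the same outcome, so any diagnostic computed on one panel can be computed on all of them and compared directly.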
82.2 Purpose
Conditional EDA is useful when a single overall summary hides meaningful subgroup differences. Within a single workflow, it supports:
- distributional comparison across groups,
- detection of asymmetric or heavy-tailed behavior in specific groups,
- identification of groups with unstable variability,
- better model preparation before formal inference or prediction.
82.3 Panel Structure
82.3.1 1D Panel (one factor)
A 1D panel plot conditions on one factor (for example, sexLabel). Each panel corresponds to one level of that factor.
82.3.2 2D Panel (two factors)
A 2D panel plot conditions on two factors (for example, sexLabel and chestpainLabel). Each panel corresponds to one level combination.
To keep interpretation manageable, the module limits factor cardinality and the total number of panels.
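A guard of this kind might look like the sketch below. The text does not state the module's actual limits, so the thresholds max_levels and max_panels, the function name check_panel_limits, and the column names factorA and factorB are all assumptions made for illustration.

```python
def check_panel_limits(rows, factors, max_levels=6, max_panels=12):
    """Return (ok, message). The default thresholds are hypothetical."""
    # Cardinality check: count the distinct levels of each factor.
    for f in factors:
        n_levels = len({row[f] for row in rows})
        if n_levels > max_levels:
            return False, f"{f} has {n_levels} levels (limit {max_levels})"
    # Panel check: count the level combinations that actually occur.
    n_panels = len({tuple(row[f] for f in factors) for row in rows})
    if n_panels > max_panels:
        return False, f"{n_panels} panels requested (limit {max_panels})"
    return True, f"{n_panels} panels"


# Synthetic rows: factorA has 3 levels, factorB has 4, giving 12 combinations.
rows = [{"factorA": i % 3, "factorB": i % 4} for i in range(24)]

within_limits = check_panel_limits(rows, ["factorA", "factorB"])
too_many = check_panel_limits(rows, ["factorA", "factorB"], max_panels=8)
```

Counting observed combinations rather than the product of level counts means empty cells do not count against the panel limit; a real implementation could reasonably choose either.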
82.4 Interpreting the Diagnostics
For each panel, the module supports one diagnostic view at a time:
- Quantiles (Harrell-Davis + CI): compares tails and central quantiles with uncertainty bands.
- Central Tendency: compares mean, median, and robust center (trimmed/winsorized mean).
- Variability: compares variance, standard deviation, IQR, MAD, and coefficient of variation.
- Cullen & Frey Plot: compares skewness and kurtosis patterns across groups.
- QQ Plot: checks panel-level alignment with a Normal reference.
- PPCC Plot (probability plot correlation coefficient): checks how the Normal fit within each panel depends on the transformation applied.
- Density Plot: compares full shape differences without reducing to one statistic.
A useful reading order is: quantiles -> center -> variability -> shape diagnostics.
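To make the numeric views concrete, the sketch below computes several of them for a single panel using only the Python standard library. It is an illustration under stated simplifications, not the module's code: the quartiles use plain linear interpolation rather than the Harrell-Davis estimator, kurtosis is the uncorrected moment ratio (3 for a Normal distribution), and the PPCC is a single correlation for untransformed data rather than a curve over transformations. The function names are hypothetical.

```python
import math
import statistics as st


def interpolated_quantile(sorted_x, p):
    """Linear-interpolation sample quantile (a simple stand-in, not Harrell-Davis)."""
    h = (len(sorted_x) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (h - lo) * (sorted_x[hi] - sorted_x[lo])


def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = st.fmean(a), st.fmean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def panel_diagnostics(values):
    """Variability and shape summaries for the values in one panel."""
    x = sorted(values)
    n = len(x)
    mean, median = st.fmean(x), st.median(x)
    sd = st.stdev(x)  # sample standard deviation (n - 1 denominator)
    # Central moments for moment-based skewness and kurtosis.
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    # Normal quantiles at Blom plotting positions (the x-axis of a QQ plot).
    z = [st.NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    return {
        "variance": sd ** 2,
        "sd": sd,
        "iqr": interpolated_quantile(x, 0.75) - interpolated_quantile(x, 0.25),
        "mad": st.median([abs(v - median) for v in x]),  # unscaled MAD
        "cv": sd / mean,  # only meaningful when the mean is clearly nonzero
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
        "normal_ppcc": pearson_r(x, z),  # near 1 means close to a straight QQ line
    }


symmetric = panel_diagnostics([1, 2, 3, 4, 5])
right_skewed = panel_diagnostics([1, 1, 2, 2, 3, 10])
```

A Cullen and Frey graph is conventionally drawn with squared skewness on one axis and kurtosis on the other, so the last two moment-based entries are the coordinates of a panel's point in that plot.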
82.5 R Module
82.5.1 Public website
Conditional EDA is available on the public website:
82.5.2 RFC
The Conditional EDA module is available in RFC under the menu “Descriptive / Conditional EDA”.
82.6 Example 1: Cholesterol by Sex
The analysis below studies cholesterolNum with 1D panels by sexLabel. In this dataset, female and male groups show visibly different shape behavior in Cullen-Frey space.
The 1D panel interpretation is:
- Female panel: the point lies in a right-skew / high-kurtosis area, consistent with a heavier upper tail (high-cholesterol outliers are more likely).
- Male panel: the point is closer to low-skew, moderate-kurtosis behavior, indicating a more compact and more symmetric distribution.
- Practical conclusion: a single pooled mean would hide subgroup shape differences. For female patients, robust summaries and quantile-based reporting are more defensible than relying only on the arithmetic mean.
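The effect of a heavy upper tail on the arithmetic mean can be illustrated with a small sketch. The values below are invented for illustration and are not taken from the heart data; the helper names and the 10% trimming proportion are assumptions, not the module's settings.

```python
import statistics as st


def trimmed_mean(values, proportion=0.1):
    """Mean after dropping the given proportion of points from each tail."""
    x = sorted(values)
    k = int(len(x) * proportion)
    return st.fmean(x[k:len(x) - k])


def winsorized_mean(values, proportion=0.1):
    """Mean after pulling each tail in to the nearest retained value."""
    x = sorted(values)
    k = int(len(x) * proportion)
    if k == 0:
        return st.fmean(x)
    low, high = x[k], x[-k - 1]
    return st.fmean([min(max(v, low), high) for v in x])


# Invented values with one large upper-tail observation.
toy_values = [200, 205, 210, 215, 220, 225, 230, 235, 240, 400]

centers = {
    "mean": st.fmean(toy_values),
    "median": st.median(toy_values),
    "trimmed": trimmed_mean(toy_values),
    "winsorized": winsorized_mean(toy_values),
}
```

In this toy sample the single value of 400 pulls the mean up to 238.0, while the median and both robust means stay at 222.5, which is why robust summaries are easier to defend when a panel shows right skew.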
82.7 Example 2: Cholesterol by Sex and Chest Pain
A 2D panel setup (sex x chest pain category) can reveal interaction-like subgroup patterns that are not visible in a single-factor view.
The 2D panel interpretation is:
- Shape differences are not uniform across chest-pain classes: some sex-by-chestpain cells remain near symmetric, while others move to stronger skew/kurtosis regions.
- The strongest non-normal pattern appears in female atypical-angina profiles, where right-tail behavior is much stronger than in most male cells.
- The spread of panel points indicates that distributional shape is heterogeneous across subgroups; this supports subgroup-specific summaries instead of one global cholesterol model.
- Small cells must be interpreted cautiously: extreme skew/kurtosis in very small panels can be partly sampling noise, so always cross-check panel size before drawing substantive conclusions.
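The caution about small cells can be checked with a short simulation. The sketch below draws repeated samples from a standard Normal distribution, whose true skewness is zero, and measures how much the sample skewness varies at two sample sizes. The sizes (10 and 200), the replication count, and the seed are arbitrary choices made for illustration.

```python
import random
import statistics as st


def sample_skewness(x):
    """Moment-based sample skewness (no small-sample correction)."""
    mean = st.fmean(x)
    m2 = sum((v - mean) ** 2 for v in x) / len(x)
    m3 = sum((v - mean) ** 3 for v in x) / len(x)
    return m3 / m2 ** 1.5


def skewness_spread(n, replications=1000, seed=1):
    """Standard deviation of sample skewness over repeated Normal samples of size n."""
    rng = random.Random(seed)
    skews = [
        sample_skewness([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(replications)
    ]
    return st.pstdev(skews)


# The population skewness is 0 in both cases; only the panel size differs.
spread_small = skewness_spread(10)
spread_large = skewness_spread(200)
```

The spread at n = 10 should come out several times larger than at n = 200, so a small panel can land far from the symmetric region of a Cullen and Frey plot purely by chance.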
82.8 Pros & Cons
82.8.1 Pros
- highlights subgroup-specific distribution behavior,
- combines robust center, variability, and shape diagnostics in one framework,
- supports fast screening before hypothesis testing and modeling.
82.8.2 Cons
- still descriptive: panel differences are not causal claims,
- many panels can reduce readability,
- small panel sample sizes can produce unstable diagnostics.
82.9 Task
Use the heart data and compare cholesterolNum in:
- 1D panels by sexLabel, and
- 2D panels by sexLabel x chestpainLabel.
For each setup, report which subgroup(s) look least compatible with a Normal shape and justify your conclusion using at least two diagnostic views.