111 Hypothesis Testing for Research Purposes
The following sections describe the main types of hypothesis tests from a practical point of view. Each test is first illustrated at the statistical level; its contextual interpretation follows in the case-study chapters.
When reporting hypothesis tests in university-level work, three components should be presented together:
- Statistical significance: report the p-value and state whether \(H_0\) is rejected at the pre-specified \(\alpha\).
- Effect size: report the magnitude of the effect (not only its significance), e.g. Cohen’s \(d\) (Cohen 2013), the rank-biserial correlation, \(\eta^2\), or Cramér’s \(V\) (Cramér 1946), depending on the test.
- Uncertainty and precision: report confidence/credible intervals and discuss practical relevance.
Power also matters. A non-significant result can mean either “no meaningful effect” or “insufficient sample size.” Therefore, interpretation should always acknowledge sample size, variability, and design quality.
In short: do not rely on p-values alone. Use p-values, effect sizes, and intervals jointly.
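As a minimal sketch of reporting the three components together, the following Python example compares two independent groups. The data and the function names (`cohens_d`, `permutation_p_value`, `bootstrap_ci`) are hypothetical and chosen for illustration only. To keep the example free of external dependencies, the p-value comes from a Monte Carlo permutation test and the interval from a percentile bootstrap; these stand in for the parametric tests and intervals and are not the only valid choice.

```python
import random
from statistics import fmean, variance

def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (fmean(a) - fmean(b)) / pooled_var ** 0.5

def permutation_p_value(a, b, n_perm=5000, seed=0):
    """Two-sided Monte Carlo permutation p-value for a difference in means."""
    rng = random.Random(seed)
    observed = abs(fmean(a) - fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:len(a)]) - fmean(pooled[len(a):]))
        if diff >= observed - 1e-12:  # small tolerance for floating-point ties
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)

def bootstrap_ci(a, b, level=0.95, n_boot=5000, seed=0):
    """Percentile bootstrap interval for the difference in means."""
    rng = random.Random(seed)
    diffs = sorted(
        fmean(rng.choices(a, k=len(a))) - fmean(rng.choices(b, k=len(b)))
        for _ in range(n_boot)
    )
    lower = diffs[int((1 - level) / 2 * n_boot)]
    upper = diffs[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper

# Hypothetical measurements, invented purely for illustration.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.7, 5.5]
group_b = [4.8, 4.6, 5.0, 5.2, 4.9, 5.1, 4.7, 5.3]

alpha = 0.05  # pre-specified significance level
p = permutation_p_value(group_a, group_b)
d = cohens_d(group_a, group_b)
lo, hi = bootstrap_ci(group_a, group_b)

print(f"p-value: {p:.4f} (reject H0 at alpha={alpha}: {p < alpha})")
print(f"Cohen's d: {d:.2f}")
print(f"95% bootstrap CI for the mean difference: [{lo:.2f}, {hi:.2f}]")
```

The three printed lines are meant to be read jointly: the p-value addresses significance, Cohen’s \(d\) addresses magnitude, and the interval addresses precision.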
This chapter explains what should be reported together (p-value, effect size, interval, and power/sample-size context). The next step is to choose the threshold (e.g. \(\alpha\) or the confidence level) according to the role of the analysis:
- confirmatory,
- diagnostic,
- exploratory/selection,
- equivalence.
The handbook-wide framework for making and reporting that threshold choice is developed in Chapter 112.
111.0.1 Typical Effect Sizes by Test Family
- One-sample / paired / unpaired mean tests: Cohen’s \(d\) (Cohen 2013) (or Hedges’ \(g\) (Hedges 1981))
- ANOVA: \(\eta^2\) or partial \(\eta^2\)
- Chi-squared tests: Cramér’s \(V\) (Cramér 1946) (or \(\phi\) for \(2\times2\) tables)
- Rank-based tests: rank-biserial correlation or Cliff’s delta (Cliff 1993)
- Correlation tests: Pearson’s \(r\), Spearman’s \(\rho\), or Kendall’s \(\tau\) (the correlation itself is the effect size)
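To make one of these measures concrete, the sketch below computes Cramér’s \(V\) from a contingency table using the standard definition \(V = \sqrt{\chi^2 / (n \cdot (\min(r, c) - 1))}\), where \(r\) and \(c\) are the numbers of rows and columns. The function name `cramers_v` and the example tables are hypothetical; the sketch assumes every row and column total is positive and applies no continuity or bias correction.

```python
import math

def cramers_v(table):
    """Cramér's V for an r x c table of observed counts (no correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Pearson chi-squared statistic: sum of (observed - expected)^2 / expected.
    chi2 = sum(
        (obs - rt * ct / n) ** 2 / (rt * ct / n)
        for row, rt in zip(table, row_totals)
        for obs, ct in zip(row, col_totals)
    )
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))

# Hypothetical 2x2 tables, invented for illustration.
associated = [[30, 10], [10, 30]]    # rows and columns clearly related
independent = [[10, 20], [20, 40]]   # rows are exactly proportional

print(f"V (associated):  {cramers_v(associated):.2f}")
print(f"V (independent): {cramers_v(independent):.2f}")
```

For a \(2\times2\) table, \(\min(r, c) - 1 = 1\), so \(V\) coincides with the absolute value of \(\phi\).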
111.0.2 Power and Sample Size
Before data collection, power analysis should align three inputs: the target effect size, the acceptable type I error rate \(\alpha\), and the desired power (\(1-\beta\)). Together, these determine the required sample size.
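The relationship between these inputs and the required sample size can be sketched for the simplest case, a two-sided comparison of two independent group means with equal group sizes. The example uses the normal approximation \(n \approx 2\,\big((z_{1-\alpha/2} + z_{1-\beta})/d\big)^2\) per group; the function name `n_per_group` is hypothetical. Exact calculations based on the \(t\) distribution typically give slightly larger values, so the result should be read as a planning estimate under these assumptions.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample mean comparison.

    d     : target standardized effect size (Cohen's d)
    alpha : acceptable type I error rate
    power : desired power (1 - beta)
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)

# Smaller target effects require much larger samples.
for effect in (0.2, 0.5, 0.8):
    print(f"d = {effect}: about {n_per_group(effect)} participants per group")
```

Because the required \(n\) scales with \(1/d^2\), halving the target effect size roughly quadruples the sample size.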
After analysis, a non-significant p-value from a low-power study should not be interpreted as strong evidence of “no effect.”
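A rough calculation shows why. The sketch below approximates the power of a two-sided, two-sample comparison of means for an assumed true effect size, using the normal approximation with equal group sizes. The function name `approx_power` and the scenario values are hypothetical.

```python
from statistics import NormalDist

def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample mean comparison.

    d           : assumed true standardized effect size (Cohen's d)
    n_per_group : number of observations in each group
    alpha       : type I error rate
    """
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * (n_per_group / 2) ** 0.5  # expected value of the test statistic
    # Probability that the statistic falls in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

# Hypothetical scenario: a medium effect (d = 0.5) at different sample sizes.
for n in (20, 64):
    print(f"n = {n} per group: power is roughly {approx_power(0.5, n):.2f}")
```

Under these assumptions, a study with 20 participants per group would miss a true medium-sized effect more often than it detects it, so a non-significant result from such a study says little about whether the effect exists.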