
64  Quantiles

This section defines quantiles and how they are computed. The q-th h-quantile of a random variable \(x\) is the smallest value \(v\) for which \(P(x \leq v) \geq \frac{q}{h}\), where \(h\) represents the number of bins used to partition the data (for example: tertiles with \(h=3\), quartiles with \(h=4\), deciles with \(h=10\), percentiles with \(h=100\), and permilles with \(h=1000\)).
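To make the definition concrete, take the quartiles of a small sample (\(h = 4\)): the first quartile is the smallest observed value \(v\) with \(P(x \leq v) \geq 1/4\) under the empirical distribution. A minimal sketch (in Python rather than R, purely as a language-neutral illustration; the function name is ours):

```python
def smallest_v(sample, q, h):
    """Smallest sample value v with P(x <= v) >= q/h under the empirical distribution."""
    x = sorted(sample)
    n = len(x)
    for v in x:
        if sum(1 for xi in x if xi <= v) / n >= q / h:
            return v

print(smallest_v([21, 24, 50, 10, 23, 27], 1, 4))  # first quartile: 21
```

For this sample, \(P(x \leq 10) = 1/6 < 1/4\) but \(P(x \leq 21) = 2/6 \geq 1/4\), so the first quartile is 21.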

64.1 Practical Choice Guide

Definition family                                  Typical use
-------------------------------------------------  -------------------------------------------------------
Weighted average at \(X_{nq}\) or \(X_{(n+1)q}\)   Smooth interpolation for continuous data
Empirical distribution / closest observation       Discrete data where observed values should be preserved
Harrell-Davis (Harrell and Davis 1982)             Small to moderate samples when robust, efficient quantile estimation is preferred
Legacy Excel rule                                  Backward compatibility with older spreadsheets

64.2 Quantiles based on Weighted Averages at \(X_{nq}\)

64.2.1 Definition

When the observations are ordered in ascending order, the quantile at q is defined as:

\[ Quantile(q) = (1 - f)x_i + f x_{i+1} \]

Note that \(nq = i + f\) where i is the integer part of nq and f is the fractional part.

64.2.2 Example

Assume the following original dataset: 21, 24, 50, 10, 23, 27. The number of observations \(n = 6\). Question: what is the percentile value for q = 0.75?

First we obtain the ordered dataset: 10, 21, 23, 24, 27, 50. Next, we compute \(nq = 6 * 0.75 = 4.5\) and obtain the integer part (= 4) and the fractional part (= 0.5). Now we are able to apply the formula: \(Percentile(0.75) = (1-0.5)x_4 + 0.5x_5 = 0.5*24 + 0.5*27 = 25.5\). This is the 75th-percentile estimate under this definition.
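The arithmetic above can be checked mechanically. Below is a minimal sketch of this definition (in Python rather than the book's R, as a cross-check; the function name is ours):

```python
import math

def quantile_wa_nq(data, q):
    """Weighted average at X_{nq}: nq = i + f, Quantile(q) = (1-f)*x_i + f*x_{i+1}."""
    x = sorted(data)              # order statistics x_1 <= ... <= x_n
    n = len(x)
    i = math.floor(n * q)         # integer part of nq (assumes 1 <= nq < n)
    f = n * q - i                 # fractional part of nq
    return (1 - f) * x[i - 1] + f * x[i]   # x[i-1] is x_i in 1-based notation

print(quantile_wa_nq([21, 24, 50, 10, 23, 27], 0.75))  # 25.5
```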

64.3 Quantiles based on Weighted Averages at \(X_{(n+1)q}\)

64.3.1 Definition

When the observations are ordered in ascending order, the quantile at q is defined as:

\[ Quantile(q) = (1 - f)x_i + f x_{i+1} \]

Note that \((n+1)q = i + f\) where i is the integer part of (n+1)q and f is the fractional part.

64.3.2 Example

Assume the following original dataset: 21, 24, 50, 10, 23, 27. The number of observations \(n = 6\). Question: what is the percentile value for q = 0.75?

First we obtain the ordered dataset: 10, 21, 23, 24, 27, 50. Next, we compute \((n+1)q = (6+1) * 0.75 = 5.25\) and obtain the integer part (= 5) and the fractional part (= 0.25). Now we are able to apply the formula: \(Percentile(0.75) = (1-0.25)x_5 + 0.25x_6 = 0.75*27 + 0.25*50 = 32.75\). This is the 75th-percentile estimate under this definition.
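The same computation can be sketched in code (Python rather than the book's R, as a cross-check; the function name is ours):

```python
import math

def quantile_wa_n1q(data, q):
    """Weighted average at X_{(n+1)q}: (n+1)q = i + f, result = (1-f)*x_i + f*x_{i+1}."""
    x = sorted(data)
    n = len(x)
    g = (n + 1) * q
    i = math.floor(g)             # integer part of (n+1)q (assumes 1 <= (n+1)q <= n)
    f = g - i                     # fractional part of (n+1)q
    return (1 - f) * x[i - 1] + f * x[i]

print(quantile_wa_n1q([21, 24, 50, 10, 23, 27], 0.75))  # 32.75
```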

64.4 Quantiles based on the Empirical Distribution Function

64.4.1 Definition

When the observations are ordered in ascending order, the quantile at q is defined as:

\[ f=0 \Rightarrow Quantile(q) = x_i \]

\[ f>0 \Rightarrow Quantile(q) = x_{i+1} \]

Note that \(nq = i + f\) where i is the integer part of nq and f is the fractional part.

64.4.2 Example

Assume the following original dataset: 21, 24, 50, 10, 23, 27. The number of observations \(n = 6\). Question: what is the percentile value for q = 0.75?

First we obtain the ordered dataset: 10, 21, 23, 24, 27, 50. Next, we compute \(nq = 6 * 0.75 = 4.5\) and obtain the integer part (= 4) and the fractional part (= 0.5). Now we are able to apply the formula (for \(f > 0\)): \(Percentile(0.75) = x_5 = 27\). This is the 75th-percentile estimate under this definition.
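A sketch of this step-function definition (Python rather than the book's R, as a cross-check; the function name is ours):

```python
import math

def quantile_edf(data, q):
    """Empirical distribution function: x_i if f == 0, else x_{i+1}, with nq = i + f."""
    x = sorted(data)
    n = len(x)
    i = math.floor(n * q)
    f = n * q - i
    return x[i - 1] if f == 0 else x[i]

print(quantile_edf([21, 24, 50, 10, 23, 27], 0.75))  # 27
```

Because the result is always an observed value, this definition is a natural choice for discrete data.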

64.5 Quantiles based on Averaging

64.5.1 Definition

When the observations are ordered in ascending order, the quantile at q is defined as:

\[ f=0 \Rightarrow Quantile(q) = \frac{1}{2} \left( x_i + x_{i+1} \right) \]

\[ f>0 \Rightarrow Quantile(q) = x_{i+1} \]

Note that \(nq = i + f\) where i is the integer part of nq and f is the fractional part.

64.5.2 Example

Assume the following original dataset: 21, 24, 50, 10, 23, 27. The number of observations \(n = 6\). Question: what is the percentile value for q = 0.75?

First we obtain the ordered dataset: 10, 21, 23, 24, 27, 50. Next, we compute \(nq = 6 * 0.75 = 4.5\) and obtain the integer part (= 4) and the fractional part (= 0.5). Now we are able to apply the formula (for \(f > 0\)): \(Percentile(0.75) = x_5 = 27\). This is the 75th-percentile estimate under this definition.
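A sketch of this definition, which differs from the plain empirical distribution function only in the \(f = 0\) branch (Python rather than the book's R, as a cross-check; the function name is ours):

```python
import math

def quantile_avg(data, q):
    """EDF with averaging: (x_i + x_{i+1})/2 if f == 0, else x_{i+1}, with nq = i + f."""
    x = sorted(data)
    n = len(x)
    i = math.floor(n * q)
    f = n * q - i
    return (x[i - 1] + x[i]) / 2 if f == 0 else x[i]

print(quantile_avg([21, 24, 50, 10, 23, 27], 0.75))  # 27
```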

64.6 Quantiles based on Interpolation

64.6.1 Definition

When the observations are ordered in ascending order, the quantile at q is defined as:

\[ f=0 \Rightarrow Quantile(q) = x_{i+1} \]

\[ f>0 \Rightarrow Quantile(q) = x_{i+1} + f \left( x_{i+2} - x_{i+1} \right) \]

Note that \((n-1)q = i + f\) where i is the integer part of (n-1)q and f is the fractional part.

64.6.2 Example

Assume the following original dataset: 21, 24, 50, 10, 23, 27. The number of observations \(n = 6\). Question: what is the percentile value for q = 0.75?

First we obtain the ordered dataset: 10, 21, 23, 24, 27, 50. Next, we compute \((n-1)q = (6 - 1) * 0.75 = 3.75\) and obtain the integer part (= 3) and the fractional part (= 0.75). Now we are able to apply the formula (for \(f > 0\)): \(Percentile(0.75) = x_4 + 0.75 (x_5 - x_4) = 24 + 0.75(27 - 24) = 24 + 2.25 = 26.25\). This is the 75th-percentile estimate under this definition.
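This interpolation rule coincides with the default of R's built-in quantile() function (type = 7). A sketch as a cross-check (Python rather than the book's R; the function name is ours):

```python
import math

def quantile_interp(data, q):
    """(n-1)q = i + f; x_{i+1} if f == 0, else x_{i+1} + f*(x_{i+2} - x_{i+1})."""
    x = sorted(data)
    n = len(x)
    g = (n - 1) * q
    i = math.floor(g)
    f = g - i
    if f == 0:
        return x[i]               # x_{i+1} in 1-based notation
    return x[i] + f * (x[i + 1] - x[i])

print(quantile_interp([21, 24, 50, 10, 23, 27], 0.75))  # 26.25
```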

64.7 Quantiles based on Closest Observation

64.7.1 Definition

When the observations are ordered in ascending order, the quantile at q is defined as:

\[ Quantile(q) = x_i \]

Note that \(nq + \frac{1}{2} = i + f\) where i is the integer part of \(nq + \frac{1}{2}\) and f is the fractional part.

64.7.2 Example

Assume the following original dataset: 21, 24, 50, 10, 23, 27. The number of observations \(n = 6\). Question: what is the percentile value for q = 0.75?

First we obtain the ordered dataset: 10, 21, 23, 24, 27, 50. Next, we compute \(nq + \frac{1}{2} = 6 * 0.75 + 0.5 = 5.0\) and obtain the integer part (= 5) and the fractional part (= 0). Now we are able to apply the formula: \(Percentile(0.75) = x_5 = 27\). This is the 75th-percentile estimate under this definition.
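A sketch of the closest-observation rule (Python rather than the book's R, as a cross-check; the function name is ours):

```python
import math

def quantile_closest(data, q):
    """nq + 1/2 = i + f; Quantile(q) = x_i, the order statistic closest to position nq."""
    x = sorted(data)
    n = len(x)
    i = math.floor(n * q + 0.5)   # integer part of nq + 1/2
    return x[i - 1]

print(quantile_closest([21, 24, 50, 10, 23, 27], 0.75))  # 27
```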

64.8 Quantiles based on Statistics Graphics Toolkit (True Basic)

64.8.1 Definition

When the observations are ordered in ascending order, the quantile at q is defined as:

\[ f=0 \Rightarrow Quantile(q) = x_i \]

\[ f>0 \Rightarrow Quantile(q) = (1-f) x_i + f x_{i+1} \]

Note that \((n+1)q = i + f\) where i is the integer part of (n+1)q and f is the fractional part.

This definition is algebraically identical to the previous “Weighted Averages at \(X_{(n+1)q}\)” definition; it is included here as a historical alias used by the Statistics Graphics Toolkit (True Basic).

64.8.2 Example

Assume the following original dataset: 21, 24, 50, 10, 23, 27. The number of observations \(n = 6\). Question: what is the percentile value for q = 0.75?

First we obtain the ordered dataset: 10, 21, 23, 24, 27, 50. Next, we compute \((n+1)q = (6 + 1) * 0.75 = 5.25\) and obtain the integer part (= 5) and the fractional part (= 0.25). Now we are able to apply the formula (for \(f > 0\)): \(Percentile(0.75) = 0.75 * x_5 + 0.25 * x_6 = 0.75 * 27 + 0.25 * 50 = 20.25 + 12.5 = 32.75\). This is the 75th-percentile estimate under this definition.

64.9 Quantiles based on Excel (old versions)

64.9.1 Definition

When the observations are ordered in ascending order, the quantile at q is defined as:

\[ f=0 \Rightarrow Quantile(q) = x_i \]

\[ f=0.5 \Rightarrow Quantile(q) = \frac{1}{2} \left( x_i + x_{i+1} \right) \]

\[ f<0.5 \Rightarrow Quantile(q) = x_i \]

\[ f>0.5 \Rightarrow Quantile(q) = x_{i+1} \]

Note that \((n+1)q = i + f\) where i is the integer part of (n+1)q and f is the fractional part.

64.9.2 Example

Assume the following original dataset: 21, 24, 50, 10, 23, 27. The number of observations \(n = 6\). Question: what is the percentile value for q = 0.75?

First we obtain the ordered dataset: 10, 21, 23, 24, 27, 50. Next, we compute \((n+1)q = (6 + 1) * 0.75 = 5.25\) and obtain the integer part (= 5) and the fractional part (= 0.25). Now we are able to apply the formula (for \(f < 0.5\)): \(Percentile(0.75) = x_5 = 27\). This is the 75th-percentile estimate under this definition.
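A sketch of this legacy rounding rule (Python rather than the book's R, as a cross-check; the function name is ours). Note that the \(f = 0\) case of the definition is covered by the \(f < 0.5\) branch, since both return \(x_i\):

```python
import math

def quantile_excel_legacy(data, q):
    """(n+1)q = i + f; x_i if f < 0.5, midpoint if f == 0.5, x_{i+1} if f > 0.5."""
    x = sorted(data)
    n = len(x)
    g = (n + 1) * q
    i = math.floor(g)
    f = g - i
    if f == 0.5:
        return (x[i - 1] + x[i]) / 2
    return x[i - 1] if f < 0.5 else x[i]

print(quantile_excel_legacy([21, 24, 50, 10, 23, 27], 0.75))  # 27
```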

64.10 Harrell-Davis Quantiles

64.10.1 Definition

The formal definition of the Harrell-Davis quantile estimator is beyond the scope of this book. It is sufficient to know that the Harrell-Davis approach is statistically more efficient (it typically has a lower mean squared error) than the traditional estimators above, especially when the sample size is relatively small.
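The full derivation is indeed out of scope, but the idea can be stated briefly: the Harrell-Davis estimate is a weighted average of all order statistics, with weights taken from a \(\mathrm{Beta}((n+1)q, (n+1)(1-q))\) distribution. The sketch below (Python; a crude midpoint-rule integration stands in for the incomplete beta function, so treat the numbers as illustrative only):

```python
import math

def beta_pdf(t, a, b):
    """Density of the Beta(a, b) distribution at t."""
    c = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return c * t ** (a - 1) * (1 - t) ** (b - 1)

def beta_mass(lo, hi, a, b, steps=1000):
    # midpoint rule for P(lo <= Beta(a, b) <= hi); adequate when a, b >= 1
    h = (hi - lo) / steps
    return h * sum(beta_pdf(lo + (k + 0.5) * h, a, b) for k in range(steps))

def harrell_davis(data, q):
    x = sorted(data)
    n = len(x)
    a, b = (n + 1) * q, (n + 1) * (1 - q)
    # weight of the i-th order statistic: Beta mass on ((i-1)/n, i/n]
    w = [beta_mass((i - 1) / n, i / n, a, b) for i in range(1, n + 1)]
    return sum(wi * xi for wi, xi in zip(w, x))

est = harrell_davis([21, 24, 50, 10, 23, 27], 0.75)
```

Because every observation receives some weight, the estimate changes smoothly with \(q\), which is the source of the efficiency gain at small sample sizes.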

64.11 R Module

64.11.1 Public website

The Harrell-Davis Quantiles can be found on the public website:

  • https://compute.wessa.net/rwasp_harrell_davis.wasp

The public website also features a Percentiles module which uses other types of (more traditional) Quantiles:

  • https://compute.wessa.net/rwasp_percentiles.wasp

64.11.2 RFC

The Percentiles and Harrell-Davis Quantiles estimators are available in RFC under the menu “Descriptive / Quantiles”.

If you prefer to compute the traditional Quantiles on your local computer, the following code snippet can be used in the R console:

x <- rnorm(200, 4, 10)
x <- sort(x)

# Definitions 64.2 through 64.9; each function receives the sorted data,
# the sample size n, and the probability p.
q1 <- function(data, n, p) {        # weighted average at X_np
  np <- n * p
  i <- floor(np); f <- np - i
  (1 - f) * data[i] + f * data[i + 1]
}
q2 <- function(data, n, p) {        # weighted average at X_(n+1)p
  np <- (n + 1) * p
  i <- floor(np); f <- np - i
  (1 - f) * data[i] + f * data[i + 1]
}
q3 <- function(data, n, p) {        # empirical distribution function
  np <- n * p
  i <- floor(np); f <- np - i
  if (f == 0) data[i] else data[i + 1]
}
q4 <- function(data, n, p) {        # EDF with averaging
  np <- n * p
  i <- floor(np); f <- np - i
  if (f == 0) (data[i] + data[i + 1]) / 2 else data[i + 1]
}
q5 <- function(data, n, p) {        # interpolation at X_(n-1)p
  np <- (n - 1) * p
  i <- floor(np); f <- np - i
  if (f == 0) data[i + 1] else data[i + 1] + f * (data[i + 2] - data[i + 1])
}
q6 <- function(data, n, p) {        # closest observation
  i <- floor(n * p + 0.5)
  data[i]
}
q7 <- function(data, n, p) {        # Statistics Graphics Toolkit: identical to q2
  q2(data, n, p)
}
q8 <- function(data, n, p) {        # legacy Excel rule
  np <- (n + 1) * p
  i <- floor(np); f <- np - i
  if (f == 0) {
    data[i]
  } else if (f == 0.5) {
    (data[i] + data[i + 1]) / 2
  } else if (f < 0.5) {
    data[i]
  } else {
    data[i + 1]
  }
}

lx <- length(x)
qval <- array(NA, dim = c(99, 8))

# make the number of computed quantiles dependent on the sample size
mystep <- 25; mystart <- 25
if (lx > 10)   { mystep <- 10; mystart <- 10 }
if (lx > 20)   { mystep <- 5;  mystart <- 5 }
if (lx > 50)   { mystep <- 2;  mystart <- 2 }
if (lx >= 100) { mystep <- 1;  mystart <- 1 }

for (perc in seq(mystart, 99, mystep)) {
  p <- perc / 100
  qval[perc, 1] <- q1(x, lx, p)
  qval[perc, 2] <- q2(x, lx, p)
  qval[perc, 3] <- q3(x, lx, p)
  qval[perc, 4] <- q4(x, lx, p)
  qval[perc, 5] <- q5(x, lx, p)
  qval[perc, 6] <- q6(x, lx, p)
  qval[perc, 7] <- q7(x, lx, p)
  qval[perc, 8] <- q8(x, lx, p)
}

# keep only the rows that were actually computed
keep <- seq(mystart, 99, mystep)
mydf <- data.frame(`WA Xnp` = qval[keep, 1],
                   `WA X(n+1)p` = qval[keep, 2],
                   `EDF` = qval[keep, 3],
                   `EDF Av` = qval[keep, 4],
                   `EDF Int` = qval[keep, 5],
                   `Cl Obs` = qval[keep, 6],
                   `TBasic` = qval[keep, 7],
                   `Excel` = qval[keep, 8])
rownames(mydf) <- keep / 100
mydf

# we only plot the values for the fifth definition
plot(as.numeric(rownames(mydf)), mydf[, 5], xlab = "quantile", ylab = "value",
     main = "Quantiles (definition 5)")

            WA.Xnp    WA.X.n.1.p           EDF        EDF.Av       EDF.Int        Cl.Obs        TBasic         Excel
0.01 -16.715924561 -16.693461396 -16.715924561 -15.592766323 -14.492071250 -16.715924561 -16.693461396 -16.715924561
0.02 -13.921877924 -13.905445098 -13.921877924 -13.511057275 -13.116669452 -13.921877924 -13.905445098 -13.921877924
0.03 -12.990746726 -12.976840702 -12.990746726 -12.758979659 -12.541118616 -12.990746726 -12.976840702 -12.990746726
0.04 -12.425890517 -12.423175081 -12.425890517 -12.391947571 -12.360720062 -12.425890517 -12.423175081 -12.425890517
0.05 -11.543116818 -11.523204555 -11.543116818 -11.343994194 -11.164783833 -11.543116818 -11.523204555 -11.543116818
0.06 -10.797452777 -10.794539214 -10.797452777 -10.773173085 -10.751806957 -10.797452777 -10.794539214 -10.797452777
0.07 -10.720463487 -10.720066749 -10.714795801 -10.714795801 -10.715192539 -10.720463487 -10.720066749 -10.720463487
0.08 -10.354458781 -10.349953778 -10.354458781 -10.326302512 -10.302651245 -10.354458781 -10.349953778 -10.354458781
0.09  -9.772542050  -9.767058218  -9.772542050  -9.742076314  -9.717094410  -9.772542050  -9.767058218  -9.772542050
0.1   -9.369567606  -9.365150847  -9.369567606  -9.347483809  -9.329816772  -9.369567606  -9.365150847  -9.369567606
0.11  -8.821270192  -8.791241887  -8.821270192  -8.684777897  -8.578313906  -8.821270192  -8.791241887  -8.821270192
0.12  -8.111435216  -8.086310239  -8.111435216  -8.006747814  -7.927185389  -8.111435216  -8.086310239  -8.111435216
0.13  -7.824599510  -7.821713499  -7.824599510  -7.813499467  -7.805285436  -7.824599510  -7.821713499  -7.824599510
0.14  -7.773803200  -7.750871169  -7.610002978  -7.610002978  -7.632935009  -7.773803200  -7.750871169  -7.773803200
0.15  -7.532997179  -7.488768463  -7.532997179  -7.385568124  -7.282367785  -7.532997179  -7.488768463  -7.532997179
0.16  -6.946385845  -6.945916420  -6.946385845  -6.944918890  -6.943921360  -6.946385845  -6.945916420  -6.946385845
0.17  -6.779704721  -6.733214473  -6.779704721  -6.642968698  -6.552722922  -6.779704721  -6.733214473  -6.779704721
0.18  -6.246236560  -6.150742431  -6.246236560  -5.980975091  -5.811207750  -6.246236560  -6.150742431  -6.246236560
0.19  -5.502187811  -5.474957130  -5.502187811  -5.430528125  -5.386099121  -5.502187811  -5.474957130  -5.502187811
0.2   -5.164710111  -5.153497897  -5.164710111  -5.136679575  -5.119861254  -5.164710111  -5.153497897  -5.164710111
0.21  -4.671390475  -4.671328835  -4.671390475  -4.671243714  -4.671158593  -4.671390475  -4.671328835  -4.671390475
0.22  -4.548527868  -4.522841881  -4.548527868  -4.490150625  -4.457459368  -4.548527868  -4.522841881  -4.548527868
0.23  -4.388516828  -4.377457822  -4.388516828  -4.364475510  -4.351493199  -4.388516828  -4.377457822  -4.388516828
0.24  -4.316096124  -4.307551751  -4.316096124  -4.298295348  -4.289038944  -4.316096124  -4.307551751  -4.316096124
0.25  -4.079947130  -4.050063929  -4.079947130  -4.020180727  -3.990297525  -4.079947130  -4.050063929  -4.079947130
0.26  -3.473246638  -3.454072144  -3.473246638  -3.436372611  -3.418673078  -3.473246638  -3.454072144  -3.473246638
0.27  -3.245534137  -3.185150290  -3.245534137  -3.133712197  -3.082274105  -3.245534137  -3.185150290  -3.245534137
0.28  -2.168810651  -2.142745273  -2.075720015  -2.075720015  -2.101785393  -2.168810651  -2.142745273  -2.168810651
0.29  -2.070811903  -2.062411466  -2.070811903  -2.070811903  -2.050245315  -2.070811903  -2.062411466  -2.070811903
0.3   -2.037002697  -2.030136898  -2.037002697  -2.025559699  -2.020982500  -2.037002697  -2.030136898  -2.037002697
0.31  -1.965890486  -1.936065016  -1.965890486  -1.917784889  -1.899504762  -1.965890486  -1.936065016  -1.965890486
0.32  -1.605320572  -1.531190675  -1.605320572  -1.489492609  -1.447794542  -1.605320572  -1.531190675  -1.605320572
0.33  -0.596281505  -0.520885436  -0.596281505  -0.482045036  -0.443204637  -0.596281505  -0.520885436  -0.596281505
0.34  -0.366486036  -0.352071155  -0.366486036  -0.345287682  -0.338504209  -0.366486036  -0.352071155  -0.366486036
0.35  -0.274901821  -0.263945769  -0.274901821  -0.259250318  -0.254554867  -0.274901821  -0.263945769  -0.274901821
0.36   0.003113984   0.005607992   0.003113984   0.006577884   0.007547776   0.003113984   0.005607992   0.003113984
0.37   0.387715771   0.407663617   0.387715771   0.414672320   0.421681022   0.387715771   0.407663617   0.387715771
0.38   0.562667464   0.572921674   0.562667464   0.576159845   0.579398017   0.562667464   0.572921674   0.562667464
0.39   0.715495338   0.723665852   0.715495338   0.725970356   0.728274860   0.715495338   0.723665852   0.715495338
0.4    1.087178826   1.117467457   1.087178826   1.125039614   1.132611772   1.087178826   1.117467457   1.087178826
0.41   1.695679153   1.804477258   1.695679153   1.828359768   1.852242279   1.695679153   1.804477258   1.695679153
0.42   2.111192091   2.155906470   2.111192091   2.164423494   2.172940519   2.111192091   2.155906470   2.111192091
0.43   2.320818520   2.343886327   2.320818520   2.347641551   2.351396776   2.320818520   2.343886327   2.320818520
0.44   2.649583267   2.650795273   2.649583267   2.650960546   2.651125820   2.649583267   2.650795273   2.649583267
0.45   2.764758649   2.800098936   2.764758649   2.804025635   2.807952333   2.764758649   2.800098936   2.764758649
0.46   3.064414063   3.068369148   3.064414063   3.068713068   3.069056988   3.064414063   3.068369148   3.064414063
0.47   3.287773621   3.328490813   3.287773621   3.331089783   3.333688753   3.287773621   3.328490813   3.287773621
0.48   3.387892659   3.412686920   3.387892659   3.413720014   3.414753108   3.387892659   3.412686920   3.387892659
0.49   3.642843405   3.667090476   3.642843405   3.667585314   3.668080152   3.642843405   3.667090476   3.642843405
0.5    3.807705059   3.942983017   3.807705059   3.942983017   3.942983017   3.807705059   3.942983017   3.942983017
0.51   4.170923434   4.241254812   4.170923434   4.239875766   4.238496719   4.170923434   4.241254812   4.308828097
0.52   4.346172112   4.472752829   4.346172112   4.467884340   4.463015851   4.346172112   4.472752829   4.589596568
0.53   4.695779951   4.698863844   4.695779951   4.698689284   4.698514724   4.695779951   4.698863844   4.701598618
0.54   4.727270028   4.732293447   4.727270028   4.731921341   4.731549236   4.727270028   4.732293447   4.736572655
0.55   4.957652788   5.101049997   5.218374986   5.218374986   5.074977777   4.957652788   5.101049997   5.218374986
0.56   5.242431464   5.378200080   5.484875421   5.484875421   5.349106805   5.242431464   5.378200080   5.484875421
0.57   5.645387992   5.692783612   5.645387992   5.645387992   5.681142582   5.645387992   5.692783612   5.728538202
0.58   5.728904793   5.899060536   5.728904793   5.728904793   5.852121021   5.728904793   5.899060536   6.022276763
0.59   6.027017982   6.214433608   6.027017982   6.185844784   6.157255959   6.027017982   6.214433608   6.344671586
0.6    6.453739001   6.510850163   6.453739001   6.501331636   6.491813109   6.453739001   6.510850163   6.548924271
0.61   6.855828556   6.856560901   6.855828556   6.856428839   6.856296776   6.855828556   6.856560901   6.857029121
0.62   6.875313507   6.950495481   6.875313507   6.935944132   6.921392782   6.875313507   6.950495481   6.996574756
0.63   7.150261074   7.199864021   7.150261074   7.189628492   7.179392964   7.150261074   7.199864021   7.228995910
0.64   7.326055742   7.450529801   7.326055742   7.423301100   7.396072400   7.326055742   7.450529801   7.520546459
0.65   7.535425755   7.537496188   7.535425755   7.537018396   7.536540603   7.535425755   7.537496188   7.538611036
0.66   7.541473093   7.579733114   7.541473093   7.570457958   7.561182801   7.541473093   7.579733114   7.599442822
0.67   7.647253779   7.650758407   7.647253779   7.649869173   7.648979939   7.647253779   7.650758407   7.652484566
0.68   7.785682996   7.857408696   7.785682996   7.838422482   7.819436267   7.785682996   7.857408696   7.891161967
0.69   8.203210964   8.387360780   8.203210964   8.336652859   8.285944939   8.203210964   8.387360780   8.470094755
0.7    8.513693104   8.530572444   8.513693104   8.525749775   8.520927107   8.513693104   8.530572444   8.537806447
0.71   8.566152935   8.630305832   8.566152935   8.611331032   8.592356231   8.566152935   8.630305832   8.656509129
0.72   8.879994050   9.048284497   8.879994050   8.996862416   8.945440335   8.879994050   9.048284497   9.113730782
0.73   9.128040100   9.399984864   9.128040100   9.314303637   9.228622410   9.128040100   9.399984864   9.500567174
0.74   9.569823392   9.592319924   9.569823392   9.585023751   9.577727579   9.569823392   9.592319924   9.600224110
0.75   9.630714864   9.649418237   9.630714864   9.643183779   9.636949321   9.630714864   9.649418237   9.655652694
0.76   9.681168708   9.881360937   9.681168708   9.812874122   9.744387307   9.681168708   9.881360937   9.944579535
0.77   9.962688185  10.042186100   9.962688185  10.014310208   9.986434316   9.962688185  10.042186100  10.065932231
0.78  10.561454903  10.655507187  10.561454903  10.621744828  10.587982470  10.561454903  10.655507187  10.682034754
0.79  10.785551276  10.881821465  10.785551276  10.846481776  10.811142086  10.785551276  10.881821465  10.907412275
0.8   11.084823027  11.297186395  11.084823027  11.217550132  11.137913869  11.084823027  11.297186395  11.350277236
0.81  11.597064341  11.897233740  11.597064341  11.782354094  11.667474447  11.597064341  11.897233740  11.967643846
0.82  12.004520977  12.432263444  12.004520977  12.265339555  12.098415665  12.004520977  12.432263444  12.526158132
0.83  12.698592002  13.038596291  12.698592002  12.903413863  12.768231435  12.698592002  13.038596291  13.108235724
0.84  13.765011713  13.925287483  13.765011713  13.860413957  13.795540431  13.765011713  13.925287483  13.955816201
0.85  14.629557590  14.713481248  14.629557590  14.678924448  14.644367648  14.629557590  14.713481248  14.728291305
0.86  14.835971872  14.836235663  14.835971872  14.836125239  14.836014815  14.835971872  14.836235663  14.836278605
0.87  15.450384890  15.663322229  15.450384890  15.572762671  15.482203113  15.450384890  15.663322229  15.695140452
0.88  15.758680926  16.098548681  15.758680926  15.951787605  15.805026529  15.758680926  16.098548681  16.144894284
0.89  16.474240831  16.775425681  16.474240831  16.643445803  16.511465924  16.474240831  16.775425681  16.812650775
0.9   17.210435197  17.337557215  17.210435197  17.281058541  17.224559866  17.210435197  17.337557215  17.351681884
0.91  17.458258095  18.177340497  17.458258095  17.853358316  17.529376134  17.458258095  18.177340497  18.248458537
0.92  18.324252415  18.653534223  18.324252415  18.503209919  18.352885616  18.324252415  18.653534223  18.682167424
0.93  18.894348589  19.641181839  18.894348589  19.295871841  18.950561844  18.894348589  19.641181839  19.697395094
0.94  19.722272701  19.785473645  19.722272701  19.755890224  19.726306804  19.722272701  19.785473645  19.789507747
0.95  19.972394824  20.645310631  19.972394824  20.326561038  20.007811445  19.972394824  20.645310631  20.680727253
0.96  20.694155217  21.048210896  20.694155217  20.878559217  20.708907537  20.694155217  21.048210896  21.062963216
0.97  21.942717090  22.319577346  21.942717090  22.136974954  21.954372561  21.942717090  22.319577346  22.331232818
0.98  23.166258382  25.369776217  23.166258382  24.290502175  23.211228133  23.166258382  25.369776217  25.414745968
0.99  26.068333020  27.283915778  26.068333020  26.682263706  26.080611634  26.068333020  27.283915778  27.296194392

To compute the traditional Quantiles, the R code sorts the data and defines a separate function for each definition: the functions are called q1, q2, …, q8. To make the script a bit more intelligent, the number of quantiles depends on the number of observations \(N\). For instance, if \(N \le 10\) the script produces deciles, and if \(N \ge 100\) it computes percentiles.
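The same idea can be sketched directly in the R console: base R's quantile() function already implements the nine definitions compared by Hyndman and Fan (1996) through its type argument, so separate q1, …, q8 functions are only needed for didactic purposes. The data and the exact threshold used below are illustrative assumptions, not the book's script:

```r
set.seed(1)                 # hypothetical data, for illustration only
x <- rnorm(50)

# choose the resolution from the sample size (one reading of the rule above)
probs <- if (length(x) <= 10) seq(0.1, 0.9, 0.1) else seq(0.01, 0.99, 0.01)

# compare two definitions on the same probability grid
q6 <- quantile(x, probs = probs, type = 6)   # Hyndman-Fan definition 6
q7 <- quantile(x, probs = probs, type = 7)   # R's default (definition 7)
head(cbind(q6, q7))
```

For small samples the two columns differ visibly, which is exactly why knowing the definition matters.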

A better alternative is the Harrell-Davis Quantile estimator (Harrell and Davis 1982), which can also be computed in the R console:

library(Hmisc)

# we use the same data in variable x as before
par1 <- 0.01   # lowest quantile
par2 <- 0.99   # highest quantile
par3 <- 0.01   # step size
ylab <- 'value'
xlab <- 'quantile'
main <- 'Harrell-Davis Quantiles'
myseq <- seq(par1, par2, par3)
hd <- hdquantile(x, probs = myseq, se = TRUE, na.rm = FALSE, names = TRUE, weights = FALSE)
hd
plot(myseq, hd, col = 2, main = main, xlab = xlab, ylab = ylab)
grid()

        0.01         0.02         0.03         0.04         0.05         0.06         0.07         0.08         0.09         0.10 
-17.18760915 -14.04734019 -12.90368046 -12.19836073 -11.59423056 -11.06343568 -10.60259520 -10.16932644  -9.72379997  -9.26075437 
        0.11         0.12         0.13         0.14         0.15         0.16         0.17         0.18         0.19         0.20 
 -8.80067514  -8.36636118  -7.96627936  -7.59123261  -7.22256018  -6.84452670  -6.45271449  -6.05469144  -5.66447306  -5.29480995 
        0.21         0.22         0.23         0.24         0.25         0.26         0.27         0.28         0.29         0.30 
 -4.95115025  -4.62958466  -4.31910960  -4.00672757  -3.68280727  -3.34421222  -2.99401831  -2.63854053  -2.28375917  -1.93320498 
        0.31         0.32         0.33         0.34         0.35         0.36         0.37         0.38         0.39         0.40 
 -1.58803145  -1.24842019  -0.91479174  -0.58786623  -0.26778214   0.04674571   0.35835986   0.66995907   0.98317566   1.29730261 
        0.41         0.42         0.43         0.44         0.45         0.46         0.47         0.48         0.49         0.50 
  1.60925195   1.91455027   2.20887279   2.48942735   2.75566310   3.00915018   3.25284098   3.49011954   3.72403188   3.95692219 
        0.51         0.52         0.53         0.54         0.55         0.56         0.57         0.58         0.59         0.60 
  4.19047636   4.42599059   4.66460857   4.90732976   5.15474743   5.40664000   5.66162903   5.91709654   6.16945598   6.41475521 
        0.61         0.62         0.63         0.64         0.65         0.66         0.67         0.68         0.69         0.70 
  6.64949924   6.87152134   7.08069185   7.27923905   7.47149978   7.66305764   7.85944987   8.06483728   8.28110528   8.50773878 
        0.71         0.72         0.73         0.74         0.75         0.76         0.77         0.78         0.79         0.80 
  8.74254978   8.98305860   9.22811702   9.47924489   9.74119192  10.02151562  10.32941425  10.67434559  11.06476869  11.50679122 
        0.81         0.82         0.83         0.84         0.85         0.86         0.87         0.88         0.89         0.90 
 12.00225839  12.54650084  13.12726212  13.72683574  14.32783053  14.91988040  15.50293912  16.08494392  16.67594914  17.28261515 
        0.91         0.92         0.93         0.94         0.95         0.96         0.97         0.98         0.99 
 17.90461811  18.53332899  19.15759695  19.78380426  20.46667375  21.33932245  22.62487579  24.54210048  27.56824790 
attr(,"se")
     0.01      0.02      0.03      0.04      0.05      0.06      0.07      0.08      0.09      0.10      0.11      0.12      0.13 
2.7289443 1.1353442 0.7937828 0.7991865 0.8033645 0.7385770 0.7171491 0.7857519 0.8815757 0.9387016 0.9346940 0.8891761 0.8460287 
     0.14      0.15      0.16      0.17      0.18      0.19      0.20      0.21      0.22      0.23      0.24      0.25      0.26 
0.8435742 0.8885694 0.9563156 1.0117977 1.0296823 1.0043249 0.9499919 0.8933468 0.8615035 0.8697036 0.9143216 0.9764166 1.0332639 
     0.27      0.28      0.29      0.30      0.31      0.32      0.33      0.34      0.35      0.36      0.37      0.38      0.39 
1.0696908 1.0832951 1.0818576 1.0754991 1.0694036 1.0623196 1.0508821 1.0349410 1.0192664 1.0107312 1.0134579 1.0255046 1.0392414 
     0.40      0.41      0.42      0.43      0.44      0.45      0.46      0.47      0.48      0.49      0.50      0.51      0.52 
1.0448868 1.0349236 1.0071506 0.9652192 0.9167947 0.8704689 0.8328732 0.8070895 0.7927181 0.7872140 0.7876655 0.7921444 0.8000632 
     0.53      0.54      0.55      0.56      0.57      0.58      0.59      0.60      0.61      0.62      0.63      0.64      0.65 
0.8114932 0.8259503 0.8414406 0.8543459 0.8601782 0.8548054 0.8356839 0.8027831 0.7590305 0.7101715 0.6639628 0.6286475 0.6107775 
     0.66      0.67      0.68      0.69      0.70      0.71      0.72      0.73      0.74      0.75      0.76      0.77      0.78 
0.6128825 0.6321924 0.6615201 0.6919671 0.7160278 0.7299740 0.7351163 0.7377886 0.7477733 0.7750923 0.8263257 0.9028163 1.0016998 
     0.79      0.80      0.81      0.82      0.83      0.84      0.85      0.86      0.87      0.88      0.89      0.90      0.91 
1.1175173 1.2416516 1.3599511 1.4524097 1.4988696 1.4899308 1.4356256 1.3627286 1.2994759 1.2592539 1.2375471 1.2206384 1.1918038 
     0.92      0.93      0.94      0.95      0.96      0.97      0.98      0.99 
1.1341652 1.0467219 0.9719568 0.9936027 1.1922712 1.5975403 1.9287457 2.2231892 

To compute the Harrell-Davis Quantiles, the R code uses the hdquantile function from the Hmisc package to produce the output above. The dataset is a series of random numbers (N = 200) simulated with the rnorm function.
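For readers who want to see what hdquantile actually computes, the estimator itself is short enough to sketch without Hmisc: every sorted observation is weighted by an increment of the Beta(p(n+1), (1-p)(n+1)) distribution function, so the estimate is a smooth weighted average of all order statistics rather than an interpolation between two of them. This is a minimal illustration of the formula, not a replacement for the package (which also provides standard errors), and the simulated data are an assumption:

```r
# Harrell-Davis quantile estimate of x at probability p (minimal sketch)
hd_quantile <- function(x, p) {
  x <- sort(x)
  n <- length(x)
  a <- p * (n + 1)
  b <- (1 - p) * (n + 1)
  # weight of the i-th order statistic: increment of the Beta(a, b) CDF
  w <- pbeta((1:n) / n, a, b) - pbeta((0:(n - 1)) / n, a, b)
  sum(w * x)
}

set.seed(42)            # hypothetical data; the book's sample also used
x <- rnorm(200)         # rnorm with N = 200
hd_quantile(x, 0.5)     # smooth estimate of the median
```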

64.12 Purpose

Quantiles are used to describe the distribution of a random variable. They are also used as building blocks for other statistical methods, such as the QQ Plot, the Normal Probability Plot, and the PPCC Plot.

64.13 Pros & Cons

64.13.1 Pros

The use of Quantiles has the following advantages:

  • They are easily computed with most software packages (even though most packages don’t explain which definition is used).
  • They are relatively easy to interpret and convey a lot of information in a simple graph or table.
  • Educated readers are likely to be familiar with Quantiles, which makes them one of the preferred ways to report information about the distribution of a variable of interest.

64.13.2 Cons

The use of Quantiles has the following disadvantages:

  • Most software packages don’t explain the exact definition that is used to compute the Quantiles (see Hyndman and Fan 1996 for a systematic comparison of nine definitions). This can have serious consequences, especially for small samples or when computing Quantiles for distributions with heavy tails.
  • Most software packages do not use the Harrell-Davis Quantiles.
  • Traditional types of Percentiles, Quartiles, etc. often require the researcher to interpolate between adjacent quantile points.

64.14 Example

The analysis shown below illustrates the (traditional) Percentiles for a dataset containing student motivation scores. The column headings are abbreviated and correspond to the various definitions of quantiles (the Harrell-Davis Quantiles are not included).

(Interactive Shiny app with the percentile table.)

Suppose we wish to compute an interval which contains the middle 95% of observations. How should we determine this interval?

To find the interval \(\left[ Quantile(0.025), Quantile(0.975) \right]\) we have to look up the quantiles that are closest to the lower and upper bounds (0.025 and 0.975). The problem is that these exact probabilities do not appear in the output, so we need to approximate the bounds by interpolation:

  • The lower bound is found by averaging the Quantiles at p = 0.02 and p = 0.03. For instance, if we use definition 6, we compute the average of 11 and 13 (which is 12).
  • The upper bound is found by averaging the Quantiles at p = 0.97 and p = 0.98. For instance, if we use definition 6, we compute the average of 26 and 27 (which is 26.5).

According to definition 6, the interval [12, 26.5] contains the middle 95% of all observations.
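The interpolation steps above can be reproduced in the R console. The motivation scores themselves are not repeated here, so the data below are a hypothetical stand-in; with the real data only the line that creates x changes:

```r
set.seed(7)
x <- rnorm(200, mean = 19, sd = 4)   # hypothetical stand-in for the scores

p  <- seq(0.01, 0.99, 0.01)
q6 <- quantile(x, probs = p, type = 6)   # percentiles, definition 6

lower <- mean(q6[c("2%", "3%")])     # approximates Quantile(0.025)
upper <- mean(q6[c("97%", "98%")])   # approximates Quantile(0.975)
c(lower, upper)

# quantile() can also interpolate at the exact probabilities directly:
quantile(x, probs = c(0.025, 0.975), type = 6)
```

The direct call in the last line is usually preferable in practice, since it lets R do the interpolation at the exact probabilities instead of averaging neighbouring table entries by hand.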

64.15 Task

For the same data as in previous example, compute the 95% interval by using:

  • the traditional quantiles with an appropriate step size

  • the Harrell-Davis method

Which method is best?

Harrell, Frank E., and C. E. Davis. 1982. “A New Distribution-Free Quantile Estimator.” Biometrika 69 (3): 635–40. https://doi.org/10.1093/biomet/69.3.635.
Hyndman, Rob J., and Yanan Fan. 1996. “Sample Quantiles in Statistical Packages.” The American Statistician 50 (4): 361–65. https://doi.org/10.1080/00031305.1996.10473566.

© 2026 Patrick Wessa. Provided as-is, without warranty.
