The above chart shows an example application of the DKW inequality in constructing confidence bounds (in purple) around an empirical distribution function (in light blue). In this random draw, the true CDF (orange) is entirely contained within the DKW bounds.
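Here $F(x)$ is the probability that a single random variable $X$ is less than or equal to $x$, and the empirical distribution function $F_n(x) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{X_i \le x\}$ is the fraction of the random variables $X_1, \dots, X_n$ that are less than or equal to $x$.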
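The Dvoretzky–Kiefer–Wolfowitz inequality bounds the probability that the random function $F_n$ differs from $F$ by more than a given constant $\varepsilon > 0$ anywhere on the real line. More precisely, there is the one-sided estimate
$$\Pr\Bigl(\sup_{x\in\mathbb{R}}\bigl(F_n(x)-F(x)\bigr)>\varepsilon\Bigr)\le e^{-2n\varepsilon^2}\qquad\text{for every }\varepsilon\ge\sqrt{\tfrac{1}{2n}\ln 2},$$
which also implies the two-sided estimate
$$\Pr\Bigl(\sup_{x\in\mathbb{R}}\bigl|F_n(x)-F(x)\bigr|>\varepsilon\Bigr)\le 2e^{-2n\varepsilon^2}\qquad\text{for every }\varepsilon>0.$$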
This strengthens the Glivenko–Cantelli theorem by quantifying the rate of convergence as $n$ tends to infinity. It also estimates the tail probability of the Kolmogorov–Smirnov statistic. The inequalities above follow from the case where $F$ is the uniform distribution on $[0,1]$,[5] because $F_n$ has the same distribution as $G_n(F)$, where $G_n$ is the empirical distribution function of independent Uniform(0,1) random variables $U_1, U_2, \dots, U_n$, and because
$$\sup_{x\in\mathbb{R}}\bigl|F_n(x)-F(x)\bigr| \;\overset{d}{=}\; \sup_{x\in\mathbb{R}}\bigl|G_n(F(x))-F(x)\bigr| \;\le\; \sup_{0\le t\le 1}\bigl|G_n(t)-t\bigr|,$$
with equality if and only if $F$ is continuous.
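The reduction to the uniform case can be checked numerically. The following is a minimal Monte Carlo sketch, not part of the original argument: it draws Uniform(0,1) samples, computes $\sup_{0\le t\le 1}|G_n(t)-t|$ exactly, and compares how often this exceeds $\varepsilon$ with the two-sided bound $2e^{-2n\varepsilon^2}$. The function names, the sample size, the value of $\varepsilon$, the number of trials and the seed are all illustrative assumptions.

```python
import math
import random


def ks_distance_uniform(sample):
    """Exact sup over t in [0, 1] of |G_n(t) - t| for a Uniform(0, 1) sample."""
    xs = sorted(sample)
    n = len(xs)
    # G_n is a step function, so the supremum is reached at a jump point:
    # compare each order statistic with G_n just after (i/n) and just
    # before ((i - 1)/n) the jump.
    return max(
        max(i / n - x, x - (i - 1) / n)
        for i, x in enumerate(xs, start=1)
    )


def exceedance_rate(n, eps, trials, seed=0):
    """Fraction of simulated samples whose sup-distance exceeds eps."""
    rng = random.Random(seed)  # fixed seed (assumption) for reproducibility
    hits = 0
    for _ in range(trials):
        sample = [rng.random() for _ in range(n)]
        if ks_distance_uniform(sample) > eps:
            hits += 1
    return hits / trials


n, eps = 50, 0.2  # illustrative values, not taken from the article
bound = 2 * math.exp(-2 * n * eps ** 2)
rate = exceedance_rate(n, eps, trials=2000)
print(f"empirical exceedance rate = {rate:.4f}, DKW bound = {bound:.4f}")
```

Because the estimate is itself random, the simulated rate should fall below the bound only up to Monte Carlo error; increasing the number of trials tightens the comparison.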
Kaplan–Meier estimator
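A Dvoretzky–Kiefer–Wolfowitz-type inequality also holds for the Kaplan–Meier estimator, which is the analog of the empirical distribution function for right-censored data:
$$\Pr\Bigl(\sqrt{n}\,\sup_{t}\bigl|(1-G(t))\,(F_n(t)-F(t))\bigr|>\varepsilon\Bigr)\le 2.5\,e^{-2\varepsilon^2+C\varepsilon}$$
for every $\varepsilon > 0$ and for some constant $C < \infty$, where $F_n$ is the Kaplan–Meier estimator and $G$ is the censoring distribution function.[6]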
The Dvoretzky–Kiefer–Wolfowitz inequality is one method for generating CDF-based confidence bounds and producing a confidence band, which is sometimes called the Kolmogorov–Smirnov confidence band. The purpose of this band is to contain the entire CDF at the specified confidence level, whereas alternative approaches aim to achieve the confidence level only at each individual point, which can allow for a tighter bound. The DKW bounds run parallel to the empirical CDF and lie equally far above and below it. Because the band has the same width everywhere, the rate of violations differs across the support of the distribution. In particular, it is more common for a CDF to fall outside the DKW band near the median of the distribution than near its endpoints.
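The interval that contains the true CDF, $F(x)$, with probability at least $1-\alpha$ is often specified as
$$F_n(x)-\varepsilon \le F(x) \le F_n(x)+\varepsilon \qquad\text{where }\varepsilon=\sqrt{\frac{\ln(2/\alpha)}{2n}}.$$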
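The half-width of the band comes from setting the two-sided bound $2e^{-2n\varepsilon^2}$ equal to $\alpha$ and solving for $\varepsilon$, which gives $\varepsilon=\sqrt{\ln(2/\alpha)/(2n)}$. A minimal sketch of the resulting band is shown below. The function names `dkw_epsilon` and `dkw_band` and the sample data are hypothetical, and clipping the band to $[0,1]$ is a common convenience assumed here rather than part of the inequality itself.

```python
import math
from bisect import bisect_right


def dkw_epsilon(n, alpha):
    """Half-width eps that solves 2 * exp(-2 * n * eps**2) = alpha."""
    if n <= 0 or not 0 < alpha < 1:
        raise ValueError("need n > 0 and 0 < alpha < 1")
    return math.sqrt(math.log(2 / alpha) / (2 * n))


def dkw_band(sample, alpha=0.05):
    """Return (eps, band), where band(x) gives (lower, F_n(x), upper)."""
    xs = sorted(sample)
    n = len(xs)
    eps = dkw_epsilon(n, alpha)

    def band(x):
        # Empirical CDF: fraction of observations <= x (ties handled
        # by bisect_right).
        fn = bisect_right(xs, x) / n
        # Clip to [0, 1], since a CDF cannot leave that range (assumption:
        # clipping is a convenience, not part of the DKW statement).
        return max(fn - eps, 0.0), fn, min(fn + eps, 1.0)

    return eps, band


# Made-up data, for illustration only.
sample = [0.8, 2.1, 2.1, 3.5, 4.0, 5.7, 6.2, 7.9, 8.4, 9.0]
eps, band = dkw_band(sample, alpha=0.05)
lower, fn, upper = band(4.0)
print(f"eps = {eps:.4f}; at x = 4.0: {lower:.4f} <= F(x) <= {upper:.4f}")
```

The half-width shrinks at rate $1/\sqrt{n}$, so halving the width of the band requires roughly four times as many observations.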
Kosorok, M. R. (2008), "Chapter 11: Additional Empirical Process Results", Introduction to Empirical Processes and Semiparametric Inference, Springer, p. 210, ISBN 9780387749778