# Kolmogorov-Smirnov test: A common error

The Kolmogorov-Smirnov test is a non-parametric test of whether the true distribution of an i.i.d. sample is some **specific** distribution. However, it is often used to test whether the true distribution belongs to some **family** of distributions (for example, the Gaussian family) whose parameters are estimated from the sample. Although the idea may seem sound at first sight, it biases the distribution of the *p-value*, so that the test generally accepts every distribution with a very high *p-value*. We will see how estimating the parameters, which looks innocent at first sight, systematically leads to these results.

First, recall the Kolmogorov-Smirnov test. We observe a sample \(X = (X_{i})_{1 \leq i \leq n}\) of i.i.d. random variables with cumulative distribution function \(F_{0}\), assumed to be continuous. The following convergence result holds:

\[
K_{n} := \sqrt{n}\,\sup_{x \in \mathbb{R}}\left|\hat{F}_{n}(x) - F_{0}(x)\right| \overset{\text{law}}{\underset{n \rightarrow +\infty}{\longrightarrow}} \sup_{t \in [0,1]}\left|\mathbb{B}_{t}\right|.
\]

Here, \(\mathbb{B}\) is a Brownian bridge on \([0,1]\) and \(\hat{F}_{n}\) is the empirical cumulative distribution function associated with the sample \(X\):

\[
\hat{F}_{n}(x) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}_{(-\infty, x]}(X_{i}).
\]
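To make the statistic concrete, here is a minimal sketch that computes it by hand and compares it with the value returned by `ks.test`. It relies on the fact that, for a continuous \(F_{0}\), the supremum of \(|\hat{F}_{n} - F_{0}|\) is reached at (or just before) one of the sample points. The sample size, the seed and the variable names are arbitrary choices made for this illustration.

```r
# Hand-computed one-sample KS statistic, compared with stats::ks.test.
set.seed(1)                       # arbitrary seed, for reproducibility only
n <- 50
x <- rnorm(n, mean = 4, sd = 1)   # illustrative sample

# F0 (here the fully specified N(4, 1) CDF) at the ordered sample
F0 <- pnorm(sort(x), mean = 4, sd = 1)

# The ECDF jumps from (i - 1)/n to i/n at the i-th order statistic,
# so it is enough to look at both sides of each jump.
D_plus   <- max(seq_len(n) / n - F0)
D_minus  <- max(F0 - (seq_len(n) - 1) / n)
D_manual <- max(D_plus, D_minus)

K_n <- sqrt(n) * D_manual         # scaled statistic used in the limit theorem

D_ks <- unname(ks.test(x, "pnorm", mean = 4, sd = 1)$statistic)
print(c(manual = D_manual, ks.test = D_ks, K_n = K_n))
```

Note that `ks.test` reports the unscaled distance, so it has to be multiplied by \(\sqrt{n}\) to obtain \(K_{n}\).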

Given the convergence of \(K_{n}\), we can perform a non-parametric test on the distribution of \(X\) when we **fix** \(F_{0}\). Here is the key point: \(F_{0}\) cannot be estimated. The usual error occurs when one wants to test whether the distribution of \(X\) belongs to a specific *family* of distributions (for example, whether it is a Normal distribution, or a Gamma distribution). One then estimates the parameters, which turns \(F_{0}\) into an estimate and therefore into a random function (it depends on the sample).

More precisely, we would like to test \(\mathcal{H}_{0} : F \in \{F_{\theta}, \theta \in \Theta\}\), whereas the test only allows \(\mathcal{H}_{0} : F = F_{0}\).

This makes \(\hat{F}_{n}\) closer, in probability, to \(F_{0}\), since the latter is no longer a fixed function: it is the distribution in the *family* that best matches the sample (by estimating the parameters from the sample, we actually replace \(F_{0}\) by \(F_{\hat{\theta}}\)). As a result, the test rejects less often than expected: its actual level falls below the nominal one when the true distribution \(F\) belongs to the chosen family, and its power is reduced when it does not.
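A small sketch of this effect: on the same Gaussian samples, we compare the average Kolmogorov-Smirnov distance to the true \(\mathcal{N}(4,1)\) with the average distance to the fitted normal distribution. The helper `ks_stat`, the number of replications and the sample size are assumptions chosen only to keep the example quick to run.

```r
# Average KS distance: fixed F0 versus F0 fitted on the sample itself.
set.seed(2)       # arbitrary seed
reps <- 2000      # number of simulated samples (illustrative)
m <- 20           # size of each sample (illustrative)

# Hypothetical helper: KS distance between x and N(mu0, s0^2)
ks_stat <- function(x, mu0, s0) {
  unname(ks.test(x, "pnorm", mean = mu0, sd = s0)$statistic)
}

D_fixed <- numeric(reps)
D_est   <- numeric(reps)
for (i in seq_len(reps)) {
  x <- rnorm(m, mean = 4, sd = 1)
  D_fixed[i] <- ks_stat(x, 4, 1)              # true parameters
  D_est[i]   <- ks_stat(x, mean(x), sd(x))    # parameters estimated from x
}

print(c(mean_fixed = mean(D_fixed), mean_estimated = mean(D_est)))
```

On average the distance to the fitted distribution is smaller, which is what pushes the *p-values* upwards when they are read from the usual reference distribution.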

The intuition is the following. Recall the degrees of freedom of the \(\chi^2\) test: \(d-k-1\), where \(d\) is the number of classes chosen and \(k\) the number of parameters estimated. If \(k > 0\), the asymptotic distribution is closer to zero, and this effect persists asymptotically. Here we have the same problem, but we cannot easily compute the correct distribution when \(k > 0\).
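Although the correct distribution has no simple closed form, it can be approximated by simulation. The sketch below is a Monte Carlo calibration in the spirit of the Lilliefors test for the Gaussian family; it is not part of the analysis in this post. It uses the fact that the distance to the fitted normal is invariant under location and scale changes, so simulating from \(\mathcal{N}(0,1)\) is enough. The names `ks_fitted`, `D_null` and `mc_pvalue` are hypothetical.

```r
# Monte Carlo approximation of the null distribution of the KS distance
# when the mean and standard deviation are estimated (Gaussian family only).
set.seed(3)       # arbitrary seed
m <- 10           # sample size
n_sim <- 2000     # number of simulated null samples (illustrative)

# KS distance between a sample and the normal fitted on that sample
ks_fitted <- function(x) {
  unname(ks.test(x, "pnorm", mean = mean(x), sd = sd(x))$statistic)
}

# The statistic is location-scale invariant, so N(0, 1) is enough here
D_null <- replicate(n_sim, ks_fitted(rnorm(m)))

# Calibrated p-value: share of simulated distances at least as large
mc_pvalue <- function(x) mean(D_null >= ks_fitted(x))

crit_mc <- unname(quantile(D_null, 0.95))   # simulated 5% critical value
x <- rnorm(m, mean = 4, sd = 1)
p_naive <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))$p.value
print(c(critical_value = crit_mc, p_naive = p_naive, p_mc = mc_pvalue(x)))
```

The simulated critical value is noticeably smaller than the one used when \(F_{0}\) is fixed, which is the same phenomenon as the loss of degrees of freedom in the \(\chi^2\) test.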

We now illustrate the expected results through simulations.

This test is implemented in R with the function `ks.test`. Note that, if the sample is smaller than 100, by default, the test does not use the convergence above but the exact distribution (see documentation).

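```r
set.seed(0)

n <- 10^6   # number of simulated samples
m <- 10     # size of each sample

mu <- 4
s <- 1

# One sample per row
X <- matrix(rnorm(n*m, mu, s), n, m)

# p-value of the KS test against the true N(mu, s^2), for each sample
pvaleur <- numeric(n)
for(i in 1:n)
    pvaleur[i] <- ks.test(X[i, ], "pnorm", mean=mu, sd=s)$p.value

# Under H0 the p-values should be uniform on [0, 1]
ks.test(pvaleur, "punif")
plot(density(pvaleur))
```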

Here, we simulate \(n = 10^6\) samples, each of size \(m = 10\), from the \(\mathcal{N}(4,1)\) distribution. Then, we apply `ks.test` to every sample and get the associated *p-value* for the true distribution of the sample. If we are right, under \(\mathcal{H}_{0}\), the *p-value* follows a \(\mathcal{U}\left([0, 1]\right)\) distribution. This is verified by the test itself with `ks.test(pvaleur, "punif")`, which returns `p-value = 0.1337`, and can also be seen graphically using the `density` function.

Then, if we set the test at level \(\alpha\), the probability of wrongly rejecting \(\mathcal{H}_{0}\) is exactly \(\alpha\).
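This can be checked quickly with a much smaller simulation than the one above. The sketch below uses far fewer replications, so the figures are only rough estimates; the seed, the number of replications and the levels in `alphas` are arbitrary choices.

```r
# Empirical rejection rate at several levels, with F0 fully specified.
set.seed(4)     # arbitrary seed
n <- 5000       # far fewer replications than the 10^6 used in the text
m <- 10
mu <- 4
s <- 1

X <- matrix(rnorm(n * m, mu, s), n, m)   # one sample per row
pvaleur <- apply(X, 1, function(x) {
  ks.test(x, "pnorm", mean = mu, sd = s)$p.value
})

alphas <- c(0.01, 0.05, 0.10)
rejection_rate <- sapply(alphas, function(a) mean(pvaleur < a))
print(rbind(alpha = alphas, rejection_rate = rejection_rate))
```

Each rejection rate should be close to the corresponding level, up to Monte Carlo noise.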

However, suppose we now estimate the parameters, using the following code with a slight modification:

```r
set.seed(0)

n <- 10^6   # number of simulated samples
m <- 10     # size of each sample

mu <- 4
s <- 1

# One sample per row
X <- matrix(rnorm(n*m, mu, s), n, m)

# Same test, but the mean and sd are now estimated from each sample
pvaleur <- numeric(n)
for(i in 1:n)
    pvaleur[i] <- ks.test(X[i, ], "pnorm", mean=mean(X[i,]), sd=sd(X[i,]))$p.value

ks.test(pvaleur, "punif")
plot(density(pvaleur))
```

This time we get `p-value < 2.2e-16` for the uniformity test, and the density plot shows that the distribution is concentrated near one.

We can see that there is almost no mass below \(\alpha=0.05\); more precisely, `mean(pvaleur < 0.05)` returns `0.000028`, which is far away from \(\alpha\).

This effect still holds if we increase the size of the sample. For \(m=40\), `mean(pvaleur < 0.05)` returns `0.000131`, which is better but still far away. For \(m=120\) we get `0.00011`. Below are the two associated `plot(density(pvaleur))` (left \(m=40\) and right \(m=120\)).
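The persistence of the effect can be reproduced at a smaller scale. The following sketch repeats the experiment with estimated parameters for the three sample sizes, using only a few thousand replications, so the rates it prints are much noisier than the figures quoted above. The vector `sizes` and the seed are illustrative choices.

```r
# Rejection rate at level 0.05 with estimated parameters, for several m.
set.seed(5)     # arbitrary seed
n <- 2000       # replications per sample size (illustrative)
mu <- 4
s <- 1
sizes <- c(10, 40, 120)

rate <- sapply(sizes, function(m) {
  X <- matrix(rnorm(n * m, mu, s), n, m)
  p <- apply(X, 1, function(x) {
    ks.test(x, "pnorm", mean = mean(x), sd = sd(x))$p.value
  })
  mean(p < 0.05)
})
names(rate) <- paste0("m=", sizes)
print(rate)
```

With so few replications, most runs will simply show rates at or very near zero for every sample size, far below the nominal \(0.05\).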

The effect looks persistent. At first sight, there is no big problem, since it is good to accept \(\mathcal{H}_{0}\) when it is true. However, this implies a **strong** lack of power of the test.

Now, consider the same setting when the true distribution is a Gamma distribution \(\Gamma(\alpha, \beta)\) with mean \(\mu = 4\) and standard deviation \(\sigma = 1\), i.e. \(\alpha = \frac{\mu^2}{\sigma^{2}}, \beta = \frac{\mu}{\sigma^{2}}\). We compute the *p-value* distribution for \(m \in \{10, 40, 120, 480\}\). We simulate from the Gamma distribution and test it, first against the \(\mathcal{N}(4,1)\) distribution, and then against a normal distribution with estimated mean and standard deviation.
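As a quick check of this parametrization, assuming \(\alpha\) is the shape and \(\beta\) the rate of the Gamma distribution (which is how `rgamma` reads its second and third positional arguments), the mean and variance are:

\[
\mathbb{E}[X] = \frac{\alpha}{\beta} = \frac{\mu^{2}/\sigma^{2}}{\mu/\sigma^{2}} = \mu,
\qquad
\operatorname{Var}(X) = \frac{\alpha}{\beta^{2}} = \frac{\mu^{2}}{\sigma^{2}} \cdot \frac{\sigma^{4}}{\mu^{2}} = \sigma^{2}.
\]

With \(\mu = 4\) and \(\sigma = 1\), this gives \(\alpha = 16\) and \(\beta = 4\).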

The code is:

```r
set.seed(0)

n <- 10^6   # number of simulated samples
m <- 10     # size of each sample

mu <- 4
s <- 1

# Gamma parameters (shape alpha, rate beta) matching mean mu and sd s
beta <- mu/s^2
alpha <- mu^2/s^2

# One Gamma sample per row
X <- matrix(rgamma(n*m, alpha, beta), n, m)

# Test each sample against the fixed N(mu, s^2)
pvaleur <- numeric(n)
for(i in 1:n)
    pvaleur[i] <- ks.test(X[i, ], "pnorm", mean = mu, sd = s)$p.value

ks.test(pvaleur, "punif")
plot(density(pvaleur))
```

Here are the results:

- \(m=10\), `mean(pvaleur < 0.05)`: `0.053861`,
- \(m=40\), `mean(pvaleur < 0.05)`: `0.067884`,
- \(m=120\), `mean(pvaleur < 0.05)`: `0.09859`,
- \(m=480\), `mean(pvaleur < 0.05)`: `0.280049`.

Note that the rejection rate is above \(\alpha = 0.05\), which is always the case when the test is well-posed. We can notice, by the way, how it is almost impossible to reject, with a very small sample, a distribution which has the same first two moments as the true one. Below are the graphics which correspond to `plot(density(pvaleur))` for the four cases above, from left to right.

As you can see, the distribution of the *p-value* starts to concentrate near 0 as \(m\) increases. However, this is slow: even with a sample of size \(m=480\), the test rejects at level \(\alpha=0.05\) only \(\approx 28\%\) of the time. Still, at this stage what we did is correct: the power of the test is weak for small samples when the distributions are close, which is the case here (a graphic with both densities is given later).

Now, if we make the mistake of plugging in the estimated mean and standard deviation as parameters, rejection becomes very difficult. The code is:

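```r
set.seed(0)

n <- 10^6   # number of simulated samples
m <- 10     # size of each sample

mu <- 4
s <- 1

# Gamma parameters (shape alpha, rate beta) matching mean mu and sd s
beta <- mu/s^2
alpha <- mu^2/s^2

# One Gamma sample per row
X <- matrix(rgamma(n*m, alpha, beta), n, m)

# Test each sample against a normal with mean and sd estimated from it
pvaleur <- numeric(n)
for(i in 1:n)
    pvaleur[i] <- ks.test(X[i, ], "pnorm", mean = mean(X[i, ]), sd = sd(X[i, ]))$p.value

ks.test(pvaleur, "punif")
plot(density(pvaleur))
```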

Again, we test for \(m \in \{10, 40, 120, 480\}\). Here are the results:

- \(m=10\), `mean(pvaleur < 0.05)`: `0.00006`,
- \(m=40\), `mean(pvaleur < 0.05)`: `0.000984`,
- \(m=120\), `mean(pvaleur < 0.05)`: `0.005458`,
- \(m=480\), `mean(pvaleur < 0.05)`: `0.12026`.

As we can see, when \(m\) is small, the power of this wrong application of the test is far lower: most *p-values* are high and close to 1. Before concluding, here is the graphic with both distributions.
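The two uses of the test can also be compared on the same simulated Gamma samples. The sketch below does this for \(m = 120\) with a reduced number of replications; the object names `p_fixed`, `p_est` and `power` are hypothetical, and the estimates are noisier than the figures reported above.

```r
# Power at level 0.05 against Gamma data: fixed versus estimated parameters.
set.seed(6)     # arbitrary seed
n <- 2000       # replications (illustrative)
m <- 120
mu <- 4
s <- 1
alpha <- mu^2 / s^2   # Gamma shape
beta  <- mu / s^2     # Gamma rate

X <- matrix(rgamma(n * m, shape = alpha, rate = beta), n, m)

p_fixed <- apply(X, 1, function(x) {
  ks.test(x, "pnorm", mean = mu, sd = s)$p.value
})
p_est <- apply(X, 1, function(x) {
  ks.test(x, "pnorm", mean = mean(x), sd = sd(x))$p.value
})

power <- c(fixed = mean(p_fixed < 0.05), estimated = mean(p_est < 0.05))
print(power)
```

Using the same samples for both versions removes one source of noise from the comparison.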

```r
# Uses mu, s, alpha and beta as defined in the previous code
x <- seq(0, 10, 0.01)
plot(x, dgamma(x, alpha, beta), type="l", main="Comparison", col="red", ylab=expression(f(x)))
lines(x, dnorm(x, mu, s), col="darkblue")
legend("topleft", legend=c(expression(N(mu, sigma)), expression(Ga(alpha, beta))), lty=1, col=c("darkblue", "red"))
```

As we can see, both distributions look close, as is often the case when they are given the same mean and variance. However, both the left and right tails are really different in relative value (but not so much in absolute value).
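To put numbers on this remark, the sketch below compares the tail probabilities of the two distributions three standard deviations away from the mean. The cut-off points `lower_x` and `upper_x` are an arbitrary choice made for this illustration.

```r
# Tail probabilities of Gamma(alpha, beta) versus N(mu, s^2) at mu -/+ 3s.
mu <- 4
s <- 1
alpha <- mu^2 / s^2   # Gamma shape
beta  <- mu / s^2     # Gamma rate

lower_x <- mu - 3 * s   # illustrative cut-off for the left tail
upper_x <- mu + 3 * s   # illustrative cut-off for the right tail

tails <- rbind(
  lower = c(gamma  = pgamma(lower_x, shape = alpha, rate = beta),
            normal = pnorm(lower_x, mu, s)),
  upper = c(gamma  = pgamma(upper_x, shape = alpha, rate = beta,
                            lower.tail = FALSE),
            normal = pnorm(upper_x, mu, s, lower.tail = FALSE))
)

# Relative (ratio) and absolute (difference) discrepancy in each tail
tails <- cbind(tails,
               ratio      = tails[, "gamma"] / tails[, "normal"],
               difference = tails[, "gamma"] - tails[, "normal"])
print(signif(tails, 3))
```

The Gamma distribution puts more mass than the normal in the right tail and much less in the left tail, while the absolute differences stay small.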

**Conclusion**: The Kolmogorov-Smirnov test is often not very powerful for rejecting \(\mathcal{H}_{0}\). It is designed to test the sample against a *single* distribution. If we test against a *family* of distributions (e.g. the Gaussian family) whose parameters we estimate (here \(\hat{\mu}\) and \(\hat{\sigma}^{2}\)), then for small samples we have almost no chance of rejecting \(\mathcal{H}_{0}\): the *p-values* are artificially high, very often above 0.5. Used this way, the test accepts almost anything, regardless of the hypothesis and the sample.

### Nicolas Baradel

January 31, 2017