# Kolmogorov-Smirnov test: A common error

The Kolmogorov-Smirnov test is a non-parametric test of whether the true distribution of an i.i.d. sample is one specific, fully specified distribution. However, it is often used to test whether the true distribution belongs to some family of distributions (for example, the Gaussian family), with the parameters estimated from the sample. Although the idea may seem reasonable at first sight, it biases the distribution of the p-value, which in general leads to accepting almost any distribution with a very high p-value. We will see how estimating the parameters, which looks innocent at first sight, systematically leads to those results.

First, recall the Kolmogorov-Smirnov test. We observe a sample $$X = (X_{i})_{1 \leq i \leq n}$$ of i.i.d. random variables with cumulative distribution function $$F_{0}$$, assumed to be continuous. Then the following convergence result holds:

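$K_{n} := \sqrt{n}\,\sup_{x \in \mathbb{R}}\left|\hat{F}_{n}(x) - F_{0}(x)\right| \overset{\text{law}}{\underset{n \rightarrow +\infty}{\longrightarrow}} \sup_{t \in [0,1]}\left|\mathbb{B}_{t}\right|.$

Here, $$\mathbb{B}$$ is a Brownian bridge on $$[0,1]$$ and $$\hat{F}_{n}$$ is the empirical cumulative distribution function associated with the sample $$X$$: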

$\hat{F}_{n}(x) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}_{]-\infty, x]}(X_{i}).$
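To make the statistic concrete, here is a minimal sketch that computes the largest gap between $$\hat{F}_{n}$$ and $$F_{0}$$ over all $$x$$ by hand and compares it with the value reported by ks.test. The seed, the sample size and the choice $$F_{0} = \mathcal{N}(4,1)$$ are assumptions made for this illustration only.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only
x <- rnorm(10, mean = 4, sd = 1)   # one sample of size n = 10
n <- length(x)

# F0 evaluated at the ordered observations
F0 <- pnorm(sort(x), mean = 4, sd = 1)

# The empirical CDF jumps from (i - 1)/n to i/n at the i-th ordered point,
# so the largest gap is attained at, or just before, one of these jumps.
D <- max(seq_len(n) / n - F0, F0 - (seq_len(n) - 1) / n)
K_n <- sqrt(n) * D

# ks.test reports D itself, not sqrt(n) * D
D_ks <- unname(ks.test(x, "pnorm", mean = 4, sd = 1)$statistic)
c(D = D, D_ks = D_ks, K_n = K_n)
```

The two values of D should agree up to floating-point error, which shows that ks.test works with the unscaled distance and handles the $$\sqrt{n}$$ factor internally when it computes the p-value.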

Given the convergence of $$K_{n}$$, we can perform a non-parametric test on the distribution of $$X$$ once $$F_{0}$$ is fixed. Here is the key point: $$F_{0}$$ must not be estimated. The usual mistake occurs when one wants to test whether the distribution of $$X$$ belongs to a specific family of distributions (for example, the Normal or the Gamma family). One then estimates the parameters from the sample, so that $$F_{0}$$ is itself estimated and therefore random (it depends on the sample).

More precisely, we would like to test $$\mathcal{H}_{0} : F \in \{F_{\theta}, \theta \in \Theta\}$$, whereas the test only allows $$\mathcal{H}_{0} : F = F_{0}$$.

This makes $$\hat{F}_{n}$$ closer, in probability, to $$F_{0}$$, since the latter is no longer a fixed function but the distribution in the family that best matches the sample (by estimating the parameters from the sample, we actually replace $$F_{0}$$ by $$F_{\hat{\theta}}$$). As a result, the test rejects less often than its nominal level indicates, and its power is lower than expected when the true distribution $$F$$ lies outside the chosen family.

The intuition is the following. Recall the degrees of freedom of the $$\chi^2$$ test: $$d-k-1$$, where $$d$$ is the number of classes chosen and $$k$$ the number of parameters estimated. If $$k > 0$$, the asymptotic distribution is closer to zero, and this persists asymptotically. Here we have the same problem, but we cannot easily compute the correct distribution when $$k > 0$$.
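The $$\chi^2$$ analogy can be checked numerically. The sketch below uses hypothetical values $$d = 10$$ and $$k = 2$$, chosen only for illustration, and compares the 5% critical value that ignores the estimated parameters with the one that accounts for them.

```r
d <- 10        # number of classes (hypothetical value)
k <- 2         # number of estimated parameters (hypothetical value)
level <- 0.05

# Critical value if the estimation is ignored (d - 1 degrees of freedom)
crit_naive <- qchisq(1 - level, df = d - 1)

# Critical value with the corrected degrees of freedom d - k - 1
crit_correct <- qchisq(1 - level, df = d - k - 1)

# Actual rejection probability when the statistic follows chi2(d - k - 1)
# but is compared with the naive critical value
actual_level <- pchisq(crit_naive, df = d - k - 1, lower.tail = FALSE)

c(crit_naive = crit_naive, crit_correct = crit_correct,
  actual_level = actual_level)
```

The naive threshold is larger than the corrected one, so a test that ignores the estimation rejects with probability below the nominal level. This is the same mechanism as the one described here for the Kolmogorov-Smirnov test, except that in the $$\chi^2$$ case the correction is known in closed form.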

We now check this and illustrate the expected results through simulations.

This test is implemented in R by the function ks.test. Note that, if the sample size is smaller than 100, the test by default does not use the convergence above but the exact distribution (see documentation).

Here, we simulate $$n = 10^6$$ samples, each of size $$m = 10$$, from the $$\mathcal{N}(4,1)$$ distribution. Then, we apply ks.test to every sample and record the p-value obtained against the true distribution of the sample. If we are right, under $$\mathcal{H}_{0}$$, the p-value follows a $$\mathcal{U}\left([0, 1]\right)$$ distribution. This is confirmed by the test itself: ks.test(pvaleur, "punif") returns p-value = 0.1337, and it can also be seen graphically using the density function.

Then, if we set the level of the test to $$\alpha$$, the probability of wrongly rejecting $$\mathcal{H}_{0}$$ is exactly $$\alpha$$.
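A minimal sketch of this first experiment could look as follows. It is an assumption-laden reconstruction, not the original listing: the seed is arbitrary, and the number of replications n_sim (a name introduced here) is reduced to 2000 so that the code runs in a few seconds instead of using $$10^6$$ samples.

```r
set.seed(42)     # arbitrary seed, for reproducibility only
m <- 10          # sample size, as in the text
n_sim <- 2000    # assumption: far fewer replications than the 10^6 of the text

# One p-value per simulated sample, tested against the true N(4, 1)
pvaleur <- replicate(n_sim, {
  x <- rnorm(m, mean = 4, sd = 1)
  ks.test(x, "pnorm", mean = 4, sd = 1)$p.value
})

ks.test(pvaleur, "punif")   # uniformity check on the p-values
mean(pvaleur < 0.05)        # empirical rejection rate at level 0.05
# plot(density(pvaleur))    # uncomment for the graphical check
```

With the parameters fixed in advance, the empirical rejection rate should be close to 0.05, up to Monte Carlo error.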

However, suppose we now estimate the parameters, using the following code with a slight modification:
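A minimal sketch of such a modification, assuming the only change is that the sample mean and the sample standard deviation are passed to ks.test in place of the true values, could be written as below. As an assumption for speed, the number of replications n_sim (a name introduced here) is again reduced to 2000.

```r
set.seed(42)     # arbitrary seed, for reproducibility only
m <- 10          # sample size, as in the text
n_sim <- 2000    # assumption: far fewer replications than the 10^6 of the text

# Incorrect usage: the parameters are estimated from the very sample tested
pvaleur <- replicate(n_sim, {
  x <- rnorm(m, mean = 4, sd = 1)
  ks.test(x, "pnorm", mean = mean(x), sd = sd(x))$p.value
})

ks.test(pvaleur, "punif")   # uniformity check on the p-values
mean(pvaleur < 0.05)        # empirical rejection rate at level 0.05
median(pvaleur)             # where the bulk of the p-values sits
# plot(density(pvaleur))    # uncomment for the graphical check
```

Only the arguments mean and sd of ks.test differ from the correct usage, which is what makes the mistake so easy to commit.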

This time the uniformity test gives p-value < 2.2e-16, and we can see that the distribution of the p-values is concentrated near one:

We can see that there is almost no mass below $$\alpha=0.05$$: more precisely, mean(pvaleur < 0.05) returns 0.000028, which is far from $$\alpha$$.

This effect persists when we increase the sample size. For $$m=40$$, mean(pvaleur < 0.05) gives 0.000131, which is better but still far off. For $$m=120$$ we get 0.00011. Below are the two associated plot(density(pvaleur)) outputs (left: $$m=40$$, right: $$m=120$$).

The effect looks persistent. At first sight, this is not a big problem, since it is good to accept $$\mathcal{H}_{0}$$ when it is true. However, it implies a severe lack of power of the test.

Now, consider the same setting when the true distribution is a Gamma distribution with shape $$\alpha$$ and rate $$\beta$$, with mean $$\mu = 4$$ and standard deviation $$\sigma = 1$$, i.e. $$\alpha = \frac{\mu^2}{\sigma^{2}}, \beta = \frac{\mu}{\sigma^{2}}$$ (this $$\alpha$$ is the shape parameter, not the level of the test). We compute the p-value distribution for $$m \in \{10, 40, 120, 480\}$$. We simulate the Gamma distribution and test it first against the $$\mathcal{N}(4,1)$$ distribution, and then against a normal distribution with estimated mean and standard deviation.

The code is:
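A minimal sketch of this experiment for the fixed $$\mathcal{N}(4,1)$$ case could look as follows. The helper reject_rate_fixed and the reduced number of replications n_sim are names and values introduced here as assumptions; the Gamma parameters follow the formulas of the text.

```r
set.seed(42)              # arbitrary seed, for reproducibility only
mu <- 4
sigma <- 1
shape <- mu^2 / sigma^2   # alpha in the text (16)
rate <- mu / sigma^2      # beta in the text (4)
n_sim <- 1000             # assumption: reduced number of replications

# Empirical rejection rate at level 0.05 for Gamma samples of size m
# tested against the fixed N(4, 1) distribution (hypothetical helper)
reject_rate_fixed <- function(m) {
  pvaleur <- replicate(n_sim, {
    x <- rgamma(m, shape = shape, rate = rate)
    ks.test(x, "pnorm", mean = mu, sd = sigma)$p.value
  })
  mean(pvaleur < 0.05)
}

sizes <- c(10, 40, 120, 480)
rates_fixed <- vapply(sizes, reject_rate_fixed, numeric(1))
names(rates_fixed) <- paste0("m=", sizes)
rates_fixed
```

Because of the smaller number of replications, the rates obtained this way fluctuate more than those reported in the text, but they should show the same slow increase with $$m$$.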

Here are the results:

• $$m=10$$, mean(pvaleur < 0.05): 0.053861,
• $$m=40$$, mean(pvaleur < 0.05): 0.067884,
• $$m=120$$, mean(pvaleur < 0.05): 0.09859,
• $$m=480$$, mean(pvaleur < 0.05): 0.280049.

Note that the rejection rate is above $$\alpha = 0.05$$, as it always should be when the test is well-posed. Note also, by the way, that with a very small sample it is almost impossible to reject a distribution which shares its first two moments with the true one. Below are the plots given by plot(density(pvaleur)) for the four cases above, from left to right.

As you can see, the distribution of the p-values starts to concentrate near 0 as $$m$$ increases. However, this is slow: even a sample of size $$m=480$$ gives only $$\approx 28\%$$ of rejections at level $$\alpha=0.05$$. At this stage, though, what we did is correct: the power of the test is weak for small samples when the distributions are close, which is the case here (a plot of both densities is given later).

Now, if we make the mistake of plugging in the estimated mean and variance as parameters, rejection becomes very difficult. The code is:
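A minimal sketch of this incorrect variant, under the same assumptions as before (reduced number of replications n_sim, hypothetical helper name reject_rate_estimated), could be:

```r
set.seed(42)              # arbitrary seed, for reproducibility only
mu <- 4
sigma <- 1
shape <- mu^2 / sigma^2   # alpha in the text (16)
rate <- mu / sigma^2      # beta in the text (4)
n_sim <- 1000             # assumption: reduced number of replications

# Empirical rejection rate at level 0.05 for Gamma samples of size m
# tested against a normal distribution whose mean and standard deviation
# are estimated from the same sample (the mistake under study)
reject_rate_estimated <- function(m) {
  pvaleur <- replicate(n_sim, {
    x <- rgamma(m, shape = shape, rate = rate)
    ks.test(x, "pnorm", mean = mean(x), sd = sd(x))$p.value
  })
  mean(pvaleur < 0.05)
}

sizes <- c(10, 40, 120, 480)
rates_estimated <- vapply(sizes, reject_rate_estimated, numeric(1))
names(rates_estimated) <- paste0("m=", sizes)
rates_estimated
```

The samples come from a Gamma distribution, so every rejection here would be a correct decision; the point of the sketch is how rarely it happens for small $$m$$.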

Again, we test for $$m \in \{10, 40, 120, 480\}$$. Here are the results:

• $$m=10$$, mean(pvaleur < 0.05): 0.00006,
• $$m=40$$, mean(pvaleur < 0.05): 0.000984,
• $$m=120$$, mean(pvaleur < 0.05): 0.005458,
• $$m=480$$, mean(pvaleur < 0.05): 0.12026.

As we can see, when $$m$$ is small, the power of this incorrect application is far lower: most p-values are high and close to 1. Before concluding, here is the plot of both densities.

As we can see, the two distributions look close, as is often the case when both are given the same mean and variance. However, the left and right tails differ markedly in relative terms (though not so much in absolute terms).
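The closeness of the two distributions can also be quantified. The sketch below is an illustration under assumptions: the grid and the tail points $$x = 1$$ and $$x = 7$$ (three standard deviations from the mean) are arbitrary choices. It computes the largest gap between the two cumulative distribution functions and the ratio of the densities in each tail.

```r
mu <- 4
sigma <- 1
shape <- mu^2 / sigma^2   # Gamma shape (16)
rate <- mu / sigma^2      # Gamma rate (4)

# Largest absolute gap between the two CDFs on a fine grid
x <- seq(0, 10, by = 0.001)
max_cdf_gap <- max(abs(pgamma(x, shape = shape, rate = rate) -
                       pnorm(x, mean = mu, sd = sigma)))

# Density ratio Gamma / Normal three standard deviations from the mean
tails <- c(left = 1, right = 7)
tail_ratio <- dgamma(tails, shape = shape, rate = rate) /
  dnorm(tails, mean = mu, sd = sigma)

max_cdf_gap
tail_ratio

# Uncomment to draw both densities on the same plot:
# curve(dnorm(x, mean = mu, sd = sigma), from = 0, to = 8, ylab = "density")
# curve(dgamma(x, shape = shape, rate = rate), add = TRUE, lty = 2)
```

The Kolmogorov-Smirnov statistic only sees the gap between cumulative distribution functions, which stays small here, whereas the density ratio shows that the Gamma distribution is much lighter than the normal one in the left tail and heavier in the right tail.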

Conclusion: The Kolmogorov-Smirnov test is often not very powerful for rejecting $$\mathcal{H}_{0}$$. It is designed to test the sample against a single distribution. If we test against a family of distributions (e.g. the Gaussian family) while estimating the parameters (here $$\hat{\mu}$$ and $$\hat{\sigma}^{2}$$), then for small samples we have almost no chance of rejecting $$\mathcal{H}_{0}$$: the p-values are artificially high, very often above 0.5. The test then accepts almost anything, regardless of the hypothesis and the sample.