Spacing

 

Spacing is the difference between consecutive order statistics of a data set or, in simpler terms, the difference between adjacent points after sorting them.
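As a small illustration of the definition, the sketch below sorts a data set and takes the difference between each pair of neighbors. The function name `spacings` and the sample values are made up for this example and do not come from the papers.

```python
def spacings(data):
    """Return the differences between consecutive order statistics."""
    ordered = sorted(data)  # the order statistics of the data set
    # Pair each sorted point with its successor and subtract.
    return [b - a for a, b in zip(ordered, ordered[1:])]


sample = [7, 2, 10, 3]    # hypothetical data
print(spacings(sample))   # sorted points are 2, 3, 7, 10 -> [1, 4, 3]
```

A data set of n points always yields n - 1 spacings.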

 

The abstract of "Expected Spacing", the primary paper presenting our analysis, reads:

The expected spacing, or average difference between consecutive order statistics, is known for uniform and exponential random variates. For other distributions we can estimate it using the derivative of the inverse cumulative density (quantile) function, since passing a uniformly drawn value, whose spacing we know, through this function generates a random value from the distribution, and the difference between two such uniform values approximates the derivative. We calculate the spacing for two new distributions, the logistic and Gumbel, and show the estimator is exact for the first and approximate for the second. Comparing the estimators for six other distributions to numeric simulations shows they are also approximations, best in the middle of the order statistics with an error that goes inversely with the square of the sample size, but degrading in the tails.

The supplement to the paper contains derivations of all results, additional analysis of the simulations (including fits of the actual spacing to the estimator to check for systematic errors), and descriptions of the simulation data used to generate all the figures.
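The idea behind the estimator in the abstract can be sketched for the exponential distribution, whose expected spacing is known. This is only a rough sketch under stated assumptions: it takes the estimate to be the derivative of the quantile function evaluated at p = i/n, multiplied by a uniform step of 1/n. The paper may use a different evaluation point or scaling, and the function names here are hypothetical.

```python
import math


def exp_quantile_deriv(p):
    """Derivative of the unit-rate exponential quantile Q(p) = -ln(1 - p)."""
    return 1.0 / (1.0 - p)


def spacing_estimate(quantile_deriv, i, n):
    """Estimate the expected spacing between order statistics i and i + 1.

    Assumption for this sketch: evaluate the derivative at p = i / n and
    scale by a uniform step of 1 / n. The paper's convention may differ.
    """
    return quantile_deriv(i / n) / n


def exp_expected_spacing(i, n):
    """Known expected spacing for n unit-rate exponential variates."""
    return 1.0 / (n - i)


n = 10
for i in range(1, n):
    est = spacing_estimate(exp_quantile_deriv, i, n)
    exact = exp_expected_spacing(i, n)
    print(f"i={i}  estimate={est:.6f}  known={exact:.6f}")
```

Under this particular convention the two columns agree for the exponential up to rounding. For other distributions the abstract describes the estimator as an approximation, so a comparison against simulation would be needed there.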

 

A second paper, "A Spacing Estimator", adds the variance of the logistic spacing to these results.

 

The papers discuss the numeric problems that arise when evaluating the series for the expected spacing of the Gumbel and logistic distributions, and for the variance of the logistic spacing. Three programs, one for each of these calculations, implement the series, and each needs a high-precision math library such as MPFR. Comments at the start of each file explain how to compile and run the program.