The fundamental problem addressed in this thesis is that of constructing confidence limits for means or totals in finite populations when the underlying distribution is highly skewed and contains a substantial proportion of zero values. This situation is often encountered in statistical applications such as auditing, reliability, insurance, meteorology and biostatistics. The motivating example underlying this research is auditing (see the report published by the National Academy Press entitled “Statistical Models and Analysis in Auditing”, Panel on Non-standard Mixtures of Distributions 1989), where interest focuses on computing confidence bounds for the true total error amount. In such populations, the classical survey-sampling estimators, such as the mean-per-unit, difference, ratio and regression estimators, rely on the assumption that the sampling distribution of the estimates is approximately normal, and they have been found to be unreliable (e.g. Stringer 1963; Kaplan 1973; Neter and Loebbecke 1975, 1977). Several alternative methods have been proposed, of which the Stringer bound (Stringer 1963) is the most widely used. This bound, while overcoming the unreliability problem of
the classical estimators, has been found to be extremely conservative.
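Since the Stringer bound recurs throughout, a minimal sketch of its usual textbook form may help. It assumes a monetary-unit sample of n units, k of which contain overstatement errors with taintings (prorated errors) t_1 ≥ t_2 ≥ ... ≥ t_k in (0, 1], and writes p_u(i) for the exact one-sided (Clopper-Pearson) upper 1 − α confidence limit for a binomial proportion given i errors in n draws. For total book value Y, the bound on the total error is then Y [p_u(0) + Σ_{i=1}^{k} (p_u(i) − p_u(i−1)) t_i]. The function names and the figures in the usage note are hypothetical.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p); adequate for moderate n."""
    return sum(math.comb(n, j) * p**j * (1.0 - p) ** (n - j) for j in range(k + 1))


def binom_upper_limit(k: int, n: int, alpha: float) -> float:
    """Exact one-sided (Clopper-Pearson) upper 1 - alpha limit for p given k errors in n.

    Solves binom_cdf(k, n, p) = alpha by bisection; the CDF is decreasing in p.
    """
    if k >= n:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if binom_cdf(k, n, mid) > alpha:
            lo = mid  # CDF still above alpha: the limit lies further right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def stringer_bound(taintings, n: int, alpha: float, book_value: float) -> float:
    """Textbook Stringer upper 1 - alpha bound on the total overstatement error.

    taintings: prorated errors in [0, 1] of the sampled units found in error.
    """
    if any(not 0.0 <= t <= 1.0 for t in taintings):
        raise ValueError("this sketch assumes overstatement taintings in [0, 1]")
    ordered = sorted(taintings, reverse=True)  # the bound pairs t_(1) >= t_(2) >= ...
    prev = binom_upper_limit(0, n, alpha)
    bound = prev  # basic precision term p_u(0), present even with no errors
    for i, t in enumerate(ordered, start=1):
        cur = binom_upper_limit(i, n, alpha)
        bound += (cur - prev) * t  # increment p_u(i) - p_u(i-1) weighted by i-th tainting
        prev = cur
    return book_value * bound
```

For example, `stringer_bound([0.4, 0.9, 0.1], n=100, alpha=0.05, book_value=1_000_000)` gives the 95% upper bound for a hypothetical sample with three tainted units. Bisection keeps the sketch free of third-party dependencies; with SciPy available, `scipy.stats.beta.ppf(1 - alpha, k + 1, n - k)` gives the same Clopper-Pearson limit.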
In this research, we develop new methods for constructing confidence intervals for the mean of a bounded random variable, and we apply them to data that are heavily skewed and contain many zero values. The proposed confidence intervals have good coverage probability and precision. The first method is based on a novel use of the Edgeworth expansion for the studentised compound Poisson sum. We reduce the problem of estimating the total error amount in auditing to that of a compound Poisson sum, and explore the asymptotic expansion of the compound Poisson distribution as a means of constructing confidence bounds on the total error amount. This method is less restrictive than the Stringer bound and imposes no prior structure on the error distribution. We also obtain a bound on the cumulative distribution function of the prorated errors, which we then use to give an alternative form of the Stringer bound.
With this form of the Stringer bound, we use Bolshev’s recursion to obtain a lower bound on its coverage probability, and show that for sample sizes n ≤ 2 this lower bound is greater than or equal to the stated coverage probability. We illustrate numerically that the Stringer bound is reliable when (n, α), with α the significance level, falls into any of the following ranges: n ≤ 11 and α ∈ (0, 0.05); n ≤ 10 and α ∈ (0, 0.1); n ≤ 9 and α ∈ (0, 0.2); n ≤ 8 and α ∈ (0, 0.4); and n ≤ 7 and α ∈ (0, 0.5). We also propose an extension of the Stringer method based on Rom’s adjusted significance levels, and illustrate numerically the reliability of the extended Stringer bound for α from 0.05 to 0.5 and n from 1 to 20.
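The compound Poisson reduction described above can be illustrated with a simplified sketch. Model the total error as S = X_1 + ... + X_N, with N ~ Poisson(λ) independent of the iid taintings X_i; the r-th cumulant of S is then κ_r = λ E[X^r], and a one-term Edgeworth expansion gives P(S ≤ s) ≈ Φ(z) − (γ/6)(z² − 1)φ(z), where z = (s − κ_1)/√κ_2 and γ = κ_3/κ_2^{3/2}. Inverting it (the first-order Cornish-Fisher expansion) gives an approximate upper quantile of S. Unlike the studentised expansion used in this thesis, the sketch assumes that λ and the moments of X are known; all names and the example tainting distribution are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def compound_poisson_cumulants(lam: float, m1: float, m2: float, m3: float):
    """First three cumulants of S = X_1 + ... + X_N with N ~ Poisson(lam).

    For a compound Poisson sum the r-th cumulant is lam * E[X^r];
    m1, m2, m3 are the raw moments E[X], E[X^2], E[X^3] of one tainting.
    """
    return lam * m1, lam * m2, lam * m3


def edgeworth_cdf(s: float, lam: float, m1: float, m2: float, m3: float) -> float:
    """One-term Edgeworth approximation to P(S <= s), moments assumed known."""
    k1, k2, k3 = compound_poisson_cumulants(lam, m1, m2, m3)
    gamma = k3 / k2**1.5  # skewness of S
    z = (s - k1) / math.sqrt(k2)  # standardised (not studentised) value
    return _STD_NORMAL.cdf(z) - gamma / 6.0 * (z * z - 1.0) * _STD_NORMAL.pdf(z)


def cornish_fisher_quantile(p: float, lam: float, m1: float, m2: float, m3: float) -> float:
    """Approximate p-quantile of S from the inverted one-term Edgeworth expansion."""
    k1, k2, k3 = compound_poisson_cumulants(lam, m1, m2, m3)
    gamma = k3 / k2**1.5
    z = _STD_NORMAL.inv_cdf(p)
    return k1 + math.sqrt(k2) * (z + gamma / 6.0 * (z * z - 1.0))
```

For instance, with uniform taintings on (0, 1), so that E[X], E[X²], E[X³] are 1/2, 1/3 and 1/4, and λ = 50, `cornish_fisher_quantile(0.95, 50, 0.5, 1/3, 0.25)` returns an approximate 95th percentile of S that lies above the plain normal approximation, because the skewness correction is positive.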
For the new bounds, we provide explicit expressions that make their computation straightforward. Monte Carlo simulations on real and simulated populations are carried out to evaluate the performance of the methods developed in this thesis when applied to accounting data: we investigate the performance of each method and assess whether it is affected by the distribution of the accounting data, by 100-percent overstatement errors, and by the error rate. The method based on the compound Poisson sum appears to be reliable for large samples. For small samples, however, the compound Poisson bound gives the poorest results (in terms of coverage probability), particularly for populations containing a lower concentration of small error amounts. Although the extended Stringer bound has good coverage probability for all sample sizes and significance levels, it shares the extreme conservativeness of the Stringer bound.
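The kind of Monte Carlo experiment described above can be sketched as follows. The setup is hypothetical and not the simulation design of this thesis: the population is represented only by its tainting distribution, monetary units are sampled with replacement, each sampled unit is in error with probability `error_rate`, and an error is a 100-percent overstatement with probability `full_share` and otherwise a tainting drawn uniformly from (0, 1). Coverage is estimated as the fraction of replicates in which the textbook Stringer bound is at least the true total error; since both scale with the total book value, the sketch works with fractions of book value. All function and parameter names are illustrative.

```python
import math
import random


def binom_upper_limit(k: int, n: int, alpha: float) -> float:
    """Exact one-sided upper 1 - alpha binomial limit (Clopper-Pearson), by bisection."""
    if k >= n:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        cdf = sum(math.comb(n, j) * mid**j * (1.0 - mid) ** (n - j) for j in range(k + 1))
        # The binomial CDF decreases in p, so move right while it still exceeds alpha.
        lo, hi = (mid, hi) if cdf > alpha else (lo, mid)
    return 0.5 * (lo + hi)


def stringer_fraction(taintings, limits) -> float:
    """Textbook Stringer bound as a fraction of book value; limits[i] = p_u(i)."""
    bound = limits[0]
    for i, t in enumerate(sorted(taintings, reverse=True), start=1):
        bound += (limits[i] - limits[i - 1]) * t
    return bound


def draw_tainting(rng: random.Random, error_rate: float, full_share: float) -> float:
    """HYPOTHETICAL error model for one sampled monetary unit."""
    if rng.random() >= error_rate:
        return 0.0  # correct unit: most draws are zero
    if rng.random() < full_share:
        return 1.0  # 100-percent overstatement
    return rng.random()  # partial overstatement, Uniform(0, 1)


def stringer_coverage(n, alpha, error_rate, full_share, reps=2000, seed=1) -> float:
    """Monte Carlo estimate of P(Stringer bound >= true total error)."""
    rng = random.Random(seed)
    limits = [binom_upper_limit(k, n, alpha) for k in range(n + 1)]  # p_u(0), ..., p_u(n)
    # True total error as a fraction of book value = expected tainting per monetary unit.
    true_fraction = error_rate * (full_share + 0.5 * (1.0 - full_share))
    hits = 0
    for _ in range(reps):
        sample = (draw_tainting(rng, error_rate, full_share) for _ in range(n))
        errors = [t for t in sample if t > 0.0]
        if stringer_fraction(errors, limits) >= true_fraction:
            hits += 1
    return hits / reps
```

Varying `n`, `alpha`, `error_rate` and `full_share` mimics the factors examined above (sample size, significance level, error rate and the share of 100-percent overstatements), and swapping `stringer_fraction` for another bound allows the same coverage comparison across methods.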