Estimation of Pareto Distribution Functions from Samples Contaminated by Measurement Errors
Abstract
Estimation of population distributions from samples that are contaminated
by measurement errors is a common problem. This study considers the problem
of estimating the population distribution of independent random variables
X_i from error-contaminated samples Y_i (i = 1, ..., n) such that Y_i = X_i + ε_i,
where ε is the measurement error, which is assumed to be independent of X. The
measurement error ε is also assumed to be normally distributed. Since the
observed distribution function is a convolution of the error distribution with
the true underlying distribution, estimation of the latter is often referred to
as a deconvolution problem. A thorough review of the relevant deconvolution
literature in statistics is reported.
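Written out, the measurement model and the convolution it implies take the standard form below. Here φ and Φ denote the standard normal PDF and CDF, and σ is the error standard deviation. Deconvolution means recovering f_X from the observable f_Y.

```latex
Y_i = X_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2), \quad \varepsilon_i \perp X_i, \qquad i = 1, \dots, n,

f_Y(y) = \int_{-\infty}^{\infty} f_X(x)\, \frac{1}{\sigma}\,\phi\!\left(\frac{y - x}{\sigma}\right) dx,
\qquad
F_Y(y) = \int_{-\infty}^{\infty} f_X(x)\, \Phi\!\left(\frac{y - x}{\sigma}\right) dx .
```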
We also treat the specific case in which X is assumed to follow a truncated
Pareto form. If observations are subject to Gaussian errors, then the observed
Y is distributed as the convolution of the finite-support Pareto and Gaussian
error distributions. The convolved probability density function (PDF)
and cumulative distribution function (CDF) of the finite-support Pareto and
Gaussian distributions are derived.
The intention is to draw more specific connections between certain deconvolution
methods and to demonstrate the application of the statistical theory
of estimation in the presence of measurement error.
A parametric methodology for deconvolution when the underlying distribution
is of the Pareto form is developed.
Maximum likelihood estimation (MLE) of the parameters of the convolved distributions
is considered. Standard errors of the estimated parameters are calculated
from the inverse of the Fisher information matrix and by a jackknife method.
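The sketch below illustrates the estimation step. For brevity, it assumes that L, U and σ are known and estimates only the shape a. The function names, the golden-section optimiser and the finite-difference curvature are all illustrative choices, not the author's implementation. The first standard error comes from the inverse of the observed Fisher information, that is, the negative second derivative of the log-likelihood at the MLE. The jackknife instead recomputes the estimate with each observation left out in turn.

```python
import math
from typing import Callable, List, Sequence


def conv_pdf(y: float, a: float, lo: float, hi: float, sigma: float, m: int = 120) -> float:
    """Truncated-Pareto(a) on [lo, hi] convolved with N(0, sigma^2), by composite Simpson's rule."""
    norm = 1.0 - (lo / hi) ** a  # truncation constant of the Pareto part
    h = (hi - lo) / m            # m must be even for Simpson's rule
    total = 0.0
    for k in range(m + 1):
        x = lo + k * h
        w = 1.0 if k in (0, m) else (4.0 if k % 2 else 2.0)
        fx = a * lo ** a * x ** (-a - 1.0) / norm
        z = (y - x) / sigma
        total += w * fx * math.exp(-0.5 * z * z)
    return total * h / 3.0 / (sigma * math.sqrt(2.0 * math.pi))


def loglik(a: float, ys: Sequence[float], lo: float, hi: float, sigma: float) -> float:
    # Floor the density so that far-outlying points cannot produce log(0).
    return sum(math.log(max(conv_pdf(y, a, lo, hi, sigma), 1e-300)) for y in ys)


def mle_shape(ys, lo, hi, sigma, a_min=0.05, a_max=10.0, iters=40) -> float:
    """Golden-section search for the maximising shape a (assumes a unimodal log-likelihood)."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    left, right = a_min, a_max
    c, d = right - g * (right - left), left + g * (right - left)
    fc, fd = loglik(c, ys, lo, hi, sigma), loglik(d, ys, lo, hi, sigma)
    for _ in range(iters):
        if fc > fd:  # maximum lies in [left, d]
            right, d, fd = d, c, fc
            c = right - g * (right - left)
            fc = loglik(c, ys, lo, hi, sigma)
        else:        # maximum lies in [c, right]
            left, c, fc = c, d, fd
            d = left + g * (right - left)
            fd = loglik(d, ys, lo, hi, sigma)
    return 0.5 * (left + right)


def fisher_se(a_hat, ys, lo, hi, sigma, delta=1e-2) -> float:
    """SE from the inverse observed information -d^2 loglik / da^2 (central difference)."""
    l0 = loglik(a_hat, ys, lo, hi, sigma)
    lp = loglik(a_hat + delta, ys, lo, hi, sigma)
    lm = loglik(a_hat - delta, ys, lo, hi, sigma)
    info = -(lp - 2.0 * l0 + lm) / delta ** 2
    return 1.0 / math.sqrt(info) if info > 0 else float("nan")


def jackknife_se(ys: Sequence[float], estimator: Callable[[List[float]], float]) -> float:
    """Leave-one-out jackknife standard error of estimator(ys)."""
    n = len(ys)
    reps = [estimator(list(ys[:i]) + list(ys[i + 1:])) for i in range(n)]
    mean = sum(reps) / n
    return math.sqrt((n - 1) / n * sum((r - mean) ** 2 for r in reps))
```

With `fit = lambda s: mle_shape(s, lo, hi, sigma)`, the call `jackknife_se(ys, fit)` refits the model n times. This makes it the slower of the two standard-error routes. A full (a, L, U, σ) fit would need a multivariate optimiser and the inverse of the full information matrix in place of the scalar versions used here.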
Probability-probability (P-P) plots and Kolmogorov-Smirnov (K-S) goodness-of-fit
tests are used to evaluate the fit of the posited distribution. A bootstrapping
method is used to calculate the critical values of the K-S test statistic,
for which no published tables are available in this setting.
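The usual K-S tables do not apply when the parameters are estimated from the same data. A parametric bootstrap can supply the null distribution instead:

1. Fit the model to the data.
2. Simulate samples of the same size from the fitted model.
3. Refit each simulated sample.
4. Take the upper α quantile of the resulting K-S distances as the critical value.

The routine below is a generic sketch of that procedure, together with P-P plot coordinates. `fit`, `sample` and `make_cdf` are hypothetical callbacks. The demo functions use a Gaussian model, whose MLE is closed form, only to keep the example fast. The convolved Pareto-Gaussian CDF and its MLE would be passed in the same way.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def ks_statistic(data: Sequence[float], cdf: Callable[[float], float]) -> float:
    """One-sample K-S distance D_n = sup_y |F_n(y) - F(y)|."""
    ys = sorted(data)
    n = len(ys)
    d = 0.0
    for i, y in enumerate(ys):
        f = cdf(y)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def pp_points(data: Sequence[float], cdf: Callable[[float], float]) -> List[Tuple[float, float]]:
    """P-P plot coordinates: fitted F(y_(i)) against the plotting position i / (n + 1)."""
    ys = sorted(data)
    n = len(ys)
    return [(cdf(y), (i + 1) / (n + 1)) for i, y in enumerate(ys)]


def bootstrap_ks_critical(data, fit, sample, make_cdf, alpha=0.05, n_boot=500, rng=None):
    """Return (observed D_n, bootstrap critical value at level alpha).

    fit(data) -> params, sample(params, n, rng) -> list, make_cdf(params) -> cdf.
    Every simulated sample is refitted, so the null distribution reflects estimation.
    """
    if rng is None:
        rng = random.Random()
    theta = fit(data)
    n = len(data)
    stats = []
    for _ in range(n_boot):
        sim = sample(theta, n, rng)
        stats.append(ks_statistic(sim, make_cdf(fit(sim))))
    stats.sort()
    k = min(n_boot - 1, math.ceil((1.0 - alpha) * n_boot) - 1)  # empirical (1 - alpha) quantile
    return ks_statistic(data, make_cdf(theta)), stats[k]


# Demo model (Gaussian, closed-form MLE); replace with the convolved model in practice.
def fit_normal(xs: Sequence[float]) -> Tuple[float, float]:
    n = len(xs)
    mu = sum(xs) / n
    return mu, math.sqrt(sum((x - mu) ** 2 for x in xs) / n)


def sample_normal(params, n: int, rng: random.Random) -> List[float]:
    mu, sd = params
    return [rng.gauss(mu, sd) for _ in range(n)]


def normal_cdf(params) -> Callable[[float], float]:
    mu, sd = params
    return lambda y: 0.5 * (1.0 + math.erf((y - mu) / (sd * math.sqrt(2.0))))
```

The fitted model is rejected at level α when the observed distance exceeds the bootstrap critical value. Refitting inside the loop is what distinguishes this from simply simulating from fixed parameters, which would overstate the critical value.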
Simulated data are used to validate the methodology. A real-life application
of the methodology is illustrated by fitting convolved distributions to astronomical
data.
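A simulation study of this kind starts from draws of the contaminated model. The sketch below uses inverse-CDF sampling for the truncated Pareto part, X = L (1 - u (1 - (L/U)^a))^(-1/a) with u uniform on [0, 1), and then adds Gaussian noise. The function names and any parameter values are placeholders, not those used in the study. Because E[ε] = 0, the sample mean of Y should approach the truncated Pareto mean, which gives a quick consistency check.

```python
import math
import random
from typing import List, Tuple


def sample_trunc_pareto(n: int, a: float, lo: float, hi: float, rng: random.Random) -> List[float]:
    """Inverse-CDF draws from a Pareto(a) law truncated to [lo, hi]."""
    c = 1.0 - (lo / hi) ** a
    return [lo * (1.0 - rng.random() * c) ** (-1.0 / a) for _ in range(n)]


def simulate_contaminated(n, a, lo, hi, sigma, seed=None) -> Tuple[List[float], List[float]]:
    """Return (X, Y) with Y_i = X_i + eps_i and eps_i ~ N(0, sigma^2) independent of X_i."""
    rng = random.Random(seed)
    xs = sample_trunc_pareto(n, a, lo, hi, rng)
    ys = [x + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys


def trunc_pareto_mean(a: float, lo: float, hi: float) -> float:
    """Theoretical E[X] for the truncated Pareto, used as a check on simulated samples."""
    c = 1.0 - (lo / hi) ** a
    if abs(a - 1.0) < 1e-12:  # the general formula has a removable singularity at a = 1
        return lo * math.log(hi / lo) / c
    return a * lo ** a * (hi ** (1.0 - a) - lo ** (1.0 - a)) / ((1.0 - a) * c)
```

The basic validation loop fits the simulated `ys` and compares the estimates with the known generating values. For the real-data application, the observed measurements simply take the place of `ys`.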