In this paper, we study the classical problem of estimating the proportion of a finite population. First, we consider a fixed sample size method and derive an explicit sample size formula which ensures a mixed criterion of absolute and relative errors. Second, we consider an inverse sampling scheme such that the sampling is continue until the number of units having a certain attribute reaches a threshold value or the whole population is examined. We have established a simple method to determine the threshold so that a prescribed relative precision is guaranteed. Finally, we develop a multistage sampling scheme for constructing fixed-width confidence interval for the proportion of a finite population. Powerful computational techniques are introduced to make it possible that the fixed-width confidence interval ensures prescribed level of coverage probability.
Deep Dive into On Estimation of Finite Population Proportion.
In this paper, we study the classical problem of estimating the proportion of a finite population. First, we consider a fixed sample size method and derive an explicit sample size formula which ensures a mixed criterion of absolute and relative errors. Second, we consider an inverse sampling scheme such that the sampling is continue until the number of units having a certain attribute reaches a threshold value or the whole population is examined. We have established a simple method to determine the threshold so that a prescribed relative precision is guaranteed. Finally, we develop a multistage sampling scheme for constructing fixed-width confidence interval for the proportion of a finite population. Powerful computational techniques are introduced to make it possible that the fixed-width confidence interval ensures prescribed level of coverage probability.
The estimation of the proportion of a finite population is a basic and very important problem in probability and statistics [6,8]. Such problem finds applications spanning many areas of sciences and engineering. The problem is formulated as follows.
Consider a finite population of N units, among which there are M units having a certain attribute. The objective is to estimate the proportion p = M N based on sampling without replacement.
One popular method of sampling is to draw n units without replacement from the population and count the number, k, of units having the attribute. Then, the estimate of the proportion is taken as p = k n . In this process, the sample size n is fixed. Clearly, the random variable k possesses a hypergeometric distribution. The reliability of the estimator p = k n depends on n. For error control purpose, we are interested in a crucial question as follows:
For prescribed margin of absolute error ε a ∈ (0, 1), margin of relative error ε r ∈ (0, 1), and confidence parameter δ ∈ (0, 1), how large the sample size n should be to guarantee
In this regard, we have Theorem 1 Let ε a ∈ (0, 1) and ε r ∈ (0, 1) be real numbers such that εa εr + ε a ≤ 1 2 . Then, ( 1) is guaranteed provided that
The proof of Theorem 1 is given in Appendix A. It should be noted that conventional methods for determining sample sizes are based on normal approximation, see [6] and the references therein. In contrast, Theorem 1 offers a rigorous method for determining sample sizes. To reduce conservativeness, a numerical approach has been developed by Chen [4] which permits exact computation of the minimum sample size.
To estimate the proportion p, a frequently-used sampling method is the inverse sampling scheme described as follows:
Continuing sampling from the population (without replacement) until r units found to carry the attribute or the number of sample size n reaches the population size N . The estimator of the proportion p is taken as the ratio p = k n , where k is the number of units having the attribute among the n units.
Clearly, the reliability of the estimator p depends on the threshold value r. Hence, we are interested in a crucial question as follows:
For prescribed margin of relative error ε ∈ (0, 1) and confidence parameter δ ∈ (0, 1), how large the threshold r should be to guarantee
For this purpose, we have Theorem 2 For any ε ∈ (0, 1),
which is monotonically decreasing with respect to r. Moreover, for any δ ∈ (0, 1), there exists a unique number r * such that Q(ε, r * ) = δ and
The proof of Theorem 2 is given in Appendix B. As an immediate consequence of Theorem 2, we have
3 Multistage Fixed-width Confidence Intervals
So far we have only considered point estimation for the proportion p. Interval estimation is also an important method for estimating p. Motivated by the fact that a confidence interval must be sufficiently narrow to be useful, we shall develop a multistage sampling scheme for constructing a fixed-width confidence interval for the proportion, p, of the finite population discussed in previous sections.
Note that the procedure of sampling without replacement can be precisely described as follows:
Each time a single unit is drawn without replacement from the remaining population so that every unit of the remaining population has equal chance of being selected.
Such a sampling process can be exactly characterized by random variables X 1 , • • • , X N defined in a probability space (Ω, F , Pr) such that X i denotes the characteristics of the i-th sample in the sense that X i = 1 if the i-th sample has the attribute and X i = 0 otherwise. By the nature of the sampling procedure, it can be shown that
we can define a multistage sampling scheme of the following basic structure. The sampling process is divided into s stages with sample sizes
The continuation or termination of sampling is determined by decision variables. For each stage with index ℓ, a decision variable
The decision variable D ℓ assumes only two possible values 0, 1 with the notion that the sampling is continued until D ℓ = 1 for some ℓ ∈ {1, • • • , s}. Since the sampling must be terminated at or before the s-th stage, it is required that D s = 1. For simplicity of notations, we also define D ℓ = 0 for ℓ = 0.
Our goal is to construct a fixed-width confidence interval (L, U ) such that U -L ≤ 2ε and that Pr{L < p < U | p} > 1 -δ for any p ∈ { i N : 0 ≤ i ≤ N } with prescribed ε ∈ (0, 1 2 ) and δ ∈ (0, 1). Toward this goal, we need to define some multivariate functions as follows.
For α ∈ (0, 1) and integers 0
where n is the sample size when the sampling is terminated. Then, a sufficient condition to guarantee
for all M ∈ {0, 1, • • • , N }, where (4) is satisfied if ζ > 0 is sufficiently small.
It should be noted that Theorem 3 has employed the double-decision-variable method recently proposed by Chen in [1]. To further reduce computational complexity, the techniques of bisection confidence tuning an
…(Full text truncated)…
This content is AI-processed based on ArXiv data.