On Estimation of Finite Population Proportion

📝 Original Info

  • Title: On Estimation of Finite Population Proportion
  • ArXiv ID: 0804.3779
  • Date: 2008-04-23
  • Authors: Xinjia Chen

📝 Abstract

In this paper, we study the classical problem of estimating the proportion of a finite population. First, we consider a fixed sample size method and derive an explicit sample size formula which ensures a mixed criterion of absolute and relative errors. Second, we consider an inverse sampling scheme in which sampling continues until the number of units having a certain attribute reaches a threshold value or the whole population has been examined. We establish a simple method to determine the threshold so that a prescribed relative precision is guaranteed. Finally, we develop a multistage sampling scheme for constructing a fixed-width confidence interval for the proportion of a finite population. Powerful computational techniques are introduced to ensure that the fixed-width confidence interval attains a prescribed level of coverage probability.

📄 Full Content

The estimation of the proportion of a finite population is a basic and very important problem in probability and statistics [6,8]. The problem has applications in many areas of science and engineering. It is formulated as follows.

Consider a finite population of $N$ units, among which there are $M$ units having a certain attribute. The objective is to estimate the proportion $p = \frac{M}{N}$ based on sampling without replacement.

One popular method of sampling is to draw $n$ units without replacement from the population and count the number, $k$, of units having the attribute. The estimate of the proportion is then taken as $\hat{p} = \frac{k}{n}$. In this process, the sample size $n$ is fixed. Clearly, the random variable $k$ has a hypergeometric distribution. The reliability of the estimator $\hat{p} = \frac{k}{n}$ depends on $n$. For the purpose of error control, we are interested in the following crucial question:

For a prescribed margin of absolute error $\varepsilon_a \in (0, 1)$, margin of relative error $\varepsilon_r \in (0, 1)$, and confidence parameter $\delta \in (0, 1)$, how large should the sample size $n$ be to guarantee

In this regard, we have

Theorem 1 Let $\varepsilon_a \in (0, 1)$ and $\varepsilon_r \in (0, 1)$ be real numbers such that $\frac{\varepsilon_a}{\varepsilon_r} + \varepsilon_a \leq \frac{1}{2}$. Then, (1) is guaranteed provided that

The proof of Theorem 1 is given in Appendix A. It should be noted that conventional methods for determining sample sizes are based on normal approximation, see [6] and the references therein. In contrast, Theorem 1 offers a rigorous method for determining sample sizes. To reduce conservativeness, a numerical approach has been developed by Chen [4] which permits exact computation of the minimum sample size.
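Because $k$ is hypergeometric, the coverage probability of the fixed-size estimator can be evaluated exactly for a small population. The following is a minimal brute-force sketch, not the numerical method of [4] and not the formula of Theorem 1. It assumes that the mixed criterion reads $\Pr\{|\hat{p} - p| < \varepsilon_a \text{ or } |\hat{p} - p| < \varepsilon_r p\} > 1 - \delta$ for every $M$; this form is an assumption made for illustration, and the function names `hypergeom_pmf`, `coverage` and `smallest_sample_size` are hypothetical.

```python
from math import comb


def hypergeom_pmf(k, N, M, n):
    """Pr{k attribute units among n draws without replacement}."""
    if k < max(0, n - (N - M)) or k > min(n, M):
        return 0.0
    return comb(M, k) * comb(N - M, n - k) / comb(N, n)


def coverage(N, M, n, eps_a, eps_r):
    """Exact probability that k/n meets the ASSUMED mixed criterion:
    |k/n - p| < eps_a  or  |k/n - p| < eps_r * p, with p = M/N."""
    p = M / N
    total = 0.0
    for k in range(max(0, n - (N - M)), min(n, M) + 1):
        err = abs(k / n - p)
        if err < eps_a or err < eps_r * p:
            total += hypergeom_pmf(k, N, M, n)
    return total


def smallest_sample_size(N, eps_a, eps_r, delta):
    """Smallest n whose worst-case coverage over M = 0..N exceeds 1 - delta.

    Brute force, O(N^3) pmf evaluations: only sensible for small N.
    """
    for n in range(1, N + 1):
        if all(coverage(N, M, n, eps_a, eps_r) > 1 - delta
               for M in range(N + 1)):
            return n
    return N  # not reached in practice: a full census (n = N) is exact


if __name__ == "__main__":
    print(smallest_sample_size(N=30, eps_a=0.2, eps_r=0.5, delta=0.1))
```

The search returns the first $n$ that satisfies the criterion. Coverage is not necessarily monotone in $n$, so a larger sample size is not automatically covered by this check.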

To estimate the proportion p, a frequently-used sampling method is the inverse sampling scheme described as follows:

Continue sampling from the population (without replacement) until $r$ units are found to carry the attribute or the sample size $n$ reaches the population size $N$. The estimator of the proportion $p$ is taken as the ratio $\hat{p} = \frac{k}{n}$, where $k$ is the number of units having the attribute among the $n$ sampled units.
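The stopping rule of the inverse sampling scheme can be sketched as follows. The first function simulates one run; the second gives the exact distribution of the final sample size $n$, which follows from counting the arrangements in which the $r$-th attribute unit appears at draw $n$. The names `inverse_sample` and `stopping_pmf` are hypothetical, and the sketch assumes $r \geq 1$.

```python
import random
from math import comb


def inverse_sample(population, r, rng):
    """Simulate inverse sampling on a list of 0/1 units.

    Draws without replacement until r attribute units are seen or the
    whole population has been examined. Returns (k, n).
    """
    order = list(population)
    rng.shuffle(order)  # a uniformly random draw order
    k = 0
    for n, x in enumerate(order, start=1):
        k += x
        if k == r:
            return k, n
    return k, len(order)


def stopping_pmf(n, N, M, r):
    """Exact Pr{sampling stops with sample size n}, for 1 <= n <= N."""
    if M < r:
        # The threshold is never reached, so the population is exhausted.
        return 1.0 if n == N else 0.0
    # The r-th attribute unit is drawn exactly at position n.
    return comb(n - 1, r - 1) * comb(N - n, M - r) / comb(N, M)


if __name__ == "__main__":
    rng = random.Random(0)
    k, n = inverse_sample([1] * 7 + [0] * 13, r=3, rng=rng)
    print("estimate:", k / n)
```

When $M < r$ the scheme ends in a full census, so the estimate equals $\frac{M}{N}$ exactly.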

Clearly, the reliability of the estimator $\hat{p}$ depends on the threshold value $r$. Hence, we are interested in the following crucial question:

For a prescribed margin of relative error $\varepsilon \in (0, 1)$ and confidence parameter $\delta \in (0, 1)$, how large should the threshold $r$ be to guarantee

For this purpose, we have

Theorem 2 For any $\varepsilon \in (0, 1)$,

which is monotonically decreasing with respect to $r$. Moreover, for any $\delta \in (0, 1)$, there exists a unique number $r^*$ such that $Q(\varepsilon, r^*) = \delta$ and

The proof of Theorem 2 is given in Appendix B. As an immediate consequence of Theorem 2, we have

3 Multistage Fixed-width Confidence Intervals

So far we have only considered point estimation for the proportion p. Interval estimation is also an important method for estimating p. Motivated by the fact that a confidence interval must be sufficiently narrow to be useful, we shall develop a multistage sampling scheme for constructing a fixed-width confidence interval for the proportion, p, of the finite population discussed in previous sections.

Note that the procedure of sampling without replacement can be precisely described as follows:

Each time a single unit is drawn without replacement from the remaining population so that every unit of the remaining population has equal chance of being selected.
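A literal rendering of this procedure, as a minimal sketch: at every step one unit is picked uniformly from the units that remain, and a 1 is recorded if it has the attribute and a 0 otherwise. The function name `draw_sequence` and the 0/1 encoding of the population are assumptions made for illustration.

```python
import random


def draw_sequence(N, M, rng):
    """Return the 0/1 outcomes of drawing all N units one at a time.

    The population holds M units with the attribute (coded 1) and
    N - M units without it (coded 0). Each draw picks uniformly among
    the remaining units.
    """
    remaining = [1] * M + [0] * (N - M)
    outcomes = []
    while remaining:
        i = rng.randrange(len(remaining))  # equal chance for every unit
        # Move the chosen unit to the end, then remove it in O(1).
        remaining[i], remaining[-1] = remaining[-1], remaining[i]
        outcomes.append(remaining.pop())
    return outcomes


if __name__ == "__main__":
    print(draw_sequence(N=12, M=5, rng=random.Random(42)))
```

The resulting sequence is a uniformly random ordering of the population, so any prefix of length $n$ is a simple random sample of size $n$.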

Such a sampling process can be exactly characterized by random variables $X_1, \cdots, X_N$ defined in a probability space $(\Omega, \mathscr{F}, \Pr)$ such that $X_i$ denotes the characteristic of the $i$-th sample, in the sense that $X_i = 1$ if the $i$-th sample has the attribute and $X_i = 0$ otherwise. By the nature of the sampling procedure, it can be shown that

we can define a multistage sampling scheme of the following basic structure. The sampling process is divided into s stages with sample sizes

The continuation or termination of sampling is determined by decision variables. For each stage with index $\ell$, a decision variable

The decision variable $D_\ell$ assumes only the two values 0 and 1, with the convention that sampling is continued until $D_\ell = 1$ for some $\ell \in \{1, \cdots, s\}$. Since the sampling must be terminated at or before the $s$-th stage, it is required that $D_s = 1$. For simplicity of notation, we also define $D_\ell = 0$ for $\ell = 0$.
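The control flow of such a scheme can be sketched as a loop over stages that stops at the first stage whose decision variable equals 1. The decision rule itself is not reproduced here, so `decide` below is a hypothetical user-supplied callback standing in for it; `run_multistage` and its argument names are likewise assumptions made for illustration.

```python
def run_multistage(xs, stage_sizes, decide):
    """Run a generic multistage scheme on a 0/1 draw sequence xs.

    stage_sizes: increasing sample sizes n_1 < ... < n_s.
    decide(ell, k, n): hypothetical rule returning 0 (continue) or
        1 (stop), given the stage index, the attribute count k and
        the sample size n at that stage.
    Returns (stage index, sample size, attribute count) at termination.
    """
    if not stage_sizes:
        raise ValueError("at least one stage is required")
    s = len(stage_sizes)
    for ell, n_ell in enumerate(stage_sizes, start=1):
        k_ell = sum(xs[:n_ell])  # attribute units among the first n_ell draws
        # D_s = 1 is enforced so that sampling always terminates.
        d_ell = 1 if ell == s else decide(ell, k_ell, n_ell)
        if d_ell == 1:
            return ell, n_ell, k_ell


if __name__ == "__main__":
    xs = [1, 0, 1, 1, 0, 0, 1, 0]
    # Illustrative rule only: stop once three attribute units are seen.
    print(run_multistage(xs, [2, 4, 8], lambda ell, k, n: 1 if k >= 3 else 0))
```

Forcing the last stage to stop mirrors the requirement $D_s = 1$; any rule passed as `decide` only affects the earlier stages.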

Our goal is to construct a fixed-width confidence interval $(L, U)$ such that $U - L \leq 2\varepsilon$ and that $\Pr\{L < p < U \mid p\} > 1 - \delta$ for any $p \in \{\frac{i}{N} : 0 \leq i \leq N\}$ with prescribed $\varepsilon \in (0, \frac{1}{2})$ and $\delta \in (0, 1)$. Toward this goal, we need to define some multivariate functions as follows.

For α ∈ (0, 1) and integers 0

where n is the sample size when the sampling is terminated. Then, a sufficient condition to guarantee

for all $M \in \{0, 1, \cdots, N\}$, where (4) is satisfied if $\zeta > 0$ is sufficiently small.

It should be noted that Theorem 3 has employed the double-decision-variable method recently proposed by Chen in [1]. To further reduce computational complexity, the techniques of bisection confidence tuning an

…(Full text truncated)…
