Random Sampling in an Age of Automation: Minimizing Expenditures through Balanced Collection and Annotation

Oscar Beijbom
UC Berkeley
obeijbom@berkeley.edu

Abstract

Methods for automated collection and annotation are changing the cost structures of sampling surveys for a wide range of applications. Digital samples in the form of images or audio recordings can be collected rapidly, and annotated by computer programs or crowd workers. We consider the problem of estimating a population mean under these new cost structures, and propose a Hybrid-Offset sampling design. This design utilizes two annotators: a primary, which is accurate but costly (e.g. a human expert), and an auxiliary, which is noisy but cheap (e.g. a computer program), in order to minimize total sampling expenditures. Our analysis gives necessary conditions for the Hybrid-Offset design and specifies optimal sample sizes for both annotators. Simulations on data from a coral reef survey program indicate that the Hybrid-Offset design outperforms several alternative sampling designs. In particular, sampling expenditures are reduced 50% compared to the Conventional design currently deployed by the coral ecologists.

1 Introduction

Using random sampling to estimate the mean of a population is a fundamentally important method to the sciences and society at large, and has been studied extensively [24, 4]. Deployment of any random sampling design requires collection of some number of observations sampled randomly from nature. In the ecological sciences, this was traditionally done in situ by an expert. Recently, advances in robotics, sensor technology, digital storage, and information technology have enabled rapid collection of samples in digital format, such as images [19, 10] or audio [11].
The popularity of digital sample collection can be attributed to three key factors: it creates a permanent record; it can be done cheaply using automated sampling vehicles or non-expert personnel; and it is generally fast. However, such samples (e.g. a photoquadrat of the forest floor) typically require annotation by an expert in order to reveal the desired quantity of interest (e.g. a count of insects). Such annotation work can be slow, tedious, expensive, and prone to error [13, 14]. Concurrent with the development of automated collection methods, advances in computer vision and computer audition have enabled automation of said annotation work. Such methods often rely on machine learning, where expert-annotated archived data sets are utilized to train automated annotators. Automation is a compelling low-cost alternative to expert annotation, but it is generally less reliable and may be biased [2, 8, 25]. This is particularly problematic if the probability density of the archived data differs from the density of the data to be sampled [17]. Crowdsourcing offers another low-cost alternative to expert annotation for, e.g., document or image annotation [27, 15]. Crowdsourced annotations can be noisy, and much work has been devoted to improving the quality of such annotations. This is generally done either by carefully designing the tasks given to the crowd workers [15], or by collecting multiple crowd annotations for the same sample and then modeling, and compensating for, the annotation errors [27].

We consider the problem of estimating a population mean under these new cost structures of data collection and annotation. This is formalized as follows: Given a procedure for collecting random samples, x_i ∈ X (e.g. images or audio recordings), each with an associated quantity of interest (value), y_i ∈ Y ⊆ [0, 1]; and two annotators: a primary, f_a(x_i) ∈ Y, which is accurate but costly (e.g.
a human expert), and an auxiliary, f_b(x_i) ∈ Y, which is cheap but noisy (e.g. a computer program, lay-person, or crowd worker). Our goal is to derive a sampling design that achieves unbiased estimates of the population mean (E[y_i]) at a target error and confidence, while minimizing the total cost of collection and annotation. In particular, we investigate the optimal balance between the number of samples annotated by the primary and auxiliary annotators. This work is, to the best of our knowledge, the first to consider this problem.

A key challenge is to define a procedure that can correct for the potential bias of the auxiliary annotator. This is difficult because we cannot assume any prior knowledge of the underlying probability density from which the samples are drawn. Indeed, if this density were known, the population mean could be estimated directly, making the sampling work unnecessary. If the auxiliary annotator is based on machine learning and trained on archived data with a different probability density, the problem of transfer learning arises, for which the generalization bounds of statistical learning theory generally do not apply [17]. Methods for bias-correction that do not require knowledge of the full underlying probability density (of the sampled data), but only of the conditional probabilities of a label given a sample, have been proposed independently by [23, 9]. However, as we shall see, this information may not always be at hand.

The key contribution of this work is an analysis of a Hybrid-Offset design that directly models the offset (bias) of the auxiliary annotator. It is "Hybrid" because it requires a subset of the samples to be annotated by both annotators, and it is unbiased if the samples are independent and identically distributed, which they are by construction under random sampling. As demonstrated by simulations on coral reef survey data, the Hybrid-Offset design is cost-effective and robust.
In particular, it outperforms a Hybrid-Ratio design which utilizes the ratio estimator commonly used in the sampling literature [24, 20]. It also outperforms several designs that rely on only one of the annotators, including the design currently used by the coral ecologists. We believe that the Hybrid-Offset design can be widely utilized, in particular for ecological surveys relying on digital samples [29, 16, 19, 1, 11]. Other contributions of this work are: (1) an analysis of the bias-correction method of [23, 9] in the context of random sampling; and (2) an improved machine-learning method, based on convolutional neural networks, for annotation of coral reef survey images.

1.1 Related work

Our work is most closely related to the literature on survey sampling with auxiliary data [24, 20]. In that context, "auxiliary data" is typically not a direct estimate of the variable of interest but some other, related quantity. For example, if the variable of interest, y_i, is the number of animals per plot, the auxiliary data can be the plot area, vegetation type, or plot elevation. In this literature, auxiliary data is typically incorporated using a ratio estimator, which can reduce the estimation errors if there is an approximately linear relationship between the auxiliary data and the variable of interest [24]. However, the ratio estimator is design-biased, and analysis of the estimator variance typically assumes that auxiliary data is available for the whole population [24, 20]. In this work, in contrast, we do not assume that the auxiliary data is available for all samples, but that this data can be acquired at a cost by collecting additional samples. This additional cost is then taken into account when deriving optimal sample sizes. Another key difference is that we do not use the ratio estimator but an offset estimator, which directly estimates the bias of the auxiliary annotator.
The offset estimator is design-unbiased and allows for a more straightforward analysis. Another line of related work utilizes stratified random sampling [3], importance sampling [21], or generative models of the classifier score distribution [28] to achieve cost-effective estimates of classifier performance on new data. The work of Garnett et al. [5] is particularly relevant, and investigates methods for active selection of samples in order to estimate class proportions. However, these methods all operate on a fixed set of samples. In contrast, we include the sample collection in our model, which enables a joint minimization of annotation and collection costs. Our work is also related to active learning and transfer learning. It is related, in particular, to recent work on active transfer learning, where labels are queried to optimize classifier performance in a target domain [26]. A key difference between that work and ours is that active learning methods optimize the labeling effort to create the best classifier (which then can, presumably, be used to label more data in order to estimate the desired data products). In contrast, we directly optimize the labeling effort to derive the desired data product (i.e. the population mean).

2 Preliminaries

2.1 Problem Setup

We denote by μ_p ≡ E[y_i] and σ_p² ≡ var[y_i] the first and second moments of the (unknown) probability density function of the values. The values are sampled randomly from nature, and are therefore i.i.d. We further denote by f_a : X → Y the primary, and by f_b : X → Y the auxiliary annotator; ε_{a,i} ≡ f_a(x_i) − y_i the error of f_a on sample i; μ_a ≡ E[ε_{a,i}] and σ_a² ≡ var[ε_{a,i}]. Similarly, with ε_{b,i} ≡ f_b(x_i) − y_i, we denote μ_b ≡ E[ε_{b,i}] and σ_b² ≡ var[ε_{b,i}]. Note that we do not make any assumptions on the underlying probability densities of the sample values or annotator errors. We denote by n_a and n_b the number of samples annotated by f_a and f_b, respectively.
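To make the notation concrete, the following sketch simulates the two annotators. All numbers are illustrative, not taken from the paper: i.i.d. values y_i, a primary annotator with zero-mean error, and an auxiliary annotator with a biased, noisier error.

```python
import numpy as np

rng = np.random.default_rng(0)

# i.i.d. sample values y_i with (unknown) mean mu_p = 0.2.
n = 100_000
y = rng.beta(2, 8, size=n)

# Primary annotator f_a: unbiased (mu_a = 0), small error (sigma_a = 0.02).
eps_a = rng.normal(0.0, 0.02, size=n)
f_a = y + eps_a

# Auxiliary annotator f_b: biased (mu_b = 0.05), larger error (sigma_b = 0.10).
eps_b = rng.normal(0.05, 0.10, size=n)
f_b = y + eps_b

mean_fa = f_a.mean()   # ~mu_p: primary annotations recover the population mean
mean_fb = f_b.mean()   # ~mu_p + mu_b: auxiliary annotations are offset by mu_b
```

The auxiliary mean is offset by μ_b, which is exactly the quantity the Hybrid-Offset design of Section 3 estimates and subtracts.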
The number of collected samples is given by max(n_a, n_b), since samples need to be annotated to provide any information and, conversely, need to be collected in order to be annotated. We denote by c_c, c_a, and c_b the cost per sample for collection, annotation by f_a, and annotation by f_b, respectively. The 'accurate and expensive' characteristics of f_a are operationalized by letting σ_a² < σ_b² and c_a > c_b.

We can now precisely state our goal: Given costs c_c, c_a, and c_b, and two annotators, f_a and f_b, derive a sampling design that estimates the population mean, μ_p, by defining the number of annotated samples (n_a and n_b), so that E[μ̂_p] = μ_p and Pr(|μ̂_p − μ_p| > d) < δ, for a target error, d, and confidence, δ. The utility of the sampling design is evaluated by the Total Sampling Cost (TSC), b:

b(n_a, n_b) = c_a n_a + c_b n_b + max(n_a, n_b) c_c.   (1)

We make three assumptions. First, we assume that the number of collected samples is small in comparison with the total size of the population, which allows us to omit the finite-population correction factor [24]. Second, we assume that the primary annotator, f_a, is unbiased, i.e. μ_a = 0, and that the correlation between ε_{a,i} and y_i is negligible. Third, because the two annotators are independent entities, we assume zero correlation between the annotator errors ε_{a,i} and ε_{b,i}. However, we do not make any assumptions on the correlation between the auxiliary annotation errors ε_{b,i} and the sample values y_i, which may be large. All proofs are in the Appendix.

2.2 Conventional design

We denote by 'Conventional' a sampling design where all collected samples, x_i, are annotated by the primary annotator f_a, i.e. n_b = 0. In such a design, an unbiased estimator of μ_p is given by

μ̂_p = (1/n_a) Σ_{i=1}^{n_a} f_a(x_i),   (2)

with variance

var[μ̂_p] = (1/n_a)(σ_p² + σ_a²).
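The Conventional estimator and its estimated confidence half-width can be sketched as follows. The synthetic annotations and parameter values are ours, for illustration only:

```python
import numpy as np
from statistics import NormalDist

def conventional_estimate(fa_values, delta=0.05):
    # Sample mean of the primary annotations f_a(x_i) (Eq. 2).  Its variance,
    # (sigma_p^2 + sigma_a^2) / n_a, is estimated by the sample variance.
    fa_values = np.asarray(fa_values, dtype=float)
    n_a = len(fa_values)
    mu_hat = fa_values.mean()
    var_hat = fa_values.var(ddof=1) / n_a
    zeta = NormalDist().inv_cdf(1 - delta / 2)   # upper 1 - delta/2 normal point
    return mu_hat, zeta * np.sqrt(var_hat)       # estimate and CI half-width

# Hypothetical usage: 40 primary annotations of coral cover with mean ~0.2.
rng = np.random.default_rng(1)
mu_hat, half_width = conventional_estimate(rng.beta(2, 8, size=40))
```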
(3)

The variance (σ_p² + σ_a²) is often unknown, and must be estimated by the sample variance of f_a(x_1), ..., f_a(x_{n_a}). The sample size n_a needs to be large enough to ensure that Pr(|μ̂_p − μ_p| > d) < δ, for a target error d and confidence δ. From the Central Limit Theorem, this is satisfied when

ζ_δ √(var[μ̂_p]) ≤ d,   (4)

where ζ_δ is the upper 1 − δ/2 point of the standard normal distribution [24]. The target sample size is given by inserting (3) into (4), yielding

n*_a = (ζ_δ²/d²)(σ_p² + σ_a²),   (5)

for a TSC of (c_c + c_a) n*_a.

3 Hybrid-Offset design

Now consider a hybrid design where n_b ≥ n*_a samples are collected and annotated by the auxiliary annotator f_b, and where a subset n_a ≤ n_b is also annotated by the primary annotator f_a. An offset estimator of μ_p under this design is given by

μ̂_p = (1/n_b) Σ_{i=1}^{n_b} f_b(x_i) − μ̂_b.   (6)

The offset estimator is unbiased, and an unbiased estimate of μ_b is given by

μ̂_b = (1/n_a) Σ_{i=1}^{n_a} (f_b(x_i) − f_a(x_i)).   (7)

The variance of μ̂_p is given by

var[μ̂_p] = (1/n_b)(σ_p² − σ_b²) + (1/n_a)(σ_a² + σ_b²),   (8)

and notably does not depend on the covariance between y_i and ε_{b,i}. This follows directly from the derivation of (8), which is provided in Appendix A. We denote by "Hybrid-Offset" a design that balances n_a and n_b to minimize the TSC. If the costs are such that a large number of samples, n_b, can be collected and annotated by the auxiliary annotator, the magnitude of the first term in (8) becomes small and the sampling error depends mainly on the auxiliary annotation error, σ_b². In contrast, the Conventional sampling design depends mainly on the data variance, σ_p² (3). This is compelling because while the data variance is a fixed constant of nature, the auxiliary annotation errors depend on the choice and quality of the auxiliary annotator, which we control. It also leads to our first result:

Theorem 1.
For any fixed n_b > n_a, the variance of the Hybrid-Offset estimator (8) is smaller than the variance of the Conventional design (3) if and only if σ_b² < σ_p².

Theorem 1 implies that the uncertainty introduced by f_b can be compensated for by using more samples if and only if σ_b² < σ_p². However, the additional collection of samples is only economical for certain cost functions, and should in the general case be determined by comparing the TSC of the two designs. To determine the TSC of the Hybrid-Offset design, we begin by deriving optimal sample sizes n_b and n_a. By combining (8) and (4), and solving for equality, the following trade-off between n_b and n_a is derived:

n_a = (σ_b² + σ_a²) / (d²/ζ_δ² − (1/n_b)(σ_p² − σ_b²)).   (9)

Example trade-off curves demonstrate how the amount of primary annotation can be reduced by increasing the amount of auxiliary annotation (Fig. 1A). Note that if n_b = n*_a, it follows from (9) that n_a = n*_a, and the Hybrid-Offset design reduces to the Conventional design. The optimal operating point along the (n_b, n_a) trade-off curve can be derived by minimizing the TSC. Using (9) to eliminate n_a, the TSC becomes

b(n_a, n_b) = (c_c + c_b) [ n_b + k(σ_b² + σ_a²) / (d²/ζ_δ² − (1/n_b)(σ_p² − σ_b²)) ],   (10)

where k = c_a/(c_c + c_b) is the relative cost of f_a. The optimal sample size, n_b, is given by minimizing (10) under the constraint that n_b ≥ n*_a. This yields the following theorem:

Theorem 2. Under a Hybrid-Offset design (6), the optimal auxiliary annotation size is

n_b = max( (ζ_δ²/d²) [ σ_p² − σ_b² + √( k(σ_b² + σ_a²)(σ_p² − σ_b²) ) ], n*_a ).   (11)

The corresponding primary annotation size is given by (9).
Figure 1: (A) Amount of auxiliary (n_b) and primary (n_a) annotation required to achieve error d ≤ 0.02 at 95% confidence for σ_p = 0.2, σ_a = 0.02, and σ_b = {0.25σ_p, 0.5σ_p, σ_p} under the Hybrid-Offset sampling design. The solid gray line indicates n_b = n_a. Optimal operating points (11), for relative cost of primary annotation k = 10 and k = 100, are marked with X on the σ_b = 0.5σ_p curve. (B) Difference in Total Sampling Cost (TSC_Conventional − TSC_Offset) for σ_p = 0.2, σ_a = 0.02, σ_b = {0.25σ_p, 0.5σ_p, 0.75σ_p}, c_b = 0 and c_c = 1, as a function of the relative cost of primary annotation, k = c_a/(c_c + c_b). The threshold costs, σ_Δ, are marked with black stumps. If k < σ_Δ the sampling designs and costs are identical.

Using the optimal sample sizes, the TSC of Hybrid-Offset sampling can be calculated from (10) and compared to the TSC of Conventional sampling in order to determine the most cost-effective design. For the important special case where c_b = 0 (which can occur, e.g., if f_b is a computer algorithm), the following theorem applies:

Theorem 3. If c_b = 0, the TSC of Hybrid-Offset sampling is smaller than the TSC of Conventional sampling if and only if k > σ_Δ, where k = c_a/(c_c + c_b) and σ_Δ = (σ_a² + σ_b²)/(σ_p² − σ_b²).

For example, if the primary annotation errors are zero (σ_a² = 0), and the auxiliary errors are half as large as the data variance (σ_b² = σ_p²/2), then σ_Δ = 1. This means that Hybrid-Offset sampling is cheaper than Conventional sampling if k > 1, which occurs if the cost of collection is smaller than the cost of primary annotation. The difference in TSC between the two sampling designs is shown in Fig. 1B for various values of k, σ_p, and σ_b.
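A sketch of the optimal sample sizes of Theorem 2 (Eqs. 9 and 11). The parameter values are illustrative, chosen to roughly match Fig. 1A, and the function name is ours:

```python
import numpy as np
from statistics import NormalDist

def hybrid_offset_sizes(d, delta, var_p, var_a, var_b, c_c, c_a, c_b):
    # Assumes sigma_b^2 < sigma_p^2 (Theorem 1).
    zeta = NormalDist().inv_cdf(1 - delta / 2)
    n_star_a = (zeta / d) ** 2 * (var_p + var_a)     # Conventional size, Eq. 5
    k = c_a / (c_c + c_b)                            # relative cost of f_a
    n_b = (zeta / d) ** 2 * (var_p - var_b +         # Eq. 11
                             np.sqrt(k * (var_b + var_a) * (var_p - var_b)))
    n_b = max(n_b, n_star_a)
    # Eq. 9: the trade-off curve gives the matching primary annotation size.
    n_a = (var_b + var_a) / ((d / zeta) ** 2 - (var_p - var_b) / n_b)
    return n_a, n_b

# sigma_p = 0.2, sigma_a = 0.02, sigma_b = 0.5 * sigma_p, k = 10 (cf. Fig. 1A).
n_a, n_b = hybrid_offset_sizes(d=0.02, delta=0.05, var_p=0.04,
                               var_a=0.0004, var_b=0.01,
                               c_c=1.0, c_a=10.0, c_b=0.0)
```

With these numbers, roughly 820 cheap auxiliary annotations reduce the primary annotation effort from about 390 samples (the Conventional size n*_a) to about 150.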
Hybrid-Offset sample sizes can also be derived directly from a target TSC, b, and costs c_a, c_b and c_c, by minimizing (8) under the TSC constraint b = n_a c_a + n_b(c_b + c_c). This yields:

n_b = b √( c_a(c_b + c_c) σ_Δ ) / ( c_a(c_b + c_c) σ_Δ − (c_b + c_c)² ) − b / ( c_a σ_Δ − (c_b + c_c) ),   (12)

where, as previously, σ_Δ = (σ_a² + σ_b²)/(σ_p² − σ_b²). The corresponding n_a is given by the TSC constraint.

4 Experiments

The proposed method is discussed in the context of an annual coral reef survey performed by the Moorea Coral Reef Long Term Ecological Research (MCR-LTER) program (http://mcr.lternet.edu). The program surveys six sites across the island of Moorea in French Polynesia. At each site, three habitats are surveyed: the fringing reef and two habitats on the outer reef at 10 and 17 meter depth, respectively, for a total of 18 sampling "units". In each unit the goal, as dictated by the ecologists, is to estimate the percent cover of key benthic substrates, such as coral and algae. These data provide important information about the ecology when compared across sites, habitats, and years.

To estimate the percent cover for each unit, ecologists capture photographs (in situ, by a research diver) at random locations along five line-transects at each site. The photographs are then annotated in order to estimate the percent cover for each photograph. This is done through random point sampling, in which the substrate is identified at 200 random point locations in each photograph [18]. This procedure of using random sampling to annotate each collected random sample is commonly referred to as two-stage sampling [24, 4]. For the purpose of this discussion, we will focus on the estimation quality of percent coral cover for each unit, and investigate the effect of using a Hybrid-Offset design in place of the Conventional design currently in use.
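The budget-constrained sizes of Eq. (12) can be sketched as follows. The code applies (12) in an algebraically simplified but equivalent form; the costs and variances are the survey's approximate values from Sections 4 and 5, and all names are ours:

```python
import numpy as np

def sizes_from_budget(b, c_c, c_a, c_b, var_p, var_a, var_b):
    # Minimize the offset-estimator variance (Eq. 8) subject to the budget
    # constraint b = n_a * c_a + n_b * (c_b + c_c).  Assumes var_b < var_p.
    sigma_delta = (var_a + var_b) / (var_p - var_b)
    c1 = c_b + c_c
    # Eq. 12 simplifies to n_b = b / (c1 + sqrt(c_a * c1 * sigma_delta)).
    n_b = b / (c1 + np.sqrt(c_a * c1 * sigma_delta))
    n_a = (b - n_b * c1) / c_a   # the remaining budget goes to the primary
    return n_a, n_b

# 7.3 person-hours in minutes; collection 1 min, expert 10 min, machine free.
n_a, n_b = sizes_from_budget(b=7.3 * 60, c_c=1.0, c_a=10.0, c_b=0.0,
                             var_p=0.16 ** 2, var_a=0.0, var_b=0.047 ** 2)
```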
For the simulations we use the Moorea Labeled Corals dataset, which is publicly available¹ and contains the full-resolution images and annotations from the MCR-LTER surveys conducted in 2008 and 2009. We use the data from 2008 to train the auxiliary annotators and estimate the sampling parameters, and the data from 2009 to run the sampling simulations. As described above, each photograph, x_i, is a random sample with a corresponding coral cover y_i ∈ [0, 1], and n_a = 40 in each unit. The coral cover estimated by the expert annotator, f_a(x_i), is highly accurate [14], and we therefore use σ_a² = 0 in the simulations below. We do not account for approximation errors introduced by the second-stage (point-annotation) method, as these have been shown to have very limited effect on the final mean estimator [4, 18]. Manual annotation requires ~10 minutes per image to complete, while collection is quicker, with the 40 samples in a sampling unit captured in a single 40-minute dive. With these parameters, the TSC for each unit is approximately 40(10 + 1) minutes, or 7.3 person-hours.

Auxiliary annotators: We use two auxiliary annotators: the "texton"-based classifier proposed by [2], which is publicly available, and a novel annotator based on convolutional neural networks (CNN) [7]. Both of these methods operate on p × p pixel image patches, and are denoted f^(Ψ)_b*(x_{i,j}) : R^{p×p×3} → {0, 1}, for Ψ ∈ {texton, CNN}, where x_{i,j} is a patch from image i around random point location j, and where output 1 indicates 'coral' and 0 'other'. The output of the auxiliary annotator, f^(Ψ)_b(x_i), is given by averaging the point classifications for the 200 random points in each image:

f^(Ψ)_b(x_i) = (1/200) Σ_{j=1}^{200} f^(Ψ)_b*(x_{i,j}).   (13)

To develop the CNN-based point classifier, f^(CNN)_b, we adopted a 16-layer CNN model developed for image classification [22].
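Eq. (13) amounts to averaging a point classifier over the random point locations in an image. A toy sketch, in which the "classifier" and patch size are stand-ins rather than the texton or CNN models:

```python
import numpy as np

def auxiliary_cover(point_classifier, patches):
    # Image-level auxiliary annotation (Eq. 13): the fraction of random
    # point locations classified as 'coral' (1) rather than 'other' (0).
    return float(np.mean([point_classifier(p) for p in patches]))

# Toy stand-in: 200 random 32x32 RGB patches and a threshold "classifier".
rng = np.random.default_rng(2)
patches = rng.random((200, 32, 32, 3))
toy_classifier = lambda p: int(p.mean() > 0.5)
cover = auxiliary_cover(toy_classifier, patches)
```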
The VGG16 model is publicly available² and operates on 226 × 226 RGB images. To fine-tune this model for coral classification, we cropped 226 × 226 patches from the 2008 images at each of the 200 annotated point locations. These cropped patches were used, together with mirrored and rotated (by 90, 180, and 270 degrees) versions, as training data to fine-tune the weights of the VGG16 model, largely following the procedure proposed by [7]. Classification was performed by cropping patches from the test images and propagating them through the network.

Cost analysis: Using the data from 2008, the sample standard deviation of the percent coral cover, σ̂_p, was 0.16 ± 0.1 (mean ± SD, n = 18) across the units, meaning that for an average unit, a 95% confidence interval of mean coral cover from the n_a = 40 samples is 5.8%. By cross-validation on the data from 2008, the auxiliary annotator error of f^(texton)_b, σ̂_b, was estimated as 0.047 ± 0.024 (mean ± SD, n = 18). Since σ̂_b < σ̂_p, the Hybrid-Offset design is likely more cost-effective than a Conventional design (Theorem 1). Indeed, with d = 5.8%, δ = 0.05, the optimal Hybrid-Offset design is to collect 53 samples and manually annotate 5 (Theorem 2), for a TSC of 2.3 person-hours per unit; a 68.7% reduction compared to the Conventional design.

Simulation details: In order to validate the expected cost savings, simulations were carried out on the coral survey images collected and annotated during 2009. Using the variance estimates from the 2008 data, sample sizes were determined for TSC b = 1, 2, ..., 15 person-hours using Eq. (12). For each of the 18 units, for each budget, and for 500 iterations, the required number of images was drawn randomly with replacement from the images pertaining to that unit³, and mean estimates μ̂_p were calculated.
From these estimates, the mean error (bias), (1/500) Σ_{k=1}^{500} (μ̂_p − μ⁰_p), and the mean absolute error (MAE), (1/500) Σ_{k=1}^{500} |μ̂_p − μ⁰_p|, of each method were calculated by comparing to the "ground-truth" cover μ⁰_p, which was estimated from the expert annotations of the 40 images in the unit. In addition to the Conventional and Hybrid-Offset designs, we include three other designs, which are defined below. All sampling designs are evaluated using both f^(texton)_b and f^(CNN)_b.

¹ http://vision.ucsd.edu/content/moorea-labeled-corals
² https://github.com/BVLC/caffe/wiki/Model-Zoo
³ Sampling with replacement was used to avoid finite-population artifacts due to the limited pool of 40 images per unit. In an actual application, there would be a large number of locations from which to capture the images, and finite-population correction would not be needed.

Figure 2: Simulation results on the coral reef survey data. Results displayed as mean ± SE for n = 18 sampling units and for TSC (budgets) between 1 and 15 person-hours per unit. Estimator bias is shown for the f^(texton)_b auxiliary annotator in panel (A), and for f^(CNN)_b in panel (B). Mean Absolute Error (MAE) is shown in panel (C), where estimates using f^(texton)_b are indicated with solid lines, and f^(CNN)_b with dotted. Gray dash-dotted lines indicate the current operating point of the Conventional sampling design currently utilized by the ecological monitoring program. The MAEs of the Bias-Corrected estimates were > 3% (shown in Fig. S1).

Hybrid-Ratio Design: Auxiliary information is commonly incorporated using a ratio estimator, which assumes a linear relation between auxiliary values and the values of interest [24, 20].
The mean estimate of the ratio estimator is μ̂_p = (1/n_b) Σ_{i=1}^{n_b} r̂ f_b(x_i), where the ratio, r, is estimated as

r̂ = Σ_{i=1}^{n_a} f_a(x_i) / Σ_{i=1}^{n_a} f_b(x_i).   (14)

The ratio estimator is design-biased, and there are several approximations of the ratio-estimator variance [20]. We do not analyze the ratio estimator, but include it in the simulations using a Hybrid-Ratio design with the same sample sizes as the Hybrid-Offset design (12). We set r̂ = 1 if the numerator or denominator of (14) is zero, in order to avoid ill-defined estimates of r.

Auxiliary Design: Using only the auxiliary annotator, μ_p can be estimated as μ̂_p = (1/n_b) Σ_{i=1}^{n_b} f_b(x_i), with sample sizes n_a = 0, n_b = b/c_c. This estimator is not design-unbiased, but will have small variance, since a large number of samples can be collected and annotated cheaply.

Auxiliary Bias-Corrected Design: The expected value of f_b*(x) for a sample x with value y ∈ {0, 1} is

E[ f_b*(x), 1 − f_b*(x) ]ᵀ = [ α, 1 − β ; 1 − α, β ] [ y, 1 − y ]ᵀ,   (15)

where α and β are the classifier sensitivity and specificity, respectively. As noted by [23, 9], an unbiased estimate of y is given by inverting the confusion matrix in the center of (15), yielding ỹ = (f_b*(x) + β − 1)/(α + β − 1). We denote this operation "bias-correction", since the corrected values are unbiased in expectation. Since f_b is a linear combination of f_b* (13), μ_p can be estimated as μ̂_p = (1/n_b) Σ_{i=1}^{n_b} (f_b(x_i) + β − 1)/(α + β − 1). Using cross-validation on the data from 2008, sensitivity and specificity were estimated as α̂ = 0.738, β̂ = 0.963 for f^(texton)_b*, and α̂ = 0.740, β̂ = 0.968 for f^(CNN)_b*, and used for the bias-correction. An analysis of the variance of a mean estimator from bias-corrected values is provided in Appendices E & F.

5 Results & Discussion

As expected, the Conventional and Hybrid-Offset designs were unbiased (Fig. 2A).
The Hybrid-Ratio design also had low bias, except for smaller budgets, where, as mentioned above, the ratio estimator (14) may be ill-posed. The Auxiliary Bias-Corrected design was, in fact, biased (Fig. 2A, B). This may seem surprising, but the corrected estimates are only unbiased if the sensitivity, α, and specificity, β, are known [23, 9]. These results thus indicate that α̂ and β̂ estimated from the 2008 data were not valid for the sampling units from 2009. This may be due to domain shifts, which can severely affect the performance of machine-learning based classifiers [25, 17]. One way to circumvent this problem is to use bias-correction in a Hybrid design, and estimate α̂ and β̂ from the n_a samples annotated by both annotators. However, as shown in Appendix H, such a Hybrid-Bias-Corrected design is inferior to the Hybrid-Offset design. Finally, and less surprisingly, the uncorrected Auxiliary design was biased, although to a lesser extent for f^(CNN)_b (Fig. 2A, B).

The MAE of the Conventional design at the 7.3 person-hour budget currently utilized by the MCR-LTER program was 1.63 ± 0.32% (mean ± SE, n = 18; Fig. 2C). This was outperformed by the Hybrid-Offset and Hybrid-Ratio designs, which utilized both annotators. At the 7.3 person-hour budget, the Hybrid-Offset estimator MAE was 1.08 ± 0.16% when relying on f^(texton)_b and 1.00 ± 0.18% when relying on f^(CNN)_b, which is significantly lower than for the Conventional design. Conversely, the MAE that the Conventional design achieved at 7.3 person-hours can be achieved by the Hybrid-Offset design at around 3.5 person-hours; a ~50% cost reduction. The Hybrid-Ratio design, while comfortably better than the Conventional design, performed worse than the Hybrid-Offset design for all budgets. This may be because the sample sizes were optimized for the offset estimator and not the ratio estimator.
The Auxiliary and Auxiliary Bias-Corrected designs, which both relied only on the auxiliary annotator, performed worse than the Hybrid and Conventional designs. The MAE of the Bias-Corrected design was > 3% for all budgets, indicating that the correction method of [23, 9] was ineffective. The Auxiliary design performed poorly when relying on f^(texton)_b (MAE > 2%), but better when relying on f^(CNN)_b, barely outperforming the Conventional design at the 7.3 person-hour budget (Fig. 2C). It is also clear from the simulations that f^(CNN)_b outperformed f^(texton)_b, reducing the errors for the Hybrid and Auxiliary estimators. This is expected, as CNN-based methods have recently achieved state-of-the-art performance on several visual recognition tasks [7]. As new and stronger classification methods are developed, the requirement of Theorem 1 will be satisfied for an increasing number of applications, suggesting increasing utility of the offset sampling design.

We have used linear cost functions throughout this work, with a fixed cost per sample. In reality, the cost per sample is likely to decrease when more samples are collected. This is true in particular for spatial surveys, where the sample collector will, on average, have a shorter distance to travel between the samples. Since Hybrid sampling designs require a larger number of collected samples, the cost savings estimated by our simulations should be considered a lower bound on the actual cost savings.

6 Conclusion

We have investigated the implications of modeling and incorporating the cost and accuracy of two annotators in random sampling designs for population mean estimation, and shown that significant cost savings are possible using data from a coral reef survey. However, the derived formulations are general and applicable in many other situations. These include other marine surveys of, e.g.,
fish [29] or plankton [16]; terrestrial surveys of crops [10], forests [6], rangelands [19], and deserts [1]; and audio-based surveys of, e.g., marine mammal or bird populations [11]. To the best of our knowledge, this is the first work that models automated annotation as part of the sampling design, and we believe that there are several interesting directions for future work, notably with respect to stratified and sequential sampling procedures.

References

[1] S. Archer, C. Scifres, C. Bassham, and R. Maggio. Autogenic succession in a subtropical savanna: conversion of grassland to thorn woodland. Ecol Monograph, 1988.
[2] O. Beijbom, P. J. Edmunds, D. I. Kline, B. G. Mitchell, and D. Kriegman. Automated annotation of coral reef survey images. In CVPR, 2012.
[3] P. N. Bennett and V. R. Carvalho. Online stratified sampling: evaluating classifiers at web-scale. In CIKM, 2010.
[4] W. E. Deming. Some Theory of Sampling. Courier Dover Publications, 1966.
[5] R. Garnett, Y. Krishnamurthy, X. Xiong, J. Schneider, and R. Mann. Bayesian optimal active search and surveying. In ICML, 2012.
[6] S. Getzin, K. Wiegand, and I. Schöning. Assessing biodiversity in forests using very high-resolution images and unmanned aerial vehicles. Methods Ecol. Evol., 2012.
[7] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.
[8] D. J. Hand. Classifier technology and the illusion of progress. Statistical Science, 2006.
[9] D. J. Hopkins and G. King. A method of automated nonparametric content analysis for social science. Am J Polit Sci, 2010.
[10] E. R. Hunt, W. D. Hively, S. J. Fujikawa, D. S. Linden, C. S. Daughtry, and G. W. McCarty. Acquisition of nir-green-blue digital photographs from unmanned aircraft for crop monitoring. Remote Sens, 2010.
[11] M. P. Johnson and P. L. Tyack.
A digital acoustic recording tag for measuring the response of wild marine mammals to sound. J. Ocean. Eng., 2003.
[12] K. E. Kohler and S. M. Gill. Coral Point Count with Excel extensions (CPCe): A Visual Basic program for the determination of coral and substrate coverage using random point count methodology. Comput Geosci, 2006.
[13] N. MacLeod, M. Benfield, and P. Culverhouse. Time to automate identification. Nature, 2010.
[14] R. Ninio, J. Delean, K. Osborne, and H. Sweatman. Estimating cover of benthic organisms from underwater video images: variability associated with multiple observers. MEPS, 2003.
[15] J. Noronha, E. Hysen, H. Zhang, and K. Z. Gajos. PlateMate: crowdsourcing nutritional analysis from food photographs. In UIST, 2011.
[16] R. J. Olson and H. M. Sosik. A submersible imaging-in-flow instrument to analyze nano- and microplankton: Imaging FlowCytobot. Limnol. Oceanogr. Methods, 2007.
[17] S. J. Pan and Q. Yang. A survey on transfer learning. KDE, 2010.
[18] E. Pante and P. Dustan. Getting to the point: Accuracy of point count in monitoring ecosystem change. J Mar Bio, 2012.
[19] A. Rango, A. Laliberte, J. E. Herrick, C. Winters, K. Havstad, C. Steele, and D. Browning. Unmanned aerial vehicle-based remote sensing for rangeland assessment, monitoring, and management. J Appl Remote Sens, 2009.
[20] R. M. Royall and W. G. Cumberland. An empirical study of the ratio estimator and estimators of its variance. J Am Stat Assoc, 1981.
[21] C. Sawade, N. Landwehr, S. Bickel, and T. Scheffer. Active risk estimation. In ICML, 2010.
[22] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv, 2014.
[23] A. Solow, C. Davis, and Q. Hu. Estimating the taxonomic composition of a sample when individuals are classified with error. Mar Ecol Prog Ser, 2001.
[24] S. K. Thompson. Sampling. John Wiley & Sons, Inc., 2012.
[25] A. Torralba and A. A. Efros.
Unbiased look at dataset bias. In CVPR, 2011.
[26] X. Wang, T.-K. Huang, and J. Schneider. Active transfer learning under model shift. In ICML, 2014.
[27] P. Welinder, S. Branson, P. Perona, and S. J. Belongie. The multidimensional wisdom of crowds. In NIPS, 2010.
[28] P. Welinder, M. Welling, and P. Perona. A lazy man's approach to benchmarking: Semisupervised classifier evaluation and recalibration. In CVPR, 2013.
[29] T. J. Willis and R. C. Babcock. A baited underwater video system for the determination of relative density of carnivorous reef fish. Mar Freshwater Res, 2000.

Appendices: Random Sampling in an Age of Automation: Minimizing Expenditures through Balanced Collection and Annotation

A Variance of the offset estimator

To derive the variance of $\hat{\mu}_p$ we begin by expanding out and separating all terms,

$$\hat{\mu}_p = \frac{1}{n_b} \sum_{i=1}^{n_b} f_b(x_i) - \hat{\mu}_b \tag{S1}$$
$$= \frac{1}{n_b} \sum_{i=1}^{n_b} f_b(x_i) - \frac{1}{n_a} \sum_{i=1}^{n_a} \left[ f_b(x_i) - f_a(x_i) \right] \tag{S2}$$
$$= \frac{1}{n_b} \sum_{i=1}^{n_b} y_i + \frac{n_a - n_b}{n_a n_b} \sum_{i=1}^{n_a} \epsilon_{b,i} + \frac{1}{n_b} \sum_{i=n_a+1}^{n_b} \epsilon_{b,i} + \frac{1}{n_a} \sum_{i=1}^{n_a} \epsilon_{a,i}, \tag{S3}$$

after which the variance of $\hat{\mu}_p$ is given by:

$$\mathrm{var}[\hat{\mu}_p] = \frac{1}{n_b}(\sigma_p^2 - \sigma_b^2) + \frac{1}{n_a}(\sigma_a^2 + \sigma_b^2) + \frac{2}{n_b}\sigma_{a,p} + 2\,\frac{n_a - n_b}{n_a n_b}\,\sigma_{a,b}. \tag{S4}$$

The first two terms are the variances of the data and annotator errors. The third term is the covariance between the primary annotator errors and the sample values, $\sigma_{a,p}$. Since, by assumption, the primary annotator is "accurate", we can expect this term to be small, and it can be omitted. The last term accounts for the covariance between the annotator errors, $\sigma_{a,b}$. Since the two annotators operate independently, this is assumed to be zero, and it can also be omitted. Interestingly, the third covariance term, $\sigma_{p,b}$, between the auxiliary annotator and the actual values, which may be significant, cancels out and does not affect the final expression.

B Proof of Theorem 1

Proof.
It follows from (3) and (8) that if $\sigma_b = \sigma_p$, the variance of the offset estimator equals that of the conventional estimator. Since $\mathrm{var}[\hat{\mu}_p]$ increases monotonically with increasing $\sigma_b$ (since $n_b > n_a$), the theorem follows trivially.

C Proof of Theorem 2

We start with the following lemma:

Lemma 1. The cost function of offset sampling (10) is convex for $\sigma_b^2 < \sigma_p^2$, $n_b \geq n_a^*$, where $n_a^*$ is given by (5).

Proof. We begin by deriving the first and second derivatives of (10):

$$\frac{\partial t(n_a, n_b)}{\partial n_b} = (c_c + c_b)\left[1 - \frac{k(\sigma_b^2 + \sigma_a^2)(\sigma_p^2 - \sigma_b^2)}{\left(n_b \frac{d^2}{\zeta_\delta^2} - (\sigma_p^2 - \sigma_b^2)\right)^2}\right] \tag{S5}$$

$$\frac{\partial^2 t(n_a, n_b)}{\partial n_b^2} = (c_c + c_b)\,\frac{2k(\sigma_b^2 + \sigma_a^2)(\sigma_p^2 - \sigma_b^2)\,\frac{d^2}{\zeta_\delta^2}}{\left(n_b \frac{d^2}{\zeta_\delta^2} - (\sigma_p^2 - \sigma_b^2)\right)^3}. \tag{S6}$$

Since by assumption $\sigma_b^2 < \sigma_p^2$, the second derivative is positive for

$$n_b \geq n_a^* > \frac{\zeta_\delta^2}{d^2}(\sigma_p^2 - \sigma_b^2), \tag{S7}$$

where the strict inequality requires either $\sigma_a^2$ or $\sigma_b^2$ to be non-zero. This concludes the proof.

Since, according to Theorem 1, offset sampling should only be considered if $\sigma_b^2 < \sigma_p^2$, and since, by design, $n_b \geq n_a^*$, (10) is convex, and Theorem 2 follows by setting the first derivative to zero and solving for $n_b$.

D Proof of Theorem 3

Proof. The TSC of conventional sampling, under the assumption that $c_b = 0$, is given by $n_a(c_c + c_a)$, and the TSC of offset sampling is given by $n_b c_c + n_a c_a$. As noted previously, if $n_b = n_a = n_a^*$, offset sampling reduces to conventional sampling and the TSCs are equal. Since $n_b$ minimizes the offset TSC, which is convex for $n_b \geq n_a^*$ (Appendix C), it follows that the TSC of offset sampling is smaller than the TSC of conventional sampling if and only if $n_b > n_a^*$.
The threshold $\sigma_\Delta$ for when this occurs can be calculated by equating the two arguments inside the max operator of (11) and solving for $k$ (the resulting threshold value of $k$ is denoted $\sigma_\Delta$):

$$\frac{\zeta_\delta^2}{d^2}(\sigma_p^2 + \sigma_a^2) = \frac{\zeta_\delta^2}{d^2}\left[\sigma_p^2 - \sigma_b^2 + \sqrt{\sigma_\Delta(\sigma_b^2 + \sigma_a^2)(\sigma_p^2 - \sigma_b^2)}\right] \Rightarrow \tag{S8}$$
$$\sigma_a^2 + \sigma_b^2 = \sqrt{\sigma_\Delta(\sigma_b^2 + \sigma_a^2)(\sigma_p^2 - \sigma_b^2)} \Rightarrow \tag{S9}$$
$$\sigma_\Delta = \frac{\sigma_a^2 + \sigma_b^2}{\sigma_p^2 - \sigma_b^2}. \tag{S10}$$

E Random Sampling with Abundance Correction

A third sampling design can be defined under one critical additional assumption. This assumption, which we will denote the 'population drift assumption', is that the performance of $f_b$ can be defined in terms of a matrix of confusion, $Q$ (which excludes real-valued output spaces $\mathcal{Y}$), and that $Q$ is known a priori for the data to be sampled. For these derivations we will let $\mathcal{Y} = \{0, 1\}$, meaning that the samples $y$ are drawn from $\mathrm{Ber}(\mu_p)$, a Bernoulli distribution with mean $\mu_p$. This corresponds to annotating each sample $x_i$ as containing or not containing the quantity of interest. In Appendix F we derive the statistics of auxiliary sampling in a two-stage sampling design [24, 4], where each sample $x_i$ is annotated by second-stage sampling, from which the corresponding $y_i$ is obtained.

A matrix of confusion $Q$ characterizes the misclassification rates of $f_b$. In the binary case, $Q$ is a two-by-two matrix

$$Q = \begin{pmatrix} \alpha & 1 - \beta \\ 1 - \alpha & \beta \end{pmatrix},$$

where $\alpha$ is the sensitivity and $\beta$ the specificity. As noted independently by [23, 9], $Q$ can be used to create an unbiased estimate of $y$, and we begin by recalling this procedure. The expected value of the auxiliary annotation $f_b$ of a sample $x$ with value $y$ is

$$E\begin{bmatrix} f_b(x) \\ 1 - f_b(x) \end{bmatrix} = Q \begin{bmatrix} y \\ 1 - y \end{bmatrix},$$

and an unbiased estimator of $y$ is given by inverting $Q$:

$$\tilde{y} = \frac{f_b(x) + \beta - 1}{\alpha + \beta - 1}. \tag{S11}$$

We refer to this as the 'abundance corrected' value, and derive a sampling procedure based on this correction.
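The correction in (S11) is easy to check numerically. Below is a minimal Python sketch (the function names `abundance_correct` and `simulate` are ours, not from the paper): it draws Bernoulli samples, corrupts them with a synthetic annotator of known sensitivity and specificity, and confirms that the corrected mean recovers $\mu_p$ while the raw mean of $f_b$ is biased.

```python
import random

def abundance_correct(fb, alpha, beta):
    """Invert the confusion matrix, eq. (S11): unbiased estimate of y from a
    noisy binary annotation fb, given sensitivity alpha and specificity beta."""
    return (fb + beta - 1.0) / (alpha + beta - 1.0)

def simulate(mu_p, alpha, beta, n, seed=0):
    """Draw n samples y ~ Ber(mu_p), pass them through a noisy annotator
    (P(fb=1|y=1) = alpha, P(fb=1|y=0) = 1 - beta), and return the raw and
    abundance-corrected sample means."""
    rng = random.Random(seed)
    raw = corrected = 0.0
    for _ in range(n):
        y = 1 if rng.random() < mu_p else 0
        p_one = alpha if y == 1 else 1.0 - beta  # P(fb = 1 | y)
        fb = 1 if rng.random() < p_one else 0
        raw += fb
        corrected += abundance_correct(fb, alpha, beta)
    return raw / n, corrected / n

raw_mean, corr_mean = simulate(mu_p=0.3, alpha=0.9, beta=0.8, n=200_000)
# raw_mean sits near 0.3*0.9 + 0.7*0.2 = 0.41 (biased); corr_mean near 0.30.
print(raw_mean, corr_mean)
```

Note that individual corrected values $\tilde{y}$ fall outside $\{0, 1\}$; only their mean is meaningful, which is exactly how (S18) uses them.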
The variance of $\tilde{y}_i$, given the true value $y_i$, is

$$\mathrm{var}[\tilde{y}_i \mid y_i] = \frac{\mathrm{var}[f_b(x_i) \mid y_i]}{(\alpha + \beta - 1)^2}, \tag{S12}$$

which follows directly from (S11). We also note that

$$\mathrm{var}[f_b(x_i) \mid y_i] = y_i\,\alpha(1 - \alpha) + (1 - y_i)(1 - \beta)\beta, \tag{S13}$$

since if $y_i = 1$, $f_b(x_i) \sim \mathrm{Ber}(\alpha)$, and if $y_i = 0$, $f_b(x_i) \sim \mathrm{Ber}(1 - \beta)$. Combining (S12) and (S13) yields

$$\mathrm{var}[\tilde{y}_i \mid y_i] = \frac{y_i\,\alpha(1 - \alpha) + (1 - y_i)(1 - \beta)\beta}{(\alpha + \beta - 1)^2}. \tag{S14}$$

Finally, $\mathrm{var}[\tilde{y}_i]$ is given by the law of total variance:

$$\mathrm{var}[\tilde{y}_i] = E[\mathrm{var}(\tilde{y}_i \mid y_i)] + \mathrm{var}[E(\tilde{y}_i \mid y_i)] \tag{S15}$$
$$= \sigma_s^2 + \sigma_p^2, \tag{S16}$$

where $\sigma_p^2$ is the data variance and

$$\sigma_s^2 = \frac{\mu_p\,\alpha(1 - \alpha) + (1 - \mu_p)(1 - \beta)\beta}{(\alpha + \beta - 1)^2} \tag{S17}$$

the variance introduced by the abundance correction. If the classifier is balanced, i.e. $\alpha = \beta$, (S17) simplifies to $\sigma_s^2 = \frac{\alpha(1 - \alpha)}{(2\alpha - 1)^2}$.

Since $\tilde{y}_i$ is an unbiased estimator of $y_i$, we can achieve an unbiased estimate of $\mu_p$ as

$$\hat{\mu}_p = \frac{1}{n_b} \sum_{i=1}^{n_b} \tilde{y}_i, \tag{S18}$$

with variance, assuming that $\sigma_s^2$ and $\sigma_p^2$ are uncorrelated,

$$\mathrm{var}[\hat{\mu}_p] = \frac{1}{n_b}(\sigma_s^2 + \sigma_p^2), \tag{S19}$$

and sample size

$$n_b = \frac{\zeta_\delta^2}{d^2}(\sigma_s^2 + \sigma_p^2). \tag{S20}$$

Finally, the TSC, since $n_a = 0$, is given by

$$t(n_a, n_b) = (c_c + c_b)\,n_b. \tag{S21}$$

The auxiliary sampling design requires annotation of $n_b$ samples by the auxiliary annotator, but as it does not require any annotations by the primary annotator, the TSC can be low. The following theorem follows directly from (S21) and the cost function for conventional sampling.

Theorem 4. For binary output spaces, $\mathcal{Y} = \{0, 1\}$, the TSC of auxiliary sampling is smaller than the TSC of conventional sampling if and only if

$$k' > \frac{\sigma_p^2 + \sigma_s^2}{\sigma_p^2 + \sigma_a^2}, \tag{S22}$$

where $k' = (c_c + c_a)/(c_c + c_b)$.

If $f_b$ is accurate, then $\sigma_s^2$ is small and auxiliary sampling is cheaper than conventional sampling even for low primary annotation costs. For example, if $\alpha = \beta = 0.9 \Rightarrow \sigma_s^2 \approx 0.14$, $\sigma_p^2 = 0.04$, and $\sigma_a^2 = 0$, it suffices that $k'$ is larger than $4.5$, which is satisfied, e.g., if $c_b = 0$ and $c_a > 3.5\,c_c$. If, on the other hand, $\alpha = \beta = 0.7 \Rightarrow \sigma_s^2 \approx 1.3$, $k'$ must be larger than $33.5$.

F Two-Stage Random Sampling with Abundance Correction

In two-stage sampling designs, each first-stage sample $x_i$ is again sampled randomly using some number $s$ of second-stage samples [24]. An analysis of the errors using such designs is provided by Deming [4]. Second-stage sampling is commonly used, e.g., in benthic surveys, where each collected photoquadrat is annotated using random point sampling [12]. This protocol requires $s$ points to be overlaid on each image at locations selected randomly with replacement. The substrate under each point is then annotated by an expert as pertaining to one of some number of classes. An unbiased estimator of the abundance (benthic cover) of each class for a certain sample can be derived by counting how many of the $s$ annotations were annotated as that class.

We derive the statistics of two-stage sampling under the population drift assumption, namely that each decision is made by some noisy annotator $f_b$ with known matrix of confusion. We will denote by $x_{i1}, \ldots, x_{is}$ the $s$ locations to be annotated in each sample $x_i$, and by $u_{ij} \in \{0, 1\}$ the true value associated with each location. The value of each first-stage sample is approximated by $y_i = \frac{1}{s}\sum_{j=1}^{s} u_{ij}$. We do not make any assumptions on the probability density from which the first-stage samples are drawn, but, as previously, let $\mu_p$ denote the expected value and $\sigma_p^2$ the variance.

Given a classifier $f_b$ with known matrix of confusion, an unbiased estimator of $u_{ij}$ is given, as previously, by

$$\tilde{u}_{ij} = \frac{f_b(x_{ij}) + \beta - 1}{\alpha + \beta - 1}. \tag{S23}$$

From this, an unbiased estimator of $y_i$ is given by

$$\tilde{y}_i = \frac{1}{s} \sum_{j=1}^{s} \tilde{u}_{ij}. \tag{S24}$$

We have derived the variance of $\tilde{y}_i$ for the special case where $s = 1$ in the main paper.
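As a sanity check on (S17) and on the worked example following Theorem 4, the short Python sketch below (the helper names `correction_variance` and `auxiliary_threshold` are ours) computes $\sigma_s^2$ and the resulting threshold on $k'$:

```python
def correction_variance(mu_p, alpha, beta):
    """Variance added by the abundance correction, eq. (S17)."""
    return (mu_p * alpha * (1 - alpha) + (1 - mu_p) * (1 - beta) * beta) \
        / (alpha + beta - 1) ** 2

def auxiliary_threshold(sigma2_p, sigma2_s, sigma2_a=0.0):
    """Theorem 4: auxiliary sampling beats conventional iff k' exceeds this."""
    return (sigma2_p + sigma2_s) / (sigma2_p + sigma2_a)

# Reproduce the worked example: alpha = beta = 0.9 with sigma_p^2 = 0.04.
s2 = correction_variance(mu_p=0.5, alpha=0.9, beta=0.9)  # balanced: mu_p drops out
print(round(s2, 3))                              # ~0.141
print(round(auxiliary_threshold(0.04, s2), 1))   # ~4.5
```

For the balanced case the formula reduces to $\alpha(1-\alpha)/(2\alpha-1)^2$, so the result is independent of the (arbitrary) `mu_p=0.5` used here.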
Next, we show how to derive the variance of $\tilde{y}_i$ for a general $s$ by applying the law of total variance twice. We begin by noting that

$$\mathrm{var}(\tilde{y}_i) = E[\mathrm{var}(\tilde{y}_i \mid y_i)] + \mathrm{var}[E(\tilde{y}_i \mid y_i)]. \tag{S25}$$

The second term is simply $\mathrm{var}[E(\tilde{y}_i \mid y_i)] = \mathrm{var}[y_i] = \sigma_p^2$, and the first term can be expressed in terms of $\tilde{u}_{ij}$:

$$\mathrm{var}(\tilde{y}_i \mid y_i) = \frac{1}{s^2} \sum_{j=1}^{s} \mathrm{var}(\tilde{u}_{ij} \mid y_i), \tag{S26}$$

which can be expanded, by again using the law of total variance, as

$$\mathrm{var}(\tilde{u}_{ij} \mid y_i) = E[\mathrm{var}(\tilde{u}_{ij} \mid u_{ij}, y_i)] + \mathrm{var}[E(\tilde{u}_{ij} \mid u_{ij}, y_i)]. \tag{S27}$$

The second term of (S27) is simply given by $\mathrm{var}[E(\tilde{u}_{ij} \mid u_{ij}, y_i)] = \mathrm{var}[u_{ij} \mid y_i] = y_i(1 - y_i)$, but the first term is less obvious. Following Solow et al. [23], we first note that

$$\mathrm{var}(f_b(x_{ij}) \mid u_{ij}, y_i) = u_{ij}\,\alpha(1 - \alpha) + (1 - u_{ij})\,\beta(1 - \beta), \tag{S28}$$

since if $u_{ij} = 1$, $f_b(x_{ij}) \sim \mathrm{Ber}(\alpha)$, and if $u_{ij} = 0$, $f_b(x_{ij}) \sim \mathrm{Ber}(1 - \beta)$. We then note that

$$\mathrm{var}[\tilde{u}_{ij} \mid u_{ij}, y_i] = \frac{\mathrm{var}[f_b(x_{ij}) \mid u_{ij}, y_i]}{(\alpha + \beta - 1)^2}, \tag{S29}$$

which follows directly from (S23), and also that $E[u_{ij}] = y_i$. Putting this together yields the following expression for the first term of (S27):

$$E[\mathrm{var}(\tilde{u}_{ij} \mid u_{ij}, y_i)] = \frac{y_i\,\alpha(1 - \alpha) + (1 - y_i)\,\beta(1 - \beta)}{(\alpha + \beta - 1)^2}. \tag{S30}$$

Putting this all together yields

$$\mathrm{var}(\tilde{y}_i) = E[\mathrm{var}(\tilde{y}_i \mid y_i)] + \mathrm{var}[E(\tilde{y}_i \mid y_i)] \tag{S31}$$
$$= E\left[\frac{1}{s^2} \sum_{j=1}^{s} \mathrm{var}(\tilde{u}_{ij} \mid y_i)\right] + \sigma_p^2 \tag{S32}$$
$$= \frac{1}{s^2} \sum_{j=1}^{s} E[\mathrm{var}(\tilde{u}_{ij} \mid y_i)] + \sigma_p^2 \tag{S33}$$
$$= \frac{1}{s^2} \sum_{j=1}^{s} E\left[\frac{y_i\,\alpha(1 - \alpha) + (1 - y_i)\,\beta(1 - \beta)}{(\alpha + \beta - 1)^2} + y_i(1 - y_i)\right] + \sigma_p^2 \tag{S34}$$
$$= \frac{1}{s^2} \sum_{j=1}^{s} \left[\frac{\mu_p\,\alpha(1 - \alpha) + (1 - \mu_p)\,\beta(1 - \beta)}{(\alpha + \beta - 1)^2} + \mu_p(1 - \mu_p)\right] + \sigma_p^2 \tag{S35}$$
$$= \frac{1}{s}\left[\sigma_s^2 + \mu_p(1 - \mu_p)\right] + \sigma_p^2, \tag{S36}$$

where $\sigma_s^2$ is given by (S17). Interestingly, the variance of $\tilde{y}$ approaches $\sigma_p^2$ for a large number of second-stage samples, $s$.
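The limiting behavior of (S36) is easy to see numerically. The sketch below (the function name `two_stage_variance` is ours) evaluates the closed form for increasing $s$ and shows the decay toward $\sigma_p^2$:

```python
def two_stage_variance(s, mu_p, sigma2_p, alpha, beta):
    """Closed-form variance of the corrected first-stage value, eq. (S36):
    (1/s) * [sigma_s^2 + mu_p * (1 - mu_p)] + sigma_p^2."""
    sigma2_s = (mu_p * alpha * (1 - alpha) + (1 - mu_p) * beta * (1 - beta)) \
        / (alpha + beta - 1) ** 2
    return (sigma2_s + mu_p * (1 - mu_p)) / s + sigma2_p

# The correction noise is averaged away: variance decays toward sigma_p^2 = 0.03.
for s in (1, 10, 100, 1000):
    print(s, round(two_stage_variance(s, mu_p=0.4, sigma2_p=0.03,
                                      alpha=0.8, beta=0.8), 4))
```

The parameter values (`mu_p=0.4`, `sigma2_p=0.03`, `alpha=beta=0.8`) are illustrative choices, not values from the coral survey data.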
This is to be expected under the assumption that $f_b$ is perfectly modeled by a known matrix of confusion $Q$. Since $\tilde{y}_i$ is an average across $s$ decisions, the variance introduced by the abundance correction cancels out for large values of $s$. Finally, the total variance of $\hat{\mu}_p$ is given by

$$\mathrm{var}[\hat{\mu}_p] = \frac{1}{n_b}\left(\frac{1}{s}\left[\sigma_s^2 + \mu_p(1 - \mu_p)\right] + \sigma_p^2\right), \tag{S37}$$

and the sample size by

$$n_b = \frac{\zeta_\delta^2}{d^2}\left(\frac{1}{s}\left[\sigma_s^2 + \mu_p(1 - \mu_p)\right] + \sigma_p^2\right). \tag{S38}$$

G Supplementary results

Detailed simulation results are shown in Fig. S1.

H Bias-Correction Sampling With Unknown Confusion Matrix

The Auxiliary Bias-Correction design evaluated in the simulations assumes that the specificity and sensitivity of $f_b$ are known a priori for the data to be sampled. This assumption is strong, and may not always hold. In such cases, one could rely on a Hybrid sampling design and use $n_a$ samples annotated by both annotators to estimate $\hat{\alpha}$ and $\hat{\beta}$. In such a Hybrid-Bias-Correction design, $\alpha$ and $\beta$ can be estimated as

$$\hat{\alpha} = \frac{\sum_{i=1}^{n_a} f_b(x_i) f_a(x_i)}{\sum_{i=1}^{n_a} f_a(x_i)} \tag{S39}$$
$$\hat{\beta} = \frac{\sum_{i=1}^{n_a} (1 - f_b(x_i))(1 - f_a(x_i))}{\sum_{i=1}^{n_a} (1 - f_a(x_i))}, \tag{S40}$$

and an estimator of $\mu_p$ can be defined as

$$\hat{\mu}_p = \frac{1}{n_b}\left[\sum_{i=1}^{n_a} f_a(x_i) + \sum_{i=n_a+1}^{n_b} \frac{f_b(x_i) + \hat{\beta} - 1}{\hat{\alpha} + \hat{\beta} - 1}\right]. \tag{S41}$$

However, we argue that such a design is inferior to hybrid sampling for several reasons. First, the bias-corrected mean estimate of (S41) is biased if estimates of $\alpha$ and $\beta$ are used in place of the true values [23]. Second, it is difficult to derive an analytical expression for $\mathrm{var}[\hat{\mu}_p]$ that accounts for the variances of $\hat{\alpha}$ and $\hat{\beta}$. Without this expression, one cannot derive optimal sample sizes. Third, simulations detailed below indicate that the Hybrid-Offset design achieved lower errors for the same TSC for a wide array of parameters ($\alpha$, $\beta$, $\mu_p$, $n_a$, $n_b$).

Simulations: For all combinations of $\alpha = [0.6, 0.8, 0.95]$, $\beta = [0.6, 0.8, 0.95]$, $\mu_p = [0.5, 0.75, 0.9]$, $n_a = [100, 150, \ldots, 500]$, and $n_b = 1000$, the following simulation was performed. First, $n_a$ samples $f_a(x_1), \ldots, f_a(x_{n_a})$ were drawn from a Bernoulli (Ber) distribution with mean $\mu_p$. For each $f_a(x_i) = 1$, $f_b(x_i)$ was drawn from $\mathrm{Ber}(\alpha)$, and for each $f_a(x_i) = 0$, $f_b(x_i)$ was drawn from $\mathrm{Ber}(1 - \beta)$. The parameters $\alpha$, $\beta$, and $\mu_b$ were then estimated according to (S39), (S40), and (7). Finally, $n_b - n_a$ new samples $f_b(x_i)$ were drawn using the same procedure and used to estimate $\mu_p$ from (S41) and from (6). This procedure was repeated 2000 times and sample standard deviations were calculated. The signed difference between the standard deviations, calculated for each value of $\alpha$, $\beta$, $\mu_p$, and $n_a$, is shown in Fig. S2. These results indicate that the Hybrid-Offset design is more accurate than the Hybrid-Bias-Corrected design for all parameters.

Figure S1: Supplementary simulation results. Results displayed as mean ± SE for n = 18 sampling units, for TSC (budgets) between 1 and 15 person-hours per unit, and for (Top) Mean Absolute Error and (Bottom) Mean Square Error. Compared designs: Hybrid-Offset, Hybrid-Ratio, Conventional, Auxiliary, and Auxiliary Bias-Corrected.

Figure S2: Difference in estimated standard errors between the Hybrid-Offset design and the Hybrid-Bias-Corrected design, for $n_b = 1000$ and different values of $n_a$, $\alpha$, $\beta$, and $\mu_p$. Note that all differences are negative, indicating that the sampling errors of the Hybrid-Offset design are smaller.
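One round of the Appendix H simulation can be sketched compactly. The Python below (the names `estimate_confusion` and `run_once` are ours) estimates $\hat{\alpha}$ and $\hat{\beta}$ via (S39)-(S40) on the dual-annotated samples, treating the primary annotations as ground truth, and then applies estimator (S41) to the remaining $n_b - n_a$ samples:

```python
import random

def estimate_confusion(fa, fb):
    """Plug-in sensitivity/specificity, eqs. (S39)-(S40): fa is the primary
    (trusted) annotation, fb the auxiliary one, on the same n_a samples."""
    n_pos = sum(fa)
    alpha_hat = sum(b for a, b in zip(fa, fb) if a == 1) / n_pos
    beta_hat = sum(1 - b for a, b in zip(fa, fb) if a == 0) / (len(fa) - n_pos)
    return alpha_hat, beta_hat

def run_once(seed, mu_p=0.75, alpha=0.8, beta=0.8, n_a=400, n_b=1000):
    """One round of the Appendix H simulation: estimate (alpha, beta) on the
    dual-annotated samples, then apply estimator (S41) to the rest."""
    rng = random.Random(seed)

    def bern(p):
        return 1 if rng.random() < p else 0

    ya = [bern(mu_p) for _ in range(n_a)]              # primary labels = truth
    fb = [bern(alpha if y else 1 - beta) for y in ya]  # auxiliary labels
    a_hat, b_hat = estimate_confusion(ya, fb)

    yb = [bern(mu_p) for _ in range(n_b - n_a)]        # auxiliary-only samples
    fb2 = [bern(alpha if y else 1 - beta) for y in yb]
    corrected = [(b + b_hat - 1) / (a_hat + b_hat - 1) for b in fb2]
    return (sum(ya) + sum(corrected)) / n_b            # eq. (S41)

print(run_once(0))  # one estimate of mu_p = 0.75
```

Averaging `run_once` over many seeds shows the estimator centered near $\mu_p$; the paper's point is that its spread (and small plug-in bias) exceeds that of the Hybrid-Offset estimator at equal cost.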