More with Less: Bethel Allocation and Precision-Preserving Sample Size Reduction via Hierarchical Bayes Modelling

Siu-Ming Tam
Tam Data Advisory, Australia
stattam@gmail.com

Abstract

Statistical offices face a familiar and intensifying dilemma: rising demand for detailed regional and domain-level estimates under budgets that are fixed or shrinking. National statistical offices (NSOs) either ignore the problem of optimal sample allocation for multiple target variables when designing a multi-purpose survey, or address it incorrectly, relying on ad hoc approaches such as computing Neyman allocations separately per variable and taking the element-wise maximum, a practice that simultaneously wastes budget and fails to guarantee precision across all domains. This paper presents a practical two-stage strategy that reframes the question: not how to allocate a given sample, but how small the sample can be made while still meeting pre-defined precision targets for all target variables across all geographic domains at once. The innovation lies not in inventing new methods, but in the novel combination of two well-established techniques applied to this cost-reduction problem: (i) multivariate constrained optimisation via Bethel allocation, which finds the globally minimum sample satisfying all precision constraints simultaneously; and (ii) Hierarchical Bayes (HB) small area modelling, which borrows strength across strata and permits a further reduction of the Bethel sample. The approach is validated using a Monte Carlo study (B = 1,000 replications) based on a synthetic labour-force population of one million individuals, where known population truth allows rigorous evaluation of precision, accuracy, and credible-interval coverage.

Keywords: Bethel allocation; Hierarchical Bayes; small area estimation; sample size reduction; multivariate optimisation; labour force survey; coefficient of variation.
1 Introduction

Statistical agencies worldwide face a familiar and intensifying dilemma: rising demand for granular, timely, and domain-level statistics under budgets that are fixed or shrinking. The current threats to the traditional survey paradigm include declining response rates and rising operational costs, alongside growing expectations for more timely and disaggregated information (National Academies, 2017). NSOs now operate in a rapidly shifting landscape marked by an explosion of new data sources, growing privacy concerns, declining survey response rates, and the rise of artificial intelligence, all of which demand a fundamental rethinking of how statistics are produced (UNECE, 2025).

In response, NSOs have explored a range of cost-reducing innovations. One prominent strand integrates big data and administrative data with traditional survey data to supplement or partially replace expensive data collection operations. Kim and Tam (2021) formalised data integration as a general tool for combining big data with probability sample data for finite population inference, demonstrating that hybrid estimators can be substantially more efficient than survey data alone. Tam (2024) extended this approach to small area estimation of binary variables, showing that a calibrated nearest-neighbour algorithm combining big data with survey data yields small area estimates that outperform standard Fay–Herriot and logit-normal HB methods used without big data. Tam et al. (2023) showed that non-response follow-up resources, a major cost driver in probability surveys, can be allocated optimally to minimise mean squared error rather than simply to maximise response rates, yielding cost savings without sacrificing data quality.
More broadly, UNECE (2022) documented that machine learning holds great potential for making statistical production more efficient by automating processes and enabling the use of new data sources. These strategies, however important, largely operate on the data after the sample has been designed. A complementary and under-explored approach is to ask, at the design stage itself: how small can the sample be made while still meeting all pre-specified precision requirements? This paper addresses precisely that question for the practically important case of a multi-purpose survey with multiple target variables and multiple geographic domains.

The traditional approach to sample size determination in this setting relies on the Neyman allocation formula applied separately to each target variable, followed by taking the element-wise maximum of the resulting stratum sample sizes across variables. We show that this approach is doubly flawed: it over-samples for variables whose Neyman allocation is dominated, while simultaneously failing to guarantee domain-level precision for the variable that drives the maximum. The Bethel algorithm (Bethel, 1989) provides the correct solution, finding the smallest total sample that simultaneously satisfies the coefficient-of-variation (CV) target for every variable in every domain. Yet the Bethel sample size can itself be large, because it must protect precision using direct survey estimates alone.

The key insight of this paper is that Hierarchical Bayes small area models (Rao and Molina, 2015) can be used to reduce the Bethel sample further. By borrowing strength across domains, HB models produce more precise domain estimates than direct estimates at the same sample size. This precision surplus can be exchanged for a smaller sample, one that, when combined with HB estimation, still meets all pre-defined precision targets.
The result is a two-stage strategy: Bethel allocation provides the globally optimal design-based benchmark; HB modelling provides a principled means of going further. The approach is validated through a Monte Carlo simulation study (B = 1,000 replications) based on a synthetic labour-force population of one million individuals. The three target variables, employment status, unemployment status, and hours worked, span both binary and continuous types, enabling the illustration of both logit-normal binomial and Gaussian Fay–Herriot HB models. Known population truth at all levels permits rigorous evaluation of precision, accuracy, and credible-interval coverage under repeated sampling.

The remainder of the paper is organised as follows. Section 2 establishes the design setting and notation. Section 3 presents Stage 1, explaining why the Neyman-max approach fails and how Bethel allocation corrects it. Section 4 presents Stage 2, describing the HB models, the prior calibration strategy, and the sample reduction algorithm. Section 5 discusses the shift in inference paradigm from design-based to model-based. Section 6 describes the simulation study design and evaluation criteria. Section 7 reports the numerical results. Section 8 identifies directions for further work, and Section 9 concludes. A complete, self-contained specification of the simulation data-generating process, including population structure, latent variable models, auxiliary covariate generation, and a full parameter summary, is provided in Appendix A.

2 Design Setting and Notation

Let U = {1, ..., N} denote a finite population of N units partitioned into H non-overlapping strata, indexed h = 1, ..., H, with stratum sizes N_1, ..., N_H. A stratified simple random sample of size n_h is drawn without replacement from stratum h, giving a total sample of n = Σ_{h=1}^H n_h.
Let c_h > 0 denote the unit cost in stratum h and f_h = n_h / N_h the sampling fraction. The population is also partitioned into D + 1 publication domains, indexed d = 0, 1, ..., D, where d = 0 denotes the national domain and d = 1, ..., D denote geographic sub-national domains. There are K target variables, indexed k = 1, ..., K. For each variable–domain pair (d, k), let Y_{d,k} denote the anticipated domain total and g_{d,k} the pre-specified upper bound on CV(Ŷ_{d,k}). Let S²_{h,d,k} denote the within-stratum variance component for (d, k), and DEFF_{h,d,k} the corresponding design effect. The precision constraint for domain d and variable k is:

    CV(Ŷ_{d,k}) ≤ g_{d,k},   for all d = 0, 1, ..., D,  k = 1, ..., K.    (1)

Throughout this paper, precision targets are set at g_{0,k} = 0.03 (national CV ≤ 3%) and g_{d,k} = 0.08 (domain CV ≤ 8% for all d ≥ 1) for all k. In Australian Bureau of Statistics (ABS) terminology, CVs are referred to as Relative Standard Errors (RSEs).

3 Stage 1 — Multivariate Constrained Optimisation: The Bethel Allocation

3.1 The Neyman Allocation and Its Limitations

For a single target variable k, equal unit costs, and design effects equal to unity, the Neyman allocation assigns stratum sample sizes proportional to N_h S_{h,k}:

    n^Neyman_{h,k} ∝ N_h S_{h,k}.    (2)

Because S_{h,k} varies across variables k, the allocation (2) produces a different stratum allocation for each variable, and no single set of {n_h} is simultaneously optimal for all K variables. A common NSO practice is to compute (2) separately for each variable and take the element-wise maximum across variables:

    n^NSO_h = max_{k=1,...,K} n^Neyman_{h,k}.    (3)

This approach suffers from two distinct deficiencies.

Deficiency 1: Budget inflation. The total n^NSO = Σ_h n^NSO_h is driven by the variable with the largest Neyman requirement in each stratum.
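As a concrete numerical illustration, the per-variable Neyman allocations (2) and the element-wise maximum (3) can be sketched in a few lines. The stratum sizes, standard deviations, and per-variable totals below are invented for illustration and are not the paper's synthetic population:

```python
import numpy as np

# Hypothetical stratum inputs (H = 3 strata, K = 3 variables).
N_h = np.array([40_000, 35_000, 25_000])          # stratum sizes
S = np.array([[0.48, 0.20, 9.5],                  # S_{h,k}: rows = strata,
              [0.45, 0.19, 8.0],                  # columns = variables
              [0.50, 0.21, 10.5]])
n_target = np.array([800, 69_000, 200])           # total n required per variable

# Neyman allocation per variable k: n_{h,k} proportional to N_h * S_{h,k}
w = N_h[:, None] * S                              # H x K allocation weights
neyman = np.rint(n_target * w / w.sum(axis=0)).astype(int)

# The criticised NSO practice: element-wise maximum across variables
n_nso = neyman.max(axis=1)
print("per-variable totals:", neyman.sum(axis=0), "NSO-max total:", n_nso.sum())
```

Whichever variable dominates in a stratum then sets that stratum's sample size for every other variable as well.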
For all other variables the resulting allocation massively over-samples, wasting budget without improving precision.

Deficiency 2: Domain precision not guaranteed. The Neyman allocation for variable k optimises national precision for that variable alone. The stratum weights that are nationally optimal for variable k will in general distribute sample differently across strata than the allocation required to protect domain-level precision for another variable k′ ≠ k. Consequently, even at the inflated total n^NSO, the domain CV target may still be violated for the variable that drives the maximum, precisely because its Neyman weights were not designed to protect domain-level precision.

3.2 The Bethel Formulation

The Bethel algorithm (Bethel, 1989) solves the correct problem: find the minimum-cost stratum allocation that simultaneously satisfies all CV constraints (1) for all variables and all domains. Formally:

    min_{n_h}  Σ_{h=1}^H c_h n_h
    s.t.  Σ_{h=1}^H DEFF_{h,d,k} (1 − f_h) N²_h S²_{h,d,k} / n_h ≤ (g_{d,k} Y_{d,k})²,
          for all d = 0, 1, ..., D,  k = 1, ..., K,
          n^min_h ≤ n_h ≤ N_h,  n_h ∈ Z⁺,  h = 1, ..., H.    (4)

The left-hand side of each inequality is the finite-population variance of the Horvitz–Thompson estimator of Y_{d,k}, inflated by the relevant design effect. The national domain (d = 0) enters as an additional set of constraints alongside the geographic domains. The R package R2BEAT (Falorsi et al., 2021) solves (4) via a Lagrangian relaxation and is used throughout this paper.

Remark 1. The Bethel allocation is design-based: its precision guarantees rest solely on the randomisation distribution of the stratified design and require no model for the target variables. The Neyman allocation (2) is the special case of (4) with K = 1, D = 0, equal costs, and DEFF ≡ 1.
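The paper solves (4) with R2BEAT; as a language-neutral illustration, the continuous relaxation of (4) can also be handed to a general-purpose constrained optimiser. The toy inputs below (one national constraint per variable, DEFF = 1, H = 3, invented numbers) are only a sketch of the structure of the problem, not the paper's computation:

```python
import numpy as np
from scipy.optimize import minimize

# Toy inputs: H = 3 strata, K = 2 variables, national constraints only.
N_h = np.array([40_000.0, 35_000.0, 25_000.0])     # stratum sizes
c_h = np.array([1.0, 1.0, 1.2])                    # unit costs
S2 = np.array([[0.23, 90.0],                       # S^2_{h,k}
               [0.20, 64.0],
               [0.25, 110.0]])
Y = np.array([62_000.0, 3_700_000.0])              # anticipated totals Y_k
g = np.array([0.03, 0.03])                         # CV targets g_k
A = N_h[:, None] ** 2 * S2                         # variance numerators (DEFF = 1)

cost = lambda n: c_h @ n                           # objective of (4)

def slack(n):
    """Normalised constraint slack per variable: 1 - Var / (g_k Y_k)^2 >= 0."""
    var = (A * (1.0 / n - 1.0 / N_h)[:, None]).sum(axis=0)
    return 1.0 - var / (g * Y) ** 2

res = minimize(cost, x0=np.full(3, 500.0), method="SLSQP",
               bounds=[(2.0, Nh) for Nh in N_h],
               constraints=[{"type": "ineq", "fun": slack}])
n_bethel = np.ceil(res.x).astype(int)              # round up to stay feasible
print(n_bethel, int(n_bethel.sum()))
```

Rounding up preserves feasibility because each constraint's variance is decreasing in every n_h; the real Bethel/R2BEAT algorithm additionally handles domain constraints, design effects, and the integer allocation directly.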
The inputs to (4), stratum means, standard deviations, and design effects, are estimated from a preliminary 5% stratified baseline sample (see Section 6 and Appendix A for details).

4 Stage 2 — Hierarchical Bayes Sample Reduction

Let n⋆ = Σ_h n⋆_h denote the Bethel optimal total from solving (4). Stage 2 asks: can we draw a sub-sample of size n < n⋆ and, using a Hierarchical Bayes model fitted to this smaller sample, still satisfy all precision targets (1)?

4.1 HB Model for Binary Target Variables

For each stratum h = 1, ..., H, let y_h denote the observed count of successes in n_h Bernoulli trials. We use the logit-normal binomial HB model (Rao and Molina, 2015, Section 10.11):

    y_h | p_h ∼ Binomial(n_h, p_h),    (5)
    logit(p_h) = z_h⊤ β + v_h,   v_h ∼ iid N(0, σ²_v),    (6)

where z_h is a vector of stratum-level auxiliary covariates and v_h is a stratum random effect that induces shrinkage toward the covariate-predicted mean z_h⊤ β. The priors are:

    β ∼ N(0, τ²_β I),   σ²_v ∼ Inv-χ²(ν, s²),    (7)

with τ²_β = 10⁶ (a diffuse prior on the regression coefficients) and hyperparameters (ν, s²) calibrated as described in Section 4.3. Posterior stratum means p̂^HB_h are obtained via the inverse logit of the MCMC draws. Domain-level estimates are formed as weighted sums of stratum posteriors, with weights proportional to N_h.

4.2 HB Model for the Continuous Target Variable

For continuous variables, we use the Gaussian area-level model of Fay and Herriot (1979):

    θ̂_h | θ_h ∼ N(θ_h, ψ_h),   ψ_h = (1 − f_h) S²_h / n_h,    (8)
    θ_h = z_h⊤ β + v_h,   v_h ∼ iid N(0, σ²_v),    (9)

where θ̂_h is the direct stratum sample mean and the sampling variance ψ_h is treated as known, incorporating the finite population correction and, where required, the estimated design effect. The same priors (7) apply. Posterior stratum means θ̂^HB_h are extracted from the linear predictor.
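The mechanism by which the Fay–Herriot model borrows strength is transparent in the conditional posterior mean at fixed hyperparameters: each stratum mean is shrunk toward its covariate prediction with weight γ_h = σ²_v / (σ²_v + ψ_h). The sketch below simulates this plug-in form with invented values (the paper instead runs full MCMC via mcmcsae, which also propagates hyperparameter uncertainty):

```python
import numpy as np

# Fay-Herriot shrinkage at fixed (beta, sigma2_v): a minimal sketch of
# borrowing strength, with toy data rather than the paper's population.
rng = np.random.default_rng(7)
H = 20
z = np.column_stack([np.ones(H), rng.normal(size=H)])   # stratum covariates
beta = np.array([37.0, 1.5])                            # linking coefficients
sigma2_v = 0.8                                          # random-effect variance
theta = z @ beta + rng.normal(0, np.sqrt(sigma2_v), H)  # true stratum means
psi = rng.uniform(0.5, 3.0, H)                          # known sampling variances
theta_direct = theta + rng.normal(0, np.sqrt(psi))      # direct estimates

gamma = sigma2_v / (sigma2_v + psi)                     # shrinkage weights
theta_hb = gamma * theta_direct + (1 - gamma) * (z @ beta)

# The shrunken estimator beats the direct one in average squared error
mse_direct = np.mean((theta_direct - theta) ** 2)
mse_hb = np.mean((theta_hb - theta) ** 2)
print(round(mse_direct, 3), round(mse_hb, 3))
```

Strata with noisy direct estimates (large ψ_h) receive small γ_h and are pulled strongly toward z_h⊤ β; precisely measured strata are left nearly unchanged. This precision surplus is what Stage 2 trades for a smaller sample.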
Both models are estimated using the R package mcmcsae (Boonstra, 2021). Three independent MCMC chains of 2,000 iterations each are run from dispersed starting values. Convergence is assessed using the Gelman–Rubin statistic (Gelman and Rubin, 1992); a candidate reduction is accepted only if the maximum diagnostic across all parameters satisfies R̂_max ≤ 1.05. In the numerical example described in Section 6 below, the maximum R̂ achieved across all variables, chains, and domains is 1.02, providing strong evidence of convergence.

4.3 Prior Calibration

The inverse-χ² prior σ²_v ∼ Inv-χ²(ν, s²) controls the degree of shrinkage toward the covariate-predicted mean. The degrees-of-freedom parameter ν governs the tightness of the prior, and the scale parameter s² locates it on the appropriate variance scale. A prior that is too concentrated causes over-shrinkage, biasing domain estimates toward the grand mean and degrading credible-interval coverage; a prior that is too diffuse discards the precision gain from borrowing strength across domains.

The hyperparameters (ν, s²) are selected via a two-dimensional grid search, evaluating candidate pairs over ν ∈ {2, 3, 5, 10, 20} and a range of s² values on the appropriate scale (logit scale for binary variables; natural scale for the continuous variable). For each candidate pair, at least three independent MCMC chains of ≥ 2,000 iterations are run at the minimax recommended sample fraction (Section 4.4), and the following four quality criteria are recorded:

(i) CI coverage: the number of estimation areas (domains plus national) for which the 95% credible interval contains the true value.

(ii) National relative bias: (θ̂^HB_nat − θ^true_nat) / θ^true_nat.

(iii) Domain MARE: mean absolute relative error across all domains.

(iv) Domain max ARE: worst-case absolute relative error among all domains.
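The selection logic of the grid search can be sketched as a simple loop. The evaluate_prior function below is a hypothetical stand-in returning (coverage count, domain MARE); in a real run each call would involve refitting the HB model by MCMC at the candidate (ν, s²):

```python
import itertools

# Skeleton of the two-dimensional grid search over (nu, s2).
def evaluate_prior(nu, s2):
    # Toy surrogate for the MCMC-based criteria (i) and (iii): coverage
    # is adequate for moderately diffuse priors, MARE grows as the prior
    # drifts off-scale.  Purely illustrative numbers.
    coverage = 11 if nu <= 5 and s2 >= 0.5 else 9
    mare = 0.02 + 0.01 * abs(s2 - 1.0) + 0.002 * nu
    return coverage, mare

grid_nu = [2, 3, 5, 10, 20]
grid_s2 = [0.25, 0.5, 1.0, 2.0]
nominal = 11                      # all 10 domains + national covered

best = None
for nu, s2 in itertools.product(grid_nu, grid_s2):
    coverage, mare = evaluate_prior(nu, s2)
    if coverage >= nominal and (best is None or mare < best[0]):
        best = (mare, nu, s2)
print(best)
```

The loop encodes the rule stated next: among candidates with adequate coverage, pick the pair with the smallest domain MARE.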
The selected pair achieves CI coverage at or above the nominal level while minimising domain MARE, thereby balancing calibration against accuracy.

The key point is that this grid search does not require knowledge of the true value of σ²_v. The prior is chosen entirely on the basis of the model's performance on criteria (i)–(iv), evaluated against the known population truth in the simulation context. This is feasible here because the population was constructed synthetically and truth is available at all levels. In practice, NSOs do not have access to population truth. However, a fully operational surrogate is available: the direct domain estimates from the most recent previous survey cycle serve as the proxy for the true domain values in criteria (i)–(iv). The grid search is then conducted against these proxies, and the (ν, s²) pair that achieves nominal coverage while minimising domain MARE is selected. This approach requires nothing beyond what NSOs routinely produce from their existing survey programme, and it represents the best available strategy for prior calibration in the absence of population truth.

4.4 The Reduction Algorithm

For each variable k and candidate reduction fraction α_k ∈ [0, 1), a nested sub-sample of size ⌊(1 − α_k) n⋆_h⌉ is drawn without replacement from stratum h of the Bethel master sample, and the HB model for variable k is fitted to this sub-sample. Nesting ensures that estimates at different reduction levels are directly comparable, isolating the effect of sample size from sampling variability. The sub-sample at fraction α_k is declared eligible for variable k if and only if all four of the following gates are passed:

Gate 1. CV gate. CV^HB_{d,k}(α_k) ≤ g_{d,k} for all domains d = 0, 1, ..., D.

Gate 2. Convergence gate. R̂_max ≤ 1.05.

Gate 3. National accuracy gate.
ARE_{nat,k} = |θ̂^HB_{nat,k} − θ^true_{nat,k}| / θ^true_{nat,k} does not exceed a pre-specified tolerance.

Gate 4. Domain accuracy gate. MARE_k and max_d ARE_{d,k} do not exceed pre-specified tolerances.

Gates 3 and 4 ensure that the HB model does not merely compress posterior uncertainty artificially; the posterior means must also be sufficiently close to the truth. In a simulation context, the population truth is available, so these gates can be evaluated directly. In a real-survey context, the tolerances would be set by the NSO based on fitness-for-purpose criteria, and the previous survey cycle's direct domain estimates would serve as a proxy for the truth in Gate 3 and 4 evaluations.

For each variable k, the optimal per-variable reduction fraction is:

    α⋆_k = sup{ α_k ∈ [0, 1) : all four gates pass for variable k },    (10)

giving a per-variable reduced total n^HB_k = ⌊(1 − α⋆_k) n⋆⌉. To ensure that all K variables meet their requirements simultaneously, the final sample size is determined by the minimax rule:

    α⋆ = min_{k=1,...,K} α⋆_k,   n^HB = ⌊(1 − α⋆) n⋆⌉.    (11)

5 From Design-Based to Model-Based Inference

Stage 1 operates in the design-based framework: precision guarantees rest on the randomisation distribution induced by the sampling design, and uncertainty is communicated through confidence intervals with long-run coverage properties. Stage 2 moves to a model-based paradigm, in which the HB posterior distribution conditions on both the observed data and the assumed model structure, and uncertainty is communicated through credible intervals.

The two types of interval differ in their interpretation. A 95% credible interval provides a direct posterior probability statement: given the observed data and the assumed model, the probability that the parameter lies in the reported interval is 95%.
A 95% confidence interval, by contrast, is defined through its long-run behaviour: over repeated sampling, 95% of such intervals would contain the true parameter. Once the data are observed, the specific realised confidence interval either contains the true value or it does not; the 95% refers to the procedure, not to the probability associated with the particular interval in hand (Neyman, 1937; Casella and Berger, 2002). Credible intervals therefore provide an interpretation that is closer to how most users and decision-makers think about uncertainty.

This shift in inference framework is not new to official statistics: HB models are routinely used by NSOs for small area estimation (Rao and Molina, 2015; Pfeffermann, 2013). The key contribution here is applying this shift specifically to enable a reduction in survey operating costs. The frequentist properties of the HB credible intervals, specifically whether they achieve approximately correct coverage under repeated sampling from the synthetic population, are assessed empirically via the Monte Carlo study described in Section 6.

6 Simulation Study Design

To validate the two-stage strategy, we construct a synthetic labour-force population with known population truth at all levels and conduct a Monte Carlo study with B = 1,000 replications. Full details of the data-generating process, including the latent variable models and auxiliary covariate structures, are provided in Appendix A. Here we describe the key features of the population, the baseline sample, and the evaluation criteria.

6.1 Population and Target Variables

The synthetic population comprises N = 1,000,000 individuals partitioned into H = 100 strata nested within D = 10 geographic domains (approximately 10 strata per domain).
Three target variables are generated independently for each individual:

• Employment status E_hi (binary): stratum employment probabilities are generated via a logit-normal model with national mean ≈ 62%. Two stratum-level auxiliary covariates and a domain random effect govern between-stratum variation in the logit of the employment probability.

• Unemployment status U_hi (binary): generated independently of employment via a logit-normal model with national mean ≈ 4%, reflecting the rarity of unemployment. Mutual exclusivity of E and U is enforced by resolving E_hi = U_hi = 1 overlaps in the ratio 62 : 4.

• Hours worked Hrs_hi (continuous): individual hours are drawn from a truncated normal distribution with stratum mean following a logistic function of two stratum-level covariates, truncated to the interval [15, 60] hours per week. National mean ≈ 37 hours.

Stratum design effects are drawn once as δ_h ∼ U(1.1, 1.2) and shared across all three variables, reflecting that the design effect is a property of the sampling design rather than of the variable.

6.2 Baseline Sample and Bethel Inputs

A stratified simple random baseline sample at 5% per stratum, n^(0)_h = max{2, ⌊0.05 N_h⌉}, is drawn from the population. Stratum-level means, standard deviations, and effective sample sizes n^eff_h = ⌊n^(0)_h (1 − n^(0)_h / N_h) / δ_h⌉ are computed from this baseline sample and provided as inputs to the Bethel optimisation (4). Known population truth is retained at the stratum, domain, and national level for evaluation.

6.3 Monte Carlo Design and Evaluation Criteria

In each replication b = 1, ..., 1,000:

(1) A stratified random sample of size n⋆ = 91,308 (the Bethel total) is drawn from the population under the Bethel stratum allocation.

(2) A nested sub-sample of size n^HB = 18,262 is extracted from the master sample.
(3) HB models are fitted to the sub-sample for each variable, and posterior means and 95% credible intervals are computed for each domain and nationally.

(4) The following criteria are recorded for each variable k and domain d:

• Coverage indicator: whether θ^true_{d,k} falls inside the 95% credible interval.

• Absolute relative error: ARE^(b)_{d,k} = |θ̂^HB,(b)_{d,k} − θ^true_{d,k}| / θ^true_{d,k}.

• CV gate: whether CV^HB,(b)_{d,k} ≤ g_{d,k} for all d.

Monte Carlo summaries across B = 1,000 replications are:

    Coverage_{d,k} = (1/B) Σ_{b=1}^B 1[θ^true_{d,k} ∈ CI^(b)_{d,k}],    (12)

    MARE_k = (1/(B·D)) Σ_{b=1}^B Σ_{d=1}^D ARE^(b)_{d,k},    (13)

    CV-pass_k = (1/B) Σ_{b=1}^B 1[CV^HB,(b)_{d,k} ≤ g_{d,k} for all d = 0, ..., D].    (14)

7 Results

7.1 Sample Size Comparisons

Table 1 compares total sample sizes across all four strategies. The Neyman allocations range from 198 (Hours Worked) to 68,697 (Unemployed), reflecting the very different within-stratum variability of a rare binary variable versus a continuous one. The NSO-max collapses these to a single shared total of 68,697, driven entirely by Unemployment. The Bethel allocation requires n⋆ = 91,308, larger than the NSO-max, because it must protect domain-level precision for all three variables simultaneously. The HB-combined allocation achieves n^HB = 18,262, an 80% reduction relative to Bethel and smaller even than the NSO-max total.

Table 1: Total sample sizes under each allocation strategy.

    Variable        Neyman    NSO Max    Bethel    HB Combined
    Employed           768     68,697    91,308         18,262
    Unemployed      68,697     68,697    91,308         18,262
    Hours Worked       198     68,697    91,308         18,262

7.2 Precision Under Neyman and NSO-Max Approaches

Table 2 reports national and worst-domain CVs under the per-variable Neyman allocations and under the NSO-max. Under Neyman (each variable's own n_h), all three variables fail the domain CV target of 8%.
Under NSO-max (n = 68,697), Employment and Hours Worked comfortably satisfy both targets, but Unemployment still fails the domain target (worst-domain CV = 13.5%). This confirms Deficiency 2: the Neyman stratum weights for Unemployment were designed for national precision, not domain precision, and scaling up to the maximum does not rectify this.

Table 2: National and worst-domain CVs under Neyman and NSO-max. Targets: national CV ≤ 3%; domain CV ≤ 8%. Entries marked * fail their target.

                      After Neyman (own n_h)        After NSO Max (n = 68,697)
    Variable       National CV  Worst-Domain CV    National CV  Worst-Domain CV
    Employed          0.030         0.124*            0.003         0.012
    Unemployed        0.029         0.135*            0.029         0.135*
    Hours Worked      0.023         0.079*            0.001         0.005

7.3 Precision Under Bethel Allocation

Table 3 reports the CVs at the Bethel solution. With n⋆ = 91,308, all national and domain CV constraints are satisfied simultaneously for all three variables. Bethel is thus the correct instrument for multi-variable, multi-domain precision requirements: it is both necessary (the NSO-max fails) and sufficient (all constraints are met).

Table 3: National and worst-domain CVs after Bethel allocation (n⋆ = 91,308). All constraints satisfied.

                        National              Worst Domain
    Variable         CV       Target        Max CV     Target
    Employed        0.026      ≤ 3%          0.075      ≤ 8%
    Unemployed      0.003      ≤ 3%          0.010      ≤ 8%
    Hours Worked    0.001      ≤ 3%          0.003      ≤ 8%

7.4 Precision After HB Reduction

Table 4 reports national and worst-domain CVs after HB modelling at the reduced sample n^HB = 18,262. All national and domain CV constraints are satisfied for all three variables. The reduction in CV relative to the Bethel results reflects the precision gain from borrowing strength across domains, which more than compensates for the reduction in sample size.

Table 4: National and worst-domain CVs after HB modelling (n^HB = 18,262). All constraints satisfied.
                        National              Worst Domain
    Variable         CV       Target        Max CV     Target
    Employed        0.018      ≤ 3%          0.065      ≤ 8%
    Unemployed      0.006      ≤ 3%          0.020      ≤ 8%
    Hours Worked    0.002      ≤ 3%          0.008      ≤ 8%

7.5 Accuracy of HB Estimates: Single Sample

Table 5 compares HB estimates against population truth for the single illustrative sample. National relative bias is at most 1% for all variables. Employment and Hours Worked exhibit small domain errors (MARE of 2% and 0% respectively). Unemployment exhibits a domain MARE of 13% and a maximum domain ARE of 43%, attributable to a single outlier domain in which the posterior mean unemployment rate (0.014) falls well below the true value (0.018). This reflects the inherent difficulty of estimating a rare event (≈ 4% national mean) in a small domain at the reduced sample size. This is also the domain for which the credible interval fails to contain the true value in Table 6.

Table 5: Accuracy of HB estimates relative to population truth: single illustrative sample (n^HB = 18,262).

    Variable       Nat. Rel. Bias   Domain MARE   Domain Max ARE
    Employed            +1%              2%              5%
    Unemployed          +1%             13%             43%
    Hours Worked        +0%              0%              1%

ARE = Absolute Relative Error. MARE = Mean Absolute Relative Error.

7.6 Credible Interval Coverage: Single Sample

Table 6 reports, for the single illustrative sample, the number of the 11 estimation areas (10 domains plus the national estimate) for which the true value falls within the 95% credible interval. Coverage is 10/11 for Employment and Unemployment, and 11/11 for Hours Worked.

Table 6: Credible interval coverage: single illustrative sample. "Cases" = number of areas (10 domains + national = 11) where the true value lies inside the 95% credible interval.
    Variable       Cases    Empirical Coverage
    Employed       10/11          90.9%
    Unemployed     10/11          90.9%
    Hours Worked   11/11         100.0%

7.7 Monte Carlo Results: Credible Interval Coverage

Table 7 reports Monte Carlo (B = 1,000) mean credible interval coverage across all 11 estimation areas, together with the Monte Carlo standard deviation. All three variables achieve mean coverage close to or exceeding the nominal 95% level. Employment achieves 93.0%, Unemployment 97.7%, and Hours Worked 100.0%. None of these values is statistically significantly different from 95% at the 5% significance level.

Table 7: Monte Carlo (B = 1,000) credible interval coverage at n^HB = 18,262. "MC Mean Coverage" = average proportion of 11 estimation areas whose interval contains the truth.

    Variable       MC Mean Coverage   MC SD
    Employed            93.0%         0.255
    Unemployed          97.7%         0.150
    Hours Worked       100.0%         0.000

95% credible intervals computed via MCMC (2,000 iterations; R̂_max = 1.02).

7.8 Monte Carlo Results: Accuracy of HB Estimates

Table 8 reports Monte Carlo summaries of estimate accuracy across B = 1,000 replications. MC mean national relative bias is +0.37% for Employment, +0.35% for Unemployment, and −1.54% for Hours Worked, all negligible in magnitude. MC mean domain MARE is 0.58% for Employment and near zero for Hours Worked. For Unemployment, MC mean domain MARE is 4.10%, and MC mean domain maximum ARE is 18.4%, again reflecting the difficulty of estimating a rare event in small domains.

Table 8: Monte Carlo (B = 1,000) accuracy of HB estimates at n^HB = 18,262.

    Variable       MC Mean Nat. Rel. Bias   MC Mean Domain MARE   MC Mean Domain Max ARE
    Employed             +0.37%                   0.58%                  3.12%
    Unemployed           +0.35%                   4.10%                 18.40%
    Hours Worked         −1.54%                   0.02%                  0.15%

MC Mean Domain Max ARE = average across replications of the worst-domain ARE.
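The Monte Carlo summaries of eqs. (12)–(14) reduce to simple array operations over the per-replication results. The sketch below uses toy stand-in arrays (random indicators and errors, not the paper's simulation output) to show the computation for one variable:

```python
import numpy as np

# Toy replication arrays for one variable: B = 1,000 replications,
# 11 estimation areas (column 0 = national, columns 1..10 = domains).
rng = np.random.default_rng(0)
B, D1 = 1000, 11
covered = rng.random((B, D1)) < 0.95               # 1[theta_true in CI]
are = np.abs(rng.normal(0.0, 0.01, size=(B, D1)))  # ARE per replication/area
cv_ok = rng.random((B, D1)) < 0.99                 # CV target met per area

coverage = covered.mean(axis=0)                    # eq. (12), per area
mare = are[:, 1:].mean()                           # eq. (13), domains only
cv_pass = cv_ok.all(axis=1).mean()                 # eq. (14), all areas at once
print(coverage.mean().round(3), round(mare, 4), round(cv_pass, 3))
```

Note that eq. (14) requires all areas to pass within a replication, so the CV-pass rate is stricter than the per-area pass probability.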
7.9 Monte Carlo Results: CV Gate Pass Rate

Table 9 reports the proportion of Monte Carlo replications in which the HB estimates satisfy the CV gate for all domains simultaneously. The pass rate exceeds 95% for all three variables, confirming that the precision targets are met reliably under repeated sampling and not merely for the single illustrative sample.

Table 9: Monte Carlo (B = 1,000) proportion of replications satisfying the CV gate for all domains simultaneously (n^HB = 18,262). Targets: national CV ≤ 3%; domain CV ≤ 8%.

    Variable       CV Gate Pass Rate
    Employed             97.8%
    Unemployed           95.3%
    Hours Worked         99.6%

7.10 Implementation Under Multi-Stage Cluster Designs

Many NSO surveys use multi-stage cluster designs rather than stratified simple random sampling. The cluster design effect is approximately:

    DEFF_cluster ≈ 1 + (b − 1) ρ,    (15)

where b is the within-primary-sampling-unit (PSU) take and ρ is the intra-class correlation. The key observation is that (15) depends primarily on b and ρ, not directly on the number of sampled PSUs m. Consequently, a proportional reduction in total sample size can be implemented by reducing m while holding b fixed:

    n′ = 0.9 n  ⇒  b′ = b,  m′ ≈ 0.9 m,  so that DEFF′_cluster ≈ DEFF_cluster.

This preserves the design effect approximately unchanged, so that the variance inputs to (4) remain valid after the HB reduction.

8 Further Work

The two-stage strategy opens several directions for further development.

Ad hoc surveys. When an NSO disseminates unit record files or cross-tabulations derived from a reduced-sample HB survey, two practical challenges arise. First, unit record weights must be adjusted so that cross-tabulated figures are consistent with the domain- and national-level modelled estimates. Second, methods are needed to assist users in constructing credible intervals from cross-tabulations or unit record files without requiring them to re-run the MCMC.
Repeated surveys and time series. The methodology is formulated for a single cross-sectional survey. Extension to repeated surveys requires addressing rotational panel designs, where the overlap between successive samples introduces correlations across time that must be accounted for in both the Bethel allocation and the HB model. Seasonal adjustment and trend analysis under the model-based inference paradigm present additional challenges.

Robustness to model misspecification. The HB reduction depends on the validity of the linking model. Future work should investigate the sensitivity of α⋆ to departures from the assumed model, and develop robust variants that provide valid guarantees under a wider class of data-generating processes.

9 Conclusion

This paper presents a two-stage strategy for reducing the operational cost of multi-purpose surveys while preserving pre-specified precision targets across all target variables and geographic domains simultaneously.

Stage 1 replaces the common NSO practice of computing Neyman allocations separately per variable and taking the element-wise maximum (a procedure that over-samples for most variables and still fails domain precision for the variable that drives the maximum) with Bethel allocation (Bethel, 1989), which finds the globally minimum sample satisfying all CV constraints simultaneously.

Stage 2 asks whether the Bethel sample can be reduced further using Hierarchical Bayes modelling (Rao and Molina, 2015). By borrowing strength across domains, HB models produce more precise domain estimates than direct estimates at the same sample size. A nested sub-sampling and four-gate eligibility algorithm identifies the largest reduction fraction for which all precision, convergence, and accuracy criteria are met across all variables simultaneously.
Applied to a synthetic labour-force population, the strategy reduces the required sample from the Bethel benchmark of 91,308 to an HB-combined total of 18,262, a reduction of 80%, while meeting all national and domain CV targets. A Monte Carlo study with B = 1,000 replications confirms that the CV gate passes in more than 95% of replications for all three variables, and that credible interval coverage is close to the nominal 95% level throughout.

The principal trade-off is a shift from design-based confidence intervals to model-based credible intervals. This is not a genuine downside: HB models are already routinely used by NSOs for small area estimation; credible intervals have more intuitive interpretations for decision-makers; and the Monte Carlo results confirm adequate coverage under repeated sampling.

The key message for statistical agencies is this: by combining Bethel allocation with Hierarchical Bayes estimation, it is possible to obtain a substantially smaller survey sample, and therefore substantially lower operating costs, while maintaining reliable precision for all required outputs across all geographic domains.

Acknowledgements

The author thanks the developers of the R2BEAT package (Stefano Falorsi, Andrea Fasulo, Alessio Guandalini, Daniela Pagliuca, Marco D. Terribili, Giulio Barcaroli, and Ilaria Bombelli, primarily affiliated with the Italian National Institute of Statistics, Istat) and the mcmcsae package (Harm Jan Boonstra, with contributions from Grzegorz Baltissen, of the Central Bureau of Statistics, The Netherlands) for making their software freely available. The author also thanks Dr Harm Jan Boonstra and Dr Andrea Fasulo for helpful advice on the use of mcmcsae and R2BEAT respectively.

References

Bethel, J. (1989). Sample allocation in multivariate surveys. Survey Methodology, 15(1):47–57.

Boonstra, H.J. (2021).
mcmcsae: Markov Chain Monte Carlo Small Area Estimation. R package version 0.7.7. https://CRAN.R-project.org/package=mcmcsae.

Casella, G. and Berger, R.L. (2002). Statistical Inference, 2nd edition. Duxbury Press, Pacific Grove, CA.

Falorsi, S., Fasulo, A., Guandalini, A., Pagliuca, D., and Terribili, M.D. (2021). R2BEAT: Optimal Allocation for Multivariate and Multi-domain Surveys. R package version 1.0.4. https://CRAN.R-project.org/package=R2BEAT.

Fay, R.E. and Herriot, R.A. (1979). Estimates of income for small places: An application of James–Stein procedures to census data. Journal of the American Statistical Association, 74(366):269–277.

Gelman, A. and Rubin, D.B. (1992). Inference from iterative simulation using multiple sequences (with discussion). Statistical Science, 7(4):457–511.

Kim, J.K. and Tam, S.-M. (2021). Data integration by combining big data and survey sample data for finite population inference. International Statistical Review, 89(2):382–401.

National Academies of Sciences, Engineering, and Medicine (2017). Innovations in Federal Statistics: Combining Data Sources While Protecting Privacy. National Academies Press, Washington, DC.

Neyman, J. (1937). Outline of a theory of statistical estimation based on the classical theory of probability. Philosophical Transactions of the Royal Society of London, Series A, 236:333–380.

Pfeffermann, D. (2013). New important developments in small area estimation. Statistical Science, 28(1):40–68.

Rao, J.N.K. and Molina, I. (2015). Small Area Estimation, 2nd edition. John Wiley & Sons, New York.

Tam, S.-M. (2024). Small data estimation for binary variables with big data: A comparison of calibrated nearest neighbour and hierarchical Bayes methods of estimation. Statistical Journal of the IAOS, 40:581–589.

Tam, S.-M. and Clarke, F. (2015). Big data, official statistics and some initiatives by the Australian Bureau of Statistics.
International Statistical Review, 83(3):436–448.

Tam, S.-M., Holmberg, A., and Wang, S. (2023). A note on the optimum allocation of resources to follow up unit non-respondents in probability surveys. Journal of Official Statistics, 39(4):451–472.

United Nations Economic Commission for Europe (2022). Machine Learning for Official Statistics. United Nations, Geneva. https://unece.org/statistics/publications/machine-learning-official-statistics.

United Nations Economic Commission for Europe (2025). The Future of National Statistics Offices: A Call to Action. United Nations, Geneva.

Appendix A  Simulation Data-Generating Process

This appendix provides a complete, self-contained specification of the synthetic labour-force population used in Section 6. The structure closely mirrors a Labour Force Survey (LFS) setting, with three target variables (employment status, unemployment status, and hours worked) generated across 100 strata nested within 10 geographic domains. The population is generated once and held fixed throughout the Monte Carlo study; known truth at the stratum, domain, and national level is retained for evaluation.

A.1 Population Structure and Design Effects

The population comprises N = 1,000,000 individuals partitioned into H = 100 strata nested within D = 10 geographic domains, with approximately 10 strata per domain. Stratum sizes N_h vary across the population. Stratum design effects are drawn once as

    δ_h ∼ U(1.1, 1.2),  h = 1, …, 100,

and held fixed throughout the study. Design effects are shared across all three target variables, reflecting that δ_h is a property of the sampling design (PSU clustering) rather than of the variable.

A.2 Auxiliary Covariates

Each target variable has its own set of stratum-level auxiliary covariates, drawn independently:

Employment covariates.

    X_{1,1,h} ∼ N(3, 1²),  X_{1,2,h} ∼ N(4, 1.5²),  h = 1, …, 100.
Unemployment covariates.

    X_{2,1,h} ∼ N(3, 1²),  X_{2,2,h} ∼ N(4, 1.5²),  h = 1, …, 100,

drawn independently of (X_{1,1,h}, X_{1,2,h}).

Hours Worked covariates.

    X_{3,1,h} ∼ N(0, 3²),  X_{3,2,h} ∼ N(0, 3²),  h = 1, …, 100.

To each stratum-level covariate, individual-level noise is added:

    X_{j,k,i} = X_{j,k,h(i)} + η_i,  η_i ∼ iid N(0, 0.2²),

where h(i) denotes the stratum containing individual i. The unit-level covariates X_{j,k,i} are used as linking-model covariates z_h in the HB models of Section 4, with stratum means substituted for individual values in the area-level formulation.

A.3 Employment Status: Latent Variable Model

Employment status E_{hi} ∈ {0, 1} is generated via a two-level logit-normal model. At the stratum level, the latent logit probability of employment is

    η^E_h = logit(0.62) + 0.15 (X_{1,1,h} − 3) + 0.10 (X_{1,2,h} − 4) + γ^E_{d(h)} + ε^E_h,    (16)

where logit(0.62) ≈ 0.489 is the intercept corresponding to a national employment rate of 62%. The terms 0.15(X_{1,1,h} − 3) and 0.10(X_{1,2,h} − 4) are centred covariate effects, ensuring that the intercept retains its interpretation as the national logit mean when averaged over the covariate distribution.

The domain random effect γ^E_{d(h)} introduces between-domain heterogeneity:

    γ^E_d ∼ iid N(0, 0.20²),  d = 1, …, 10.

The stratum residual ε^E_h introduces within-domain, between-stratum heterogeneity:

    ε^E_h ∼ iid N(0, 0.15²),  h = 1, …, 100.

The stratum employment probability is obtained via the inverse logit:

    p_{E,h} = logistic(η^E_h) = 1 / (1 + e^{−η^E_h}).

Individual employment indicators are then drawn as

    E_{hi} ∼ iid Bernoulli(p_{E,h}),  i = 1, …, N_h.

The marginal national employment rate is ≈ 62%.
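The logit-normal generator of Eq. (16) can be sketched in a few lines. The following is an illustrative stdlib-Python rendering, not the paper's actual simulation code; parameter values follow Table 10, and the stratum-level covariates are drawn directly rather than via the unit-noise step of Section A.2.

```python
import math
import random

random.seed(1)

def logit(p):
    return math.log(p / (1 - p))

def logistic(x):
    return 1 / (1 + math.exp(-x))

D, strata_per_domain = 10, 10
gamma = {d: random.gauss(0, 0.20) for d in range(D)}   # domain effects, Eq. (16)

p_E = {}
for d in range(D):
    for s in range(strata_per_domain):
        h = d * strata_per_domain + s
        x1  = random.gauss(3, 1.0)      # employment covariate 1
        x2  = random.gauss(4, 1.5)      # employment covariate 2
        eps = random.gauss(0, 0.15)     # stratum residual
        eta = logit(0.62) + 0.15 * (x1 - 3) + 0.10 * (x2 - 4) + gamma[d] + eps
        p_E[h] = logistic(eta)          # stratum employment probability

# Individual indicators are then drawn as Bernoulli(p_E[h]) within stratum h.
mean_p = sum(p_E.values()) / len(p_E)
print(f"mean stratum employment probability = {mean_p:.3f}")
```

With the centred covariate effects, the mean stratum probability lands near the intended 62% national rate.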
A.4 Unemployment Status: Latent Variable Model

Unemployment status U_{hi} ∈ {0, 1} is generated independently of employment via an analogous logit-normal model. The latent logit probability is

    η^U_h = logit(0.04) + 0.15 (X_{2,1,h} − 3) + 0.10 (X_{2,2,h} − 4) + γ^U_{d(h)} + ε^U_h,    (17)

where logit(0.04) ≈ −3.178 reflects a low national unemployment rate of 4%. Domain and stratum random effects are

    γ^U_d ∼ iid N(0, 0.10²),  ε^U_h ∼ iid N(0, 0.08²).

The smaller variance parameters for unemployment (compared with employment) reflect that the unemployment rate is less variable across domains on the logit scale. Individual unemployment indicators are

    U_{hi} ∼ iid Bernoulli(logistic(η^U_h)).

The marginal national unemployment rate is ≈ 4%.

Mutual exclusivity. Employment and unemployment are generated independently, so a small number of individuals may have E_{hi} = U_{hi} = 1. These overlaps are resolved by reclassifying each such individual as employed with probability 62/(62 + 4) = 62/66 and as unemployed with probability 4/66, preserving the marginal rates in expectation.

A.5 Hours Worked: Truncated Normal Model

For individuals in stratum h, hours worked are drawn from a truncated normal distribution:

    Hrs_{hi} ∼ N(μ_h, 12²) truncated to [15, 60],    (18)

where the stratum mean is governed by a logistic link:

    μ_h = 15 + 45 logistic(0.10 X_{3,1,h} + 0.08 X_{3,2,h}).    (19)

The logistic function maps the linear predictor into (0, 1), so μ_h ∈ (15, 60) for all strata. The coefficients 0.10 and 0.08 govern the sensitivity of the stratum mean hours to the centred covariates X_{3,1,h} and X_{3,2,h} respectively. The national mean is approximately 37 hours per week. The within-stratum standard deviation of 12 hours produces substantial within-stratum variation, consistent with a realistic labour force population.
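The truncated-normal hours generator of Eqs. (18)–(19) can be sketched with simple rejection sampling. This is an illustrative stdlib-Python sketch, not the paper's code; the stratum covariate values below are hypothetical.

```python
import math
import random

random.seed(2)

def logistic(x):
    return 1 / (1 + math.exp(-x))

def truncated_normal(mu, sigma, lo, hi):
    """Draw from N(mu, sigma^2) truncated to [lo, hi], by rejection."""
    while True:
        x = random.gauss(mu, sigma)
        if lo <= x <= hi:
            return x

# Hypothetical centred stratum covariates (X_{3,1,h}, X_{3,2,h}):
x31, x32 = 0.5, -0.3
mu_h = 15 + 45 * logistic(0.10 * x31 + 0.08 * x32)   # Eq. (19)

# Hours for one stratum: N(mu_h, 12^2) truncated to [15, 60], Eq. (18).
hours = [truncated_normal(mu_h, 12, 15, 60) for _ in range(5000)]
print(f"mu_h = {mu_h:.2f}, stratum mean hours = {sum(hours)/len(hours):.1f}")
```

Rejection sampling is adequate here because the acceptance region [15, 60] covers most of the mass of N(μ_h, 12²); a heavier truncation would call for inverse-CDF sampling instead.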
A.6 Baseline Sample and Bethel Inputs

A 5% stratified simple random sample is drawn with

    n^{(0)}_h = max{2, ⌊0.05 N_h⌋},  h = 1, …, 100.

The minimum of 2 ensures that a within-stratum variance can be computed in every stratum. Effective sample sizes are

    n^{eff}_h = ⌊ n^{(0)}_h (1 − n^{(0)}_h / N_h) / δ_h ⌋.

Stratum means and standard deviations estimated from this baseline sample, together with the n^{eff}_h, are the sole inputs to R2BEAT for the Bethel optimisation (4).

A.7 Summary of Simulation Parameters

Table 10: Complete summary of simulation parameters.

Component          Parameter                       Value
Population         Total size N                    1,000,000
                   Number of strata H              100
                   Number of domains D             10
                   Design effect δ_h               U(1.1, 1.2)
Employment         National mean p̄_E               ≈ 62%
                   Intercept                       logit(0.62) ≈ 0.489
                   Covariate 1: mean, SD           N(3, 1²)
                   Covariate 2: mean, SD           N(4, 1.5²)
                   Covariate coefficients          0.15, 0.10
                   Domain logit SD σ^E_γ           0.20
                   Stratum logit SD σ^E_ε          0.15
Unemployment       National mean p̄_U               ≈ 4%
                   Intercept                       logit(0.04) ≈ −3.178
                   Covariate 1: mean, SD           N(3, 1²)
                   Covariate 2: mean, SD           N(4, 1.5²)
                   Covariate coefficients          0.15, 0.10
                   Domain logit SD σ^U_γ           0.10
                   Stratum logit SD σ^U_ε          0.08
Hours Worked       National mean μ̄                 ≈ 37 hrs
                   Stratum mean link               Logistic
                   Covariate 1: mean, SD           N(0, 3²)
                   Covariate 2: mean, SD           N(0, 3²)
                   Covariate coefficients          0.10, 0.08
                   Within-stratum SD σ_ε           12 hrs
                   Truncation range                [15, 60] hrs
                   Offset and scale in link        15 + 45 · logistic(·)
Unit noise         Added to all covariates         N(0, 0.2²)
Mutual exclusivity E_{hi} = U_{hi} = 1 resolved    Ratio 62 : 4 in favour of E vs U
Baseline sample    Sampling fraction               5%
                   Minimum stratum size            2
Precision targets  National CV target g_{0,k}      3%
                   Domain CV target g_{d,k}        8%
Monte Carlo        Replications B                  1,000
                   MCMC iterations per chain       2,000
                   Number of MCMC chains           3
                   Convergence threshold R̂_max     1.05
                   Achieved R̂_max                  1.02
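The baseline allocation and effective-sample-size computation of Section A.6 can be sketched as follows. Stratum sizes and design effects here are hypothetical stand-ins drawn with the same distributions as Table 10; this is an illustrative sketch, not the R2BEAT input code.

```python
import math
import random

random.seed(3)

# Hypothetical stratum sizes and design effects (10 strata for brevity).
N_h   = [random.randint(5000, 15000) for _ in range(10)]
delta = [random.uniform(1.1, 1.2) for _ in range(10)]

# Baseline 5% allocation: n_h^(0) = max{2, floor(0.05 * N_h)}.
n0 = [max(2, math.floor(0.05 * N)) for N in N_h]

# Effective sample size: n_h^eff = floor(n_h^(0) * (1 - n_h^(0)/N_h) / delta_h),
# applying the finite-population correction and deflating by the design effect.
n_eff = [math.floor(n * (1 - n / N) / d) for n, N, d in zip(n0, N_h, delta)]

for N, n, ne in zip(N_h, n0, n_eff):
    print(f"N_h = {N:6d}  n_h = {n:4d}  n_eff = {ne:4d}")
```

Since δ_h > 1 and the finite-population correction is below 1, every effective sample size is strictly smaller than its nominal allocation, which is what deflating the variance inputs to the Bethel optimisation requires.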