The future of census coverage surveys

IMS Collections
Probability and Statistics: Essays in Honor of David A. Freedman
Vol. 2 (2008) 234–245
© Institute of Mathematical Statistics, 2008
DOI: 10.1214/193940307000000464

Kenneth Wachter¹
University of California, Berkeley

Abstract: A quarter-century of statistical research has shown that census coverage surveys, valuable as they are in offering a report card on each decennial census, do not provide usable estimates of geographical differences in coverage. The determining reason is the large number of “doubly missing” people missing both from the census enumeration and from coverage survey estimates. Future coverage surveys should be designed to meet achievable goals, foregoing efforts at spatial specificity. One implication is a sample size no more than about 30,000, setting free resources for controlling processing errors and investing in coverage improvement. Possible integration of coverage measurement with the American Community Survey would have many benefits and should be given careful consideration.

1. Coverage surveys and their purposes

During the final decades of the Twentieth Century, proposals for the statistical adjustment of the decennial census in the United States provided a large-scale proving ground for approaches to the evaluation of statistical procedures. In many respects, David Freedman has been the central figure in these studies. He more than anyone else directed attention beyond political controversies surrounding particular decisions to the general scientific question of how one separates statistical findings driven by data from findings dependent on hypothetical assumptions in a modeling process.

Census adjustment is a topic well-suited to raise general questions about the nature and efficacy of models.
First, the scale and complexity of the topic have meant that a wide range of statistical methods have played a role and come under scrutiny. Second, the professionalism of the Census Bureau and the willingness of Bureau leadership to make large data sets available to outside researchers have enabled independent replication of estimates to an extent rarely seen. Third, the background pressures of political struggle and legal review have kept the distinction between data and assumptions in the foreground.

A legacy of this enterprise of statistical examination is an unparalleled foundation for rational planning for census evaluation as we enter the Twenty-first Century. In this essay, I bring together lessons that have been learned about the capabilities of census coverage surveys and their design and deployment for the future.

¹ University of California, Department of Demography and Department of Statistics, 2232 Piedmont Avenue, Berkeley, California 94720-2120, USA, e-mail: wachter@demog.berkeley.edu
AMS 2000 subject classifications: 62P25, 62D05.
Keywords and phrases: census adjustment, correlation bias, coverage measurement, dual system estimation.

“Coverage” is a general demographic term referring to accuracy of enumeration in the face of errors of undercounting, overcounting, or miscounting members of a population. Census coverage surveys usually take the form of “post-enumeration” surveys conducted after the completion of a census enumeration. They have long accompanied census taking in statistically numerate societies.

In the United States, census coverage surveys have been mounted in conjunction with every decennial census since 1950, with expanded efforts since 1980. The 1980 enterprise went under the acronym of PEP, for “Post-Enumeration Program”.
It made use, in part, of samples amounting to 168,000 households from the Current Population Survey rather than from its own field operation. The 1990 survey was called the PES, standing simply for “Post-Enumeration Survey”. It included its own field survey with a sample, the “P-sample”, of 5,290 block groups and about 170,000 housing units along with a corresponding sample of Census records, the “E-sample”. The 2000 program was called ACE, for “Accuracy and Coverage Evaluation.” As in 1990, it collected its own P-sample and E-sample, with an augmented sample size of about 300,000 housing units. Plans are in formation for a counterpart in 2010.

The original purpose of census coverage surveys is to generate indices of the likely extent of error in key census figures and of the uncertainties in population counts. In the United States, summary estimates of coverage error from each decennial survey have come to be regarded as a report card on the census. Receiving most attention have been the estimate of net national undercount and the estimates of differences between racial and ethnic groups. The difference in estimated net national undercount between African-Americans and members of other races is often referred to as “the differential undercount”.

The special virtue of a full census enumeration is the refined information on geographical location which it provides. Surveys intrinsically have lower spatial resolution. A sample of a size adequate for precise estimates for large population aggregates cannot be indefinitely split up place by place to yield direct estimates for small geographical units. In the United States, census coverage surveys through the 1980 round were never intended to support disaggregated geographical estimates of undercounts.
From 1980 onward, however, census coverage surveys have been pressed into service for this originally unintended purpose, for estimates of spatial differences in coverage errors. Calculations were put in play for generating adjusted census counts area by area and ultimately block by block. Direct estimates being out of the question, a combination of data collection with extensive statistical modeling and inferences based on assumptions came to the fore.

In the to-and-fro of litigation, changing federal policies, and Supreme Court rulings, adjusted figures have been repeatedly computed, but they have never been accepted as official census counts. The 1980 PEP was redeployed to generate candidate sets of adjusted figures in response to a court decree. The 1990 PES was developed on a large scale for possible implementation of adjustment, and it was accompanied by a suite of evaluation projects known as the P-Studies and the E-Studies. Reports from these 1990 studies still constitute the most detailed information available for understanding coverage error and adjustment dynamics. The 2000 ACE, with its larger sample size, also included valuable evaluation studies, although high levels of initially undetected duplicate counting in the 2000 Census have complicated the investigations.

Many researchers have contributed to analyses of the surveys, estimates, and evaluation studies. Overviews with extensive references can be found in Freedman and Wachter [7], Anderson and Fienberg [1], and in Brown and Zhao [4] in this volume.

2. Lessons

Building on the exemplary technical literature, as it pertains to the subject of this essay, I offer a general conclusion:

A quarter-century of experience tells us that census coverage surveys are useless for measuring spatial differences in coverage.
The purpose that can be met by census coverage surveys is their original purpose. They can serve to characterize overall levels of error and differentials between population groups and give a report card on the census. They cannot reliably ascertain geographical distributions.

Many difficulties and limitations of census coverage surveys with respect to geography have been identified and studied, but one is intractable and dispositive. That one is the large “doubly missing” population.

The phrase “doubly missing” refers to people who are missing both from the Census enumeration and from estimates computed from the coverage survey. “Unreached people” is an alternate descriptor. The technical term for this kind of error is “correlation bias”. The reason for correlation bias in survey-based estimates is obvious. People who are hard to count are also hard to survey.

The doubly missing add a “fifth cell” to the four cells of a two-way table used in the estimation process. Three cells contain observations: (1), at upper left, the number included in both census and survey, (2), at upper right, the number included in the census, missing from the survey, and (3), at lower left, the number missing from the census, included in the survey. In the fourth, lower-right cell, it is customary to fill in the product (4) = (2) × (3) / (1). This product equals the number we should expect to see missing at random, in the counterfactual setting in which being missed in the survey were statistically independent of being missed in the census. Estimates from surveys routinely compensate for people missing at random but not for unmeasured correlations in missingness. The fifth cell represents the excess of people missing from both data systems due to correlation bias.
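
The fourth-cell calculation described above can be sketched in a few lines of Python. This is a minimal illustration under the independence assumption, not the Census Bureau's production procedure. The function names, the cell counts, and the benchmark total are all hypothetical values chosen for the example.

```python
def dual_system_estimate(both, census_only, survey_only):
    """Fill in the fourth cell of the two-way table and return the total.

    both        -- cell (1): counted in both census and survey
    census_only -- cell (2): in the census, missing from the survey
    survey_only -- cell (3): missing from the census, in the survey

    Assumes being missed by the survey is independent of being
    missed by the census, so cell (4) = (2) * (3) / (1).
    """
    if both <= 0:
        raise ValueError("cell (1) must be positive")
    fourth_cell = census_only * survey_only / both
    total = both + census_only + survey_only + fourth_cell
    return fourth_cell, total


def excess_doubly_missing(benchmark_total, dse_total):
    """Fifth cell: people missing beyond the independence allowance,
    measured against an independent benchmark total (hypothetical here)."""
    return benchmark_total - dse_total


# Invented counts for a single population group, not census data.
fourth, total = dual_system_estimate(both=900, census_only=60, survey_only=45)
print(fourth, total)                       # 3.0 1008.0
print(excess_doubly_missing(1020, total))  # 12.0
```

In this toy case the independence assumption supplies 3 people for the fourth cell. Against a benchmark total of 1,020, a further 12 people remain unaccounted for, and nothing inside the two-way table reveals them.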
The estimation method in use with census adjustment is known as the “Capture-Recapture” or “Dual System Estimator”, abbreviated DSE. It is described in Hogan [8] and Brown et al. [3] and in Brown and Zhao [4] in this volume. Dual System estimates are computed separately for broad population groups and then added together. The groups are called “post-strata” because individuals are assigned to them post-hoc, after data collection. Post-stratification controls for some dimensions of being hard to count. But reasons for being missed by census and survey takers are much more various and personal than are pinpointed by a classification into post-strata. Only a modest portion of correlation bias is removed by stratification exploiting available variables. Large excesses of unreached people remain missing beyond the allowances made by Dual System estimators.

We detect the existence of doubly missing people not from any information in the coverage survey, but from comparison with independent national population estimates by age, sex, and race from an approach known as Demographic Analysis. “Demographic Analysis”, abbreviated DA, is written with capital letters D and A to distinguish it from generic analysis by demographers. DA draws on administrative records, birth and death registration, Medicare enrollments, and guesses at net immigration informed by ancillary survey-based estimates of the foreign-born [10]. DA is subject to many kinds of errors and uncertainties of its own, but it is the best available source for national numbers and it is the Census Bureau’s standard against which other counts and estimates are compared.

Properties of the estimation methods and comparisons between Census, Dual System, and DA totals have been intensively scrutinized by the scientific community over the last twenty-five years.
Evidence has accumulated about doubly missing people and their impact on estimates. Three conclusions bearing on the future of census coverage have emerged:

1. The doubly missing, for whom no geographical information is available, are so numerous that they amount to a substantial fraction of any census undercount.
2. Breakdowns of doubly missing people by sex and race show higher than average numbers among African-American males, in line with expectations.
3. There is strong circumstantial evidence that doubly missing people were unevenly distributed across regions of the country in 1990 and 2000.

3. Numbers of the doubly missing

We review these conclusions one by one, beginning with the numbers of doubly missing people. Three elements go into an estimate of doubly missing people: an initial DSE figure for net undercount from the coverage survey, an estimate of processing error in the DSE generated by the evaluation studies (P-studies and E-studies), and an estimate of total net undercount from Demographic Analysis.

Overcounts as well as undercounts occur in Census figures. Net undercount is the amount by which undercounts exceed overcounts. It can be positive or negative. The magnitudes of estimates, choices among alternatives, and critiques of assumptions are set out in Wachter and Freedman [13] and Freedman and Wachter [7]. The summary here is based on these works, and readers should consult them for details.

The first and third of these elements, the DSE and DA, have already been mentioned. The second, processing error in the post-enumeration survey estimates, enters from a variety of sources. A prominent example is a failure in the 2000 survey to detect and correct for a large number of duplicate census records that independently came to light.
Sources of processing error include geocoding errors, census day address errors, matching errors, imputation errors, undetected fabrications, imprecise matching of movers, and errors in balancing the scope of census and survey. Many sources of processing error are measured by quality control procedures, blind recoding experiments, and targeted field followup of samples of cases. These measurements are used in estimates of overall correlation bias.

The 1990 estimates of processing error were thoroughly vetted (Breiman [2], Fay and Thompson [6]). The 2000 estimates are subject to wider uncertainty (Freedman and Wachter [7]). Some components are positive, some negative. On balance, processing errors typically lead the DSE to overstate the net undercount and constitute a negative correction to the DSE figures. They are referred to as “measured biases” to distinguish them from correlation bias, which, in the context of quality control and field followup, is an unmeasured bias.

DSE figures for undercount come out at a percent or two of the whole population. Errors in survey processing come out at no less than a percent or two. Indeed, 98% or 99% accuracy is hard to achieve. Thus processing errors, concentrated among hard cases, can easily affect a sizable fraction of the undercount. It is not surprising that estimates of processing error are pivotal.

Magnitudes of the relevant quantities for 1990 drawn from Wachter and Freedman [13] and for 2000 drawn from Freedman and Wachter [7] are shown in Table 1.

Table 1
Elements of coverage error, estimates in millions

                             1990 PES   2000 ACE
Dual-System Estimate           +5.3       +3.3
Processing Error               −3.6       −5.5
Corrected Survey Estimate      +1.7       −2.2
Doubly Missing People          +3.0       +2.5
DA Estimate                    +4.7       +0.3

The initial DSE figures are shown in the first row, estimates of processing error in the second row, and their difference, the corrected survey estimates, in the third row. The entries in the second row are middle choices from a range of alternatives presented in Wachter and Freedman [13]. For each census, the estimate of doubly missing people is the number, in the fourth row, that has to be added on to the corrected survey estimate to give the DA net undercount in the final row. The DA net undercount for 1990 is the difference between the DA population estimate of 253.394 million from the Office of the Secretary of Commerce [9] and the Census count of 248.710 million. For 2000, it is the corresponding difference 281.760 − 281.421 million calculated from the revised DA estimate in Robinson et al. [10].

Table 1 indicates totals of doubly missing people in the millions, around 3 million in 1990, on the order of 2.5 million in 2000. In 1990 the doubly missing account for a majority of the net number of people missing from the Census according to the standard supplied by Demographic Analysis. In 2000, initial estimates suggested a small net Census overcount (attributed to duplicates) and revised estimates suggest a net undercount of a few hundred thousand. Doubly missing people offset the whole of the net undercount estimate derived from the coverage survey corrected for measured sources of processing error.

The numbers of doubly missing people are hefty in comparison to net undercounts. But they are not surprising when compared to the larger numbers of gross omissions. Gross omissions represent numbers of people omitted from the census before the numbers are offset by people erroneously enumerated in the census. Gross estimates are dependent on details of definition, but, at least by Census Bureau figures discussed in Freedman and Wachter [7], p.
10–11, the doubly missing in 2000 could be less than a quarter of overall gross omissions.

Net undercount comes, roughly speaking, by taking gross omissions and subtracting the large number of erroneous enumerations, including fabrications and duplications. People assigned to the wrong location of residence may be reckoned both as a gross omission omitted from their proper location and as an offsetting erroneous enumeration included at the incorrect location. While gross figures are interesting for judgments of face validity, the quantities relevant for geographical breakdowns are net undercounts.

The Dual System Estimator nets out the gross figures within each post-stratum, before the process called synthetic estimation which produces geographical estimates. Dual System estimates of coverage error for substantial areas are almost all positive. In 2000, all states and all but two congressional districts had positive estimates [7], p. 11. Offsetting overcounts and undercounts occur among post-strata and negatives along with positives are not rare for small jurisdictions and for blocks. But for units of the size for which coverage surveys give direct information, only positive numbers are at issue.

The DSE is distributing a stock of people (5.3 million in 1990) among geographical locations to be added to Census counts. Millions of these people are wrongly included in the stock, due to processing errors. Their locations are measured, but only with lower geographical resolution. Millions more, the doubly missing people, have to be added back in order to approximate true counts. Their locations are not measured at all.

4. Differentials and distributions

Along with information about numbers of doubly missing people, the accumulated evidence supports some generalizations about differentials and distributions.
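
Before turning to the breakdowns, the arithmetic behind Table 1 in Section 3 can be checked with a short Python sketch. The input figures are those quoted in Section 3, in millions. The dictionary layout and the function name are illustrative assumptions, and the sketch only reproduces the bookkeeping, not the underlying estimation.

```python
# Inputs in millions, as quoted in Section 3.
# The dictionary layout and key names are hypothetical, for illustration only.
INPUTS = {
    "1990 PES": {"dse": 5.3, "processing_error": -3.6,
                 "da_population": 253.394, "census_count": 248.710},
    "2000 ACE": {"dse": 3.3, "processing_error": -5.5,
                 "da_population": 281.760, "census_count": 281.421},
}


def coverage_elements(row):
    """Return (corrected survey estimate, DA net undercount, doubly missing)."""
    corrected = row["dse"] + row["processing_error"]          # third row
    da_net = row["da_population"] - row["census_count"]       # final row
    doubly_missing = da_net - corrected                       # fourth row
    return corrected, da_net, doubly_missing


for label, row in INPUTS.items():
    corrected, da_net, missing = coverage_elements(row)
    print(f"{label}: corrected {corrected:+.1f}, "
          f"DA {da_net:+.1f}, doubly missing {missing:+.1f}")
```

Rounded to one decimal place, the results agree with the third, fourth, and fifth rows of Table 1. The doubly missing figure is a residual, so any error in the processing error row or in the DA estimate passes straight into it.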
Although Demographic Analysis only yields figures at the national level, it does break the national population down by age, sex, and race. The evaluation studies (henceforth abbreviated ES) for 1990 and 2000 have no breakdowns by age, but they do have breakdowns by sex and minority-non-minority status which can be converted to rough breakdowns by sex and race. With 1990 data, breakdowns from the PES, the ES, and DA have been combined to give approximations for numbers of doubly missing people by sex and two racial categories, African-Americans and Other Races. Awkward features of the 2000 post-stratification, which did not sharply separate males and females, have as yet discouraged parallel estimates for 2000.

Analysis in Wachter and Freedman [13], p. 199–200, indicates that the 1990 doubly missing people included about 5.6% of the African-American men in the population, about 1.0% of the African-American women, 1.2% of the men of other races, and about 0.6% of the women of other races. Men, about 2.2 million of them, outnumber women, about 0.8 million of them, by about 1.4 million. Demographic Analysis is thought to be especially accurate for African Americans because undocumented immigration contributes little to their numbers. The higher estimated rates of being doubly missing for African-American men are in line with expectations and help corroborate the calculations.

The Census Bureau, in conjunction with the evaluation studies in 1990 and 2000, presented estimates which were labeled “Correlation Bias” and were entered into the Bureau’s Total Error Model. Examination of the calculations, however, shows that these Bureau figures were not estimates of total correlation bias, but rather of the excess of male over female correlation bias. A full discussion is given in Wachter and Freedman [13], p. 200–222.
When this distinction is taken into account, the Bureau’s estimates are in good agreement with the estimates cited here.

Unlike differentials for broad demographic subgroups, geographical distributions for doubly missing people are not amenable to calculation. That is the chief point about doubly missing people. They are missing from both census and survey, and information about their locations is not at hand.

There is, however, strong circumstantial evidence that the doubly missing people in 1990 and 2000 were very unevenly distributed across regions of the country. The evidence is set out in Wachter and Freedman [13], p. 202–207, Freedman and Wachter [7], p. 6–7, and Wachter [12], p. 110–113. In both 1990 and 2000 tabulations, states in the northeast and midwest with large central cities and heavy concentrations of African Americans have unexpectedly low DSE figures for net undercounts. The undercount differences between these states and those in the west are not fully explained by upstate areas with lower minority concentrations, and they show up sharply when metropolitan areas are compared. Since African-American men are over-represented among the doubly missing, it seems plausible that the doubly missing are over-represented in areas where African Americans are known to be numerous and where undercounts are unexpectedly low.

This circumstantial evidence applies at a very high level of geographical aggregation and it is at best suggestive. We cannot claim to know in any detail how the doubly missing are distributed, even by region, and certainly not by state or sub-state area. As shown in Wachter and Freedman [13], p. 205, the numbers of doubly missing are large enough to alter the qualitative pattern of DSE-based adjustments to the proportional shares of states in the national population.
In brief, due to the doubly missing people, the coverage surveys are not providing meaningful information about geographical gradients in coverage.

Doubly missing people are only one of the problems that undermine geographical breakdowns. Doubly missing people affect geography at the level of broad regions and at the level of states. Efforts to carry down estimates to finer levels of geography, to congressional districts, cities, counties, and ultimately to blocks come up against the limitations of a “synthetic assumption” discussed in Brown et al. [3] and Wachter and Freedman [14]. At all geographical levels, coverage surveys are not suited for estimating geographical variations in coverage.

5. Samples for the future

Recognition that census coverage surveys cannot reliably measure variability from place to place has far-reaching implications for the design of future coverage surveys. The large sample sizes in 1990 and 2000 have been motivated by the ultimately unsuccessful attempts to generate geographically differentiated estimates. Restoring emphasis on the original purpose, the report-card function, foregoing geographical specificity, allows much smaller sample sizes to be sufficient.

The resources that would be set free by reductions in sample sizes involve savings of many tens of millions of dollars. A small part of the savings could be directed into coverage improvement, quality control, and field followup activities and bring major benefits to overall accuracy. In the past, random sampling errors, which are reduced by increases in sample size, have been dominated by systematic errors like the processing errors discussed in Section 3, which go under the heading of “non-sampling errors”. Non-sampling errors are made harder to control and measure by increases in sample size. Smaller sample sizes yield dividends in quality as well as cost.
Specific recommendations about sample sizes need to be predicated on detailed calculations about optimal sample allocation and its interaction with choices about sampling strata as well as post-strata. A rough idea of appropriate sample sizes can, however, be gleaned from comparisons with the 2000 post-enumeration survey ACE.

A motivation for the large sample size of about 300,000 housing units in ACE was an early aim to be in a position to provide direct survey-based estimates for state populations for congressional apportionment. Census Bureau calculations then determined that sampling errors could be kept within acceptable limits when the ACE sample was subdivided into as many as 448 post-strata.

The 2000 ACE post-stratification incorporates divisions by demographic groups and by geographical categories. The demographic breakdown uses 7 age-sex classes and 6 main categories of race and ethnicity. (The racial category of American Indians and Alaskan natives is further subdivided, by an extra criterion of location, between those on reservations and those off reservations.) Each of these demographic groups is subdivided to differing degrees among four regions, eight types of places, and among renters and owners, a distinction included for the sake of synthetic estimation for small areas. Unfortunately, one of the 7 age-sex classes merges males and females, making consistency checks and comparisons awkward. A balanced design would have had 8 age-sex classes. Details are shown in a table in Brown and Zhao [4] in this volume.

The demographic breakdowns for the 2000 ACE post-stratification, then, comprise essentially 8 × 6 = 48 main demographic groups.
A sample size of 48/448 or about a tenth of the ACE sample size would be more or less sufficient to meet the same targets for sampling error for demographic groups as were met in 2000. This line of reasoning suggests a sample size for future coverage surveys on the order of about 30,000 housing units, that is to say, in the several tens of thousands, not in the hundreds of thousands.

A complementary calculation is given in Freedman and Wachter [7], p. 14–15, 22–23, 31. The calculation there takes into account within-poststratum heterogeneity across areas at levels inferred from studies of proxy variables. The calculation leads to suggested sample sizes a little less than 30,000 housing units.

Such a sample size would still allow separate estimates of coverage by type of place, by renter-owner status, by mail-back rate, and by other variables of interest. Marginal tabulations would remain usable. Only cross classifications, crossing such variables with demographic breakdowns, would be swamped by sampling error. Cross classifications serve little purpose in any case, once it is recognized that carrying down undercount estimates to small geographical areas is undermined by the scale of non-sampling errors and spatial heterogeneity, along with doubly missing people.

The most important tabulations from any coverage survey are those that can be combined with corresponding tabulations from demographic analysis. Age, sex, and race are paramount. A limitation of Demographic Analysis is the absence, as yet, of separate estimates for the Hispanic population and for Asian Americans. Some research has been underway at the Census Bureau regarding the feasibility of extending or supplementing DA with breakdowns for Hispanics based on other administrative records.
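
The proportional scaling behind the figure of about 30,000 housing units, given earlier in this section, can be written out as a short Python sketch. It assumes, as a simplification, that the sample needed to hold sampling error within a given target is proportional to the number of cells estimated separately. The constant and function names are illustrative, and the detailed allocation calculations that the text calls for are not attempted here.

```python
ACE_SAMPLE_HOUSING_UNITS = 300_000   # approximate 2000 ACE sample size
ACE_POST_STRATA = 448                # post-strata supported by that sample
MAIN_DEMOGRAPHIC_GROUPS = 8 * 6      # balanced age-sex classes x race/ethnicity


def scaled_sample_size(full_sample, full_cells, target_cells):
    """Scale a sample size in proportion to the number of cells.

    Simplifying assumption: equal sampling-error targets per cell imply
    a required sample proportional to the cell count.
    """
    if full_cells <= 0 or target_cells <= 0:
        raise ValueError("cell counts must be positive")
    return full_sample * target_cells / full_cells


suggested = scaled_sample_size(ACE_SAMPLE_HOUSING_UNITS,
                               ACE_POST_STRATA,
                               MAIN_DEMOGRAPHIC_GROUPS)
print(f"fraction of ACE sample: {MAIN_DEMOGRAPHIC_GROUPS / ACE_POST_STRATA:.3f}")
print(f"suggested housing units: {suggested:,.0f}")
```

The ratio 48/448 is a little over a tenth, which places the scaled sample slightly above 30,000 housing units. That is consistent with the rounded figure in the text and with the complementary calculation cited from Freedman and Wachter [7].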
The resources invested in DA by the Census Bureau have been surprisingly small, given the central role DA has played in 1990 and 2000. A couple of parts per hundred of the budgets invested in coverage surveys like ACE could readily and wisely be redirected into developing Demographic Analysis.

Reliable estimates of coverage from future coverage surveys depend not only on DA but also on estimates of processing error from successful evaluation studies, counterparts of the P-Studies and E-studies of 1990 and 2000. Smaller samples for the coverage survey itself should lead to tighter initial control of processing errors and should also allow larger budgets for field follow-up for evaluation. Evaluation studies of sufficient scope to support breakdowns of processing error by age as well as sex would facilitate tighter consistency checks on the characteristics of unreached people.

Coverage surveys do not stand on their own. Their value comes from their integration with the broader data collection program, with the Census enumeration, the evaluation studies, and Demographic Analysis.

6. The American Community Survey

The suggestions made in Section 5 apply to any administrative setup for coverage surveys that the Census Bureau may adopt. More specific reflections arise with regard to the Bureau’s major innovation for the beginning of the new century, the American Community Survey.

The inauguration of the American Community Survey (ACS) and the planned replacement of the census long form with ACS data offer a new alternative for census coverage measurement. Progress and plans for the ACS are described in U.S. Census Bureau [11]. At full implementation, the ACS will be surveying about 250,000 housing units per month.
For the sake of continuity with the long form, ACS methods are aligned with traditional procedures for the decennial census. Mail-out and mail-back are supplemented with computer-assisted telephone interviews and personal interviews on a sample basis. Implementation has been postponed a few times, but it should arrive before 2010.

Residence rules in the ACS for assigning respondents to locations are based on a concept of “current residence” in contrast to the census definition of “usual residence”. Principles and details are discussed in Cork and Voss [5]. At the time of the decennial census, it will be essential to have calibration studies which allow data users to translate between the residence concepts of the two sources. Consequently, there will be a pressing need for the temporary inclusion in the ACS of questions eliciting census-day residence. That need arises independent of any relation to census coverage measurement. However, these considerations open up an opportunity for replacing a separate census coverage survey in 2010 and beyond with a subsample of ACS returns from a period following the Census.

The 1980 PEP offers a precedent. It made use of two waves from the Current Population Survey (CPS) in place of a separate P-sample. The American Community Survey is much better suited than the CPS for such a role, because of the pre-existing alignment of its procedures with decennial census procedures. Benefits would also accrue directly to the ACS from a program for matching ACS responses to decennial Census records, given the importance of calibrating survey results against Census tabulations.

A limitation of the American Community Survey for the purpose of coverage measurement is the lower response rates that it achieves.
The Census long form has always had lower response rates than the short form, and the ACS inevitably continues this pattern. Were non-response distributed at random, it would not affect results. The 1990 and 2000 post-enumeration surveys had response rates no better than the censuses. But, of course, non-response is not distributed at random. We expect it to be concentrated among the hard-to-count. Use of the ACS in place of a separate post-enumeration survey would not alleviate the problem of correlation bias, and non-response might well be larger than in a separate post-enumeration survey.

But with numbers of doubly missing people already in the millions, small increments or decrements in coverage hardly matter very much. The problem of correlation bias cannot be addressed in any case with incremental improvements in survey design. Credible estimates require combination of survey information with independent sources, primarily with Demographic Analysis, and call for a willingness to forego geographical breakdowns which the data cannot sustain.

Counterbalancing the limitations of the American Community Survey for coverage measurement, there are distinct advantages. As an ongoing enterprise, the American Community Survey is not subject to the massive logistical challenges of a one-off enterprise like the 1990 PES or the 2000 ACE. The survey staff is already trained and already experienced. Regional offices and systems for local operations are already in place. Bugs and kinks in procedures are already being resolved as they turn up. The prospects for reducing processing errors from sources of measured bias are therefore more favorable in the context of the American Community Survey than with a separate survey.
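
The correlation bias discussed above can be made concrete with a small arithmetic sketch. The group sizes and capture probabilities below are hypothetical, chosen only for illustration, and the function names are not drawn from any Census Bureau system. Within each group, capture by the census and capture by the survey are assumed independent, so any bias in the pooled dual-system estimate comes purely from mixing easy-to-count and hard-to-count people.

```python
def dual_system_estimate(census_count, survey_count, matched_count):
    """Standard dual-system (capture-recapture) estimate of the total."""
    return census_count * survey_count / matched_count


def pooled_dse(groups):
    """Expected dual-system estimate when groups are pooled before estimating.

    Each group is a tuple (true_size, p_census, p_survey). Within a group,
    the two captures are assumed independent (an illustrative assumption).
    """
    census = sum(n * pc for n, pc, ps in groups)
    survey = sum(n * ps for n, pc, ps in groups)
    matched = sum(n * pc * ps for n, pc, ps in groups)
    return dual_system_estimate(census, survey, matched)


# Hypothetical population; these numbers are illustrative, not estimates.
groups = [
    (900_000, 0.98, 0.95),  # easy-to-count people
    (100_000, 0.70, 0.60),  # hard-to-count people
]

true_total = sum(n for n, _, _ in groups)
estimate = pooled_dse(groups)

# People missed by both the census and the survey ("doubly missing").
doubly_missing = sum(n * (1 - pc) * (1 - ps) for n, pc, ps in groups)

# The part of the true total that the pooled estimate fails to recover.
shortfall = true_total - estimate

print(f"true total     : {true_total:,.0f}")
print(f"pooled estimate: {estimate:,.0f}")
print(f"doubly missing : {doubly_missing:,.0f}")
print(f"shortfall      : {shortfall:,.0f}")
```

Under these assumed numbers the pooled estimate comes to roughly 990,000 against a true total of 1,000,000, so it accounts for only a minority of the 12,900 people missed by both sources. With a single homogeneous group the same calculation returns the true total exactly, which is why heterogeneity rather than sample size drives the shortfall.
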
Each of the separate coverage surveys of the last several decades has brought surprises, forcing the Census Bureau into last-minute or after-the-fact regrouping. A computer-coding error in 1990, a court decision that derailed plans for sample-based Census followup for 2000, and an unanticipated number of undetected duplications in the 2000 enumeration are examples. The unexpected will never be wholly banished, but an established day-by-day functioning enterprise like the ACS is vulnerable to fewer logistic uncertainties than a large-scale separate operation mounted under pressures of time.

With richness of information comparable to the Census long form, the American Community Survey supplies a more secure basis for matching individuals to Census records than traditional coverage surveys. Matching error and errors of imputation of cases with unknown match status have been two serious sources of processing error identified by evaluation studies. With better initial matching, field follow-up can be concentrated where it is most needed. Furthermore, with the ACS, heavy investment of resources in field follow-up for the hard-to-find would be easier to justify, because of the double payoff for ACS data quality along with coverage assessment.

Accounting for movers has been a thorny part of coverage measurement. In-movers who arrive at an address in time for the survey but lived elsewhere on Census Day have to be matched to a Census record at a previous address to be confirmed as correct enumerations. Out-movers who are recorded in the Census but gone by survey date have to be distinguished from erroneous enumerations. The richer information about respondents in the ACS could facilitate mover matching.

Estimation for movers could benefit from the built-in continuity of American Community Survey operations over time.
PES and ACE operations were themselves spread out over many months. Later data mainly came from harder cases. Duration since Census day was confounded with intrinsic difficulty of enumeration. The American Community Survey would furnish estimates for movers from ordinary, randomly-selected cases as a function of days since Census day. Such time series offer checks on estimates for movers.

Presumably, the ACS will be suspended during the actual taking of the Census, so as to avoid direct interference with the enumeration. Some start-up effects associated with resumption would occur, but not on the scale associated with a separate coverage survey.

Integration of coverage measurement into the American Community Survey might have broader benefits for coverage improvement. The ACS will be continuously generating indicators of coverage difficulty for a sample of local areas over the stretch of time leading up to the Census enumeration. Measures like the rate of cases with “insufficient information” (II’s) discussed in this volume by Brown and Zhao [4] have been used as proxies for net undercount in past studies of heterogeneity, Wachter and Freedman [14]. An ACS-based time series of such proxies could provide “leading indicators” of coverage risk and a basis for a fair and systematic allocation of special resources for intensive fieldwork.

Using data from the American Community Survey in place of a separate P-sample would only require a small ACS subsample, much less than a single month’s stream of data. The ability to draw stratified subsamples from the ACS with oversampling of key hard-to-count subgroups on the basis of already recorded characteristics would be a bonus.
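
The idea of drawing a stratified subsample with oversampling of hard-to-count subgroups can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any actual ACS procedure: the record layout, the `hard_to_count` flag, the sampling rates, and the function name `stratified_subsample` are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_subsample(records, stratum_of, rates, seed=0):
    """Draw a stratified subsample with stratum-specific sampling rates.

    Returns a list of (record, weight) pairs. The weight is the inverse
    of the realized sampling fraction in the record's stratum, so that
    weighted totals reproduce the size of the full file.
    """
    rng = random.Random(seed)

    # Group records by stratum using already recorded characteristics.
    by_stratum = defaultdict(list)
    for rec in records:
        by_stratum[stratum_of(rec)].append(rec)

    sample = []
    for stratum, members in by_stratum.items():
        k = round(len(members) * rates[stratum])
        if k == 0:
            continue  # nothing drawn from this stratum
        weight = len(members) / k
        for rec in rng.sample(members, k):
            sample.append((rec, weight))
    return sample


# Hypothetical file of 10,000 returns, one in ten flagged hard-to-count.
records = [{"id": i, "hard_to_count": i % 10 == 0} for i in range(10_000)]

# Oversample the hard-to-count stratum tenfold (illustrative rates).
rates = {True: 0.5, False: 0.05}

sample = stratified_subsample(records, lambda r: r["hard_to_count"], rates)
weighted_total = sum(weight for _, weight in sample)
n_hard = sum(1 for rec, _ in sample if rec["hard_to_count"])
```

Because each sampled record carries the inverse of its stratum's sampling fraction, the oversampled group contributes more cases for matching and follow-up without distorting weighted totals.
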
In summary, whether or not coverage measurement is integrated into the American Community Survey in 2010 or in succeeding censuses, lessons from past experience should govern future planning. Census coverage surveys should be designed to do well what they can do well. Smaller samples and better integration carry many benefits. Savings in costs can be directed into intensive matching and into evaluation follow-ups. Under any foreseeable arrangement, doubly missing people will still be preventing reliable geographical breakdowns for undercount. But better control over measured biases can allow more precise comparisons with results from Demographic Analysis. The result can be superior estimates of coverage for population subgroups, classified by age, sex, and race, and a more cogent report card for the evolving decennial census.

References

[1] Anderson, M. and Fienberg, S. E. (1999). Who Counts? The Politics of Census-Taking in Contemporary America. Russell Sage Foundation, New York.
[2] Breiman, L. (1994). The 1991 census adjustment: Undercount or bad data? (with discussion). Statist. Sci. 9 458–537.
[3] Brown, L. D., Eaton, M. L., Freedman, D. A., Klein, S. P., Olshen, R. A., Wachter, K. W., Wells, M. T. and Ylvisaker, D. (1999). Statistical controversies in Census 2000. Jurimetrics 9 347–375.
[4] Brown, L. D. and Zhao, A. (2007). Alternative formulas for synthetic dual system estimation in the 2000 census. In Probability and Statistics: Essays in Honor of David A. Freedman (D. Nolan and T. Speed, eds.) 90–113. Institute of Mathematical Statistics.
[5] Cork, D. and Voss, P., eds. (2006). Once, Only Once, and in the Right Place. National Academy Press, Washington, D. C.
[6] Fay, R. E. and Thompson, J. H. (1993). The 1990 post enumeration survey: Statistical lessons, in hindsight. In Proceedings, Bureau of the Census Annual Research Conference. Washington, D. C. Bureau of the Census.
[7] Freedman, D. A. and Wachter, K. W. (2003). On the likelihood of improving the accuracy of the census through statistical adjustment. In A Festschrift for Terry Speed (D. Goldstein and S. Dudoit, eds.) 197–230. Institute of Mathematical Statistics. MR2004339
[8] Hogan, H. (1993). The 1990 post-enumeration survey: Operations and results. J. Amer. Statist. Assoc. 88 1047–1060.
[9] Office of the Secretary of Commerce (1991). Decision on Whether or Not a Statistical Adjustment of the 1990 Decennial Census of Population Should Be Made for Coverage Deficiencies Resulting in an Overcount or Undercount of the Population, Explanation. Three volumes. Reprinted in part in Federal Register 56 33582–33642 (July 22). U.S. Department of Commerce, Washington, D. C.
[10] Robinson, J. G., West, K. K. and Adlakha, A. (2002). Coverage of the population in census 2000: Results from demographic analysis. Population Research and Policy Review 21 19–38.
[11] U. S. Census Bureau (2003). American Community Survey Operations Plan. U.S. Department of Commerce, Washington, D. C.
[12] Wachter, K. W. (1993). The census adjustment trial: An exchange. Jurimetrics 34 107–115.
[13] Wachter, K. W. and Freedman, D. A. (2000a). The fifth cell. Evaluation Review 24 191–211.
[14] Wachter, K. W. and Freedman, D. A. (2000b). Measuring local heterogeneity with 1990 U. S. census data. Demographic Research 3 1–22.
