Bayesian models to adjust for response bias in survey data for estimating rape and domestic violence rates from the NCVS
It is difficult to accurately estimate the rates of rape and domestic violence due to the sensitive nature of these crimes. There is evidence that bias in estimating the crime rates from survey data may arise because some women respondents are "gagge…
Authors: Qingzhao Yu, Elizabeth A. Stasny, Bin Li
The Annals of Applied Statistics 2008, Vol. 2, No. 2, 665–686. DOI: 10.1214/08-AOAS160. © Institute of Mathematical Statistics, 2008.

By Qingzhao Yu, Elizabeth A. Stasny and Bin Li, Louisiana State University and Ohio State University

It is difficult to accurately estimate the rates of rape and domestic violence due to the sensitive nature of these crimes. There is evidence that bias in estimating the crime rates from survey data may arise because some women respondents are "gagged" in reporting some types of crimes by the use of a telephone rather than a personal interview, and by the presence of a spouse during the interview. On the other hand, as data on these crimes are collected every year, it would be more efficient in data analysis if we could identify and make use of information from previous data. In this paper we propose a model to adjust the estimates of the rates of rape and domestic violence to account for the response bias due to the "gag" factors. To estimate parameters in the model, we identify the information that is not sensitive to time and incorporate this into prior distributions. The strength of Bayesian estimators is their ability to combine information from long observational records in a sensible way. Within a Bayesian framework, we develop an Expectation-Maximization-Bayesian (EMB) algorithm for computation in analyzing contingency tables and we apply the jackknife to estimate the accuracy of the estimates. Our approach is illustrated using the yearly crime data from the National Crime Victimization Survey. The illustration shows that compared with the classical method, our model leads to more efficient estimation but does not require more complicated computation.

1. Introduction.
Rape and domestic violence rates are difficult to estimate because of difficulties in collecting data on these crimes. Annual rape incidence rates in the U.S. obtained from police statistics, reported through the Uniform Crime Report (UCR), were estimated to be 0.3 per 1,000 persons among females age 12 and older [Bureau of Justice Statistics (2002)]. But the majority of both rape and domestic violence incidents are not reported to police. Data from the National Women's Study, a longitudinal telephone survey of a national household probability sample of women at least 18 years of age, show that 683,000 women were forcibly raped each year and that 84% of rape victims did not report the offense to the police (CDC's National Center of Injury Prevention and Control web site). Thus, we believe that the UCR underestimates the rates of such crimes.

Received March 2007; revised January 2008.
Supported in part by award number 93-IJ-CX-0050 from the National Institute of Justice, Office of Justice Programs, U.S. Department of Justice, and by the Center for Survey Research Summer Fellowship, OSU.
Key words and phrases. Categorical data analysis, contingency table, EMB algorithm, incompletely classified data, jackknife method, panel survey, survey mode.
This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Applied Statistics, 2008, Vol. 2, No. 2, 665–686. This reprint differs from the original in pagination and typographic detail.

The National Crime Victimization Survey (NCVS) is the best national source for estimates of rates of rape and domestic violence in the United States. It includes crimes reported and not reported to the police.
But NCVS rates might still be biased because: (a) the definition of "criminal" rape or domestic violence was left to the respondent, who may define rape differently relative to the legal definition; (b) personal NCVS interviews may not be conducted confidentially (with the respondent alone); and (c) telephone interviews may not be sufficiently private. Leggett et al. (2003) reported that people are more likely to report in a face-to-face interview, compared with a telephone interview. In this paper we deal with respondent bias caused by (a) privacy concerns in telephone interviews and (b) the in-person interviews not conducted confidentially.

It is important to have accurate and reliable estimates of rape and domestic violence rates. The criminal justice community, for example, needs accurate estimates of these rates to evaluate how well it meets the needs of victims, and whether criminal justice interventions help reduce the rates of rape or domestic violence. Since the NCVS is a population-based data source for estimating crime rates, interventions to reduce rape (or domestic violence) can use the NCVS data to help evaluate their efficacy.

The NCVS can be conducted by phone or in person. Other individuals are allowed to be present during the interview. Our research on the NCVS data (see Section 2.2) suggests that rape and domestic violence incidents are under-reported by women in telephone interviews, or if a spouse is present during the interview. We refer to these as the gag factors in reporting rape and domestic violence. Our research also indicates that the effect of gag factors in under-reporting crimes is relatively constant over time.
In the following we develop a Bayesian model that allows the use of both current data and previous information to estimate rates of rape and domestic violence while taking into account potential under-reporting of these crimes due to the gag effect.

This paper presents a Bayesian model for estimating classification probabilities and probabilities of respondents' bias. Our models take into account the respondent bias caused by known influential factors. Moreover, in estimating the distributions of interest, we identify from the previous data the information that is not sensitive to time and use it to build prior distributions for some of the parameters, so that the previous information is efficiently used in estimation. We also provide computation methods that allow complicated posterior distributions to be explored through simple iterations. We use the jackknife in the Bayesian environment to calculate the accuracy of the estimates, and then compare the results to estimates obtained from the classical method. We illustrate our methodology by estimating rates of rape and domestic violence using the public NCVS data.

The background literature for our response-bias-adjusted Bayesian models is based on three lines of research. First, extensive research has been done on methods to capture information on respondent bias. Our model has its roots in the work of Stasny and Coker (1997), who identified the problem of response bias in self-reported crime and developed a model to adjust for the bias. In this paper we modify the model and incorporate historical information to improve estimation. The second line of background research concerns the extension of the EM algorithm to find empirical Bayesian estimates in analyzing contingency tables.
See Little and Rubin (2002) for an introduction to the EM algorithm; Carlin and Louis (2000) and Bishop, Fienberg and Holland (1975) for an introduction to empirical-Bayesian estimators of cell probabilities. We extend the EM algorithm to find empirical Bayesian estimates by adding a third step, the B step, in which we calculate empirical-Bayesian estimates based on current maximum likelihood estimates. We call this an Expectation-Maximization-Bayesian (EMB) algorithm. To our knowledge, this is the first proposal for such an algorithm, although there are Bayesian versions of EM in Gelman et al. (2004). EMB is useful in estimating survey classification and respondent bias caused by any influential factors when prior information is available. Third, we use the jackknife method [Efron (1987)] in the Bayesian environment to measure the accuracy of the estimates.

In the next section we provide a brief description of the NCVS and an exploratory analysis of the data. Section 3 presents a model that adjusts data for gag factors. Section 4 discusses efficiency gains using the Bayesian model and describes building prior distributions, the EMB algorithm and how to use the jackknife method in the Bayesian environment. Section 5 presents the results of analysis on NCVS. Finally, Section 6 points out some directions for future research.

2. The National Crime Victimization Survey and data.

2.1. Survey design. The National Crime Victimization Survey is administered by the US Census Bureau on behalf of the Bureau of Justice Statistics. The survey has been collecting data on personal and household victimizations since July 1972. It was formerly known as the National Crime Survey before its redesign in 1989 when the current survey methodology began systematic field testing.
The first annual results from the redesigned survey were published in 1993 [Bureau of Justice Statistics (1995)]. There were some further changes to the survey after that, but the data collection procedure and instrument that provide important information to our analysis were consistent over these years. We use data from 1993 to the most current data available online. The NCVS is the primary source of national-level information on victimizations, including not only crimes reported to the police, but also those not reported to law enforcement authorities. We briefly describe the NCVS in the rest of this subsection. Additional information on the design and history of the NCVS is provided, for example, by the U.S. Department of Justice and Bureau of Justice Statistics (2001) and Stasny and Coker (1997).

The NCVS is comprised of a stratified, multi-stage, cluster sample of housing units (HUs). This ongoing survey seeks to obtain a representative sample of individuals 12 years of age and older living in households or group quarters within the United States. Semi-annual data on the frequency, characteristics and consequences of criminal victimization are collected from approximately 49,000 households comprising about 100,000 persons. The NCVS uses a rotating panel design whereby a sampled HU is maintained in the sample for seven panels with interviews conducted at six-month intervals. The first interview conducted within a household is considered a bounding interview, which is not published but is used as a control to avoid duplicate reporting of an incident. New households rotate into the sample on an ongoing basis. During the interview, individuals are asked about crimes committed against them or against the household (HH) in the past six months.
The crimes are categorized as personal (which includes rape/sexual assault, robbery, aggravated/simple assault and personal larceny) or property (which includes burglary, auto or motor vehicle theft, theft and vandalism) related. Crimes not covered include kidnapping, murder, shoplifting and crimes that occur at places of business. The survey instrument is composed of a screening section and an incident report. A single HH respondent is asked a series of six screening questions to elicit information on crimes committed against the HH (e.g., burglary, larceny, motor vehicle theft). Next, an eleven-question screener is used to elicit information from each individual in the HH concerning personal crimes committed against that individual. If any screening question elicits a positive response, an incident report is filled out. The report is designed to obtain detailed data on the characteristics and circumstances of the crime, such as the month, time, location of the incident, relationships between victim and offender, offender characteristics, self-protective actions, type of property lost, whether the crime was reported to the police, consequences of the victimization and offender use of weapons, drugs or alcohol.

The initial NCVS interview at a housing unit must be conducted in person. The subsequent survey contacts at the same address could be conducted either through telephone or face-to-face. Primarily for cost reasons, phone contacts are emphasized in the later interviews; a face-to-face interview is conducted only when it is inefficient or infeasible to make contact by phone. Since respondents are asked to describe the victimization, the lack of privacy can influence responses during a telephone interview.
Ideally, a personal interview is conducted and the interviewer and respondent are alone during the interview. In our data, approximately 45% of personal interviews are conducted alone with the respondent. However, this is not always possible. In neither the phone nor personal surveys are interviewers instructed to establish a private interview setting. During the face-to-face interview, if the respondent is not alone, the interviewer indicates on the questionnaire who else is present. When a telephone interview is conducted, the respondent is not asked, so we have no information about whether other individuals are present.

Although there are limitations in using the NCVS to estimate rates of both rape and domestic violence, this data set is the only ongoing large and nationally representative survey to ask individuals directly whether they have been victims of specific crimes.

2.2. The survey data. Our analysis data set includes all women 16 years of age or older in the NCVS database, with the exception of proxy interviews, for the years from 1993 to 2004. We use the NCVS data from 1993 to 1997 as prior information and conduct the analysis on data from 1998 to 2004. It is necessary to combine information from a number of years because rape and domestic violence are relatively rare events. We combine the data for the years 1993 to 1997 and for the years 1998 to 2004 since a descriptive analysis shows that although the crime rates increase from the first period to the second period, the crime rates in each period are almost constant. We recognize that there exists a potential correlation in responses from the same woman over time.
But this correlation should not present a significant problem in our analysis since: (a) the survey is controlled so that no crime is repeatedly recorded; (b) the data are collected so that people have the same chance of being included in the sample; and (c) an analysis is also performed using weighted data, which should better represent the population of interest.

The raw, unweighted data from 1998 to 2004 by type of crime, type of interview and who was present during personal interviews are presented in Table 1. We divide the crimes into four groups: rape, domestic violence, other assault and personal larceny. Rape, attempted rape and sexual assault are categorized as rape. The types of crimes included in the domestic violence and other assault categories are exactly the same. If the offender is an intimate, then the assault is categorized as domestic violence. Otherwise, it is categorized as other assault. An intimate is defined as a spouse, ex-spouse, boyfriend or ex-boyfriend. Personal larceny includes purse snatching and pocket picking. We exclude verbal crimes in our analysis because these kinds of crimes are hard to define.

Table 1 lists frequencies and rates of crimes reported by interviewers. Four categories are used to describe who was present during the personal interview: (1) a spouse and no one else (labeled Spouse), (2) a spouse and at least one other person (Spouse and Other), (3) at least one person but no spouse (Other), and (4) no one else present (Alone). As mentioned previously, we do not know who is present with the responding woman during telephone interviews. Note that the raw data report crime rates per 1,000 women interviewed as follows: 0.79 rapes, 1.66 incidents of domestic violence, 4.14 other assaults and 0.60 incidents of personal larceny.
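As a quick check on the overall rates just quoted, the following sketch recomputes them from the "All interviews" row of Table 1. The names `TOTAL_INTERVIEWS`, `REPORTED` and `rate_per_1000` are illustrative assumptions, not part of the original analysis.

```python
# Counts from the "All interviews" row of Table 1 (NCVS 1998-2004, unweighted).
TOTAL_INTERVIEWS = 571348
REPORTED = {
    "rape": 449,
    "domestic violence": 948,
    "other assault": 2366,
    "personal larceny": 341,
}


def rate_per_1000(count, total):
    """Reported incidents per 1,000 interviews."""
    return 1000.0 * count / total


# Round to two decimals, the precision used in Table 1.
rates = {crime: round(rate_per_1000(n, TOTAL_INTERVIEWS), 2)
         for crime, n in REPORTED.items()}

for crime, rate in rates.items():
    print(f"{crime}: {rate:.2f} per 1,000 interviews")
```

The same helper could be applied to any other row of Table 1 by swapping in that row's counts and its number of interviews.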
If we consider the rates of various crimes in Table 1 by type of interview, we find that there are some large differences, as shown in Figure 1. Except for personal larceny, more crimes were reported in personal interviews than in telephone interviews. For example, rape was reported at a rate 1.45 times higher in personal interviews compared to telephone interviews; domestic violence was reported at a rate 1.33 times higher and other assault was reported at a rate 1.11 times higher. Thus, the telephone interview appears to have a gag effect in the reporting of crimes.

Table 1. Frequencies and rates of crimes reported by settings of the interviews: NCVS 1998–2004. Entries are numbers of incidents reported by type of crime, with rates per 1,000 interviews in parentheses.

Type of     Who                Number of    Rape         Domestic     Other          Personal     No crime
interview   present            interviews                violence     assault        larceny      reported
Telephone   Unknown            412339       288 (0.70)   516 (1.25)   1539 (3.73)    270 (0.65)   409726 (993.66)
Personal    Spouse             24063        4 (0.17)     12 (0.50)    45 (1.87)      10 (0.42)    23992 (997.05)
Personal    Spouse and Other   14322        5 (0.35)     2 (0.14)     29 (2.02)      2 (0.14)     14284 (997.35)
Personal    Other              48916        66 (1.35)    162 (3.31)   350 (7.16)     18 (0.37)    48320 (987.82)
Personal    Alone              71708        86 (1.20)    256 (3.57)   403 (5.62)     41 (0.57)    70922 (989.04)
Personal    All personal       159009       161 (1.01)   432 (2.72)   827 (5.20)     71 (0.45)    157518 (990.62)
All interviews                 571348       449 (0.79)   948 (1.66)   2366 (4.14)    341 (0.60)   567244 (992.82)

From Table 1, we also note that rates of reported crimes in personal interviews depend on who was present during the interview. For personal interviews, we compare other situations to the case in which the woman was interviewed alone, since that is considered the ideal case.
Compared with a woman who was interviewed alone, rape was reported about one-fifth as frequently when a spouse was present (either with or without others). Considering domestic violence, the incident was reported approximately one-tenth as frequently if a spouse was present. The other assault category was reported 1/2.93 as frequently, and personal larceny was reported 1/1.84 as frequently if a spouse was present during the interview. Part of this reporting differential may be explained by the protection offered by having a spouse in the household. All rates of reported crimes were lower when a spouse was present; however, the reported rates for rapes and domestic violence are significantly lower than those obtained when the woman was interviewed alone. We therefore believe that the spouse being present during an interview has a differential influence on the reporting of rape and domestic violence. Rape is under-reported because it is a sensitive crime, while domestic violence is under-reported not only because it is a sensitive crime but also because the offender may be present.

Fig. 1. Comparison of crime rates reported by women by type of interview and who was present during interview.

Table 2. Observed data (frequencies) for reporting rape, domestic violence and other crimes from 1998 to 2004.

                     Personal interview,   Personal interview,    Telephone
                     spouse present        spouse not present     interview
Rape                 9                     152                    288
Domestic violence    14                    418                    516
Other assault        74                    753                    1539
Personal larceny     12                    59                     270
No crime             38276                 119242                 409726

This exploratory analysis of the raw data suggests that there is response bias in the NCVS related to type of interview (personal versus telephone) and who was present during the interview.
In the latter case, the bias is particularly large in the reporting of the sensitive crimes of rape and domestic violence. Thus, we consider a model that allows us to adjust the estimates for those crime rates according to the possible response bias.

3. A model for response bias adjustment. In our model we classify the data according to the type of crime, presence of the spouse (either with or without others) during the interview and whether the interview was conducted by telephone or face-to-face. The crimes are classified into five categories: (1) rape and possibly some other crime, (2) domestic violence, not rape but possibly some other crime, (3) other assault except for rape and domestic violence, (4) personal larceny except for all kinds of assault, and (5) no personal crime reported. The unweighted and weighted data are summarized in Table 2 and Table 3, respectively. Weighted data are used in the analysis to account for the sample design and socio-economic indicators.

Table 3. Weighted-adjusted data (frequencies) for reporting rape, domestic violence and other crimes from 1998 to 2004.

                     Personal interview,   Personal interview,    Telephone
                     spouse present        spouse not present     interview
Rape                 8.93                  177.65                 338.66
Domestic violence    13.19                 492.70                 565.67
Other assault        77.08                 867.77                 1673.10
Personal larceny     10.28                 57.99                  263.93
No crime             37234.91              121675.28              407890.86

Tables 2 and 3 do not present the "truth." Some women prefer not reporting crimes under certain circumstances. We want to build a model taking into account the circumstances of the interview to obtain more accurate estimates of the rates of rape and domestic violence. We suspect that our estimates still underestimate the true rates since some women would never report some incidents under any circumstances.
We use the information available on factors believed to provide gag effects to improve estimation.

In our model we assume a woman may be gagged from reporting a crime for two reasons: a spouse was present, or the interview was conducted over the telephone. We further assume that the spouse's presence only influences the reporting of rape and domestic violence, whereas conducting the interview over the phone may influence all crimes except for personal larceny. To ensure the model is identifiable, we impose a hierarchy on the reasons for not reporting crimes. Namely, we assume that the presence of a spouse dominates the use of a telephone interview in determining whether or not a woman reports such an incident. So if we could only observe it, the complete data underlying Table 2 would tell us if a crime actually occurred and whether it was reported. If a crime was not reported, the unobserved complete data could tell us if the crime was not reported because of the presence of a spouse, or was not reported because the interview was conducted over the telephone. The form of the complete (but unobserved) data for reporting crimes is shown in Table 4.

Note that some outcomes are impossible, for example, not reporting because the spouse is present when the woman was interviewed without the spouse's presence. Such impossible outcomes are denoted by a dash in Table 4.

We now present a model to analyze the probabilistic relationship between the underlying complete data and the observed data.
The following notation is employed:

$\pi$ = probability of a telephone interview,
$1 - \tau$ = probability of crimes not reported because of telephone interview,
$1 - \rho$ = probability of rape not reported because spouse is present,
$1 - \delta$ = probability of domestic violence not reported because spouse is present,
$\omega_{ij}$ = probability of interview status $i$ and crime status $j$, where
    $i$ = 1 if spouse is present, 2 if spouse is not present;
    $j$ = 1 if rape, 2 if domestic violence, 3 if other assault, 4 if personal larceny, 5 if no crime.

The first subscript of $\omega$ indexes spouse presence and the second indexes the crime, which is the order used in Tables 5 and 7. We fit a model that assumes independence between crimes and spouse presence, so that $\omega_{ij} = s_i \cdot c_j$, where $c_j$ denotes the probability of each type of crime and $s_i$ denotes the probability of the spouse being present (or not) during the interview. Under the assumptions described above, the probabilities underlying the unobserved complete data are shown in Table 5.

Table 4. Form of unobserved complete data. Subscripts of $y$ denote, in order, interview type (1 = personal, 2 = telephone), crime, reporting status and spouse presence.

Personal interview
Crime               Reporting status                 Spouse present   Spouse not present
Rape                Reported                         $y_{1111}$       $y_{1112}$
                    Not reported, spouse present     $y_{1121}$       -
Domestic violence   Reported                         $y_{1211}$       $y_{1212}$
                    Not reported, spouse present     $y_{1221}$       -
Other assault       Reported                         $y_{1311}$       $y_{1312}$
Personal larceny    Reported                         $y_{1411}$       $y_{1412}$
No crime            Reported                         $y_{1511}$       $y_{1512}$

Telephone interview
Crime               Reporting status                 Spouse present   Spouse not present
Rape                Reported                         $y_{2111}$       $y_{2112}$
                    Not reported, spouse present     $y_{2121}$       -
                    Not reported, phone interview    $y_{2131}$       $y_{2132}$
Domestic violence   Reported                         $y_{2211}$       $y_{2212}$
                    Not reported, spouse present     $y_{2221}$       -
                    Not reported, phone interview    $y_{2231}$       $y_{2232}$
Other assault       Reported                         $y_{2311}$       $y_{2312}$
                    Not reported, phone interview    $y_{2331}$       $y_{2332}$
Personal larceny    Reported                         $y_{2411}$       $y_{2412}$
No crime            Reported                         $y_{2511}$       $y_{2512}$

In the observed data, some of the cells from the complete data are collapsed [see, e.g., Chen and Fienberg (1974, 1976)].
Hence, we observe only sums of several cells rather than all 30 possible cells represented in the complete-data table. Table 6 presents the notation for the observed data table and indicates which cell counts from the complete data table are summed together to create the observed data. The probabilities underlying the observed data are similarly just the sums of the probabilities underlying the unobserved complete data and are shown in Table 7.

Table 5. Probabilities underlying unobserved complete data.

Personal interview
Crime               Reporting status                 Spouse present                     Spouse not present
Rape                Reported                         $(1-\pi)\rho\omega_{11}$           $(1-\pi)\omega_{21}$
                    Not reported, spouse present     $(1-\pi)(1-\rho)\omega_{11}$       -
Domestic violence   Reported                         $(1-\pi)\delta\omega_{12}$         $(1-\pi)\omega_{22}$
                    Not reported, spouse present     $(1-\pi)(1-\delta)\omega_{12}$     -
Other assault       Reported                         $(1-\pi)\omega_{13}$               $(1-\pi)\omega_{23}$
Personal larceny    Reported                         $(1-\pi)\omega_{14}$               $(1-\pi)\omega_{24}$
No crime            Reported                         $(1-\pi)\omega_{15}$               $(1-\pi)\omega_{25}$

Telephone interview
Crime               Reporting status                 Spouse present                     Spouse not present
Rape                Reported                         $\pi\tau\rho\omega_{11}$           $\pi\tau\omega_{21}$
                    Not reported, spouse present     $\pi(1-\rho)\omega_{11}$           -
                    Not reported, phone interview    $\pi\rho(1-\tau)\omega_{11}$       $\pi(1-\tau)\omega_{21}$
Domestic violence   Reported                         $\pi\tau\delta\omega_{12}$         $\pi\tau\omega_{22}$
                    Not reported, spouse present     $\pi(1-\delta)\omega_{12}$         -
                    Not reported, phone interview    $\pi\delta(1-\tau)\omega_{12}$     $\pi(1-\tau)\omega_{22}$
Other assault       Reported                         $\pi\tau\omega_{13}$               $\pi\tau\omega_{23}$
                    Not reported, phone interview    $\pi(1-\tau)\omega_{13}$           $\pi(1-\tau)\omega_{23}$
Personal larceny    Reported                         $\pi\omega_{14}$                   $\pi\omega_{24}$
No crime            Reported                         $\pi\omega_{15}$                   $\pi\omega_{25}$

4. Bayesian inference for the bias-adjusting model.

4.1. Why use a Bayesian model. Our goal is to estimate $\pi$, $\tau$, $\rho$, $\delta$ and $\omega$ based on observations $y$ in the model described in Section 3. We adopt a Bayesian perspective that treats the parameters $\tau$, $\rho$ and $\delta$ as random variables to account for the uncertainties in $\tau$, $\rho$ and $\delta$. We incorporate prior information provided by the NCVS data from 1993 to 1997.
Notice that we do not estimate $\pi$ in this way since $\pi$, the proportion of telephone interviews, is a fixed value and cannot be influenced by the prior surveys. We would not use the prior information to estimate $\omega$ since the crime rates change considerably between the 1993–1997 and the 1998–2004 time periods. To demonstrate the difference, we build two models: Model A, which assumes equal crime rates between time periods, and Model B, which does not make this assumption. The likelihood-ratio model comparison test shows an improvement of 740 in the $G^2$ with 4 degrees of freedom of Model B over Model A.

Table 6. Form of observed data.

Personal interview
                              Spouse present                                 Spouse not present
Rape reported                 $x_{111} = y_{1111}$                           $x_{112} = y_{1112}$
Domestic violence reported    $x_{121} = y_{1211}$                           $x_{122} = y_{1212}$
Other assault reported        $x_{131} = y_{1311}$                           $x_{132} = y_{1312}$
Personal larceny reported     $x_{141} = y_{1411}$                           $x_{142} = y_{1412}$
No crime reported             $x_{151} = y_{1121} + y_{1221} + y_{1511}$     $x_{152} = y_{1512}$

Telephone interview
Rape reported                 $x_{21} = y_{2111} + y_{2112}$
Domestic violence reported    $x_{22} = y_{2211} + y_{2212}$
Other assault reported        $x_{23} = y_{2311} + y_{2312}$
Personal larceny reported     $x_{24} = y_{2411} + y_{2412}$
No crime reported             $x_{25} = y_{2121} + y_{2131} + y_{2132} + y_{2221} + y_{2231} + y_{2232} + y_{2331} + y_{2332} + y_{2511} + y_{2512}$

The influence of the "gag" factors on crime reports might also change over the years, but this change is not significant, as is shown in the following tests. This might result from the fact that features of the survey instrument such as the question order, wording of the questions and the gateway questions are consistent over the years. Table 8 shows the number of rapes reported by women when a spouse is or is not present during the interview in the periods 1993 to 1997 and 1998 to 2004.
We use the Breslow–Day method to test the homogeneity of the odds ratio in the contingency tables and get a test statistic $\chi^2 = 0.2649$ with 1 degree of freedom, and a $p$-value of 0.6068, which means that the influence of spouse presence on rape reporting is not significantly different for the two periods. We do the same test on the reporting of domestic violence and draw the same conclusion ($p$-value = 0.3049). To test the gag effect of telephone interview on crime reports, we cannot use the same method, because the effect from "who was present during the interview" is confounded in the data.

We use the data from years 1993 to 1997 efficiently through the use of empirical-Bayesian estimates. The method is described in the next section and, in the results (Section 5), we clearly show that our method greatly improves estimation without increasing computational load.

Table 7. Probabilities underlying the observed data. Each entry is the sum of the Table 5 probabilities for the complete-data cells that Table 6 collapses into that observed cell.

Personal interview
                              Spouse present                                                               Spouse not present
Rape reported                 $(1-\pi)\rho\omega_{11}$                                                     $(1-\pi)\omega_{21}$
Domestic violence reported    $(1-\pi)\delta\omega_{12}$                                                   $(1-\pi)\omega_{22}$
Other assault reported        $(1-\pi)\omega_{13}$                                                         $(1-\pi)\omega_{23}$
Personal larceny reported     $(1-\pi)\omega_{14}$                                                         $(1-\pi)\omega_{24}$
No crime reported             $(1-\pi)(1-\rho)\omega_{11} + (1-\pi)(1-\delta)\omega_{12} + (1-\pi)\omega_{15}$   $(1-\pi)\omega_{25}$

Telephone interview
Rape reported                 $\pi\tau\rho\omega_{11} + \pi\tau\omega_{21}$
Domestic violence reported    $\pi\tau\delta\omega_{12} + \pi\tau\omega_{22}$
Other assault reported        $\pi\tau\omega_{13} + \pi\tau\omega_{23}$
Personal larceny reported     $\pi\omega_{14} + \pi\omega_{24}$
No crime reported             $\pi(1-\rho)\omega_{11} + \pi\rho(1-\tau)\omega_{11} + \pi(1-\tau)\omega_{21}$
                              $+ \pi(1-\delta)\omega_{12} + \pi\delta(1-\tau)\omega_{12} + \pi(1-\tau)\omega_{22}$
                              $+ \pi(1-\tau)\omega_{13} + \pi(1-\tau)\omega_{23} + \pi\omega_{15} + \pi\omega_{25}$

4.2. Incorporating prior information.
Since the parameters we want to estimate are probabilities, we choose the Dirichlet distribution as the prior distribution, since it is the natural conjugate family of prior distributions for the multinomial distribution [Berger (1985)]. We now use the data from 1993 to 1997 to form prior distributions. Let $X = (x_1, x_2, \ldots, x_t)$ be multinomially distributed with parameters $n$ and $P = (p_1, p_2, \ldots, p_t)$. Let the prior distribution for $P$ be Dirichlet with parameters $B = (\beta_1, \beta_2, \ldots, \beta_t)$. Then the posterior distribution is also Dirichlet with parameters $B + X = (\beta_1 + x_1, \beta_2 + x_2, \ldots, \beta_t + x_t)$. Under the squared error loss function, the Bayesian estimator of $P$ is the posterior mean. If we set $k = \sum_{i=1}^{t} \beta_i$, $\lambda_i = \beta_i / k$ and $\Lambda = (\lambda_1, \lambda_2, \ldots, \lambda_t)$, which is a one-to-one transformation of the parameters $(\beta_1, \beta_2, \ldots, \beta_t)$, then the estimate for $P$ is

$$E(P \mid k, \Lambda, X) = \frac{n}{n+k} \times \frac{X}{n} + \frac{k}{n+k} \times \Lambda.$$

We want to find proper parameters for the Dirichlet distribution to incorporate the information from the 1993 to 1997 data. We know that the prior means of the $p_i$ are given by $E(p_i \mid k, \lambda_i) = \lambda_i$, and we use estimates for the probabilities from the 1993 to 1997 NCVS data as the $\lambda_i$'s. Using the model described in Section 3 and the NCVS data from 1993 to 1997, we obtain estimates for $\tau$, $\rho$, $\delta$ and $\omega$ easily [see, e.g., Stasny and Coker (1997)] by implementing the EM algorithm.

Table 8. Comparison of the effects from spouse presence on rape reports for the years 1993–1997 and 1998–2004

| Rape reported | 1993–1997, spouse present | 1993–1997, not present | 1998–2004, spouse present | 1998–2004, not present |
|---|---|---|---|---|
| Yes | 9 | 211 | 9 | 152 |
| No | 22502 | 76513 | 38376 | 120472 |

The parameter $k$ can be thought of as the prior sample size, and it specifies the extent to which the estimator depends on the prior information.
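The weighted-average form of the posterior mean can be checked numerically. The following is a minimal sketch, assuming illustrative counts, prior means and prior sample size that are not taken from the NCVS data; the function name `posterior_mean` is hypothetical.

```python
def posterior_mean(x, lam, k):
    """Posterior mean of multinomial probabilities under a Dirichlet prior
    with parameters beta_i = k * lam_i, written as a weighted average of
    the observed proportions X/n and the prior means Lambda."""
    n = sum(x)
    w = n / (n + k)  # weight placed on the observed proportions
    return [w * (xi / n) + (1 - w) * li for xi, li in zip(x, lam)]


# Illustrative (hypothetical) inputs: two categories, prior sample size 100.
x = [30, 70]
lam = [0.5, 0.5]
k = 100

est = posterior_mean(x, lam, k)

# The same estimate computed directly from the Dirichlet posterior
# parameters beta_i + x_i, normalized by their sum.
direct = [(k * li + xi) / (k + sum(x)) for xi, li in zip(x, lam)]
```

With $k$ equal to $n$, as in this toy input, the estimate sits halfway between the observed proportions and the prior means; a larger $k$ pulls it further toward $\Lambda$.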
Here we propose an empirical-Bayesian estimate of $k$. A detailed explanation and assessment of this method can be found in Bishop, Fienberg and Holland (1975) and Carlin and Louis (2000). Denote the Bayes estimate for $P$ by $\hat{Q}$, which is $E(P \mid k, \Lambda, X)$. Then the risk function is

$$R(\hat{Q}, P) = \left(\frac{n}{n+k}\right)^2 (1 - \|P\|^2) + \left(\frac{k}{n+k}\right)^2 n \|P - \Lambda\|^2,$$

where $\|P\|^2 = p_1^2 + p_2^2 + \cdots + p_t^2$. Differentiating the risk function with respect to $k$ and setting the resulting equation equal to 0 yields the estimate of $k$,

$$\hat{k} = (1 - \|P\|^2) / \|P - \Lambda\|^2,$$

that minimizes the risk $R(\hat{Q}, P)$. The optimal value of $k$ depends on the unknown value of $P$. If we use the MLE $\hat{P} = X/n$ to replace $P$, then the estimated optimal value of $k$ is

$$\hat{k} = \left(n^2 - \sum_{i=1}^{t} x_i^2\right) \Big/ \left(\sum_{i=1}^{t} x_i^2 - 2n \sum_{i=1}^{t} x_i \lambda_i + n^2 \sum_{i=1}^{t} \lambda_i^2\right). \tag{4.1}$$

The empirical-Bayesian estimate of $P$ is then

$$P^* = \frac{n}{n + \hat{k}} \left(\frac{X}{n}\right) + \frac{\hat{k}}{n + \hat{k}} \Lambda. \tag{4.2}$$

From equation (4.2), we see that $\hat{k}$ determines how much the estimator depends on the prior information. In equation (4.1) we can rewrite the denominator as $\sum_{i=1}^{t} (x_i - n\lambda_i)^2$. Therefore, the more the prior information represents the current data (i.e., the closer $n\lambda_i$ is to the observed mean of $x_i$), the more the estimators depend on the prior distribution.

4.3. An EMB algorithm. The EMB algorithm is a variant of the EM algorithm [see, e.g., Dempster, Laird and Rubin (1977)]. In the EM algorithm, the M-step involves maximizing the complete data likelihood function to obtain the MLE for the parameters. In the EMB algorithm, we add a B-step in which the $\hat{k}$ in equation (4.1) is calculated based on the last M-step, and then empirical-Bayesian estimates are obtained by minimizing the risk function.
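Equations (4.1) and (4.2) can be sketched in a few lines. This is a minimal illustration, not the authors' supplemental R code; the function names `k_hat` and `eb_estimate` and the input counts are hypothetical.

```python
def k_hat(x, lam):
    """Estimated optimal prior sample size, equation (4.1), with the
    denominator written as sum_i (x_i - n * lam_i)^2."""
    n = sum(x)
    numerator = n ** 2 - sum(xi ** 2 for xi in x)
    denominator = sum((xi - n * li) ** 2 for xi, li in zip(x, lam))
    if denominator == 0:
        # The prior means reproduce the data exactly; (4.1) is unbounded.
        raise ValueError("k_hat is undefined when x_i = n * lam_i for all i")
    return numerator / denominator


def eb_estimate(x, lam):
    """Empirical-Bayesian estimate P*, equation (4.2)."""
    n = sum(x)
    k = k_hat(x, lam)
    w = n / (n + k)  # weight on the MLE X/n; 1 - w is the weight on the prior
    p_star = [w * (xi / n) + (1 - w) * li for xi, li in zip(x, lam)]
    return p_star, k


# Illustrative (hypothetical) counts and prior means.
x = [20, 80]
lam = [0.3, 0.7]
p_star, k = eb_estimate(x, lam)
prior_weight = k / (sum(x) + k)
```

For these inputs the numerator of (4.1) is $100^2 - (20^2 + 80^2) = 3200$ and the denominator is $(20 - 30)^2 + (80 - 70)^2 = 200$, so $\hat{k} = 16$ and the estimate depends about 14% on the prior.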
Thus, at the E-step, we fill in the unobserved data with estimates based on the values from the last B-step (for those parameters using Bayesian estimates) or from the last M-step (for those using classical estimates). We illustrate the use of the EMB algorithm in the NCVS example. In the example the empirical-Bayesian estimates are obtained for $\tau$, $\rho$ and $\delta$.

Using the cell probabilities shown in Table 5 and the complete data from Table 4, subject to the constraint that $\sum_i \sum_j \omega_{ij} = 1$, the likelihood function of the complete data has a simple multiplicative form and can be split into five factors, each a function of only one of the $\pi$, $\tau$, $\rho$, $\delta$ and $\omega$ parameters. The likelihood function, written so that the functions of the five types of parameters are obvious, is proportional to the following function (a "+" in a subscript indicates summation over the corresponding index):

$$
\begin{aligned}
&\pi^{y_{2+++}} (1-\pi)^{y_{1+++}} \times \rho^{y_{1111}+y_{2111}+y_{2131}} (1-\rho)^{y_{1121}+y_{2121}} \\
&\quad \times \delta^{y_{1211}+y_{2211}+y_{2231}} (1-\delta)^{y_{1221}+y_{2221}} \\
&\quad \times \tau^{y_{2111}+y_{2112}+y_{2211}+y_{2212}+y_{2311}+y_{2312}} \\
&\quad \times (1-\tau)^{y_{2131}+y_{2132}+y_{2231}+y_{2232}+y_{2331}+y_{2332}} \\
&\quad \times \omega_{11}^{y_{1111}+y_{1121}+y_{2111}+y_{2121}+y_{2131}} \, \omega_{12}^{y_{1211}+y_{1221}+y_{2211}+y_{2221}+y_{2231}} \\
&\quad \times \omega_{13}^{y_{1311}+y_{2311}+y_{2331}} \, \omega_{14}^{y_{1411}+y_{2411}} \, \omega_{15}^{y_{1511}+y_{2511}} \, \omega_{21}^{y_{1112}+y_{2112}+y_{2132}} \\
&\quad \times \omega_{22}^{y_{1212}+y_{2212}+y_{2232}} \, \omega_{23}^{y_{1312}+y_{2312}+y_{2332}} \, \omega_{24}^{y_{1412}+y_{2412}} \, \omega_{25}^{y_{1512}+y_{2512}} \\
&\equiv (1-\pi)^{y_{1+++}} \pi^{y_{2+++}} \rho^{a_1} (1-\rho)^{a_2} \times \delta^{b_1} (1-\delta)^{b_2} \tau^{c_1} (1-\tau)^{c_2} \left\{ \prod_{i=1}^{5} \prod_{j=1}^{2} \omega_{ij}^{y_{+i+j}} \right\}.
\end{aligned}
\tag{4.3}
$$

Because of the multiplicative form of this likelihood function, we can accomplish maximization separately for the $\tau$, $\rho$, $\delta$ and $\omega$ parameters.
The closed-form MLEs for these parameters are as follows:

$$\hat{\pi} = y_{2+++} / (y_{1+++} + y_{2+++}) = (\text{\# phone interviews}) / (\text{\# phone + personal interviews}),$$

$$\hat{\rho} = a_1 / (a_1 + a_2), \qquad \hat{\delta} = b_1 / (b_1 + b_2), \qquad \hat{\tau} = c_1 / (c_1 + c_2), \qquad \hat{\omega}_{ij} = y_{+i+j} / y_{++++},$$

where $a_1$, $a_2$, $b_1$, $b_2$, $c_1$ and $c_2$ are as defined in equation (4.3). Based on the MLEs, we determine the prior distribution for the parameters by calculating the $k$'s through equation (4.1) that minimize the risk functions. We then obtain the empirical-Bayesian estimates $\tau^*$, $\rho^*$ and $\delta^*$ from equation (4.2). For example, to estimate $\rho$,

$$\hat{k}_\rho = \frac{2 a_1 a_2}{a_1^2 + a_2^2 - 2 (a_1 + a_2)(a_1 \lambda_\rho + a_2 (1 - \lambda_\rho)) + (a_1 + a_2)^2 (\lambda_\rho^2 + (1 - \lambda_\rho)^2)}$$

and

$$\rho^* = \frac{a_1 + a_2}{a_1 + a_2 + \hat{k}_\rho} \times \frac{a_1}{a_1 + a_2} + \frac{\hat{k}_\rho}{a_1 + a_2 + \hat{k}_\rho} \lambda_\rho,$$

where $\lambda_\rho$ is the estimated value of $\rho$ using the 1993 to 1997 NCVS data. Note that the B-step does not add significant computational load to the EM algorithm, since each estimate has a closed form.

The E-step of the EMB algorithm is similar to that for the EM algorithm, except that we use different estimates of parameters to calculate the expectations of missing cells. In our example, the E-step consists of obtaining the expected cell counts for the complete data matrix (Table 5), given the observed data and the current estimates of the $\pi$, $\tau$, $\rho$, $\delta$ and $\omega$ parameters. These expectations are particularly simple in the case of discrete data [see Little and Rubin (2002)] and amount to proportionally allocating the $x_{ijk}$'s of the observed data as shown in Table 6 to the $y_{ijkl}$ cells of Table 4 according to the current parameter estimates. For example, since $x_{151} = y_{1121} + y_{1221} + y_{1511}$ (Table 6),

$$\hat{y}_{1121} = x_{151} \times \frac{(1-\pi^*)(1-\rho^*)\omega_{11}^*}{(1-\pi^*)(1-\rho^*)\omega_{11}^* + (1-\pi^*)(1-\delta^*)\omega_{12}^* + (1-\pi^*)\omega_{15}^*}.$$
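The proportional allocation in the E-step can be sketched as follows for one observed cell. The sketch follows Table 6, where $x_{151} = y_{1121} + y_{1221} + y_{1511}$, and the corresponding cell probabilities of Table 7; the parameter values and the observed count are hypothetical placeholders rather than fitted NCVS values, and the helper name `allocate` is an assumption.

```python
def allocate(count, probs):
    """Split an observed count across the complete-data cells that make it
    up, in proportion to their current cell probabilities."""
    total = sum(probs.values())
    return {cell: count * p / total for cell, p in probs.items()}


# Hypothetical current parameter estimates (not fitted values).
pi, rho, delta = 0.72, 0.14, 0.08
w11, w12, w15 = 0.0003, 0.0007, 0.24

# Hypothetical observed count: personal interview, spouse present,
# no crime reported.
x151 = 38000.0

# Cell probabilities for the three complete-data cells behind x151.
probs = {
    "y1121": (1 - pi) * (1 - rho) * w11,    # rape, not reported (spouse present)
    "y1221": (1 - pi) * (1 - delta) * w12,  # domestic violence, not reported
    "y1511": (1 - pi) * w15,                # no crime occurred
}

filled = allocate(x151, probs)
```

The common factor $(1 - \pi)$ cancels in the ratio, so within a personal-interview cell the allocation depends only on $\rho$, $\delta$ and the $\omega$'s.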
Other expected cell counts can be found similarly and, hence, are not shown here.

The E-, M- and B-steps are repeated until parameter estimates have converged to the desired degree of accuracy, which in our case is when the sum of the relative differences of all estimated probabilities between two iterations is less than 0.0001. The code for implementing the EMB algorithm on the NCVS data to adjust for respondent's bias and estimate the rape and domestic violence rates in the years from 1998 to 2004 can be found in the supplemental file [Yu, Stasny and Li (2008)].

To estimate the variances of the estimators, we use the jackknife. More specifically, we look on each quarter between the years 1998 and 2004 as a sampling unit (SU). This results in 28 sampling units. Using all 28 SUs, we obtain the best estimate, say, $m$, for the parameter. By throwing out the first SU, we use the jackknife data set of 27 "resampled" SUs to get another estimate, say, $m_1$. In the next step a new resampling is performed with the second SU being deleted, and a new estimate $m_2$ is obtained. The process is repeated for each sampling unit, resulting in a set of estimates, $m_i$, $i = 1, \ldots, 28$. The error for estimation is given by the formula

$$\sigma_j^2 = \frac{27 \sum_{i=1}^{28} (m_i - m)^2}{28}.$$

To calculate the $m_i$, we always use the EMB algorithm and the same prior information, since different observations in the future would not change the prior knowledge.
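The delete-one jackknife described above can be sketched generically. In this minimal example the `estimator` argument stands in for a full EMB fit, which is not reproduced here; the toy sampling units and the use of a simple mean are assumptions made only to keep the example self-contained.

```python
def jackknife_variance(units, estimator):
    """Delete-one jackknife variance, with deviations taken from the
    full-sample estimate m as in the formula above:
    sigma^2 = (n - 1) / n * sum_i (m_i - m)^2."""
    n = len(units)
    m = estimator(units)
    # m_i: estimate recomputed with the i-th sampling unit left out.
    leave_one_out = [estimator(units[:i] + units[i + 1:]) for i in range(n)]
    return (n - 1) / n * sum((m_i - m) ** 2 for m_i in leave_one_out)


def mean(values):
    return sum(values) / len(values)


# Toy sampling units; in the paper each unit is one quarter of NCVS data
# and n = 28, which gives the factor 27/28.
units = [1.0, 2.0, 3.0, 4.0]
variance = jackknife_variance(units, mean)
```

For the sample mean this reproduces the familiar $s^2 / n$, which is a convenient sanity check before substituting a more expensive estimator.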
Table 9. Observed data (frequencies) for crimes from NCVS 1993 to 1997

| | Personal interview, spouse present | Personal interview, spouse not present | Telephone interview |
|---|---|---|---|
| Rape | 9 | 211 | 314 |
| Domestic violence | 12 | 581 | 652 |
| Other assault | 81 | 861 | 1903 |
| Personal larceny | 16 | 117 | 323 |
| No crime | 22393 | 74954 | 303610 |

EM estimates for crimes from 1993 to 1997

| $\hat{\omega}_{ij}$ | Spouse present | Spouse not present |
|---|---|---|
| Rape | 0.000577 | 0.001964 |
| Domestic violence | 0.001374 | 0.004676 |
| Other assault | 0.002475 | 0.008421 |
| Personal larceny | 0.000255 | 0.000868 |
| No crime | 0.222448 | 0.756942 |

$\hat{\pi} = 0.76$, $\hat{\rho} = 0.14$, $\hat{\delta} = 0.07$, $\hat{\tau} = 0.53$

5. Results.

5.1. Complexity of the computation. To check the computational complexity and the efficiency of the estimates, we use both the frequentist method and the Bayesian method to obtain the estimates. Since each Bayesian estimate has a closed form, obtaining it does not add much computational load at each iteration. From the jackknife analysis, the EM algorithm in the frequentist method iterates 77.57 times on average to obtain converged estimates. For the Bayesian method, an average of 76.43 iterations in the EMB algorithm is required for convergence. We conclude that the Bayesian method does not add significant computational load as compared with the traditional method.

5.2. Analysis with unweighted data. We first fit the model described in Section 3 to the unweighted data from 1993 to 1997 and form the prior distributions. The original data from 1993 to 1997 and the estimates are shown in Table 9. We then fit the empirical-Bayesian model to the data from Table 2. The estimates obtained from the procedure are shown in Table 10. Notice that the estimated probability of rape, adjusting for the dampening effect of the presence of a spouse and of using a telephone interview, is $0.000326 + 0.001024 = 0.001350$.
Thus, we estimate about 1.35 rapes per 1,000 women. This compares with a rate of 0.79 per 1,000 based on the raw data (Table 1). Similarly, the estimated probability of domestic violence is $0.000702 + 0.002205 = 0.002907$. This results in an estimate of 2.91 incidents of domestic violence per 1,000 women, which we compare to a rate of 1.66 per 1,000 based on the raw data (Table 10).

We estimate that the probability that a crime (except for personal larceny) is not reported because the interview is conducted over the telephone is $1 - \tau^* = 0.37$. Thus, for interviews conducted over the telephone with women who are victims of any type of personal crime (except for personal larceny), we estimate that approximately 37% of the women did not report the victimization. For this estimate, $k/(n+k) = 21/(21 + 3708) = 0.0058$. That is, to get the estimate, we depend 0.58% on the prior information. Similarly, the probability that a rape is not reported because a spouse is present is about $1 - \rho^* = 0.86$, with $k/(n+k) = 1418/(1418 + 193) = 0.88$. That is, for interviews with women who are victims of rape and whose spouse was present during the interview, we estimate that 86% of the women did not report the victimization. To get this estimate, we depend 88% on the prior information. The probability that domestic violence is not reported because a spouse is present is about $1 - \delta^* = 0.92$, and $k/(k+n) = 227/(227 + 404) = 0.36$. That is, for the women who are victims of domestic violence and whose spouse was present during the interview, we estimate that 92% of the women did not report the victimization, and the estimate depends 36% on the prior information.

Table 11 shows the estimated parameters from the EM algorithm without using the prior information.
Comparing the results in Table 10 and Table 11, we see the gains from the Bayesian model: the variances (shown in parentheses) of the estimates from the Bayesian model are substantially lower than those from the classical method: $\mathrm{Var}(\hat{\rho}) / \mathrm{Var}(\rho^*) = 6.625$ and $\mathrm{Var}(\hat{\delta}) / \mathrm{Var}(\delta^*) = 1.86$. Bayesian and frequentist methods estimate about the same effect of a telephone interview on reporting crime, and have similar variances. This might be due to the fact that the Bayesian estimate of $\tau$ depends little on prior information. On average, the estimation of $\tau$ depends 0.60% on prior information, while the estimation of $\rho$ depends 87.47% and the estimation of $\delta$ depends 38.87% on prior information. In all, we conclude that to estimate respondent bias, the Bayesian model using prior information leads to more efficient estimates.

Table 10. EMB estimates for crimes from 1998 to 2004

| $\hat{\omega}_{ij}$ | Spouse present | Spouse not present |
|---|---|---|
| Rape | 0.000326 | 0.001024 |
| Domestic violence | 0.000702 | 0.002205 |
| Other assault | 0.001363 | 0.004281 |
| Personal larceny | 0.000144 | 0.000453 |
| No crime | 0.238954 | 0.750549 |

$\pi^* = 0.72$, $\rho^* = 0.14\ (0.0008)$, $\delta^* = 0.08\ (0.0007)$, $\tau^* = 0.63\ (0.0008)$

Note. Variances for the corresponding estimates are shown in parentheses.

Table 11. EM estimates for crimes from 1998 to 2004 (not using the prior information)

| $\hat{\omega}_{ij}$ | Spouse present | Spouse not present |
|---|---|---|
| Rape | 0.000367 | 0.001211 |
| Domestic violence | 0.000766 | 0.002528 |
| Other assault | 0.001474 | 0.004863 |
| Personal larceny | 0.000135 | 0.000446 |
| No crime | 0.229860 | 0.758346 |

$\hat{\pi} = 0.72$, $\hat{\rho} = 0.16\ (0.0053)$, $\hat{\delta} = 0.09\ (0.0013)$, $\hat{\tau} = 0.61\ (0.0008)$

5.3. Analysis with weight-adjusted data.
We then apply our model to the weight-adjusted NCVS data to determine if the use of sample-based weights leads to conclusions different from those based on the raw, unweighted data. We take the usual approach of standardizing the weights so that they sum to the actual sample size. Thus, a woman's standardized weight is her original sample weight multiplied by the sample size and divided by the total of the sample-based weights for all women in the analysis. (The weights used here are the cross-sectional weights developed to make the sample representative of the population of interest at the time of the survey. Because there are no longitudinal weights available for the NCVS, we use the cross-sectional weights to reflect the socio-economic and demographic makeup of the population while recognizing their limitations.) We use the weighted data from 1993 to 1997 to form the prior distributions. These weight-adjusted data and the estimates are summarized in Table 12. We again estimate $\tau$, $\rho$ and $\delta$ using the EMB algorithm described in Section 4. The parameter estimates for crimes are shown in Table 13. Table 14 compares the original crime rates and our estimated crime rates for both weighted and unweighted data. We note that the crime rates are generally estimated at a higher level when using the weighted data.

6. Conclusions and future research. We have shown that estimated rates of rape and domestic violence among women are increased under a model that considers gag factor effects in reporting such crimes based on the type of interview and who is present during the interview. Also, we used prior information to obtain more efficient estimates. We noticed that the type of interview and who is present during the interview may have different influences on different women.
As reported by Stasny and Coker (1997), compared with women not reporting rape and domestic violence, those reporting were younger, had annual incomes of less than \$15,000, were unemployed, rented rather than owned their homes, were not currently married, and had moved more than five times in the last three years. An important area for future research is to account for some of these factors in the model. The problem with such analysis, of course, is that as other variables are used in creating cross-classified tables the data become very sparse, particularly in the cells involving reporting rape or domestic violence. Because rape and domestic violence are relatively rare events, we have to combine information from a number of years. Thus, we do not obtain enough repeated measures for a panel analysis. The methods described in this paper are mainly used on contingency table analysis. If sufficient data are collected, we would like to implement our model in a panel analysis to discover how the crime rates change over time.

Table 12. Weight-adjusted data (frequencies) for crimes from NCVS 1993 to 1997

| | Personal interview, spouse present | Personal interview, spouse not present | Telephone interview |
|---|---|---|---|
| Rape | 8.96 | 224.36 | 337.35 |
| Domestic violence | 11.26 | 608.61 | 695.37 |
| Other assault | 84.99 | 937.59 | 1991.93 |
| Personal larceny | 15.10 | 121.16 | 323.33 |
| No crime | 22174.79 | 77079.18 | 301423.02 |

EM estimates for crimes from 1993 to 1997

| $\hat{\omega}_{ij}$ | Spouse present | Spouse not present |
|---|---|---|
| Rape | 0.000587 | 0.002079 |
| Domestic violence | 0.001383 | 0.004892 |
| Other assault | 0.002515 | 0.008895 |
| Personal larceny | 0.000249 | 0.000882 |
| No crime | 0.215688 | 0.762828 |

$\hat{\pi} = 0.75$, $\hat{\rho} = 0.14$, $\hat{\delta} = 0.06$, $\hat{\tau} = 0.53$

Table 13. EMB estimates for crimes from 1998 to 2004 (weight-adjusted data)

| $\hat{\omega}_{ij}$ | Spouse present | Spouse not present |
|---|---|---|
| Rape | 0.000372 | 0.001226 |
| Domestic violence | 0.000774 | 0.002555 |
| Other assault | 0.001481 | 0.004887 |
| Personal larceny | 0.000135 | 0.000446 |
| No crime | 0.229839 | 0.758285 |

$\hat{\pi} = 0.72$, $\rho^* = 0.14\ (0.0002)$, $\delta^* = 0.07\ (0.0005)$, $\tau^* = 0.61\ (0.0025)$

Table 14. Crime rates (number of crimes per 1,000 people) for years 1998–2004

| | Original rates | Fitted rates, unweighted data | Fitted rates, weighted data |
|---|---|---|---|
| Rape | 0.79 | 1.35 | 1.60 |
| Domestic violence | 1.66 | 2.91 | 3.33 |
| Other assault | 4.14 | 5.64 | 6.37 |
| Personal larceny | 0.60 | 0.60 | 0.58 |
| No crime | 992.82 | 989.50 | 988.12 |

In our current analysis, we assume that if there is panel attrition in the survey, the pattern of attrition is random. Brame and Piquero (2003) found that the pattern of panel attrition might be related to the interviewee's characteristics. In our future research, we want to explore the panel attrition. If the panel attrition is nonrandom, we want to adjust our analysis accordingly so that the estimation of crime rates would be more accurate.

Our model was developed to adjust for response biases caused by the mode of interviewing and the sensitive nature of questions in reporting rape and domestic violence. The models described in our paper may be useful in other survey sampling settings where some known factors may result in response bias. For example, one can imagine that reporting various sources of income could be biased because of who is present during the interview. Moreover, if data are collected repeatedly, the Bayesian method would be efficient because it incorporates the previously collected information into the estimates. The EMB algorithm helps to make the computation easy. To implement our method, first identify which part of the information would not change over time and then build that part of the information into the prior distributions.
Following the EMB algorithm, we could easily obtain the estimates of parameters that are of interest. The methods proposed here, therefore, can be easily applied in such cases.

Acknowledgments. The authors thank Donald Mercante, the editor, associate editor and referees for constructive comments and suggestions that helped to improve the presentation of the paper.

SUPPLEMENTARY MATERIAL

R-code of EMB algorithm to adjust for response bias in NCVS data for estimating rape and domestic violence rates (doi: 10.1214/08-AOAS160SUPP; .txt).

REFERENCES

Berger, J. O. (1985). Statistical Decision Theory and Bayesian Analysis. Springer, New York. MR0804611

Bishop, Y. M. M., Fienberg, S. E. and Holland, P. W. (1975). Discrete Multivariate Analysis: Theory and Practice. MIT Press, Cambridge, MA. MR0381130

Brame, R. and Piquero, A. R. (2003). Selective attrition and the age-crime relationship. J. Quantitative Criminology 19 107–127.

Bureau of Justice Statistics (1995). National Crime Victimization Survey Redesign: Fact Sheet. U.S. Department of Justice, Washington, DC.

Bureau of Justice Statistics (2001). National Crime Victimization Survey, 1992-1995 [computer file]. U.S. Department of Commerce, Bureau of the Census. 9th ICPSR ed. Inter-University Consortium for Political and Social Research [producer and distributor], Ann Arbor, MI.

Bureau of Justice Statistics (2002). Criminal Victimization 2001: Changes 2000-01 with trends 1993-2001, Preventing Domestic Violence Against Women. U.S. Department of Justice, Washington, DC.

Carlin, B. P. and Louis, T. A. (2000). Bayes and Empirical Bayes Methods for Data Analysis, 2nd ed. Chapman and Hall, New York. MR1427749

Chen, T. and Fienberg, S. E. (1974). Two-dimensional contingency tables with both complete and partially cross-classified data. Biometrics 30 629–642. MR0403086

Chen, T. and Fienberg, S. E. (1976). The analysis of contingency tables with incompletely classified data. Biometrics 32 133–144. MR0403033

Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B 39 1–38. MR0501537

Efron, B. (1987). The Jackknife, the Bootstrap, and Other Resampling Plans. SIAM, Philadelphia. MR0659849

Gelman, A., Carlin, J. B., Stern, H. S. and Rubin, D. B. (2004). Bayesian Data Analysis. Chapman and Hall/CRC, New York. MR2027492

Leggett, C. G., Kleckner, N. S., Boyle, K. J., Duffield, J. W. and Mitchell, R. C. (2003). Social desirability bias in contingent valuation surveys administered through in-person interviews. Land Economics 79 561–575.

Little, R. J. and Rubin, D. B. (2002). Statistical Analysis with Missing Data. Wiley, New York. MR1925014

Stasny, E. A. and Coker, A. L. (1997). Adjusting the national crime victimization survey's estimates of rape and domestic violence for 'gag' factors in reporting. Technical Report #592, Department of Statistics, Ohio State Univ.

Yu, Q., Stasny, E. A. and Li, B. (2008). Supplement to "Bayesian models to adjust for response bias in survey data for estimating rape and domestic violence rates from the NCVS." DOI: 10.1214/08-AOAS160SUPP.

Q. Yu
School of Public Health
Louisiana State University
1615 Poydras Street, Suite 1400
New Orleans, Louisiana 70112
USA
E-mail: qyu@lsuhsc.edu
URL: http://publichealth.lsuhsc.edu/Faculty pages/qyu/index.html

E. A. Stasny
Department of Statistics
Ohio State University
404 Cockins Hall
1958 Neil Avenue
Columbus, Ohio 43210
USA
E-mail: eas@stat.osu.edu

B. Li
Department of Experimental Statistics
Louisiana State University
Baton Rouge, Louisiana 70803
USA
E-mail: bli@lsu.edu
URL: http://www.stat.lsu.edu/faculty/li/