SOBA: Secrecy-preserving Observable Ballot-level Audit

Josh Benaloh, Microsoft Research
Douglas Jones, Department of Computer Science, University of Iowa
Eric L. Lazarus, DecisionSmith
Mark Lindeman
Philip B. Stark, Department of Statistics, University of California, Berkeley

Abstract

SOBA is an approach to election verification that provides observers with justifiably high confidence that the reported results of an election are consistent with an audit trail ("ballots"), which can be paper or electronic. SOBA combines three ideas: (1) publishing cast vote records (CVRs) separately for each contest, so that anyone can verify that each reported contest outcome is correct, if the CVRs reflect voters' intentions with sufficient accuracy; (2) shrouding a mapping between ballots and the CVRs for those ballots to prevent the loss of privacy that could occur otherwise; (3) assessing the accuracy with which the CVRs reflect voters' intentions for a collection of contests while simultaneously assessing the integrity of the shrouded mapping between ballots and CVRs by comparing randomly selected ballots to the CVRs that purport to represent them. Step (1) is related to work by the Humboldt County Election Transparency Project, but publishing CVRs separately for individual contests rather than images of entire ballots preserves privacy. Step (2) requires a cryptographic commitment from elections officials. Observers participate in step (3), which relies on the "super-simple simultaneous single-ballot risk-limiting audit." Step (3) is designed to reveal relatively few ballots if the shrouded mapping is proper and the CVRs accurately reflect voter intent.
But if the reported outcomes of the contests differ from the outcomes that a full hand count would show, step (3) is guaranteed to have a large chance of requiring all the ballots to be counted by hand, thereby limiting the risk that an incorrect outcome will become official and final.

1 Introduction and background

The majority of Americans now vote electronically, either on machine-counted paper ballots or on Direct Recording Electronic (DRE) machines. Electronic voting offers advantages over hand counts and lever machines, but it poses challenges for determining whether votes were recorded and counted correctly. A wide range of security vulnerabilities and other flaws have been documented in contemporary voting equipment. The 2007 "Top-to-Bottom Review" of the systems used in California found that all the systems had "serious design flaws" and "specific vulnerabilities, which attackers could exploit to affect election outcomes" [Bowen, 2007]. While some of these vulnerabilities can be mitigated, the underlying verification challenge is formidable. As Rivest and Wack comment, "complexity is the enemy of security," and demonstrating that any complex system is free of faults may be impossible or infeasible [Rivest and Wack, 2006].

Electronic voting systems have failed in real elections. In the 2004 general election in Carteret County, North Carolina, over 4,000 votes were lost irretrievably due to a programming error that affected UniLect Patriot voting machines, casting doubt on a statewide election outcome [Bonner, 2004]. More controversially, in the 2006 general election, ES&S iVotronic DREs in Sarasota County, Florida did not record a vote for U.S. House for about 15% of voters, far more than can plausibly be attributed to intentional undervoting. Inadvertent undervotes were probably decisive in that contest [Ash and Lamperti, 2008; Mebane and Dill, 2007].
Hypotheses explaining these undervotes include voter confusion caused by poor ballot layout [Frisina et al., 2008] and machine failure [Garber, 2008; Mebane, 2009]. Unfortunately, the forensic evidence generated by the voting systems was inadequate to determine the cause of the undervotes or the intentions of the voters.

Voter-marked paper ballots provide a clearer record of what voters did and more evidence about voter intent, but by themselves do not solve the election verification problem. In 2005, Harri Hursti repeatedly demonstrated the ability to "hack" optical scan counts when given access to a memory card [Zetter, 2005]. In a June 2006 primary election in Pottawattamie County, Iowa, incorrectly configured optical scanners miscounted absentee ballots in every contest, altering two outcomes. The county auditor ordered a hand recount, which corrected the errors [Flaherty, 2006]. Similar errors in other elections may have altered outcomes without ever being detected. Even when scanners work correctly, their results may differ materially from voter intent. Consider the 2008 U.S. Senate contest in Minnesota, where Al Franken beat Norm Coleman in a hand recount largely because of ballots where the human interpretation differed from the machine interpretation. [footnote 1]

1.1 Software independence

Computerized election equipment cannot be infallible, so Rivest and Wack [2006] and Rivest [2008] suggest that voting systems should be software-independent. A voting system is software-independent "if an undetected change or error in its software cannot cause an undetectable change or error in an [apparent] election outcome." This idea can be generalized to define independence from hardware and from elections personnel, leading to so-called end-to-end verifiable election technologies.
However, end-to-end technology may require fundamental changes in current voting processes.

The outcome of a contest is the set of winners, not the exact vote counts. The apparent outcome of a contest is the winner or winners according to the voting system. The correct outcome of a contest is the winner or winners that a full hand count of the "audit trail" would find. The audit trail is assumed to be an indelible record of how voters cast their votes. It might consist of a combination of voter-marked paper ballots, voter receipts, a voter-verifiable paper audit trail (VVPAT), and suitable electronic records.

This definition of "correct" is generally a matter of law. It does not necessarily imply that the audit trail is inviolate (nor that the outcome according to the audit trail is the same as the outcome according to how voters originally cast their ballots); that there is no controversy about which records in the audit trail reflect valid votes; that human observers agree on the interpretation of the audit trail; that the actual hand counting is accurate; nor that repeating the hand count would give the same answer. If there is no audit trail, defining what it means for the apparent outcome to be correct requires hypothetical counterfactuals: but for the fault in the voting system, what would the outcome have been?

Software independence means that errors that cause apparent outcomes to be wrong leave traces in the audit trail. But software independence does not guarantee any of the following:

1. that no such traces will occur if the apparent outcome is correct [footnote 2]
2. that those traces will be noticed or acted upon
3. that the cost of looking through the audit trail for those traces is affordable
4. that, in principle, there is a way to correct the apparent outcome without holding another election
5. that, in practice, the audit trail was preserved and protected well enough to determine the outcome according to how the voters originally cast their ballots

Footnote 1: The 2000 presidential election may have been decided by differences between the machine interpretation of certain Florida optical scan ballots and the likely human interpretation [Keating, 2002].

The penultimate property is guaranteed by strong software independence. Rivest and Wack [2006] and Rivest [2008] define a voting system to be strongly software-independent if an undetected change or error in its software cannot cause an undetectable change or error in an [apparent] election outcome, and moreover, a detected change or error in an [apparent] election outcome (due to change or error in the software) can be corrected without re-running the election. Having an audit trail does not guarantee that anyone will dig through it to see whether there is a problem or to correct the outcome if the outcome is wrong. Strong software independence does not correct anything, but it is an essential ingredient for a system to be self-correcting.

Compliance audits can be used to assess whether the last property listed above holds: Given that the election used a strongly software-independent voting system, did it adhere to procedures that should keep the audit trail sufficiently accurate to reconstruct the outcome according to how voters cast their ballots? Strong evidence that such procedures were followed is strong evidence that the legally correct outcome (what a full hand count of the audit trail would show) is the same as the outcome according to how the voters originally cast their ballots.
As we discuss below in section 4, we believe that compliance audits should always be required: If the election fails the compliance audit, [footnote 3] there is no assurance that even a full hand count of the audit trail would show the outcome according to how the voters really voted. Below, we assume that the election has passed a compliance audit.

Footnote 2: False alarms are possible. An analogy is that if a tamper-evident seal shows that a package has been opened, it does not follow that the package contents have been altered.

Footnote 3: "Failure" means failure to find strong evidence that such procedures were followed, rather than finding evidence that such procedures were not followed.

1.2 Vote tabulation audits

Vote tabulation audits compare reported vote subtotals for subsets of ballots ("audit units") with hand counts of the votes for each of those subsets. Audit units have to be subsets for which the voting system reports vote subtotals. Most present U.S. audits use audit units that consist of all the ballots cast in individual precincts or all the ballots tabulated on individual voting machines. Generally, audit laws do not have provisions that would lead to correcting incorrect electoral outcomes [Hall et al., 2009]. [footnote 4]

A risk-limiting post-election audit uses the audit trail to guarantee that there is a large, pre-specified probability that the audit will correct the apparent outcome if the apparent outcome is wrong. Risk-limiting audits are widely considered best practice [Lindeman et al., 2008]. Risk-limiting audits have been endorsed by the American Statistical Association [American Statistical Association, 2010], the Brennan Center for Justice, Common Cause, the League of Women Voters, and Verified Voting, among others. California AB 2023 (2010) requires a pilot of risk-limiting audits in 2011 [Saldaña, 2010].
Colorado Revised Statutes § 1-7-515 calls for implementing risk-limiting audits by 2014. The first method for conducting risk-limiting audits was proposed by Stark [2008a]; numerous improvements have been made [Stark, 2008b, 2009b,d,c; Miratrix and Stark, 2009; Stark, 2010b]. See also [Checkoway et al., 2010].

Risk-limiting audits limit the risk of failing to correct an outcome that is wrong. The risk limit is 100% minus the minimum chance that the audit corrects the outcome. If the outcome is correct in the first place, a risk-limiting audit cannot make it wrong; but if the outcome is wrong, a risk-limiting audit has a large chance of correcting it. Hence, the probability that the outcome according to a risk-limiting audit is the correct outcome is at least 100% minus the risk limit.

For systems that are strongly software-independent, adding a risk-limiting audit addresses the second condition above: It ensures a large, pre-specified probability that the traces will be noticed and will be used to correct the apparent outcome if the apparent outcome is wrong.

Footnote 4: For instance, under New York law, each county determines independently whether its audit in a particular contest must be expanded. This provision means that a correct outcome might be changed to an incorrect outcome even if the conduct of the audit is formally flawless.

1.3 Our goal

Our goal in this work is to sketch a personally verifiable privacy-preserving P-resilient canvass framework. We must first say what this means.

A canvass framework consists of the vote-tabulation system together with other human, hardware, software, and procedural components of the canvass, including compliance and vote-tabulation audits.
A canvass framework is resilient with probability P, or P-resilient, if the probability that the outcome it gives [footnote 5] is the correct outcome is at least P, even if its software has an error, shortcoming, or undetected change. [footnote 6] Resilience means that the framework tends to recover from faults. If a canvass framework is P-resilient, either the outcome it gives when all is said and done is correct, or something occurred that had probability less than 1 − P. The canvass framework that results from performing a risk-limiting audit on a strongly software-independent voting system that passes a compliance audit is P-resilient, with P equal to 100% minus the risk limit. If the system fails the compliance audit, the framework should not declare any outcome. Instead, the election should be re-run.

Even if a canvass framework is P-resilient, in practice the public might not trust the system unless they can observe crucial steps, especially the audit. The mere right or opportunity to observe the audit will not engender much trust if, as a practical matter, no single person or small group could observe all the steps that are essential to ensuring the accuracy of the final result. For instance, if a vote-tabulation audit takes ten teams of auditors working in separate offices four days to complete, it would take a large team of independent observers (with lots of free time and long attention spans) to verify that the audit was carried out correctly. The longer an audit takes and the more people required to carry out the audit, the more opportunities there are to damage the audit trail, and the harder it is for an observer to be satisfied that the audit has been conducted correctly.
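As a compact restatement of the definition of P-resilience given earlier in this section, the relation between P and the risk limit can be written in symbols. This is only a sketch in notation introduced here for illustration: the symbol \alpha for the risk limit is an assumption and is not used elsewhere in the text.

```latex
\Pr\{\text{outcome given by the canvass framework} = \text{correct outcome}\} \;\ge\; P,
\qquad P = 1 - \alpha .
```

For example, under this notation a risk limit of \alpha = 10\% corresponds to P = 90\%. As stated in the text, the probability comes from the random sample drawn by the audit, not from treating votes or voters as random.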
We define a canvass framework to be personally verifiable P-resilient if it is P-resilient and a single individual could, as a practical matter, observe enough of the process to have convincing evidence that the canvass framework is in fact P-resilient.

The transparency required for a canvass framework to be personally verifiable can impact privacy. For instance, publishing images of all the ballots cast in an election [footnote 7] might give individuals compelling evidence that the vote tabulation system found the correct outcome, since the images allow people to count the votes themselves, at least to the extent that voter intent is unambiguous. [footnote 8] But publishing ballot images can facilitate vote-selling and coercion and can compromise privacy, because voters can deliberately or accidentally reveal their identities through marks on the ballots, including idiosyncrasies of how individuals fill in bubbles [Calandrino et al., 2011] or even the fiber structure of the paper on which the ballot is printed [Calandrino et al., 2009]. [footnote 9]

Footnote 5: As discussed in section 4, to be P-resilient, a canvass framework should refrain from giving any outcome at all if some preconditions are not met.

Footnote 6: The probability comes from the overall voting system, in our case from the fact that the audit relies on a random sample. The probability does not come from treating votes, voters, or election outcomes as random, for instance.

Footnote 7: There also needs to be proof that the images are sufficiently complete and accurate to determine the correct outcome.

A lesser but substantial degree of transparency is conferred by publishing cast vote records (CVRs), [footnote 10] enabling anyone to verify that the contest outcomes are correct, if the CVRs are accurate. However, as Popoveniuc and Stanton [2007] and Rescorla [2009] point out, publishing CVRs also can aid vote-selling or coercion because of the potential for pattern voting.
One typical sample ballot (from Tulsa, Oklahoma) contains 18 contests with over 589,000 possible combinations if a voter votes in every contest, or over 688 million combinations allowing for undervotes. Thus, a voter could be instructed to vote for the preferred candidate in one contest, and to cast a series of other votes that would almost certainly (especially within a precinct) confirm the voter's identity if all of the voter's selections were published. Hence, publishing whole-ballot CVRs for large numbers of ballots improves transparency but can sacrifice privacy.

When there is not strong evidence that the apparent outcome is correct, risk-limiting audits can require examining the entire audit trail, potentially exposing all the ballots to public scrutiny. [footnote 11] If the apparent outcome is wrong, such exposure is necessary in order to correct the outcome. Therefore, if a risk-limiting audit is to be personally verifiable, there may be occasions where compromising privacy is unavoidable. But minimizing the number of ballots or whole-ballot CVRs that are routinely exposed helps protect privacy, impeding vote-selling and coercion.

Footnote 8: Verification methods like the Humboldt County Election Transparency Project (see below) involve publishing digital images of all the ballots.

Footnote 9: There are arguments that images of ballots should be published anyway: that transparency is more important than privacy. In jurisdictions that permit voting by mail, there is an opportunity to confirm how someone votes for the purpose of vote-selling or coercion; indeed, someone could fill out another's ballot. Whether publishing images of ballots would change the rate of vote-selling or coercion substantially is the subject of some debate.

Footnote 10: In the 2002 FEC Voting System Standards [Federal Election Commission, 2002], these were called "ballot images"; however, the term CVR has been used in more recent EAC Voluntary Voting System Guidelines [Election Assistance Commission, 2005]; we prefer the latter term because it does not suggest an actual image but rather a record of the system's interpretation of the ballot. And what matters is the system's interpretation of the ballot as a set of votes.

Footnote 11: One could have a risk-limiting audit that, if it had not terminated after some fraction of the ballots had been examined, triggered a hand count of the remaining ballots, but did not allow the public to observe that hand count. But then why should the public trust that the hand count was accurate?

We define a canvass framework to be personally verifiable privacy-preserving P-resilient if it is personally verifiable P-resilient and it does not sacrifice privacy unnecessarily. Neither personally verifiable nor privacy-preserving is a mathematically precise characteristic, while P-resilience is.

The contribution of the present work is to sketch a personally verifiable privacy-preserving P-resilient voting system. We assume, as a foundation for building this system, that we are starting with a strongly software-independent voting system with an audit trail that corresponds to individual ballots. Moreover, we assume that a compliance audit has determined that the audit trail generated by the system is sufficiently trustworthy to reflect the correct outcomes of the contests. We augment the system with procedures and data structures that make it possible for an individual observer to gain compelling evidence that either the outcomes are correct, or something very unlikely occurred; that is, that the overall canvass framework is P-resilient.
Unless some of the apparent outcomes are wrong or a margin is extremely small, gathering that evidence will generally involve exposing only a tiny percentage of ballots and whole-ballot CVRs.

In essence, our method adds a special risk-limiting audit to a strongly software-independent voting system (one that has had a compliance audit to ensure that its audit trail is intact). Since one person cannot be in two places at the same time, the procedure cannot be personally verifiable if it involves auditing a multi-jurisdictional contest in different jurisdictions simultaneously; it would then be necessary to trust confederates to observe what is happening elsewhere. The next few sections outline elements of this risk-limiting audit.

2 Ballot-level risk-limiting audits

One key to keeping the process personally verifiable (by keeping the amount of observation required low) and to protecting privacy (by exposing as few ballots as possible to observers) is to audit the record at the level of individual ballots, rather than large batches of ballots such as precincts. The fewer ballots there are in each audit unit, the smaller the expected counting burden for risk-limiting audits tends to be when the electoral outcome is correct (see, e.g., [Stark, 2009a, 2010a,b]). A vote-tabulation audit based on checking the CVRs of individual ballots against a human interpretation of those ballots is often called a "ballot-level audit," a "single-ballot audit," or a "ballot-based audit." Because they reduce the time it takes to audit and the number of ballots involved, ballot-level risk-limiting audits are especially amenable to personal verification.

Ballot-level audits are extremely efficient statistically, but they are not simple to implement using current voting systems.
To perform a ballot-level audit, there must be a way to identify each ballot uniquely: for instance, by a serial number on a paper ballot, or by its location ("the 17th ballot in deck 152 scanned by scanner C"). [footnote 12] There must also be a way to match each ballot to its CVR. Some commercial voting systems do not generate or do not store CVRs for individual ballots. Other voting systems record individual CVRs, but are designed to make it difficult or impossible to match individual CVRs to the ballots they purport to represent. In some cases, audit trails have identifiers that can be used to find the corresponding CVRs; this method was used for part of a 2008 audit in Eagle County, Colorado [Branscomb, 2008] [footnote 13] and a ballot-level risk-limiting audit in Orange County, California, in 2011 [P.B. Stark, personal communication, 2011]. However, to protect privacy, most paper ballots do not have identification numbers. In a 2009 pilot ballot-level audit in Yolo County, California, Stark [2009c] exploited the fact that the CVRs and the physical ballots were in the same order. The scanned images associated with each CVR in the audit sample were compared with the physical ballots to check the accuracy of the CVRs.

Calandrino et al. [2007] describe an approach to election verification that involves imprinting ballots with identification numbers and scanning the ballots with a "parallel" system in addition to the system of record. The parallel system derives its own CVRs, from which the apparent contest outcome can be determined independently. The accuracy of the unofficial CVRs and of the imprinting process is then assessed by a ballot-level audit.
Since 2008, the Humboldt County Election Transparency Project (Humboldt County ETP) has experimented with publishing ballot images and independently tabulating CVRs extracted from those images. Using commercially available equipment, Humboldt County ETP rescans paper ballots after embossing them with serial numbers. Then, open-source software is used to form CVRs from the digital images. Humboldt County ETP has processed ballots for six elections and published scanned ballot images as well as its version of the CVRs for some of them. The results based on their re-scans generally have agreed well with the original results, with one important exception: The Humboldt County ETP analysis of the November 2008 election uncovered a defect in the election management software that led the results of an entire ballot batch to be silently discarded!

Footnote 12: If an identifier is printed on paper ballots, the printing should occur after the voter casts his or her vote and the ballots are co-mingled. If the identifier is printed before the voter casts his or her vote, privacy could be compromised.

Footnote 13: Optical-scan ballots as well as DRE paper audit trails can have identifiers. For instance, in Boulder County, Colorado, the Hart Ballot Now system is configured to print unique identifiers and bar codes on each ballot. In Orange County, California, ballots for the Hart Ballot Now system have non-unique identifiers and bar codes (numbered 1–2500, then repeating).

The Clear Ballot Group, inspired in part by Humboldt County ETP, is developing a system that, in its words, could permit election outcomes to be "thoroughly and transparently verified within 36–48 hours after the polls close."

Neither the Humboldt County ETP nor Clear Ballot Group currently incorporates risk-limiting audits, [footnote 14] but the parallel scans their systems perform facilitate ballot-level risk-limiting audits, along the general lines proposed by Calandrino et al. [2007]. If the system of record and the parallel system agree on the set of winners, a risk-limiting audit of the parallel system transitively confirms the outcome according to the system of record. [footnote 15]

Footnote 14: Clear Ballot Group is adding support for risk-limiting audits to their software [L. Moore, personal communication, 2011].

Footnote 15: This is true as long as the systems agree on the set of winners, even if they disagree about vote totals or margins. For instance, suppose candidate A defeats candidate B by one percentage point in the original returns, and by ten points according to the parallel system. Such a large discrepancy might justify close scrutiny, but a risk-limiting audit of the results of the parallel system would still provide strong evidence that A defeated B, or would lead to a full hand count to set the record straight.

3 A privacy-preserving audit

The method we propose here presupposes that CVRs are available, either from the system of record or from a parallel system. It publishes all the data contained in the CVRs in a form that (1) still permits all observers to check the contest outcomes on the assumption that the CVRs are accurate, (2) does not compromise privacy, and (3) enables the CVRs to be checked against the audit trail while minimizing the loss of privacy.

In SOBA, election officials make a cryptographic commitment [footnote 16] to the full set of CVRs by publishing the CVRs separately for each contest, disaggregating the ballots (we call these contest-CVRs or CCVRs in contrast to whole-ballot CVRs), together with a shrouded link between each CCVR and the ballot it purports to represent. Splitting the CVRs into CCVRs and obfuscating the identity of the ballot from which each CCVR comes eliminates some of the information required to identify a voter's ballot style or to use pattern voting to signal the voter's identity. [footnote 17] This makes the procedure privacy-preserving. But it retains enough information for any observer to check that the apparent outcome agrees with the outcome according to the CCVRs, for each contest. That is, there is a known algorithm (the winner algorithm [footnote 18]) that observers can apply to the published CCVRs to calculate the correct outcome of every contest, provided the CCVRs reflect the ballots (more generally, the audit trail) accurately enough. This is part of making the procedure personally verifiable. Loosely speaking, the required level of accuracy depends on the number of CVRs that must have errors for the apparent outcome to be wrong: [footnote 19] The fewer ballots that need to be changed to affect the outcome, the larger the sample generally will need to be to attain a given level of confidence that the apparent outcome is correct.

Footnote 16: See http://en.wikipedia.org/wiki/Commitment_scheme. Cryptographic commitments have two important properties, the binding property and the hiding property, discussed in section 3.2.

Footnote 17: Of course, if there is a contest in which few voters are eligible to vote, eligibility itself is a signal.

The CCVRs might fail to be sufficiently accurate because

• At least one CCVR and the ballot it purports to represent do not match because human and machine interpretations of voter intent differ (for instance, because the voter marked the ballot improperly). This is a failure of the generation of CCVRs.
• At least one CCVR does not in fact correspond to any ballot. It is an "orphan." This is a failure of the mapping between ballots and CCVRs.
• More than one CCVR for the same contest is mapped to the same ballot. It is a "multiple." This is also a failure of the mapping between ballots and CCVRs.
• There is no CCVR corresponding to some voting opportunity on a ballot.

A failure of the mapping might be the more distressing source of error, since it is a failure on the part of the election official, but we must ensure (statistically) that, together, all sources of error did not combine to cause the outcome to be wrong. SOBA uses a risk-limiting audit to assess statistically whether the winners according to the full audit trail differ from the winners according to the CCVRs, for all contests under audit, taking into account all sources of error. If the outcome according to the CCVRs is incorrect, the audit is very likely to proceed to a full hand count of the audit trail, thereby revealing the correct outcome. This provides P-resilience.

To make the risk-limiting audit possible, elections officials are required to publish another file, the ballot style file, which contains ballot identifiers and lists the contests each of those ballots contains. It does not contain the voters' selections.

Footnote 18: For first-past-the-post contests, the winner algorithm just finds who has the most votes. Other voting schemes, such as instant-runoff voting (IRV) or ranked choice voting (RCV), have more complicated winner algorithms.

Footnote 19: In plurality voting, this is the margin or the set of margins between each (winner, loser) pair. Defining the margins for IRV and calculating them for a given set of reported results is not simple. See Cary [2011]; Magrino et al. [2011].

The risk-limiting technique we propose is the super-simple simultaneous single-ballot risk-limiting audit [Stark, 2010b]. It is not the most efficient ballot-level audit, but the calculations it requires can be done by hand, increasing transparency.
It involves drawing ballots at random with equal probability; some more efficient audits require using different probabilities for different ballots, which is harder to implement and to explain to the public. Moreover, this technique allows a collection of contests to be audited simultaneously using the same sample of ballots. That can reduce the number of randomly selected ballots that must be located, interpreted, and compared with CVRs, decreasing the cost and time required for the audit and thereby increasing transparency. The following subsections give more technical detail.

3.1 Data framework and assumptions

We assume that the audit trail consists of one record per ballot cast. There are C contests we wish to assess. The contests might be simple measures, measures requiring a super-majority, multi-candidate contests, or contests of the form "vote for up to W candidates." [footnote 20] We refer to records in the audit trail as "ballots." A ballot may be an actual voter-marked paper ballot, a voter-verifiable paper audit trail (VVPAT), or a suitable electronic record. There are N ballots in the audit trail that each contain one or more of the C contests. Each ballot can be thought of as a list of pairs, one pair for each contest on that ballot. Each pair identifies a contest and the voter's selection(s) in that contest, which might be an undervote or a vote for one or more candidates or positions. Examining a ballot by hand reveals all the voter's selections on that ballot; we assume that there is no ambiguity in interpreting each voter's intentions from the audit trail.

Before the audit starts, the voting system must report results for each of the C contests.
The report for contest $c$ gives $N_c$, the total number of ballots cast in contest $c$ (including undervotes and spoiled ballots), as well as the number of valid votes for each position or candidate in contest $c$. Let $M \equiv N_1 + N_2 + \cdots + N_C$ denote the total number of voting opportunities on the $N$ ballots. We assume that the compliance audit assures us (e.g., through ballot accounting) that the reported values of $N_c$ are accurate, and that the audit trail is trustworthy. In the present work, we do not consider attacks on the audit trail.

[20] We do not specifically consider instant-runoff voting or ranked-choice voting here. Risk-limiting methods can be extended to such voting methods, but the details are complex.

There is a published “ballot style file.” Each line in the ballot style file lists a ballot identifier and a list of contests that ballot is supposed to contain. The ballot identifier uniquely identifies a ballot in the audit trail. The identifier could be a number that is printed on a paper ballot or unambiguous instructions for locating the ballot (e.g., the 275th ballot in the 39th deck). There should be $N$ lines in the file, and the $N$ ballot identifiers should be unique. Because the ballot style file is published, individuals can check this for themselves. Moreover, individuals can check whether the number of lines in the ballot style file that list contest $c$ equals $N_c$, the total number of ballots the system reports were cast in contest $c$.

Before the audit starts, the voting system or a parallel system has produced a CVR for each ballot. These are not published as whole-ballot CVRs. Rather, the CVRs are split by contest to make contest-specific CVRs (CCVRs) that contain voters’ selections in only one contest. Each whole-ballot CVR is (supposed to be) split into as many CCVRs as there are contests on the ballot.
The CCVRs for the contests are published in $C$ files, one for each contest. The CCVR file for contest $c$ should contain $N_c$ lines; because this file is published, individuals can check this for themselves. Each line in the CCVR file for contest $c$ lists a voter’s selection and a shrouded version of the identifier of the ballot that the selection is supposed to represent. The order of the lines in each of the $C$ CCVR files should be shuffled (preferably using random permutations) so that whole CVRs cannot be reassembled without knowing secret information. [21]

[21] For example, each CCVR file could be sorted in order of the shrouded ballot identifier.

The public can confirm whether the contest outcomes according to the CCVR files match the voting system’s reported outcomes. If they do not match, there should be a full hand count of any contests with discrepant outcomes. We assume henceforth that the outcomes do match, but we do not assume the exact vote totals according to the CCVR files match the reported vote totals.

The data include one more file that is not published, the lookup file. The lookup file contains $M$ lines, one for each voting opportunity on each ballot. Each line has three entries: a shrouded ballot identifier, the corresponding unshrouded ballot identifier, and a number (“salt”) that is used in computing the shrouded identifier from the unshrouded identifier using a cryptographic commitment function, as described below. (For a review of uses for cryptography in voting, see Adida [2006].) The salt on the $j$th line of the file is denoted $u_j$. Each line corresponds to a (ballot, contest) pair: We can think of $u_j$ as being $u_{ic}$, the salt used to shroud the identity of ballot $b_i$ in the CCVR file for contest $c$. The election official will use this file to convince observers that every selection on every ballot corresponds to exactly one entry in a CCVR file, and vice versa.
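The file layout described in this section can be illustrated with a short sketch. It is not part of SOBA's specification: the function names (`shroud`, `build_files`), the toy ballots, the in-memory representation, and the SHA-256 stand-in for the commitment function (the subject of section 3.2) are all assumptions made for illustration.

```python
import hashlib
import secrets


def shroud(ballot_id: str, salt: bytes) -> str:
    # Stand-in commitment (an assumption): SHA-256 over identifier then salt.
    return hashlib.sha256(ballot_id.encode("utf-8") + salt).hexdigest()


def build_files(cvrs, salt_bytes=16):
    """Split whole-ballot CVRs into the three kinds of file used by the audit.

    cvrs maps a fixed-length ballot identifier to {contest: selection}.
    Returns (ballot_style, ccvr_files, lookup); lookup is NOT published.
    """
    # Ballot style file: identifier plus the contests on that ballot,
    # without the voter's selections.
    ballot_style = {b: sorted(sel) for b, sel in cvrs.items()}
    ccvr_files = {}  # contest -> list of (shrouded identifier, selection)
    lookup = []      # one (shrouded id, ballot id, salt) line per voting opportunity
    for ballot_id, selections in cvrs.items():
        for contest, selection in selections.items():
            salt = secrets.token_bytes(salt_bytes)  # fresh salt per (ballot, contest)
            shrouded = shroud(ballot_id, salt)
            ccvr_files.setdefault(contest, []).append((shrouded, selection))
            lookup.append((shrouded, ballot_id, salt))
    # Destroy the original ballot order, here by sorting each file on the
    # shrouded identifier, as the footnote to this section suggests.
    for rows in ccvr_files.values():
        rows.sort(key=lambda row: row[0])
    return ballot_style, ccvr_files, lookup


# Hypothetical toy election: three ballots, two contests.
cvrs = {
    "0001": {"mayor": "Alice", "measure_A": "yes"},
    "0002": {"mayor": "Bob"},
    "0003": {"mayor": "Alice", "measure_A": "undervote"},
}
ballot_style, ccvr_files, lookup = build_files(cvrs)
```

In this toy election $N = 3$, the mayoral contest appears on three ballots, the measure on two, and $M = 5$, so the unpublished lookup file has five lines while each published CCVR file has exactly $N_c$ lines.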
3.2 Shrouding

The method of shrouding ballot identifiers is crucial to the approach. SOBA requires election officials to cryptographically commit to the value of the ballot identifier that goes with each CCVR. A cryptographic commitment ensures that the ballot identifier is secret but indelible: The election official can, in effect, prove to observers that a shrouded identifier corresponds to a unique unshrouded identifier, but nobody can figure out which unshrouded identifier corresponds to a given shrouded identifier without secret information.

The next few paragraphs describe a suggested instantiation of the cryptographic commitment. We assume that ballot identifiers all have the same length. If necessary, this can be achieved by padding identifiers with leading zeros. The commitment function $H()$ must be disclosed publicly and fixed for the duration of the election.

Each commitment represents a claim about a voter’s selection(s) on a given ballot in a given contest. For each set of selections that any voter made in each contest, including undervotes and votes for more than one candidate, the election official will create a set of commitments. Each commitment designates the ballot identifier of a ballot that the election official claims contains that set of selections in that contest.

To commit to the ballot identifier $b$, the election official selects a secret “salt” value $u$ [22] and computes the commitment value $y = H(b, u)$. At a later stage, the official can open the commitment by revealing $u$ and $b$: Then anyone can verify that the value $y$ revealed earlier is indeed equal to $H(b, u)$. Loosely speaking, a commitment function must have two properties, the binding property and the hiding property. The binding property makes it infeasible for the official to find any pair $(b', u') \ne (b, u)$ for which $H(b', u') = H(b, u)$.
This provides integrity by helping to ensure that election officials cannot contrive to have more than one CCVR for a given contest claim to come from the same ballot. [23] The binding property is crucial for P-resilience; indeed, the proof of P-resilience requires only that the commitment have the binding property and that $\{N_c\}_{c=1}^{C}$ are known.

The hiding property makes it infeasible for anyone with access only to the shrouded values $H(b, u)$ to learn anything about which ballot is involved in each commitment. This provides privacy by helping to ensure that observers cannot reassemble whole-ballot CVRs from the CCVR files without extra information. If observers could reassemble whole-ballot CVRs, that would open a channel of communication (pattern voting) for coercion or vote selling. Ballot identifier $b$ may appear in multiple commitments since a separate commitment is generated for each candidate selection on each ballot. The hiding property ensures that those collections of commitments do not together reveal the value of any $b$. This is crucial for the method to be privacy-preserving.

[22] To protect voter privacy, it must be infeasible to guess the salts: Each salt should contain many random or pseudo-random bits. For the commitment to be effective, the length of all salt values should be fixed and equal. See section 4.

[23] See step 7 of the proof in section 3.4.

An HMAC (as described in Federal Information Processing Standard Publication 198) with a secure hash function such as SHA-256 (described in Federal Information Processing Standard Publication 180-2) can be used to instantiate the commitment function. However, since each of the parameters of the commitment function is of fixed length, it is more efficient to simply use a cryptographic hash function such as SHA-256 directly.
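A minimal sketch of committing and opening with SHA-256 used directly follows. The names `pad_id`, `new_salt`, `commit`, and `verify_opening`, the 8-character identifier width, the 16-byte salt, and the byte layout (identifier followed by salt) are assumptions chosen for illustration; SOBA itself only requires that $H$ be public, fixed, binding, and hiding.

```python
import hashlib
import hmac
import secrets

ID_WIDTH = 8     # assumed fixed identifier length, in characters
SALT_BYTES = 16  # assumed fixed salt length (128 bits)


def pad_id(number: int, width: int = ID_WIDTH) -> bytes:
    """Pad a numeric ballot identifier with leading zeros to a fixed length."""
    text = str(number).zfill(width)
    if len(text) != width:
        raise ValueError("identifier does not fit in the fixed width")
    return text.encode("ascii")


def new_salt() -> bytes:
    """Draw a fresh secret salt u from the operating system's CSPRNG."""
    return secrets.token_bytes(SALT_BYTES)


def commit(ballot_id: bytes, salt: bytes) -> str:
    """Compute y = H(b, u) as SHA-256 over the fixed-length b followed by u."""
    if len(ballot_id) != ID_WIDTH or len(salt) != SALT_BYTES:
        raise ValueError("identifier and salt must have their fixed lengths")
    return hashlib.sha256(ballot_id + salt).hexdigest()


def verify_opening(y: str, ballot_id: bytes, salt: bytes) -> bool:
    """Check a revealed (b, u) pair against a previously published y."""
    return hmac.compare_digest(y, commit(ballot_id, salt))


b = pad_id(275)   # hypothetical ballot number 275
u = new_salt()
y = commit(b, u)  # published in the CCVR file; (b, u) stay in the lookup file
```

Because both inputs have fixed lengths, the concatenation `ballot_id + salt` can be parsed in only one way, which is why no separator or keyed construction is needed in this sketch.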
The length of the ballot identifiers does not matter, as long as all ballot identifiers in the election have the same length. We recommend that all salt values have equal length, of at least 128 bits. Our results do not depend on the particular commitment function chosen, as long as it has both the binding and hiding properties. [24]

[24] Menezes et al. [1996] offers a thorough treatment of hash functions and their use for commitments in applications such as digital signatures.

We now describe how to perform a risk-limiting audit that simultaneously checks the accuracy of the CCVRs, whether each CCVR entry comes from exactly one ballot, and whether every voting opportunity on every ballot is reflected in the correct CCVR file.

3.3 The audit

The first three steps check the consistency of the CCVRs with the reported results and the uniqueness of the shrouded identifiers.

1. Verify that, for each contest $c$, there are $N_c$ entries in the CCVR file for contest $c$.
2. Verify that, for each contest $c$, the CCVR file shows the same outcome as the reported outcome.
3. Verify that the $M = N_1 + \cdots + N_C$ shrouded ballot identifiers in all $C$ CCVR files are unique.

If step 2 shows a different outcome for one or more contests, those contests (at least) should be completely hand counted.

Steps 4 and 5 check the logical consistency of the ballot style file with the reported results.

4. Verify that, for each contest $c$, there are $N_c$ entries in the ballot style file that list the contest.
5. Verify that the ballot identifiers in the ballot style file are unique.

If step 1, 3, 4, or 5 fails, there has been an error or misrepresentation. The election official needs to correct all such problems before the audit can start.
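Steps 1 through 5 are simple enough to check mechanically. The sketch below is one hypothetical way to do so; the in-memory data layout, the names `plurality_winner` and `preliminary_checks`, and the restriction to single-winner plurality contests (with undervotes recorded as `None`) are assumptions, not part of the protocol.

```python
from collections import Counter


def plurality_winner(selections):
    """Return the candidate with the most votes, ignoring undervotes (None)."""
    tally = Counter(s for s in selections if s is not None)
    return tally.most_common(1)[0][0] if tally else None


def preliminary_checks(reported, ccvr_files, ballot_style):
    """Run steps 1-5 and return {step number: passed?}.

    reported:     {contest: {"N": N_c, "winner": reported winner}}
    ccvr_files:   {contest: [(shrouded_id, selection), ...]}
    ballot_style: [(ballot_id, [contests on that ballot]), ...]
    """
    results = {}
    # Step 1: each CCVR file has N_c entries.
    results[1] = all(len(ccvr_files[c]) == r["N"] for c, r in reported.items())
    # Step 2: each CCVR file shows the reported outcome.
    results[2] = all(
        plurality_winner([sel for _, sel in ccvr_files[c]]) == r["winner"]
        for c, r in reported.items()
    )
    # Step 3: the M shrouded identifiers across all CCVR files are unique.
    shrouded = [sid for rows in ccvr_files.values() for sid, _ in rows]
    results[3] = len(shrouded) == len(set(shrouded))
    # Step 4: N_c ballot style entries list contest c.
    listed = Counter(c for _, contests in ballot_style for c in contests)
    results[4] = all(listed[c] == r["N"] for c, r in reported.items())
    # Step 5: ballot identifiers in the ballot style file are unique.
    ids = [b for b, _ in ballot_style]
    results[5] = len(ids) == len(set(ids))
    return results


# Hypothetical toy data.
reported = {"mayor": {"N": 3, "winner": "Alice"},
            "measure_A": {"N": 2, "winner": "yes"}}
ccvr_files = {"mayor": [("a1", "Alice"), ("a2", "Bob"), ("a3", "Alice")],
              "measure_A": [("b1", "yes"), ("b2", None)]}
ballot_style = [("0001", ["mayor", "measure_A"]),
                ("0002", ["mayor"]),
                ("0003", ["mayor", "measure_A"])]
checks = preliminary_checks(reported, ccvr_files, ballot_style)
```

A failure of step 2 calls for a hand count of the affected contests, whereas a failure of any other step must be corrected before the statistical part of the audit begins, so a caller would treat `checks[2]` differently from the rest.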
The remaining steps comprise the statistical portion of the risk-limiting audit, which checks whether the CCVRs and the mapping from ballots to CCVRs are accurate enough to determine the correct winner.

6. Set the audit parameters:
   (a) Choose the risk limit $\alpha$.
   (b) Choose the maximum number of samples $D$ to draw; if there is not strong evidence that the outcomes are correct after $D$ draws, the entire audit trail will be counted by hand.
   (c) Choose the “error bound inflator” $\gamma > 1$ and the error tolerance $\lambda \in (0, 1)$ for the super-simple simultaneous method [Stark, 2010b] ($\gamma = 1.01$ and $\lambda = 0.2$ are reasonable values).
   (d) Calculate

   $$\rho = \frac{-\log \alpha}{\frac{1}{2\gamma} + \lambda \log\left(1 - \frac{1}{2\gamma}\right)}. \qquad (1)$$

   (e) For each of the $C$ contests, calculate the margin of victory $m_c$ in votes from the CCVRs for contest $c$. [25]
   (f) Calculate the diluted margin $\mu$: the smallest value of $m_c / N$ among the $C$ contests. [26]
   (g) Calculate the initial sample size $n_0 = \lceil \rho / \mu \rceil$.
   (h) Select a seed $s$ for a pseudo-random number generator (PRNG). [27] Observers and election officials could contribute input values to $s$, or $s$ could be generated by an observable, mechanical source of randomness such as rolls of a 10-sided die. The seed should be selected only once.

7. Draw the initial sample by finding $n_0$ pseudo-random numbers between 1 and $N$ and audit the corresponding ballots:
   (a) Use the PRNG and the seed $s$ to generate $n_0$ pseudo-random numbers, $r_1, r_2, \ldots, r_{n_0}$.
   (b) Let $\ell_j \equiv \lceil N r_j \rceil$, $j = 1, \ldots, n_0$. This list might contain repeated values. If so, the tests below only need to be performed once for each value, but the results count as many times as the value occurs in the list. [28]
   (c) Find rows $\ell_1, \ldots, \ell_{n_0}$ in the ballot style file.
   (d) Retrieve the ballots $b_{\ell_j}$ in the audit trail identified by those rows in the ballot style file. If there is no ballot with identifier $b_{\ell_j}$, pretend in step 7(g) below that the ballot showed a vote for the runner-up in every contest listed in that row of the ballot style file.
   (e) Determine whether each ballot shows the same contests as its corresponding entry in the ballot style file. If there are any contests on the ballot that are not in the ballot style file entry, pretend in step 7(g) below that the CCVR for that (ballot, contest) pair showed a vote for the apparent winner of the contest. If there are any contests in the ballot style file entry that are not on the ballot, pretend in step 7(g) below that the ballot showed a vote for the apparent runner-up for that contest.
   (f) For each ballot $b_{\ell_j}$ in the sample, the election official reveals the value of $u_{\ell_j c}$ for each contest $c$ on the ballot.
   (g) For each ballot in the sample, for each contest on that ballot, observers calculate $H(b_{\ell_j}, u_{\ell_j c})$ and find the entry in the CCVR file for contest $c$ that has that shrouded identifier. If the shrouded identifier is not in the CCVR file, pretend that the CCVR file showed that the voter had selected the apparent winner of contest $c$. Compare the voter’s selection(s) according to the CCVR file to the voter’s selection(s) according to a human reading of ballot $b_{\ell_j}$. Find $e_{\ell_j}$, the largest number of votes by which any CCVR for ballot $b_{\ell_j}$ overstated the margin between any (winner, loser) pair in any contest on ballot $b_{\ell_j}$. This number will be between $-2$ and $+2$.

8. If no ballot in the sample has $e_{\ell_j} = 2$ and no more than $\lambda \mu n_0$ have $e_{\ell_j} = 1$, the audit stops. (In this calculation, the value of $e_{\ell_j}$ should be counted as many times as $\ell_j$ occurs in the sample.)

9. Otherwise, calculate the Kaplan-Markov P-value, $P_{KM}$, according to equation (9) in Stark [2009d,c, 2010b]. [29] If $P_{KM}$ is no greater than $\alpha$, the audit stops. If $P_{KM}$ is greater than $\alpha$, the sample is expanded: Another random number $r_j$ is generated and steps 7(c)–(g) are repeated. The value of $P_{KM}$ is updated to include the overstatement errors found in the new draw. [30] This continues until either $P_{KM} \le \alpha$ or there have been $D$ draws. In the latter case, all remaining ballots are counted by hand, revealing the true outcome.

The next section establishes that this procedure in fact gives a risk-limiting audit.

[25] This would be replaced by a different calculation for IRV or RCV contests. See, e.g., Magrino et al. [2011]; Cary [2011].

[26] The diluted margin controls the sample size. If contest $c$ has the smallest value of $m_c / N$ and $N_c$ is rather smaller than $N$, it can be more efficient to audit contest $c$ separately rather than auditing all $C$ contests simultaneously.

[27] The code for the PRNG algorithm should be published so that it can be checked and so that, given the seed $s$, observers can reproduce the sequence of pseudo-random numbers. The PRNG should produce numbers that are statistically indistinguishable from independent random numbers uniformly distributed between 0 and 1 (i.e., have large p-values) for sample sizes up to millions for a reasonable battery of tests of randomness, such as the Diehard tests.

[28] The auditing method relies on sampling with replacement to limit the risk.

3.4 Proof of the risk-limiting property

If the ballot style file is correct and entries in the CCVR files are mapped properly to voting opportunities on actual ballots, the only potential source of error is that CCVR entries do not accurately reflect the voters’ selections according to a human reading of the ballot. If that is the case, this is an “ordinary” risk-limiting audit, and the proof in Stark [2010b] that the super-simple simultaneous method is risk-limiting applies directly.

Suppose therefore that the ballot style file or the mapping between ballots and CCVRs is faulty.
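Before turning to the individual cases, it may help to see the arithmetic of steps 6(d)–(g), 7(a)–(b), and 8 of section 3.3 in executable form. The following is only a sketch under stated assumptions: the function names, the example margins, and the seed are hypothetical, and Python's `random.Random` (a Mersenne Twister) merely stands in for whatever published PRNG an actual audit would use. The guard against a draw of exactly 0 is also an addition, since $\lceil N r \rceil$ would otherwise give row 0.

```python
import math
import random


def rho(alpha, gamma=1.01, lam=0.2):
    """Equation (1): the sample-size constant for risk limit alpha."""
    return -math.log(alpha) / (
        1 / (2 * gamma) + lam * math.log(1 - 1 / (2 * gamma))
    )


def initial_sample_size(alpha, margins, n_ballots, gamma=1.01, lam=0.2):
    """Steps 6(f)-(g): diluted margin mu and n_0 = ceil(rho / mu)."""
    mu = min(margins) / n_ballots  # smallest m_c / N over the contests
    return math.ceil(rho(alpha, gamma, lam) / mu), mu


def draw_rows(seed, n_draws, n_ballots):
    """Steps 7(a)-(b): rows l_j = ceil(N * r_j), sampled with replacement."""
    rng = random.Random(seed)  # stand-in PRNG, reproducible from the seed
    return [max(1, math.ceil(n_ballots * rng.random())) for _ in range(n_draws)]


def stops_at_step_8(overstatements, mu, lam=0.2):
    """Step 8: stop if no e = 2 and at most lam * mu * n_0 values of e = 1.

    overstatements holds one e value per draw, so a ballot drawn twice
    contributes twice, as the step requires.
    """
    n0 = len(overstatements)
    ones = sum(1 for e in overstatements if e == 1)
    return all(e < 2 for e in overstatements) and ones <= lam * mu * n0


# Hypothetical example: 10,000 ballots, two contests with margins of
# 1,000 and 2,500 votes, and a 10% risk limit.
n0, mu = initial_sample_size(alpha=0.10, margins=[1000, 2500], n_ballots=10_000)
rows = draw_rows(seed=20110425, n_draws=n0, n_ballots=10_000)
```

With these assumed inputs the diluted margin is 0.1 and $\rho$ is roughly 6.42, so the initial sample is 65 draws; the audit then stops at step 8 if the sample contains no two-vote overstatement and at most one one-vote overstatement.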
Recall that the super-simple simultaneous method assumes that no ballot can overstate any margin by more than $2\gamma$ votes, where $\gamma > 1$. There are seven cases to consider.

1. The ballot style file has more than one entry that corresponds to the same actual ballot, or more than one actual ballot corresponds to the same entry in the ballot style file. These faults are precluded by the uniqueness of the ballot identifiers and of the recipes for locating the actual ballot with each identifier.

2. More than one ballot identifier corresponds to the same shrouded entry (for different values of $u$). This is precluded by the binding property of $H$.

3. The ballot style file contains identifiers that do not correspond to actual ballots, or claims that a ballot contains a contest that it does not actually contain. The biggest effect this could have on an apparent contest outcome is if the ballot that entry is supposed to match showed a vote for the runner-up in every missing contest, which is no greater than a two-vote change to any margin. Because the audit samples entries of the ballot style file with equal probability, this kind of error in an entry is just as likely to be revealed as any other. If such a ballot style file entry is selected for audit, steps 7(d) and 7(e) treat it this worst-case way.

4. The ballot style file claims that a ballot does not contain a contest that it does contain. The biggest effect this could have on an apparent contest outcome is if the CCVR for that contest showed a vote for the apparent winner, which cannot change the margin by more than two votes, so the error-bound assumptions are satisfied. Because the audit samples entries of the ballot style file with equal probability, this kind of error in an entry is just as likely to be revealed as any other. If such a ballot style file entry is selected for audit, step 7(e) treats it this worst-case way.

5. There are ballots whose identifiers do not appear in the ballot style file. Since there are the same number of ballots as entries in the ballot style file and the ballot identifiers in the ballot style file are unique, there must be ballot identifiers in the ballot style file that do not match any ballot. Hence, case (3) holds.

6. There are CCVRs for which the shrouded ballot identifier is not the identifier of any ballot. If the shrouded identifier matches an identifier in the ballot style file, we are in case (3). Suppose therefore that the shrouded identifier does not match any in the ballot style file. Suppose this happens for contest $c$. The preliminary checks show that the ballot style file has exactly $N_c$ entries for contest $c$ and that there are exactly $N_c$ entries in the CCVR file for contest $c$. Therefore, if there is such a CCVR, one of the ballot style file entries that lists contest $c$ has an identifier that does not occur in shrouded form in the CCVR file for that contest. The largest effect this could have on contest $c$ is if the “substituted” CCVR entry reported a vote for the apparent winner; this cannot overstate the margin by more than two votes, so the audit’s error-bound assumption still holds. Because the audit samples entries of the ballot style file with equal probability, this kind of error in a ballot style file entry is just as likely to be revealed as any other. If such a ballot style file entry is selected for audit, step 7(g) treats it this worst-case way.

7. The same ballot identifier appears in shrouded form more than once in a single CCVR file. As in the previous case, we know there are $N_c$ entries in the CCVR file for contest $c$ and $N_c$ entries in the ballot style file that include contest $c$; moreover, the identifiers in the ballot style file are unique. Hence, there must be at least one entry in the ballot style file that lists contest $c$ for which the ballot identifier does not appear in shrouded form in the CCVR file. We are therefore in case (6).

[29] We consider only plurality voting here: IRV is more complicated. For each contest $c$, let $W_c$ be the indices of the apparent winners of the contest and let $L_c$ be the indices of the apparent losers of the contest. If $w \in W_c$ and $x \in L_c$, let $V_{wx}$ be the margin in votes between candidate $w$ and candidate $x$ according to the CCVR file for contest $c$. For each candidate $k$ on ballot $\ell$, let $v_{\ell k}$ denote the number of votes for candidate $k$ on ballot $\ell$ according to the CCVR file and let $a_{\ell k}$ denote the number of votes on ballot $\ell$ for candidate $k$ according to a human reading of ballot $\ell$. Let

$$\varepsilon_\ell \equiv \max_c \max_{w \in W_c,\, x \in L_c} \frac{v_{\ell w} - a_{\ell w} - v_{\ell x} + a_{\ell x}}{V_{wx}}. \qquad (2)$$

Then

$$P_{KM} \equiv \prod_{j=1}^{n} \frac{1 - 1/U}{1 - \dfrac{\varepsilon_{\ell_j}}{2\gamma / V}}. \qquad (3)$$

[30] Overstatements are calculated as in step 7 above, including, in particular, steps 7(e) and 7(g), which say how to treat failures to find ballots or contests.

4 Discussion

Others have proposed election verification methods that involve a cryptographic commitment by elections officials to a mapping between ballots and CVRs [E.K. Rescorla, personal communication, 2011; R.L. Rivest, personal communication, 2009; D. Wallach, personal communication, 2010; see also Adida [2006]]. However, we believe SOBA is the first method that requires only one commitment and that uses a risk-limiting audit to check whether the mapping is accurate enough to determine the correct winner.

We have said little about the requirement for a compliance audit. In part, this is a definitional issue: Even if the audit trail is known to have been compromised, it is our understanding that in many states, a full hand count of the audit trail would still be the “correct” outcome, as a matter of law.
Hence, an audit to assess whether the audit trail was protected and preserved adequately for it to reflect the outcome according to how the voters cast their ballots is legally superfluous. We consider this a shortcoming of current audit and recount laws. Moreover, we doubt that any system can be P-resilient unless the election and the data it generates satisfy particular conditions. For instance, risk-limiting audits generally assume that the number of ballots cast in all and in each contest is known. Such conditions should be checked.

We would advocate carrying out a compliance audit to assess whether the procedures as followed in the election give reasonable assurance that the audit trail is trustworthy (sufficiently accurate to reflect the outcome according to how voters cast their ballots) and to assess whether any other preconditions of the risk-limiting audit hold. The compliance audit should evaluate whether there is strong evidence that the chain of custody of the ballots is intact, or whether it is plausible that ballots were lost, “found,” altered, or substituted. The compliance audit should confirm the values of $\{N_c\}$ by ballot accounting: confirming that the number of ballots printed equals the number returned voted, unvoted, and spoiled, for each ballot type.

If the election passes the compliance audit, a risk-limiting audit can then assess the accuracy of the reported result and would have a large chance of correcting the apparent outcome if it is wrong (by examining the full audit trail). But if the election fails the compliance audit (that is, if we lack strong evidence that the audit trail is reliable and that the preconditions for the risk-limiting audit are met), a P-resilient election framework should not declare any outcome at all.

For the method to be P-resilient, $H$ must be binding and we must know $\{N_c\}$.
Because the election official discloses $H$ and the (fixed) length of the ballot identifiers, we can determine whether $H$ is binding. For the method to be privacy-preserving, $H$ must have the hiding property, which will depend on how the salts are chosen and how the CCVR files are organized. If the salts can be discovered, inferred, or guessed, or if observers have another way to reassemble whole-ballot CVRs from the CCVRs (for instance, if the CCVRs are in the same ballot order across contests), voter privacy can be compromised.

5 Conclusions

SOBA makes possible a personally verifiable privacy-preserving P-resilient canvass framework. It allows individuals to obtain strong firsthand [31] evidence that apparent election outcomes either are correct in the first place, or are corrected by a risk-limiting audit before becoming final, without unnecessary compromises to privacy. After the procedure is complete, either all the outcomes are correct or an event with probability less than $1 - P$ has occurred. The published data structures allow the public to check the consistency of the apparent outcomes but do not allow whole-ballot cast vote records to be reconstructed, thereby preserving privacy. When all the apparent contest outcomes are correct, gathering the evidence that the outcomes are right typically will require exposing only a small fraction of ballots to observers, protecting privacy. But the data structures and auditing protocol ensure that if the apparent outcome of one or more of the contests is wrong, there is a large chance of a full hand count of the audit trail to set the record straight.

[31] For multi-jurisdictional contests, it might not be possible to conduct an audit in a single place and time. If the audit step takes place in pieces in separate jurisdictions simultaneously, firsthand knowledge might be impossible; one might need to trust observers in other locations.

6 Acknowledgments

This work was supported in part by NSF Grant CNS-05243 (ACCURATE). We are grateful to Poorvi Vora for shepherding the paper and to anonymous referees for helpful comments. We are grateful to Joseph Lorenzo Hall, David Jefferson, Neal McBurnett, Dan Reardon, Ronald L. Rivest, and Emily Shen for helpful conversations and comments on earlier drafts.

References

Adida, B. (2006). Advances in Cryptographic Voting Systems. PhD thesis, Massachusetts Institute of Technology.

American Statistical Association (2010). American Statistical Association statement on risk-limiting post-election audits. www.amstat.org/outreach/pdfs/Risk-Limiting_Endorsement.pdf.

Ash, A. and Lamperti, J. (2008). Florida 2006: Can Statistics tell us who won Congressional District 13? Chance, 21(2):18–27.

Bonner, L. (2004). New statewide election possible: Board may reconsider ag commissioner race. The News & Observer (Raleigh, NC), republished at VotersUnite.org. Retrieved February 25, 2011.

Bowen, D. (2007). Withdrawal of approval of Diebold Elections Systems, Inc., GEMS 1.18.24/AccuVote-TSX/AccuVote-OS DRE & Optical Scan voting System (October 25, 2007 revision). http://www.sos.ca.gov/voting-systems/oversight/ttbr/diebold-102507.pdf. Retrieved February 22, 2011.

Branscomb, H. (2008). Audit report to satisfy Colorado Revised Statutes, Eagle County, Colorado Nov 4, 2008 General Election. http://www.electionmathematics.org/em-audits/CO/2008/AUDITReportEagleCountyCO2008.pdf. Retrieved March 6, 2011.

Calandrino, J., Clarkson, W., and Felten, E. (2009). Some consequences of paper fingerprinting for elections. In Proceedings of the 2009 Electronic Voting Technology Workshop / Workshop on Trustworthy Elections (EVT/WOTE ’09). USENIX.
Calandrino, J., Clarkson, W., and Felten, E. (2011). Bubble trouble: Off-line de-anonymization of bubble forms. http://www.cs.princeton.edu/~wclarkso/bubble-trouble.pdf.

Calandrino, J., Halderman, J., and Felten, E. (2007). Machine-assisted election auditing. In Proceedings of the 2007 USENIX/ACCURATE Electronic Voting Technology Workshop (EVT 07). USENIX.

Cary, D. (2011). Estimating the margin of victory for instant-runoff voting. In Proceedings of the 2011 Electronic Voting Technology Workshop / Workshop on Trustworthy Elections (EVT/WOTE ’11). USENIX.

Checkoway, S., Sarwate, A., and Shacham, H. (2010). Single-ballot risk-limiting audits using convex optimization. In Proceedings of the 2010 Electronic Voting Technology Workshop / Workshop on Trustworthy Elections (EVT/WOTE ’10). USENIX. http://www.usenix.org/events/evtwote10/tech/full_papers/Checkoway.pdf. Retrieved April 20, 2011.

Election Assistance Commission (2005). Voluntary Voting System Guidelines: Volume I – Voting System Performance Guidelines. Election Assistance Commission. Internet Archive: http://www.eac.gov/VVSG%20Volume_I.pdf.

Federal Election Commission (2002). Voting System Performance and Test Standards: Volume I – Performance Standards. Federal Election Commission. Internet Archive: http://www.fec.gov/pages/vssfinal/vss.html.

Flaherty, S. (2006). In an age of computerized voting, is it possible to maintain voting integrity? Iowa City Press-Citizen. Republished by VoteTrustUSA, http://www.votetrustusa.org/index.php?option=com_content&task=view&id=1460&Itemid=113. Retrieved February 25, 2011.

Frisina, L., Herron, M., Honaker, J., and Lewis, J. (2008). Ballot formats, touchscreens, and undervotes: A study of the 2006 midterm elections in Florida. Election Law Journal, 7(1):25–47.

Garber, K. (2008). Lost votes in Florida’s 2006 general election: A look at extraordinary undervote rates on the ES&S iVotronic, part 2. http://www.floridafairelections.org/reports/LostVotes_Part_2.pdf. Retrieved February 22, 2011.

Hall, J. L., Miratrix, L. W., Stark, P. B., Briones, M., Ginnold, E., Oakley, F., Peaden, M., Pellerin, G., Stanionis, T., and Webber, T. (2009). Implementing risk-limiting post-election audits in California. In Proc. 2009 Electronic Voting Technology Workshop/Workshop on Trustworthy Elections (EVT/WOTE ’09), Montreal, Canada. USENIX.

Keating, D. (2002). Democracy counts: The Media Consortium Florida Ballot Project. In 2002 annual conference of the American Political Science Association. American Political Science Association. http://www.aei.org/docLib/20040526_KeatingPaper.pdf. Retrieved March 1, 2011.

Lindeman, M., Halvorson, M., Smith, P., Garland, L., Addona, V., and McCrea, D. (2008). Principles and best practices for post-election audits. www.electionaudits.org/files/best%20practices%20final_0.pdf. Retrieved April 20, 2011.

Magrino, T., Rivest, R., Shen, E., and Wagner, D. (2011). Computing the margin of victory in IRV elections. In Proceedings of the 2011 Electronic Voting Technology Workshop / Workshop on Trustworthy Elections (EVT/WOTE ’11). USENIX.

Mebane, W. (2009). Machine errors and undervotes in Florida 2006 revisited. http://www-personal.umich.edu/~wmebane/howpaper2.pdf. Retrieved February 22, 2011.

Mebane, W. and Dill, D. (2007). Factors associated with the excessive CD-13 undervote in the 2006 General Election in Sarasota County, Florida. http://www.umich.edu/wmebane/smachines1.pdf. Retrieved February 22, 2011.

Menezes, A. J., Vanstone, S. A., and Oorschot, P. C. V. (1996). Handbook of Applied Cryptography. CRC Press, Inc., Boca Raton, FL, USA, 1st edition.

Miratrix, L. and Stark, P. (2009). The trinomial bound for post-election audits. IEEE Transactions on Information Forensics and Security, 4:974–981.

Popoveniuc, S. and Stanton, J. (2007). Undervote and pattern voting: Vulnerability and a mitigation technique. In Pre-Proceedings of the 2007 IAVoSS Workshop on Trustworthy Elections (WOTE ’09).

Rescorla, E. (2009). Understanding the security properties of ballot-based verification techniques. In Proceedings of the 2010 Electronic Voting Technology Workshop / Workshop on Trustworthy Elections (EVT/WOTE ’10). http://www.usenix.org/event/evtwote09/tech/full_papers/rescorla-ballot.pdf. Retrieved March 6, 2011.

Rivest, R. (2008). On the notion of ‘software independence’ in voting systems. Phil. Trans. R. Soc. A, 366(1881):3759–3767.

Rivest, R. and Wack, J. (2006). On the notion of “software independence” in voting systems (draft version of July 28, 2006). Technical report, Information Technology Laboratory, National Institute of Standards and Technology. http://vote.nist.gov/SI-in-voting.pdf. Retrieved April 20, 2011.

Saldaña, L. (2010). California Assembly Bill 2023. www.leginfo.ca.gov/pub/09-10/bill/asm/ab_2001-2050/ab_2023_bill_20100325_amended_asm_v98.html. Retrieved April 20, 2011.

Stark, P. (2008a). Conservative statistical post-election audits. Ann. Appl. Stat., 2:550–581.

Stark, P. (2008b). A sharper discrepancy measure for post-election audits. Ann. Appl. Stat., 2:982–985.

Stark, P. (2009a). Auditing a collection of races simultaneously. Technical report, arXiv.org.

Stark, P. (2009b). CAST: Canvass audits by sampling and testing. IEEE Transactions on Information Forensics and Security, Special Issue on Electronic Voting, 4:708–717.

Stark, P. (2009c). Efficient post-election audits of multiple contests: 2009 California tests. http://ssrn.com/abstract=1443314. 2009 Conference on Empirical Legal Studies.

Stark, P. (2009d). Risk-limiting post-election audits: P-values from common probability inequalities. IEEE Transactions on Information Forensics and Security, 4:1005–1014.

Stark, P. (2010a). Risk-limiting vote-tabulation audits: The importance of cluster size. Chance, 23(3):9–12.

Stark, P. (2010b). Super-simple simultaneous single-ballot risk-limiting audits. In Proceedings of the 2010 Electronic Voting Technology Workshop / Workshop on Trustworthy Elections (EVT/WOTE ’10). USENIX. http://www.usenix.org/events/evtwote10/tech/full_papers/Stark.pdf. Retrieved April 20, 2011.

Zetter, K. (2005). Diebold hack hints at wider flaws. Wired, December. http://www.wired.com/politics/security/news/2005/12/69893?currentPage=all. Retrieved February 25, 2011.
