Auditing a collection of races simultaneously
A collection of races in a single election can be audited as a group by auditing a random sample of batches of ballots and combining observed discrepancies in the races represented in those batches in a particular way: the maximum across-race relative overstatement of pairwise margins (MARROP). A risk-limiting audit for the entire collection of races can be built on this ballot-based auditing using a variety of probability sampling schemes. The audit controls the familywise error rate (the chance that one or more incorrect outcomes fails to be corrected by a full hand count) at a cost that can be lower than that of controlling the per-comparison error rate with independent audits. The approach is particularly efficient if batches are drawn with probability proportional to a bound on the MARROP (PPEB sampling).
💡 Research Summary
The paper tackles the problem of auditing multiple contests that appear on the same ballot in a single election. Traditional risk‑limiting audits (RLAs) treat each contest independently, assigning a separate error‑probability budget to each. While this per‑comparison approach can guarantee that any single contest’s outcome is correct with high confidence, it does not directly control the family‑wise error rate (FWER)—the probability that at least one contest is incorrectly certified. Moreover, when many contests are present, independent audits quickly become costly because each requires its own sample size and hand‑count trigger.
To overcome these limitations the authors introduce an aggregate statistic: the maximum across-race relative overstatement of pairwise margins (MARROP). For any sampled batch of ballots, MARROP is the largest ratio, across all contests in that batch, of the observed overstatement of a pairwise margin to the reported margin of that contest. Formally, if $m_r$ denotes the reported margin for contest $r$ and $\delta_{br}$ the observed overstatement in batch $b$, then the batch’s MARROP is $\max_r (\delta_{br}/m_r)$. The key insight is that an outcome can be wrong only if the overstatements in some contest add up, over all batches, to at least that contest’s margin, in which case the MARROP summed over all batches is at least 1. The audit therefore tests the single hypothesis that the total MARROP is 1 or more: if that hypothesis is rejected at risk limit $\alpha$, the chance that the audit stops short of a full hand count when any outcome is wrong is at most $\alpha$. In other words, MARROP collapses the multivariate audit problem into a single hypothesis test that directly bounds the FWER.
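As a rough illustration of the definition, the sketch below computes the MARROP of one batch. It assumes a single margin per contest, following the summary’s notation; the function name, contest labels, and numbers are hypothetical and not taken from the paper.

```python
from typing import Dict


def batch_marrop(overstatements: Dict[str, float], margins: Dict[str, float]) -> float:
    """Largest relative overstatement among the contests present in one batch.

    overstatements: contest -> observed overstatement of the margin in this batch
    margins:        contest -> contest-wide margin (assumed positive)
    """
    # Only contests that appear in the batch contribute; an empty batch gives 0.
    return max((overstatements[r] / margins[r] for r in overstatements), default=0.0)


# Hypothetical example: three contests on the ballots in one batch.
margins = {"mayor": 500, "council": 120, "measure_a": 2000}
batch_overstatements = {"mayor": 5, "council": 3, "measure_a": 10}

marrop = batch_marrop(batch_overstatements, margins)
print(f"Batch MARROP: {marrop:.3f}")
```

The council contest dominates here even though its raw overstatement is the smallest, because its margin is the narrowest.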
The audit design proceeds in two stages. First, batches are selected using probability‑proportional‑to‑error‑bound (PPEB) sampling. For each batch the analyst computes an upper bound $U_b$ on the MARROP that could arise from that batch (based on the smallest contest margin and the maximum number of votes that could be mis‑recorded). The sampling probability for batch $b$ is then set to $w_b = U_b / \sum_{b'} U_{b'}$. This strategy concentrates effort on the batches most capable of inflating MARROP, thereby reducing the expected number of draws needed to achieve the risk limit.
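The selection step can be sketched as follows. This is a minimal illustration that assumes independent draws with replacement; the batch labels, bound values, and function names are hypothetical.

```python
import random
from typing import Dict, List


def ppeb_weights(bounds: Dict[str, float]) -> Dict[str, float]:
    """Sampling probability w_b = U_b / sum of all U_b."""
    total = sum(bounds.values())
    return {batch: u / total for batch, u in bounds.items()}


def draw_ppeb(bounds: Dict[str, float], n_draws: int, seed: int) -> List[str]:
    """Draw batches with probability proportional to their error bounds.

    Assumption: draws are independent and with replacement, so a batch
    can be selected more than once.
    """
    rng = random.Random(seed)  # a fixed seed keeps the draw reproducible
    batches = list(bounds)
    return rng.choices(batches, weights=[bounds[b] for b in batches], k=n_draws)


# Hypothetical error bounds U_b for three batches.
bounds = {"precinct_1": 4.0, "precinct_2": 3.0, "precinct_3": 1.0}
weights = ppeb_weights(bounds)
sample = draw_ppeb(bounds, n_draws=5, seed=12345)
print(weights)
print(sample)
```

In a real audit the seed would come from a public, verifiable source of randomness rather than a hard-coded constant.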
Second, after each draw the actual overstatement $\delta_{br}$ is measured, the batch’s MARROP is calculated, and a cumulative test statistic (often a simple sum or a sequential probability ratio) is updated. If at any point the statistic exceeds the critical value corresponding to $\alpha$, the audit escalates to a full hand count of all ballots; otherwise the audit stops early, certifying all contests. Because the test is sequential, the audit can terminate as soon as sufficient evidence of correctness accumulates, often after examining only a small fraction of the total ballots.
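The summary does not say which sequential statistic is used. One concrete possibility, shown here only as an assumption, is the Kaplan-Markov P-value for PPEB draws with replacement: each draw multiplies a running product by $(1 - 1/U)/(1 - t_j)$, where $U$ is the sum of all batch bounds and $t_j$ is the taint (observed MARROP divided by the batch’s bound) of draw $j$. In this sketch the audit certifies once the P-value drops to $\alpha$ or below and escalates if the planned draws run out first. All names and numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def kaplan_markov_p_values(taints: Sequence[float], total_bound: float) -> List[float]:
    """Running P-values for the hypothesis 'total MARROP >= 1'.

    taints:      observed MARROP / error bound for each drawn batch, in draw order
    total_bound: U, the sum of the error bounds over all batches
    """
    if total_bound <= 1.0:
        raise ValueError("total_bound must exceed 1 for the test to be informative")
    product = 1.0
    running = []
    for t in taints:
        if t >= 1.0:
            product = float("inf")  # a fully tainted batch rules out certification
        else:
            product *= (1.0 - 1.0 / total_bound) / (1.0 - t)
        running.append(min(product, 1.0))
    return running


def audit_decision(taints: Sequence[float], total_bound: float, alpha: float) -> Tuple[str, int]:
    """Certify at the first draw where the P-value is <= alpha, else escalate."""
    p_values = kaplan_markov_p_values(taints, total_bound)
    for n, p in enumerate(p_values, start=1):
        if p <= alpha:
            return ("certify", n)
    return ("escalate", len(p_values))


# Hypothetical audit: total bound U = 20, risk limit 10%, no discrepancies found.
decision = audit_decision([0.0] * 100, total_bound=20.0, alpha=0.10)
print(decision)
```

With no discrepancies, the number of draws needed depends only on $U$ and $\alpha$, which is why tighter error bounds translate directly into smaller samples.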
The authors provide a rigorous mathematical proof that the MARROP‑based audit controls the FWER at the desired level, assuming the error bounds $U_b$ are valid. They also compare the MARROP approach to independent audits that each control the per‑comparison error rate (PCER). To reach the same familywise guarantee, independent per‑contest audits must apply a Bonferroni‑type correction (splitting the overall risk budget among contests), which inflates the required sample size. MARROP needs only a single risk budget, and the savings grow with the number of contests.
Empirical evaluation uses both synthetic election data and real‑world precinct‑level data from U.S. local elections. The results show that PPEB sampling combined with MARROP typically reduces the expected number of audited batches by 30–50% relative to uniform random sampling, while still achieving the same overall risk limit. In scenarios with many low‑margin contests, the efficiency gain is even more pronounced because the error‑bound‑driven sampling automatically targets the few batches that could jeopardize any contest’s margin.
Implementation considerations are discussed in depth. Defining batches can follow natural administrative boundaries (e.g., precincts, voting machines, or time slices). Computing the MARROP bound $U_b$ requires knowledge of each contest’s smallest reported margin and the maximum number of votes that could be mis‑assigned within a batch; these quantities are usually available from the official canvass. The audit protocol also specifies how to handle ties, how to update the cumulative statistic after each draw, and how to document the process for transparency and reproducibility.
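A deliberately crude way to obtain such a bound is sketched below. It assumes vote-for-one contests, in which miscounting a single ballot can overstate a margin by at most 2 votes, so a batch of $n_b$ ballots can overstate any margin by at most $2 n_b$. This is an illustrative assumption rather than the paper’s formula; sharper bounds would also use the reported batch-level vote counts. Names and numbers are hypothetical.

```python
from typing import Dict


def batch_error_bound(n_ballots: int, margins_in_batch: Dict[str, float]) -> float:
    """Conservative upper bound U_b on the MARROP a batch could contribute.

    Assumption: every contest is vote-for-one, so each ballot can overstate
    a pairwise margin by at most 2 votes.
    """
    max_overstatement = 2 * n_ballots
    # The contest with the smallest margin drives the bound.
    return max((max_overstatement / m for m in margins_in_batch.values()), default=0.0)


# Hypothetical batch of 400 ballots that carries two contests.
u_b = batch_error_bound(400, {"mayor": 500, "council": 160})
print(f"U_b = {u_b}")
```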
In conclusion, the paper presents a coherent, statistically sound framework for simultaneously auditing a collection of races. By aggregating discrepancies through MARROP and focusing sampling effort via PPEB, election officials can guarantee that the probability of certifying any incorrect outcome stays below a pre‑chosen threshold, while often auditing far fewer ballots than would be required under independent per‑contest RLAs. This methodology promises to make large‑scale, multi‑contest elections more cost‑effective without sacrificing the rigorous risk‑limiting guarantees that modern election integrity standards demand.