Randomization Inference for the Always-Reporter Treatment Effect

Haoge Chang (Department of Economics, Columbia University, hc3516@columbia.edu)
Zeyang Yu (Department of Politics, Princeton University, arthurzeyangyu@princeton.edu)

March 27, 2026

Abstract. This article studies randomization inference for treatment effects in randomized controlled trials with attrition, where outcomes are observed for only a subset of units. We assume monotonicity in reporting behavior as in [18] and focus on the average treatment effect for always-reporters (AR-ATE), i.e., units whose outcomes are observed under both treatment and control. Because always-reporter status is only partially revealed by observed assignment and response patterns, we propose a worst-case randomization test that maximizes the randomization p-value over all always-reporter configurations consistent with the data, with an optional pretest to prune implausible configurations. Using studentized Hajek- and chi-square-type statistics, we show the resulting procedure is finite-sample valid for the sharp null and asymptotically valid for the weak null. We also discuss computational implementations for discrete outcomes and integer-programming-based bounds for continuous outcomes.

1 Introduction

Sample attrition, in which some units' outcomes are unobserved after randomization, is common in field experiments and can introduce selection bias if attrition is systematically related to potential outcomes. A popular framework for addressing the attrition problem, since [18], imposes a monotonicity assumption under which treatment affects reporting behavior in only one direction. Under this assumption, the observed outcome distributions imply bounds on the average treatment effect of always-reporters (AR-ATE hereafter), the units who would report outcomes regardless of treatment assignment.

Footnote 1: See also [31].
Footnote 2: Other common approaches to addressing the sample attrition problem include worst-case bounds [10] and inverse-probability weighting under a conditional ignorability assumption [23].

Existing methods for conducting inference on the AR-ATE under monotonicity rely on asymptotic approximations [18, 11, 28]. This paper proposes an alternative randomization-based testing procedure for inference on the AR-ATE. The primary advantage of our approach is that it delivers strong finite-sample guarantees under the sharp null hypothesis (e.g., no treatment effect for any always-reporter), while maintaining asymptotic validity when treatment effects may be heterogeneous. Specifically, our procedure is finite-sample valid for testing the sharp null hypothesis and remains asymptotically valid under the weak null hypothesis that the average treatment effect for always-reporters is zero.

The principle that randomization inference based on properly chosen (e.g., studentized) statistics delivers the dual guarantees described above has long been known (see the related literature section below). This paper applies this principle to randomization inference in the presence of sample attrition. However, the structure of our problem differs substantially from previously studied settings. In particular, the target subpopulation (always-reporters) is unobservable, and the outcome distribution of always-reporters cannot be recovered exactly even under the sharp null hypothesis.

To address this challenge, we adopt a worst-case p-value approach and consider a worst-case randomization test that maximizes the randomization p-value over all always-reporter configurations consistent with the observed data.
Importantly, our model implies two balance conditions that can be tested under the sharp null hypothesis: (i) balance of outcomes, as in standard randomization-inference settings, and (ii) balance in the number of always-reporters between the treatment and control groups. The second balance condition can be used to construct a two-step testing procedure or combined with the first condition to yield a chi-square-type test statistic. This stands in contrast to the existing literature, where the sharp null hypothesis typically implies only a balance condition on the outcomes. The second balance condition follows from the observation that, while always-reporters cannot be individually identified, random assignment implies that their expected counts are equal across the treatment and control groups.

The worst-case p-value approach delivers finite-sample valid inference under the sharp null hypothesis via a simple argument. Establishing asymptotic validity under mild conditions, however, requires a more refined analysis than in the existing literature. In particular, we show that our procedure is uniformly asymptotically valid over a parameter region in which the differential attrition rate between the treatment and control groups may be negligible. This empirically relevant region is generally ruled out by the analysis of existing procedures [18].

Footnote 3: A notable exception is [25], which proposes a pretesting-based approach.

Incorporating this region requires a refined conditional analysis, as it may induce nonstandard asymptotic distributions for the component of the statistic that tests balance in the number of always-reporters, depending on the particular sequence of parameters under consideration.

In addition to the statistical results, we provide computationally feasible algorithms for implementing the proposed statistical procedure. For outcomes
with finite support (e.g., binary or categorical), we exploit the symmetry of the randomization distribution to compute p-values via exhaustive enumeration. For continuous outcomes, we reformulate the p-value computation as a collection of smaller optimization problems, each solved using integer linear programming.

The remainder of the paper is organized as follows. Section 1.1 discusses related literature, Section 2 describes the setup and standard assumptions, Section 3 introduces our statistical procedures, Section 4 states additional assumptions and establishes the statistical guarantees, Section 5 discusses computational implementation, and Section 6 presents simulation results.

1.1 Related Literature

Asymptotic inference methods for (trimming) bounds have been studied in [11, 18, 28, 24, 25]. Randomization inference and design-based inference with sample attrition or missing outcomes has been examined in [20, 13, 9, 8, 19]. To our knowledge, none of the existing papers establishes the dual guarantee in the monotone attrition framework as provided by our procedure. [17] also consider randomization-based procedures for weak null hypotheses; their approach requires user-specified a priori bounds on the outcome support. More generally, heterogeneity-robust permutation/randomization inference is studied in [21], [14], [15], [16], [4], [3], [30], [32], [5], [29], and [1]. The primary difference between our paper and this literature is discussed in the introduction.

2 Setup, Dataset, and Statistical Framework

2.1 Notation and Identification Assumptions

We consider an experiment with $n$ units, where each unit is randomized to the treatment group or the control group. Let $y_i(d)$ denote the potential outcome of unit $i$ under treatment status $d \in \{0, 1\}$. Let $r_i(d)$ denote the reporting status of the $i$-th unit under treatment status $d$.
We write $r_i(1) = 1$ if the $i$-th unit, when assigned to treatment, is present in the follow-up survey, and $r_i(1) = 0$ if it is not. The reporting status $r_i(0)$ is defined analogously. As in [18], we make the following assumption on the reporting status.

Assumption 1 (Monotonicity). For all $i \in [n]$, we have $r_i(1) \geq r_i(0)$.

The monotonicity assumption rules out the existence of units who would report if untreated but not if treated. We note that the direction of the monotonicity is not critical here. With slight modification, the same identification argument and statistical procedure work if one assumes that $r_i(1) \leq r_i(0)$ for all $i \in [n]$. The key requirement is that treatment affects reporting status in the same direction for all units.

Given Assumption 1, a unit $i$ belongs to one of three principal strata based on its potential reporting status. Specifically, it can be:

• an always-reporter if $r_i(1) = 1$ and $r_i(0) = 1$,
• an if-reporter if $r_i(1) = 1$ and $r_i(0) = 0$, or
• a never-reporter if $r_i(1) = 0$ and $r_i(0) = 0$.

We refer to the sets of always-reporters, if-reporters, and never-reporters as the principal reporting strata [6]. They are the groups defined with respect to the potential reporting status. Let $\mathcal{A}$ denote the set of always-reporters, defined as:

$$\mathcal{A} = \{ i \in [n] : r_i(0) = 1, \ r_i(1) = 1 \}. \quad (1)$$

The parameter of interest is the average treatment effect (ATE) of the always-reporters, defined as:

$$\tau_{\mathcal{A}} = \frac{1}{|\mathcal{A}|} \sum_{i \in \mathcal{A}} \big( y_i(1) - y_i(0) \big), \quad (2)$$

where we implicitly assume that $|\mathcal{A}| \geq 1$ so that the quantity is well-defined. We refer to this parameter as the always-reporter average treatment effect (AR-ATE). Under Assumption 1 and using the argument in [18], one can establish that the AR-ATE is partially identified and a sharp bound on the AR-ATE can be derived.
In particular, the logic of [18] implies:

(i) the fraction of always-reporters, $\pi_{AR}$, is identified by the reporting rate in the control group;

(ii) the fraction of if-reporters, $\pi_{IR}$, is identified by the differential reporting rate between the treated and control groups;

(iii) the average untreated outcome for always-reporters is identified by the average outcome among reporters in the control group;

(iv) the fraction of if-reporters among the reporters in the treated group is $\pi_{IR \mid r(1)=1} = \pi_{IR} / (\pi_{AR} + \pi_{IR})$;

(v) the average treated outcome for always-reporters, which is partially identified, is bounded between the upper-$\pi_{IR \mid r(1)=1}$ and lower-$\pi_{IR \mid r(1)=1}$ trimmed means of the treated outcomes among reporters in the treated group, with estimable sample analogues.

Footnote 4: For if-reporters, we never observe their control outcomes; for never-reporters, we observe neither their treated nor their control outcomes. For the ATE parameters of these groups, it should be clear that we can do no better than worst-case bounds under a bounded outcome assumption.

Footnote 5: For example, see Section 7.4 of [7].

Footnote 6: Let $\{x_{(i)}\}_{i=1}^{n}$ with $x_{(1)} \leq \cdots \leq x_{(n)}$ denote the order statistics of $n$ real numbers and let $s \in [0, 1]$. The upper-$s$ trimmed mean of $\{x_{(i)}\}_{i=1}^{n}$ is $\frac{1}{n - \lfloor sn \rfloor} \sum_{i=1}^{n - \lfloor sn \rfloor} x_{(i)}$ and the lower-$s$ trimmed mean of $\{x_{(i)}\}_{i=1}^{n}$ is $\frac{1}{n - \lfloor sn \rfloor} \sum_{i=\lfloor sn \rfloor + 1}^{n} x_{(i)}$.

For statistical inference on the identified set, [18] uses asymptotic methods. In this paper, we retain the identification framework of [18] but replace asymptotic methods with a randomization-based inferential approach.

2.2 Experimental Design and Data Generating Process

We assume that the experiments under consideration are completely randomized: among $n$ units, researchers assign $n_1$ units to treatment, chosen uniformly at random.
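A draw from such a completely randomized design can be sketched as follows (a minimal illustration; the function and variable names are ours, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_complete_randomization(n: int, n1: int) -> np.ndarray:
    """Return a 0/1 assignment vector with exactly n1 treated units,
    chosen uniformly at random (complete randomization, CR(n, n1))."""
    d = np.zeros(n, dtype=int)
    treated = rng.choice(n, size=n1, replace=False)
    d[treated] = 1
    return d

d = draw_complete_randomization(10, 4)
# every draw has exactly n1 = 4 treated units
```

Unlike Bernoulli randomization, every realization has exactly $n_1$ treated units, which is what makes the randomization distributions below depend only on which units are treated, not on how many.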
For each unit $i$, we let $D_i$ be a random variable such that $D_i = 1$ if unit $i$ is assigned to treatment, and $D_i = 0$ otherwise. We note that $\mathrm{P}(D_i = 1) = n_1 / n$. We denote a random assignment vector following a complete randomization of $n_1$ out of $n$ units as $D = (D_i)_{i=1}^{n} \sim \mathrm{CR}(n, n_1)$.

A sample of $n$ units is associated with an unknown table of potential outcomes and potential reporting statuses $((y_i(1), y_i(0), r_i(1), r_i(0)))_{i=1}^{n}$. The units are randomly assigned to treatment according to a realization of the assignment vector $D^{obs} = (D_i^{obs})_{i=1}^{n} \sim \mathrm{CR}(n, n_1)$. We observe the reporting status of all units, given by $R_i = r_i(D_i)$. For reported units with $R_i = 1$, we observe their outcomes $Y_i = y_i(D_i) \in \mathbb{R}$. For units with $R_i = 0$, we do not observe their outcomes, and we denote their outcomes as NA. The observed dataset hence consists of a set of outcome-assignment-reporting-status triples

$$\mathcal{D} = \{ (Y_i, D_i^{obs}, R_i) \}_{i=1}^{n} \in (\mathbb{R} \cup \{\mathrm{NA}\})^n \times \{0,1\}^n \times \{0,1\}^n. \quad (3)$$

We will also write $\mathcal{D} = (Y, D^{obs}, R)$, where $Y = (Y_i)_{i=1}^{n}$, $D^{obs} = (D_i^{obs})_{i=1}^{n}$, and $R = (R_i)_{i=1}^{n}$, if needed.

We adopt the finite-population (design-based) framework, treating the potential outcomes and potential reporting statuses as fixed parameters [12]. The sole source of randomness in our model is the vector of random treatment assignments, and statistical uncertainty is evaluated exclusively with respect to this randomness.

2.3 Problem Statement

With the setup above, we now state our problem. Given a sample of $n$ units and the set of always-reporters $\mathcal{A}$ as defined in (1), we are interested in designing statistical inferential procedures that are valid as a test (but with different guarantees) for both the sharp null hypothesis:

$$H_0^s : y_i(1) = y_i(0), \quad \forall i \in \mathcal{A}, \quad (4)$$

and the weak null hypothesis:

$$H_0^w : \frac{1}{|\mathcal{A}|} \sum_{i \in \mathcal{A}} y_i(1) = \frac{1}{|\mathcal{A}|} \sum_{i \in \mathcal{A}} y_i(0). \quad (5)$$
We shall propose below procedures that are finite-sample valid for testing the sharp null hypothesis and asymptotically valid for the weak null hypothesis.

3 Statistical Procedure

We first give a heuristic explanation of our inferential procedure. We denote $A_i = 1$ if unit $i$ is an always-reporter and $A_i = 0$ otherwise. Denote the binary vector of always-reporter indicators as $A = (A_i)_{i=1}^{n} \in \{0,1\}^n$. We note that $A$ is unknown to the researchers and is only partially revealed by the realized assignments (see the discussion in Section 3.1).

If we knew the set of always-reporters, randomization inference under the sharp null hypothesis would be straightforward. For example, we could use the (possibly studentized) absolute value of the difference-in-means statistic

$$\hat{\tau}(\tilde{D}, Y, A) = \left| \frac{\sum_{i=1}^{n} \tilde{D}_i A_i Y_i}{\sum_{i=1}^{n} \tilde{D}_i A_i} - \frac{\sum_{i=1}^{n} (1 - \tilde{D}_i) A_i Y_i}{\sum_{i=1}^{n} (1 - \tilde{D}_i) A_i} \right|, \quad (6)$$

as our test statistic, where $\tilde{D} = (\tilde{D}_i)_{i=1}^{n} \in \{0,1\}^n$ is an arbitrary treatment assignment, $Y = (Y_i)_{i=1}^{n}$ is the vector of (possibly missing) observed outcomes, and $A$ is the binary vector of always-reporter indicators defined above. We can obtain the randomization distribution of the statistic under the sharp null hypothesis and reject the sharp null hypothesis (4) if the p-value $p(A)$ is less than a pre-specified level $\alpha$. A standard argument [12] shows that this test is a finite-sample valid level-$\alpha$ test of the sharp null hypothesis. However, this procedure is infeasible because it relies on knowledge of the set of always-reporters, which is unknown in practice.

To address this difficulty, we take a worst-case approach: we calculate the worst-case p-value as the maximum over all randomization p-values based on sets of always-reporters consistent with the observed data.
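For a known indicator vector $A$, the infeasible randomization test described above can be sketched as follows (a minimal Monte Carlo illustration with made-up data; function and variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)

def diff_in_means(d, y, a):
    """Absolute difference in means of observed outcomes among units
    with A_i = 1, as in equation (6)."""
    treat = (d == 1) & (a == 1)
    ctrl = (d == 0) & (a == 1)
    return abs(y[treat].mean() - y[ctrl].mean())

def randomization_pvalue(d_obs, y, a, n_draws=2000):
    """Monte Carlo approximation of the randomization p-value p(A):
    the share of CR(n, n1) draws whose statistic is at least as large
    as the observed one."""
    n, n1 = len(d_obs), int(d_obs.sum())
    t_obs = diff_in_means(d_obs, y, a)
    count = 0
    for _ in range(n_draws):
        d = np.zeros(n, dtype=int)
        d[rng.choice(n, size=n1, replace=False)] = 1
        if diff_in_means(d, y, a) >= t_obs:
            count += 1
    return count / n_draws

# Toy data: 20 units, all always-reporters, no treatment effect.
n = 20
y = rng.normal(size=n)
a = np.ones(n, dtype=int)
d_obs = np.zeros(n, dtype=int)
d_obs[rng.choice(n, size=10, replace=False)] = 1
p = randomization_pvalue(d_obs, y, a)
```

The feasible procedure in the paper replaces the single known vector `a` with a maximization over all vectors consistent with the data.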
We explain how to construct such sets of always-reporters in Section 3.1. For now, let $\mathbb{A}(D^{obs}, R)$ be the set of sets of always-reporters that are consistent with the observed data. Formally, the worst-case p-value is defined as

$$p^{worst} = \sup_{A \in \mathbb{A}(D^{obs}, R)} p(A). \quad (7)$$

For statistical testing, we reject the sharp null hypothesis if $p^{worst} \leq \alpha$. The statistical inferential algorithm is detailed in Algorithm 1. Given a sample size $n$, we denote an arbitrary test statistic as

$$T(\cdot, \cdot, \cdot) : (\mathbb{R} \cup \{\mathrm{NA}\})^n \times \{0,1\}^n \times \{0,1\}^n \to [-\infty, \infty], \quad (8)$$

which is a function mapping (possibly missing) observed outcomes, treatment assignments, and always-reporter indicators to the extended real line. We discuss the choice of test statistics in Section 3.2.

Remark 3.1. In practice, one does not need to iterate over all possible reporting tables. One can terminate the algorithm and output 0 as soon as a single p-value is above $\alpha - \beta$.

Algorithm 1: Randomization Inference for the AR-ATE

1: Input: dataset $\mathcal{D} = (Y, D^{obs}, R)$, where $Y = (Y_i)_{i=1}^{n} \in (\mathbb{R} \cup \{\mathrm{NA}\})^n$, $D^{obs} = (D_i^{obs})_{i=1}^{n} \in \{0,1\}^n$, and $R = (R_i)_{i=1}^{n} \in \{0,1\}^n$; a test statistic $T(\cdot, \cdot, \cdot)$ as defined in (8); pre-test significance level $\beta \in [0, 1]$.
2: Step 1: Compute the set of possible reporting tables $\mathbb{A}(D^{obs}, R)$ as in (10).
3: Step 2: Prune $\mathbb{A}(D^{obs}, R)$ with Algorithm 2 at significance level $\beta$.
4: Step 3: For each possible reporting table $A = (A_i)_{i=1}^{n} \in \mathbb{A}(D^{obs}, R)$, calculate the p-value

$$p(A) = \mathbb{E}_{D \sim \mathrm{CR}(n, n_1)} \left[ \mathbb{1}\{ T(Y, D, A) \geq T(Y, D^{obs}, A) \} \right], \quad (9)$$

where $Y$, $D^{obs}$, and $A$ are fixed and the expectation is taken with respect to the random binary vector $D = (D_i)_{i=1}^{n}$, which is generated by complete randomization (selecting $n_1$ of $n$ units without replacement).
5: Step 4: Collect the p-values for all possible reporting tables $\{p(A)\}_{A \in \mathbb{A}(D^{obs}, R)}$ and output 1 (reject) if $\max_{A \in \mathbb{A}(D^{obs}, R)} p(A) \leq \alpha - \beta$; otherwise output 0 (fail to reject).

In what follows, Section 3.1 describes the construction of all reporting tables compatible with the observed assignments and reporting statuses. Section 3.2 introduces several test statistics.

3.1 Assign units to principal reporting strata

To construct the worst-case p-value, we first need to construct the set of sets of always-reporters that are consistent with the observed data. Recall that $R_i$ is the reporting status of the $i$-th unit and $D_i$ is its treatment assignment. Conditioning on treatment assignments, it is possible to assign some units to principal reporting strata based on realized assignments and reporting statuses:

• If $D_i = 0$ and $R_i = 1$, the unit must be an always-reporter;
• If $D_i = 0$ and $R_i = 0$, the unit could be an if-reporter or a never-reporter, and it cannot be an always-reporter;
• If $D_i = 1$ and $R_i = 1$, the unit could be an if-reporter or an always-reporter;
• If $D_i = 1$ and $R_i = 0$, the unit must be a never-reporter.

The toy example in Table 1 illustrates the attribution procedure. Note that the always-reporters in the control group can be identified exactly. The ambiguity comes from the treated units with $D_i = 1$ and $R_i = 1$, which are a mixture of always-reporters and if-reporters. Hence the set of sets of always-reporters that are consistent with the data can be described as:

$$\mathbb{A}(D^{obs}, R) = \left\{ A \in \{0,1\}^n : \begin{array}{ll} A_i = 1, & \text{if } D_i = 0, R_i = 1 \\ A_i = 0, & \text{if } D_i = 0, R_i = 0 \\ A_i = 0, & \text{if } D_i = 1, R_i = 0 \\ A_i \in \{0,1\}, & \text{if } D_i = 1, R_i = 1 \end{array} \right\} \quad (10)$$

index  r_i(1)  r_i(0)  D_i  R_i  AR?  True Principal Stratum
1      1       1       1    1    ?    always-reporter
2      1       0       1    1    ?    if-reporter
3      0       0       1    0    NO   never-reporter
4      1       1       0    1    YES  always-reporter
5      1       0       0    0    NO   if-reporter
6      0       0       0    0    NO   never-reporter

Table 1: The column index lists the indices of the units. The column $r_i(1)$ indicates the reporting status if treated. The column $r_i(0)$ indicates the reporting status if untreated. The column $D_i$ shows the observed treatment assignment. The column $R_i$ shows the observed reporting status. The column AR? indicates whether the unit is known to be an always-reporter. The column True Principal Stratum denotes the true (but unknown) principal reporting stratum to which the unit belongs.

We shall call each $A \in \mathbb{A}(D^{obs}, R)$ a reporting table, which is a vector of always-reporter indicators.

3.1.1 Refine sets of always-reporters with pretesting

Based on the observed missingness pattern, we can pretest the number of always-reporters in the data. Because the treatment assignment is randomized independently of the reporting status, the fractions of always-reporters in the treatment and control groups should on average be the same. This argument allows one to prune probabilistically improbable tables in $\mathbb{A}(D^{obs}, R)$ that contain too many or too few always-reporters. This is an application of the Berger-Boos procedure in our setting [2]. For example, given a table $A \in \mathbb{A}(D^{obs}, R)$, define the test statistic:

$$\hat{\tau}_n^a(D, A) = \frac{1}{n_1} \sum_{i=1}^{n} D_i A_i - \frac{1}{n_0} \sum_{i=1}^{n} (1 - D_i) A_i, \quad (11)$$

and we reject a table if the p-value based on its randomization distribution is smaller than $\beta$. The detailed procedure is included in Algorithm 2. Other pretesting procedures, such as those based on tail bounds or test inversions [22], can also be employed.

Footnote 7: Note that the randomization distribution of $\hat{\tau}_n^a(D, A)$ depends only on the number of always-reporters in $A$ because of the symmetry of complete randomization. Hence, all tables with the same number of always-reporters yield identical rejection decisions.
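The constraints in (10), together with the symmetry noted in Footnote 7, suggest enumerating one representative table per count of always-reporters rather than all $2^{r_1}$ tables. A minimal sketch (our names; uses the Table 1 data, with units 0-indexed):

```python
import numpy as np

def candidate_tables_by_count(d_obs, r):
    """One representative reporting table consistent with (10) for each
    possible count of always-reporters among treated reporters.
    Units with D=0, R=1 are forced to A=1; units with D=1, R=1 are
    free; all other units have A=0."""
    forced = (d_obs == 0) & (r == 1)
    free = np.flatnonzero((d_obs == 1) & (r == 1))
    tables = {}
    for k in range(len(free) + 1):
        # by symmetry of CR(n, n1), one representative per count suffices
        a = np.where(forced, 1, 0)
        a[free[:k]] = 1
        tables[k] = a
    return tables

# Table 1 example: units 1-3 treated, units 1, 2, 4 report.
d_obs = np.array([1, 1, 1, 0, 0, 0])
r = np.array([1, 1, 0, 1, 0, 0])
tables = candidate_tables_by_count(d_obs, r)
```

Here the two treated reporters generate counts $k \in \{0, 1, 2\}$, so three representative tables replace the four binary vectors allowed by (10).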
This observation can substantially reduce the computational burden.

Algorithm 2: Prune $\mathbb{A}(D^{obs}, R)$ based on the number of always-reporters at level $\beta$

1: Input: $\{(R_i, D_i^{obs})\}_{i=1}^{n}$; compatible reporting tables $\mathbb{A}(D^{obs}, R)$ as defined in (10); significance level $\beta \in (0, 1)$.
2: Step 1: Calculate $n_{1A} = \sum_{i=1}^{n} R_i D_i^{obs}$.
3: for $k \leftarrow 0 : n_{1A}$ do
4:   Select an arbitrary $A \in \mathbb{A}(D^{obs}, R)$ such that $\sum_{i=1}^{n} D_i^{obs} A_i = k$.
5:   Given the selected $A$, calculate

$$p(k) = \mathbb{E}_{D \sim \mathrm{CR}(n, n_1)} \left[ \mathbb{1}\{ \hat{\tau}_n^a(D^{obs}, A) \geq \hat{\tau}_n^a(D, A) \} \right], \quad (12)$$

where $A$ is fixed and the expectation is taken with respect to the random binary vector $D = (D_i)_{i=1}^{n}$, which is generated by complete randomization (selecting $n_1$ of $n$ units without replacement).
6: end for
7: Step 2: Calculate the set $N_A = \{ k \in [0, n_{1A}] : p(k) \geq \beta \}$.
8: Return the pruned set

$$\mathbb{A}^{prune}(D^{obs}, R) = \left\{ A \in \mathbb{A}(D^{obs}, R) : \sum_{i=1}^{n} D_i^{obs} A_i \in N_A \right\}. \quad (13)$$

3.2 Test Statistics

3.2.1 The Studentized Hajek Statistic

Given a reporting table $A$ that is consistent with the data, define the Hajek estimator $\hat{\tau}_n^{hj}(Y, D, A)$:

$$\hat{\tau}_n^{hj}(Y, D, A) = \frac{\sum_{i=1}^{n} D_i A_i Y_i}{\sum_{i=1}^{n} D_i A_i} - \frac{\sum_{i=1}^{n} (1 - D_i) A_i Y_i}{\sum_{i=1}^{n} (1 - D_i) A_i}, \quad (14)$$

and its variance estimator:

$$\hat{V}_n^{hj}(Y, D, A) = \frac{\sum_{i=1}^{n} D_i A_i \big( Y_i - \hat{\tau}_A^1 \big)^2}{\big( \sum_{i=1}^{n} D_i A_i \big)^2} + \frac{\sum_{i=1}^{n} (1 - D_i) A_i \big( Y_i - \hat{\tau}_A^0 \big)^2}{\big( \sum_{i=1}^{n} (1 - D_i) A_i \big)^2}, \quad (15)$$

where

$$\hat{\tau}_A^1 = \frac{\sum_{i=1}^{n} D_i A_i Y_i}{\sum_{i=1}^{n} D_i A_i}, \qquad \hat{\tau}_A^0 = \frac{\sum_{i=1}^{n} (1 - D_i) A_i Y_i}{\sum_{i=1}^{n} (1 - D_i) A_i}. \quad (16)$$

We define the absolute value of the studentized Hajek statistic as:

$$T_n^0(Y, D, A) = \left| \frac{\hat{\tau}_n^{hj}(Y, D, A)}{\sqrt{\hat{V}_n^{hj}(Y, D, A)}} \right|. \quad (17)$$
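Equations (14)-(17) translate directly into code. A minimal sketch (our variable names; NA outcomes are encoded as `np.nan`, which never enters the computation because non-reporters cannot have $A_i = 1$ in a consistent table):

```python
import numpy as np

def studentized_hajek(y, d, a):
    """Absolute studentized Hajek statistic T^0_n of equations (14)-(17).
    y: outcomes (np.nan where unobserved), d: assignments, a: AR indicators."""
    treat = (d == 1) & (a == 1)
    ctrl = (d == 0) & (a == 1)
    tau1 = y[treat].mean()           # treated mean among ARs, eq. (16)
    tau0 = y[ctrl].mean()            # control mean among ARs, eq. (16)
    tau_hj = tau1 - tau0             # Hajek estimator, eq. (14)
    v_hj = (((y[treat] - tau1) ** 2).sum() / treat.sum() ** 2
            + ((y[ctrl] - tau0) ** 2).sum() / ctrl.sum() ** 2)  # eq. (15)
    return abs(tau_hj / np.sqrt(v_hj))  # eq. (17)

y = np.array([1.0, 2.0, np.nan, 0.5, 1.5, np.nan])
d = np.array([1, 1, 1, 0, 0, 0])
a = np.array([1, 1, 0, 1, 1, 0])
t0 = studentized_hajek(y, d, a)  # -> 1.0 for this toy input
```

Studentization is what drives the dual guarantee: the statistic's randomization distribution then matches its sampling distribution asymptotically even under heterogeneous effects.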
3.2.2 The Chi-square Statistics

Apart from the balance of outcomes among always-reporters, we can include the balance of the number of always-reporters between the treated and control groups, leading to variations of Wald statistics. The variance of the statistic $\hat{\tau}_n^a$, denoted $V_n(A)$, is defined as

$$V_n(A) = \frac{n^2}{n_1 n_0 (n - 1)} \left[ \frac{1}{n} \sum_{i=1}^{n} A_i - \left( \frac{1}{n} \sum_{i=1}^{n} A_i \right)^2 \right]. \quad (18)$$

We now define two test statistics:

$$T_n^1(Y, D, A) = \big( T_n^0(Y, D, A) \big)^2 + \left( \frac{\hat{\tau}_n^a(D, A)}{\sqrt{V_n(A)}} \right)^2, \quad (19)$$

and

$$T_n^2(Y, D, A) = \big( T_n^0(Y, D, A) \big)^2 + \left( \left\lfloor \frac{\hat{\tau}_n^a(D, A)}{\sqrt{V_n(A)}} \right\rfloor_{-} \right)^2, \quad (20)$$

where we define $\lfloor x \rfloor_{-} = \max\{0, -x\}$ and adopt the convention $0/0 = 0$.

4 Statistical Guarantees

4.1 Randomization-based Procedures

Let $\Pi_n$ be a distribution of random assignment variables $\{D_i\}_{i=1}^{n}$ implementing a completely randomized design. We first define a general parameter space encoding Assumption 1:

$$\Theta_n = \Big\{ \big( \{ (y_i(1), y_i(0), r_i(1), r_i(0)) \}_{i=1}^{n}, \Pi_n \big) : r_i(1) \geq r_i(0), \ \forall i \in [n] \Big\}. \quad (21)$$

Note that each element $\theta_n \in \Theta_n$ is a combination of potential outcomes, potential reporting statuses, and an experimental design. Each element $\theta_n$ completely determines the distributions of our test statistics. For the asymptotic guarantee of testing the weak null hypothesis, we need the following assumptions on our parameter space.

Assumption 2 (Parameter Space). Let $n \in \mathbb{N}$ and constants $\delta \in [-1, 1)$, $B > 0$, and $s \in (0, 1]$ be given. Given a set of potential outcomes and reporting statuses $\{(y_i(1), y_i(0), r_i(1), r_i(0))\}_{i=1}^{n}$, define the always-reporter indicator for each unit $i$ as $A_i = 1$ if $r_i(1) = r_i(0) = 1$ and $A_i = 0$ otherwise. Define the set of always-reporters as $\mathcal{A} = \{ i : A_i = 1 \}$, and let $n_A = \sum_{i=1}^{n} A_i$ denote the number of always-reporters. The following conditions hold:

(i) For the given $s$, $n_A \geq sn$.
(ii) For the given $\delta$, we have

$$\sigma_{y(1), y(0), \mathcal{A}} \geq -\delta \cdot \sigma_{y(1), \mathcal{A}} \cdot \sigma_{y(0), \mathcal{A}}, \quad (22)$$

where $\sigma_{y(1), y(0), \mathcal{A}}$ is the always-reporter-group covariance of the centered potential outcomes,

$$\sigma_{y(1), y(0), \mathcal{A}} = \frac{1}{n_A - 1} \sum_{i \in \mathcal{A}} \big( y_i(1) - \mu_{y(1), \mathcal{A}} \big) \big( y_i(0) - \mu_{y(0), \mathcal{A}} \big), \quad (23)$$

with $\mu_{y(a), \mathcal{A}} = n_A^{-1} \sum_{i \in \mathcal{A}} y_i(a)$ for $a \in \{0, 1\}$, and $\sigma^2_{y(1), \mathcal{A}}$ and $\sigma^2_{y(0), \mathcal{A}}$ are the always-reporter-group potential outcome variances,

$$\sigma^2_{y(a), \mathcal{A}} = \frac{1}{n_A - 1} \sum_{i \in \mathcal{A}} \big( y_i(a) - \mu_{y(a), \mathcal{A}} \big)^2, \quad \text{for } a \in \{0, 1\}.$$

Both $\sigma^2_{y(1), \mathcal{A}}$ and $\sigma^2_{y(0), \mathcal{A}}$ are positive. If $n_A = 1$, all variances and covariances are defined to be zero.

(iii) For each $a \in \{0, 1\}$ and the given $B$,

$$\left( \frac{1}{n_A} \sum_{i \in \mathcal{A}} \big| y_i(a) - \mu_{y(a), \mathcal{A}} \big|^4 \right)^{1/4} \leq B \, \sigma_{y(a), \mathcal{A}}. \quad (24)$$

Remark 4.1. Condition (i) assumes that the always-reporters make up a non-negligible share of the sample. Condition (ii) rules out extreme correlations between the potential treated and control outcomes of the always-reporters, a common assumption in the finite-population setting. Condition (iii) is a standard technical condition that arises when applying the central limit theorem for triangular arrays, and may be thought of as excluding heavy-tailed or sparse data.

Assumption 3 (Completely Randomized Experiment). Given $n \in \mathbb{N}$ and a constant $0 < r \leq 1/2$, we have $n_1 \in [rn, (1 - r)n]$.

We denote the parameter space of potential outcomes, potential reporting statuses, and experimental designs satisfying Assumptions 1, 2, and 3 with $n$ units and constants $\delta$, $s$, $r$, and $B$ as $\Theta_n(\delta, s, r, B) \subset \Theta_n$. For the weak null hypothesis, we further define the parameter space:

$$\Theta_n^w(\delta, s, r, B) = \left\{ \theta \in \Theta_n(\delta, s, r, B) : \frac{1}{n_A} \sum_{i \in \mathcal{A}} y_i(1) = \frac{1}{n_A} \sum_{i \in \mathcal{A}} y_i(0) \right\}. \quad (25)$$

For the sharp null hypothesis, we define the parameter space:

$$\Theta_n^s = \Big\{ \theta \in \Theta_n : y_i(1) = y_i(0), \ \forall i \in \mathcal{A} \Big\}. \quad (26)$$
Note that the parameter space for the sharp null hypothesis need not satisfy Assumptions 2 and 3.

The following theorem states that, using the test statistics $T_n^0$, $T_n^1$, and $T_n^2$ introduced in (17), (19), and (20), the statistical procedure in Algorithm 1 provides a finite-sample valid level-$\alpha$ test of the sharp null hypothesis and an asymptotically valid level-$\alpha$ test of the weak null hypothesis.

Theorem 4.2. Consider the statistical procedure in Algorithm 1 with test statistics $T_n^0$, $T_n^1$, and $T_n^2$ defined in (17), (19), and (20). Fix a significance level $\alpha \in (0, 0.25]$ and a pre-testing level $\beta \in [0, \alpha)$. Let $\Theta_n^s$ be the parameter space of the sharp null hypothesis defined in (26). We have

$$\sup_{\theta_n^s \in \Theta_n^s} \mathrm{P}_{\theta_n^s} \big( p^{worst} \leq \alpha \big) \leq \alpha. \quad (27)$$

Fix $\delta < 1$, $s \in (0, 1]$, $r \in (0, 1/2]$, and $B > 0$. Let $\Theta_n^w(\delta, s, r, B)$ be the parameter space of the weak null hypothesis defined in (25). We have

$$\limsup_{n \to \infty} \sup_{\theta_n^w \in \Theta_n^w(\delta, s, r, B)} \mathrm{P}_{\theta_n^w} \big( p^{worst} \leq \alpha \big) \leq \alpha. \quad (28)$$

Remark 4.3. Our asymptotic uniform guarantee includes cases where the sample consists almost entirely of always-reporters and few (potentially zero) if-reporters. Other asymptotic methods generally rule out this portion of the parameter space [18] or rely on problem-specific tuning parameters [26]. Including this part of the parameter space necessitates a more involved mathematical analysis.

4.2 Asymptotic-Distribution-Based Procedures

We note that, as an intermediate step in the proof of Theorem 4.2, we derived the asymptotic distributions (up to the slackness introduced by the variance bounds) of the statistics $T_n^0$, $T_n^1$, and $T_n^2$ under the true but unknown always-reporter vector $A$. This result permits inference based on asymptotic critical values.
Procedures based on critical values obtained from asymptotic approximations do not provide the finite-sample guarantee in (27); they retain only the asymptotic guarantee in (31). However, as we discuss in Section 5, procedures based on asymptotic critical values are typically much easier to compute in practice. The inferential algorithm based on asymptotic critical values is presented in Algorithm 3.

We first define three functions of $n_{1A} = \sum_{i=1}^{n} D_i A_i$, given $n_1$, $n_0$, and $n_A$:

$$g_0(n_{1A}) = 0, \qquad g_1(n_{1A}) = V_n^{-1}(A) \left( \frac{n_{1A}}{n_1} - \frac{n_A - n_{1A}}{n_0} \right)^2, \quad (29)$$

$$g_2(n_{1A}) = V_n^{-1}(A) \left( \left\lfloor \frac{n_{1A}}{n_1} - \frac{n_A - n_{1A}}{n_0} \right\rfloor_{-} \right)^2, \quad (30)$$

where $V_n(A)$ is defined in (18) and we adopt the convention that $0/0 = 0$. These three functions correspond to the three statistics for balance in the number of always-reporters in the test statistics (17), (19), and (20).

Algorithm 3: Asymptotic Inference for the AR-ATE

1: Input: dataset $\mathcal{D} = (Y, D^{obs}, R)$ as in (3); test statistics $T_n^i(\cdot, \cdot, \cdot)$, $i \in \{0, 1, 2\}$, as in (17), (19), and (20); significance level $\alpha \in (0, 1)$; pre-test significance level $\beta \in [0, \alpha)$.
2: Step 1: Create the set of compatible reporting tables $\mathbb{A}(D^{obs}, R)$ as in (10).
3: Step 2: Prune $\mathbb{A}(D^{obs}, R)$ using Algorithm 2 at significance level $\beta$.
4: Step 3: Initialize Rej ← 1. (1 = reject, 0 = fail to reject)
5: Step 4: Enumerate the possible sizes of always-reporter vectors:

$$\text{Cardinality-}\mathbb{A} = \left\{ \sum_{i=1}^{n} A_i : A = (A_i)_{i=1}^{n} \in \mathbb{A}(D^{obs}, R) \right\}.$$

6: for each $k \in \text{Cardinality-}\mathbb{A}$ do
7:   Compute the $(1 - \alpha + \beta)$-quantile $q_{i, 1-\alpha+\beta}^{k}$ of $Z + g_i(n_{1A})$, where $Z \sim N(0, 1)$, $n_{1A} = \sum_{i=1}^{n} D_i A_i$, $D = (D_i)_{i=1}^{n} \sim \mathrm{CR}(n, n_1)$, and $Z$ is independent of $D$.
8:   Compute $T_i^{\max, k} = \max_{A \in \mathbb{A}(D^{obs}, R) : \sum_{i=1}^{n} A_i = k} T_n^i(Y, D^{obs}, A)$.
9:   if $T_i^{\max, k} \leq q_{i, 1-\alpha+\beta}^{k}$ then
10:    Set Rej ← 0.
11:  end if
12: end for
13: Step 5: return Rej.

Theorem 4.4.
Consider the statistical procedure in Algorithm 3. Fix a significance level $\alpha \in (0, 0.25]$ and a pre-testing level $\beta \in [0, \alpha)$. Fix $\delta < 1$, $s \in (0, 1]$, $r \in (0, 1/2]$, and $B > 0$. Let $\Theta_n^w(\delta, s, r, B)$ be the parameter space of the weak null hypothesis defined in (25). Define the event

$$E_n = \{ (Y, D^{obs}, R) : \text{Algorithm 3 returns Rej} = 1 \}.$$

Then we have

$$\limsup_{n \to \infty} \sup_{\theta_n^w \in \Theta_n^w(\delta, s, r, B)} \mathrm{P}_{\theta_n^w}(E_n) \leq \alpha. \quad (31)$$

Remark 4.5. We remark that the limiting distribution of $g_1(n_{1A})$ can depend on the sequence of always-reporter vectors. We give two simple examples here; other sequences are also possible and may yield different limits. If a sequence $(A^n)_{n=1}^{\infty}$ satisfies, for a constant $c < 1$,

$$\frac{1}{n} \sum_{i=1}^{n} A_i^n \leq c, \quad \forall n,$$

then $g_1(n_{1A})$ has a limiting chi-squared distribution with one degree of freedom. In contrast, if $\sum_{i=1}^{n} A_i^n = 1$ for all $n$, then $g_1(n_{1A})$ has a point mass at 0 as its limiting distribution. Both sequences are permitted by our parameter space. A similar comment applies to $g_2(n_{1A})$. Both the randomization-based procedure in Algorithm 1 and the asymptotic procedure in Algorithm 3 are agnostic to the particular sequence of always-reporter vectors, and both deliver uniformly asymptotically valid inference.

5 Implementations

5.1 Overview

In randomization inference, computing exact p-values by enumerating all possible treatment assignments is typically infeasible. Instead, p-values are usually approximated via Monte Carlo simulation, using a sufficiently large number of independent random assignments to approximate the randomization distribution.

Let $n_{mc}$ denote the number of Monte Carlo simulations. For each simulation $s \in \{1, \ldots, n_{mc}\}$, denote the $s$-th simulated assignment vector as $D^s = (D_i^s)_{i=1}^{n}$.
The worst-case p-value based on Monte Carlo draws can be represented as
\[ p^{worst,mc} = \max_{A \in \mathcal{A}(D^{obs}, R)} \frac{1}{n_{mc}} \sum_{s=1}^{n_{mc}} \mathbf{1}\left\{ T^i_n(Y, D^s, A) \ge T^i_n(Y, D^{obs}, A) \right\}, \tag{32} \]
with $T^i_n$, $i \in \{0, 1, 2\}$, defined in (17), (19) and (20) respectively, and $\mathcal{A}(D^{obs}, R)$ defined in (10).

The optimization problem (32) can be challenging to solve in practice. If the number of reporting treated units is $r_1 = \sum_{i=1}^n D^{obs}_i R_i$, then the set $\mathcal{A}(D^{obs}, R)$ will generically contain $2^{r_1}$ candidate always-reporter tables. Pre-testing the number of always-reporters in the treated group can shrink the search space, but typically not enough to make brute-force enumeration practical for moderately sized datasets (for example, even when $r_1 = 50$).

We discuss two computational approaches in this section, designed for different types of outcome variables. Section 5.2 considers the case where the observed outcome variables are discrete with small support; for example, this is the case when the outcome variables are binary, categorical, or count-valued. In such a setting, exhaustive enumeration is feasible even with a large sample size, after exploiting the symmetry of complete randomization.

Section 5.3 considers a second approach, which can be used with continuous outcomes. It decomposes (32) into smaller problems, each solvable using (linear) integer programming techniques. The resulting p-value, $p^{worst,IP,mc}$, satisfies $p^{worst,IP,mc} \ge p^{worst,mc}$, hence enabling valid yet possibly conservative inference. In our simulations, we did not observe a major loss of power due to such relaxations, provided each subproblem is tight enough. See Lemma 5.1.

We use the following notation throughout the section. Given a dataset with sample size $n$ and a vector of always-reporter indicators $A = (A_i)_{i=1}^n$, let $n_A = \sum_{i=1}^n A_i$ denote the number of always-reporters.
Denote the random assignment variables $(D_i)_{i=1}^n \sim \mathrm{CR}(n, n_1)$. We define the random variables $n_{A,1} = \sum_{i=1}^n D_i A_i$ and $n_{A,0} = \sum_{i=1}^n (1 - D_i) A_i$.

5.2 Outcomes with a small number of support points

Datasets with discrete outcomes (e.g., binary, categorical, or count-valued) are common. With such datasets, exhaustive enumeration is feasible even with a large sample size, after exploiting the symmetry of complete randomization.

Suppose that the support of the observed outcomes, $S$, has cardinality $K$. We enumerate the support as $S = \{v_1, \ldots, v_K\}$. For the randomization distribution under the sharp-null hypothesis, we define the number of always-reporters with outcome $v_k$ as $n^k_A = \sum_{i=1}^n A_i \mathbf{1}\{Y_i = v_k\}$, $k \in [K]$, and the numbers of always-reporters assigned to the treatment and control groups with outcome $v_k$ as $n^{1,k}_A = \sum_{i=1}^n D_i A_i \mathbf{1}\{Y_i = v_k\}$ and $n^{0,k}_A = \sum_{i=1}^n (1 - D_i) A_i \mathbf{1}\{Y_i = v_k\}$, where $D = (D_i)_{i=1}^n$ is a generic assignment vector.

Given $n_A$ and $\{n^k_A\}_{k \in [K]}$, the distributions of the test statistics (17), (19) and (20) are completely determined by the distribution of $\{n^{1,k}_A\}_{k \in [K]}$. For example, the squared studentized Hajek statistic (17) can be written as a function of $n_A$, $\{n^k_A\}_{k \in [K]}$, $n_{A,1}$ and $\{n^{1,k}_A\}_{k \in [K]}$:
\[ \left( T^0_n(Y, D, A) \right)^2 = \frac{ \left( n_{A,1}^{-1} \sum_{k=1}^K v_k n^{1,k}_A - n_{A,0}^{-1} \sum_{k=1}^K v_k n^{0,k}_A \right)^2 }{ n_{A,1}^{-1} \hat\sigma^2_1 + n_{A,0}^{-1} \hat\sigma^2_0 }, \]
where, for $a \in \{0, 1\}$,
\[ \hat\sigma^2_a = \frac{1}{n_{A,a}} \sum_{k=1}^K n^{a,k}_A \left( v_k - \frac{1}{n_{A,a}} \sum_{k=1}^K v_k n^{a,k}_A \right)^2, \]
with $n_{A,1} = \sum_{k=1}^K n^{1,k}_A$, $n_{A,0} = n_A - n_{A,1}$, and $n^{0,k}_A = n^k_A - n^{1,k}_A$. The probability mass function of $\{n^{1,k}_A\}_{k \in [K]}$ is
\[ \mathrm{pmf}\left( \{n^{1,k}_A\}_{k \in [K]} \right) = \binom{n}{n_1}^{-1} \binom{n - n_A}{n_1 - n_{1A}} \prod_k \binom{n^k_A}{n^{1,k}_A}, \]
which depends on $\{n, n_1, \{n^k_A\}_{k=1}^K\}$.
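The probability mass function above is multivariate-hypergeometric and can be evaluated directly with binomial coefficients. The sketch below (illustrative names, toy binary-outcome configuration) evaluates it and checks that it sums to one over all feasible treated counts.

```python
from itertools import product
from math import comb

def pmf_counts(n1k, nk, n, n1):
    """pmf of the per-support-point treated always-reporter counts {n^{1,k}_A},
    given the total counts {n^k_A}, sample size n, and number of treated n1."""
    n_a = sum(nk)    # number of always-reporters
    n1a = sum(n1k)   # number of treated always-reporters implied by the counts
    # Infeasible configurations have probability zero.
    if any(c < 0 or c > m for c, m in zip(n1k, nk)):
        return 0.0
    if n1 - n1a < 0 or n1 - n1a > n - n_a:
        return 0.0
    num = comb(n - n_a, n1 - n1a)
    for c, m in zip(n1k, nk):
        num *= comb(m, c)
    return num / comb(n, n1)

# Toy check: binary outcome (K = 2), n = 10, n1 = 5, counts (n^1_A, n^2_A) = (3, 4).
nk, n, n1 = (3, 4), 10, 5
total = sum(pmf_counts(c, nk, n, n1)
            for c in product(range(nk[0] + 1), range(nk[1] + 1)))
```

By Vandermonde's identity the feasible terms sum exactly to $\binom{n}{n_1}$, so `total` equals one up to floating-point error.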
Hence, the randomization distributions of the test statistics (17), (19) and (20) under the sharp-null hypothesis are completely determined by the parameters $\{n, n_1, \{n^k_A\}_{k=1}^K\}$.

Given an always-reporter vector $A = (A_i)_{i=1}^n \in \mathcal{A}(D^{obs}, R)$, we denote the vector of outcome counts by $c(A) = (n^k_A)_{k=1}^K$. By the argument above, the randomization-based p-value associated with a given always-reporter table depends on $A$ only through the outcome counts vector $c(A)$, together with the sample size $n$ and the number of treated units $n_1$:
\[ p(A) = \tilde p\left( c(A), n, n_1 \right). \tag{33} \]
Let $\mathcal{C}(D^{obs}, R)$ denote the set of all counts vectors compatible with the data:
\[ \mathcal{C}(D^{obs}, R) = \left\{ c(A) : A \in \mathcal{A}(D^{obs}, R) \right\}. \tag{34} \]
We have the reduction
\[ p^{worst} = \sup_{A \in \mathcal{A}(D^{obs}, R)} p(A) = \sup_{c \in \mathcal{C}(D^{obs}, R)} \tilde p(c, n, n_1). \tag{35} \]
If the support of observed outcomes has cardinality $K$, the worst-case cardinality of $\mathcal{C}(D^{obs}, R)$ is upper-bounded by $n^K$. Provided that we consider a regime where $K$ does not increase with $n$, an exhaustive search algorithm has a time complexity that is polynomial in $n$. When $K$ is small, for example $K = 2$ for binary outcomes, a simple implementation yields a practical algorithm. A pseudo-algorithm is included as Algorithm 5 in Appendix Section C.

5.3 Integer Programming (IP) Approach for outcomes with a large number of support points

When the observed outcome data have large support, the reduction in Section 5.2 does not yield practically efficient algorithms. To optimize over a large search space, we employ integer programming techniques.

Because $T^0_n$, $T^1_n$, and $T^2_n$ are nonlinear functions of the always-reporter indicators $A$, a direct integer-programming formulation is challenging.
In particular, after factorization, $T^0_n$ and $T^1_n$ can be expressed as sums of ratios of polynomials in $A$ of degree up to 10, and $T^2_n$ is further complicated by the operator $\lfloor \cdot \rfloor_-$. Standard linearization techniques in IP are not easily applicable even for small instances.

To address the computational challenge, we decompose (32) into a collection of smaller subproblems that are substantially easier to solve in our simulations. Moreover, the subproblems can be solved independently, making this step readily distributable across computing resources. The decomposition proceeds in two steps. First, we rewrite (9) as a set of subproblems with a simpler algebraic structure (for example, involving lower-degree polynomials). Second, we further partition these subproblems by the number of always-reporters $n_A$. The two-step reduction is useful because it decomposes the original problem into integer-programming subproblems with only quadratic (second-order) polynomials, eliminating non-polynomial features such as floor functions. For details, see the discussion after Theorem 5.3.

5.3.1 Step 1: Decomposing (9) into subproblems with a simpler algebraic structure

Let $T_n$ denote, generically, any one of the test statistics $T^0_n$, $T^1_n$, and $T^2_n$. Let $L, U \in [0, \infty]$ be two (possibly infinite) scalars such that
\[ L \le \min_{A \in \mathcal{A}(D^{obs}, R)} T_n(Y, D^{obs}, A) \le \max_{A \in \mathcal{A}(D^{obs}, R)} T_n(Y, D^{obs}, A) \le U. \tag{36} \]
Relatively tight bounds $L$ and $U$ are easy to obtain, either analytically or via numerical procedures; see Algorithm 8, Algorithm 10 and Section C.4. Consider a partition of the interval $[L, U]$ with increasingly ordered endpoints $\{t_i\}_{i=0}^I$, where $t_0 = L$ and $t_I = U$.
For each subinterval $[t_{i-1}, t_i]$, consider the following value associated with each always-reporter vector $A$:
\[ v(A, t_{i-1}, t_i) = \begin{cases} \mathbb{E}_D\left[ \mathbf{1}\left\{ T_n(Y, D, A) \ge t_{i-1} \right\} \right], & \text{if } T_n(Y, D^{obs}, A) \le t_i, \\ -\infty, & \text{otherwise}, \end{cases} \tag{37} \]
where $D \sim \mathrm{CR}(n, n_1)$. Also consider the optimization problem
\[ v_i = \max_{A \in \mathcal{A}(D^{obs}, R)} v(A, t_{i-1}, t_i). \tag{38} \]
Lemma 5.1 shows that the collection of optimal values of (38) across the pairs of consecutive endpoints is informative about $p^{worst}$.

Lemma 5.1. Given endpoints $\{t_i\}_{i=0}^I$ with $t_0 = L$ and $t_I = U$ satisfying (36), let $A^{i*}$ denote an optimal always-reporter vector for problem (38) on the interval $[t_{i-1}, t_i]$. We have the following inequalities:
\[ p^{worst} \le \max_{i \in [I]} v_i, \qquad v_i \le p^{worst} + \mathbb{E}_{D \sim \mathrm{CR}(n, n_1)}\left[ \mathbf{1}\left\{ T_n(Y, D, A^{i*}) \in [t_{i-1}, t_i) \right\} \right], \quad \forall i \in [I]. \]
In particular, let $i^*$ be the optimal index of the problem $\max_{i \in [I]} v_i$. We have
\[ p^{worst} \le \max_{i \in [I]} v_i \le p^{worst} + \mathbb{E}_{D \sim \mathrm{CR}(n, n_1)}\left[ \mathbf{1}\left\{ T_n(Y, D, A^{i^*}) \in [t_{i^*-1}, t_{i^*}) \right\} \right]. \tag{39} \]

From a practical perspective, this lemma has three implications. First, if $\max_{i \in [I]} v_i \le \alpha$, then it certifies that $p^{worst} \le \alpha$. Second, provided that the randomization distributions induced by the optimizing always-reporter vectors are not overly concentrated on the interval $[t_{i^*-1}, t_{i^*})$, the quantity $\max_{i \in [I]} v_i$ should be close to the true $p^{worst}$.$^8$ Third, it transforms the task of comparing two ratios of polynomials into comparing a ratio of polynomials with a scalar. This reduction simplifies the computational problem. After Step 2 below, we can clear the denominators (variance estimators) in the test statistics associated with outcome balance by multiplying through on both sides, yielding inequalities in which both sides are polynomials of degree at most 2.
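The first inequality of Lemma 5.1, $p^{worst} \le \max_i v_i$, can be illustrated numerically on a toy problem. In the sketch below (all names illustrative), the full CR(8, 4) randomization distribution is enumerated exactly, a squared difference in always-reporter means stands in for $T_n$ (it is nonnegative, so $L = 0$ is a valid lower bound), and a small hand-picked list of candidate vectors stands in for $\mathcal{A}(D^{obs}, R)$.

```python
from itertools import combinations
import numpy as np

rng = np.random.default_rng(3)
n, n1 = 8, 4
y = rng.normal(size=n)
# All C(8, 4) = 70 assignment vectors: the exact CR(8, 4) distribution.
assignments = [np.isin(np.arange(n), c).astype(int)
               for c in combinations(range(n), n1)]
d_obs = assignments[0]

def stat(d, a):
    """Toy nonnegative statistic standing in for T_n."""
    t, c = (d == 1) & (a == 1), (d == 0) & (a == 1)
    if t.sum() == 0 or c.sum() == 0:
        return 0.0
    return float((y[t].mean() - y[c].mean()) ** 2)

# Hand-picked stand-in for the compatible set A(D_obs, R).
cands = [np.array(v) for v in ([1] * 8, [1] * 6 + [0] * 2, [0] * 2 + [1] * 6)]

def p_val(a):
    t_obs = stat(d_obs, a)
    return float(np.mean([stat(d, a) >= t_obs for d in assignments]))

p_worst = max(p_val(a) for a in cands)

# Interval decomposition: endpoints t_0 = L = 0 < ... < t_I = U as in (36)-(38).
U = max(stat(d_obs, a) for a in cands)
ts = np.linspace(0.0, U, 6)

def v(a, lo, hi):
    if stat(d_obs, a) > hi:
        return -np.inf
    return float(np.mean([stat(d, a) >= lo for d in assignments]))

v_max = max(v(a, ts[i - 1], ts[i])
            for i in range(1, len(ts)) for a in cands)
```

For the candidate attaining `p_worst`, its observed statistic falls into some subinterval, and on that subproblem `v` counts the (weakly larger) event that the statistic exceeds the left endpoint; hence `v_max >= p_worst` holds by construction.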
We note that the lemma and the accompanying discussion remain valid after replacing $p^{worst}$ with $p^{worst,mc}$, when the expectation (average) is taken over simulated assignments rather than the exact expectation under the complete randomization distribution.

$^8$ Ideally, we would like to show that, as the maximum interval length tends to zero, the additional term on the right-hand side of (39) also tends to zero. If the underlying distribution were continuous, this would follow from an application of the dominated convergence theorem. In our setting, however, the randomization distributions are discrete with point masses, so such arguments are not directly applicable. We are not aware of a simple refinement that yields a useful bound for this term.

5.3.2 Step 2: Split by the number of always-reporters

To simplify the discussion, we introduce some additional notation. For a given always-reporter vector $A$, we index the always-reporters by $a \in [n_A]$ via a bijection $\pi_A : [n_A] \to [n]$. Without loss of generality, we label the indices $i$ and $a$ so that the first $r_0 = \sum_{i=1}^n (1 - D^{obs}_i) R_i$ elements correspond to always-reporters assigned to the control group. We write $\pi_A(a) = i$ when unit $i$ is labeled as the $a$-th always-reporter. Our indexing convention implies $\pi_A(a) = a$ for $a \le r_0$ and all $A \in \mathcal{A}(D^{obs}, R)$.

The original assignment variables $D = (D_i)_{i \in [n]}$ and outcome variables $Y = (Y_i)_{i \in [n]}$ indexed by $i$ then induce assignment and outcome variables indexed by $a$,
\[ \tilde D^A_a := D_{\pi_A(a)}, \qquad \tilde Y_a := Y_{\pi_A(a)}, \qquad a \in [n_A]. \]
We also introduce matching variables $\{x^A_{ai}\}_{a \in [n_A], i \in [n]}$, where $x^A_{ai} = 1$ if $\pi_A(a) = i$ and $x^A_{ai} = 0$ otherwise. By our indexing convention, we have $x^A_{ai} = 1$ if $i = a$ and $i \le r_0$ for every always-reporter table $A \in \mathcal{A}(D^{obs}, R)$.
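The bijection $\pi_A$ and the matching variables can be constructed mechanically from an always-reporter vector. The sketch below (illustrative names and toy data) assumes, as in our indexing convention, that the units are pre-sorted so that the reporting control units occupy the first $r_0$ positions and are always-reporters under every candidate table; it then builds the matrix $x^A$ and checks the constraints it must satisfy.

```python
import numpy as np

def matching_variables(a_vec, d_obs, r):
    """Build the matching variables x[a, i] induced by an always-reporter
    vector, assuming the first r0 units are the reporting control units
    (all of which must have a_vec[i] = 1)."""
    r0 = int(((1 - d_obs) * r).sum())
    # Control reporters keep their own labels (pi_A(a) = a for a <= r0);
    # the remaining always-reporters are labeled in index order.
    rest = [i for i in range(len(a_vec)) if a_vec[i] == 1 and i >= r0]
    order = list(range(r0)) + rest
    x = np.zeros((int(a_vec.sum()), len(a_vec)), dtype=int)
    for a, i in enumerate(order):
        x[a, i] = 1          # x[a, i] = 1 iff pi_A(a) = i
    return x

# Toy data: 6 units; units 0, 1 are the reporting control units.
d_obs = np.array([0, 0, 1, 1, 1, 0])
r     = np.array([1, 1, 1, 1, 0, 0])
a_vec = np.array([1, 1, 1, 1, 0, 0])   # one candidate always-reporter vector
y     = np.arange(6.0)
x = matching_variables(a_vec, d_obs, r)
y_tilde = x @ y    # outcomes re-indexed by always-reporter label a
```

Each row of `x` selects exactly one unit, each unit is selected at most once, the first $r_0$ rows sit on the diagonal, and treated non-reporters (here unit 4) are never matched.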
With the matching variables, the outcome for the $a$-th always-reporter can be expressed as $\tilde Y_a = \sum_{i=1}^n x^A_{ai} Y_i$.

Each always-reporter table $A$ induces the assignment variables $\tilde D^A = (\tilde D^A_a)_{a \in [n_A]}$ and the matching variables $\{x^A_{ai}\}_{a \in [n_A], i \in [n]}$. It is important to note that, due to the symmetry of complete randomization, $\tilde D^A$ has the same distribution for all always-reporter tables with the same number of always-reporters. Motivated by this fact, we write $\tilde D^A$ as $\tilde D^{n_A}$, and denote the distribution of $\tilde D^{n_A}$ by $\mathcal{L}(n, n_1, n_A)$.

We now note that $T^0_n$, $T^1_n$, and $T^2_n$ depend on $(Y, D, A)$ only through $Y$, the induced assignment vector $\tilde D^A$, and the matching variables $x^A = \{x^A_{ai}\}_{a,i}$. Define
\[ n_{A,1} = \sum_{a=1}^{n_A} \tilde D^A_a \quad \text{and} \quad n_{A,0} = \sum_{a=1}^{n_A} (1 - \tilde D^A_a). \]
For example, $T^0_n$ in (17) can be written as
\[ T^0_n(Y, D, A) = \tilde T^0_n(Y, \tilde D^A, x^A) = \frac{ \hat\mu^2_n(Y, \tilde D^A, x^A) }{ \hat\sigma^{2,hj}_n(Y, \tilde D^A, x^A) }, \tag{40} \]
where
\[ \hat\mu_n(Y, \tilde D^A, x^A) = \hat\mu^1_n(Y, \tilde D^A, x^A) - \hat\mu^0_n(Y, \tilde D^A, x^A), \]
with
\[ \hat\mu^1_n(Y, \tilde D^A, x^A) = \frac{1}{n_{A,1}} \sum_{a,i} \tilde D^A_a x^A_{ai} Y_i, \qquad \hat\mu^0_n(Y, \tilde D^A, x^A) = \frac{1}{n_{A,0}} \sum_{a,i} (1 - \tilde D^A_a) x^A_{ai} Y_i, \]
and
\[ \hat\sigma^{2,hj}_n(Y, \tilde D^A, x^A) = \hat v^1_n(Y, \tilde D^A, x^A) + \hat v^0_n(Y, \tilde D^A, x^A), \]
where
\[ \hat v^1_n(Y, \tilde D^A, x^A) = \frac{1}{n_{A,1}^2} \sum_{a,i} \tilde D^A_a x^A_{ai} Y_i^2 - \frac{1}{n_{A,1}} \left( \hat\mu^1_n(Y, \tilde D^A, x^A) \right)^2, \]
\[ \hat v^0_n(Y, \tilde D^A, x^A) = \frac{1}{n_{A,0}^2} \sum_{a,i} (1 - \tilde D^A_a) x^A_{ai} Y_i^2 - \frac{1}{n_{A,0}} \left( \hat\mu^0_n(Y, \tilde D^A, x^A) \right)^2. \]
Define the space of matching variables associated with all always-reporter tables with $k$ always-reporters as
\[ x(k, D^{obs}, R) = \left\{ \{x_{ai}\}_{a \in [k], i \in [n]} : \begin{array}{l} x_{ai} \in \{0, 1\}, \ \forall a \in [k], i \in [n], \\ x_{ai} = 1, \ \forall a = i, \ a \le r_0, \\ \sum_i x_{ai} = 1, \ \forall a \in [k], \\ \sum_a x_{ai} \le 1, \ \forall i \in [n], \\ x_{ai} \in \{0, 1\}, \ \forall a \in [k], \ D^{obs}_i = 1, R_i = 1, \\ x_{ai} = 0, \ \forall a \in [k], \ D^{obs}_i = 1, R_i = 0 \end{array} \right\}.
(41)

We note that the test statistics $T^1_n$ and $T^2_n$ can be re-expressed in a similar fashion as
\[ \tilde T^1_n = \tilde T^0_n + g_1(n_{A,1}), \quad \text{and} \quad \tilde T^2_n = \tilde T^0_n + g_2(n_{A,1}). \tag{42} \]
The functions $g_1$ and $g_2$ are defined in (96) and (97) in Appendix A.2. We omit their explicit forms for brevity. The only point we use is that, for a fixed always-reporter size $n_A$, both $g_1$ and $g_2$ depend on the data only through $n_{A,1}$. Lemma 5.2 summarizes the discussion above.

Lemma 5.2. Let $\tilde T_n$ denote one of the test statistics $\tilde T^0_n$, $\tilde T^1_n$, or $\tilde T^2_n$ defined in (40) and (42). Given an always-reporter table $A = (A_i)_{i=1}^n$, observed outcomes $Y$, and observed assignments $D^{obs}$, recall the p-value $p(A)$ defined in (9) and the quantity $v(A, t_{i-1}, t_i)$ defined in (37) with endpoints $t_{i-1}$ and $t_i$. Define the p-value associated with the matching variables $x = \{x_{ai}\}_{a,i}$ by
\[ \tilde p(x) = \mathbb{E}_{\tilde D^{n_A}}\left[ \mathbf{1}\left\{ \tilde T_n\left( Y, \tilde D^{n_A}, x \right) \ge \tilde T_n\left( Y, \tilde D^{n_A}_{obs}, x \right) \right\} \right] \tag{43} \]
and, with endpoints $t_{i-1}$ and $t_i$, define
\[ \tilde v(x, t_{i-1}, t_i) = \begin{cases} \mathbb{E}_{\tilde D^{n_A}}\left[ \mathbf{1}\left\{ \tilde T_n\left( Y, \tilde D^{n_A}, x \right) \ge t_{i-1} \right\} \right], & \text{if } \tilde T_n\left( Y, \tilde D^{n_A}_{obs}, x \right) \le t_i, \\ -\infty, & \text{otherwise}, \end{cases} \tag{44} \]
where $\tilde D^{n_A} \sim \mathcal{L}(n, n_1, n_A)$ and $\tilde D^{n_A}_{obs}$ is the assignment vector indexed by $a$ that is induced by the observed assignments. Let $x^A = \{x^A_{ai}\}_{a,i}$ denote the matching variables induced by $A$. Then
\[ \tilde p\left( x^A \right) = p(A) \quad \text{and} \quad \tilde v\left( x^A, t_{i-1}, t_i \right) = v(A, t_{i-1}, t_i). \]

Lemma 5.1 and Lemma 5.2 imply the following theorem.

Theorem 5.3. Let $L, U \in [0, \infty]$ be two scalars satisfying (36), and consider a partition of the interval $[L, U]$ with increasingly ordered endpoints $\{t_i\}_{i=0}^I$, where $t_0 = L$ and $t_I = U$. Recall the definition of $v_i$ from (38). We have
\[ \max_{i \in [I]} v_i = \max_{k \in [n]} \ \max_{i \in [I]} \ \max_{x \in x(k, D^{obs}, R)} \tilde v(x, t_{i-1}, t_i) \ge p^{worst}. \]
The theorem implies that the optimization problem on the left-hand side can be decomposed into smaller subproblems indexed by $k$, the number of always-reporters, and by $i$, the interval index.

In addition, it converts the optimization over always-reporter indicators $A = \{A_i\}_{i=1}^n$ into an optimization over the matching variables $\{x_{ai}\}_{a,i}$. Importantly, the component of the test statistics $T^1_n$ and $T^2_n$ that assesses balance in the always-reporter indicators does not depend on $\{x_{ai}\}_{a,i}$. To illustrate this point, consider the following IP formulation of the subproblem $\max_{x \in x(k, D^{obs}, R)} \tilde v(x, t_{i-1}, t_i)$ for some $k$ and $i$. Let $\{\tilde D^k_s\}_{s=1}^{n_{mc}}$ be $n_{mc}$ Monte Carlo draws of the assignment vector (indexed by $a$) from $\mathcal{L}(n, n_1, k)$. The subproblem can be written as
\[ \max_{\{x_{ai}\}_{a,i}, \{I_s\}_{s=1}^{n_{mc}}} \ \sum_{s=1}^{n_{mc}} I_s, \tag{45} \]
subject to
\[ \hat\mu^2_n\left( Y, \tilde D^k_s, x \right) + \left( g_i(n_{1A,s}) - t_{i-1} \right) \hat\sigma^{2,hj}_n\left( Y, \tilde D^k_s, x \right) \ge L_s (1 - I_s), \quad \forall s \in [n_{mc}], \]
\[ \hat\mu^2_n\left( Y, \tilde D^k_{obs}, x \right) + \left( g_i(n_{1A,obs}) - t_i \right) \hat\sigma^{2,hj}_n\left( Y, \tilde D^k_{obs}, x \right) \le 0, \]
\[ x \in x(k, D^{obs}, R), \qquad I_s \in \{0, 1\}, \ \forall s \in [n_{mc}], \]
where (i) the functions $g_i(\cdot)$, $i \in \{0, 1, 2\}$, correspond to the different test statistics $T^0_n$, $T^1_n$ and $T^2_n$, and are defined in (96) and (97); (ii) each $L_s$ is any constant satisfying
\[ L_s \le \min_{x \in x(k, D^{obs}, R)} \left\{ \hat\mu^2_n\left( Y, \tilde D^k_s, x \right) + \left( g_i(n_{1A,s}) - t_{i-1} \right) \hat\sigma^{2,hj}_n\left( Y, \tilde D^k_s, x \right) \right\}, \]
and the collection $\{L_s\}_{s=1}^{n_{mc}}$ can be obtained either analytically or computationally; and (iii) the indicator $I_s$ must be set to zero whenever
\[ \frac{ \hat\mu^2_n\left( Y, \tilde D^k_s, x \right) }{ \hat\sigma^{2,hj}_n\left( Y, \tilde D^k_s, x \right) } + g_i(n_{1A,s}) < t_{i-1}, \tag{46} \]
and is otherwise unconstrained.$^9$

$^9$ We implicitly assume $\hat\sigma^{2,hj}_n\left( Y, \tilde D^k_s, x \right) > 0$ for all $x \in x(k, D^{obs}, R)$ and all $s \in [n_{mc}]$. This condition is often satisfied empirically and can be verified quickly.
We note that $g_i(n_{1A,s})$ is independent of the decision variables and can therefore be treated as a known scalar for each $s$. Moreover, both $\hat\mu^2_n(Y, \tilde D^k_s, x)$ and $\hat\sigma^{2,hj}_n(Y, \tilde D^k_s, x)$ are quadratic functions of the matching variables $x$. By contrast, obtaining a formulation with comparable structure in the original decision-variable space $A = (A_i)_{i=1}^n$ appears challenging.

We conclude by noting that the reduction in Section 5.3.2, together with a technique analogous to that in Section 5.3.1, can be combined with a bisection search to solve Algorithm 3, which relies on critical values from the asymptotic distribution for inference. In practice, we find the resulting computations to be highly efficient. This approach is useful because it provides a fast heuristic that delivers a reasonable lower bound for the worst-case Monte Carlo p-value $p^{worst,mc}$. The resulting solution can be used to warm-start the integer-programming method to accelerate computation, or to certify non-rejection at the chosen significance level. Readers can find a complete pseudo-algorithm incorporating the discussion above in Algorithm 11.

6 Simulation Results

This section evaluates the finite-sample performance and computational cost of the proposed worst-case randomization tests. We report (i) empirical rejection rates under the sharp-null scenario (test size), (ii) empirical rejection rates under an alternative with a positive AR-ATE (power), and (iii) the runtime of the implementation.

Each simulated dataset contains $n = 100$ units, with complete randomization assigning $n_1 = 50$ to treatment and $n_0 = 50$ to control. Units belong to one of three principal reporting strata under Assumption 1. We set the population shares to
\[ \pi_{AR} = 0.9, \qquad \pi_{IR} = 0.05, \qquad \pi_{NR} = 0.05. \]
We generate potential outcomes for all units according to the following model:
\[ y_i(0) \sim N(0, 1), \qquad y_i(1) = y_i(0) + \tau, \]
where $N(0, 1)$ denotes a standard normal random variable and $\tau$ is the treatment effect. We consider two scenarios: $\tau = 0$ and $\tau = 1$.

We implement the worst-case randomization test in Algorithm 1 using the one-sided chi-square statistic $T^2_n$ in (20). We set the pruning step at $\beta = 0.05$ and the nominal size at $\alpha = 0.05$. The number of simulated randomization draws is 1000. All computations were performed on a Linux compute node equipped with an Intel Xeon Platinum 8268 CPU (2.90 GHz), providing 8 logical cores.

In the null scenario (ATE = 0), the rejection rate is 0.002. In most simulations, the algorithm terminates at the heuristic stage once a reporting configuration yields a randomization p-value above the rejection threshold, resulting in a median runtime of 35.26 seconds and a 90th-percentile runtime of 72.11 seconds.

Table 2: Rejection Rates and Runtime for the Worst-Case Test Using $T^2_n$

ATE   Rej. Rate   Median Time (s)   90th Pct. Time (s)   Max Time (s)
0     0.002       35.26             72.11                37956
1     0.9218      308               7080                 75068

Notes: Simulations use $n = 100$ with $n_1 = 50$ under complete randomization and principal-strata shares $(\pi_{AR}, \pi_{IR}, \pi_{NR}) = (0.9, 0.05, 0.05)$. Each design is evaluated using 1,000 Monte Carlo repetitions and 1,000 random treatment assignments per repetition. The nominal test level is $\alpha = 0.05$, and the pretest level is set to $\beta = 0.005$. Rejection rates are computed using the worst-case randomization test based on the statistic $T^2_n$. The reported runtimes correspond to the median, 90th-percentile, and maximum wall-clock time per dataset.

There are rare cases where the null is rejected and the algorithm must exhaustively verify all admissible configurations; in two such instances this required roughly 10 hours.
In the alternative scenario (ATE = 1), the rejection rate is 0.9218. This scenario sees a median runtime of 308 seconds and a 90th-percentile runtime of 7,080 seconds. Relative to the null case, a larger share of simulations require additional verification beyond the heuristic stage, resulting in longer runtimes and a substantially higher maximum computation time of approximately 75,000 seconds.

In general, we find that when the lower bound of the test statistic produced by the heuristic stage lies in the range of approximately 6–9, the algorithm may require substantial computation time to verify the rejection decision. By contrast, when the null hypothesis is not rejected, or when it is rejected with a sufficiently large test statistic (e.g., greater than or equal to 9), the algorithm typically terminates quickly.

A Proofs

This section contains the proof of Theorem 4.2, which is included in Section A.6. The proof requires multiple preliminary steps:

1. A study of the asymptotic distributions of $T^0_n$, $T^1_n$ and $T^2_n$ in Section A.2 and Section A.3. The key theorem is Theorem A.12.

2. A study of the randomization-based critical value in Section A.4. The key theorem is Theorem A.25.

Auxiliary lemmas are included in Section A.5. The key technical challenge is handling the random variable $n_{1A} = \sum_{i=1}^n D_i A_i$, the number of treated always-reporters. A CLT for $n_{1A}$ in the uniform sense does not hold under Assumption 2; for instance, the sample may consist entirely of always-reporters. This rules out applying a combinatorial CLT to justify asymptotic validity as in [30, 1]. Instead, we pursue a careful conditional analysis and invoke recent Berry–Esseen bounds for the combinatorial CLT developed by [27].

A.1 Notations

Throughout the appendix, we use $Z \sim N(0, 1)$ to denote a standard normal variable with mean zero and unit variance.
We write $\mathrm{CR}(n, n_1)$, $n \ge n_1$, for the distribution of a random vector $D = (D_i)_{i=1}^n \in \{0, 1\}^n$ that selects exactly $n_1$ of the $n$ units to have $D_i = 1$, with the remaining $n_0 = n - n_1$ units having $D_i = 0$, uniformly over all such assignments. Unless stated otherwise, all probability measures $P_n$ are taken under the complete randomization distribution of the assignment vector $D = (D_i)_{i=1}^n \sim \mathrm{CR}(n, n_1)$, with $n$ and $n_1$ understood from context. Quantities such as the number of always-reporters $n_A$ and the number of treated units $n_1$ are inherently dependent on $n$. Unless noted otherwise, we suppress their notational dependence on $n$.

A.2 Asymptotic Distributions of $T^0_n$, $T^1_n$ and $T^2_n$

Section A.2.1 states a Berry–Esseen bound from [27] adapted to our setting. Section A.2.2 states a probabilistic bound for the number of always-reporters in the treated and control groups. Section A.2.3 studies the conditional properties of the Hajek estimator, conditioning on the number of always-reporters in the treated group.

A.2.1 Berry–Esseen Bound for the Combinatorial Central Limit Theorem

Theorem A.1 gives a Berry–Esseen bound for a two-arm completely randomized design, adapted from [27].

Theorem A.1. Given potential outcomes $\{(w_i(1), w_i(0))\}_{i=1}^n$ and a completely randomized design $\mathrm{CR}(n, n_1)$, consider the difference-in-means estimator
\[ \hat\tau_n = \frac{1}{n_1} \sum_{i=1}^n D_i w_i(1) - \frac{1}{n_0} \sum_{i=1}^n (1 - D_i) w_i(0), \]
for $\tau_n = n^{-1} \sum_{i=1}^n (w_i(1) - w_i(0))$. Denote the variance of $\hat\tau_n$ by
\[ V_n = V_n(\hat\tau_n) = \frac{1}{n_1} v_1 + \frac{1}{n_0} v_0 - \frac{1}{n} v_{01}, \]
where
\[ v_1 = \frac{1}{n-1} \sum_{i=1}^n (w_i(1) - \bar w(1))^2, \qquad v_0 = \frac{1}{n-1} \sum_{i=1}^n (w_i(0) - \bar w(0))^2, \]
\[ v_{01} = \frac{1}{n-1} \sum_{i=1}^n \left( w_i(1) - w_i(0) - (\bar w(1) - \bar w(0)) \right)^2, \]
with $\bar w(1) = n^{-1} \sum_{i=1}^n w_i(1)$ and $\bar w(0) = n^{-1} \sum_{i=1}^n w_i(0)$. Suppose we have
\[ V_n(\hat\tau_n) \ge c^{-2} \left( \frac{1}{n_1} v_1 + \frac{1}{n_0} v_0 \right) \]
for some $c \ge 1$.
Then there exists a universal constant $C$, which may depend on $c$, such that
\[ \sup_{t \in \mathbb{R}} \left| P\left( V_n^{-\frac{1}{2}} (\hat\tau_n - \tau_n) \le t \right) - P(Z \le t) \right| \le C \max_{a \in \{0,1\}} \max_{i \in [n]} \frac{ |w_i(a) - \bar w(a)| }{ \sqrt{n_a v_a} }. \]

Proof. The result follows from Theorem 1-(ii) in [27], with $F = (1, -1)$ and $b = 1$.

It should be noted that if we consider $-\hat\tau_n$ instead of $\hat\tau_n$, the same inequality holds, i.e.,
\[ \sup_{t \in \mathbb{R}} \left| P\left( V_n^{-\frac{1}{2}} (\hat\tau_n - \tau_n) \ge t \right) - P(Z \ge t) \right| \le C \max_{a \in \{0,1\}} \max_{i \in [n]} \frac{ |w_i(a) - \bar w(a)| }{ \sqrt{n_a v_a} }. \]

A.2.2 Probabilistic Bounds on Always-Reporter Counts in Treatment and Control Groups

Given a vector of always-reporter indicators $A = (A_i)_{i=1}^n$, recall the definition of $\hat\tau^a_n$ from (11):
\[ \hat\tau^a_n(D, A) = \frac{1}{n_1} \sum_{i=1}^n D_i A_i - \frac{1}{n_0} \sum_{i=1}^n (1 - D_i) A_i. \]
Suppose $D \sim \mathrm{CR}(n, n_1)$. The variance of $\hat\tau^a_n$ can be shown to be
\[ V_n(A) = \frac{n^2}{n_1 n_0 (n-1)} \left[ \frac{1}{n} \sum_{i=1}^n A_i - \left( \frac{1}{n} \sum_{i=1}^n A_i \right)^2 \right]. \tag{47} \]
Lemma A.2 provides an upper bound on the variance term.

Lemma A.2.
\[ \max_{\{A_i\}_{i=1}^n \subset \{0,1\}^n} \ \frac{1}{n} \sum_{i=1}^n A_i - \left( \frac{1}{n} \sum_{i=1}^n A_i \right)^2 \le \frac{1}{4}. \]

Proof. The maximization problem reduces to maximizing $a - a^2$ over $a \in [0, 1]$, where $a = \frac{1}{n} \sum_{i=1}^n A_i$. The maximum equals $1/4$ and is attained at $a = 1/2$.

Lemma A.3 states that, under mild regularity conditions, the numbers of always-reporters in the treated and control groups are non-negligible with probability tending to 1.

Lemma A.3. Consider a vector of always-reporter indicators $A = (A_i)_{i=1}^n$ and a completely randomized design $D = (D_i)_{i=1}^n \sim \mathrm{CR}(n, n_1)$. Let $n_0 = n - n_1$ and $n_A = \sum_{i=1}^n A_i$. Suppose the following conditions hold: (i) $n_1/n \in [r, 1-r]$ for some $r \in (0, \frac{1}{2}]$; (ii) $n_A \ge sn$ for some $s \in (0, 1]$. We have
\[ P_n\left( \sum_{i=1}^n D_i A_i \le \frac{n_1 n_A}{2n} \right) \le \frac{1-r}{r s^2 (n-1)}, \]
and
\[ P_n\left( \sum_{i=1}^n (1 - D_i) A_i \le \frac{n_0 n_A}{2n} \right) \le \frac{1-r}{r s^2 (n-1)}. \]

Proof.
We have the following calculations:
\[ P_n\left( \frac{\sum_{i=1}^n D_i A_i}{n_1} - \frac{\sum_{i=1}^n (1 - D_i) A_i}{n_0} \le -\frac{n_A}{2 n_0} \right) = P_n\left( \frac{n \sum_{i=1}^n D_i A_i}{n_1 n_0} - \frac{n_A}{n_0} \le -\frac{n_A}{2 n_0} \right) = P_n\left( \frac{n \sum_{i=1}^n D_i A_i}{n_1 n_0} \le \frac{n_A}{2 n_0} \right) = P_n\left( \sum_{i=1}^n D_i A_i \le \frac{n_A n_1}{2n} \right). \]
We then have
\[ P_n\left( \frac{\sum_{i=1}^n D_i A_i}{n_1} - \frac{\sum_{i=1}^n (1 - D_i) A_i}{n_0} \le -\frac{n_A}{2 n_0} \right) \le P_n\left( \left| \frac{\sum_{i=1}^n D_i A_i}{n_1} - \frac{\sum_{i=1}^n (1 - D_i) A_i}{n_0} \right| \ge \frac{n_A}{2 n_0} \right) \le \frac{ V_n(A_n) }{ n_A^2 (2 n_0)^{-2} } \tag{48} \]
\[ \le \frac{n^2}{4 n_1 n_0 (n-1)} \cdot \frac{1}{n_A^2 (2 n_0)^{-2}} = \frac{n^2 n_0}{(n-1) n_1 n_A^2} \le \frac{(1-r) n^3}{2 r s^2 (n-1) n^3} \tag{49} \]
\[ \le \frac{1-r}{r s^2 (n-1)}, \]
where $V_n(A_n)$ is defined in (47), (48) is by the Chebyshev inequality, and (49) follows from Lemma A.2 and premises (i) and (ii). The statement for $\sum_{i=1}^n (1 - D_i) A_i$ can be proved analogously.

A.2.3 Conditional Properties of the Hajek Estimator

The following lemma says that, under a completely randomized assignment of all units, the assignments for the always-reporters, conditional on exactly $k$ of them being treated, are themselves completely randomized.

Lemma A.4. Given a vector of always-reporter binary indicators $A = (A_i)_{i=1}^n$, denote $\mathcal{A} = \{i \in [n] : A_i = 1\}$ and $n_A = \sum_{i=1}^n A_i$. Suppose $(D_i)_{i=1}^n \sim \mathrm{CR}(n, n_1)$. Then, for any nonnegative integer $k \le n_A$,
\[ (D_i)_{i \in \mathcal{A}} \,\Big|\, \sum_{i=1}^n D_i A_i = k \ \sim\ \mathrm{CR}(n_A, k). \]

Proof. First, consider $P_n\left( \sum_{i=1}^n D_i A_i = k \right)$. We have
\[ P_n\left( \sum_{i=1}^n D_i A_i = k \right) = P_n\left( \sum_{i=1}^n D_i A_i = k, \ \sum_{i=1}^n D_i (1 - A_i) = n_1 - k \right) = \frac{ \binom{n_A}{k} \binom{n - n_A}{n_1 - k} }{ \binom{n}{n_1} }, \]
where the first equality follows from the fact that $(D_i)_{i=1}^n \sim \mathrm{CR}(n, n_1)$, and the second equality follows by inspection.

Next, consider $P_n\left( (D_i)_{i \in \mathcal{A}}, \ \sum_{i=1}^n D_i A_i = k \right)$. We have
\[ P_n\left( (D_i)_{i \in \mathcal{A}}, \ \sum_{i=1}^n D_i A_i = k \right) = P_n\left( (D_i)_{i \in \mathcal{A}}, \ \sum_{i=1}^n D_i A_i = k, \ \sum_{i=1}^n D_i (1 - A_i) = n_1 - k \right) \]
\[ = \frac{ \binom{n - n_A}{n_1 - k} }{ \binom{n}{n_1} }, \]
where the first equality follows from the fact that $(D_i)_{i=1}^n \sim \mathrm{CR}(n, n_1)$, and the second equality follows by inspection. The desired result follows immediately.

We note that the Hajek estimator, conditional on the event that $\sum_{i=1}^n D_i A_i = k$ with $1 \le k \le n_A - 1$, can be expressed as
\[ \hat\tau^{hj}_{n,k}(Y, D, A) = \frac{1}{k} \sum_{i=1}^n D_i A_i Y_i - \frac{1}{n_A - k} \sum_{i=1}^n (1 - D_i) A_i Y_i. \tag{50} \]
Lemma A.5 collects the conditional mean and variance characterizations of the Hajek estimator. They follow immediately from Lemma A.4 and a standard calculation [12].

Lemma A.5 (Conditional Mean and Variance). Suppose we are given a vector of always-reporter indicators $A = (A_i)_{i=1}^n$, outcomes $\{(y_i(1), y_i(0))\}_{i=1}^n$, and a completely randomized design $D = (D_i)_{i=1}^n \sim \mathrm{CR}(n, n_1)$. Consider the estimators $\hat\tau^{hj}_n$ defined in (14) and $\hat\tau^{hj}_{n,k}$ defined in (50). Define $\mathcal{A} = \{i \in [n] : A_i = 1\}$ and $n_A = \sum_{i=1}^n A_i$. Suppose $n_A \ge 1$ and $1 \le k \le n_A - 1$. We have:

(i) $\mathbb{E}\left[ \hat\tau^{hj}_n(Y, D, A) \,\middle|\, \sum_{i=1}^n D_i A_i = k \right] = \mathbb{E}\left[ \hat\tau^{hj}_{n,k}(Y, D, A) \,\middle|\, \sum_{i=1}^n D_i A_i = k \right]$;

(ii) $V_n\left( \hat\tau^{hj}_n \,\middle|\, \sum_{i=1}^n D_i A_i = k \right) = k^{-1} v_{1A} + (n_A - k)^{-1} v_{0A} - n_A^{-1} v_{01A}$,

where
\[ v_{1A} = \frac{1}{n_A - 1} \sum_{i \in \mathcal{A}} (y_i(1) - \bar y_{\mathcal{A}}(1))^2, \qquad v_{0A} = \frac{1}{n_A - 1} \sum_{i \in \mathcal{A}} (y_i(0) - \bar y_{\mathcal{A}}(0))^2, \tag{51} \]
\[ v_{01A} = \frac{1}{n_A - 1} \sum_{i \in \mathcal{A}} \left( y_i(1) - y_i(0) - (\bar y_{\mathcal{A}}(1) - \bar y_{\mathcal{A}}(0)) \right)^2, \tag{52} \]
with $\bar y_{\mathcal{A}}(1) = n_A^{-1} \sum_{i \in \mathcal{A}} y_i(1)$ and $\bar y_{\mathcal{A}}(0) = n_A^{-1} \sum_{i \in \mathcal{A}} y_i(0)$. If $n_A = 1$, all variances and covariances are defined to be zero.

We shall write the conditional variance of the Hajek estimator as
\[ V_{n,k}\left( \hat\tau^{hj}_n \right) = V_n\left( \hat\tau^{hj}_n \,\middle|\, \sum_{i=1}^n D_i A_i = k \right), \tag{53} \]
viewing it as a function of $k$. In the remainder of this section, we let $\delta, s, r, B$ be the constants fixed in the statement of Theorem 4.2, and use the shorthand notation $\Theta^w_n = \Theta^w_n(\delta, s, r, B)$.
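Lemma A.4 can be verified exactly in small cases by enumeration. The sketch below (illustrative toy configuration) enumerates all $\binom{6}{3} = 20$ assignments from $\mathrm{CR}(6, 3)$ with $n_A = 4$ always-reporters, conditions on $k = 2$ treated always-reporters, and checks that the restricted assignment vector is uniform over the $\binom{4}{2} = 6$ possible patterns, i.e., distributed as $\mathrm{CR}(4, 2)$.

```python
from itertools import combinations
from collections import Counter

n, n1 = 6, 3
A = [1, 1, 1, 1, 0, 0]   # always-reporter indicators, n_A = 4
k = 2                     # condition on exactly k treated always-reporters

patterns = Counter()
for treated in combinations(range(n), n1):   # all 20 CR(6, 3) assignments
    d = [1 if i in treated else 0 for i in range(n)]
    if sum(d[i] * A[i] for i in range(n)) == k:
        # restriction of the assignment vector to the always-reporters
        patterns[tuple(d[i] for i in range(n) if A[i] == 1)] += 1

counts = list(patterns.values())
```

Each of the 6 restricted patterns appears with the same multiplicity (here $\binom{2}{1} = 2$, the number of ways to place the remaining treated unit among the non-always-reporters), confirming the uniform conditional law.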
We typically denote an element of $\Theta^w_n$ by $\theta^w_n$. Lemma A.6 collects implications of the assumptions on $\Theta^w_n$.

Lemma A.6. For all $n \in \mathbb{N}$ and each $\theta^w_n \in \Theta^w_n$, let $\{(y_i(1), y_i(0))\}_{i=1}^n$ be the associated potential outcomes and $\{A_i\}_{i=1}^n$ the associated always-reporter indicators. Denote $\mathcal{A} = \{i \in [n] : A_i = 1\}$ and $n_A = \sum_{i=1}^n A_i$. Define $v_{1A}$, $v_{0A}$ and $v_{01A}$ as in (51) and (52), and $\bar y_{\mathcal{A}}(a) = n_A^{-1} \sum_{i \in \mathcal{A}} y_i(a)$ for $a \in \{0, 1\}$. Let $k$ be an integer such that $k \in [1, n_A - 1]$ and $n_A \ge 2$.

(i) The following inequality for the conditional variance holds:
\[ V_{n,k}\left( \hat\tau^{hj}_n \right) \ge \frac{1 - \delta}{n_A} \left( \frac{n_A - k}{k} v_{1A} + \frac{k}{n_A - k} v_{0A} \right). \tag{54} \]

(ii) We have the following inequality: for $a \in \{0, 1\}$,
\[ \frac{ \max_{i \in \mathcal{A}} |y_i(a) - \bar y_{\mathcal{A}}(a)| }{ \sqrt{v_{aA}} } \le B n_A^{1/4}, \tag{55} \]
where $B$ is the constant defined in Assumption 2-(iii).

Proof. First note that
\[ v_{01A} = \frac{1}{n_A - 1} \sum_{i \in \mathcal{A}} \left( y_i(1) - y_i(0) - (\bar y_{\mathcal{A}}(1) - \bar y_{\mathcal{A}}(0)) \right)^2 = v_{1A} + v_{0A} - \frac{2}{n_A - 1} \sum_{i \in \mathcal{A}} (y_i(1) - \bar y_{\mathcal{A}}(1))(y_i(0) - \bar y_{\mathcal{A}}(0)) \le v_{1A} + v_{0A} + 2\delta \sqrt{v_{1A} v_{0A}} \le v_{1A} + v_{0A} + \delta \frac{n_A - k}{k} v_{1A} + \delta \frac{k}{n_A - k} v_{0A}, \]
where the first inequality is by Assumption 2-(ii) and the second inequality is by the inequality $2ab \le a^2 + b^2$. (i) follows by the calculation
\[ \frac{1}{k} v_{1A} + \frac{1}{n_A - k} v_{0A} - \frac{1}{n_A} v_{01A} \ge \frac{1}{k} v_{1A} + \frac{1}{n_A - k} v_{0A} - \frac{1}{n_A} \left( v_{1A} + v_{0A} + \delta \frac{n_A - k}{k} v_{1A} + \delta \frac{k}{n_A - k} v_{0A} \right) = \frac{1 - \delta}{n_A} \left( \frac{n_A - k}{k} v_{1A} + \frac{k}{n_A - k} v_{0A} \right). \]
(ii) follows by the calculation: for each $a \in \{0, 1\}$,
\[ \max_{i \in \mathcal{A}} |y_i(a) - \bar y_{\mathcal{A}}(a)| = \max_{i \in \mathcal{A}} \left( |y_i(a) - \bar y_{\mathcal{A}}(a)|^4 \right)^{1/4} \le n_A^{1/4} \left( \frac{1}{n_A} \sum_{i \in \mathcal{A}} (y_i(a) - \bar y_{\mathcal{A}}(a))^4 \right)^{1/4} \le B n_A^{1/4} \sqrt{v_{aA}}, \]
where the last inequality follows from Assumption 2-(iii). The $v_{aA}$ are positive by Assumption 2-(ii) for $a \in \{0, 1\}$, and hence we can divide by them on both sides.

Lemma A.7.
For all $n\in\mathbb N$ and each $\theta^w_n \in \Theta^w_n$, define the vector of always-reporter indicators $A=(A_i)_{i=1}^n$, $n_A=\sum_{i=1}^n A_i$ and $n_{1A}=\sum_{i=1}^n D_iA_i$, where $D=\{D_i\}_{i=1}^n \sim \mathrm{CR}(n,n_1)$. Define $\tau_n = n_A^{-1}\sum_{i=1}^n A_i(y_i(1)-y_i(0))$. Let $g:\mathbb R\to\mathbb R$ be an arbitrary function, and let $Z\sim N(0,1)$ be a standard normal random variable independent of $D$. Define $\hat\tau^{hj}_n$ as in (14) and $V_{n,k}(\hat\tau^{hj}_n)$ as in (53) (see footnote 10). For $n_A \ge 2$, we have the inequality, for all nonnegative real numbers $t$ and $x$,
$$\Bigg| P_{\theta^w_n}\bigg(t\,\frac{\big(\hat\tau^{hj}_n-\tau_n\big)^2}{V_{n,n_{1A}}\big(\hat\tau^{hj}_n\big)} + g(n_{1A}) \ge x\bigg) - P_{\theta^w_n}\big(tZ^2 + g(n_{1A}) \ge x\big)\Bigg| \le \frac{2(1-r)}{r s^2 (n-1)} + \sqrt{\frac{2}{r}}\, C_{61}(\delta,r)\, \max_{a\in\{0,1\}}\max_{i\in\mathcal A}\frac{|y_i(a)-\bar y_{\mathcal A}(a)|}{\sqrt{n_A\, v_{aA}}} \le \frac{2(1-r)}{r s^2(n-1)} + \sqrt{\frac{2}{r}}\, C_{61}(\delta,r)\, B\, n_A^{-1/4},$$
where $B$ is the constant defined in Assumption 2-(iii), $C_{61}(\delta,r)$ is a constant that depends on $\delta$ and $r$, defined in Assumption 2-(ii) and Assumption 3, respectively, and $s$ is the constant defined in Assumption 2-(i).

Proof. We suppress the dependence on $\theta^w_n$ for simplicity. If $t = 0$, the inequality trivially holds, so we assume $t > 0$. We have the identity
$$E\Bigg[1\bigg\{t\,\frac{\big(\hat\tau^{hj}_n-\tau_n\big)^2}{V_{n,n_{1A}}\big(\hat\tau^{hj}_n\big)} + g(n_{1A}) \ge x\bigg\}\Bigg] - E\big[1\{tZ^2 + g(n_{1A}) \ge x\}\big]$$
$$= E\Bigg[\,E\Bigg[1\bigg\{t\,\frac{\big(\hat\tau^{hj}_n-\tau_n\big)^2}{V_{n,n_{1A}}\big(\hat\tau^{hj}_n\big)} + g(n_{1A}) \ge x\bigg\}\,\bigg|\, n_{1A}\Bigg] - E\big[1\{tZ^2 + g(n_{1A}) \ge x\}\,\big|\, n_{1A}\big]\Bigg]$$
$$= E\Bigg[\,P\bigg(t\,\frac{\big(\hat\tau^{hj}_n-\tau_n\big)^2}{V_{n,n_{1A}}\big(\hat\tau^{hj}_n\big)} + g(n_{1A}) \ge x\,\bigg|\, n_{1A}\bigg) - P\big(tZ^2 + g(n_{1A}) \ge x\,\big|\, n_{1A}\big)\Bigg]$$
$$= E\Bigg[\,P\bigg(\frac{\big(\hat\tau^{hj}_n-\tau_n\big)^2}{V_{n,n_{1A}}\big(\hat\tau^{hj}_n\big)} \ge \frac{x-g(n_{1A})}{t}\,\bigg|\, n_{1A}\bigg) - P\bigg(Z^2 \ge \frac{x-g(n_{1A})}{t}\,\bigg|\, n_{1A}\bigg)\Bigg].$$
Recall that $n_{1A} = \sum_{i=1}^n D_iA_i$ and $n_{0A} = n_A - n_{1A} = \sum_{i=1}^n (1-D_i)A_i$.
Define events
$$E = \Big\{n_{1A} \le \frac{n_A n_1}{2n}\Big\} \cup \Big\{n_{0A} \le \frac{n_A n_0}{2n}\Big\}, \tag{56}$$
$$E^c = \Big\{n_{1A} > \frac{n_A n_1}{2n}\Big\} \cap \Big\{n_{0A} > \frac{n_A n_0}{2n}\Big\}. \tag{57}$$

Footnote 10: When $k = 0$ or $k = n_A$, we abuse notation and define $V_{n,k}(\hat\tau^{hj}_n) = \epsilon$ for some arbitrary $\epsilon > 0$.

These events happen with probability approaching 0, as shown in the proof. On the event $E^c$ and by Assumption 3, we have
$$\min\big(n_{1A}, n_{0A}\big) \ge \frac{r\, n_A}{2}, \qquad \text{and} \qquad \frac{1}{n_A}\min\big(n_{1A}, n_{0A}\big) \ge \frac{r}{2}. \tag{58}$$
Hence on the event $E^c$ we have
$$V_{n,n_{1A}}\big(\hat\tau^{hj}_n\big) \ge \min\Big(\frac{(1-\delta)\,n_{1A}}{n_A},\, \frac{(1-\delta)\,n_{0A}}{n_A}\Big)\Big(\frac{1}{n_{1A}}v_{1A} + \frac{1}{n_{0A}}v_{0A}\Big) \ge \frac{(1-\delta)\,r}{2}\Big(\frac{1}{n_{1A}}v_{1A} + \frac{1}{n_{0A}}v_{0A}\Big).$$
For a fixed $n_{1A}$, the premise of Theorem A.1 is satisfied with $c^{-2} = (1-\delta)r/2$. Conditioning on the event $E^c$ and for a fixed $n_{1A}$, we have the calculation
$$\Bigg|P\bigg(\frac{\big(\hat\tau^{hj}_n-\tau_n\big)^2}{V_{n,n_{1A}}\big(\hat\tau^{hj}_n\big)} \ge \frac{x-g(n_{1A})}{t}\,\bigg|\, n_{1A}\bigg) - P\bigg(Z^2 \ge \frac{x-g(n_{1A})}{t}\,\bigg|\, n_{1A}\bigg)\Bigg|$$
$$\le \sup_{q\in\mathbb R}\Bigg|P\bigg(\frac{\big(\hat\tau^{hj}_n-\tau_n\big)^2}{V_{n,n_{1A}}\big(\hat\tau^{hj}_n\big)} \ge q\,\bigg|\, n_{1A}\bigg) - P\big(Z^2 \ge q\big)\Bigg|$$
$$= \sup_{q\in\mathbb R_+}\Bigg|P\Big(V_{n,n_{1A}}^{-1/2}\big(\hat\tau^{hj}_n\big)\,\big|\hat\tau^{hj}_n-\tau_n\big| \ge \sqrt q\,\Big|\, n_{1A}\Big) - P\big(|Z| \ge \sqrt q\big)\Bigg|$$
$$\le \sup_{t\in\mathbb R}\Bigg|P\Big(V_{n,n_{1A}}^{-1/2}\big(\hat\tau^{hj}_n\big)\big(\hat\tau^{hj}_n-\tau_n\big) \le t\,\Big|\, n_{1A}\Big) - P(Z \le t)\Bigg| + \sup_{t\in\mathbb R}\Bigg|P\Big(V_{n,n_{1A}}^{-1/2}\big(\hat\tau^{hj}_n\big)\big(\hat\tau^{hj}_n-\tau_n\big) \ge t\,\Big|\, n_{1A}\Big) - P(Z \ge t)\Bigg|$$
$$\le C_{61}(\delta,r)\,\max_{a\in\{0,1\}}\max_{i\in\mathcal A}\frac{|y_i(a)-\bar y_{\mathcal A}(a)|}{\sqrt{n_{aA}\, v_{aA}}}, \tag{59}$$
by Theorem A.1 and Lemma A.6-(i), where the constant $C_{61}$ may depend on $\delta$ and $r$.
We have the following calculation:
$$\Bigg|E\Bigg[1\bigg\{t\,\frac{\big(\hat\tau^{hj}_n-\tau_n\big)^2}{V_{n,n_{1A}}\big(\hat\tau^{hj}_n\big)} + g(n_{1A}) \ge x\bigg\}\Bigg] - E\big[1\{tZ^2+g(n_{1A}) \ge x\}\big]\Bigg|$$
$$\le P(E) + P(E^c)\, E\Bigg[\sup_{t\in\mathbb R}\Bigg|P\bigg(\frac{\big(\hat\tau^{hj}_n-\tau_n\big)^2}{V_{n,n_{1A}}\big(\hat\tau^{hj}_n\big)} \ge t\,\bigg|\, n_{1A}\bigg) - P\big(Z^2 \ge t\,\big|\, n_{1A}\big)\Bigg| \,\Bigg|\, E^c\Bigg]$$
$$\le P(E) + C_{61}(\delta,r)\, E\Bigg[\max_{a\in\{0,1\}}\max_{i\in\mathcal A}\frac{|y_i(a)-\bar y_{\mathcal A}(a)|}{\sqrt{n_{aA}\, v_{aA}}} \,\Bigg|\, E^c\Bigg]\, P(E^c) \tag{60}$$
$$\le \frac{2(1-r)}{rs^2(n-1)} + \sqrt{\frac{2}{r}}\, C_{61}(\delta,r)\,\max_{a\in\{0,1\}}\max_{i\in\mathcal A}\frac{|y_i(a)-\bar y_{\mathcal A}(a)|}{\sqrt{n_A\, v_{aA}}} \tag{61}$$
$$\le \frac{2(1-r)}{rs^2(n-1)} + \sqrt{\frac{2}{r}}\, C_{61}(\delta,r)\, B\, n_A^{-1/4}, \tag{62}$$
where (60) follows from (59), (61) follows from Lemma A.3 and (58), and (62) follows from Lemma A.6-(ii).

A.3 Consistency of the Variance Estimator and Asymptotic Distributions of $T^0_n$, $T^1_n$ and $T^2_n$

Recall the variance estimator defined in (15):
$$\hat V^{hj}_n(Y,D,A) = \frac{\sum_{i=1}^n D_iA_i\big(Y_i-\hat\tau_{1A}\big)^2}{\big(\sum_{i=1}^n D_iA_i\big)^2} + \frac{\sum_{i=1}^n (1-D_i)A_i\big(Y_i-\hat\tau_{0A}\big)^2}{\big(\sum_{i=1}^n (1-D_i)A_i\big)^2},$$
where $\hat\tau_{1A} = \sum_{i=1}^n D_iA_iY_i/\sum_{i=1}^n D_iA_i$ and $\hat\tau_{0A} = \sum_{i=1}^n (1-D_i)A_iY_i/\sum_{i=1}^n (1-D_i)A_i$. Conditional on the event that $\sum_{i=1}^n D_iA_i = k$, the variance estimator can be expressed as:
$$\hat V^{hj}_{n,k}(Y,D,A) = \frac{\sum_{i=1}^n D_iA_i\big(Y_i-\hat\tau_{1A,k}\big)^2}{k^2} + \frac{\sum_{i=1}^n (1-D_i)A_i\big(Y_i-\hat\tau_{0A,k}\big)^2}{(n_A-k)^2},$$
where $\hat\tau_{1A,k} = \sum_{i=1}^n D_iA_iY_i/k$ and $\hat\tau_{0A,k} = \sum_{i=1}^n (1-D_i)A_iY_i/(n_A-k)$. For simplicity, we denote
$$\hat v_{1A,k} = \frac{\sum_{i=1}^n D_iA_i\big(Y_i-\hat\tau_{1A,k}\big)^2}{k}, \qquad \hat v_{0A,k} = \frac{\sum_{i=1}^n (1-D_i)A_i\big(Y_i-\hat\tau_{0A,k}\big)^2}{n_A-k}, \tag{63}$$
so that the conditional variance estimator can be written as:
$$\hat V^{hj}_{n,k}(Y,D,A) = \frac{1}{k}\,\hat v_{1A,k} + \frac{1}{n_A-k}\,\hat v_{0A,k}. \tag{64}$$
Define the target variance
$$\tilde V_{n,k}\big(\hat\tau^{hj}_n\big) = \frac{1}{k}\,v_{1A} + \frac{1}{n_A-k}\,v_{0A}, \tag{65}$$
with $v_{1A}$ and $v_{0A}$ defined in (51).
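As a concrete illustration, the estimator (15) and its conditional form (64) can be computed directly from observed data. The sketch below (with hypothetical simulated data; the helper name `hajek_and_variance` is ours) computes $\hat\tau^{hj}_n$ and $\hat V^{hj}_n$, then cross-checks the algebraic identity between (15) and (64):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: realized assignment D, reporting indicators A,
# observed outcomes Y (meaningful only where A_i = 1).
n = 12
D = np.array([1] * 6 + [0] * 6)
A = rng.integers(0, 2, n)
A[:2] = 1; A[6:8] = 1                  # ensure reporters in both arms
Y = rng.normal(size=n)

def hajek_and_variance(Y, D, A):
    """Hajek estimator (14) and variance estimator (15)."""
    n1A = (D * A).sum()
    n0A = ((1 - D) * A).sum()
    t1 = (D * A * Y).sum() / n1A
    t0 = ((1 - D) * A * Y).sum() / n0A
    V = ((D * A * (Y - t1) ** 2).sum() / n1A ** 2
         + ((1 - D) * A * (Y - t0) ** 2).sum() / n0A ** 2)
    return t1 - t0, V

tau_hat, V_hat = hajek_and_variance(Y, D, A)

# Cross-check against the conditional form (64): V = v1/k + v0/(nA - k).
k, nA = (D * A).sum(), A.sum()
v1_hat = (D * A * (Y - (D * A * Y).sum() / k) ** 2).sum() / k
v0_hat = ((1 - D) * A * (Y - ((1 - D) * A * Y).sum() / (nA - k)) ** 2).sum() / (nA - k)
assert abs(V_hat - (v1_hat / k + v0_hat / (nA - k))) < 1e-12
```

The final assertion is an algebraic identity: (64) is simply (15) rewritten with $k = n_{1A}$ held fixed.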
Lemma A.8 provides a tail inequality for the variance estimator.

Lemma A.8. For all $n\in\mathbb N$ and any $\theta^w_n\in\Theta^w_n$, let $\{(y_i(1),y_i(0))\}_{i=1}^n$ be the associated potential outcomes and $\{A_i\}_{i=1}^n$ be the associated always-reporter indicators. Denote $\mathcal A = \{i\in[n] : A_i=1\}$ and $n_A = \sum_{i=1}^n A_i$. Define $n_{1A} = \sum_{i=1}^n D_iA_i$. Consider the conditional variance estimator $\hat V^{hj}_{n,k}$ in (64) and the variance target $\tilde V_{n,k}$ in (65). Suppose $n_A \ge 2$. Let $k$ be a positive integer with $k\in[1,n_A-1]$. For every $\epsilon\in(0,2]$,
$$P_{\theta^w_n}\Bigg(\bigg|\frac{\tilde V_{n,n_{1A}}}{\hat V^{hj}_{n,n_{1A}}}-1\bigg| \ge \epsilon \,\Bigg|\, n_{1A}=k\Bigg) \le C_{66}(\epsilon)\Big(\frac{n_A-k}{k}+\frac{k}{n_A-k}\Big)\frac{1}{n_A}\max_{a\in\{0,1\}}\max_{i\in\mathcal A}\frac{(y_i(a)-\bar y_{\mathcal A}(a))^2}{v_{aA}} + \frac{8(2+\epsilon)}{\epsilon}\,n_A^{-1} + \frac{2(2+\epsilon)}{\epsilon}\,n_A^{-1}\Big(\frac{n_A-k}{k}+\frac{k}{n_A-k}\Big)$$
$$\le \Big(C_{66}(\epsilon)\,B^2 + \frac{2(2+\epsilon)}{\epsilon}\Big)\,n_A^{-1/2}\Big(\frac{n_A-k}{k}+\frac{k}{n_A-k}\Big) + \frac{8(2+\epsilon)}{\epsilon}\,n_A^{-1},$$
where
$$C_{66}(\epsilon) = \Big(\frac{8(2+\epsilon)}{\epsilon}\Big)^2 + \Big(\frac{2(2-\epsilon)}{\epsilon}\Big)^2, \tag{66}$$
with the constant $B$ defined in Assumption 2-(iii).

Proof. Recall that by Lemma A.4, conditional on the event $n_{1A}=k$, $(D_i)_{i\in\mathcal A}\sim \mathrm{CR}(n_A,k)$. We suppress the dependence on $\theta^w_n$ and the conditioning event for simplicity. By Assumption 2-(ii), $\tilde V_{n,k}$ is positive. On the event where $\hat V^{hj}_{n,k}=0$, we interpret $\tilde V_{n,k}/\hat V^{hj}_{n,k}=\infty$. We have the inequality:
$$P\bigg(\frac{\tilde V_{n,k}}{\hat V^{hj}_{n,k}} \ge 1+\epsilon\bigg) = P\bigg(\frac{\tilde V_{n,k}-\hat V^{hj}_{n,k}}{\hat V^{hj}_{n,k}} \ge \epsilon\bigg) = P\bigg(\frac{k^{-1}v_{1A}+(n_A-k)^{-1}v_{0A}-k^{-1}\hat v_{1A,k}-(n_A-k)^{-1}\hat v_{0A,k}}{k^{-1}\hat v_{1A,k}+(n_A-k)^{-1}\hat v_{0A,k}} \ge \epsilon\bigg)$$
$$\le P\bigg(\frac{k^{-1}\big(v_{1A}-\hat v_{1A,k}\big)}{k^{-1}\hat v_{1A,k}+(n_A-k)^{-1}\hat v_{0A,k}} \ge \frac{\epsilon}{2}\bigg) + P\bigg(\frac{(n_A-k)^{-1}\big(v_{0A}-\hat v_{0A,k}\big)}{k^{-1}\hat v_{1A,k}+(n_A-k)^{-1}\hat v_{0A,k}} \ge \frac{\epsilon}{2}\bigg)$$
$$\le P\bigg(\frac{v_{1A}-\hat v_{1A,k}}{\hat v_{1A,k}} \ge \frac{\epsilon}{2}\bigg) + P\bigg(\frac{v_{0A}-\hat v_{0A,k}}{\hat v_{0A,k}} \ge \frac{\epsilon}{2}\bigg), \tag{67}$$
where we use the fact that $\hat v_{1A,k}$ and $\hat v_{0A,k}$ are nonnegative in (67).
Similarly, we have
$$P\bigg(\frac{\tilde V_{n,k}}{\hat V^{hj}_{n,k}} \le 1-\epsilon\bigg) = P\bigg(\frac{\tilde V_{n,k}-\hat V^{hj}_{n,k}}{\hat V^{hj}_{n,k}} \le -\epsilon\bigg) = P\bigg(\frac{k^{-1}v_{1A}+(n_A-k)^{-1}v_{0A}-k^{-1}\hat v_{1A,k}-(n_A-k)^{-1}\hat v_{0A,k}}{k^{-1}\hat v_{1A,k}+(n_A-k)^{-1}\hat v_{0A,k}} \le -\epsilon\bigg)$$
$$\le P\bigg(\frac{k^{-1}\big(v_{1A}-\hat v_{1A,k}\big)}{k^{-1}\hat v_{1A,k}+(n_A-k)^{-1}\hat v_{0A,k}} \le -\frac{\epsilon}{2}\bigg) + P\bigg(\frac{(n_A-k)^{-1}\big(v_{0A}-\hat v_{0A,k}\big)}{k^{-1}\hat v_{1A,k}+(n_A-k)^{-1}\hat v_{0A,k}} \le -\frac{\epsilon}{2}\bigg)$$
$$\le P\bigg(\frac{v_{1A}-\hat v_{1A,k}}{\hat v_{1A,k}} \le -\frac{\epsilon}{2}\bigg) + P\bigg(\frac{v_{0A}-\hat v_{0A,k}}{\hat v_{0A,k}} \le -\frac{\epsilon}{2}\bigg), \tag{68}$$
where we note that $\tilde V_{n,k}/\hat V^{hj}_{n,k} \le 1-\epsilon$ implicitly implies that $\hat V^{hj}_{n,k} > 0$. Combining (67) and (68) we have
$$P\bigg(\bigg|\frac{\tilde V_{n,k}}{\hat V^{hj}_{n,k}}-1\bigg|\ge\epsilon\bigg) = P\bigg(\frac{\tilde V_{n,k}}{\hat V^{hj}_{n,k}}\ge 1+\epsilon\bigg) + P\bigg(\frac{\tilde V_{n,k}}{\hat V^{hj}_{n,k}}\le 1-\epsilon\bigg) \le P\bigg(\bigg|\frac{v_{1A}-\hat v_{1A,k}}{\hat v_{1A,k}}\bigg|\ge\frac{\epsilon}{2}\bigg) + P\bigg(\bigg|\frac{v_{0A}-\hat v_{0A,k}}{\hat v_{0A,k}}\bigg|\ge\frac{\epsilon}{2}\bigg). \tag{69}$$
Our result follows from bounding the two terms in (69). We bound the first term in (69); the bound for the second term follows from an analogous argument.
$$P\bigg(\frac{\big|v_{1A}-\hat v_{1A,k}\big|}{\hat v_{1A,k}}\ge\frac{\epsilon}{2}\bigg) = P\bigg(\frac{v_{1A}-\hat v_{1A,k}}{\hat v_{1A,k}}\ge\frac{\epsilon}{2}\bigg) + P\bigg(\frac{v_{1A}-\hat v_{1A,k}}{\hat v_{1A,k}}\le-\frac{\epsilon}{2}\bigg) = P\bigg(\frac{v_{1A}}{\hat v_{1A,k}}\ge 1+\frac{\epsilon}{2}\bigg) + P\bigg(\frac{v_{1A}}{\hat v_{1A,k}}\le 1-\frac{\epsilon}{2}\bigg). \tag{70}$$
We have
$$P\bigg(\frac{v_{1A}}{\hat v_{1A,k}}\ge 1+\frac{\epsilon}{2}\bigg) = P\bigg(\frac{\hat v_{1A,k}}{v_{1A}}\le \frac{2}{2+\epsilon}\bigg) = P\bigg(\frac{\hat v_{1A,k}-v_{1A}}{v_{1A}}\le -\frac{\epsilon}{2+\epsilon}\bigg) = P\Bigg(\frac{k^{-1}\sum_{i=1}^n D_iA_i(Y_i-\bar y_{\mathcal A}(1))^2 - \big(\hat\tau_{1A,k}-\bar y_{\mathcal A}(1)\big)^2 - v_{1A}}{v_{1A}}\le -\frac{\epsilon}{2+\epsilon}\Bigg)$$
$$\le P\Bigg(\frac{k^{-1}\sum_{i=1}^n D_iA_i(Y_i-\bar y_{\mathcal A}(1))^2 - v_{1A}}{v_{1A}}\le -\frac{\epsilon}{2(2+\epsilon)}\Bigg) \tag{71}$$
$$\quad + P\Bigg(\frac{-\big(\hat\tau_{1A,k}-\bar y_{\mathcal A}(1)\big)^2}{v_{1A}}\le -\frac{\epsilon}{2(2+\epsilon)}\Bigg), \tag{72}$$
where we use the fact that
$$\frac{1}{k}\sum_{i=1}^n D_iA_i\big(Y_i-\hat\tau_{1A,k}\big)^2 = \frac{1}{k}\sum_{i=1}^n D_iA_i\big(Y_i-\bar y_{\mathcal A}(1)\big)^2 - \big(\hat\tau_{1A,k}-\bar y_{\mathcal A}(1)\big)^2.$$
For (72), we have
$$P\Bigg(\frac{-\big(\hat\tau_{1A,k}-\bar y_{\mathcal A}(1)\big)^2}{v_{1A}}\le -\frac{\epsilon}{2(2+\epsilon)}\Bigg) = P\Bigg(\frac{\big(\hat\tau_{1A,k}-\bar y_{\mathcal A}(1)\big)^2}{v_{1A}}\ge \frac{\epsilon}{2(2+\epsilon)}\Bigg) \tag{73}$$
$$\le \frac{2(2+\epsilon)}{\epsilon}\,\frac{E\big[\big(\hat\tau_{1A,k}-\bar y_{\mathcal A}(1)\big)^2\big]}{v_{1A}} = \frac{2(2+\epsilon)}{\epsilon}\,\frac{V\big(\hat\tau_{1A,k}\big)}{v_{1A}} = \frac{2(2+\epsilon)}{\epsilon}\,\frac{n_A-k}{n_A\,k}, \tag{74}$$
where the last equality follows from
$$V\big(\hat\tau_{1A,k}\big) = \frac{n_A-k}{n_A\,k}\,\frac{1}{n_A-1}\sum_{i\in\mathcal A}\big(y_i(1)-\bar y_{\mathcal A}(1)\big)^2 = \frac{n_A-k}{n_A\,k}\,v_{1A}. \tag{75}$$
Notice we can write:
$$\frac{1}{k}\sum_{i=1}^n D_iA_i\big(Y_i-\bar y_{\mathcal A}(1)\big)^2 - v_{1A} = \frac{n_A}{(n_A-1)k}\sum_{i=1}^n D_iA_i\big(Y_i-\bar y_{\mathcal A}(1)\big)^2 - v_{1A} - \frac{1}{(n_A-1)k}\sum_{i=1}^n D_iA_i\big(Y_i-\bar y_{\mathcal A}(1)\big)^2.$$
Now,
$$P\Bigg(\frac{k^{-1}\sum_{i=1}^n D_iA_i(Y_i-\bar y_{\mathcal A}(1))^2 - v_{1A}}{v_{1A}}\le -\frac{\epsilon}{2(2+\epsilon)}\Bigg) \le P\Bigg(\frac{\frac{n_A}{(n_A-1)k}\sum_{i=1}^n D_iA_i(Y_i-\bar y_{\mathcal A}(1))^2 - v_{1A}}{v_{1A}}\le -\frac{\epsilon}{4(2+\epsilon)}\Bigg) \tag{76}$$
$$\quad + P\Bigg(\frac{-\frac{1}{(n_A-1)k}\sum_{i=1}^n D_iA_i(Y_i-\bar y_{\mathcal A}(1))^2}{v_{1A}}\le -\frac{\epsilon}{4(2+\epsilon)}\Bigg). \tag{77}$$
A bound for (76) follows from the calculation
$$P\Bigg(\frac{\frac{n_A}{(n_A-1)k}\sum_{i=1}^n D_iA_i(Y_i-\bar y_{\mathcal A}(1))^2 - v_{1A}}{v_{1A}}\le -\frac{\epsilon}{4(2+\epsilon)}\Bigg) \le \Big(\frac{4(2+\epsilon)}{\epsilon}\Big)^2\Big(\frac{n_A}{n_A-1}\Big)^2\big(v_{1A}\big)^{-2}\, V\bigg(\frac{1}{k}\sum_{i=1}^n D_iA_i\big(Y_i-\bar y_{\mathcal A}(1)\big)^2\bigg) \tag{78}$$
$$\le \Big(\frac{4(2+\epsilon)}{\epsilon}\Big)^2\Big(\frac{n_A}{n_A-1}\Big)^2\big(v_{1A}\big)^{-2}\,\frac{n_A-k}{k\,n_A}\,\frac{1}{n_A-1}\sum_{i\in\mathcal A}\big(y_i(1)-\bar y_{\mathcal A}(1)\big)^4$$
$$\le \Big(\frac{4(2+\epsilon)}{\epsilon}\Big)^2\Big(\frac{n_A}{n_A-1}\Big)^2\,\frac{n_A-k}{k\,n_A}\,\max_{i\in\mathcal A}\frac{\big(y_i(1)-\bar y_{\mathcal A}(1)\big)^2}{v_{1A}} \tag{79}$$
$$\le \Big(\frac{4(2+\epsilon)}{\epsilon}\Big)^2\Big(\frac{n_A}{n_A-1}\Big)^2\,\frac{n_A-k}{k\,n_A}\,B^2\, n_A^{1/2}, \tag{80}$$
where (78) uses the Chebyshev inequality and the fact that
$$E\Bigg[\frac{n_A}{(n_A-1)k}\sum_{i=1}^n D_iA_i\big(Y_i-\bar y_{\mathcal A}(1)\big)^2\Bigg] = v_{1A}, \tag{81}$$
(79) uses the fact that
$$\frac{\sum_{i\in\mathcal A}\big(y_i(1)-\bar y_{\mathcal A}(1)\big)^4}{n_A-1} \le \frac{\sum_{i\in\mathcal A}\big(y_i(1)-\bar y_{\mathcal A}(1)\big)^2}{n_A-1}\times\max_{i\in\mathcal A}\big(y_i(1)-\bar y_{\mathcal A}(1)\big)^2,$$
and (80) uses Lemma A.6-(ii). A bound for (77) follows from the calculation
$$P\Bigg(\frac{-\frac{1}{k(n_A-1)}\sum_{i=1}^n D_iA_i(Y_i-\bar y_{\mathcal A}(1))^2}{v_{1A}}\le -\frac{\epsilon}{4(2+\epsilon)}\Bigg) = P\Bigg(\frac{\frac{1}{k(n_A-1)}\sum_{i=1}^n D_iA_i(Y_i-\bar y_{\mathcal A}(1))^2}{v_{1A}}\ge \frac{\epsilon}{4(2+\epsilon)}\Bigg)$$
$$\le \frac{4(2+\epsilon)}{\epsilon}\,\frac{\frac{1}{n_A(n_A-1)}\sum_{i\in\mathcal A}\big(y_i(1)-\bar y_{\mathcal A}(1)\big)^2}{v_{1A}} = \frac{4(2+\epsilon)}{n_A\,\epsilon},$$
by the Markov inequality and (81). Collecting terms we have:
$$P\bigg(\frac{v_{1A}}{\hat v_{1A,k}}\ge 1+\frac{\epsilon}{2}\bigg) \le \Big(\frac{4(2+\epsilon)}{\epsilon}\Big)^2\Big(\frac{n_A}{n_A-1}\Big)^2\,\frac{n_A-k}{k\,n_A}\,B^2\, n_A^{1/2} + \frac{4(2+\epsilon)}{n_A\,\epsilon} + \frac{2(2+\epsilon)}{\epsilon}\,\frac{n_A-k}{n_A\,k}.$$
For the second term in (70), we use the premise that $\epsilon \le 2$ and write
$$P\bigg(\frac{v_{1A}}{\hat v_{1A,k}}\le 1-\frac{\epsilon}{2}\bigg) = P\bigg(\frac{v_{1A}}{\hat v_{1A,k}}\le \frac{2-\epsilon}{2}\bigg) = P\bigg(\frac{\hat v_{1A,k}}{v_{1A}}\ge \frac{2}{2-\epsilon}\bigg) = P\Bigg(\frac{k^{-1}\sum_{i=1}^n D_iA_i(Y_i-\bar y_{\mathcal A}(1))^2 - \big(\hat\tau_{1A,k}-\bar y_{\mathcal A}(1)\big)^2 - v_{1A}}{v_{1A}}\ge \frac{\epsilon}{2-\epsilon}\Bigg)$$
$$\le P\Bigg(\frac{k^{-1}\sum_{i=1}^n D_iA_i(Y_i-\bar y_{\mathcal A}(1))^2 - v_{1A}}{v_{1A}}\ge \frac{\epsilon}{2-\epsilon}\Bigg) \le P\Bigg(\frac{\frac{n_A}{(n_A-1)k}\sum_{i=1}^n D_iA_i(Y_i-\bar y_{\mathcal A}(1))^2 - v_{1A}}{v_{1A}}\ge \frac{\epsilon}{2-\epsilon}\Bigg) \tag{82}$$
$$= P\Bigg(\bigg(\frac{\frac{n_A}{(n_A-1)k}\sum_{i=1}^n D_iA_i(Y_i-\bar y_{\mathcal A}(1))^2 - v_{1A}}{v_{1A}}\bigg)^2\ge \Big(\frac{\epsilon}{2-\epsilon}\Big)^2\Bigg)$$
$$\le \Big(\frac{2-\epsilon}{\epsilon}\Big)^2\Big(\frac{n_A}{n_A-1}\Big)^2\big(v_{1A}\big)^{-2}\, V\bigg(\frac{1}{k}\sum_{i=1}^n D_iA_i\big(Y_i-\bar y_{\mathcal A}(1)\big)^2\bigg) \tag{83}$$
$$\le \Big(\frac{2-\epsilon}{\epsilon}\Big)^2\Big(\frac{n_A}{n_A-1}\Big)^2\big(v_{1A}\big)^{-2}\,\frac{n_A-k}{k\,n_A}\,\frac{1}{n_A-1}\sum_{i\in\mathcal A}\big(y_i(1)-\bar y_{\mathcal A}(1)\big)^4 \le \Big(\frac{2-\epsilon}{\epsilon}\Big)^2\Big(\frac{n_A}{n_A-1}\Big)^2\,\frac{n_A-k}{k\,n_A}\,\max_{i\in\mathcal A}\frac{\big(y_i(1)-\bar y_{\mathcal A}(1)\big)^2}{v_{1A}} \le \Big(\frac{2-\epsilon}{\epsilon}\Big)^2\Big(\frac{n_A}{n_A-1}\Big)^2\,\frac{n_A-k}{k\,n_A}\,B^2\, n_A^{1/2},$$
where (82) follows because $\sum_{i=1}^n D_iA_i(Y_i-\bar y_{\mathcal A}(1))^2$ is nonnegative and $n_A(n_A-1)^{-1} \ge 1$ for $n_A \ge 2$, and the remaining calculations are similar to the bound for (76). Hence for $\epsilon \le 2$,
$$P\bigg(\frac{\big|v_{1A}-\hat v_{1A,k}\big|}{\hat v_{1A,k}}\ge\frac{\epsilon}{2}\bigg) \le \Big(\frac{4(2+\epsilon)}{\epsilon}\Big)^2\Big(\frac{n_A}{n_A-1}\Big)^2\,\frac{n_A-k}{k\,n_A}\,B^2\, n_A^{1/2} + \frac{4(2+\epsilon)}{n_A\,\epsilon} + \frac{2(2+\epsilon)}{\epsilon}\,\frac{n_A-k}{n_A\,k} + \Big(\frac{2-\epsilon}{\epsilon}\Big)^2\Big(\frac{n_A}{n_A-1}\Big)^2\,\frac{n_A-k}{k\,n_A}\,B^2\, n_A^{1/2}.$$
For $n_A \ge 2$, we have $0 \le n_A/(n_A-1) \le 2$. Hence we can simplify:
$$P\bigg(\frac{\big|v_{1A}-\hat v_{1A,k}\big|}{\hat v_{1A,k}}\ge\frac{\epsilon}{2}\bigg) \le C_{66}(\epsilon)\,B^2\,\frac{n_A-k}{k}\,n_A^{-1/2} + \frac{4(2+\epsilon)}{\epsilon}\,n_A^{-1} + \frac{2(2+\epsilon)}{\epsilon}\,\frac{n_A-k}{k}\,n_A^{-1},$$
where
$$C_{66}(\epsilon) = \Big(\frac{8(2+\epsilon)}{\epsilon}\Big)^2 + \Big(\frac{2(2-\epsilon)}{\epsilon}\Big)^2.$$
By symmetry, we have
$$P\bigg(\frac{\big|v_{0A}-\hat v_{0A,k}\big|}{\hat v_{0A,k}}\ge\frac{\epsilon}{2}\bigg) \le C_{66}(\epsilon)\,B^2\,\frac{k}{n_A-k}\,n_A^{-1/2} + \frac{4(2+\epsilon)}{\epsilon}\,n_A^{-1} + \frac{2(2+\epsilon)}{\epsilon}\,\frac{k}{n_A-k}\,n_A^{-1}. \tag{84}$$
Combining these two bounds gives us the result in the lemma.

Corollary A.9. Under the setup of Lemma A.8, we have, for $n \ge 2$,
$$P_{\theta^w_n}\Bigg(\bigg|\frac{\tilde V_{n,n_{1A}}}{\hat V^{hj}_n}-1\bigg| \ge \epsilon\Bigg) = E\Bigg[P_{\theta^w_n}\Bigg(\bigg|\frac{\tilde V_{n,n_{1A}}}{\hat V^{hj}_{n,n_{1A}}}-1\bigg| \ge \epsilon \,\Bigg|\, n_{1A}\Bigg)\Bigg] \le C_{88}(r,s,\epsilon)\, n^{-1} + C_{87}(\epsilon,r)\,\frac{1}{n_A}\max_{a\in\{0,1\}}\max_{i\in\mathcal A}\frac{(y_i(a)-\bar y_{\mathcal A}(a))^2}{v_{aA}} \tag{85}$$
$$\le C_{88}(r,s,\epsilon)\, n^{-1} + C_{87}(\epsilon,r)\,B^2\, n_A^{-1/2}, \tag{86}$$
where $C_{87}(\epsilon,r)$ is defined as
$$C_{87}(\epsilon,r) = C_{66}(\epsilon)\,\frac{2(2-r)}{r}, \tag{87}$$
with $C_{66}(\epsilon)$ defined in (66), and $C_{88}(r,s,\epsilon)$ is defined as
$$C_{88}(r,s,\epsilon) = \frac{4(1-r)}{r s^2} + \frac{8(2+\epsilon)}{\epsilon\, s} + \frac{2(2+\epsilon)}{\epsilon}\,\frac{2(2-r)}{r\, s}, \tag{88}$$
with constants $r$, $s$ and $B$ defined in Assumption 3, Assumption 2-(i) and Assumption 2-(iii), respectively.

Proof. We suppress the dependence on $\theta^w_n$ and the conditioning event for simplicity. Define events $E$ and $E^c$ as in (56) and (57). Note that on the event $E^c$ we have the inequalities
$$n_{1A} > \frac{n_A n_1}{2n} \ge \frac{r\, n_A}{2}, \qquad n_{0A} > \frac{n_A n_0}{2n} \ge \frac{r\, n_A}{2},$$
$$n_{1A} = n_A - n_{0A} < n_A - \frac{r\, n_A}{2} = \Big(1-\frac{r}{2}\Big)n_A, \qquad n_{0A} = n_A - n_{1A} < n_A - \frac{r\, n_A}{2} = \Big(1-\frac{r}{2}\Big)n_A.$$
Hence, conditional on the event $E^c$, we have
$$\frac{n_{1A}}{n_{0A}} \in \Big(\frac{r/2}{1-r/2},\, \frac{1-r/2}{r/2}\Big), \qquad \frac{n_{0A}}{n_{1A}} \in \Big(\frac{r/2}{1-r/2},\, \frac{1-r/2}{r/2}\Big).$$
$$\tag{89}$$
The result follows by
$$E\Bigg[P\Bigg(\bigg|\frac{\tilde V_{n,n_{1A}}}{\hat V^{hj}_{n,n_{1A}}}-1\bigg| \ge \epsilon \,\Bigg|\, n_{1A}\Bigg)\Bigg] \le P(E) + E\Bigg[P\Bigg(\bigg|\frac{\tilde V_{n,n_{1A}}}{\hat V^{hj}_{n,n_{1A}}}-1\bigg| \ge \epsilon \,\Bigg|\, n_{1A}\Bigg)\,\Bigg|\, E^c\Bigg]\,P(E^c)$$
$$\le P(E) + E\Bigg[C_{66}(\epsilon)\Big(\frac{n_{0A}}{n_{1A}}+\frac{n_{1A}}{n_{0A}}\Big)\frac{1}{n_A}\max_{a\in\{0,1\}}\max_{i\in\mathcal A}\frac{(y_i(a)-\bar y_{\mathcal A}(a))^2}{v_{aA}}\,\Bigg|\, E^c\Bigg] + E\Bigg[\frac{8(2+\epsilon)}{\epsilon}\,n_A^{-1} + \frac{2(2+\epsilon)}{\epsilon}\,n_A^{-1}\Big(\frac{n_{0A}}{n_{1A}}+\frac{n_{1A}}{n_{0A}}\Big)\,\Bigg|\, E^c\Bigg]$$
$$\le \frac{2(1-r)}{rs^2(n-1)} + C_{66}(\epsilon)\,\frac{2(2-r)}{r}\,\frac{1}{n_A}\max_{a\in\{0,1\}}\max_{i\in\mathcal A}\frac{(y_i(a)-\bar y_{\mathcal A}(a))^2}{v_{aA}} + \frac{8(2+\epsilon)}{\epsilon}\,n_A^{-1} + \frac{2(2+\epsilon)}{\epsilon}\,\frac{2(2-r)}{r}\,n_A^{-1}$$
$$\le \frac{2(1-r)}{rs^2(n-1)} + C_{66}(\epsilon)\,B^2\,\frac{2(2-r)}{r}\,n_A^{-1/2} + \frac{8(2+\epsilon)}{\epsilon}\,n_A^{-1} + \frac{2(2+\epsilon)}{\epsilon}\,\frac{2(2-r)}{r}\,n_A^{-1},$$
where the second inequality follows from Lemma A.8 with the constant $C_{66}(\epsilon)$ defined in (66), and the third inequality follows from Lemma A.3 and (89). Constants $r$, $s$ and $B$ are defined in Assumption 3, Assumption 2-(i) and Assumption 2-(iii), respectively. For each $\theta^w_n \in \Theta^w_n$, we have $n_A^{-1} \le s^{-1}n^{-1}$. We write, for $n \ge 2$,
$$\frac{2(1-r)}{rs^2(n-1)} + \frac{8(2+\epsilon)}{\epsilon}\,s^{-1}n^{-1} + \frac{2(2+\epsilon)}{\epsilon}\,\frac{2(2-r)}{r}\,s^{-1}n^{-1} \le \Big(\frac{4(1-r)}{rs^2} + \frac{8(2+\epsilon)}{\epsilon}\,s^{-1} + \frac{2(2+\epsilon)}{\epsilon}\,\frac{2(2-r)}{r}\,s^{-1}\Big)n^{-1} = C_{88}(r,s,\epsilon)\,n^{-1},$$
using $1/(n-1) \le 2/n$ for $n \ge 2$.

Anticipating the proof of Lemma A.23 and recalling the definitions of $\hat v_{1A,k}$ and $\hat v_{0A,k}$ in (64), we note that a similar calculation can be carried out for $\epsilon \le 1$:
$$P\Bigg(\bigg|\frac{v_{1A}}{\hat v_{1A,n_{1A}}}-1\bigg| \ge \epsilon\Bigg) = E\Bigg[P\Bigg(\bigg|\frac{v_{1A}}{\hat v_{1A,n_{1A}}}-1\bigg| \ge \epsilon \,\Bigg|\, n_{1A}\Bigg)\Bigg] \le P(E) + E\Bigg[P\Bigg(\bigg|\frac{v_{1A}}{\hat v_{1A,n_{1A}}}-1\bigg| \ge \frac{2\epsilon}{2} \,\Bigg|\, n_{1A}\Bigg)$$
$$\,\Bigg|\, E^c\Bigg]\times P(E^c) \le P(E) + E\Bigg[C_{66}(2\epsilon)\,B^2\,\frac{n_{0A}}{n_{1A}}\,n_A^{-1/2} + \frac{4(1+\epsilon)}{\epsilon}\,n_A^{-1} + \frac{2(1+\epsilon)}{\epsilon}\,\frac{n_{0A}}{n_{1A}}\,n_A^{-1}\,\Bigg|\, E^c\Bigg]$$
$$\le \frac{2(1-r)}{rs^2(n-1)} + C_{66}(2\epsilon)\,B^2\,\frac{2-r}{r}\,n_A^{-1/2} + \frac{8(2+\epsilon)}{\epsilon}\,n_A^{-1} + \frac{2(2+\epsilon)}{\epsilon}\,\frac{2-r}{r}\,n_A^{-1} = O\big(n^{-1/2}\big), \tag{90}$$
where the second inequality is by (84) and the third inequality is by Lemma A.3 and (89). A similar calculation holds for $\hat v_{0A,n_{0A}}$.

Lemma A.10. For all $n\in\mathbb N$ and any $\theta^w_n\in\Theta^w_n$, let $\{(y_i(1),y_i(0))\}_{i=1}^n$ be the associated potential outcomes and $\{A_i\}_{i=1}^n$ be the associated always-reporter indicators. Denote $D = \{D_i\}_{i=1}^n \sim \mathrm{CR}(n,n_1)$, $\mathcal A = \{i\in[n] : A_i=1\}$ and $n_A = \sum_{i=1}^n A_i$. Define $\tau_n = n_A^{-1}\sum_{i\in\mathcal A}(y_i(1)-y_i(0))$. Consider the variance estimator $\hat V^{hj}_n$ in (15) and the Hájek estimator $\hat\tau^{hj}_n$ in (14). Let $Z \sim N(0,1)$ be a standard normal random variable independent of $D$. There exists a positive integer $N$ such that for all $n \ge N$, every $\epsilon \in (0,2]$ and $t \in \mathbb R$,
$$P_{\theta^w_n}\Bigg(\frac{\big(\hat\tau^{hj}_n-\tau_n\big)^2}{\hat V^{hj}_n} + g(n_{1A}) \ge t\Bigg) \le P_{\theta^w_n}\big(Z^2(1+\epsilon) + g(n_{1A}) \ge t\big) + C_{88}(r,s,\epsilon)\,n^{-1} + C_{87}(\epsilon,r)\,B^2\,n_A^{-1/2} + \frac{2(1-r)}{rs^2(n-1)} + \sqrt{\frac{2}{r}}\,C_{61}(\delta,r)\,B\,n_A^{-1/4},$$
where the constant $C_{87}(\epsilon,r)$ is defined in (87), $C_{88}(r,s,\epsilon)$ is defined in (88), $C_{61}(\delta,r)$ is defined in Lemma A.7, and constants $r$, $s$, $\delta$ and $B$ are defined in Assumption 3, Assumption 2-(i), Assumption 2-(ii) and Assumption 2-(iii), respectively.

Proof. We suppress the dependence on $\theta^w_n$ and the conditioning event for simplicity. Pick $N = \lceil 2s^{-1}\rceil$. We have $n_A \ge sn \ge 2$ for all $n \ge N$. Recall the definitions of $\tilde V_{n,n_{1A}}$ and $\hat V^{hj}_{n,n_{1A}}$ in (65) and (64), respectively. We remind readers that $V_{n,n_{1A}}$ is defined in (53) with $k = n_{1A}$.
$$P\Bigg(\frac{\big(\hat\tau^{hj}_n-\tau_n\big)^2}{\hat V^{hj}_n} + g(n_{1A}) \ge t\Bigg)$$
$$= P\Bigg(\frac{\big(\hat\tau^{hj}_n-\tau_n\big)^2}{V_{n,n_{1A}}}\,\frac{V_{n,n_{1A}}}{\tilde V_{n,n_{1A}}}\,\frac{\tilde V_{n,n_{1A}}}{\hat V^{hj}_n} + g(n_{1A}) \ge t\Bigg) \le P\Bigg(\frac{\big(\hat\tau^{hj}_n-\tau_n\big)^2}{V_{n,n_{1A}}}\,\frac{\tilde V_{n,n_{1A}}}{\hat V^{hj}_n} + g(n_{1A}) \ge t\Bigg) \tag{91}$$
$$= E\Bigg[P\Bigg(\frac{\big(\hat\tau^{hj}_n-\tau_n\big)^2}{V_{n,n_{1A}}}\,\frac{\tilde V_{n,n_{1A}}}{\hat V^{hj}_{n,n_{1A}}} + g(n_{1A}) \ge t \,\Bigg|\, n_{1A}\Bigg)\Bigg]$$
$$\le E\Bigg[P\Bigg(\frac{\big(\hat\tau^{hj}_n-\tau_n\big)^2}{V_{n,n_{1A}}}\,\frac{\tilde V_{n,n_{1A}}}{\hat V^{hj}_{n,n_{1A}}} + g(n_{1A}) \ge t,\ \frac{\tilde V_{n,n_{1A}}}{\hat V^{hj}_{n,n_{1A}}} \le 1+\epsilon \,\Bigg|\, n_{1A}\Bigg)\Bigg] + E\Bigg[P\Bigg(\frac{\tilde V_{n,n_{1A}}}{\hat V^{hj}_{n,n_{1A}}} \ge 1+\epsilon \,\Bigg|\, n_{1A}\Bigg)\Bigg] \tag{92}$$
$$\le E\Bigg[P\Bigg(\frac{\big(\hat\tau^{hj}_n-\tau_n\big)^2}{V_{n,n_{1A}}}(1+\epsilon) + g(n_{1A}) \ge t \,\Bigg|\, n_{1A}\Bigg)\Bigg] + E\Bigg[P\Bigg(\frac{\tilde V_{n,n_{1A}}}{\hat V^{hj}_{n,n_{1A}}} \ge 1+\epsilon \,\Bigg|\, n_{1A}\Bigg)\Bigg]$$
$$\le E\big[P\big(Z^2(1+\epsilon)+g(n_{1A}) \ge t \,\big|\, n_{1A}\big)\big] + E\Bigg[P\Bigg(\frac{\tilde V_{n,n_{1A}}}{\hat V^{hj}_{n,n_{1A}}} \ge 1+\epsilon \,\Bigg|\, n_{1A}\Bigg)\Bigg] + \Bigg(E\Bigg[P\Bigg(\frac{\big(\hat\tau^{hj}_n-\tau_n\big)^2}{V_{n,n_{1A}}}(1+\epsilon)+g(n_{1A}) \ge t \,\Bigg|\, n_{1A}\Bigg)\Bigg] - E\big[P\big(Z^2(1+\epsilon)+g(n_{1A}) \ge t \,\big|\, n_{1A}\big)\big]\Bigg),$$
where for (91) we use the fact that $V_{n,n_{1A}}/\tilde V_{n,n_{1A}} \le 1$. The results follow by applying Lemma A.7 and Corollary A.9.

Note that because $n_A^{-1} \le s^{-1}n^{-1}$ for all $\theta^w_n \in \Theta^w_n$, for $n \ge 2$ we can simplify the upper bounds:
$$C_{88}(r,s,\epsilon)\,n^{-1} + C_{87}(\epsilon,r)\,B^2\,n_A^{-1/2} + \frac{2(1-r)}{rs^2(n-1)} + \sqrt{\frac{2}{r}}\,C_{61}(\delta,r)\,B\,n_A^{-1/4}$$
$$\le C_{88}(r,s,\epsilon)\,n^{-1} + C_{87}(\epsilon,r)\,B^2\,n_A^{-1/2} + \frac{4(1-r)}{rs^2}\,n^{-1} + \sqrt{\frac{2}{r}}\,C_{61}(\delta,r)\,B\,n_A^{-1/4}$$
$$\le \Big(\frac{6(1-r)}{rs^2} + \frac{8(2+\epsilon)}{\epsilon s} + \frac{2(2+\epsilon)}{\epsilon}\,\frac{2(2-r)}{rs}\Big)n^{-1} + C_{87}(\epsilon,r)\,B^2\,n_A^{-1/2} + \sqrt{\frac{2}{r}}\,C_{61}(\delta,r)\,B\,n_A^{-1/4}$$
$$\le \Big(\frac{6(1-r)}{rs^2} + \frac{8(2+\epsilon)}{\epsilon s} + \frac{2(2+\epsilon)}{\epsilon}\,\frac{2(2-r)}{rs}\Big)n^{-1} + C_{87}(\epsilon,r)\,B^2\,s^{-1/2}\,n^{-1/2} + \sqrt{\frac{2}{r}}\,C_{61}(\delta,r)\,B\,s^{-1/4}\,n^{-1/4} \le C_{93}(r,s,\epsilon,B,\delta)\,n^{-1/4},$$
with
$$C_{93}(r,s,\epsilon,B,\delta) = \frac{6(1-r)}{rs^2} + \frac{8(2+\epsilon)}{\epsilon s} + \frac{2(2+\epsilon)}{\epsilon}\,\frac{2(2-r)}{rs} + C_{87}(\epsilon,r)\,B^2\,s^{-1/2} + \sqrt{\frac{2}{r}}\,C_{61}(\delta,r)\,B\,s^{-1/4}. \tag{93}$$
We shall use this constant in the remaining calculations.

Corollary A.11.
Under the same setup as in Lemma A.10, suppose
$$\sup_{n\in\mathbb N}\sup_{\theta^w_n\in\Theta^w_n} E_{\theta^w_n}\big[\big|g(n_{1A})\big|\big] \le C < \infty \tag{94}$$
for some positive constant $C$. For all $t \in \mathbb R$,
$$\limsup_{n\to\infty}\sup_{\theta^w_n\in\Theta^w_n}\Bigg(P_{\theta^w_n}\Bigg(\frac{\big(\hat\tau^{hj}_n-\tau_n\big)^2}{\hat V^{hj}_n} + g(n_{1A}) \ge t\Bigg) - P_{\theta^w_n}\big(Z^2+g(n_{1A}) \ge t\big)\Bigg) \le 0.$$

Proof. For $\epsilon \in (0,2]$ and $t \in \mathbb R$, the bound in Lemma A.10 applies to all parameters $\theta^w_n \in \Theta^w_n$; we write, for simplicity,
$$P_{\theta^w_n}\Bigg(\frac{\big(\hat\tau^{hj}_n-\tau_n\big)^2}{\hat V^{hj}_n} + g(n_{1A}) \ge t\Bigg) \le P_{\theta^w_n}\big(Z^2(1+\epsilon)+g(n_{1A}) \ge t\big) + C_{93}(r,s,\epsilon,B,\delta)\,n^{-1/4}.$$
Hence we have, for $\epsilon \in (0,2]$ and $t \in \mathbb R$,
$$\sup_{\theta^w_n\in\Theta^w_n}\Bigg(P_{\theta^w_n}\Bigg(\frac{\big(\hat\tau^{hj}_n-\tau_n\big)^2}{\hat V^{hj}_n} + g(n_{1A}) \ge t\Bigg) - P_{\theta^w_n}\big(Z^2+g(n_{1A}) \ge t\big)\Bigg)$$
$$\le \sup_{\theta^w_n\in\Theta^w_n}\Big(P_{\theta^w_n}\big(Z^2(1+\epsilon)+g(n_{1A}) \ge t\big) - P_{\theta^w_n}\big(Z^2+g(n_{1A}) \ge t\big)\Big) + C_{93}(r,s,\epsilon,B,\delta)\,n^{-1/4} \le \sqrt{\frac{\epsilon\,(|t|+C)}{2\pi(1+\epsilon)}} + C_{93}(r,s,\epsilon,B,\delta)\,n^{-1/4}, \tag{95}$$
where for (95) we use the fact that
$$\sup_{\theta^w_n\in\Theta^w_n}\Big(P_{\theta^w_n}\big(Z^2(1+\epsilon)+g(n_{1A}) \ge t\big) - P_{\theta^w_n}\big(Z^2+g(n_{1A}) \ge t\big)\Big) \le \sqrt{\frac{\epsilon\,(|t|+C)}{2\pi(1+\epsilon)}},$$
which is established in Lemma A.26. Taking $\limsup_{n\to\infty}$ followed by $\limsup_{\epsilon\to 0}$ yields the stated result.

We note that $T^0_n$, $T^1_n$ and $T^2_n$ defined in (17), (19) and (20) have the form
$$\frac{\big(\hat\tau^{hj}_n-\tau_n\big)^2}{\hat V^{hj}_n} + g_i(n_{1A})$$
with the associated functions $g_i$, $i\in\{0,1,2\}$:
$$g_0(n_{1A}) = 0, \qquad g_1(n_{1A}) = V_n^{-1}(A)\Big(\frac{n_{1A}}{n_1}-\frac{n_A-n_{1A}}{n_0}\Big)^2, \tag{96}$$
$$g_2(n_{1A}) = V_n^{-1}(A)\Bigg(\Big(\frac{n_{1A}}{n_1}-\frac{n_A-n_{1A}}{n_0}\Big)_-\Bigg)^2. \tag{97}$$
Theorem A.12 characterizes the asymptotic distributions of $T^0_n$, $T^1_n$ and $T^2_n$, evaluated at the true always-reporter table $A$ implied by each $\theta^w_n$.

Theorem A.12. Under the same setup as in Lemma A.10, for every $t \in \mathbb R$,
$$\limsup_{n\to\infty}\sup_{\theta^w_n\in\Theta^w_n}\Big(P_{\theta^w_n}\big(T^i_n \ge t\big) - P_{\theta^w_n}\big(Z^2+g_i(n_{1A}) \ge t\big)\Big) \le 0$$
for $i\in\{0,1,2\}$, where $T^0_n$, $T^1_n$ and $T^2_n$ are defined in (17), (19) and (20), respectively.

Proof. It is clear that $\sup_{n\in\mathbb N}\sup_{\theta^w_n\in\Theta^w_n} E_{\theta^w_n}[|g_0(n_{1A})|] = 0$. Since $0 \le g_2(n_{1A}) \le g_1(n_{1A})$, we only need to verify the condition for $g_1(n_{1A})$. Now,
$$E_{\theta^w_n}\big[g_1(n_{1A})\big] = V_n^{-1}(A)\, E_{\theta^w_n}\Bigg[\Big(\frac{n_{1A}}{n_1}-\frac{n_A-n_{1A}}{n_0}\Big)^2\Bigg] = V_n^{-1}(A)\, V_n(A) = 1,$$
for all $\theta^w_n\in\Theta^w_n$ and all $n\in\mathbb N$. The conclusion then follows from Corollary A.11.

A.4 Randomization Critical Value

This section analyzes the properties of randomization critical values. Our main result, Theorem A.25, establishes their convergence and thus justifies the asymptotic validity of using the randomization-based critical value. Section A.4.2 states some properties of the quantiles of the random variables $Z^2+g(n_{1A})$, where $Z\sim N(0,1)$. Section A.4.3 uses these results to prove Theorem A.25.

A.4.1 Preliminary Notation

We use the shorthand notation $\Theta^w_n = \Theta^w_n(\delta,s,r,B)$, as defined in (25), and typically denote an element of $\Theta^w_n$ by $\theta^w_n$. We use $\chi^2_{1-\alpha}$ to denote the $(1-\alpha)$-quantile of a chi-square random variable with one degree of freedom. Note that if $Z\sim N(0,1)$, then $Z^2$ follows a chi-square distribution with one degree of freedom. Let $Y = (Y_i)_{i=1}^n$ denote the observed outcomes and $A = (A_i)_{i=1}^n$ a reporting table. Let $D^* = (D^*_i)_{i=1}^n \sim \mathrm{CR}(n,n_1)$ be an independently generated treatment assignment vector.
We define the test statistic based on the randomization distribution as:
$$Z^* = \frac{\hat\tau^{hj*}(D^*)}{\sqrt{\hat V^{hj*}_n(D^*)}}, \tag{98}$$
where
$$\hat\tau^{hj*}(D^*) = \frac{\sum_{i=1}^n D^*_iA_iY_i}{\sum_{i=1}^n D^*_iA_i} - \frac{\sum_{i=1}^n (1-D^*_i)A_iY_i}{\sum_{i=1}^n (1-D^*_i)A_i},$$
and
$$\hat V^{hj*}_n(D^*) = \frac{\sum_{i=1}^n D^*_iA_i\big(Y_i-\hat\tau^*_{1A}\big)^2}{\big(\sum_{i=1}^n D^*_iA_i\big)^2} + \frac{\sum_{i=1}^n (1-D^*_i)A_i\big(Y_i-\hat\tau^*_{0A}\big)^2}{\big(\sum_{i=1}^n (1-D^*_i)A_i\big)^2},$$
with
$$\hat\tau^*_{1A} = \frac{\sum_{i=1}^n D^*_iA_iY_i}{\sum_{i=1}^n D^*_iA_i}, \qquad \hat\tau^*_{0A} = \frac{\sum_{i=1}^n (1-D^*_i)A_iY_i}{\sum_{i=1}^n (1-D^*_i)A_i}.$$
We also define
$$n^{1*}_A = \sum_{i=1}^n D^*_iA_i, \qquad n^{0*}_A = \sum_{i=1}^n (1-D^*_i)A_i. \tag{99}$$
Given a measurable function $g$, define the randomization distribution function of $(Z^*)^2 + g(n^{1*}_A)$ based on the observed data by
$$t \longmapsto P_{\theta^w_n}\big((Z^*)^2 + g(n^{1*}_A) \le t \,\big|\, D_{\mathrm{obs}}\big), \qquad t\in\mathbb R. \tag{100}$$
We write "$|\, D_{\mathrm{obs}}$" to emphasize the fact that the distribution is conditional on the realized assignments. Unless noted otherwise, we define $n_{1A} = \sum_{i=1}^n D_iA_i$.

A.4.2 Properties of the Quantiles

Lemma A.13 states that the quantiles of $Z^2+g(n_{1A})$ are uniformly bounded in $n$, under suitable regularity conditions on $g$.

Lemma A.13. Let $g:\mathbb R\to\mathbb R$ be a function such that $\sup_{n\in\mathbb N}\sup_{\theta^w_n\in\Theta^w_n} E_{\theta^w_n}[|g(n_{1A})|] \le C < \infty$ for a positive constant $C$. Given an $\alpha\in(0,1)$, let $q^{1-\alpha}_n$ be the $(1-\alpha)$-quantile of the random variable $Z^2+g(n_{1A})$, where $Z\sim N(0,1)$ is independent of $n_{1A}$. We have
$$\sup_{n\in\mathbb N}\sup_{\theta^w_n\in\Theta^w_n} q^{1-\alpha}_n \le \chi^2_{1-\alpha/2} + \frac{2C}{\alpha}. \tag{101}$$

Proof. We suppress the dependence on $\theta^w_n$. We have
$$P\Big(Z^2+g(n_{1A}) \ge \chi^2_{1-\alpha/2} + \frac{2C}{\alpha}\Big) \le P\big(Z^2 \ge \chi^2_{1-\alpha/2}\big) + P\Big(g(n_{1A}) \ge \frac{2C}{\alpha}\Big) \le P\big(Z^2 \ge \chi^2_{1-\alpha/2}\big) + P\Big(\big|g(n_{1A})\big| \ge \frac{2C}{\alpha}\Big) \le \frac{\alpha}{2} + \frac{\alpha}{2}\,\frac{E\big[\big|g(n_{1A})\big|\big]}{C} \le \alpha.$$
Hence we must have $q^{1-\alpha}_n \le \chi^2_{1-\alpha/2} + 2C/\alpha$ for all $n\in\mathbb N$.
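The randomization quantities (98)-(100) are straightforward to approximate by Monte Carlo. The sketch below (with hypothetical data; the names `z_star` and `q_hat` are ours, and $g = 0$ is taken for simplicity) redraws $D^* \sim \mathrm{CR}(n, n_1)$ independently of the observed assignment and estimates the $(1-\alpha)$-quantile of $(Z^*)^2$, i.e., a randomization critical value for a fixed candidate reporting table $A$:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical observed data: outcomes Y, candidate reporting table A,
# and design parameters (n, n1).
n, n1 = 40, 20
A = (rng.random(n) < 0.7).astype(int)
Y = rng.normal(size=n)

def z_star(D, Y, A):
    """Studentized statistic Z* of (98) for a redrawn assignment D*."""
    n1A = (D * A).sum()
    n0A = ((1 - D) * A).sum()
    t1 = (D * A * Y).sum() / n1A
    t0 = ((1 - D) * A * Y).sum() / n0A
    V = ((D * A * (Y - t1) ** 2).sum() / n1A ** 2
         + ((1 - D) * A * (Y - t0) ** 2).sum() / n0A ** 2)
    return (t1 - t0) / np.sqrt(V)

# Approximate the randomization distribution (100) with g = 0 by drawing
# independent CR(n, n1) assignments D*.
base = np.array([1] * n1 + [0] * (n - n1))
draws = []
for _ in range(4000):
    D = rng.permutation(base)
    if 0 < (D * A).sum() < A.sum():    # keep both arms nonempty among reporters
        draws.append(z_star(D, Y, A) ** 2)

q_hat = np.quantile(draws, 0.95)       # randomization critical value
```

Since $Y$ here is generated independently of $D^*$, $(Z^*)^2$ is approximately $\chi^2_1$, so `q_hat` should land near $\chi^2_{0.95}\approx 3.84$; with real data under the worst-case procedure, this quantile would be computed for each candidate always-reporter configuration.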
We note that $Z^2$ with $Z\sim N(0,1)$ follows a chi-square $\chi^2_1$ distribution with one degree of freedom. $Z^2$ has the density function
$$f_{\chi^2_1}(t) = \frac{1}{\sqrt{2\pi}}\, t^{-1/2}\, e^{-t/2}, \qquad t\in(0,\infty),$$
which is strictly decreasing on its support.

Lemma A.14. Consider a vector of always-reporter indicators $A=(A_i)_{i=1}^n$ and a completely randomized design $D=(D_i)_{i=1}^n\sim\mathrm{CR}(n,n_1)$. Let $n_0 = n-n_1$ and $n_{1A} = \sum_{i=1}^n D_iA_i$. Let $Z\sim N(0,1)$ be a standard normal random variable independent of $D$, and let $g:\mathbb R\to\mathbb R$ be an arbitrary function. For every $\alpha\in(0,1)$, $Z^2+g(n_{1A})$ has a unique $(1-\alpha)$-quantile $q^{1-\alpha}_n$. In particular, $P(Z^2+g(n_{1A}) \le q^{1-\alpha}_n) = 1-\alpha$.

Proof. Define $n_A = \sum_{i=1}^n A_i$. The random variable $n_{1A}$ has finite support $\{0,\dots,n_A\}$. Therefore, $Z^2+g(n_{1A})$ has the density
$$f_{Z^2+g(n_{1A})}(t) = \sum_{k=0}^{n_A} f_{\chi^2_1}\big(t-g(k)\big)\, p_{n_{1A}}(k)\, 1\{t \ge g(k)\}, \tag{102}$$
where $p_{n_{1A}}$ is the probability mass function of $n_{1A}$. The density is strictly positive on its support, and hence the quantile is unique.

Lemma A.15. Let $q^{1-\alpha}_n$ be the $(1-\alpha)$-quantile of $Z^2+g(n_{1A})$, where $Z\sim N(0,1)$ is independent of $n_{1A}$. Assume that there exist positive constants $C_1$, $C_2$ and $\gamma$ such that
$$\inf_{n\in\mathbb N}\inf_{\theta^w_n\in\Theta^w_n} P_{\theta^w_n}\big(q^{1-\alpha}_n - C_1 < g(n_{1A}) < q^{1-\alpha}_n - C_2\big) \ge \gamma. \tag{103}$$
For any $\eta\in(0,C_2]$, the following inequalities hold for all $\theta^w_n\in\Theta^w_n$ and all $n\in\mathbb N$:
$$P_{\theta^w_n}\big(Z^2+g(n_{1A}) \ge q^{1-\alpha}_n - \eta\big) - P_{\theta^w_n}\big(Z^2+g(n_{1A}) \ge q^{1-\alpha}_n\big) \ge \eta\,\gamma\, f_{\chi^2_1}(C_1),$$
$$P_{\theta^w_n}\big(Z^2+g(n_{1A}) \ge q^{1-\alpha}_n\big) - P_{\theta^w_n}\big(Z^2+g(n_{1A}) \ge q^{1-\alpha}_n + \eta\big) \ge \eta\,\gamma\, f_{\chi^2_1}(C_1+C_2).$$

Proof. We suppress the dependence on $\theta^w_n$. Given an $\eta\in(0,C_2]$, the event $\{q^{1-\alpha}_n - C_1 < g(n_{1A}) < q^{1-\alpha}_n - C_2\}$ is equivalent to the event $\{\eta \le C_2 < q^{1-\alpha}_n - g(n_{1A}) < C_1\}$.
Then,
$$P\big(Z^2+g(n_{1A}) \ge q^{1-\alpha}_n - \eta\big) - P\big(Z^2+g(n_{1A}) \ge q^{1-\alpha}_n\big) \ge E\Big[P\big(q^{1-\alpha}_n - \eta - g(n_{1A}) \le Z^2 < q^{1-\alpha}_n - g(n_{1A}) \,\big|\, n_{1A}\big)\, 1\big\{C_2 < q^{1-\alpha}_n - g(n_{1A}) < C_1\big\}\Big]$$
$$= E\Bigg[\int_{q^{1-\alpha}_n - \eta - g(n_{1A})}^{q^{1-\alpha}_n - g(n_{1A})} f_{\chi^2_1}(t)\, dt\ 1\big\{C_2 < q^{1-\alpha}_n - g(n_{1A}) < C_1\big\}\Bigg] \ge \eta\, f_{\chi^2_1}(C_1)\, P\big(C_2 < q^{1-\alpha}_n - g(n_{1A}) < C_1\big) \ge \eta\,\gamma\, f_{\chi^2_1}(C_1),$$
where the last inequality follows from (103), the fact that on the event $C_2 < q^{1-\alpha}_n - g(n_{1A}) < C_1$ we have $q^{1-\alpha}_n - g(n_{1A}) - \eta \ge 0$, and the fact that $f_{\chi^2_1}$ is a decreasing function. Similarly,
$$P\big(Z^2+g(n_{1A}) \ge q^{1-\alpha}_n\big) - P\big(Z^2+g(n_{1A}) \ge q^{1-\alpha}_n + \eta\big) \ge E\Big[P\big(q^{1-\alpha}_n - g(n_{1A}) \le Z^2 < q^{1-\alpha}_n - g(n_{1A}) + \eta \,\big|\, n_{1A}\big)\, 1\big\{C_2 < q^{1-\alpha}_n - g(n_{1A}) < C_1\big\}\Big]$$
$$= E\Bigg[\int_{q^{1-\alpha}_n - g(n_{1A})}^{q^{1-\alpha}_n - g(n_{1A}) + \eta} f_{\chi^2_1}(t)\, dt\ 1\big\{C_2 < q^{1-\alpha}_n - g(n_{1A}) < C_1\big\}\Bigg] \ge \eta\, f_{\chi^2_1}(C_1+\eta)\, P\big(C_2 < q^{1-\alpha}_n - g(n_{1A}) < C_1\big) \ge \eta\, f_{\chi^2_1}(C_1+C_2)\, P\big(C_2 < q^{1-\alpha}_n - g(n_{1A}) < C_1\big) \ge \eta\,\gamma\, f_{\chi^2_1}(C_1+C_2).$$

We verify equation (103) for the $g_0$, $g_1$ and $g_2$ functions defined in (96) and (97).

Lemma A.16. Let $g:\mathbb R\to[0,\infty)$ be a non-negative function. For an $\alpha\in(0,1)$, let $q^{1-\alpha}_n$ be the $(1-\alpha)$-quantile of $Z^2+g(n_{1A})$, where $Z\sim N(0,1)$ is independent of $n_{1A}$. We have $q^{1-\alpha}_n \ge \chi^2_{1-\alpha}$ for all $\theta^w_n\in\Theta^w_n$ and $n\in\mathbb N$.

Proof. Note that for all $\theta^w_n\in\Theta^w_n$ and $n\in\mathbb N$,
$$P_{\theta^w_n}\big(Z^2+g(n_{1A}) \le \chi^2_{1-\alpha}\big) \le P_{\theta^w_n}\big(Z^2 \le \chi^2_{1-\alpha}\big) = 1-\alpha.$$
The result then follows immediately.

The 75th percentile of a chi-square random variable with one degree of freedom is approximately 1.32. Hence $\chi^2_{1-\alpha} \ge 1.32$ for $\alpha\in(0,0.25]$.

Lemma A.17. For $\alpha\in(0,0.25]$, the $g_0$, $g_1$ and $g_2$ functions defined in (96) and (97) satisfy (103) with constants
$$C_1 = \chi^2_{1-\alpha/2} + \frac{2}{\alpha} + 1, \qquad C_2 = 0.1, \qquad \gamma = \frac{0.22}{1.22}.$$
Proof.
We use $q^{1-\alpha}_{i,n}$ to denote the quantile associated with $Z^2+g_i(n_{1A})$ for $i\in\{0,1,2\}$, where $Z\sim N(0,1)$ is independent of $n_{1A}$. Following the calculations in the proof of Theorem A.12, we have
$$\sup_{n\in\mathbb N}\sup_{\theta^w_n\in\Theta^w_n} E_{\theta^w_n}\big[\big|g_i(n_{1A})\big|\big] \le 1, \qquad i\in\{0,1,2\}.$$
Hence by Lemma A.13, $\sup_{n\in\mathbb N}\sup_{\theta^w_n\in\Theta^w_n} q^{1-\alpha}_{i,n} \le \chi^2_{1-\alpha/2} + 2/\alpha$. Let $C_1 = \chi^2_{1-\alpha/2} + 2/\alpha + 1$ and $C_2 = 0.1$. Notice $q^{1-\alpha}_{i,n} - C_1 < 0$ for all $i\in\{0,1,2\}$ and $n\in\mathbb N$. For $g_0$, we have $q^{1-\alpha}_{0,n} = \chi^2_{1-\alpha}$ and
$$P_{\theta^w_n}\big(q^{1-\alpha}_{0,n} - C_1 < g_0 < q^{1-\alpha}_{0,n} - C_2\big) \ge P_{\theta^w_n}\big(0 \le g_0 < \chi^2_{1-\alpha} - 0.1\big) = 1,$$
for all $\theta^w_n\in\Theta^w_n$ and all $n\in\mathbb N$. For $g_1$, we have
$$P\big(q^{1-\alpha}_{1,n} - C_1 < g_1 < q^{1-\alpha}_{1,n} - C_2\big) \ge P\big(0 \le g_1 < \chi^2_{1-\alpha} - C_2\big) \tag{104}$$
$$= 1 - P\big(g_1 \ge \chi^2_{1-\alpha} - C_2\big) = 1 - P\Bigg(\Big(\frac{n_{1A}}{n_1}-\frac{n_{0A}}{n_0}\Big)^2 \ge \big(\chi^2_{1-\alpha} - C_2\big)\, V_n(A)\Bigg) \ge 1 - \frac{V_n(A)}{V_n(A)\big(\chi^2_{1-\alpha} - C_2\big)} = 1 - \frac{1}{\chi^2_{1-\alpha} - 0.1} \ge 1 - \frac{1}{1.32-0.1} = 1 - \frac{1}{1.22} = \frac{0.22}{1.22}, \tag{105}$$
where (104) uses Lemma A.16 and (105) uses the fact that $\chi^2_{1-\alpha} \ge 1.32$ for $\alpha\in(0,0.25]$. A similar argument holds for $g_2$.

A.4.3 Asymptotically Valid Inference with Randomization-Based Critical Values

We state a high-level theorem establishing asymptotically valid inference using an estimator of the quantile.

Theorem A.18. Let $g:\mathbb R\to\mathbb R$ be a function such that $\sup_{n\in\mathbb N}\sup_{\theta^w_n\in\Theta^w_n} E_{\theta^w_n}[|g(n_{1A})|] \le C < \infty$ for a positive constant $C$. For an $\alpha\in(0,1)$, let $q^{1-\alpha}_n$ be the $(1-\alpha)$-quantile of the random variable $Z^2+g(n_{1A})$, where $Z\sim N(0,1)$ is independent of $n_{1A}$.
Let $\hat q_n$ be a quantile estimator such that for each $\eta > 0$ we have
$$\limsup_{n\to\infty}\sup_{\theta^w_n\in\Theta^w_n} P_{\theta^w_n}\big(\big|\hat q_n - q^{1-\alpha}_n\big| \ge \eta\big) = 0.$$
Given a $\theta^w_n\in\Theta^w_n$, let $\{(y_i(1),y_i(0))\}_{i=1}^n$ be the associated potential outcomes and $\{A_i\}_{i=1}^n$ be the associated always-reporter indicators. Define $\mathcal A = \{i\in[n] : A_i=1\}$, $n_A = \sum_{i=1}^n A_i$ and $\tau_n = n_A^{-1}\sum_{i\in\mathcal A}(y_i(1)-y_i(0))$. Define the variance estimator $\hat V^{hj}_n$ as in (15) and the Hájek estimator $\hat\tau^{hj}_n$ as in (14). Then,
$$\limsup_{n\to\infty}\sup_{\theta^w_n\in\Theta^w_n} P_{\theta^w_n}\Bigg(\frac{\big(\hat\tau^{hj}_n-\tau_n\big)^2}{\hat V^{hj}_n} + g(n_{1A}) \ge \hat q_n\Bigg) \le \alpha. \tag{106}$$

Proof. We suppress the dependence on $\theta^w_n$. By Corollary A.11, for any $\epsilon\in(0,2]$ and $\eta>0$, we have the calculations
$$P\Bigg(\frac{\big(\hat\tau^{hj}_n-\tau_n\big)^2}{\hat V^{hj}_n} + g(n_{1A}) \ge \hat q_n\Bigg) = P\Bigg(\frac{\big(\hat\tau^{hj}_n-\tau_n\big)^2}{\hat V^{hj}_n} + g(n_{1A}) \ge \hat q_n,\ \big|\hat q_n - q^{1-\alpha}_n\big| \le \eta\Bigg) + P\Bigg(\frac{\big(\hat\tau^{hj}_n-\tau_n\big)^2}{\hat V^{hj}_n} + g(n_{1A}) \ge \hat q_n,\ \big|\hat q_n - q^{1-\alpha}_n\big| > \eta\Bigg)$$
$$\le P\Bigg(\frac{\big(\hat\tau^{hj}_n-\tau_n\big)^2}{\hat V^{hj}_n} + g(n_{1A}) \ge q^{1-\alpha}_n - \eta\Bigg)$$
+ P    b q n − q 1 − α n   ≥ η  ≤ P  Z 2 + g ( n 1 A ) ≥ q 1 − α n − η  + C 93 ( r , s, ϵ, B , δ ) n − 1 4 + (107) s ϵ    q 1 − α n − η   + C  2 π (1 + ϵ ) + P    b q n − q 1 − α n   ≥ η  (108) ≤ P  Z 2 + g ( n 1 A ) ≥ q 1 − α n − η  + C 93 ( r , s, ϵ, B , δ ) n − 1 4 + r ϵ 2 π (1 + ϵ ) r χ 2 1 − α/ 2 + 2 C α + C + η + P    b q n − q 1 − α n   ≥ η  (109) = P  Z 2 + g ( n 1 A ) ≥ q 1 − α n  + C 93 ( r , s, ϵ, B , δ ) n − 1 4 + r ϵ 2 π (1 + ϵ ) r χ 2 1 − α/ 2 + 2 C α + C + η + P    b q n − q 1 − α n   ≥ η  + P  Z 2 + g ( n 1 A ) ≥ q 1 − α n − η  − P  Z 2 + g ( n 1 A ) ≥ q 1 − α n  ≤ α + C 93 ( r , s, ϵ, B , δ ) n − 1 4 + r ϵ 2 π (1 + ϵ ) r χ 2 1 − α/ 2 + 2 C α + C + η + P    b q n − q 1 − α n   ≥ η  + r 2 π √ η , (110) where we use the estimate in (95) for (107) and (108), Lemma A.13 for (109), and (132) for (110), which follows from the calculation P  Z 2 + g ( n 1 A ) ≥ q 1 − α n − η  − P  Z 2 + g ( n 1 A ) ≥ q 1 − α n  =E  P  q 1 − α n − g ( n 1 A ) − η ≤ Z 2 ≤ q 1 − α n − g ( n 1 A )   n 1 A  =E  P  q 1 − α n − g ( n 1 A ) − η ≤ Z 2 ≤ q 1 − α n − g ( n 1 A )   n 1 A  1 { q 1 − α n − g ( n 1 A ) − η ≤ 0 }  + E  P  q 1 − α n − g ( n 1 A ) − η ≤ Z 2 ≤ q 1 − α n − g ( n 1 A )   n 1 A  1 { q 1 − α n − g ( n 1 A ) − η > 0 }  ≤ E  P  0 ≤ Z 2 ≤ η   n 1 A  1 { q 1 − α n − g ( n 1 A ) − η ≤ 0 }  + E  P  q 1 − α n − g ( n 1 A ) − η ≤ Z 2 ≤ q 1 − α n − g ( n 1 A )   n 1 A  1 { q 1 − α n − g ( n 1 A ) − η > 0 }  ≤ r 2 π √ η , 48 T aking lim sup n →∞ on both sides yields, lim sup n →∞ sup θ w n ∈ Θ w n P θ w n  b τ hj n − τ n  2 b V hj n + g  n 1 A  ≥ b q n ! ≤ α + r ϵ 2 π (1 + ϵ ) r χ 2 1 − α/ 2 + 2 C α + C + η + r 2 π √ η . W e note that the LHS of the inequality abov e does not depend on ϵ or η . T aking lim ϵ → 0 and lim η → 0 giv es the desired results. Hence all we need to show is that, for all η > 0, lim sup n →∞ sup θ w n ∈ Θ w n P θ w n    b q n − q 1 − α n   ≥ η  = 0 . 
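The proof above leans on elementary facts about the $\chi^2_1$ law of $Z^2$: the quantile bound $q^{1-\alpha}_n \leq \chi^2_{1-\alpha/2} + 2/\alpha$ from Lemma A.13, the lower bound $\chi^2_{1-\alpha} \geq 1.32$ for $\alpha \in (0, 0.25]$, and the interval estimate $P(x \leq Z^2 \leq x + \eta) \leq \sqrt{2/\pi}\sqrt{\eta}$ used for (110). Since $P(Z^2 \leq t) = \mathrm{erf}(\sqrt{t/2})$, these can be checked numerically with only the standard library; the bisection-based quantile inverter below is an illustrative helper, not part of the paper's procedure.

```python
import math

def chi2_1_cdf(t: float) -> float:
    """P(Z^2 <= t) for Z ~ N(0,1), i.e. the chi^2_1 CDF: erf(sqrt(t/2))."""
    return math.erf(math.sqrt(t / 2.0)) if t > 0 else 0.0

def chi2_1_quantile(p: float, lo: float = 0.0, hi: float = 200.0) -> float:
    """Invert the chi^2_1 CDF by bisection (illustrative helper)."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_1_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

alpha = 0.05

# Lemma A.13's bound on the quantile, in the g == 0 case where the quantile
# is exactly chi^2_{1-alpha}:
q_exact = chi2_1_quantile(1 - alpha)                  # chi^2_{1-alpha} ~ 3.84
bound = chi2_1_quantile(1 - alpha / 2) + 2 / alpha    # chi^2_{1-alpha/2} + 2/alpha
assert q_exact <= bound

# chi^2_{1-alpha} >= 1.32 for alpha in (0, 0.25], used around (105):
assert chi2_1_quantile(0.75) >= 1.32

# Interval estimate behind (110): P(x <= Z^2 <= x + eta) <= sqrt(2/pi) sqrt(eta),
# with the supremum attained at x = 0.
for eta in (0.01, 0.1, 1.0):
    cap = math.sqrt(2.0 / math.pi) * math.sqrt(eta)
    assert all(chi2_1_cdf(x + eta) - chi2_1_cdf(x) <= cap for x in (0.0, 0.5, 2.0))
```

The checks pass because the $\chi^2_1$ density $f_{\chi^2_1}(t) = (2\pi t)^{-1/2} e^{-t/2}$ is decreasing, so the interval probability is largest at the origin, exactly as the proof of (132) argues.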
Lemma A.19. Let g b e a non-negativ e function such that sup n ∈ N sup θ w n ∈ Θ w n E θ w n    g ( n 1 A )    ≤ C < ∞ , for a positive constant C and satisfies the condition (103) in Lemma A.15 with p ositiv e constan ts γ , C 1 and C 2 . Let q 1 − α n b e the (1 − α )-th quantile of Z 2 + g ( n 1 A ), where Z ∼ N(0 , 1) is indep enden t of n 1 A . Let { b q 1 − α n } ∞ n b e the 1 − α quantile of the randomization distribution b q 1 − α n = inf n t ∈ R : P θ w n  ( Z ∗ ) 2 + g  n 1 ∗ A  ≤ t    D obs  ≥ 1 − α o , where Z ∗ and n 1 ∗ A are defined in (98) and (99), and the randomization distribu- tion is defined in (100). F or a given η > 0, define the interv al, I η =  χ 2 1 − α − η , χ 2 1 − α/ 2 + 2 C α + η  . (111) On the even t,   b q 1 − α n − q 1 − α n   > η , (112) w e must ha ve the ev ent E n : sup t ∈ I η    P θ w n  Z 2 + g ( n 1 A ) ≤ t  − P θ w n  ( Z ∗ ) 2 + g  n 1 ∗ A  ≤ t    D obs     ≥ min { η γ f χ 2 1 ( C 1 ) , η γ f χ 2 1 ( C 1 + C 2 ) } In particular, if lim sup n →∞ sup θ w n ∈ Θ w n P θ w n    b q 1 − α n − q 1 − α n   > η  > ϵ, then lim sup n →∞ sup θ w n ∈ Θ w n P θ w n ( E n ) > ϵ. 49 Pr o of. W e suppress the dep endence on θ w n for simplicit y . On the ev ent   b q 1 − α n − q 1 − α n   > η , we hav e either b q 1 − α n − q 1 − α n > η or b q 1 − α n − q 1 − α n < − η . F or the case b q 1 − α n − q 1 − α n > η , we hav e, b q 1 − α n > q 1 − α n + η > q 1 − α n . By the definition of b q 1 − α n , w e ha ve P  ( Z ∗ ) 2 + g  n 1 ∗ A  ≤ q 1 − α n + η    D obs  ≤ 1 − α Lemma A.15 gives P  Z 2 + g ( n 1 A ) ≤ q 1 − α n + η  ≥ P  Z 2 + g ( n 1 A ) ≤ q 1 − α n  + η γ f χ 2 1 ( C 1 + C 2 ) =1 − α + η γ f χ 2 1 ( C 1 + C 2 ) , where the second inequalit y is b y Lemma A.14. 
Hence,
$$P\left(Z^2 + g\left(n_{1A}\right) \leq q^{1-\alpha}_n + \eta\right) - P\left((Z^*)^2 + g\left(n^{1*}_A\right) \leq q^{1-\alpha}_n + \eta \,\middle|\, D^{obs}\right) \geq \eta^\gamma f_{\chi^2_1}(C_1 + C_2). \tag{113}$$
Alternatively, for the case $\hat q^{1-\alpha}_n - q^{1-\alpha}_n < -\eta$, we have $\hat q^{1-\alpha}_n < q^{1-\alpha}_n - \eta < q^{1-\alpha}_n$. By the definition of $\hat q^{1-\alpha}_n$, we have
$$P\left((Z^*)^2 + g\left(n^{1*}_A\right) \leq q^{1-\alpha}_n - \eta \,\middle|\, D^{obs}\right) \geq 1 - \alpha.$$
Lemma A.15 gives
$$P\left(Z^2 + g\left(n_{1A}\right) \leq q^{1-\alpha}_n - \eta\right) \leq P\left(Z^2 + g\left(n_{1A}\right) \leq q^{1-\alpha}_n\right) - \eta^\gamma f_{\chi^2_1}(C_1) = (1-\alpha) - \eta^\gamma f_{\chi^2_1}(C_1).$$
Hence,
$$P\left(Z^2 + g\left(n_{1A}\right) \leq q^{1-\alpha}_n - \eta\right) - P\left((Z^*)^2 + g\left(n^{1*}_A\right) \leq q^{1-\alpha}_n - \eta \,\middle|\, D^{obs}\right) \leq -\eta^\gamma f_{\chi^2_1}(C_1). \tag{114}$$
Notice that by Lemma A.13 and Lemma A.16, $q^{1-\alpha}_n - \eta \in I_\eta$ and $q^{1-\alpha}_n + \eta \in I_\eta$. Combining equations (113) and (114) yields the stated results.

Given a vector of always-reporter indicators $A = (A_i)^n_{i=1}$, define $\mathcal{A} = \{i \in [n] : A_i = 1\}$ and $n_A = \sum^n_{i=1} A_i$. Given realized outcomes, we define
$$\overline{Y}_{\mathcal{A}} = \frac{1}{n_A} \sum_{i \in \mathcal{A}} Y_i, \qquad v^*_A = \frac{1}{n_A - 1} \sum_{i \in \mathcal{A}} \left(Y_i - \overline{Y}_{\mathcal{A}}\right)^2. \tag{115}$$
As in (65), we define the target variance conditioning on the realized outcomes, with $v^{1*}_A = v^{0*}_A = v^*_A$,
$$V^*_{n,k} = \frac{1}{k} v^{1*}_A + \frac{1}{n_A - k} v^{0*}_A, \qquad \text{for } k \in [1, n_A - 1].$$
We note that if $V^*_{n,k} = 0$ for some $k$, then all $Y_i$, $i \in \mathcal{A}$, are identical. Hence $V^*_{n,k} = 0$ for all $k \in [1, n_A - 1]$, and $\hat\tau^{hj*}$ and $\hat V^{hj*}_n$ are 0 conditionally almost surely. We define $0/0 \equiv 0$ when such an event happens.

Theorem A.20. Let $g$ be a non-negative function such that
$$\sup_{n \in \mathbb{N}} \sup_{\theta^w_n \in \Theta^w_n} \mathbb{E}_{\theta^w_n}\left[\left|g\left(n_{1A}\right)\right|\right] \leq C < \infty,$$
for a positive constant $C$. Given a $\theta^w_n \in \Theta^w_n$, let $\{A_i\}^n_{i=1}$ be the associated always-reporter indicators and define $\mathcal{A} = \{i \in [n] : A_i = 1\}$. Suppose $n_A \geq 2$.
F or eac h t ≥ 0 and ϵ ∈ (0 , 1), w e ha ve,     P θ w n  ( Z ∗ ) 2 + g  n 1 ∗ A  ≤ t    D obs  − P θ w n  Z 2 + g ( n 1 A ) ≤ t     ≤ 2(1 − r ) r s 2 ( n − 1) + r 2 r C 61 ( δ, r ) 1 √ n A max i ∈A   Y i − Y A   p v ∗ A + C 88 ( r , s, ϵ ) n − 1 + C 87 ( ϵ, r ) 1 n A max i ∈A   Y i − Y A   2 v ∗ A + s 2 ϵ ( t + C ) π (1 − ϵ ) + s 2 ϵ ( t + C ) π (1 + ϵ ) , where C 61 ( δ, r ) is defined in (61), C 87 ( ϵ, r ) is defined in (87), C 88 ( r , s, ϵ ) is defined in (88), { Y i } i ∈A are observed outcomes, and ¯ Y A and v ∗ A are defined in (115). Constan ts r , s , δ and B are defined in Assumption 3, Assumption 2-(i), Assumption 2-(ii) and Assumption 2-(iii), resp ectiv ely . Pr o of. Recall the definition of Z ∗ from (98). W e write b τ hj ∗ ( D ∗ ) as b τ hj ∗ , and b V hj ∗ n ( D ∗ ) as b V hj ∗ n . W e suppress the dep endence on θ w n . W e ha ve the following inequalit y: P  b τ hj ∗  2 b V hj ∗ n + g  n 1 ∗ A  ≤ t      D obs ! = P  b τ hj ∗  2 V ∗ n,n 1 ∗ A V ∗ n,n 1 ∗ A b V hj ∗ n + g  n 1 ∗ A  ≤ t      D obs ! = P  b τ hj ∗  2 V ∗ n,n 1 ∗ A V ∗ n,n 1 ∗ A b V hj ∗ n + g  n 1 ∗ A  ≤ t, V ∗ n,n 1 ∗ A b V hj ∗ n ≥ 1 − ϵ      D obs ! + P  b τ hj ∗  2 V ∗ n,n 1 ∗ A V ∗ n,n 1 ∗ A b V hj ∗ n + g  n 1 ∗ A  ≤ t, V ∗ n,n 1 ∗ A b V hj ∗ n < 1 − ϵ      D obs ! ≤ P  b τ hj ∗  2 V ∗ n,n 1 ∗ A (1 − ϵ ) + g  n 1 ∗ A  ≤ t      D obs ! + P V ∗ n,n 1 ∗ A b V hj ∗ n < 1 − ϵ      D obs ! 51 Similarly , P  b τ hj ∗  2 b V hj ∗ n + g  n 1 ∗ A  > t      D obs ! = P  b τ hj ∗  2 V ∗ n,n 1 ∗ A V ∗ n,n 1 ∗ A b V hj ∗ n + g  n 1 ∗ A  > t      D obs ! = P  b τ hj ∗  2 V ∗ n,n 1 ∗ A V ∗ n,n 1 ∗ A b V hj ∗ n + g  n 1 ∗ A  > t, V ∗ n,n 1 ∗ A b V hj ∗ n ≤ 1 + ϵ      D obs ! + P  b τ hj ∗  2 V ∗ n,n 1 ∗ A V ∗ n,n 1 ∗ A b V hj ∗ n + g  n 1 ∗ A  > t, V ∗ n,n 1 ∗ A b V hj ∗ n > 1 + ϵ      D obs ! ≤ P  b τ hj ∗  2 V ∗ n,n 1 ∗ A (1 + ϵ ) + g  n 1 ∗ A  > t      D obs ! 
+ P V ∗ n,n 1 ∗ A b V hj ∗ n > 1 + ϵ      D obs ! Hence, P  b τ hj ∗  2 b V hj ∗ n + g  n 1 ∗ A  ≤ t      D obs ! ≥ P  b τ hj ∗  2 V ∗ n,n 1 ∗ A (1 + ϵ ) + g  n 1 ∗ A  ≤ t      D obs ! − P V ∗ n,n 1 ∗ A b V hj ∗ n > 1 + ϵ      D obs ! W e ha ve the inequalities: P  b τ hj ∗  2 b V hj ∗ n + g  n 1 ∗ A  ≤ t      D obs ! − P  Z 2 + g  n 1 A  ≤ t  ≥ P  b τ hj ∗  2 b V hj ∗ n (1 + ϵ ) + g  n 1 ∗ A  ≤ t      D obs ! − P  Z 2 (1 + ϵ ) + g  n 1 A  ≤ t  + P  Z 2 (1 + ϵ ) + g  n 1 A  ≤ t  − P  Z 2 + g  n 1 A  ≤ t  − P V ∗ n,n 1 ∗ A b V hj ∗ n > 1 + ϵ      D obs ! , 52 and, P  b τ hj ∗  2 b V hj ∗ n + g  n 1 ∗ A  ≤ t      D obs ! − P  Z 2 + g  n 1 A  ≤ t  ≤ P  b τ hj ∗  2 V ∗ n,n 1 ∗ A (1 − ϵ ) + g  n 1 ∗ A  ≤ t      D obs ! − P  Z 2 (1 − ϵ ) + g  n 1 A  ≤ t  + P  Z 2 (1 − ϵ ) + g  n 1 A  ≤ t  − P  Z 2 + g  n 1 A  ≤ t  + P V ∗ n,n 1 ∗ A b V hj ∗ n < 1 − ϵ      D obs ! Note w e ha ve x 1 ≤ x ≤ x 2 implies | x | ≤ max {| x 1 | , | x 2 |} . Hence      P  b τ hj ∗  2 b V hj ∗ n + g  n 1 ∗ A  ≤ t      D obs ! − P  Z 2 + g  n 1 A  ≤ t       ≤ sup s ≥ 0 ,t      P s  b τ hj ∗  2 V ∗ n,n 1 ∗ A + g  n 1 ∗ A  ≤ t      D obs ! − P  sZ 2 + g  n 1 A  ≤ t       +   P  Z 2 (1 − ϵ ) + g  n 1 A  ≤ t  − P  Z 2 + g  n 1 A  ≤ t    +   P  Z 2 (1 + ϵ ) + g  n 1 A  ≤ t  − P  Z 2 + g  n 1 A  ≤ t    + P      V ∗ n,n 1 ∗ A b V hj ∗ n − 1      > ϵ      D obs ! ≤ 2 (1 − r ) r s 2 ( n − 1) + r 2 r C 61 ( δ, r ) 1 √ n A max i ∈A   Y i − Y A   p v ∗ A (116) + C 88 ( r , s, ϵ ) n − 1 + C 87 ( ϵ, r ) 1 n A max i ∈A   Y i − Y A   2 v ∗ A (117) + s 2 ϵ ( t + C ) π (1 − ϵ ) + s 2 ϵ ( t + C ) π (1 + ϵ ) , (118) (116) follows from a same calculation as in (61), (117) follo ws from a same calculation as in (85), and (118) follows from Lemma A.26 and the fact that t ≥ 0. 
W e collect the sev eral algebraic iden tities in the lemma below. Lemma A.21. Let A b e the set of alwa ys-rep orters and D = ( D i ) n i =1 b e the observ ed assignmen ts. Define Y A as in (115) and b τ 1 A and b τ 0 A as in (16). Denote n 1 A = P n i =1 D i A i and n 0 A = P n i =1 (1 − D i ) A i . W e hav e the follo wing algebraic iden tities: (i) b τ 1 A − Y A = n − 1 A n 0 A  b τ 1 A − b τ 0 A  , b τ 0 A − Y A = − n − 1 A n 1 A  b τ 1 A − b τ 0 A  . 53 (ii)   Y i − Y A   ≤ | y i (1) − y A (1) | +   y A (1) − b τ 1 A   + n − 1 A n 0 A   b τ 1 A − b τ 0 A   for all i ∈ A if D i = 1. (iii)   Y i − Y A   ≤ | y i (0) − y A (0) | +   y A (0) − b τ 0 A   + n − 1 A n 1 A   b τ 1 A − b τ 0 A   for all i ∈ A if D i = 0. (iv) P i ∈A  Y i − Y A  2 = P a ∈{ 0 , 1 } P i ∈A ,D i = a ( y i ( a ) − b τ a A ) 2 + n − 1 A n 1 A n 0 A  b τ 1 A − b τ 0 A  2 . Pr o of. By definition w e ha ve Y A = n 1 A n A b τ 1 A + n 0 A n A b τ 0 A . Hence, b τ 1 A − Y A = n 0 A n A  b τ 1 A − b τ 0 A  , b τ 0 A − Y A = − n 1 A n A  b τ 1 A − b τ 0 A  . This pro ves (i). Supp ose D i = 1,   Y i − Y A   =     y i (1) −  n 1 A n A b τ 1 A + n 0 A n A b τ 0 A      =     y i (1) − b τ 1 A + n 0 A n A  b τ 1 A − b τ 0 A      ≤ | y i (1) − y A (1) | +   y A (1) − b τ 1 A   + n 0 A n A   b τ 1 A − b τ 0 A   . This pro ves (ii) and, by symmetry , (iii). F or (iv), w e ha ve, X i ∈A  Y i − Y A  2 = X a ∈{ 0 , 1 } X i ∈A ,D i = a  Y i − b τ a A + b τ a A − Y A  2 = X a ∈{ 0 , 1 } X i ∈A ,D i = a ( y i ( a ) − b τ a A ) 2 + X a ∈{ 0 , 1 } n a A  b τ a A − Y A  2 = X a ∈{ 0 , 1 } X i ∈A ,D i = a ( y i ( a ) − b τ a A ) 2 + n 1 A n 0 A n A  b τ 1 A − b τ 0 A  2 , where the third equalit y uses (i). Lemma A.22. Given a θ w n ∈ Θ w n , let { A i } n i =1 b e the associated alwa ys-reporter indicators and define A = { i ∈ [ n ] : A i = 1 } and n A = P n i =1 A i . 
Given the observ ed outcomes Y = ( Y i ) n i =1 and define Y A and v ∗ A as in (115). F or each η > 0 lim sup n →∞ sup θ w n ∈ Θ w n P θ w n 1 n A max i ∈A  Y i − Y A  2 v ∗ A ≥ η ! → 0 (119) Pr o of. Conditional on D i = 1 and using the inequality ( a + b + c ) 2 ≤ 3 a 2 + 54 3 b 2 + 3 c 2 , w e ha ve the inequalit y   Y i − Y A   2 v ∗ A ≤ 3 ( y i (1) − y A (1)) 2 + 3  y A (1) − b τ 1 A  2 + 3 n − 2 A  n 0 A  2  b τ 1 A − b τ 0 A  2 ( n A − 1) − 1 P i ∈A  Y i − Y A  2 ≤ 3 ( y i (1) − y A (1)) 2 + 3  y A (1) − b τ 1 A  2 + 3 n − 2 A  n 0 A  2  b τ 1 A − b τ 0 A  2 n − 1 A P i ∈A  Y i − Y A  2 = 3 ( y i (1) − y A (1)) 2 + 3  y A (1) − b τ 1 A  2 + 3 n − 2 A  n 0 A  2  b τ 1 A − b τ 0 A  2 n − 1 A  P a ∈{ 0 , 1 } P i ∈A ,D i = a ( y i ( a ) − b τ a A ) 2 + n − 1 A n 1 A n 0 A ( b τ 1 A − b τ 0 A ) 2  ≤ 3 ( y i (1) − y A (1)) 2 n − 1 A P i ∈A ,D i =1 ( y i (1) − b τ 1 A ) 2 + 3  y A (1) − b τ 1 A  2 n − 1 A P i ∈A ,D i =1 ( y i (1) − b τ 1 A ) 2 + 3 n 0 A n 1 A n A . 3 n 0 A n A n 1 A A similar identit y holds when D i = 0. Hence we hav e, max i ∈A  Y i − Y A  2 v ∗ A ≤ 3 max i ∈A ( y i (1) − y A (1)) 2 n − 1 A P i ∈A ,D i =1 ( y i (1) − b τ 1 A ) 2 + 3 max i ∈A ( y i (0) − y A (0)) 2 n − 1 A P i ∈A ,D i =0 ( y i (0) − b τ 0 A ) 2 + 3  y A (1) − b τ 1 A  2 n − 1 A P i ∈A ,D i =1 ( y i (1) − b τ 1 A ) 2 + 3  y A (0) − b τ 0 A  2 n − 1 A P i ∈A ,D i =0 ( y i (0) − b τ 0 A ) 2 + 3 n 1 A n A n 0 A + 3 n 0 A n A n 1 A . By a union b ound, we hav e, for each θ w n ∈ Θ w n , P θ w n 1 n A max i ∈A  Y i − Y A  2 v ∗ A ≥ η ! ≤ X a ∈{ 0 , 1 } P θ w n 1 n A 3 max i ∈A ( y i ( a ) − y A ( a )) 2 n − 1 A P i ∈A ,D i = a ( y i ( a ) − b τ a A ) 2 ≥ η 6 ! + X a ∈{ 0 , 1 } P θ w n 1 n A 3 ( y A ( a ) − b τ a A ) 2 n − 1 A P i ∈A ,D i = a ( y i ( a ) − b τ a A ) 2 ≥ η 6 ! + X a ∈{ 0 , 1 } P θ w n  1 n A 3 n 1 − a A n a A n A ≥ η 6  55 By Lemma A.23 b elo w, lim sup n →∞ sup θ w n ∈ Θ w n P θ w n 1 n A max i ∈A  Y i − Y A  2 v ∗ A ≥ η ! 
= 0 Lemma A.23. Giv en the setup in Lemma A.21 and Lemma A.22, for every η > 0 and a ∈ { 0 , 1 } , follo wing statemen ts hold lim sup n →∞ sup θ w n ∈ Θ w n P θ w n 1 n A 3 max i ∈A ( y i ( a ) − y A ( a )) 2 n − 1 A P i ∈A ,D i = a ( y i ( a ) − b τ a A ) 2 ≥ η ! → 0 , (120) lim sup n →∞ sup θ w n ∈ Θ w n P θ w n 1 n A 3 ( y A ( a ) − b τ a A ) 2 n − 1 A P i ∈A ,D i = a ( y i ( a ) − b τ a A ) 2 ≥ η ! → 0 , (121) and lim sup n →∞ sup θ w n ∈ Θ w n P θ w n  n 1 − a A n a A n 2 A ≥ η  → 0 . (122) Pr o of. W e prov e the case for a = 1. The case for a = 0 can b e pro v ed analo- gously . Rearranging the left-hand side in (120), we hav e, 1 n A 3 max i ∈A ( y i (1) − y A (1)) 2 n − 1 A P i ∈A ,D i =1 ( y i (1) − b τ 1 A ) 2 = 1 n A 3 max i ∈A ( y i (1) − y A (1)) 2 v 1 A × v 1 A n − 1 A P i ∈A ,D i =1 ( y i (1) − b τ 1 A ) 2 ≤ 3 B 2 √ n A × n A n 1 A × v 1 A ( n 1 A ) − 1 P i ∈A ,D i =1 ( y i (1) − b τ 1 A ) 2 , where the last inequalit y is b y Lemma A.6. W e prov e lim sup n →∞ sup θ w n ∈ Θ w n P θ w n B 2 √ n A × n A n 1 A × v 1 A ( n 1 A ) − 1 P i ∈A ,D i =1 ( y i (1) − b τ 1 A ) 2 ≥ η ! → 0 . F or ϵ sufficiently small and applying a union b ound, w e pro ve lim sup n →∞ sup θ w n ∈ Θ w n P θ w n  B 2 √ n A × n A n 1 A ≥ η 1 + ϵ  → 0 . (123) lim sup n →∞ sup θ w n ∈ Θ w n P θ w n v 1 A ( n 1 A ) − 1 P i ∈A ,D i =1 ( y i (1) − b τ 1 A ) 2 ≥ 1 + ϵ ! → 0 (124) 56 F or (123), b ecause b oth sides in the ev ent are positive, we note B 2 √ n A × n A n 1 A ≥ η 1 + ϵ ⇔ n 1 A n A √ n A B 2 ≤ 1 + ϵ η (125) ⇔ n 1 A n A ≤ 1 + ϵ η B 2 √ n A ⇒ n 1 A n A ≤ 1 + ϵ η √ s B 2 √ n , (126) where the last implication is by Assumption 2-(i). Notice E [ n a A ] = n − 1 n 1 n A and n − 1 A E [ n a A ] = n − 1 n 1 = π ∈ [ r, 1 − r ] b y Assumption 3. F or n large enough w e hav e, 1 + ϵ η √ s B 2 √ n − π ≤ 1 + ϵ η √ s B 2 √ n − r < 0 . 
Hence w e ha ve, P θ w n  B 2 √ n A × n A n 1 A ≥ η 1 + ϵ  ≤ P θ w n  n 1 A n A ≤ 1 + ϵ η √ s B 2 √ n  = P θ w n  n 1 A n A − π ≤ 1 + ϵ η √ s B 2 √ n − π  = P θ w n  n 1 A n A − π  2 ≥  1 + ϵ η √ s B 2 √ n − π  2 ! ≤ E "  n 1 A n A − π  2 # /  1 + ϵ η √ s B 2 √ n − π  2 ≤ (1 − r ) 3 4 r s 2 1 n − 1 /  1 + ϵ η √ s B 2 √ n − π  2 = O ( n − 1 ) , where the last inequalit y follo ws from Lemma A.2 and V  n 1 A n A  =  n 1 n A  2 V  n 1 A n 1  =  n 1 n A  2 n 0 n 1 n 1 n − 1 n X i =1  A i − A  2 ≤  1 − r s  2 1 − r r 1 n − 1 1 4 = (1 − r ) 3 4 r s 2 1 n − 1 . Define even ts E and E c as in (56) and (57). Recall the prop ert y in (89) on E c . (122) follows from a calculation: P θ w n  n 1 − a A n a A n 2 A ≥ η  ≤ P θ w n ( E ) + P θ w n  1 − r / 2 r / 2 1 n 2 A ≥ η  → 0 W e hence omit. F or ϵ small enough, (124) follows from the calculation in (90). W e no w sho w (121). The ev ent can b e written as 1 n A 3  y A (1) − b τ 1 A  2 v 1 A v 1 A ( n 1 A ) − 1 P i ∈A ,D i =1 ( y i (1) − b τ 1 A ) 2 n A n 1 A (127) ≥ η (1 + ϵ ) 2 (1 + ϵ ) 2 (128) 57 Again b y a union b ound, we can sho wn as in (123) lim sup n →∞ sup θ w n ∈ Θ w n P θ w n 1 √ n A n A n 1 A ≥ η (1 + ϵ ) 2 ! = 0 , (129) and as in (124) lim sup n →∞ sup θ w n ∈ Θ w n P θ w n v 1 A ( n 1 A ) − 1 P i ∈A ,D i =1 ( y i (1) − b τ 1 A ) 2 ≥ 1 + ϵ ! = 0 . W e only need to sho w lim sup n →∞ sup θ w n ∈ Θ w n P θ w n 1 √ n A 3  b τ 1 A − y A (1)  2 v 1 A ≥ 1 + ϵ ! = 0 (130) Recall the definition of the even t E from (56) and E c from (57) E " P 1 √ n A 3  y A (1) − b τ 1 A  2 v 1 A ≥ 1 + ϵ      n 1 A !# ≤ P ( E ) + E " P 1 √ n A 3  b τ 1 A − y A (1)  2 v 1 A ≥ 1 + ϵ      n 1 A !      E c # × P ( E c ) ≤ P ( E ) + E  3 1 + ϵ n 0 A n 1 A n A √ n A     E c  × P ( E c ) ≤ 2(1 − r ) r s 2 ( n − 1) + 3 1 + ϵ 2 − r r s − 3 2 n − 3 2 = O ( n − 1 ) . Theorem A.24. 
Let g : R → R be a function such that sup n ∈ N sup θ w n ∈ Θ w n E θ w n    g ( n 1 A )    ≤ C < ∞ , for a p ositiv e constan t C . Giv en a θ w n ∈ Θ w n , let { A i } n i =1 b e the associated alw ays-reporter indicators and denote a completely randomized design D = ( D i ) n i =1 ∼ CR( n, n 1 ). Define n 1 A = P n i =1 D i A i . Let Z ∼ N(0 , 1) be a normal random v ariable with mean 0 and v ariance 1 that is independent of D . Recall the definition of Z ∗ from (98) and n 1 A ∗ from (99). F or a given η > 0, define the in terv al, I η =  χ 2 1 − α − η , χ 2 1 − α/ 2 + 2 C α + η  , W e ha ve, for eac h µ > 0, define the even t E n : E n : sup t ∈ I η    P θ w n  Z 2 + g ( n 1 A ) ≤ t  − P θ w n  ( Z ∗ ) 2 + g  n 1 ∗ A  ≤ t    D obs     ≥ µ. 58 W e ha ve lim sup n →∞ sup θ w n ∈ Θ w n P θ w n ( E n ) = 0 Pr o of. W e note that b ecause the underlying probability space is discrete E n is measurable. Define t max =    χ 2 1 − α/ 2 + 2 C α + η    . By Theorem A.20, for suffi- cien tly large n and small ϵ > 0, w e ha ve 2(1 − r ) r s 2 ( n − 1) + C 88 ( r , s, ϵ ) n − 1 +  1 √ 2 π r ϵ 1 − ϵ + 1 √ 2 π r ϵ 1 + ϵ  p t max + C < µ 2 . Then the result follo ws from Lemma A.22. Theorem A.25. Let g b e a non-negativ e function such that sup n ∈ N sup θ w n ∈ Θ w n E θ w n    g ( n 1 A )    ≤ C < ∞ , and satisfies the condition (103) in Lemma A.15 with p ositiv e constants γ , C 1 and C 2 . Giv en α ∈ (0 , 1), let q 1 − α n b e the (1 − α )-th quan tile of Z 2 + g ( n 1 A ), where Z ∼ N(0 , 1) is indep endent of n 1 A . Let { b q 1 − α n } ∞ n b e the 1 − α quantile of the randomization distribution b q 1 − α n = inf n t ∈ R : P θ w n  ( Z ∗ ) 2 + g  n 1 ∗ A  ≤ t    D obs  ≥ 1 − α o , where Z ∗ and n 1 ∗ A are defined in (98) and (99). F or each η > 0, w e hav e lim sup n →∞ sup θ w n ∈ Θ w n P θ w n    b q 1 − α n − q 1 − α n   > η  = 0 . 
Moreo ver, lim sup n →∞ sup θ w n ∈ Θ w n P θ w n  b τ hj n − τ n  2 b V hj n + g  n 1 A  ≥ b q 1 − α n ! ≤ α (131) Pr o of. This is pro ved by Theorem A.24, Lemma A.19 (through a proof b y con- tradiction), and Theorem A.18. 59 A.5 Auxiliary Lemmas W e note that Z 2 with Z ∼ N (0 , 1) follows a Chi-squared χ 2 1 distribution with one degree of freedom. Z 2 has a density function: f χ 2 1 ( t ) = 1 √ 2 π t − 1 2 e − t 2 , for t ∈ (0 , ∞ ). Note that f χ 2 1 ( t ) is a decreasing function of t ∈ (0 , ∞ ). W e hav e the follo wing inequalit y: sup x ≥ 0 ,s ∈ [0 ,δ ] Z x + s x f χ 2 1 ( t ) dt = Z δ 0 f χ 2 1 ( t ) dt ≤ 1 √ 2 π Z δ 0 t − 1 2 dt = r 2 π √ δ . (132) Lemma A.26. Let Z 2 with Z ∼ N (0 , 1) b e a random v ariable indep enden t of D = ( D i ) n i =1 ∼ CR( n, n 1 ), and g : R → R be a function such that sup n ∈ N sup θ w n ∈ Θ w n E θ w n    g  n 1 A     ≤ C < ∞ (133) for some p ositive constant C . F or ev ery ϵ ∈ (0 , 1) and t ∈ R , we hav e the inequalities   P θ w n  Z 2 (1 + ϵ ) + g  n 1 A  ≥ t  − P θ w n  Z 2 + g  n 1 A  ≥ t    ≤ s 2 ϵ ( | t | + C ) π (1 + ϵ ) ,   P θ w n  Z 2 + g  n 1 A  ≥ t  − P θ w n  Z 2 (1 − ϵ ) + g  n 1 A  ≥ t    ≤ s 2 ϵ ( | t | + C ) π (1 − ϵ ) , for all θ w n ∈ Θ w n and all n ∈ N . In addition, w e ha ve, Pr o of. W e first show the result for   P θ w n  Z 2 + g  n 1 A  ≥ t  − P θ w n  Z 2 (1 − ϵ ) + g  n 1 A  ≥ t    . F or simplicity , we suppress the dep endence on θ w n . F or every ϵ ∈ (0 , 1), we hav e 0 ≤ P  Z 2 (1 + ϵ ) + g  n 1 A  ≥ t  − P  Z 2 + g  n 1 A  ≥ t  = P t − g  n 1 A  1 + ϵ ≤ Z 2 < t − g  n 1 A  ! =E " P t − g  n 1 A  1 + ϵ ≤ Z 2 < t − g  n 1 A       n 1 A !# =E " P t − g  n 1 A  1 + ϵ ≤ Z 2 < t − g  n 1 A       n 1 A ! 
$\mathbb{1}\left\{t - g\left(n_{1A}\right) \geq 0\right\}\Big]$
\begin{align}
&\leq \sqrt{\frac{2}{\pi}}\, \mathbb{E}\left[\sqrt{\frac{\epsilon}{1+\epsilon}} \sqrt{t - g\left(n_{1A}\right)}\; \mathbb{1}\left\{t - g\left(n_{1A}\right) \geq 0\right\}\right] \tag{134}\\
&\leq \sqrt{\frac{2}{\pi}} \sqrt{\frac{\epsilon}{1+\epsilon}} \sqrt{\mathbb{E}\left[\left|t - g\left(n_{1A}\right)\right|\right]} \tag{135}\\
&\leq \sqrt{\frac{2}{\pi}} \sqrt{\frac{\epsilon}{1+\epsilon}} \sqrt{|t| + \mathbb{E}\left[\left|g\left(n_{1A}\right)\right|\right]}
\leq \sqrt{\frac{2}{\pi}} \sqrt{\frac{\epsilon}{1+\epsilon}} \sqrt{|t| + C},
\end{align}
where the positive constant $C$ is defined in (133), (134) follows from (132), and (135) follows from $\mathbb{E}[X] \leq \sqrt{\mathbb{E}[X^2]}$ for any square-integrable random variable $X$. The result for $\left|P_{\theta^w_n}\left(Z^2 + g\left(n_{1A}\right) \geq t\right) - P_{\theta^w_n}\left(Z^2(1-\epsilon) + g\left(n_{1A}\right) \geq t\right)\right|$ can be shown similarly.

A.6 Proof of Theorem 4.2 and Theorem 4.4

We now prove Theorem 4.2.

Proof. We write the underlying true table as $A^0$. Under the sharp-null hypothesis,
\begin{align}
P_{\theta^s_n}\left(p^{worst} \leq \alpha\right) &\leq P_{\theta^s_n}\left(p\left(A^0\right) \leq \alpha - \beta\right) + P_{\theta^s_n}\left(A^0 \notin \mathcal{A}(D, R)\right) \tag{136}\\
&\leq \alpha - \beta + \beta = \alpha. \tag{137}
\end{align}
Similarly, under the weak-null hypothesis,
\begin{align*}
P_{\theta^w_n}\left(p^{worst} \leq \alpha\right) &\leq P_{\theta^w_n}\left(p\left(A^0\right) \leq \alpha - \beta\right) + P_{\theta^w_n}\left(A^0 \notin \mathcal{A}(D, R)\right)\\
&\leq P_{\theta^w_n}\left(T^0_n + g_i\left(n_{1A}\right) \geq \hat q^{i,1-\alpha+\beta}_n\right) + \beta,
\end{align*}
for $i \in \{0,1,2\}$, where the function $g_i(n_{1A})$ is defined in (96) or (97) and $\hat q^{i,1-\alpha+\beta}_n$ is the randomization-based critical value of the corresponding function $g_i$; $T^0_n$ is defined in (17). We have shown in Theorem A.12 that $g_i(n_{1A})$ satisfies the premise of Theorem A.25. Hence, by Theorem A.25,
$$\limsup_{n\to\infty} \sup_{\theta^w_n \in \Theta^w_n} P_{\theta^w_n}\left(p^{worst} \leq \alpha\right) \leq \limsup_{n\to\infty} \sup_{\theta^w_n \in \Theta^w_n} P_{\theta^w_n}\left(T^0_n + g_i\left(n_{1A}\right) \geq \hat q^{i,1-\alpha+\beta}_n\right) + \beta \leq \alpha - \beta + \beta = \alpha.$$

We now prove Theorem 4.4.

Proof. Let $i \in \{0,1,2\}$. We write the underlying true table as $A^0$ and define $k^0 = \sum^n_{i=1} A^0_i$. Algorithm 3 rejects the weak-null hypothesis if $T^{max,k}_i \geq q^k_{i,1-\alpha+\beta}$ for all $k$ in Cardinality-$\mathcal{A}$.
Then we have,
\begin{align*}
P_{\theta^w_n}&\left(T^{max,k}_i \geq q^k_{i,1-\alpha+\beta},\ \forall k \in \text{Cardinality-}\mathcal{A}\right)\\
&\leq P_{\theta^w_n}\left(T^{max,k^0}_i \geq q^{k^0}_{i,1-\alpha+\beta}\right) + P_{\theta^w_n}\left(A^0 \notin \mathcal{A}(D, R)\right)\\
&\leq P_{\theta^w_n}\left(T^i_n + g_i\left(n_{1A}\right) \geq q^{k^0}_{i,1-\alpha+\beta}\right) + \beta\\
&\leq P_{\theta^w_n}\left(Z^2 + g_i\left(n_{1A}\right) \geq q^{k^0}_{i,1-\alpha+\beta}\right) + \beta + \underbrace{\left|P_{\theta^w_n}\left(T^i_n + g_i\left(n_{1A}\right) \geq q^{k^0}_{i,1-\alpha+\beta}\right) - P_{\theta^w_n}\left(Z^2 + g_i\left(n_{1A}\right) \geq q^{k^0}_{i,1-\alpha+\beta}\right)\right|}_{(*)}.
\end{align*}
By Lemma A.13, Theorem A.12, and Lemma A.16, the set of all possible quantiles $q^k_{i,1-\alpha+\beta}$ lies in a bounded set. Using an argument similar to the justification from (108) to (109), we can show that
$$\limsup_{n\to\infty} \sup_{\theta^w_n \in \Theta^w_n} |(*)| = 0.$$
Hence we have,
$$\limsup_{n\to\infty} \sup_{\theta^w_n \in \Theta^w_n} P_{\theta^w_n}\left(T^{max,k}_i \geq q^k_{i,1-\alpha+\beta},\ \forall k \in \text{Cardinality-}\mathcal{A}\right) \leq \alpha.$$

B Proof for Results in Section 5

B.1 Proof of Lemma 5.1

Proof. Let $A^*$ be the optimizing always-reporter vector associated with problem (7). By definition, there must exist an interval $[t_{i-1}, t_i]$ such that $T_n(Y, D^{obs}, A^*) \in [t_{i-1}, t_i]$. Let $A^{i*}$ be the optimizing vector corresponding to the subproblem of (38) associated with $[t_{i-1}, t_i]$. We have the inequality:
$$\mathbb{E}_D\left[\mathbb{1}\left(T_n(Y, D, A^{i*}) \geq t_{i-1}\right)\right] \geq \mathbb{E}_D\left[\mathbb{1}\left(T_n(Y, D, A^*) \geq t_{i-1}\right)\right] \geq \mathbb{E}_D\left[\mathbb{1}\left(T_n(Y, D, A^*) \geq T_n(Y, D^{obs}, A^*)\right)\right] = p^{worst},$$
where $D \sim \mathrm{CR}(n, n_1)$. For every interval $[t_{i-1}, t_i]$ and its associated optimized value, we have the inequality:
\begin{align*}
v_i &= \mathbb{E}_D\left[\mathbb{1}\left(T_n(Y, D, A^{i*}) \geq t_{i-1}\right)\right]\\
&= \mathbb{E}_D\left[\mathbb{1}\left(T_n(Y, D, A^{i*}) \geq t_i\right)\right] + \mathbb{E}_D\left[\mathbb{1}\left(T_n(Y, D, A^{i*}) \in [t_{i-1}, t_i)\right)\right]\\
&\leq \mathbb{E}_D\left[\mathbb{1}\left(T_n(Y, D, A^{i*}) \geq T_n(Y, D^{obs}, A^{i*})\right)\right] + \mathbb{E}_D\left[\mathbb{1}\left(T_n(Y, D, A^{i*}) \in [t_{i-1}, t_i)\right)\right]\\
&\leq p^{worst} + \mathbb{E}_D\left[\mathbb{1}\left(T_n(Y, D, A^{i*}) \in [t_{i-1}, t_i)\right)\right].
\end{align*}
Combining the inequalities yields the desired results.
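The worst-case p-value that Lemma 5.1 brackets can be made concrete in a small example. The sketch below (illustrative only) computes the exact randomization p-value $p(A) = \mathbb{E}_D[\mathbb{1}\{T_n(Y,D,A) \geq T_n(Y,D^{obs},A)\}]$ for each candidate always-reporter vector and takes the maximum. It simplifies the paper's setup in two hypothetical ways: it re-randomizes within each candidate set rather than drawing $D \sim \mathrm{CR}(n, n_1)$ over all units, and it uses an unstudentized squared difference in means as $T_n$.

```python
import itertools

def dim_stat(y, d):
    """Squared difference in means between treated and control units."""
    treated = [yi for yi, di in zip(y, d) if di == 1]
    control = [yi for yi, di in zip(y, d) if di == 0]
    return (sum(treated) / len(treated) - sum(control) / len(control)) ** 2

def randomization_pvalue(y, d_obs, n1):
    """Exact randomization p-value by enumerating all CR(n, n1) assignments."""
    n = len(y)
    t_obs = dim_stat(y, d_obs)
    draws = list(itertools.combinations(range(n), n1))
    count = 0
    for treated in draws:
        d = [1 if i in set(treated) else 0 for i in range(n)]
        if dim_stat(y, d) >= t_obs:  # draws at least as extreme as observed
            count += 1
    return count / len(draws)

def worst_case_pvalue(y, d_obs, candidates):
    """Maximize the p-value over candidate always-reporter vectors (the
    analogue of problem (7), with brute-force enumeration standing in for
    the integer-programming machinery of Section 5)."""
    pvals = []
    for A in candidates:
        idx = [i for i, a in enumerate(A) if a == 1]
        yA = [y[i] for i in idx]
        dA = [d_obs[i] for i in idx]
        pvals.append(randomization_pvalue(yA, dA, sum(dA)))
    return max(pvals), pvals
```

For instance, with outcomes `[1.0, 2.0, 3.0, 4.0]`, observed assignment `[1, 1, 0, 0]`, and two candidate tables (all four units, or the first three), the procedure returns the larger of the two candidate p-values as the worst case.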
C Computational Subroutines

Section C.1 includes two master algorithms that compute the worst-case p-values for different outcome types:

1. A_master_continuous: for continuous outcomes.
2. A_master_small_support: for outcomes with a small support.

Section C.2 includes subroutines that are used in A_master_continuous:

1. B_Calculate_Heuristic_p_Value: given a number of always-reporters, the subroutine finds a heuristic solution for the always-reporter vector that maximizes the p-value.
2. B_Create_Lower_Bound: an MIQCP-based bisection method for constructing a lower bound for the test statistic $\tilde T^0_n(Y, D^A, x) + g_i(n_{A,1})$, subject to $x \in x(k, D^{obs}, R)$, for a given $k$, an assignment vector $D^A$, and a statistic type $i \in \{0,1,2\}$. Here, $\tilde T^0_n(Y, D^A, x)$ is defined in (40), and the functions $g_i(n_{A,1})$ are defined in (96) and (97).
3. B_Create_Upper_Bound: an MIQCP-based bisection method for constructing an upper bound for the same constrained problem considered in B_Create_Lower_Bound.
4. B_Create_Lower_Bound_2: an MIQCP method for constructing a lower bound of the statistic $\hat\mu^2_n(Y, D^A, x^A) - c \times \hat\sigma^{2,hj}_n(Y, D^A, x^A)$, subject to $x \in x(k, D^{obs}, R)$, for a given $k$, an assignment vector $D^A$, and a given scalar $c$, where the functions $\hat\mu^2_n$ and $\hat\sigma^{2,hj}_n(Y, D^A, x^A)$ are defined below (40).
5. B_Calculate_p_Value_Upper_Bound: an algorithm that solves the subproblems described in Theorem 5.3.

Section C.3 includes algorithms for asymptotic inference:

1. A_master_asymptotics: a master algorithm that solves the asymptotic inference problem.
2. B_Asymptotic_Inferences_Inner: a subroutine that is used in the master algorithm.

Section C.4 includes:

1. C_dim2_bounds: calculate analytical upper and lower bounds for the squared difference-in-means estimator.
2.
C_v1_min and C_v0_min: calculate analytical lower bounds for the treated-group and control-group variance estimators.
3. C_v1_max and C_v0_max: calculate analytical upper bounds for the treated-group and control-group variance estimators.

We adopt the indexing convention from Section 5.3. Recall that the first $r_0$ units are always-reporters assigned to the control group, regardless of whether we index by $i$ or by $a$. Also recall the set of matching variables associated with a fixed number of always-reporters $k \in [r_0, r_0 + \sum^n_{i=1} D_i R_i]$, as defined in (41):
$$x(k, D^{obs}, R) = \left\{ \{x_{ai}\}_{a \in [k], i \in [n]} : \begin{array}{l} x_{ai} \in \{0,1\}, \ \forall a \in [k],\ i \in [n], \\ x_{ai} = 1, \ \forall a = i,\ a \leq r_0, \\ \sum_i x_{ai} = 1, \ \forall a \in [k], \qquad \sum_a x_{ai} \leq 1, \ \forall i \in [n], \\ x_{ai} \in \{0,1\}, \ \forall a \in [k],\ D^{obs}_i = 1,\ R_i = 1, \\ x_{ai} = 0, \ \forall a \in [k],\ D^{obs}_i = 1,\ R_i = 0 \end{array} \right\}.$$
Recall our notation for the observed dataset $D = (Y, D^{obs}, R)$ as defined in (3). We define the subroutines zeros($n_1$, $n_2$) and ones($n_1$, $n_2$), which create an $n_1$-by-$n_2$ matrix of zeros and ones, respectively. For brevity, we assume that our programs have access to basic information about the experimental design and statistical inferential parameters, including the sample size $n$, the number of treated units $n_1$, the number of control units $n_0$, the number of always-reporters in the control group $r_0$, the number of simulated assignments $n_{mc}$, the significance level $\alpha$, and the statistic type $i$ for testing the balance of the number of always-reporters. Finally, recall the definitions of $\tilde p(x)$ and $\tilde v(x, t_{i-1}, t_i)$ from (43) and (44).
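The membership conditions defining $x(k, D^{obs}, R)$ can be checked mechanically. The sketch below implements exactly the constraints displayed in (41): each always-reporter slot $a$ is matched to exactly one unit, each unit is used at most once, the first $r_0$ control always-reporters are fixed in place, and treated non-reporters are excluded. The function name and matrix layout are illustrative, not part of the paper's code.

```python
def in_matching_set(x, D_obs, R, r0):
    """Check whether a k-by-n 0/1 matrix x lies in the constraint set
    x(k, D_obs, R) of (41): rows index always-reporter slots a, columns
    index units i; the first r0 units are control-group always-reporters."""
    k, n = len(x), len(D_obs)
    for a in range(k):
        if any(v not in (0, 1) for v in x[a]):
            return False                        # entries must be binary
        if sum(x[a]) != 1:
            return False                        # each slot matched to exactly one unit
    for i in range(n):
        if sum(x[a][i] for a in range(k)) > 1:
            return False                        # each unit matched at most once
    for a in range(min(r0, k)):
        if x[a][a] != 1:
            return False                        # fixed control always-reporters
    for a in range(k):
        for i in range(n):
            if D_obs[i] == 1 and R[i] == 0 and x[a][i] == 1:
                return False                    # treated non-reporters cannot be matched
    return True
```

For example, with $n = 3$, $r_0 = 1$, $D^{obs} = (0,1,1)$, and $R = (1,1,0)$, matching slot 1 to the control always-reporter and slot 2 to the treated reporter is feasible, while matching any slot to the treated non-reporter (unit 3) is not.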
For a given number of always-reporters, and due to the underlying symmetry of the assignment distribution, the optimal values of the problems
$$\max_{x \in x(k, D^{obs}, R)} \tilde p(x) \qquad \text{and} \qquad \max_{x \in x(k, D^{obs}, R)} \tilde v(x, t_{i-1}, t_i) \tag{138}$$
are equal to the optimal values of the problems
$$\max_{x \in x(k, D^{obs}, R)} \tilde p(x) \quad \text{subject to} \quad \tilde Y_a \geq \tilde Y_{a+1}, \ \forall a \in [r_0 + 1, n_A - 1],$$
$$\max_{x \in x(k, D^{obs}, R)} \tilde v(x, t_{i-1}, t_i) \quad \text{subject to} \quad \tilde Y_a \geq \tilde Y_{a+1}, \ \forall a \in [r_0 + 1, n_A - 1],$$
where $\tilde Y_a = \sum_i x_{ai} Y_i$ for all $a \in [n_A]$; note that the outcomes for the first $r_0$ units are fixed under our convention. We include these constraints in the optimization problems below because they reduce the symmetry of the underlying integer programming problems, possibly making them faster to solve.

C.1 Master Algorithms

Algorithm 4 A_master_continuous: test null hypotheses at a prespecified significance level.
Require: Dataset $D$ as in (3), a prespecified significance level $\alpha \in (0,1)$, statistic type $i \in \{0,1,2\}$, number of randomization draws $n_{mc}$, pruning significance level $\beta \in [0, \alpha)$.
Step I: Pretesting
1: Set Rej ← 1.
2: Construct and prune the compatible always-reporter set $\mathcal{A}(D^{obs}, R)$.
3: Set $n_A^{max} = \sum^n_{i=1} R_i$ and $n_A^{min} = r_0$.
4: if $\beta > 0$ then // prune $\mathcal{A}(D^{obs}, R)$ at significance level $\beta$
5: Set $\alpha \leftarrow \alpha - \beta$.
6: Set $n_A^{max} \leftarrow \max_{A \in \mathcal{A}(D^{obs}, R)} \sum^n_{i=1} A_i$.
7: Set $n_A^{min} \leftarrow \min_{A \in \mathcal{A}(D^{obs}, R)} \sum^n_{i=1} A_i$.
8: end if
Step II: the Heuristic Procedure
9: for $k = n_A^{min}$ to $n_A^{max}$ do
10: $(pval, L_k, D^{mc}_k) \leftarrow$ B_Calculate_Heuristic_p_Value($D$, $k$, $n_{mc}$, $i$).
11: Store $L_k$ and $D^{mc}_k$. // Store the lower bound and the simulated assignments
12: if $pval \geq \alpha$ then
13: Set Rej ← 0; break. // Worst-case p-value $\geq \alpha$: fail to reject
14: end if
15: end for
Step III: Upper-Bound the p-Value
16: if Rej = 1 then
17: Set p_upper_bound ← zeros($n_A^{max} - n_A^{min} + 1$, 1).
// Initialize an array to store results
18: for $k = n_A^{min}$ to $n_A^{max}$ do
19: p_upper_bound($k - n_A^{min} + 1$) ← B_Calculate_p_Value_Upper_Bound($D$, $k$, $D^{mc}_k$, $L_k$).
20: end for
21: if max(p_upper_bound) $< \alpha$ then
22: Set Rej ← 1.
23: else
24: Set Rej ← 0.
25: end if
26: end if
27: return Rej

Recall the definitions of $T^0_n$, $T^1_n$, and $T^2_n$ in (17), (19), and (20). For outcomes with cardinality $K$, also recall the definition of the count vector $c(A)$ (defined above (33)) for a given always-reporter vector $A$.

Algorithm 5 A_master_small_support: test null hypotheses at a prespecified significance level $\alpha \in (0,1)$.
Require: Dataset $D$ as in (3); significance level $\alpha \in (0,1)$; statistic type $i \in \{0,1,2\}$.
Initialize Rej ← 1. Initialize worst_p_value ← 0.
Construct $C(D^{obs}, R)$ as in (34).
for each $c \in C(D^{obs}, R)$ do
Select any $A \in \mathcal{A}(D^{obs}, R)$ such that $c(A) = c$.
Compute $p^{mc}(A)$ using the statistic $T^i_n$.
if $p^{mc}(A) \geq \alpha$ then
worst_p_value ← $p^{mc}(A)$. Rej ← 0; break. // Worst-case p-value $\geq \alpha$: fail to reject.
end if
end for
return Rej.

To implement the step "select any $A \in \mathcal{A}(D^{obs}, R)$ such that $c(A) = c$" efficiently, one can group units by outcome category and then assign always-reporter indicators to match the target count vector $c$ (i.e., by selecting the appropriate number of units within each outcome category).

C.2 Subroutines for A_master_continuous

We define the following functions, given a number of always-reporters $n_A$, a vector of assignments $D^A = \{D_a\}_{a \in [n_A]}$, and outcomes $Y^A = \{Y_a\}_{a \in [n_A]}$:
DIM 2 ( D A , Y A ) =  P a D a Y a P a D a − P a (1 − D a ) Y a P a (1 − D a )  2 , (139) V AR( D A , Y A ) = P a D a Y 2 a ( P a D a ) 2 − 1 P a D a  P a D a Y a P a D a  2 (140) + P a (1 − D a ) Y 2 a ( P a (1 − D a )) 2 − 1 P a (1 − D a )  P a (1 − D a ) Y a P a (1 − D a )  2 (141) V( n A ) = n 2 n 1 n 0 ( n − 1) n A n  1 − n A n  , (142) N 0 A ( D A ) = 0 , (143) N 1 A ( D A ) =  n − 1 1 P a D a − n − 1 0 P a (1 − D a )  2 V( n A ) , (144) N 2 A ( D A ) =  ⌊ n − 1 1 P a D a − n − 1 0 P a (1 − D a ) ⌋ −  2 V( n A ) . (145) F or a given num b er of alwa ys-rep orters k , recall the definition of the distri- bution of { D A a } a ∈ [ k ] ∼ L ( n, n 1 , k ) as defined in Section 5.3. 68 The function below provides a heuristic solution to the w orst-case p -v alue problem for a given num ber of alw ays-reporters. W e note that NUM L and NUM U are analytical low er and upp er b ounds for the v ariance estimators, while DIM2 L and DIM2 U are analytical low er and upp er b ounds for the squared difference-in-means estimator. The corresp onding construction rou- tines are provided in Section C.4. The analytical b ounds are generally lo oser than those obtained via computational approaches. Algorithm 6 B Calculate Heuristic p Value: calculate low er b ounds for the w orst-case p-v alue. Require: Dataset D , num ber of alw ays-reporters n A , (optional) tol = 10 − 4 . 1: Create assignmen t v ector D A obs = [ ones ( n A , 1) , zeros ( r 0 , 1)]. 2: Set N A ← N i A ( D A obs ). 3: Set DIM2 L , DIM2 U ← C dim2 bounds ( D A obs , D ). 4: Set NUM L ← C v1 min ( D A obs , D ) + C v0 min ( D A obs , D ). 5: Set NUM U ← C v1 max ( D A obs , D ) + C v0 max ( D A obs , D ). 6: Set L ← DIM2 L / NUM U + N A . 7: Set U ← DIM2 U / NUM L + N A . 8: L, Y A ← B Create Lower Bound ( D A obs , n A , D , L, U, tol ) . 
9: Set
\[ \mathrm{pval} = \frac{1}{n_{mc}} \sum_{s=1}^{n_{mc}} \mathbf{1}\left\{ T_{obs}(D^A_{obs}, Y^A) \geq T_{obs}(D^A_s, Y^A) \right\}, \tag{146} \]
where D^A_s ~ L(n, n_1, n_A) for all s ∈ [n_mc],
\[ T_{obs}(D^A_{obs}, Y^A) = \frac{\mathrm{DIM}^2(D^A_{obs}, Y^A)}{\mathrm{VAR}(D^A_{obs}, Y^A)} + N_A^i(D^A_{obs}), \quad \text{and} \quad T_{obs}(D^A_s, Y^A) = \frac{\mathrm{DIM}^2(D^A_s, Y^A)}{\mathrm{VAR}(D^A_s, Y^A)} + N_A^i(D^A_s), \ \forall s \in [n_{mc}]. \]
10: Store the n_mc Monte Carlo draws {D^A_s}_{s=1}^{n_mc} as D_mc^{n_A}.
11: return pval, L, D_mc^{n_A}.

The following algorithm uses a bisection method to compute a lower bound for a given test statistic. The initial feasibility problem identifies a feasible set of outcomes whose test statistic exceeds this lower bound; by construction, this feasibility problem is always feasible. Recall that i ∈ {0, 1, 2} denotes the type of statistic used to test the balance of the number of always-reporters.

Algorithm 7 B_Create_Lower_Bound: construct a lower bound for the test statistics (17), (19), and (20).
Require: assignment vector D^A, number of always-reporters n_A, dataset D as in (3), lower bound L, upper bound U, tolerance tol.
1: Calculate N_A ← N_A^i(D^A).
2: Solve the following feasibility problem: find x = (x_ai)_{a,i} and Y^A = (Y_a)_a that solve  // If the bisection below does not update L, return compatible outcomes.
     max_{x, Y^A} 0
     subject to
       Y_a = Σ_{i ∈ [n]} x_ai Y_i, ∀ a ∈ [n_A],
       Y_a ≥ Y_{a+1}, ∀ a ∈ [r_0 + 1, n_A − 1],
       DIM^2(D^A, Y^A) + (N_A^i − L) VAR(D^A, Y^A) ≥ 0,
       x ∈ x(n_A, D_obs, R).
3: Store the optimizer Y^{A*} ← (Y_a^*)_a.
4: while U − L ≥ tol do
5:   Set M ← (L + U)/2.
6:   Solve the following feasibility problem: max_{x, Y^A} 0
7:   subject to
       Y_a = Σ_{i ∈ [n]} x_ai Y_i, ∀ a ∈ [n_A],
       Y_a ≥ Y_{a+1}, ∀ a ∈ [r_0 + 1, n_A − 1],
       DIM^2(D^A, Y^A) + (N_A^i − M) VAR(D^A, Y^A) ≤ 0,
       x ∈ x(n_A, D_obs, R).
8:   if the problem is feasible then
9:     Set U ← M.  // The lower bound must be smaller than M.
10:    Store the optimizer Y^{A*} ← (Y_a^*)_a.
11:  else
12:    Set L ← M.  // The lower bound must be larger than M.
13:  end if
14: end while
15: return L, {Y_a^*}_{a ∈ [n_A]}

Algorithm 8 B_Create_Lower_Bound_2: construct a lower bound for statistics of the form DIM^2(D^A, Y^A) + c · VAR(D^A, Y^A), given Y, D^A, and c ∈ R.
Require: assignment vector D^A, dataset D as in (3), scalar c, (optional) lower bound L.
  Solve the following feasibility problem: find x = (x_ai)_{a,i}, Y^A = (Y_a)_a, and t that solve  // find the smallest t that is compatible with an always-reporter table
    min_{x, Y^A, t} t
    subject to
      Y_a = Σ_{i ∈ [n]} x_ai Y_i, ∀ a ∈ [n_A],
      Y_a ≥ Y_{a+1}, ∀ a ∈ [r_0 + 1, n_A − 1],
      x ∈ x(n_A, D_obs, R),
      DIM^2(D^A, Y^A) + c · VAR(D^A, Y^A) ≤ t,
      t ≥ L.  // only if L is provided.
  Return the optimal value t^*.

The following algorithm computes a tighter upper bound for the test statistic, provided that the input t_max is a valid upper bound.

Algorithm 9 B_Create_Upper_Bound: construct an upper bound for the test statistics (17), (19), and (20).
Require: assignment vector D^A, dataset D as in (3), upper bound U.
  Set N_A ← N_A^i(D^A).
  Set t_max = U. Set val ← −∞.
  while val = −∞ do
    Solve the following feasibility problem: find x = (x_ai)_{a,i} and Y^A = (Y_a)_a that solve
      max_{x, Y^A} 0
      subject to
        Y_a = Σ_{i ∈ [n]} x_ai Y_i, ∀ a ∈ [n_A],
        Y_a ≥ Y_{a+1}, ∀ a ∈ [r_0 + 1, n_A − 1],
        DIM^2(D^A, Y^A) + (N_A^i − t_max) VAR(D^A, Y^A) ≥ 0.
    if the problem is feasible then
      Set val ← 0. Update t_max ← 2 t_max.  // t_max is attainable; enlarge the candidate bound
    else
      Set val ← −∞. Update t_max ← t_max / 2.  // t_max is not attainable; shrink the candidate bound
    end if
  end while
  return min{t_max, U}

The algorithm below solves the subproblems described in Theorem 5.3.
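Before presenting it, note that the geometric halving-then-doubling search in Algorithm 9 can be sketched abstractly. In the sketch below, the hypothetical `is_feasible` oracle stands in for the feasibility problem: `is_feasible(t)` should return True iff some compatible outcome vector attains a statistic value of at least t (so feasibility is monotone decreasing in t). This is a simplified illustration, not the paper's implementation; in particular it assumes the oracle eventually becomes feasible as t shrinks.

```python
def geometric_upper_bound(is_feasible, U):
    """Tighten an upper bound on the test statistic.

    Mirrors the while-loop of Algorithm 9: halve the candidate bound
    until the feasibility problem at t_max becomes feasible, then back
    off by one doubling and cap at the original bound U.
    """
    t_max = U
    while True:
        if is_feasible(t_max):
            t_max *= 2          # t_max attainable: enlarge the candidate bound
            break
        t_max /= 2              # t_max unattainable: shrink and retry
    return min(t_max, U)

# With a monotone oracle whose true supremum is 3.0 and U = 16.0:
# 16 -> 8 -> 4 are all unattainable, 2 is attainable, so the search
# returns min(4, 16) = 4, a valid (if loose) upper bound.
bound = geometric_upper_bound(lambda t: t <= 3.0, 16.0)
```

Because the final value is one doubling above an attainable level, the returned bound is valid whenever the oracle is monotone, though it can overshoot the true supremum by up to a factor of two.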
We include a preprocessing step (Block 1) to compute tighter upper bounds for the simulated statistics. In practice, this step is useful because it yields a quick upper bound on the p-value.

Algorithm 10 B_Calculate_pValue_Upper_Bound: construct an upper bound for the p-value for a given number of always-reporters n_A.
Require: dataset D, number of always-reporters n_A, generated simulated assignments D_mc^{n_A}, lower bound for the observed test statistic L_{n_A}.
Block 1: Create tighter upper bounds for the simulated statistics
  Create UB = zeros(n_mc, 1)  // To hold upper-bound values
  for s = 1 to n_mc do
    Set D^A = D_mc^{n_A}(s)  // Extract the s-th assignment vector
    N_A ← N_A^i(D^A)
    Set DIM2_L, DIM2_U ← C_dim2_bounds(D^A, D).
    Set NUM_L ← C_v1_min(D^A, D) + C_v0_min(D^A, D).
    Set NUM_U ← C_v1_max(D^A, D) + C_v0_max(D^A, D).
    Set L ← DIM2_L / NUM_U + N_A.
    Set U ← DIM2_U / NUM_L + N_A.
    UB(s) = B_Create_Upper_Bound(D^A, D, U)
  end for
Block 2: Preprocess based on the upper bounds
  Calculate pval = mean(UB ≥ L_{n_A}).
  if pval < α then  // Many upper bounds of the simulated statistics are smaller than the lower bound of the observed statistic
    return pval
  else
    Calculate the (1 − α) × 100 percentile of UB, t_max.
    Create a grid t = L_{n_A} : 0.01 : t_max. Index t = {t_k}_{k=0}^K.
    for k = 1 to K do
      Calculate N_{A,s} = N_A^i(D_mc^{n_A}(s)) for all s ∈ [n_mc].
      Calculate N_{A,obs} = N_A^i(D^A_obs).
      Calculate L_s = B_Create_Lower_Bound_2(D, D_mc^{n_A}(s), N_{A,s} − t_{k−1}) for all s ∈ [n_mc].
      Solve the optimization problem
        max Σ_{s=1}^{n_mc} I_s,
        subject to
          Y_a = Σ_{i ∈ [n]} x_ai Y_i, ∀ a ∈ [n_A],
          Y_a ≥ Y_{a+1}, ∀ a ∈ [r_0 + 1, n_A − 1],
          // We write D_mc^{n_A}(s) as D^A_s below for simplicity.
          DIM^2(D^A_s, Y^A) + (N^i_{A,s} − t_{k−1}) VAR(D^A_s, Y^A) ≥ L_s (1 − I_s), ∀ s ∈ [n_mc],
          DIM^2(D^A_obs, Y^A) + (N^i_{A,obs} − t_k) VAR(D^A_obs, Y^A) ≤ 0,
          x ∈ x(n_A, D_obs, R),
          I_s ∈ {0, 1}, ∀ s ∈ [n_mc].
      Store v(k) = Σ_{s=1}^{n_mc} I_s^*.
    end for
    return max_{k ∈ [K]} v(k).
  end if

C.3 Algorithms for Asymptotic Inferences

Recall that i ∈ {0, 1, 2} denotes the type of statistic used to test the balance of the number of always-reporters.

Algorithm 11 A_master_asymptotics: test null hypotheses at a prespecified significance level α ∈ (0, 1).
Require: Dataset D as in (3), a prespecified significance level α ∈ (0, 1), statistic type i ∈ {0, 1, 2}, number of randomization draws n_mc, pruning significance level β ∈ [0, α).
Step I: Precomputation
1: Set Rej ← 1.
2: Construct and prune the compatible always-reporter set A(D_obs, R).
3: Set n_A^max = Σ_{i=1}^n R_i and n_A^min = r_0.
4: if β > 0 then
5:   Set α ← α − β.
6:   Set n_A^max ← max_{A ∈ A(D_obs, R)} Σ_{i=1}^n A_i.
7:   Set n_A^min ← min_{A ∈ A(D_obs, R)} Σ_{i=1}^n A_i.
8: end if
Step II: Asymptotic Inferences
9: for k = n_A^min to n_A^max do
10:   Rej ← B_Asymptotic_Inferences_Inner(D, k).
11:   if Rej = 0 then
12:     break.
13:   end if
14: end for
15: return Rej

Algorithm 12 B_Asymptotic_Inferences_Inner: asymptotic inference given a number of always-reporters.
Require: Dataset D, number of always-reporters n_A, statistic type i ∈ {0, 1, 2}, significance level α.
1: Create assignment vector D^A_obs = [ones(n_A, 1), zeros(r_0, 1)].
2: Set N_A ← N_A^i(D^A_obs).
3: Set DIM2_L, DIM2_U ← C_dim2_bounds(D^A_obs, D).
4: Set NUM_L ← C_v1_min(D^A_obs, D) + C_v0_min(D^A_obs, D).
5: Set NUM_U ← C_v1_max(D^A_obs, D) + C_v0_max(D^A_obs, D).
6: Set L ← DIM2_L / NUM_U + N_A.
7: Set U ← DIM2_U / NUM_L + N_A.
8: L, Y^A ← B_Create_Lower_Bound(D^A_obs, D, L, U, tol).
9: Calculate T^* = T_obs(D^A_obs, Y^A), where
   \[ T_{obs}(D^A_{obs}, Y^A) = \frac{\mathrm{DIM}^2(D^A_{obs}, Y^A)}{\mathrm{VAR}(D^A_{obs}, Y^A)} + N_A^i(D^A_{obs}). \]
10: Calculate the 1 − α quantile, q_{1−α}, of the random variable Z + g_i(n_A^1), where n_A^1 = Σ_{i=1}^n D_i A_i, Σ_{i=1}^n A_i = n_A, and D ~ CR(n, n_1).
11: if T^* > q_{1−α} then
12:   Set Rej = 1.
13: else
14:   Set Rej = 0.
15: end if
16: return Rej.

C.4 Analytical bounds for the studentized statistic

Given the dataset D = (Y, D_obs, R), denote the set of potential always-reporters in the observed treated group as A_1(D, R) = {i ∈ [n] : D_obs,i = 1, R_i = 1}. Denote the cardinality of this set as n_{A,p} = |A_1(D_obs, R)|. Recall our convention that the first r_0 units are always-reporters assigned to the control group.

C.4.1 Lower Bound and Upper Bound for DIM^2 of (139)

Given an outcome vector Y = (Y_i)_{i=1}^n ∈ R^n and the set of potential always-reporters A_1(D_obs, R), we use Ỹ_(k), k ∈ [n_{A,p}], to denote the k-th smallest outcome in the set {Y_i}_{i ∈ A_1(D,R)}, with Ỹ_(1) being the smallest.

For a fixed number of always-reporters, a standard argument gives the following upper and lower bounds for the squared difference-in-means estimator DIM^2(·, ·) defined in (139).

Lemma C.1. Given a positive integer n_A and the dataset D = (Y, D_obs, R), recall the set of assignment variables x(n_A, D_obs, R) from (41). Denote the set of all compatible outcome vectors as
\[ \mathcal{Y}(n_A, D_{obs}, R, Y) = \left\{ (Y_a)_{a \in [n_A]} : \exists \{x_{ai}\}_{a,i} \in x(n_A, D_{obs}, R), \ Y_a = \sum_{i=1}^n x_{ai} Y_i, \ \forall a \in [n_A] \right\}. \tag{147} \]
Given an assignment vector D^A = (D_a)_{a=1}^{n_A}, we have
\[ \min_{Y^A \in \mathcal{Y}(n_A, D_{obs}, R, Y)} \mathrm{DIM}^2(D^A, Y^A) \geq L(D^A, D), \quad \text{and} \quad \max_{Y^A \in \mathcal{Y}(n_A, D_{obs}, R, Y)} \mathrm{DIM}^2(D^A, Y^A) \leq U(D^A, D), \]
where L(D^A, D) and U(D^A, D) are the lower and upper bounds returned by C_dim2_bounds(D^A, D) as defined in Algorithm 13.

Algorithm 13 C_dim2_bounds: calculate a lower bound and an upper bound for the squared difference-in-means estimator.
Require: D^A = (D_a)_{a ∈ [n_A]}, observed dataset D = (Y, D_obs, R).
  Create A_1(D_obs, R) and set n_{A,p} = |A_1(D_obs, R)|.
  Set n_{A,1} = Σ_{a=1}^{n_A} D_a, r_{0,1} = Σ_{a=1}^{r_0} D_a.
  Set n_{A,0} = Σ_{a=1}^{n_A} (1 − D_a), r_{0,0} = Σ_{a=1}^{r_0} (1 − D_a).
  Calculate L and U according to
  \[ U = \frac{1}{n_{A,1}} \left[ \sum_{a=1}^{r_0} D_a Y_a + \sum_{k=1}^{n_{A,1} - r_{0,1}} \tilde{Y}_{(n_{A,p} - k)} \right] - \frac{1}{n_{A,0}} \left[ \sum_{a=1}^{r_0} (1 - D_a) Y_a + \sum_{k=1}^{n_{A,0} - r_{0,0}} \tilde{Y}_{(k)} \right], \]
  and
  \[ L = \frac{1}{n_{A,1}} \left[ \sum_{a=1}^{r_0} D_a Y_a + \sum_{k=1}^{n_{A,1} - r_{0,1}} \tilde{Y}_{(k)} \right] - \frac{1}{n_{A,0}} \left[ \sum_{a=1}^{r_0} (1 - D_a) Y_a + \sum_{k=1}^{n_{A,0} - r_{0,0}} \tilde{Y}_{(n_{A,p} - k)} \right]. \]
  if L ≤ U ≤ 0 or U ≥ L ≥ 0 then
    return min{L², U²}, max{L², U²}
  else if L ≤ 0 ≤ U then
    return 0, max{L², U²}
  end if

In words, n_{A,1} and n_{A,0} are the numbers of always-reporters assigned to the treated and control groups, respectively; r_{0,1} and r_{0,0} are the numbers of known always-reporters assigned to the treated and control groups under the assignment vector (D_a)_{a=1}^{n_A}. The upper and lower bounds follow from a standard calculation.

C.4.2 Lower Bound and Upper Bound for VAR of (140)

For a fixed number of always-reporters, Lemma C.3 and Lemma C.4 below yield the following upper and lower bounds for the variance estimator VAR(·, ·) defined in (140). We note that VAR is a standard two-sample variance estimator: it can be written as the sum of the variance estimator for the treated group and the variance estimator for the control group.

Lemma C.2.
Given a positive integer n_A and the dataset D = (Y, D_obs, R), recall the set of assignment variables x(n_A, D_obs, R) from (41) and the set Y(n_A, D_obs, R, Y) of compatible outcome vectors defined in (147). Given an assignment vector D^A = (D_a)_{a=1}^{n_A}, we have
\[ \min_{Y^A \in \mathcal{Y}(n_A, D_{obs}, R, Y)} \mathrm{VAR}(D^A, Y^A) \geq v_{1,\min} + v_{0,\min}, \quad \text{and} \quad \max_{Y^A \in \mathcal{Y}(n_A, D_{obs}, R, Y)} \mathrm{VAR}(D^A, Y^A) \leq v_{1,\max} + v_{0,\max}, \]
where v_{1,max}, v_{0,max}, v_{1,min}, and v_{0,min} are the outputs of the functions C_v1_max(D^A, D), C_v0_max(D^A, D), C_v1_min(D^A, D), and C_v0_min(D^A, D), respectively.

Proof. This is a direct corollary of Lemma C.3 and Lemma C.4 proved below.

C.4.3 Proof of Lemma C.2

Let S = {s_1, ..., s_{k_1}} ⊂ R be a set of k_1 points and V = {v_1, ..., v_{k_2}} ⊂ R be a set of k_2 points. For each X = {x_1, ..., x_{|X|}} such that S ⊂ X ⊂ S ∪ V, we define the variance of X as
\[ \mathrm{var}(X) = \frac{1}{|X|} \sum_{i=1}^{|X|} \left( x_i - \bar{X} \right)^2, \quad \text{with} \quad \bar{X} = \frac{1}{|X|} \sum_{i=1}^{|X|} x_i. \tag{148} \]
We are interested in the maximum and minimum values of var(X) among the sets with the same cardinality which contain S. We call the sets S and V increasingly-ordered if s_1 ≤ s_2 ≤ ... ≤ s_{k_1} and v_1 ≤ v_2 ≤ ... ≤ v_{k_2}.

Lemma C.3. Given increasingly-ordered sets S = {s_i}_{i=1}^{k_1} ⊂ R, V = {v_i}_{i=1}^{k_2} ⊂ R, and a positive integer k_1 + t, with t ≥ 0, define the set X_t = {X : S ⊂ X ⊂ S ∪ V, |X| = k_1 + t}. Denote the set of consecutive subsets of V of size t as V_c^t = {{v_{i+1}, ..., v_{i+t}} : i ∈ [0, k_2 − t], i ∈ Z}. Then we have:
\[ \min_{X \in \mathcal{X}_t} \mathrm{var}(X) = \min_{X = S \cup V_c, \ V_c \in \mathcal{V}_c^t} \mathrm{var}(X). \]

Proof. Let V^t be all subsets of V of size t. Let V_nc ∈ V^t \ V_c^t be a non-consecutive set. List its elements as {ṽ_1, ..., ṽ_t} in the order induced by V. There are two cases.
If  V \ V nc  ∩ [ e v 1 , e v t ] ⊆ { e v 1 , e v t } , then V nc is non- consecutiv e only b ecause of ties and the lab eling order of observ ations. In this case, the v ariance of V nc coincides with that of a consecutive subset. F or the other case, w e ha ve a v ∈  V \ V nc  ∩ [ e v 1 , e v t ] suc h that v ∈ ( e v 1 , e v t ). W e show that such a set cannot b e a minimizing set, because we can alwa ys construct another set that has a strictly smaller v ariance than that of V nc . T o see this, we note by definition v = λ e v 1 + (1 − λ ) e v t with λ ∈ (0 , 1). Define m = t + k 1 . The v ariance of the set ( S ∪ V nc ∪ v ) \ e v 1 can be expressed as: var  ( S ∪ V nc ∪ v ) \ e v 1  = 1 2 m 2 X a,b ∈ S ∪ V nc \{ e v 1 , e v t ,v } ( a − b ) 2 + 1 m 2   X a ∈ S ∪ V nc \{ e v 1 , e v t } ( v − a ) 2 + ( v − e v t ) 2 + X a ∈ S ∪ V nc \ e v 1 ( e v t − a ) 2   , and the v ariance of the set ( S ∪ V nc ∪ v ) \ e v t can be expressed as: var  ( S ∪ V nc ∪ v ) \ e v t  = 1 2 m 2 X a,b ∈ S ∪ V nc \{ e v 1 , e v t ,v } ( a − b ) 2 + 1 m 2   X a ∈ S ∪ V nc \{ e v 1 , e v t } ( v − a ) 2 + ( v − e v 1 ) 2 + X a ∈ S ∪ V nc \ e v t ( e v 1 − a ) 2   . 
Then,
\begin{align*}
& \lambda \, \mathrm{var}\left( (S \cup V_{nc} \cup \{v\}) \setminus \{\tilde{v}_1\} \right) + (1 - \lambda) \, \mathrm{var}\left( (S \cup V_{nc} \cup \{v\}) \setminus \{\tilde{v}_t\} \right) \\
&= \frac{1}{2m^2} \sum_{a,b \in S \cup V_{nc} \setminus \{\tilde{v}_1, \tilde{v}_t, v\}} (a - b)^2 + \frac{1}{m^2} \sum_{a \in S \cup V_{nc} \setminus \{\tilde{v}_1, \tilde{v}_t\}} (v - a)^2 + \frac{1}{m^2} \left[ \lambda (v - \tilde{v}_t)^2 + (1 - \lambda)(v - \tilde{v}_1)^2 \right] \\
&\qquad + \frac{\lambda}{m^2} \sum_{a \in S \cup V_{nc} \setminus \{\tilde{v}_1\}} (\tilde{v}_t - a)^2 + \frac{1 - \lambda}{m^2} \sum_{a \in S \cup V_{nc} \setminus \{\tilde{v}_t\}} (\tilde{v}_1 - a)^2 \\
&\leq \frac{1}{2m^2} \sum_{a,b \in S \cup V_{nc} \setminus \{\tilde{v}_1, \tilde{v}_t, v\}} (a - b)^2 + \frac{\lambda}{m^2} \sum_{a \in S \cup V_{nc} \setminus \{\tilde{v}_1, \tilde{v}_t\}} (\tilde{v}_1 - a)^2 + \frac{1 - \lambda}{m^2} \sum_{a \in S \cup V_{nc} \setminus \{\tilde{v}_1, \tilde{v}_t\}} (\tilde{v}_t - a)^2 \\
&\qquad + \frac{1}{m^2} \left[ \lambda^3 + (1 - \lambda)^3 \right] (\tilde{v}_1 - \tilde{v}_t)^2 + \frac{\lambda}{m^2} \sum_{a \in S \cup V_{nc} \setminus \{\tilde{v}_1\}} (\tilde{v}_t - a)^2 + \frac{1 - \lambda}{m^2} \sum_{a \in S \cup V_{nc} \setminus \{\tilde{v}_t\}} (\tilde{v}_1 - a)^2 \\
&= \frac{1}{2m^2} \sum_{a,b \in S \cup V_{nc} \setminus \{\tilde{v}_1, \tilde{v}_t, v\}} (a - b)^2 + \frac{1}{m^2} \left[ \lambda^3 + (1 - \lambda)^3 \right] (\tilde{v}_1 - \tilde{v}_t)^2 + \frac{1}{m^2} \left[ \sum_{a \in S \cup V_{nc} \setminus \{\tilde{v}_1\}} (\tilde{v}_t - a)^2 + \sum_{a \in S \cup V_{nc} \setminus \{\tilde{v}_t\}} (\tilde{v}_1 - a)^2 \right] \\
&< \frac{1}{2m^2} \sum_{a,b \in S \cup V_{nc} \setminus \{\tilde{v}_1, \tilde{v}_t, v\}} (a - b)^2 + \frac{1}{m^2} (\tilde{v}_1 - \tilde{v}_t)^2 + \frac{1}{m^2} \left[ \sum_{a \in S \cup V_{nc} \setminus \{\tilde{v}_1\}} (\tilde{v}_t - a)^2 + \sum_{a \in S \cup V_{nc} \setminus \{\tilde{v}_t\}} (\tilde{v}_1 - a)^2 \right] \\
&= \mathrm{var}(S \cup V_{nc}),
\end{align*}
where the first inequality is by convexity and the last inequality is by λ³ + (1 − λ)³ < 1 for λ ∈ (0, 1), because 1 = (λ + 1 − λ)³ = λ³ + (1 − λ)³ + 3λ²(1 − λ) + 3λ(1 − λ)². Hence either var(S ∪ V_nc) > var((S ∪ V_nc ∪ {v}) \ {ṽ_1}) or var(S ∪ V_nc) > var((S ∪ V_nc ∪ {v}) \ {ṽ_t}), and S ∪ V_nc cannot be the optimizing set.

We have shown that a non-consecutive set either realizes a variance equal to that of a consecutive set or cannot be the optimizing set. This proves our claim.

Lemma C.4. Given increasingly-ordered sets S = {s_i}_{i=1}^{k_1} ⊂ R, V = {v_i}_{i=1}^{k_2} ⊂ R, and a positive integer k_1 + t, with 0 ≤ t ≤ k_2, define the set X_t = {X : S ⊂ X ⊂ S ∪ V, |X| = k_1 + t}.
Denote the set of shell sets of V of size t as
\[ \mathcal{V}_s^t = \left\{ \{v_1, ..., v_s\} \cup \{v_{k_2 - (t - s - 1)}, ..., v_{k_2}\} : s \in [0, t] \right\}, \]
where {v_1, ..., v_s} = ∅ if s = 0 and {v_{k_2 − (t − s − 1)}, ..., v_{k_2}} = ∅ if s = t. Then we have:
\[ \max_{X \in \mathcal{X}_t} \mathrm{var}(X) = \max_{X = S \cup V_s, \ V_s \in \mathcal{V}_s^t} \mathrm{var}(X). \]

Proof. For the case t = k_2, the statement is trivial. We consider the case t < k_2. Let V^t be all subsets of V of size t. For each Ṽ ∈ V^t, we define its left-shell size and right-shell size as:
\[ \text{Left-Shell-Size}(\tilde{V}) = \max \left\{ k : \{v_1, v_2, ..., v_k\} \subset \tilde{V} \right\}, \tag{149} \]
\[ \text{Right-Shell-Size}(\tilde{V}) = \max \left\{ k : \{v_{k_2 - k + 1}, ..., v_{k_2 - 1}, v_{k_2}\} \subset \tilde{V} \right\}. \tag{150} \]
We note that Ṽ ∈ V^t is a shell set if and only if
\[ \text{Left-Shell-Size}(\tilde{V}) + \text{Right-Shell-Size}(\tilde{V}) = t. \tag{151} \]

We show that we can always transform a non-shell set into a shell set through consecutive steps that weakly increase the variance. Suppose V_ns is not a shell set. Then there must exist v_s ∈ V_ns and v_{s_l}, v_{s_r} ∉ V_ns such that s_l < s < s_r. Define
\[ l = \min \{ i \in [k_2] : v_i \leq v_s, \ v_i \notin V_{ns} \}, \qquad r = \max \{ i \in [k_2] : v_i \geq v_s, \ v_i \notin V_{ns} \}. \]
Moreover, v_s = λ v_l + (1 − λ) v_r for some λ ∈ [0, 1]. Let m = k_1 + t. The variance of the set V_ns ∪ {v_l} \ {v_s} can be expressed as
\[ \mathrm{var}(V_{ns} \cup \{v_l\} \setminus \{v_s\}) = \frac{1}{2m^2} \sum_{a,b \in V_{ns} \setminus \{v_s\}} (a - b)^2 + \frac{1}{m^2} \sum_{a \in V_{ns} \setminus \{v_s\}} (v_l - a)^2, \]
and the variance of the set V_ns ∪ {v_r} \ {v_s} can be expressed as
\[ \mathrm{var}(V_{ns} \cup \{v_r\} \setminus \{v_s\}) = \frac{1}{2m^2} \sum_{a,b \in V_{ns} \setminus \{v_s\}} (a - b)^2 + \frac{1}{m^2} \sum_{a \in V_{ns} \setminus \{v_s\}} (v_r - a)^2. \]
By convexity, we have
\[ \mathrm{var}(V_{ns}) \leq \lambda \, \mathrm{var}(V_{ns} \cup \{v_l\} \setminus \{v_s\}) + (1 - \lambda) \, \mathrm{var}(V_{ns} \cup \{v_r\} \setminus \{v_s\}). \]
WLOG, if var(V_ns) < var(V_ns ∪ {v_r} \ {v_s}), replacing v_s with v_r strictly increases the variance and increases the shell size of the new set.
If var ( V ns ) = var ( V ns ∪ v r \ v s ) = var ( V ns ∪ v l \ v s ), replacing v s with either v r and v l will not decrease the v ariance and increase the shell size of the new set. Rep eat the same pro cedure m ultiple times. Since there are at most t p oints in the set, the pro cedure will stop after at most t steps and output a shell set. Since eac h step we do not decrease the v ariance, the new shell set has a v ariance at least as large as the that of the set V ns . This prov es the claim. 80 C.4.4 Algorithms according to Lemma C.3 and Lemma C.4 Giv en the observed dataset D =  Y , D obs , R  , we denote the set of p oten tial alw ays-reporters in the observed treated group as A 1 ( D , R ) = { i ∈ [ n ] : D obs i = 1 , R i = 1 } and denote n A,p =   A 1 ( D , R )   . Recall the notation that w e use e Y ( k ) , k ∈ [ n A,p ] to denote the k th smallest outcome in the set { Y i } i ∈ A 1 ( D,R ) , with e Y (1) b eing the smallest. W e note that D a = D i for a, i ≤ r 0 b y our b y indexing conv ention. Algorithm 14 C v1 min : Calculate a lo wer b ound for the treated group v ari- ance estimator Require: D A = ( D a ) a ∈ [ n A ] , Observ ed dataset D =  Y , D obs , R  Create A 1 ( D obs , R ) and set n A,p =   A 1 ( D obs , R )   Set n A, 1 = P n A a =1 D a , r 0 , 1 = P r 0 a =1 D a , n A,p =   A 1 ( D obs , R )   . v 1 min ← ∞ for k = 0 : ( n A,p − n A, 1 + r 0 , 1 ) do Create set X k = { Y i } i ≤ r 0 ,D i =1 S { e Y ( s ) } s = k +( n A, 1 − r 0 , 1 ) s = k +1 Calculate v k = var ( X k ). if v k ≤ v 1 min then v 1 min ← v k end if end forreturn v 1 min Algorithm 15 C v0 min : Calculate a low er b ound for the control group v ari- ance estimator Require: D A = ( D a ) a ∈ [ n A ] , Observ ed dataset D =  Y , D obs , R  Create A 1 ( D obs , R ) and set n A,p =   A 1 ( D obs , R )   Set n A, 0 = P n A a =1 (1 − D a ), r 0 , 1 = P r 0 a =1 (1 − D a ), n A,p =   A 1 ( D obs , R )   . 
  v_{0,min} ← ∞
  for k = 0 : (n_{A,p} − n_{A,0} + r_{0,0}) do
    Create the set X_k = {Y_i}_{i ≤ r_0, D_i = 0} ∪ {Ỹ_(s)}_{s = k+1}^{k + (n_{A,0} − r_{0,0})}
    Calculate v_k = var(X_k).
    if v_k ≤ v_{0,min} then
      v_{0,min} ← v_k
    end if
  end for
  return v_{0,min}

Algorithm 16 C_v1_max: calculate an upper bound for the treated-group variance estimator.
Require: D^A = (D_a)_{a ∈ [n_A]}, observed dataset D = (Y, D_obs, R).
  Create A_1(D_obs, R) and set n_{A,p} = |A_1(D_obs, R)|.
  Set n_{A,1} = Σ_{a=1}^{n_A} D_a, r_{0,1} = Σ_{a=1}^{r_0} D_a.
  v_{1,max} ← −∞
  for k = 0 : (n_{A,1} − r_{0,1}) do
    Create the set X_k = {Y_i}_{i ≤ r_0, D_i = 1} ∪ {Ỹ_(s)}_{s=1}^{k} ∪ {Ỹ_(s)}_{s = n_{A,p} − n_{A,1} + r_{0,1} + k + 1}^{n_{A,p}}
    Calculate v_k = var(X_k).
    if v_k ≥ v_{1,max} then
      v_{1,max} ← v_k
    end if
  end for
  return v_{1,max}

We note that {Ỹ_(s)}_{s=1}^{k} = ∅ if k = 0 and {Ỹ_(s)}_{s = n_{A,p} − n_{A,1} + r_{0,1} + k + 1}^{n_{A,p}} = ∅ if k = n_{A,1} − r_{0,1}.

Algorithm 17 C_v0_max: calculate an upper bound for the control-group variance estimator.
Require: D^A = (D_a)_{a ∈ [n_A]}, observed dataset D = (Y, D_obs, R).
  Create A_1(D_obs, R) and set n_{A,p} = |A_1(D_obs, R)|.
  Set n_{A,0} = Σ_{a=1}^{n_A} (1 − D_a), r_{0,0} = Σ_{a=1}^{r_0} (1 − D_a).
  v_{0,max} ← −∞
  for k = 0 : (n_{A,0} − r_{0,0}) do
    Create the set X_k = {Y_i}_{i ≤ r_0, D_i = 0} ∪ {Ỹ_(s)}_{s=1}^{k} ∪ {Ỹ_(s)}_{s = n_{A,p} − n_{A,0} + r_{0,0} + k + 1}^{n_{A,p}}
    Calculate v_k = var(X_k).
    if v_k ≥ v_{0,max} then
      v_{0,max} ← v_k
    end if
  end for
  return v_{0,max}

References

[1] P. Aronow, H. Chang, and P. Lopatto. Randomization-based confidence sets for the local average treatment effect. arXiv preprint arXiv:2404.18786, 2024.
[2] R. L. Berger and D. D. Boos. P values maximized over a confidence set for the nuisance parameter. Journal of the American Statistical Association, 89(427):1012-1016, 1994.
[3] I. A. Canay, J. P. Romano, and A. M. Shaikh.
Randomization tests under an approximate symmetry assumption. Econometrica, 85(3):1013-1030, 2017.
[4] E. Chung and J. P. Romano. Exact and asymptotically robust permutation tests. The Annals of Statistics, 41(2):484-507, 2013.
[5] P. L. Cohen and C. B. Fogarty. Gaussian prepivoting for finite population causal inference. Journal of the Royal Statistical Society Series B: Statistical Methodology, 84(2):295-320, 2022.
[6] C. E. Frangakis and D. B. Rubin. Principal stratification in causal inference. Biometrics, 58(1):21-29, 2002.
[7] A. S. Gerber and D. P. Green. Field Experiments: Design, Analysis, and Interpretation. W. W. Norton & Company, 2012.
[8] S. Heng, J. Zhang, and Y. Feng. Design-based causal inference with missing outcomes: Missingness mechanisms, imputation-assisted randomization tests, and covariate adjustment. Journal of the American Statistical Association, (just-accepted):1-23, 2025.
[9] N. Heussen, R.-D. Hilgers, W. F. Rosenberger, X. Tan, and D. Uschner. Randomization-based inference for clinical trials with missing outcome data. Statistics in Biopharmaceutical Research, 16(4):456-467, 2024.
[10] J. L. Horowitz and C. F. Manski. Nonparametric analysis of randomized experiments with missing covariate and outcome data. Journal of the American Statistical Association, 95(449):77-84, 2000.
[11] G. W. Imbens and C. F. Manski. Confidence intervals for partially identified parameters. Econometrica, 72(6):1845-1857, 2004.
[12] G. W. Imbens and D. B. Rubin. Causal Inference in Statistics, Social, and Biomedical Sciences. Cambridge University Press, 2015.
[13] A. Ivanova, S. Lederman, P. B. Stark, G. Sullivan, and B. Vaughn. Randomization tests in clinical trials with multiple imputation for handling missing data. Journal of Biopharmaceutical Statistics, 32(3):441-449, 2022.
[14] A. Janssen. Studentized permutation tests for non-i.i.d.
hypotheses and the generalized Behrens-Fisher problem. Statistics & Probability Letters, 36(1):9-21, 1997.
[15] A. Janssen. Testing nonparametric statistical functionals with applications to rank tests. Journal of Statistical Planning and Inference, 81(1):71-93, 1999.
[16] A. Janssen and T. Pauls. How do bootstrap and permutation tests work? The Annals of Statistics, 31(3):768-806, 2003.
[17] B. Kline and M. A. Masten. Finite population identification and design-based sensitivity analysis. arXiv preprint arXiv:2504.14127, 2025.
[18] D. S. Lee. Training, wages, and sample selection: Estimating sharp bounds on treatment effects. Review of Economic Studies, 76(3):1071-1102, 2009.
[19] X. Li, P. Sheng, and Z. Yu. Randomization inference with sample attrition. arXiv preprint arXiv:2507.00795, 2025.
[20] W. Lin, S. D. Halpern, M. Prasad Kerlin, and D. S. Small. A "placement of death" approach for studies of treatment effects on ICU length of stay. Statistical Methods in Medical Research, 26(1):292-311, 2017.
[21] G. Neuhaus. Conditional rank tests for the two-sample problem under random censorship. The Annals of Statistics, pages 1760-1779, 1993.
[22] J. Rigdon and M. G. Hudgens. Randomization inference for treatment effects on a binary outcome. Statistics in Medicine, 34(6):924-935, 2015.
[23] J. M. Robins, A. Rotnitzky, and L. P. Zhao. Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 89(427):846-866, 1994.
[24] C. Samii, Y. Wang, and J. A. Zhou. Generalizing trimming bounds for endogenously missing outcome data using random forests. Political Analysis, pages 1-15, 2023.
[25] V. Semenova. Generalized Lee bounds. Journal of Econometrics, 251:106055, 2025.
[26] V. Semenova. Generalized Lee bounds. Journal of Econometrics, 251:106055, 2025.
[27] L. Shi and P. Ding.
Berry-Esseen bounds for design-based causal inference with possibly diverging treatment levels and varying group sizes. arXiv preprint arXiv:2209.12345, 2022.
[28] J. Stoye. More on confidence intervals for partially identified parameters. Econometrica, 77(4):1299-1315, 2009.
[29] P. Tuvaandorj. Robust permutation tests in linear instrumental variables regression, 2024. Forthcoming in Journal of the American Statistical Association.
[30] J. Wu and P. Ding. Randomization tests for weak null hypotheses in randomized experiments. Journal of the American Statistical Association, 116(536):1898-1913, 2021.
[31] J. L. Zhang and D. B. Rubin. Estimation of causal effects via principal stratification when some outcomes are truncated by "death". Journal of Educational and Behavioral Statistics, 28(4):353-368, 2003.
[32] A. Zhao and P. Ding. Covariate-adjusted Fisher randomization tests for the average treatment effect. Journal of Econometrics, 225(2):278-294, 2021.
