Distributional Discontinuity Design

Kyle Schindl†, Larry Wasserman‡
† Department of Statistics, Iowa State University, kschindl@iastate.edu
‡ Department of Statistics & Data Science and Machine Learning Department, Carnegie Mellon University, larry@stat.cmu.edu

Abstract
Regression discontinuity and kink designs are typically analyzed through mean effects, even when treatment changes the shape of the entire outcome distribution. To address this, we introduce distributional discontinuity designs, a framework for estimating causal effects for a scalar outcome at the boundary of a discontinuity in treatment assignment. Our estimand is the Wasserstein distance between limiting conditional outcome distributions: a single, scale-interpretable measure of distribution shift. We show that this distance weakly bounds the absolute average treatment effect from above, with equality if and only if the treatment effect is purely additive; thus, departure from equality measures effect heterogeneity. To further encode effect heterogeneity, we show that the squared Wasserstein distance admits an orthogonal decomposition into weighted squared differences in L-moments, thereby quantifying the contributions of location, scale, skewness, and higher-order shape components to the overall distributional distance. Next, we extend this framework to distributional kink designs by evaluating the Wasserstein derivative at a policy kink; this describes the flow of probability mass through the kink. In the case of fuzzy kink designs, we derive new identification results. Finally, we apply our methods to real data by re-analyzing two natural experiments and comparing our distributional effects to traditional causal estimands.
Keywords: Regression Discontinuity Design, Regression Kink Design, Optimal Transport, Wasserstein Distance, Quantile Treatment Effects

Accompanying R code is available via github.com/kyleschindl/discontinuity-designs.

1 Introduction
First introduced by Thistlethwaite and Campbell (1960) and formalized by Hahn et al. (2001), regression discontinuity design is a quasi-experimental method that exploits discontinuities in treatment assignment to identify causal effects. The key idea is that observational units arbitrarily close to either side of the treatment discontinuity can be thought of as similar in all respects except treatment status. Thus, in this neighborhood of the discontinuity, treatment assignment can be considered "as good as random," and it is therefore reasonable to assume that exchangeability holds. Over the years, a rich and deep literature on regression discontinuity design methods has developed, with contributions too broad to enumerate. Modern regression discontinuity design tends to focus on local polynomial estimation and bandwidth selection (Imbens and Kalyanaraman, 2012), robust bias-corrected inference (Calonico et al., 2014, 2019a), and a suite of diagnostic tools such as density-manipulation tests (McCrary, 2008; Cattaneo et al., 2020). Readers interested in the history of regression discontinuity design should refer to Cook (2008), and to Lee and Lemieux (2010), Imbens and Lemieux (2008), and Cattaneo and Titiunik (2022) for in-depth reviews. Notably, most traditional methods focus on differences in mean outcomes above and below the cutoff; a much smaller literature considers distributional causal effects. Focusing exclusively on mean effects often limits both the usefulness and the generalizability of an analysis, because averages can mask treatment heterogeneity across the outcome distribution.
It is easy to imagine a treatment that leaves the average unchanged but has asymmetric effects on the lower and upper tails of the outcome distribution. Consequently, over the past several years there has been an increased focus on developing causal effects that describe how the entire outcome distribution changes in response to a treatment. In the regression discontinuity design setting, Frandsen et al. (2012) introduced pointwise quantile treatment effects for both sharp and fuzzy designs. Since then, there have been several extensions and generalizations of their work, including uniform confidence bands (Qu and Yoon, 2015), bias-corrected estimators (Qu and Yoon, 2019; Chiang et al., 2019), and quantile effects in regression kink designs (Chiang and Sasaki, 2019; Chen et al., 2020). More recently, Jin et al. (2025) considered quantile effects under a local rank similarity condition, and Dijcke (2025) considered local average quantile treatment effects under distribution-valued outcomes; the latter framework is conceptually similar to, but non-overlapping with, our own, since we consider scalar-valued outcomes. While each of these methods is interesting and useful, in practice they can be difficult to implement and interpret. Practitioners are often not interested in specific quantile effects, and results across many quantiles can be hard to summarize and communicate. Furthermore, because quantile treatment effects describe differences between the marginal outcome distributions, interpreting them as effects at the individual level requires a strong rank-invariance assumption. We address these problems by defining causal effects in terms of the distributional distance between conditional counterfactual distributions, which yields a single transparent measure of the overall distribution shift.
In this paper we introduce distributional discontinuity designs, a framework for studying distributional causal effects above and below a treatment discontinuity. Specifically, we define our causal effect to be the Wasserstein distance between the limiting conditional distributions of the counterfactuals $Y(a) \mid X = x$ above and below the treatment discontinuity. This provides a clean, one-number summary of the entire distance between treatment groups, thereby encoding the total magnitude of the treatment effect and establishing a relative scale for all treatment effects. Using the Wasserstein distance as our causal effect yields a number of useful properties. First, we show that it is weakly greater than the absolute average treatment effect at the cutoff, with equality if and only if the treatment effect is purely additive; this immediately provides a reference point for gauging the amount of treatment heterogeneity. Second, we show that the Wasserstein distance can be decomposed into effects on individual L-moments (Hosking, 1990). This allows us to define a "distributional $R^2$," i.e. the share of the distributional distance explained by each L-moment, thereby providing a novel way of summarizing treatment heterogeneity by its effect on location, scale, skewness, and so on. Third, since the Wasserstein distance describes the total magnitude of the treatment effect, we can use it to define the degree to which one quantile function stochastically dominates the other. In our analysis, we consider both sharp and fuzzy treatment assignments. Additionally, we extend the distributional discontinuity design framework to regression kink designs by defining our causal effect to be the Wasserstein derivative at the policy kink; this describes the flow of probability mass at the cutoff and neatly generalizes traditional kink designs.
Notably, we also extend the work of Wang and Zhang (2025) to establish identification of fuzzy local treatment effects at a policy kink. Broadly speaking, our analysis fits into a growing literature that applies optimal transport methods to causal inference problems in order to compare entire outcome distributions rather than just averages. For example, Gunsilius (2023) develops distributional synthetic controls that reconstruct a treated unit's distribution from controls. In difference-in-differences, optimal transport methods align pre- and post-treatment outcome distributions across groups instead of relying on mean-level parallel trends; see Torous et al. (2024) for a nonlinear difference-in-differences and Zhou et al. (2025) for a geodesic variant. More generally, Kurisu et al. (2025) and Schindl and Wasserman (2025) consider causal change as movement along paths in the space of probability distributions. Interested readers can see Gunsilius (2025) for an extended discussion and review of the literature. The remainder of the paper is organized as follows. In Section 2 we define all relevant notation. In Section 3 we formally define the distributional discontinuity framework, first in the sharp treatment assignment setting. Within this section, we consider identification of the Wasserstein effect (Section 3.1), interpretation of these effects and their relationship to traditional mean- and quantile-based effects (Section 3.2), estimation and the limiting distribution of the Wasserstein effect (Section 3.3), inference for the Wasserstein effect (Section 3.4), and finally we extend our results to the fuzzy treatment discontinuity setting (Section 3.5).
In Section 4 we extend our framework to kinked distributional designs by defining a novel causal effect in terms of the Wasserstein derivative at the policy kink, and in Section 4.1 we extend the work of Wang and Zhang (2025) to establish causal identification in fuzzy kink designs. In Section 5 we apply our method to real data sets by re-analyzing several natural experiments and directly comparing the Wasserstein effect to the average treatment effect at the cutoff. Finally, in Section 6 we provide a discussion and conclusion of distributional discontinuity designs, including limitations and directions for future work.

2 Setup & Notation
Suppose we observe $Z_1, \dots, Z_n \overset{\text{iid}}{\sim} P$, where $Z_i = (X_i, A_i, Y_i)$, $X_i \in \mathbb{R}$ is the "running variable," $A_i \in \{0, 1\}$ is the treatment assignment, and $Y_i \in \mathbb{R}$ is the observed outcome. Note that $Y$ is a scalar, unlike in Dijcke (2025), which considers a distribution-valued outcome. To begin, we assume that treatment is assigned according to

$$A_i = \begin{cases} 1 & \text{if } X_i \ge x_0, \\ 0 & \text{if } X_i < x_0, \end{cases}$$

at the cutoff $X = x_0$. In Section 3.5 and beyond we relax this assignment rule to the fuzzy setting. We assume that $Y$ has a continuous distribution. We are interested in the conditional distribution of $Y(a) \mid X = x$, where $Y(a)$ is the potential outcome under treatment assignment $A = a$. Our framework does allow for the inclusion of a vector of covariates; however, since covariates are not required for identification and are notationally cumbersome, we omit such terms from our analysis. A brief discussion of conditioning on additional covariates can be found in Section 3.3. Throughout the paper, for a function $f(\cdot)$ we use the notation $\lim_{x \uparrow x_0} f(x) = f(x_0^-)$ to denote the left-hand limit (i.e. $x < x_0$ and $x \to x_0$) and similarly $\lim_{x \downarrow x_0} f(x) = f(x_0^+)$ to denote the right-hand limit.
Let $\mathcal{P}_2(\cdot)$ denote the set of all probability measures with finite second moments. We use $F_{Y|X}(y \mid x)$ to denote the cumulative distribution function of $Y \mid X = x$, with associated quantile function $Q_x(u) = F_x^{-1}(u) = \inf\{y : F_{Y|X}(y \mid x) \ge u\}$. When we consider the limiting quantiles, we drop the dependence on $x$ and write $Q_1(u) = \inf\{y : \lim_{x \downarrow x_0} F_{Y|X}(y \mid x) \ge u\}$ and $Q_0(u) = \inf\{y : \lim_{x \uparrow x_0} F_{Y|X}(y \mid x) \ge u\}$, where the subscripts one and zero denote taking the limit from above and below the cutoff, respectively.

3 Distributional Discontinuity Design
In this section, we introduce distributional causal effects that compare the entire outcome distribution below and above a treatment discontinuity. Let $P_{a|x}$ denote the conditional counterfactual distribution of $Y(a) \mid X = x$, and suppose that treatment is assigned discontinuously, with $A = I(X \ge x_0)$ for a running variable $X$ and cutoff $x_0$. Then, we define our causal estimand to be the 2-Wasserstein distance between the counterfactual treatment distributions at the treatment discontinuity $X = x_0$, i.e.

$$\Psi = W_2(P_{1|x_0}, P_{0|x_0}).$$

The 2-Wasserstein distance between any two probability distributions $P$ and $Q$ is defined by

$$W_2^2(P, Q) = \inf_{\gamma \in \Gamma(P, Q)} \int \|x - y\|_2^2 \, d\gamma(x, y),$$

where $\Gamma(P, Q)$ is the set of all couplings of $P$ and $Q$, i.e. the set of all joint distributions $\gamma$ whose marginals are $P$ and $Q$ (Villani et al., 2009). Roughly speaking, a coupling $\gamma$ describes a way of pairing points from $P$ and $Q$, and $W_2^2$ is the total quadratic transport cost under the best possible pairing, as visualized in Figure 1. More intuitively, it describes the minimal cost of transforming or "morphing" $P$ into $Q$.
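In one dimension the best pairing has a simple form: for two empirical distributions with the same number of equally weighted atoms, some optimal coupling is a one-to-one pairing, and matching order statistics (smallest with smallest, and so on) is optimal. A minimal Python sketch, using only the standard library, checks the sorted pairing against a brute-force search over every one-to-one pairing; the function names and sample values are hypothetical and purely illustrative.

```python
from itertools import permutations


def w2_squared_sorted(xs, ys):
    """Squared 2-Wasserstein distance between two equal-size empirical
    distributions (uniform weights), using the monotone (sorted) pairing."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    pairs = zip(sorted(xs), sorted(ys))
    return sum((x - y) ** 2 for x, y in pairs) / len(xs)


def w2_squared_brute_force(xs, ys):
    """Same quantity found by trying every one-to-one pairing (tiny inputs only)."""
    n = len(xs)
    return min(
        sum((x - ys[j]) ** 2 for x, j in zip(xs, perm)) / n
        for perm in permutations(range(n))
    )


if __name__ == "__main__":
    p_atoms = [0.3, -1.2, 2.0, 0.7]  # hypothetical draws from P
    q_atoms = [1.5, -0.4, 0.1, 3.1]  # hypothetical draws from Q
    print(w2_squared_sorted(p_atoms, q_atoms))       # sorted pairing
    print(w2_squared_brute_force(p_atoms, q_atoms))  # exhaustive search
```

With these values both functions return 0.6325, up to floating-point rounding. The brute-force search grows factorially and serves only as a check; the sorted pairing is what makes one-dimensional Wasserstein distances cheap to compute.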
If $P$ has a density, then $W_2^2(P, Q) = \inf_T \mathbb{E}[\|T(X) - X\|_2^2]$, where $X \sim P$ and the infimum is over all maps $T$ such that $T(X) \sim Q$. The minimizing map $T$ is called the optimal transport map.

[Figure 1: Optimal transport maps between counterfactual distributions. The plot shows the densities $f_{Y(0)|X=x_0}(y)$ and $f_{Y(1)|X=x_0}(y)$ against $y$.]

In the case of a treatment discontinuity, $\Psi$ measures how far probability mass must be moved in order to transform the untreated distribution at the cutoff into the treated distribution. Thus, it captures differences not only in means but also in higher-order features such as the variance or skewness. For this reason, the Wasserstein effect can detect and quantify complex, higher-order treatment effects that the traditional regression discontinuity design estimand would miss. For example, in Figure 2, we can see that above the treatment discontinuity at $x = 0$, not only does the mean change, but the variance and overall distributional shape do as well. Focusing solely on the difference in means would not adequately describe the full effect of treatment here. As a simple motivating example, suppose that there is a treatment discontinuity at $x_0 = 0$ and that $Y(0) \mid X = x \sim N(0, 1)$ and $Y(1) \mid X = x \sim N(0, 2^2)$ for all $x$. Then the average treatment effect at the cutoff, $\mathbb{E}[Y(1) \mid X = x_0] - \mathbb{E}[Y(0) \mid X = x_0]$, is zero, as there is no change in location above and below the cutoff. However, the standard deviation doubles. Researchers who only consider the average treatment effect at the cutoff would conclude there was no treatment effect, but in reality, a doubling of the standard deviation could have large practical implications. Similarly, as discussed in Kim et al. (2024), treatment effects could easily take a multimodal structure where $Y(0) = 0$ almost surely, but $Y(1) = 1$ or $Y(1) = -1$ with equal probability.
In this setting, the average treatment effect is again zero, but treatment harms half the population and benefits the other half. Fortunately, both of these causal effects can be detected by $\Psi$. For example, in the first setting with two Gaussians, it can be shown that $\Psi = |\sigma_1 - \sigma_0| = |2 - 1| = 1$, where $\sigma_a$ is the standard deviation of $Y(a) \mid X = x_0$, indicating a sharp difference in the outcome distributions; in the second setting, $\Psi = 1$ as well. In Section 3.2, we provide more guidance on the interpretation of the Wasserstein effect and its comparison to the average treatment effect at the cutoff. Now that we have defined our effect of interest, we establish the conditions under which it is causally identified.

[Figure 2: Counterfactual distributions above and below a treatment discontinuity. Axes: running variable $x$, outcome $y$, and conditional density.]

3.1 Identification
In this section, we discuss the assumptions required for causal identification of the distributional effect $\Psi$. These conditions are nearly identical to the identification requirements established in Frandsen et al. (2012) for quantile treatment effects in discontinuity designs, since in one dimension the Wasserstein distance can be expressed as the $L^2$ distance between quantile functions (Vallender, 1974). The only additional assumption required is that $P_{a|x}$ has finite second moments, so that the Wasserstein distance is well-defined. For completeness, we still outline each assumption. Let $F_{Y(a)|X}(y \mid x)$ be the cumulative distribution function of $P_{a|x}$. Then, for a sharp treatment assignment $A = I(X \ge x_0)$, we require:
(i) Consistency: $Y = Y(a)$ if $A = a$, for $a \in \{0, 1\}$.
(ii) Continuity: for $a \in \{0, 1\}$ and all $y \in \mathbb{R}$, $\lim_{x \to x_0} F_{Y(a)|X}(y \mid x) = F_{Y(a)|X}(y \mid x_0)$.
(iii) Density at threshold: $f_X(x)$ is differentiable at $x = x_0$ and $\lim_{x \to x_0} f_X(x) > 0$.
(iv) Regularity: $P_{a|x} \in \mathcal{P}_2(\mathbb{R})$ for $a \in \{0, 1\}$.
Assumption (i) rules out any interference or spillover effects, where the treatment of one observation affects the outcomes of another. Assumption (ii) ensures that as we approach the cutoff, the cumulative distribution functions of the counterfactuals have well-defined limits. This rules out sudden jumps or discontinuities in the outcome distribution that could be unrelated to the treatment assignment. Assumption (iii) guarantees that there are observations arbitrarily close to the cutoff on both sides, which is necessary for well-defined limiting distributions. Finally, assumption (iv) ensures that the Wasserstein distance is well-defined by requiring the counterfactual distributions to have finite second moments. With these assumptions in place, we now establish causal identification in the following lemma.

Lemma 1 (Identification). Under assumptions (i)-(iv), it follows from Frandsen et al. (2012) that

$$\Psi = \left( \int_0^1 \big(Q_1(u) - Q_0(u)\big)^2 \, du \right)^{1/2},$$

where $Q_1(u) = \inf\{y : \lim_{x \downarrow x_0} F_{Y|X}(y \mid x) \ge u\}$ and $Q_0(u) = \inf\{y : \lim_{x \uparrow x_0} F_{Y|X}(y \mid x) \ge u\}$ are the limiting conditional quantiles of $Y \mid X = x$ above and below the cutoff.

By Lemma 1, the squared Wasserstein effect $\Psi^2$ is the squared difference between the $u$-th quantiles above and below the cutoff, integrated across the entire distribution. This highlights the fact that $\Psi$ measures distributional changes of any form, whether in location, scale, shape, or otherwise. In the next section, we explore how $\Psi$ can be interpreted and compare it to the traditional regression discontinuity design estimand $\tau$. We note that the reduction to quantiles only holds because $Y$ is scalar; when $Y$ is multivariate, estimation of the Wasserstein effect is more complicated and will be dealt with in future work.

3.2 Interpretation
In this section, we build intuition for how to interpret the Wasserstein effect, $\Psi$.
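As a first piece of intuition, Lemma 1 turns $\Psi$ into a one-dimensional integral that is easy to evaluate when the limiting quantile functions are known in closed form. The Python sketch below, an illustration rather than the estimator developed in Section 3.3, approximates the integrals for $\Psi$ and for the difference in means with a midpoint rule and applies them to the two motivating examples from Section 3 (the Gaussian scale change and the $\pm 1$ split). The function names and the grid size are hypothetical choices.

```python
from statistics import NormalDist


def _midpoints(grid_size):
    # midpoints of [0, 1] split into grid_size equal cells (never hits u = 0 or 1)
    return [(i + 0.5) / grid_size for i in range(grid_size)]


def wasserstein_effect(q1, q0, grid_size=20_000):
    """Midpoint-rule approximation of Psi = (int_0^1 (Q1(u) - Q0(u))^2 du)^(1/2)."""
    grid = _midpoints(grid_size)
    return (sum((q1(u) - q0(u)) ** 2 for u in grid) / grid_size) ** 0.5


def mean_effect(q1, q0, grid_size=20_000):
    """Midpoint-rule approximation of int_0^1 (Q1(u) - Q0(u)) du, the difference in means."""
    grid = _midpoints(grid_size)
    return sum(q1(u) - q0(u) for u in grid) / grid_size


# Example 1: Y(0) | X = x0 ~ N(0, 1) and Y(1) | X = x0 ~ N(0, 2^2).
gauss_q0 = NormalDist(0.0, 1.0).inv_cdf
gauss_q1 = NormalDist(0.0, 2.0).inv_cdf


# Example 2: Y(0) = 0 almost surely; Y(1) = -1 or +1 with probability 1/2 each.
def split_q0(u):
    return 0.0


def split_q1(u):
    # generalized inverse of the two-point CDF: -1 up to u = 1/2, then +1
    return -1.0 if u <= 0.5 else 1.0


if __name__ == "__main__":
    for name, q1, q0 in [("gaussian", gauss_q1, gauss_q0), ("split", split_q1, split_q0)]:
        print(name, mean_effect(q1, q0), wasserstein_effect(q1, q0))
```

In exact arithmetic both examples have a difference in means of zero and $\Psi = 1$. The split example is reproduced exactly on an even grid, while the Gaussian example is reproduced only approximately, because the normal quantile function is unbounded near $u = 0$ and $u = 1$.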
In particular, we establish an inequality that directly compares $\Psi$ to the average treatment effect at the cutoff, along with the condition under which the two effects are equal; we demonstrate how the direction of the effect at each quantile can be neatly visualized; we decompose the distributional effect into individual moment effects; and we define a novel measure of effect direction by considering the degree to which $Q_1$ stochastically dominates $Q_0$.

3.2.1 Relation to the Average Treatment Effect
In traditional regression discontinuity designs, practitioners are typically interested in estimating the difference in means above and below the treatment cutoff, defined by $\tau = \mathbb{E}[Y(1) - Y(0) \mid X = x_0]$. Notably, this can be interpreted through a distributional lens: $\tau$ is simply the signed distance between the means of the counterfactual distributions at the cutoff. In fact, if the treatment effect is purely additive (so that it only shifts the location of the distribution), then it can be shown that $|\tau| = \Psi$. In the following theorem, we establish an inequality between the Wasserstein and mean effects at the cutoff which shows that $\Psi$ must be weakly greater than $|\tau|$. Furthermore, we establish the condition under which these effects are identical.

Theorem 1 (Effect Inequality). The Wasserstein effect upper bounds the average treatment effect at the cutoff in absolute value, i.e. $|\tau| \le \Psi$. Furthermore, equality holds if and only if the treatment effect is purely additive; that is, if there exists $\delta \in \mathbb{R}$ such that $Q_1(u) = Q_0(u) + \delta$ for all $u \in (0, 1)$.

Theorem 1 shows that the jump, or discontinuity, in the outcome distributions at the cutoff is always at least as large as the jump in the means. Intuitively, we can think of the relationship between these effects by framing both in terms of the quantile effect function $\Delta Q(u) = Q_1(u) - Q_0(u)$. Suppose that $U \sim \text{Uniform}(0, 1)$.
Then $\tau = \int_0^1 \Delta Q(u)\, du = \mathbb{E}[\Delta Q(U)]$ is simply the average (or signed area) of the quantile effect curve. Meanwhile, the squared Wasserstein effect can equivalently be written as $\Psi^2 = \int_0^1 \Delta Q(u)^2 \, du = \mathbb{E}[\Delta Q(U)^2]$, i.e. the area under the squared quantile effect curve. Immediately, this yields the variance decomposition

$$\Psi^2 = \tau^2 + \mathbb{V}\big(\Delta Q(U)\big).$$

Consequently, $\Psi$ captures both the shift in location (as measured by $\tau$) and the heterogeneity around that shift (as measured by the variance of $\Delta Q(U)$). In fact, we can use this decomposition to define a heterogeneity index: for $\Psi > 0$, let

$$\gamma := \frac{\mathbb{V}\big(\Delta Q(U)\big)}{\Psi^2} = 1 - \left(\frac{|\tau|}{\Psi}\right)^2.$$

Then $\gamma \in [0, 1]$. When $\gamma = 0$, the treatment effect is purely additive. Meanwhile, when $\gamma = 1$, the difference in means explains none of the distributional distance (i.e. $\tau = 0$ while $\Psi > 0$).

[Figure 3: Quantile effect curves $\Delta Q(u)$ (left panel) and contribution curves $\Delta Q(u)^2 / \Psi^2$ (right panel), plotted against $u \in (0, 1)$, for a hypothetical null effect curve (solid) and a skewed effect curve (dashed). Both effect curves are defined such that the average treatment effect $\tau = 0$.]

3.2.2 Visualizing Quantile Effect Curves
Considering $\tau$ by itself can conceal important differences: positive and negative quantile effects may cancel out in the average, leaving a small average treatment effect. The Wasserstein effect readily addresses this problem: no treatment effect is lost or canceled out, since $\Psi$ aggregates squared effect differences across all quantiles. However, considering $\Psi$ in isolation can be restrictive, since it does not describe the direction of the effect at each quantile (e.g. whether treatment is harmful or helpful).
This concern is easily addressed by plotting the quantile effect curve $\Delta Q(u)$ across $u \in (0, 1)$, which lets us directly visualize quantile-by-quantile contributions to the Wasserstein effect. In this sense, our analysis neatly complements existing methods for studying quantile treatment effects, such as those in Frandsen et al. (2012), Qu and Yoon (2015), Qu and Yoon (2019), and Chiang et al. (2019). In the left panel of Figure 3, we can see two curves, both of which have an average treatment effect of zero. The height at each quantile shows the individual contribution to the overall effect; notably, one effect curve is nearly constant, suggesting a null treatment effect. However, the other curve has a large negative treatment effect in the left tail of the distribution that is masked by a positive effect near the median. This juxtaposition between effect curves highlights the importance of considering distributional effects rather than relying solely on traditional difference-in-means analyses. Furthermore, we can visualize the contribution of each quantile to the Wasserstein effect via the contribution function $u \mapsto \Delta Q(u)^2 / \Psi^2$, as shown in the right panel of Figure 3. Here, we can see that most of the Wasserstein effect for the skewed curve is driven by the left tail. Meanwhile, the null effect curve has a nearly uniform contribution curve across $u \in (0, 1)$.

3.2.3 Direction of the Treatment Effect
Visualizing the quantile effect curve is a useful exercise and can help practitioners better interpret the Wasserstein effect; however, it can leave some ambiguity about the overall direction of the treatment effect. In this section, we define a novel one-number summary of the degree to which the treated quantiles dominate the untreated ones.
Recall that for any two quantile functions $Q_a(u)$ and $Q_b(u)$, $Q_a$ (first-order) stochastically dominates $Q_b$ if and only if $Q_a(u) \ge Q_b(u)$ for all $u \in (0, 1)$, as discussed in Qu and Yoon (2015). Importantly, we can decompose the Wasserstein effect into directional-dominance effects by defining the positive and negative parts

$$\Delta Q_+(u) = \max\{\Delta Q(u), 0\} \quad \text{and} \quad \Delta Q_-(u) = \max\{-\Delta Q(u), 0\}.$$

Intuitively, $\Delta Q_+(u)$ captures all of the positive treatment effects across quantiles (where $Q_1(u) - Q_0(u) > 0$), and $\Delta Q_-(u)$ captures all of the negative treatment effects. Since $\Delta Q_+(u)\, \Delta Q_-(u) = 0$ for every $u$, it follows that we may write

$$\Psi^2 = \int_0^1 \{\Delta Q(u)\}^2 \, du = \int_0^1 \{\Delta Q_+(u)\}^2 \, du + \int_0^1 \{\Delta Q_-(u)\}^2 \, du,$$

which for notational simplicity we write as $\Psi_+^2 + \Psi_-^2$. Now that we have split the Wasserstein effect into positive and negative quantile effects, we may define the Wasserstein dominance (for $\Psi > 0$),

$$\rho = \frac{\Psi_+^2 - \Psi_-^2}{\Psi_+^2 + \Psi_-^2} \in [-1, 1].$$

If $Q_1$ stochastically dominates $Q_0$, then $\Psi_-^2 = 0$ and hence $\rho = 1$. Similarly, if $Q_0$ stochastically dominates $Q_1$, then $\Psi_+^2 = 0$ and $\rho = -1$. Thus, $\rho$ describes the degree to which one counterfactual distribution dominates the other. When $\rho$ is close to zero, the quantile functions cross, and the positive and negative quantile effects contribute roughly equally to $\Psi^2$.

3.2.4 Decomposition into L-Moments
Although decomposing the Wasserstein effect via $\Delta Q(u)$ is useful and lets us visualize the signed contributions of each quantile, it says nothing about the moments of the counterfactual distributions at the cutoff. Practitioners may be interested in understanding the contributions from differences in means, standard deviations, skewness, and so on.
Fortunately, following a similar approach to Sillitto (1969), the Wasserstein effect can be written as a generalized Fourier series using the shifted Legendre polynomials as an orthogonal basis. The shifted Legendre polynomials are defined by $P_k^*(x) = P_k(2x - 1)$, where $P_k(x)$ are the usual Legendre polynomials, and they form an orthogonal basis of $L^2([0, 1])$. A closed-form expression for the $k$th shifted Legendre polynomial is given by

$$P_k^*(x) = (-1)^k \sum_{j=0}^{k} \binom{k}{j} \binom{k+j}{j} (-x)^j.$$

Importantly, under this orthogonal basis it can be shown that $\Psi^2$ may be decomposed into a weighted sum of squared differences in L-moments. First introduced by Hosking (1990), for any random variable $X$ with a finite first moment, the $k$th L-moment is defined as

$$\lambda_k = \int_0^1 Q_X(u) \, P_{k-1}^*(u) \, du,$$

where $Q_X(u)$ is the quantile function of $X$. Note that $P_0^* = 1$. As shown in Hosking (1990), L-moments are defined by taking expectations of linear combinations of order statistics, and they represent a "robust" analogue of conventional moments that is typically less sensitive to heavy tails and better behaved in small samples. Notably, they are always well-defined as long as the first moment exists, even when higher conventional moments do not. To build intuition, let $X_{1:n} \le X_{2:n} \le \cdots \le X_{n:n}$ be the order statistics of a random sample of size $n$ from the distribution of $X$. Then, the first three L-moments are given by

$$\lambda_1 = \mathbb{E}[X], \qquad \lambda_2 = \tfrac{1}{2}\mathbb{E}[X_{2:2} - X_{1:2}], \qquad \lambda_3 = \tfrac{1}{3}\mathbb{E}\big[(X_{3:3} - X_{2:3}) - (X_{2:3} - X_{1:3})\big].$$

Focusing on the second L-moment, we can see that it equals half the expected absolute difference between two independent draws from the distribution. Thus, it provides an alternative measure of dispersion to the traditional standard deviation.
Similarly, $\lambda_3$ provides an alternative measure of asymmetry to the traditional skewness by comparing the upper spacing $X_{3:3} - X_{2:3}$ with the lower spacing $X_{2:3} - X_{1:3}$. In the following theorem, we establish how $\Psi^2$ can be decomposed into a weighted sum of squared differences in L-moments.

Theorem 2 (L-Moment Decomposition). Suppose that $P_{a|x} \in \mathcal{P}_2(\mathbb{R})$ for $a \in \{0, 1\}$. Then,

$$\Psi^2 = \sum_{k=1}^{\infty} (2k - 1) \left(\lambda_k^{(1)} - \lambda_k^{(0)}\right)^2,$$

where $\lambda_k^{(a)} = \int_0^1 Q_a(u) P_{k-1}^*(u) \, du$ are the $k$th L-moments above and below the cutoff.

By Theorem 2 we obtain an important decomposition of $\Psi$: we may now define what can be thought of as a "distributional $R^2$," that is, the share of the squared Wasserstein effect explained by a given L-moment. Specifically, for each $k \ge 1$ (and $\Psi > 0$), the share of the total distributional distance explained by the $k$th L-moment is given by

$$R_k^2 = \frac{(2k - 1)\left(\lambda_k^{(1)} - \lambda_k^{(0)}\right)^2}{\Psi^2}, \tag{1}$$

so that $\sum_{k=1}^{\infty} R_k^2 = 1$. This decomposition is purely distributional: it decomposes the Wasserstein distance between the marginal counterfactual outcome distributions at the cutoff and does not require a rank-invariance assumption. As an illustrative example, Table 1 reports the shares explained by the first three L-moments, and by all higher-order L-moments combined, for the effect curves shown in Figure 3. Notably, the null effect curve is explained primarily by differences in L-scale and higher-order L-moments, as its quantile effect curve is symmetric. Meanwhile, the skewed effect curve is (unsurprisingly) driven primarily by the difference in L-skewness. Note that both curves have an L-location share of zero, since the difference in first L-moments equals $\tau$ and both curves are defined to have an average treatment effect of zero. The moment decomposition in Table 1 provides a new and powerful tool for decoding treatment effect heterogeneity.
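The interpretive summaries of this section ($\tau$, $\Psi$, the heterogeneity index $\gamma$, the Wasserstein dominance $\rho$, and the shares $R_k^2$ in (1)) can all be computed from the quantile effect curve $\Delta Q(u)$. The Python sketch below assumes that $\Delta Q$ is available as a function on $(0, 1)$ (in practice it would be built from estimated quantiles), approximates every integral with a midpoint rule, and truncates Theorem 2 after a fixed number of terms; `effect_summaries`, `shifted_legendre`, and their arguments are hypothetical names.

```python
from math import comb


def shifted_legendre(k, x):
    """Closed form P*_k(x) = (-1)^k sum_j C(k, j) C(k + j, j) (-x)^j on [0, 1]."""
    return (-1) ** k * sum(comb(k, j) * comb(k + j, j) * (-x) ** j for j in range(k + 1))


def effect_summaries(delta_q, n_moments=6, grid_size=20_000):
    """Summaries of a quantile effect curve u -> Q1(u) - Q0(u).

    All integrals over (0, 1) use a midpoint rule. Returns tau, Psi, the
    heterogeneity index gamma, the Wasserstein dominance rho, and the L-moment
    shares R2_k for k = 1, ..., n_moments (Theorem 2 truncated at n_moments).
    """
    grid = [(i + 0.5) / grid_size for i in range(grid_size)]
    values = [delta_q(u) for u in grid]

    tau = sum(values) / grid_size                    # E[dQ(U)]
    psi_sq = sum(v * v for v in values) / grid_size  # E[dQ(U)^2] = Psi^2
    if psi_sq == 0.0:
        raise ValueError("Psi = 0, so gamma, rho and the shares are undefined")

    psi_sq_pos = sum(v * v for v in values if v > 0) / grid_size  # Psi_+^2
    psi_sq_neg = sum(v * v for v in values if v < 0) / grid_size  # Psi_-^2
    gamma = 1.0 - tau ** 2 / psi_sq                  # Var(dQ(U)) / Psi^2
    rho = (psi_sq_pos - psi_sq_neg) / (psi_sq_pos + psi_sq_neg)

    shares = []
    for k in range(1, n_moments + 1):
        # lambda_k^(1) - lambda_k^(0) = int_0^1 dQ(u) P*_{k-1}(u) du
        d_lambda = sum(v * shifted_legendre(k - 1, u) for u, v in zip(grid, values)) / grid_size
        shares.append((2 * k - 1) * d_lambda ** 2 / psi_sq)

    return {"tau": tau, "psi": psi_sq ** 0.5, "gamma": gamma, "rho": rho, "shares": shares}


if __name__ == "__main__":
    # Hypothetical curve: location shift 0.5 plus a pure L-scale change (2u - 1).
    summary = effect_summaries(lambda u: 0.5 + (2 * u - 1))
    print(summary["tau"], summary["gamma"], summary["rho"])
    print([round(s, 4) for s in summary["shares"]])
```

For the hypothetical curve $\Delta Q(u) = 0.5 + (2u - 1)$, the exact values are $\tau = 0.5$, $\Psi^2 = 7/12$, $\gamma = 4/7$, $\rho = 13/14$, and L-moment shares of $3/7$ and $4/7$ for $k = 1, 2$ with zero beyond, which the sketch reproduces up to discretization error.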
No w that we ha ve established several methods of interpreting the W asserstein eﬀect and ho w it compares to traditional causal eﬀects, we turn to estimation and inference. In the next section, w e formalize an estimator for the W asserstein eﬀect and derive its asymptotic 10 Moment Nul l Eﬀe ct Curve Skewe d Eﬀe ct Curve k = 1 0.0000 0.0000 k = 2 0.6079 0.1548 k = 3 0.0000 0.8157 k ≥ 4 0.3921 0.0295 T able 1: Comparison of Explained D istributional Distance prop erties around some chosen bandwidth of the treatment threshold X = x 0 . W e sho w that standard bias correction tec hniques can b e applied to estimation of the W asserstein eﬀect such that empirical bandwidth selection methods can b e implemen ted. 3.3 Estimation and Asymptotics In this section, w e establish formal properties for estimation of the W asserstein eﬀect. Note that Ψ depends on one-sided limiting conditional distributions ev aluated at a single p oint; suc h functionals are not pathwise diﬀerentiable, so there is no √ n -regular estimator and no eﬃcien t inﬂuence function. W e therefore emplo y a simple plug-in estimator, deﬁned b y b Ψ n =  Z 1 0 ( b Q 1 ( u ) − b Q 0 ( u )) 2 du  1 / 2 where Q 1 ( u ) = inf { y : lim x ↓ x 0 F Y | X ( y | x ) ≥ u } and Q 0 ( u ) = inf { y : lim x ↑ x 0 F Y | X ( y | x ) ≥ u } are the limiting conditional quant iles of Y | X = x . Thus, estimation of the W asserstein eﬀect reduces to estimation of conditional quan tile processes (whic h is a w ell studied problem), follo wed b y n umerical integration. There are man y w ays that Q a ( u ) can b e estimated. F or example, one natural route is lo cal linear quantile regression, as prop osed b y Y u and Jones ( 1998 ), whic h minimizes the chec k loss of a kernel-w eighted p olynomial estimator in order to pro duce b oundary-adaptive estimates of the conditional quan tile curves. This approac h w as adapted by F randsen et al. 
(2012) when first defining quantile treatment effects in a discontinuity design framework. However, the methods established in Frandsen et al. (2012) only yield pointwise confidence intervals for conditional quantiles. Furthermore, their bandwidth condition requires $\sqrt{nh}\,h^2 \to \gamma < \infty$. When $\gamma > 0$, the squared bias and the variance are of the same order; consequently, undersmoothing must be employed so that the bias is negligible relative to the standard error, i.e., $\gamma = 0$. In practice, this means that the standard mean-squared-error-optimal bandwidth $h \propto n^{-1/5}$ can lead to improper coverage: with $h = Cn^{-1/5}$ for a constant $C > 0$, we have $\sqrt{nh}\,h^2 = C^{5/2} > 0$, so $\gamma \neq 0$. More recently, Qu and Yoon (2015) showed that local quantile regression admits a uniform Bahadur representation, which they then leverage to obtain uniform confidence intervals for quantile treatment effects. Building on this framework, Qu and Yoon (2019) show that by estimating the leading bias term it is possible to obtain bias-adjusted uniform inference in the spirit of Calonico et al. (2014). Ultimately, the methods established by Qu and Yoon (2015) and Qu and Yoon (2019) rely on the fact that the asymptotic distribution is conditionally pivotal, so they are not suitable for the local Wald ratios required by fuzzy designs (which we consider in Section 3.5). Thus, we turn to the framework established in Chiang et al. (2019). Their approach develops a general theory for local Wald estimands that allows for uniform inference across quantiles and can accommodate empirical bandwidth selection. Moreover, it encompasses both sharp and fuzzy discontinuity designs, as well as kinked designs (which we also consider in Section 4). We formalize these technical details in what follows.

In order to estimate $Q_a(u)$, Chiang et al. (2019) adapt the local polynomial estimation with bias correction approach established in Calonico et al. (2014).
For $a \in \{0, 1\}$, let
$$F^{(k)}_{Y|X}(y \mid x_0^{\pm}) = \lim_{x \to x_0^{\pm}} \frac{\partial^k}{\partial x^k} F_{Y|X}(y \mid x)$$
be the $k$th partial derivative of the conditional cumulative distribution function, where $a = 1$ corresponds to the right limit (as $x \downarrow x_0$) and $a = 0$ corresponds to the left limit (as $x \uparrow x_0$). Then, under appropriate smoothness assumptions, we may define the following $p$th-order one-sided Taylor expansions about $x = x_0$:
$$F_{Y|X}(y \mid x) \approx F_{Y|X}(y \mid x_0^+) + \cdots + \frac{F^{(p)}_{Y|X}(y \mid x_0^+)}{p!}(x - x_0)^p = r_p\!\left(\frac{x - x_0}{h}\right)^{\!\top} \alpha_{1,p}(y),$$
$$F_{Y|X}(y \mid x) \approx F_{Y|X}(y \mid x_0^-) + \cdots + \frac{F^{(p)}_{Y|X}(y \mid x_0^-)}{p!}(x - x_0)^p = r_p\!\left(\frac{x - x_0}{h}\right)^{\!\top} \alpha_{0,p}(y),$$
for $x > x_0$ and $x < x_0$, respectively. Here $F_{Y|X}(y \mid x_0^+) = \lim_{x \downarrow x_0} F_{Y|X}(y \mid x)$ and $F_{Y|X}(y \mid x_0^-) = \lim_{x \uparrow x_0} F_{Y|X}(y \mid x)$ are the one-sided limits of $F_{Y|X}(y \mid x)$, $r_p(u) = (1, u, \ldots, u^p)^\top$, and
$$\alpha_{a,p}(y) = \left(F_{Y|X}(y \mid x_0^{\pm}),\ \frac{F^{(1)}_{Y|X}(y \mid x_0^{\pm})\,h}{1!},\ \ldots,\ \frac{F^{(p)}_{Y|X}(y \mid x_0^{\pm})\,h^p}{p!}\right)^{\!\top}.$$
We may then estimate the coefficients separately on each side of the treatment discontinuity by solving one-sided locally weighted least squares problems, defined by
$$\widehat{\alpha}_{1,p}(y) = \arg\min_{\alpha \in \mathbb{R}^{p+1}} \sum_{i=1}^n I(X_i \geq x_0)\left(I(Y_i \leq y) - r_p\!\left(\frac{X_i - x_0}{h}\right)^{\!\top}\alpha\right)^2 K\!\left(\frac{X_i - x_0}{h}\right),$$
where $K(\cdot)$ is a kernel function; the estimator $\widehat{\alpha}_{0,p}(y)$ follows analogously with $I(X_i \leq x_0)$. If $e_0 = (1, 0, \ldots, 0)^\top$ denotes the first standard basis vector, it follows that $\widehat{F}_{Y|X}(y \mid x_0^+) = e_0^\top \widehat{\alpha}_{1,p}(y)$ and $\widehat{F}_{Y|X}(y \mid x_0^-) = e_0^\top \widehat{\alpha}_{0,p}(y)$. However, we are not quite done defining our estimator. From here, Chiang et al. (2019) add a bias-correction term in the style of Calonico et al. (2014) in order to allow for empirical bandwidth selection.
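A minimal sketch of the one-sided weighted least squares problem for $p = 1$ follows. It assumes a triangular kernel $K(u) = (1 - |u|)_+$ and omits the bias correction; because $r_1(u) = (1, u)^\top$, the normal equations form a $2 \times 2$ system and the returned intercept is $e_0^\top \widehat{\alpha}_{a,1}(y)$. For the same kernel, the sketch also computes exactly (with Python's `fractions.Fraction`) the constant $e_0^\top(\Gamma_p^{\pm})^{-1}\Lambda^{\pm}_{p,p+1}$, where $\Gamma_p^{\pm} = \int_{\mathbb{R}_\pm} K(u) r_p(u) r_p(u)^\top du$ and $\Lambda^{\pm}_{p,q} = \int_{\mathbb{R}_\pm} u^q K(u) r_p(u)\,du$; this constant multiplies $h^{p+1} F^{(p+1)}_{Y|X}(y \mid x_0^{\pm})/(p+1)!$ in the leading bias that the correction targets. The simulated design and all function names are hypothetical.

```python
import random
from fractions import Fraction
from statistics import NormalDist


def triangular(u):
    """Triangular kernel K(u) = (1 - |u|)_+ (an assumed, compactly supported choice)."""
    return max(0.0, 1.0 - abs(u))


def one_sided_cdf(xs, ys, y, x0, h, side):
    """Local linear (p = 1) estimate of F_{Y|X}(y | x0+) (side "+") or (y | x0-) (side "-").

    Solves the 2x2 weighted normal equations for alpha = (a0, a1) in
    I(Y_i <= y) ~ a0 + a1 * (X_i - x0) / h and returns e_0' alpha = a0.
    """
    s00 = s01 = s11 = t0 = t1 = 0.0
    for x, yi in zip(xs, ys):
        on_side = (x >= x0) if side == "+" else (x <= x0)
        if not on_side:
            continue
        z = (x - x0) / h
        w = triangular(z)
        d = 1.0 if yi <= y else 0.0
        s00 += w
        s01 += w * z
        s11 += w * z * z
        t0 += w * d
        t1 += w * z * d
    det = s00 * s11 - s01 * s01
    if det <= 0.0:
        raise ValueError("too few observations within the bandwidth")
    return (s11 * t0 - s01 * t1) / det


def triangular_moment(j, side):
    """Exact integral of u^j K(u) over [0, inf) (side "+") or (-inf, 0] (side "-")."""
    m = Fraction(1, (j + 1) * (j + 2))
    return m if side == "+" else (-1) ** j * m


def bias_constant(p, side):
    """Exact e_0' (Gamma_p)^{-1} Lambda_{p,p+1} for the triangular kernel."""
    n = p + 1
    # Augmented system [Gamma_p | Lambda_{p,p+1}], solved by Gauss-Jordan elimination.
    rows = [[triangular_moment(i + j, side) for j in range(n)] + [triangular_moment(i + n, side)]
            for i in range(n)]
    for c in range(n):
        piv = next(r for r in range(c, n) if rows[r][c] != 0)
        rows[c], rows[piv] = rows[piv], rows[c]
        for r in range(n):
            if r != c:
                f = rows[r][c] / rows[c][c]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[c])]
    return rows[0][n] / rows[0][0]


# Hypothetical sharp design: the mean of Y jumps by 1 at x0 = 0, so F(0 | x) is flat on each side.
rng = random.Random(1)
xs = [rng.uniform(-1.0, 1.0) for _ in range(20000)]
ys = [(1.0 if x >= 0.0 else 0.0) + rng.gauss(0.0, 1.0) for x in xs]
f_plus = one_sided_cdf(xs, ys, 0.0, 0.0, 0.3, "+")   # target: Phi(-1), about 0.159
f_minus = one_sided_cdf(xs, ys, 0.0, 0.0, 0.3, "-")  # target: Phi(0) = 0.5
```

For $p = 1$ the constant equals $-1/10$ on both sides, so the leading bias of the local linear boundary estimate is $-h^2 F^{(2)}_{Y|X}(y \mid x_0^{\pm})/20$; a bias-corrected estimator subtracts an estimate of this term.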
To develop a deeper understanding of this calculation, observe that the bias of our local polynomial estimator is given by
$$\mathbb{E}\left[\widehat{F}_{Y|X}(y \mid x_0^{\pm})\right] - F_{Y|X}(y \mid x_0^{\pm}) = \underbrace{h^{p+1}\, e_0^\top (\Gamma_p^{\pm})^{-1} \Lambda^{\pm}_{p,p+1}\, \frac{F^{(p+1)}_{Y|X}(y \mid x_0^{\pm})}{(p+1)!}}_{B^{\pm}(y, h, p)} + o(h^{p+1}),$$
where $B^{\pm}(y, h, p)$ denotes the leading bias term and
$$\Gamma_p^{\pm} = \int_{\mathbb{R}_{\pm}} K(u)\, r_p(u)\, r_p(u)^\top du \quad\text{and}\quad \Lambda^{\pm}_{p,q} = \int_{\mathbb{R}_{\pm}} u^q K(u)\, r_p(u)\,du.$$
Intuitively, $\Gamma_p^{\pm}$ is a matrix that describes how the polynomial regressors interact under the kernel weights, and $\Lambda^{\pm}_{p,q}$ captures how the next higher-order term in the Taylor expansion interacts with the regressors. Lemma 1 of Chiang et al. (2019) then shows that, under a set of regularity conditions,
$$\Delta^{\pm}_B(y) := \sqrt{nh}\left(\widehat{F}_{Y|X}(y \mid x_0^{\pm}) - F_{Y|X}(y \mid x_0^{\pm}) - \widehat{B}^{\pm}(y, h, p)\right)$$
admits the uniform Bahadur representation
$$\Delta^{\pm}_B(y) = \sum_{i=1}^n \frac{e_0^\top (\Gamma_p^{\pm})^{-1} r_p\!\left(\frac{X_i - x_0}{h}\right) K\!\left(\frac{X_i - x_0}{h}\right)\left(I(Y_i \leq y) - F_{Y|X}(y \mid X_i)\right)\delta_i^{\pm}}{\sqrt{nh}\, f_X(x_0)} + o_{P|X}(1),$$
where $\delta_i^+ = I(X_i \geq x_0)$ and $\delta_i^- = I(X_i \leq x_0)$. We refer the reader to Assumption 1 of Chiang et al. (2019) for a comprehensive list of these regularity conditions. Notably, Chiang et al. (2019) require that the kernel function $K(\cdot)$ be bounded, continuous, and of VC type, which allows for common kernels such as the uniform, triangular, biweight, triweight, and Epanechnikov kernels, but rules out the Gaussian kernel due to its unbounded support. Furthermore, for a bandwidth satisfying $h \to 0$, they require $nh^2 \to \infty$ and $nh^{2p+3} \to 0$. The former condition is stronger than the typical $nh \to \infty$ in order to allow for uniform convergence of the quantile process, and the latter condition controls the bias relative to the variance.

Now that we have defined this machinery, we discuss the conditional weak convergence of our estimator. First, note that Chiang et al.
(2019) consider convergence of the quantile process after trimming the left and right tails, such that $u \in [\varsigma, 1 - \varsigma]$ for some $\varsigma \in (0, 1/2)$. They do so because the conditional quantile function can be difficult to estimate reliably near the tails; instead, they establish weak convergence in $\ell^\infty([\varsigma, 1 - \varsigma])$. However, in order to properly estimate the Wasserstein effect we need to extend the domain of the quantiles to the full unit interval $[0, 1]$. Therefore, for weak convergence we require the additional assumptions that:
(i) The potential outcomes are compactly supported.
(ii) $f_{Y(a)|X}(y \mid x)$ is uniformly bounded away from zero on that support.
With these assumptions in place, let
$$\nu^{\pm}_n(y) = \sum_{i=1}^n \frac{e_0^\top (\Gamma_p^{\pm})^{-1} r_p\!\left(\frac{X_i - x_0}{h}\right) K\!\left(\frac{X_i - x_0}{h}\right)\left(I(Y_i \leq y) - F_{Y|X}(y \mid X_i)\right)\delta_i^{\pm}}{\sqrt{nh}\, f_X(x_0)}.$$
Then, by Theorem 1 of Chiang et al. (2019), it follows that $\nu^{\pm}_n \rightsquigarrow \mathbb{G}_{H^{\pm}}$, where $\mathbb{G}_{H^{\pm}}$ are zero-mean Gaussian processes with covariance functions $H^{\pm}$. Now that we have established conditional weak convergence for the bias-corrected cumulative distribution functions above and below the cutoff, we need to invert them in order to obtain weak convergence for the quantile processes. Simply put, we define
$$\widehat{Q}_1(u) = \inf\left\{y : \widehat{F}_{Y|X}(y \mid x_0^+) - \widehat{B}^+(y, h, p) \geq u\right\} \quad\text{and}\quad \widehat{Q}_0(u) = \inf\left\{y : \widehat{F}_{Y|X}(y \mid x_0^-) - \widehat{B}^-(y, h, p) \geq u\right\},$$
such that the estimated quantile treatment effect is $\Delta\widehat{Q}(u) = \widehat{Q}_1(u) - \widehat{Q}_0(u)$. From here, since the quantile map $F \mapsto F^{-1}$ is Hadamard differentiable, we may apply the functional delta method to see that
$$\sqrt{nh}\left(\widehat{Q}_1(u) - Q_1(u)\right) \rightsquigarrow -\frac{\mathbb{G}_{H^+}(Q_1(u))}{f_{Y|X}(Q_1(u) \mid x_0^+)} \quad\text{and}\quad \sqrt{nh}\left(\widehat{Q}_0(u) - Q_0(u)\right) \rightsquigarrow -\frac{\mathbb{G}_{H^-}(Q_0(u))}{f_{Y|X}(Q_0(u) \mid x_0^-)}.$$
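Inverting an estimated CDF only requires its values on a grid of $y$ values. The sketch below computes the generalized inverse $\inf\{y : \widehat{F}(y) \geq u\}$ on a sorted grid and then the plug-in $\widehat{\Psi}_n$ by a midpoint rule in $u$. The grid, the helper names, and the normal CDFs in the example are assumptions for illustration; in practice the CDF values would be the bias-corrected estimates $\widehat{F}_{Y|X}(y \mid x_0^{\pm}) - \widehat{B}^{\pm}(y, h, p)$, which need not be monotone in finite samples.

```python
import bisect
import math
from statistics import NormalDist


def quantile_function(cdf_values, y_grid):
    """Return u -> inf{y in y_grid : F(y) >= u}, given F evaluated on a sorted y_grid.

    The running maximum is nondecreasing, and its first index reaching u is the
    first grid point where F itself reaches u, so the generalized inverse is
    unchanged even when the estimated CDF is not monotone.
    """
    running, best = [], -math.inf
    for v in cdf_values:
        best = max(best, v)
        running.append(best)

    def q(u):
        i = bisect.bisect_left(running, u)
        return y_grid[min(i, len(y_grid) - 1)]  # clamp if u is never reached on the grid

    return q


def plug_in_psi(q1, q0, n_u=1000):
    """Midpoint-rule approximation of (int_0^1 (Q1(u) - Q0(u))^2 du)^{1/2}."""
    total = sum((q1((k + 0.5) / n_u) - q0((k + 0.5) / n_u)) ** 2 for k in range(n_u))
    return math.sqrt(total / n_u)


# Hypothetical check: a pure location shift, N(1, 1) versus N(0, 1), has Psi = 1.
y_grid = [-6.0 + 0.001 * i for i in range(14001)]
F1, F0 = NormalDist(1.0, 1.0).cdf, NormalDist(0.0, 1.0).cdf
q1 = quantile_function([F1(y) for y in y_grid], y_grid)
q0 = quantile_function([F0(y) for y in y_grid], y_grid)
psi_hat = plug_in_psi(q1, q0)
```

The grid spacing bounds the inversion error, so a finer grid (or interpolation between grid points) trades computation for accuracy.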
Consequently, it follows that
$$\sqrt{nh}\left(\Delta\widehat{Q}(u) - \Delta Q(u)\right) \rightsquigarrow \frac{\mathbb{G}_{H^-}(Q_0(u))}{f_{Y|X}(Q_0(u) \mid x_0^-)} - \frac{\mathbb{G}_{H^+}(Q_1(u))}{f_{Y|X}(Q_1(u) \mid x_0^+)}.$$
In practice, $p$ is often chosen to be two, yielding local quadratic polynomial estimators. Higher-order polynomials can potentially reduce bias even further, but they also risk a larger variance due to the sensitivity of the estimator near the boundary. In the next section, we discuss inference for the Wasserstein effect.

Remark 1 (Conditioning on covariates). Although covariates are not required for identification, they are often of interest to practitioners, both to obtain covariate-indexed causal effects and to improve precision (Frölich and Huber, 2019; Calonico et al., 2019b). Let $W_i \in \mathbb{R}^d$ denote a vector of covariates and let $\widehat{\mu}_{a,W}(x)$ denote a local polynomial estimate of $\mathbb{E}[W \mid X = x]$, obtained, for example, by the same estimation procedure described in Section 3.3. We may then define the centered covariates $\widetilde{W}_{a,i} = W_i - \widehat{\mu}_{a,W}(X_i)$ for $a \in \{0, 1\}$. Then, following Chiang et al. (2019), for each $y$ we solve
$$\left(\widetilde{\alpha}_{1,p}(y), \widetilde{\vartheta}_1(y)\right) = \arg\min_{\alpha \in \mathbb{R}^{p+1},\, \vartheta \in \mathbb{R}^d} \sum_{i=1}^n \delta_i^+\left(I(Y_i \leq y) - r_p\!\left(\frac{X_i - x_0}{h}\right)^{\!\top}\alpha - \widetilde{W}_{1,i}^\top \vartheta\right)^2 K\!\left(\frac{X_i - x_0}{h}\right),$$
and analogously for $(\widetilde{\alpha}_{0,p}(y), \widetilde{\vartheta}_0(y))$ by replacing $\delta_i^+$ with $\delta_i^-$ and $\widetilde{W}_{1,i}$ with $\widetilde{W}_{0,i}$. If our goal is to target $F_{Y|X}(y \mid x_0^{\pm})$ (using the covariates only as variance-reducing nuisances), then we simply take $\widetilde{F}_{Y|X}(y \mid x_0^+) = e_0^\top \widetilde{\alpha}_{1,p}(y)$ and $\widetilde{F}_{Y|X}(y \mid x_0^-) = e_0^\top \widetilde{\alpha}_{0,p}(y)$. Note that since we have centered our covariates, we can safely interpret each estimate as the cumulative distribution function at the average covariate value.
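The covariate-augmented problem in Remark 1 is again a weighted least squares fit, now with regressors $(1, (X_i - x_0)/h, \widetilde{W}_{a,i})$. The sketch below assumes $p = 1$, a scalar covariate, a triangular kernel, and a user-supplied callable standing in for the local covariate mean $\widehat{\mu}_{a,W}$; every name is hypothetical. It returns $(\widetilde{\alpha}_{a,1}(y), \widetilde{\vartheta}_a(y))$, so the first returned coefficient is $e_0^\top \widetilde{\alpha}_{a,1}(y)$.

```python
def weighted_least_squares(rows, targets, weights):
    """argmin_beta sum_i w_i (t_i - rows_i' beta)^2, solved through the normal equations."""
    k = len(rows[0])
    a = [[0.0] * k for _ in range(k)]
    b = [0.0] * k
    for row, t, w in zip(rows, targets, weights):
        for i in range(k):
            b[i] += w * row[i] * t
            for j in range(k):
                a[i][j] += w * row[i] * row[j]
    # Forward elimination with partial pivoting, then back substitution.
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        b[c], b[piv] = b[piv], b[c]
        for r in range(c + 1, k):
            f = a[r][c] / a[c][c]
            for j in range(c, k):
                a[r][j] -= f * a[c][j]
            b[r] -= f * b[c]
    beta = [0.0] * k
    for i in reversed(range(k)):
        beta[i] = (b[i] - sum(a[i][j] * beta[j] for j in range(i + 1, k))) / a[i][i]
    return beta


def covariate_adjusted_fit(xs, ys, ws, y, x0, h, side, mu_w):
    """One-sided fit of I(Y <= y) on (1, z, W - mu_w(X)) with z = (X - x0) / h.

    mu_w is a hypothetical callable for the local covariate mean E[W | X = x].
    """
    rows, targets, weights = [], [], []
    for x, yi, wi in zip(xs, ys, ws):
        on_side = (x >= x0) if side == "+" else (x <= x0)
        z = (x - x0) / h
        k = max(0.0, 1.0 - abs(z))  # triangular kernel (assumed)
        if on_side and k > 0.0:
            rows.append([1.0, z, wi - mu_w(x)])
            targets.append(1.0 if yi <= y else 0.0)
            weights.append(k)
    beta = weighted_least_squares(rows, targets, weights)
    return beta[:2], beta[2:]
```

With several covariates, each row simply gains one centered entry per covariate; the solver is unchanged.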
If our goal is instead the conditional cumulative distribution function itself,
$$F_{Y|X,W}(y \mid x_0^+, w) = \lim_{x \downarrow x_0} P(Y \leq y \mid X = x, W = w),$$
then for any $w \in \mathbb{R}^d$ we define our estimator to be
$$\widetilde{F}_{Y|X,W}(y \mid x_0^{\pm}, w) = e_0^\top \widetilde{\alpha}_{a,p}(y) + w^\top \widetilde{\vartheta}_a(y).$$
With these estimators in place, we can now estimate the conditional Wasserstein effect
$$\Psi(w) = \left(\int_0^1 \left(\widetilde{Q}_1(u; w) - \widetilde{Q}_0(u; w)\right)^2 du\right)^{1/2},$$
where $\widetilde{Q}_1(u; w)$ and $\widetilde{Q}_0(u; w)$ are the inverses of $\widetilde{F}_{Y|X,W}(y \mid x_0^+, w)$ and $\widetilde{F}_{Y|X,W}(y \mid x_0^-, w)$, respectively. Bias correction and the multiplier bootstrap can be implemented following the same covariate augmentation procedure with higher-order local fits.

3.4 Statistical Inference

Now that we have established methods for estimation of quantile treatment effects, as well as their limiting distributions, we turn to inference for the Wasserstein effect. Perhaps surprisingly, statistical inference in this setting is not as straightforward as one might expect, since $\Psi^2$ is a quadratic parameter: its limiting distribution and rate of convergence change as $\Psi \to 0$. To illustrate this point for quadratic parameters broadly, Verdinelli and Wasserman (2024) consider a toy example where $X_1, \ldots, X_n \sim N(\mu, \sigma^2)$ and we are interested in estimating $\psi = \mu^2$. Using the estimator $\widehat{\psi} = \bar{X}_n^2$, it follows that $\sqrt{n}(\widehat{\psi} - \psi) \rightsquigarrow N(0, \eta^2)$ with $\eta^2 = 4\mu^2\sigma^2$ when $\mu \neq 0$, and $n\widehat{\psi} \rightsquigarrow \sigma^2\chi^2_1$ when $\mu = 0$. Moreover, when $\mu$ is close to zero, the distribution of $\widehat{\psi}$ will be neither normal nor $\chi^2_1$, and its rate of convergence will be between $1/n$ and $1/\sqrt{n}$. This is a common (and perhaps understudied) problem in statistics; other parameters, such as kernel two-sample statistics (Gretton et al., 2012) and Reproducing Kernel Hilbert Space corrections (Sejdinovic et al., 2013), suffer from the same change in limiting behavior at the null.
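The toy example of Verdinelli and Wasserman (2024) is easy to simulate. Since $\bar{X}_n \sim N(\mu, \sigma^2/n)$ exactly, the sketch below draws $\bar{X}_n$ directly and checks the two regimes: away from the null, $\sqrt{n}(\widehat{\psi} - \psi)$ has variance close to $\eta^2 = 4\mu^2\sigma^2$, while at the null $n\widehat{\psi}$ behaves like $\sigma^2\chi^2_1$, whose mean is $\sigma^2$. The sample size, number of replications, and seeds are arbitrary choices.

```python
import math
import random
import statistics


def simulate_psi_hat(mu, sigma, n, reps, seed):
    """Draw reps copies of psi_hat = Xbar_n^2 when X_1, ..., X_n ~ N(mu, sigma^2).

    Xbar_n is exactly N(mu, sigma^2 / n), so it is sampled directly.
    """
    rng = random.Random(seed)
    sd = sigma / math.sqrt(n)
    return [rng.gauss(mu, sd) ** 2 for _ in range(reps)]


n, reps = 400, 20000

# Away from the null: sqrt(n) (psi_hat - psi) is approximately N(0, 4 mu^2 sigma^2) = N(0, 4).
away = simulate_psi_hat(1.0, 1.0, n, reps, seed=0)
var_away = statistics.variance([math.sqrt(n) * (p - 1.0) for p in away])

# At the null: n * psi_hat is exactly sigma^2 * chi^2_1, with mean sigma^2 = 1.
at_null = simulate_psi_hat(0.0, 1.0, n, reps, seed=1)
mean_null = statistics.fmean([n * p for p in at_null])
```

The same estimator therefore needs a $\sqrt{n}$ scaling in one regime and an $n$ scaling in the other, which is the source of the difficulty for $\widehat{\Psi}_n^2$.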
In the context of distributional discontinuity design, the delta method fails at the null for our functional $\Psi^2 = \int_0^1 [\Delta Q(u)]^2\,du$, because its first-order derivative vanishes when $\Delta Q \equiv 0$; we must therefore construct our hypothesis tests and confidence intervals around this fact. We first consider testing the null hypothesis of no distributional change above and below the cutoff; that is, $\Delta Q(u) = 0$ for all $u \in (0, 1)$ (or, equivalently, $\Psi = 0$). Then, we define two methods of constructing valid (but conservative) confidence intervals for $\Psi$.

3.4.1 Testing the Null Hypothesis

In this section, we test the null hypothesis of no causal effect. Under the null, it follows that $\Delta Q(u) = 0$ for all $u \in (0, 1)$. Furthermore, as discussed in Section 3.3, $\sqrt{nh}\,\Delta\widehat{Q}(u) \rightsquigarrow \mathbb{G}(u)$, where $\mathbb{G}(u)$ is a mean-zero Gaussian process with covariance kernel $\kappa$. From here, we may apply the Karhunen-Loève theorem (Karhunen, 1946; Loève, 1977) to expand $\mathbb{G}(u)$ as
$$\mathbb{G}(u) = \sum_{k=1}^{\infty} \sqrt{\lambda_k}\, Z_k\, \phi_k(u),$$
where $\{\phi_k\}_{k=1}^{\infty}$ is an orthonormal basis of $L^2([0, 1])$ given by the eigenfunctions of the covariance operator induced by the kernel $\kappa(u, v)$ (with eigenvalues $\lambda_1, \lambda_2, \ldots$), and the $Z_k \sim N(0, 1)$ are independent for all $k$. Then, it follows that
$$nh\,\widehat{\Psi}_n^2 = \int_0^1 \left(\sqrt{nh}\,\Delta\widehat{Q}(u)\right)^2 du \rightsquigarrow \sum_{k=1}^{\infty} \lambda_k Z_k^2, \tag{2}$$
which is a second-order Gaussian (or Wiener-Itô) chaos (Janson, 1997). From here, there are several ways to conduct the hypothesis test. The first option is to directly estimate the eigenvalues of $\kappa(u, v) = \mathrm{Cov}(\mathbb{G}(u), \mathbb{G}(v))$ and approximate the limit in Equation (2) via Monte Carlo simulation. Although in principle this appears to be a straightforward procedure, the validity of such a test is not automatic, as we must estimate $\lambda_1, \lambda_2, \ldots$, truncate the sum to $\sum_{k=1}^{K} \lambda_k Z_k^2$ for some $K$, and approximate the null distribution via Monte Carlo simulation.
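Estimating $\lambda_1, \lambda_2, \ldots$ amounts to an eigenproblem for the covariance operator. One common discretization, the Nyström method (our choice here, not one prescribed by the text), evaluates $\kappa$ on a midpoint grid of $m$ points and takes the eigenvalues of the matrix $\kappa(u_i, u_j)/m$. The sketch below extracts the leading eigenvalues by power iteration with deflation. As a hypothetical stand-in for an estimated kernel $\widehat{\kappa}_n$, it uses the Brownian bridge covariance $\kappa(u, v) = \min(u, v) - uv$, whose eigenvalues $1/(k\pi)^2$ are known in closed form.

```python
import math


def nystrom_matrix(kappa, m):
    """A[i][j] = kappa(u_i, u_j) / m on the midpoint grid u_i = (i + 0.5) / m."""
    us = [(i + 0.5) / m for i in range(m)]
    return [[kappa(s, t) / m for t in us] for s in us]


def leading_eigenvalues(a, k, iters=100):
    """Top-k eigenvalues of a symmetric PSD matrix via power iteration and deflation."""
    m = len(a)
    a = [row[:] for row in a]  # work on a copy
    eigs = []
    for _ in range(k):
        # Generic start vector; it is not symmetric about the grid midpoint, so it
        # is not orthogonal to the antisymmetric eigenvectors.
        v = [1.0 + i / m for i in range(m)]
        for _ in range(iters):
            w = [sum(aij * vj for aij, vj in zip(row, v)) for row in a]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        w = [sum(aij * vj for aij, vj in zip(row, v)) for row in a]
        lam = sum(vi * wi for vi, wi in zip(v, w))  # Rayleigh quotient
        eigs.append(lam)
        # Deflate: remove the component lam * v v' before finding the next eigenvalue.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(m)] for i in range(m)]
    return eigs


def brownian_bridge(u, v):
    """Hypothetical stand-in for an estimated covariance kernel."""
    return min(u, v) - u * v


eigs = leading_eigenvalues(nystrom_matrix(brownian_bridge, 100), 3)
```

Power iteration is adequate for a handful of well-separated eigenvalues; a full symmetric eigensolver would be the natural replacement when many are needed.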
In the following theorem, we formally establish the conditions required to obtain a valid level-$\alpha$ test under this procedure.

Theorem 3 (Eigenvalue Test). Suppose that $\sum_{k=1}^{\infty} \lambda_k < \infty$ with $\lambda_1 > 0$, and define the Monte Carlo draws
$$\widehat{T}^*_{K_n, b} = \sum_{k=1}^{K_n} \widehat{\lambda}_{k,n} Z_{k,b}^2,$$
where $\widehat{\lambda}_{k,n}$ are the estimated eigenvalues and $Z_{k,b} \sim N(0, 1)$ for $k \geq 1$ and $b = 1, \ldots, B_n$. Let $\widehat{c}^*_{n,\alpha}$ be the empirical $(1 - \alpha)$ quantile computed from $\{\widehat{T}^*_{K_n, b}\}_{b=1}^{B_n}$. Then, supposing that $B_n \to \infty$ and $K_n \to \infty$, for any $\alpha \in (0, 1)$ it follows that
$$\lim_{n \to \infty} P_{H_0}\left(nh\,\widehat{\Psi}_n^2 > \widehat{c}^*_{n,\alpha}\right) = \alpha$$
as long as $\|\widehat{\kappa}_n - \kappa\|_2 = o_P(K_n^{-1/2})$.

By Theorem 3, we can see that the conditions required to obtain a level-$\alpha$ test using Monte Carlo simulation depend crucially on the number of terms included in our truncation, $K_n$. Importantly, simulating the critical value introduces two errors: the truncation error, controlled by $\sum_{k > K_n} \lambda_k$, and the estimation error, controlled by $\sqrt{K_n}\,\|\widehat{\kappa}_n - \kappa\|_2$. Thus, $K_n$ must diverge to eliminate the truncation error, but not so fast that the estimation error fails to vanish. One way to obtain a rate for $K_n$ is to assume polynomial eigenvalue decay of the form $\lambda_k \lesssim k^{-\beta}$ for some $\beta > 1$; in the following corollary we formalize this notion.

Corollary 1 (Eigenvalue Decay). Assume the conditions of Theorem 3 hold and suppose that $\|\widehat{\kappa}_n - \kappa\|_2 = O_P(r_n)$ for some $r_n \to 0$. Furthermore, suppose that there exist constants $C_\lambda > 0$ and $\beta > 1$ such that $\lambda_k \leq C_\lambda k^{-\beta}$ for all $k$. Then, letting $K_n \asymp r_n^{-2/(2\beta - 1)}$ balances the truncation bias and estimation error, such that
$$\sum_{k > K_n} \lambda_k = O\!\left(r_n^{\frac{2(\beta - 1)}{2\beta - 1}}\right) \quad\text{and}\quad \sqrt{K_n}\,\|\widehat{\kappa}_n - \kappa\|_2 = O_P\!\left(r_n^{\frac{2(\beta - 1)}{2\beta - 1}}\right).$$
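Theorem 3 and Corollary 1 translate into a short Monte Carlo routine. The sketch below sets $K_n$ by the rate in Corollary 1 with $r_n = (nh)^{-1/2}$ and an arbitrary proportionality constant of one, then simulates the empirical $(1 - \alpha)$ quantile of $\sum_{k \le K_n} \widehat{\lambda}_{k,n} Z_{k,b}^2$. For illustration it plugs in the Brownian bridge eigenvalues $1/(k\pi)^2$ (so $\beta = 2$) in place of estimated ones; the function names and the value $nh = 1000$ are hypothetical.

```python
import math
import random


def truncation_level(nh, beta, const=1.0):
    """K_n = const * r_n^{-2/(2 beta - 1)} with r_n = (nh)^{-1/2}, rounded to an integer."""
    r_n = nh ** -0.5
    return max(1, round(const * r_n ** (-2.0 / (2.0 * beta - 1.0))))


def mc_critical_value(eigenvalues, alpha, draws, seed=0):
    """Empirical (1 - alpha) quantile of sum_k lambda_k Z_k^2 over Monte Carlo draws."""
    rng = random.Random(seed)
    sims = sorted(
        sum(lam * rng.gauss(0.0, 1.0) ** 2 for lam in eigenvalues) for _ in range(draws)
    )
    return sims[min(draws - 1, math.ceil((1.0 - alpha) * draws) - 1)]


k_n = truncation_level(nh=1000.0, beta=2.0)  # (nh)^{1/3} = 10 when beta = 2
lams = [1.0 / (k * math.pi) ** 2 for k in range(1, k_n + 1)]  # hypothetical eigenvalues
crit = mc_critical_value(lams, alpha=0.05, draws=10000)
# Reject H0 at level alpha when nh * Psi_hat_n^2 exceeds crit.
```

For the untruncated Brownian bridge limit (the Cramér-von Mises distribution), the 5% critical value is about 0.461; truncating at $K_n = 10$ drops roughly $\sum_{k > 10} 1/(k\pi)^2 \approx 0.0096$ of the mean, so the simulated value sits slightly below it.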
Corollary 1 clarifies the relationship between the truncation bias and the estimation error, as well as between $K_n$ and the rate of eigenvalue decay. Clearly, faster eigenvalue decay (i.e., a larger $\beta$) allows for a smaller $K_n$; in this setting there is less sensitivity to the estimation of $\kappa$. Conversely, slower decay requires a larger $K_n$ and therefore more accurate estimation of the covariance operator. A natural choice for the rate is $r_n \asymp (nh)^{-1/2}$, as this aligns with the effective sample size in a discontinuity design setting.

While Theorem 3 and Corollary 1 establish a useful testing framework, choosing $K_n$ in practice can be tricky. That, combined with the computational burden of Monte Carlo simulation, suggests the eigenvalue test may be less than desirable for practitioners. Alternatively, one could leverage Theorem 5 of Luedtke et al. (2018) to obtain a conservative but computationally simple statistical test. Specifically, Luedtke et al. (2018) derive nonparametric tests of equality in distribution between unknown functions; they show that such a test statistic also manifests as a Gaussian chaos, whose tail can be easily bounded by applying a one-sided Chebyshev inequality. In the following proposition, we leverage their results to obtain a conservative test for no causal effect.

Proposition 1 (Conservative Test). Let $\mu = \int_0^1 \kappa(u, u)\,du$ and $\sigma^2 = 2\int_0^1\int_0^1 \kappa(u, v)^2\,du\,dv$. Fix $\alpha \in (0, 1)$ and define $c^{\mathrm{ub}}_{1-\alpha} = \mu + \sigma\sqrt{(1 - \alpha)/\alpha}$. Then, by Luedtke et al. (2018), it follows that
$$\limsup_{n \to \infty} P_{H_0}\left(nh\,\widehat{\Psi}_n^2 > \widehat{c}^{\,\mathrm{ub}}_{n,1-\alpha}\right) \leq \alpha,$$
where $\widehat{c}^{\,\mathrm{ub}}_{n,1-\alpha} = \widehat{\mu} + \widehat{\sigma}\sqrt{(1 - \alpha)/\alpha}$ for any estimators such that $\widehat{\mu} \xrightarrow{p} \mu$ and $\widehat{\sigma} \xrightarrow{p} \sigma$.

Note that we may equivalently define $\mu$ and $\sigma^2$ as $\sum_{k=1}^{\infty} \lambda_k$ and $2\sum_{k=1}^{\infty} \lambda_k^2$, respectively. Proposition 1 provides us with a more convenient statistical test that requires fewer assumptions on the estimation error of $\kappa$.
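The conservative critical value in Proposition 1 needs only two integrals of the covariance kernel and no eigendecomposition. The sketch below approximates $\mu = \int_0^1 \kappa(u, u)\,du$ and $\sigma^2 = 2\int_0^1\int_0^1 \kappa(u, v)^2\,du\,dv$ with a midpoint rule and returns $\mu + \sigma\sqrt{(1 - \alpha)/\alpha}$. The Brownian bridge kernel is again a hypothetical stand-in for $\widehat{\kappa}_n$; for it, $\mu = 1/6$ and $\sigma^2 = 1/45$.

```python
import math


def conservative_critical_value(kappa, alpha, m=400):
    """Return (mu, sigma^2, mu + sigma * sqrt((1 - alpha) / alpha)) as in Proposition 1."""
    us = [(i + 0.5) / m for i in range(m)]  # midpoint grid on (0, 1)
    mu = sum(kappa(u, u) for u in us) / m
    sigma2 = 2.0 * sum(kappa(s, t) ** 2 for s in us for t in us) / m ** 2
    return mu, sigma2, mu + math.sqrt(sigma2) * math.sqrt((1.0 - alpha) / alpha)


def brownian_bridge(u, v):
    """Hypothetical stand-in for an estimated covariance kernel."""
    return min(u, v) - u * v


mu, sigma2, c_ub = conservative_critical_value(brownian_bridge, alpha=0.05)
```

With $\alpha = 0.05$ this gives about 0.82, well above the roughly 0.461 Cramér-von Mises critical value of the same limit, which illustrates the price paid for avoiding the eigenspectrum.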
Practically speaking, the condition $K_n^{1/2}\|\widehat{\kappa}_n - \kappa\|_2 = o_P(1)$ required in Theorem 3 means the eigenvalue test is only trustworthy when $\widehat{\kappa}_n$ is estimated accurately enough that one can include many eigenvalues without the simulated critical value becoming sensitive to $K_n$. With a small sample size, $\widehat{\kappa}_n$ may only support a small $K_n$, making the test fragile to the truncation choice and potentially anti-conservative if $K_n$ is pushed too large. In such settings the conservative test is preferable; it avoids estimating the full eigenspectrum and instead requires only consistent estimation of $\mu$ and $\sigma$. Finally, we note that one may also reject the null hypothesis when a one-sided $1 - \alpha$ lower confidence bound for $\Psi$, constructed from the intervals defined in the following section, exceeds zero.

3.4.2 Constructing Confidence Intervals

As discussed in Verdinelli and Wasserman (2024), constructing confidence intervals for quadratic parameters with uniformly correct coverage (with length $n^{-1/2}$ away from the null and length $n^{-1}$ at the null) is an unsolved problem in statistics. In practice, we deal with this problem by constructing intervals that are conservative near the null. We consider two approaches for constructing such intervals. Later, in Section 5, we compare the coverage and width of both methods via simulation.

First, we consider constructing a confidence interval for $\Psi$ using the uniform confidence band for the quantile treatment effect. As shown by Chiang et al. (2019), we can construct a multiplier bootstrap process $\mathbb{G}^*_n$ such that $\mathbb{G}^*_n \rightsquigarrow \mathbb{G}$. Therefore, if we let $\widehat{c}_{n,\alpha}$ be the $1 - \alpha$ conditional quantile of $\sup_u |\mathbb{G}^*_n(u)|$, it follows that $\Delta\widehat{Q}(u) \pm \widehat{c}_{n,\alpha}/\sqrt{nh}$ yields a $1 - \alpha$ confidence band. With that in mind, let
$$a_n(u) = \Delta\widehat{Q}(u) - \frac{\widehat{c}_{n,\alpha}}{\sqrt{nh}} \quad\text{and}\quad b_n(u) = \Delta\widehat{Q}(u) + \frac{\widehat{c}_{n,\alpha}}{\sqrt{nh}}.$$
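Turning the band endpoints $a_n(u)$ and $b_n(u)$ into an interval for $\Psi^2$ only requires the smallest and largest values of $x^2$ on each pointwise interval $[a_n(u), b_n(u)]$, integrated over $u$. The sketch below assumes $\Delta\widehat{Q}$ has already been evaluated on a midpoint grid in $u$ and that the half-width $\widehat{c}_{n,\alpha}/\sqrt{nh}$ is supplied; the function name is hypothetical.

```python
def psi2_interval(delta_q, half_width):
    """Integrated pointwise bounds on Delta Q(u)^2 from the band Delta Q_hat(u) +/- half_width.

    delta_q holds Delta Q_hat at the midpoints (k + 0.5) / m, k = 0, ..., m - 1.
    """
    lower = upper = 0.0
    for d in delta_q:
        a, b = d - half_width, d + half_width
        upper += max(a * a, b * b)                    # largest x^2 on [a, b]
        lower += max(a, 0.0) ** 2 + min(b, 0.0) ** 2  # smallest x^2 on [a, b]
    m = len(delta_q)
    return lower / m, upper / m
```

Wherever the band straddles zero, the lower integrand is exactly zero, which is why this interval is conservative near the null.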
Then, over any interval $[a, b]$,
$$\max_{x \in [a,b]} x^2 = \begin{cases} b^2, & a \geq 0, \\ a^2, & b \leq 0, \\ \max\{a^2, b^2\}, & a < 0 < b, \end{cases} \qquad \min_{x \in [a,b]} x^2 = \begin{cases} a^2, & a \geq 0, \\ b^2, & b \leq 0, \\ 0, & a < 0 < b. \end{cases}$$
Therefore, if we define the upper and lower bounds
$$\overline{M}_n(u) = \max\{a_n^2(u), b_n^2(u)\} \quad\text{and}\quad \underline{M}_n(u) = \left(\max\{a_n(u), 0\}\right)^2 + \left(\min\{b_n(u), 0\}\right)^2,$$
then it becomes straightforward to construct the interval $C_n = \left[\int_0^1 \underline{M}_n(u)\,du,\ \int_0^1 \overline{M}_n(u)\,du\right]$. Under the regularity conditions established in Chiang et al. (2019), it immediately follows that
$$\liminf_{n \to \infty} P(\Psi^2 \in C_n) \geq 1 - \alpha.$$
Alternatively, we can artificially widen our confidence interval following the approach of Verdinelli and Wasserman (2024). Specifically, we could define
$$C'_n = \left[\widehat{\Psi}_n^2 \pm z_{1-\alpha/2}\sqrt{\widehat{s}_n^2 + \frac{c^2}{nh}}\right], \tag{3}$$
where $\widehat{s}_n$ is the estimated standard deviation of $\widehat{\Psi}_n^2$, $z_{1-\alpha/2}$ is the $1 - \alpha/2$ quantile of a standard normal distribution, and $c$ is some constant, such as $\mathbb{V}(Y)$. We now confirm that this provides a valid, but possibly conservative, confidence interval.

Lemma 2 (Conservative Interval). Let $C'_n$ be the interval defined in Equation (3) for some constant $c > 0$. Suppose that $\mathbb{E}[\widehat{\Psi}_n^2 - \Psi^2] = o((nh)^{-1/2})$ and $\mathbb{V}(\widehat{\Psi}_n^2) = o((nh)^{-1})$. Then, it follows that $P(\Psi^2 \notin C'_n) = o(1)$.

In practice, either of the proposed methods for constructing confidence intervals for $\Psi$ is reasonable; their empirical widths are further discussed in Section 5. Additionally, as noted in Section 3.4.1, we can test the null hypothesis of no causal effect by checking whether zero is in $C_n$ or $C'_n$. In the following section, we extend our analysis to the fuzzy treatment assignment setting.

3.5 Fuzzy Distributional Discontinuity Design

In many applications, treatment assignment above and below the cutoff is not perfectly sharp.
That is, although the probability of receiving treatment jumps discontinuously at the threshold, some units below the threshold may receive treatment, and some above may not; such settings are referred to as “fuzzy” regression discontinuity designs (Hahn et al., 2001). Intuitively, in this setting the cutoff acts as an instrument for treatment status: crossing the threshold changes the likelihood of treatment but does not fix it. It therefore no longer makes sense to directly compare outcome distributions above and below the cutoff, because these groups differ in more than treatment status. Notably, Frandsen et al. (2012) extend the framework proposed by Angrist et al. (1996) to define local always-takers, never-takers, compliers, defiers, and indefinites. To do so, let $X_i$ be the running variable with cutoff $x_0$, and now let $A_i(x)$ denote unit $i$'s potential treatment status if the running variable were $x$. Observed treatment is then $A_i = A_i(X_i)$. We define the one-sided treatment limits (when they exist) as
$$A_i^- = \lim_{x \uparrow x_0} A_i(x) \quad\text{and}\quad A_i^+ = \lim_{x \downarrow x_0} A_i(x),$$
such that $A_i^-$ is the treatment status that would be received if the running variable approached the cutoff from the left and $A_i^+$ the treatment status that would be received if it approached from the right. We may then define the following mutually exclusive groups:
• Local Always-Takers: $\mathcal{AT} = \{i : A_i^- = 1, A_i^+ = 1\}$.
• Local Never-Takers: $\mathcal{NT} = \{i : A_i^- = 0, A_i^+ = 0\}$.
• Local Compliers: $\mathcal{C} = \{i : A_i^- = 0, A_i^+ = 1\}$.
• Local Defiers: $\mathcal{D} = \{i : A_i^- = 1, A_i^+ = 0\}$.
• Local Indefinites: $\mathcal{I} = \{i : \text{one or both of } (A_i^-, A_i^+) \text{ do not exist}\}$.
We now focus on the subpopulation of local compliers when defining a fuzzy distributional effect, as this is the group whose treatment status is actually changed by the treatment discontinuity.
In this setting, we require additional assumptions for causal identification. Beyond the assumptions discussed in Section 3.1, we need to assume:
(v) Treatment Discontinuity: $\lim_{x \downarrow x_0} P(A = 1 \mid X = x) > \lim_{x \uparrow x_0} P(A = 1 \mid X = x)$.
(vi) Local Smoothness: $\mathbb{E}[A^{\pm} \mid X = x]$ and $F_{Y(a)|G = g, X}(y \mid g, x)$ are continuous at $X = x_0$, the latter for all $y$, each $a \in \{0, 1\}$, and each $g \in \{\mathcal{AT}, \mathcal{NT}, \mathcal{C}, \mathcal{D}\}$.
(vii) Monotonicity: $\lim_{x \to x_0} P(A^+ \geq A^- \mid X = x) = 1$ and $P(\mathcal{I}) = 0$.
Assumption (v) simply requires that the probability of treatment changes discontinuously at the threshold $X = x_0$. Assumption (vi) requires that the fraction of units that would take treatment evolves smoothly as the treatment cutoff is approached. This guarantees that the only discontinuity in observed treatment assignment is the jump at $X = x_0$, and not some hidden break in the treatment assignment mechanism. Furthermore, (vi) requires that, within each compliance group, the distribution of the potential outcomes varies smoothly with the running variable at the cutoff. This again ensures that any discontinuity in observed outcome distributions is attributable to the change in the probability of treatment, rather than to a discontinuity in the potential outcome distributions themselves. Assumption (vii) rules out the existence of defiers (units that always go against their treatment assignment) and indefinites (units with ill-defined treatment limits) in a neighborhood of the treatment discontinuity. Intuitively, this assumption implies that all units weakly comply with the treatment assignment mechanism, such that moving from below to above the cutoff can only increase the chance of treatment. Furthermore, it ensures that every unit has well-defined potential treatment statuses.
With these assumptions in place, it follows that the group identified by the discontinuity consists of genuine compliers. Under assumptions (v)-(vii), Frandsen et al. (2012) show that the cumulative distribution functions for compliers above and below the cutoff are identified as
$$F_{1|\mathcal{C}}(y) = \frac{\lim_{x \downarrow x_0} \mathbb{E}[I(Y \leq y) A \mid X = x] - \lim_{x \uparrow x_0} \mathbb{E}[I(Y \leq y) A \mid X = x]}{\lim_{x \downarrow x_0} \mathbb{E}[A \mid X = x] - \lim_{x \uparrow x_0} \mathbb{E}[A \mid X = x]}$$
and
$$F_{0|\mathcal{C}}(y) = \frac{\lim_{x \downarrow x_0} \mathbb{E}[I(Y \leq y)(1 - A) \mid X = x] - \lim_{x \uparrow x_0} \mathbb{E}[I(Y \leq y)(1 - A) \mid X = x]}{\lim_{x \downarrow x_0} \mathbb{E}[1 - A \mid X = x] - \lim_{x \uparrow x_0} \mathbb{E}[1 - A \mid X = x]}.$$
Therefore, the Wasserstein effect for compliers is defined and identified as
$$\Psi_{\mathcal{C}} = \left(\int_0^1 \left(Q_{1|\mathcal{C}}(u) - Q_{0|\mathcal{C}}(u)\right)^2 du\right)^{1/2},$$
where $Q_{a|\mathcal{C}}(u) = \inf\{y : F_{a|\mathcal{C}}(y) \geq u\}$ for $a \in \{0, 1\}$ are the quantiles of the complier cumulative distribution functions above and below the treatment discontinuity. Interpretation of the fuzzy Wasserstein effect follows analogously to the sharp case: $\Psi_{\mathcal{C}}$ still acts as a distributional analogue of the local average treatment effect for compliers at the cutoff, and weakly upper bounds its absolute value. Moreover, the same inequalities and decompositions can be extended to $\Psi_{\mathcal{C}}^2$. Now that we have defined the Wasserstein effect in a fuzzy distributional discontinuity design framework, we discuss estimation.

3.5.1 Estimation of the Fuzzy Wasserstein Effect

To estimate the fuzzy Wasserstein effect, we can directly extend the procedure described in Section 3.3, again following the work of Chiang et al. (2019). For $a \in \{0, 1\}$ we define
$$G_a(y \mid x) = \mathbb{E}[I(Y \leq y)\, I(A = a) \mid X = x] \quad\text{and}\quad \pi_a(x) = \mathbb{E}[I(A = a) \mid X = x].$$
Then, taking the corresponding one-sided limits, we have
$$F_{a|\mathcal{C}}(y) = \frac{G_a(y \mid x_0^+) - G_a(y \mid x_0^-)}{\pi_a(x_0^+) - \pi_a(x_0^-)}.$$
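The local Wald ratio for $F_{a|\mathcal{C}}$ can be checked on a hypothetical population in which the one-sided limits are known exactly. The sketch below assumes shares of always-takers, never-takers, and compliers at the cutoff (with no defiers or indefinites, as in assumption (vii)) and normal potential outcome distributions chosen purely for illustration, and confirms that the ratio returns the complier distributions. With estimated limits, the ratio need not be monotone in $y$ or lie in $[0, 1]$.

```python
from statistics import NormalDist


def complier_cdf(g_plus, g_minus, pi_plus, pi_minus):
    """Local Wald ratio (G_a(y|x0+) - G_a(y|x0-)) / (pi_a(x0+) - pi_a(x0-))."""
    denom = pi_plus - pi_minus
    if denom == 0.0:
        raise ValueError("no first stage: the treatment share does not jump at x0")
    return (g_plus - g_minus) / denom


# Hypothetical population at the cutoff: shares and potential outcome CDFs.
P_AT, P_NT, P_C = 0.2, 0.3, 0.5
F_AT = NormalDist(1.0, 1.0).cdf    # always-takers, treated outcome
F_NT = NormalDist(-1.0, 1.0).cdf   # never-takers, untreated outcome
F_1C = NormalDist(0.5, 1.0).cdf    # compliers under treatment
F_0C = NormalDist(0.0, 1.0).cdf    # compliers under control


def limits(y, a):
    """One-sided limits (G_a(y|x0+), G_a(y|x0-), pi_a(x0+), pi_a(x0-)) in this population."""
    if a == 1:  # compliers are treated just above the cutoff and untreated just below
        return P_AT * F_AT(y) + P_C * F_1C(y), P_AT * F_AT(y), P_AT + P_C, P_AT
    return P_NT * F_NT(y), P_NT * F_NT(y) + P_C * F_0C(y), P_NT, P_NT + P_C
```

The always-taker and never-taker terms cancel in the numerator, and the denominator is the complier share (with a sign flip for $a = 0$), which is exactly why the ratio isolates compliers.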
To estimate $G_a(y \mid x_0^{\pm})$ and $\pi_a(x_0^{\pm})$ we use one-sided local polynomial regression, mirroring the sharp case. Specifically, for $G_a(y \mid x)$ we solve
$$\widehat{\alpha}^{G}_{a,p}(y) = \arg\min_{\alpha \in \mathbb{R}^{p+1}} \sum_{i=1}^n \delta_i^{\pm}\left(I(Y_i \leq y)\, I(A_i = a) - r_p\!\left(\frac{X_i - x_0}{h}\right)^{\!\top}\alpha\right)^2 K\!\left(\frac{X_i - x_0}{h}\right)$$
on each side of the cutoff. Analogously, for $\pi_a(x_0^{\pm})$ we solve for $\widehat{\alpha}^{\pi}_{a,p}$ by letting $I(A_i = a)$ be the dependent variable. Then, it follows that $\widehat{G}_a(y \mid x_0^{\pm}) = e_0^\top \widehat{\alpha}^{G}_{a,p}(y)$ and $\widehat{\pi}_a(x_0^{\pm}) = e_0^\top \widehat{\alpha}^{\pi}_{a,p}$, which yields the local Wald estimator
$$\widehat{F}_{a|\mathcal{C}}(y) = \frac{\widehat{G}_a(y \mid x_0^+) - \widehat{G}_a(y \mid x_0^-)}{\widehat{\pi}_a(x_0^+) - \widehat{\pi}_a(x_0^-)}.$$
Finally, the complier quantile function is given by $\widehat{Q}_{a|\mathcal{C}}(u) = \inf\{y : \widehat{F}_{a|\mathcal{C}}(y) \geq u\}$. Statistical inference for the fuzzy Wasserstein effect follows exactly as in the sharp case, as described in Section 3.4.

4 Distributional Kink Designs

In many practical settings, we do not observe a discontinuity in the treatment assignment, but rather a kink, or change in slope, of the policy. The idea here is the same as under regression discontinuity designs: units arbitrarily close to either side of the policy kink are assumed to be comparable, and therefore a causal interpretation can be justified. A canonical application of regression kink designs is that of Nielsen et al. (2010), who estimate the causal effect of student grants on college enrollment in Denmark. Here, the running variable $X$ is a continuous measure of parental income, and the benefit $b(X)$ exhibits kinks at different eligibility thresholds; that is, the full grant is offered up to some level $X = x_1$, a linear phase-out occurs between $x_1$ and $x_2$ (with the benefit decreasing in $x$), and zero benefit is offered for parental incomes greater than $x_2$. Other notable early applications of regression kink designs can be found in Guryan (2001) and Dahlberg et al. (2008); interested readers should refer to Card et al.
(2016) for a review, and to Ando (2017) and Ganong and Jäger (2018) for discussions of inference and robustness in finite samples. In a sharp regression kink design, the benefit is set deterministically according to the known assignment rule $b(\cdot)$. One causal target is the local average marginal effect of the benefit on the outcome; that is, the slope of the dose-response curve at the policy kink,
$$\tau' = \left.\frac{\partial}{\partial t}\mathbb{E}[Y(t) \mid X = x_0]\right|_{t = b(x_0)} \overset{(i)}{=} \frac{\mu'_Y(x_0^+) - \mu'_Y(x_0^-)}{b'(x_0^+) - b'(x_0^-)},$$
where equality (i) follows from the identifying assumptions outlined in Card et al. (2015), and
$$\mu'_Y(x_0^+) = \lim_{x \downarrow x_0} \frac{\partial}{\partial x}\mathbb{E}[Y \mid X = x], \qquad b'(x_0^+) = \lim_{x \downarrow x_0} \frac{\partial}{\partial x} b(x),$$
with analogous definitions for $\mu'_Y(x_0^-)$ and $b'(x_0^-)$.

[Figure 4: Example of a hypothetical regression kink design (axis labels: $b(x)$ and $y$).]

For example, Figure 4 shows an example of a regression kink design, where there is a clear kink in the dose-response curve at $X = x_0$. As was the case for regression discontinuity design, far more information can be gained by considering distributional causal effects. Notably, quantile treatment effects in kinked designs have been explored by Chiang and Sasaki (2019), Chiang et al. (2019), Chen et al. (2020), and Wang and Zhang (2025). However, these approaches suffer from the same set of drawbacks as before, namely difficulty in implementation and interpretation. Thus, in what follows, we show that the Wasserstein derivative at the policy kink provides a clean generalization of traditional kink design effects. Let $g(t, x, \varepsilon)$ be a function of the benefit, running variable, and unobservables. Then, we may define the counterfactual $Y(t) = g(t, X, \varepsilon)$ and the observed outcome $Y = g(b(X), X, \varepsilon)$.
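The sharp kink estimand $\tau'$ is a ratio of one-sided slope changes. The sketch below estimates the one-sided slopes of $\mathbb{E}[Y \mid X = x]$ by local linear regression with a triangular kernel (our choice) and divides by the known change in $b'$. It uses a hypothetical noiseless design with a kink at $x_0 = 0$, $b'(x_0^-) = 2$, $b'(x_0^+) = 0.5$, and $Y = 2b(X) + X$, so that $\tau' = 2$.

```python
def one_sided_slope(xs, ys, x0, h, side):
    """Local linear slope of E[Y | X = x] at x0+ (side "+") or x0- (side "-")."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, y in zip(xs, ys):
        on_side = (x >= x0) if side == "+" else (x <= x0)
        d = x - x0
        w = max(0.0, 1.0 - abs(d) / h)  # triangular kernel weight (assumed)
        if on_side and w > 0.0:
            s0 += w
            s1 += w * d
            s2 += w * d * d
            t0 += w * y
            t1 += w * d * y
    # Slope coefficient from the 2x2 weighted normal equations.
    return (s0 * t1 - s1 * t0) / (s0 * s2 - s1 * s1)


def b(x):
    """Hypothetical benefit schedule with a kink at x0 = 0."""
    return 1.0 + 2.0 * x if x < 0 else 1.0 + 0.5 * x


xs = [-1.0 + 0.01 * i for i in range(201)]
ys = [2.0 * b(x) + x for x in xs]  # direct effect of X has slope 1 on both sides
slope_jump = one_sided_slope(xs, ys, 0.0, 0.5, "+") - one_sided_slope(xs, ys, 0.0, 0.5, "-")
tau = slope_jump / (0.5 - 2.0)
```

The direct effect of $X$ on $Y$ has the same slope on both sides and therefore cancels in the numerator, which is the role of the smoothness conditions in Card et al. (2015).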
Again, we let $P_{t|x}$ denote the conditional distribution of $Y(t) \mid X = x$ under some benefit or treatment level $t = b(x)$. In this setting, we can think of $P_{t|x}$ as a distribution along some absolutely continuous path of distributions in the running variable $x$. This allows us to define the Wasserstein derivative at the policy kink $X = x_0$ as
$$\Psi' = \lim_{\delta \to 0} \frac{W_2\left(P_{t_0 + \delta | x_0}, P_{t_0 | x_0}\right)}{|\delta|} = \left(\int_0^1 \left(\left.\frac{\partial}{\partial t} Q_{Y(t)|X = x_0}(u)\right|_{t = b(x_0)}\right)^2 du\right)^{1/2},$$
where we define $t_0 = b(x_0)$. Intuitively, $\Psi'$ represents the instantaneous rate at which probability mass moves, or flows, at the policy kink (Ambrosio et al., 2005). While the traditional regression kink design estimand $\tau'$ measures how the center of mass moves or drifts through the kink, $\Psi'$ measures how the entire distribution moves.

Now we consider identification of the Wasserstein derivative at the kink. Assume that $b'(x_0^+) \neq b'(x_0^-)$ and that $b(\cdot)$ is a known function. Naturally, we expect the behavior of $F_{Y|X}(y \mid x)$ near $x_0$ to provide information about the causal effect; however, making this intuition rigorous is subtle. For identification and interpretation, the classical approaches of Card et al. (2015) and Chiang and Sasaki (2019) can be surprisingly difficult to work with. In the case of mean effects, Card et al. (2015) show that $\tau'$ can be written as a weighted average of individual-level marginal effects, where the weights depend on unobservables. For quantile effects, Chiang and Sasaki (2019) obtain an analogous weighted average of structural derivatives evaluated along some latent boundary set. Both results can be hard to translate into standard treatment-effect language, e.g.,
“what is the effect of a marginal increase in the benefit?” Furthermore, these identification strategies can require additional regularity conditions to make the weights well defined; for example, Chiang and Sasaki (2019) suggest a rank-invariance assumption. More recently, Wang and Zhang (2025) proposed a more direct identification strategy that leads to a cleaner interpretation. Specifically, Wang and Zhang (2025) define the local treatment effect at the kink to be
$$\Delta_\phi = \frac{\partial}{\partial t} \phi\big(F_{Y(t) \mid X = x_0}\big)\Big|_{t = b(x_0)} = \lim_{\delta \to 0} \frac{\phi\big(F_{Y(t_0 + \delta) \mid X = x_0}\big) - \phi\big(F_{Y(t_0) \mid X = x_0}\big)}{\delta},$$
where $\phi$ is some Hadamard differentiable functional. The estimand now describes a genuinely local average marginal effect of a small policy-induced change in the benefit level around $b(x_0)$ for units at $X = x_0$. Under some regularity conditions, Wang and Zhang (2025) show that the causal effect $\Delta_\phi$ is identified as $\phi'_{F_{Y \mid X = x_0}}(\mathrm{DRKD}(\cdot))$, where $\mathrm{DRKD}(\cdot)$ is the distributional regression kink design estimand,
$$\mathrm{DRKD}(y) = \frac{\frac{\partial}{\partial x} F_{Y \mid X}(y \mid x_0^+) - \frac{\partial}{\partial x} F_{Y \mid X}(y \mid x_0^-)}{b'(x_0^+) - b'(x_0^-)}.$$
In the case of distributional kink designs, we let $\phi_u(F) = F^{-1}(u)$ denote the $u$-quantile functional. Since $Y$ is univariate, $W_2$ is the $L^2$ distance between quantile functions, so applying this identification result pointwise in $u$ shows that the Wasserstein derivative at the policy kink is identified as
$$\Psi' = \left\{ \int_0^1 \left( \frac{\frac{\partial}{\partial x} Q_{Y \mid X}(u \mid x_0^+) - \frac{\partial}{\partial x} Q_{Y \mid X}(u \mid x_0^-)}{b'(x_0^+) - b'(x_0^-)} \right)^2 du \right\}^{1/2}.$$
Notably, if $Y$ is multivariate this identification strategy does not work, because the Wasserstein distance is no longer a function of the limiting conditional quantiles; establishing and identifying such distributional causal effects in high-dimensional settings is an interesting open question. In the following section, we extend the work of Wang and Zhang (2025) to handle identification under fuzzy treatment assignment in kink designs.
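The definition of $\Psi'$ and its identification formula can be checked against each other numerically. The sketch below assumes a hypothetical nonseparable model $Y(t) = \theta t (1 + s\varepsilon) + X$ with $\varepsilon \sim N(0, 1)$ independent of $X$ and a kinked rule $b(x)$ that stays positive near $x_0 = 0$; for this model $\Psi' = \theta\sqrt{1 + s^2}$. It computes $\Psi'$ once from the finite-difference $W_2$ rate in $t$ and once from the kink in conditional quantile slopes, approximating integrals over $u$ by a midpoint grid. All parameter values and function names are illustrative.

```python
import numpy as np
from statistics import NormalDist

# Hypothetical model: Y(t) = theta * t * (1 + s * eps) + X, eps ~ N(0, 1) independent of X.
theta, s, x0 = 1.5, 0.4, 0.0
m = 20_000
u = (np.arange(1, m + 1) - 0.5) / m                   # midpoint grid on (0, 1)
z = np.array([NormalDist().inv_cdf(p) for p in u])    # standard normal quantiles

def b(x):
    """Hypothetical kinked benefit rule with slopes 0.5 (left) and 2.0 (right)."""
    return np.where(x >= 0, 1.0 + 2.0 * x, 1.0 + 0.5 * x)

def q_potential(t):
    """Quantile function of Y(t) | X = x0 on the grid u."""
    return theta * t * (1.0 + s * z) + x0

def q_observed(x):
    """Quantile function of Y | X = x, i.e. of Y(b(x)) | X = x (valid since theta * b(x) > 0)."""
    return theta * b(x) * (1.0 + s * z) + x

def w2(qa, qb):
    """W2 between two univariate laws: L2 distance between their quantile functions."""
    return np.sqrt(np.mean((qa - qb) ** 2))

step = 1e-6
t0 = float(b(x0))

# Structural definition: W2 rate of change in the benefit level t at t0 = b(x0).
psi_structural = w2(q_potential(t0 + step), q_potential(t0)) / step

# Identification formula: kink in conditional quantile slopes over the kink in b.
dq_plus = (q_observed(x0 + step) - q_observed(x0)) / step
dq_minus = (q_observed(x0) - q_observed(x0 - step)) / step
b_kink = (b(x0 + step) - b(x0)) / step - (b(x0) - b(x0 - step)) / step
psi_identified = np.sqrt(np.mean(((dq_plus - dq_minus) / b_kink) ** 2))

print(psi_structural, psi_identified, theta * np.sqrt(1 + s ** 2))
```

In this model both computations agree with $\theta\sqrt{1 + s^2}$ up to grid error, which illustrates why the quantile-slope kink recovers the structural Wasserstein derivative.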
4.1 Fuzzy Distributional Kink Designs
Although Wang and Zhang (2025) establish a clean and interpretable framework for identification of causal effects in sharp kink designs, they do not consider the fuzzy treatment assignment setting, where the running variable induces a kink in treatment propensities rather than deterministically setting a benefit level. Here, we observe a noisy analogue of $b(x)$ due to imperfect compliance, measurement error, or other unobserved determinants of behavior. More formally, suppose we observe $b(X, \eta)$, where $\eta$ captures unobserved variation in treatment assignment. There is now no single baseline level of treatment, so we must define an analogue of $\Delta_\phi$ for the fuzzy setting, and we must impose additional structure to ensure that the unobserved determinants of treatment evolve smoothly in $x$ around the kink. To formalize this, we first define a nonseparable outcome model and a fuzzy kink design characterization.
Assumption 1 (Nonseparable Model). Suppose there exist unobservables $(\varepsilon, \eta)$ and a measurable structural function $g: \mathbb{R} \times \mathcal{X} \times \mathcal{E} \to \mathbb{R}$ such that:
(i) (Potential outcomes) For each $t \in \mathbb{R}$, $Y(t) = g(t, X, \varepsilon)$.
(ii) (Fuzzy assignment) $T = b(X, \eta)$ for some measurable $b: \mathcal{X} \times \mathcal{H} \to \mathbb{R}$.
(iii) (Consistency) $Y = Y(T) = g(b(X, \eta), X, \varepsilon)$,
where $\varepsilon \in \mathcal{E} \subset \mathbb{R}^{d_\varepsilon}$, $\eta \in \mathcal{H} \subset \mathbb{R}^{d_\eta}$, and $\mathcal{X} \subset \mathbb{R}$ is the support of the running variable.
Assumption 2 (Fuzzy Kink Characterization). Let $I_{x_0}$ be a closed interval containing the kink point $x_0$. Suppose that:
(i) For a.e. $\eta$, the map $x \mapsto b(x, \eta)$ is continuous on $I_{x_0}$ and continuously differentiable on $I_{x_0} \setminus \{x_0\}$, with finite one-sided derivatives $b'(x_0^+, \eta)$ and $b'(x_0^-, \eta)$.
(ii) Let $\mu_B(x) = E[b(X, \eta) \mid X = x]$ and assume that $\Delta_B := \mu'_B(x_0^+) - \mu'_B(x_0^-) \neq 0$, where
$$\mu'_B(x_0^+) = \lim_{x \downarrow x_0} \frac{\partial}{\partial x} E[b(x, \eta) \mid X = x] \quad \text{and} \quad \mu'_B(x_0^-) = \lim_{x \uparrow x_0} \frac{\partial}{\partial x} E[b(x, \eta) \mid X = x].$$
Assumption 2 guarantees continuity at the kink point but a discontinuity in the first-order derivative. Moreover, under Assumptions 1 and 2, letting $T_0 = b(x_0, \eta)$ and $\omega(\eta) = b'(x_0^+, \eta) - b'(x_0^-, \eta)$, we can define the counterfactual treatment at the kink
$$T_\delta = T_0 + \delta \left( \frac{\omega(\eta)}{\Delta_B} \right)$$
and the associated counterfactual outcome $Y_\delta = g(T_\delta, x_0, \varepsilon)$.
Remark 2 (Counterfactual Definition). The definition of the counterfactual $T_\delta$ may seem counterintuitive at first glance, since the intervention depends on the kink-responsiveness $\omega(\eta) = b'(x_0^+, \eta) - b'(x_0^-, \eta)$; it might seem more natural to define an intervention such as $T_\delta = T_0 + \delta$, which shifts every unit by the same amount. The weighting $\omega(\eta)/\Delta_B$ is required because, in a fuzzy kink design, the identifying variation comes from the change in the slope of $E[b(X, \eta) \mid X]$ at $X = x_0$. Across the kink, a small change in the running variable shifts a unit's treatment by an additional amount proportional to $\omega(\eta)$, so units that are more responsive to the kink contribute more to the local change actually observed in the data. We normalize by $\Delta_B = E[\omega(\eta) \mid X = x_0]$ so that $\delta$ can be interpreted as a one-unit change in the average treatment at the kink, since now $E[T_\delta - T_0 \mid X = x_0] = \delta$. Thus, even though $T_\delta$ does not correspond to a uniform shift in treatment, it is the counterfactual that aligns with the observed kink variation, with weights determined by each unit's kink-responsiveness.
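To see that the normalization in Remark 2 makes $\delta$ a one-unit shift in average treatment, the sketch below simulates a hypothetical fuzzy rule $b(x, \eta) = 1 + \eta + 0.2(x - x_0) + 2\eta \max(x - x_0, 0)$ with $\eta \sim \mathrm{Uniform}(0, 1)$, so that $\omega(\eta) = 2\eta$. It recovers $\omega(\eta)$ by one-sided finite differences and forms $T_\delta$; replacing $\Delta_B$ by the sample mean of $\omega(\eta)$ at $x_0$ is an illustrative simplification, not an estimator from the paper.

```python
import numpy as np

def b(x, eta, x0=0.0):
    """Hypothetical fuzzy assignment: slope 0.2 below x0 and 0.2 + 2 * eta above x0."""
    return 1.0 + eta + 0.2 * (x - x0) + 2.0 * eta * np.maximum(x - x0, 0.0)

rng = np.random.default_rng(1)
eta = rng.uniform(0.0, 1.0, 100_000)
x0, step, delta = 0.0, 1e-6, 0.1

# Kink-responsiveness omega(eta) = b'(x0+, eta) - b'(x0-, eta) via one-sided differences.
omega = ((b(x0 + step, eta) - b(x0, eta)) - (b(x0, eta) - b(x0 - step, eta))) / step
delta_B = omega.mean()                       # sample analogue of E[omega(eta) | X = x0]

T0 = b(x0, eta)                              # baseline treatment at the kink
T_delta = T0 + delta * omega / delta_B       # compliance-weighted counterfactual treatment
avg_shift = (T_delta - T0).mean()            # equals delta by construction

print(f"average shift = {avg_shift:.6f}, shift for most responsive unit = "
      f"{(T_delta - T0)[np.argmax(eta)]:.3f}")
```

Units with $\eta$ near 1 move by roughly $2\delta$ while units with $\eta$ near 0 barely move, yet the average shift is exactly $\delta$.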
With this counterfactual outcome in place, we may now define the fuzzy local treatment effect at the kink for the functional $\phi$ as
$$\Delta^F_\phi = \frac{\partial}{\partial \delta} \phi\big(F_{Y_\delta \mid X = x_0}\big)\Big|_{\delta = 0} = \lim_{\delta \to 0} \frac{\phi\big(F_{Y_\delta \mid X = x_0}\big) - \phi\big(F_{Y_0 \mid X = x_0}\big)}{\delta},$$
provided the limit exists. Next, our goal is to obtain a structural representation of $\Delta^F_\phi$ analogous to Lemma 1 of Wang and Zhang (2025). However, before doing so we must state a few additional assumptions. First, we need smoothness assumptions identical to those required in Wang and Zhang (2025); for completeness, we write them out below.
Assumption 3 (Smooth Functional). Let $\mathcal{F}$ be the space of all one-dimensional distribution functions. Assume the functional $\phi: \mathcal{F} \to \mathbb{R}$ is Hadamard differentiable at $F_{Y \mid X = x_0}$, with Hadamard derivative denoted by $\phi'_{F_{Y \mid X = x_0}}$.
Assumption 4 (Smooth Structural Functions). The function $g(t, x, e)$ is continuously differentiable in $(t, x)$ for each $e \in \mathcal{E}$, with continuous partial derivatives $g_1(t, x, e) = \frac{\partial}{\partial t} g(t, x, e)$ and $g_2(t, x, e) = \frac{\partial}{\partial x} g(t, x, e)$.
As discussed in Wang and Zhang (2025), Assumption 4 is analogous to the smoothness conditions imposed in Card et al. (2015), but weaker than those required by the identification strategy of Chiang and Sasaki (2019). Under this smoothness condition, the partial derivative of $h(x, e, u) := g(b(x, u), x, e)$ with respect to $x$ (taken one-sidedly at $x_0$) is given by
$$\frac{\partial}{\partial x} h(x, e, u) =: h_x(x, e, u) = b'(x, u)\, g_1(b(x, u), x, e) + g_2(b(x, u), x, e). \tag{4}$$
Since $x \mapsto g_1(b(x, u), x, e)$ and $x \mapsto g_2(b(x, u), x, e)$ are continuous at $x_0$, Equation (4) implies that $x \mapsto h(x, e, u)$ is not continuously differentiable at $x_0$ because of the discontinuity in $b'(x, u)$ (whenever $g_1 \neq 0$ there). Finally, we must establish a few regularity conditions in the spirit of conditions R1(i) and R1(ii) of Wang and Zhang (2025).
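Equation (4) is a one-sided chain rule. As a quick numerical sanity check under hypothetical choices $g(t, x, e) = e \sin t + x t$ and $b(x, u) = u + 0.3x + u \max(x, 0)$ (kink at $x_0 = 0$, so $\omega = u$), the sketch below compares forward and backward difference quotients of $h(x, e, u)$ at $x_0$ with the right-hand side of Equation (4), and confirms that the jump in $h_x$ equals $\omega\, g_1(b(x_0, u), x_0, e)$. The functions and values are illustrative only.

```python
import math

def g(t, x, e):
    """Hypothetical smooth structural function."""
    return e * math.sin(t) + x * t

def g1(t, x, e):                 # partial derivative of g in t
    return e * math.cos(t) + x

def g2(t, x, e):                 # partial derivative of g in x
    return t

def b(x, u):
    """Hypothetical kinked assignment: slope 0.3 below x0 = 0 and 0.3 + u above it."""
    return u + 0.3 * x + u * max(x, 0.0)

def b_slope(u, side):
    return 0.3 + (u if side == "+" else 0.0)

def h(x, e, u):
    return g(b(x, u), x, e)

def h_x(x, e, u, side):
    """Right-hand side of Equation (4) using the one-sided slope of b."""
    t = b(x, u)
    return b_slope(u, side) * g1(t, x, e) + g2(t, x, e)

e, u, x0, step = 0.7, 0.4, 0.0, 1e-6
right_fd = (h(x0 + step, e, u) - h(x0, e, u)) / step
left_fd = (h(x0, e, u) - h(x0 - step, e, u)) / step
jump = h_x(x0, e, u, "+") - h_x(x0, e, u, "-")   # omega * g1 with omega = u here

print(right_fd, h_x(x0, e, u, "+"), left_fd, h_x(x0, e, u, "-"), jump)
```

The $g_2$ term cancels in the jump, which is why only $g_1$, weighted by the kink-responsiveness, enters the structural representations that follow.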
Assumption 5 (Regularity 1). Let $Z = (\omega(\eta)/\Delta_B)\, g_1(T_0, x_0, \varepsilon)$ and assume the following conditions hold:
(i) For each $c > 0$, $P\big(|Y_\delta - Y_0 - \delta Z| \geq c|\delta| \,\big|\, X = x_0\big) = o(|\delta|)$ as $\delta \to 0$.
(ii) The conditional distribution of $(Y, Z)$ given $X = x_0$ is absolutely continuous with respect to Lebesgue measure and has a joint density $f_{Y,Z \mid X}(y, y' \mid x_0)$ that is continuous in $y$ for all $y'$. Furthermore, there exists a Lebesgue integrable function $\varpi: \mathbb{R} \to \mathbb{R}$ with $\int |y' \varpi(y')|\, dy' < \infty$ such that $f_{Y,Z \mid X}(y, y' \mid x_0) \leq |\varpi(y')|$ for all $(y, y')$.
Assumption 5(i) is a stochastic differentiability condition along the counterfactual path $\delta \mapsto Y_\delta$ induced by the fuzzy kink. It requires that, conditional on $X = x_0$, the change in outcomes from shifting treatment from $T_0$ to $T_0 + \delta(\omega(\eta)/\Delta_B)$ admits a first-order expansion with a remainder that is small enough to control. Assumption 5(ii) ensures that the joint density of $(Y, Z) \mid X = x_0$ is well behaved. Relative to the regularity conditions required in Wang and Zhang (2025), the difference is that the derivative direction can now vary across units. That is, $Z$ includes the random compliance weight $\omega(\eta)/\Delta_B$, so the domination and integrability requirements must control the weighted marginal effect $(\omega(\eta)/\Delta_B)\, g_1(T_0, x_0, \varepsilon)$ and not just $g_1(T_0, x_0, \varepsilon)$ alone. Note that we do not need an additional smooth outcome distribution assumption like Assumption S4 of Wang and Zhang (2025), since Assumption 5(ii) already implies that $f_{Y \mid X}(\cdot \mid x_0)$ is continuous in $y$. With these conditions in place, we can now establish a structural representation of $\Delta^F_\phi$.
Lemma 3 (Fuzzy Structural Representation). Suppose that Assumptions 1-5 hold.
Then the fuzzy local treatment effect at the kink $\Delta^F_\phi$ admits the representation
$$\Delta^F_\phi = \phi'_{F_{Y \mid X = x_0}}\left( -f_{Y \mid X}(\cdot \mid x_0)\, E\left[ \frac{\omega(\eta)}{\Delta_B}\, g_1(T_0, x_0, \varepsilon) \,\Big|\, Y = \cdot,\, X = x_0 \right] \right).$$
As in Wang and Zhang (2025), Lemma 3 shows that the fuzzy local treatment effect at the kink can be expressed as the Hadamard derivative of $\phi$ applied to a conditional expectation. Notably, this conditional expectation is analogous to the local average structural derivative discussed in Hoderlein and Mammen (2007, 2009). The primary difference between Lemma 3 and Lemma 1 of Wang and Zhang (2025) is that the conditional expectation is now compliance weighted: it averages $g_1$ evaluated at each unit's (random) baseline treatment $T_0 = b(x_0, \eta)$, weighted by the unit's kink-responsiveness $\omega(\eta)$, so that units whose treatment is more strongly shifted by the kink contribute more to the identified marginal effect. With this structural representation in place, we can now turn to causal identification of the fuzzy local treatment effect at the kink; however, we first need a few more assumptions and regularity conditions.
Assumption 6 (Smooth Disturbance Distributions). The conditional distribution of $(\varepsilon, \eta)$ given $X = x$ is absolutely continuous with respect to Lebesgue measure. Furthermore, it admits a density $f_{\varepsilon,\eta \mid X}(e, u \mid x)$ that is continuously differentiable in $x$ on $I_{x_0}$ for all $(e, u)$. In addition, assume there exists a Lebesgue integrable function $\varpi(e, u)$ such that
$$\sup_{x \in I_{x_0}} \left| \frac{\partial}{\partial x} f_{\varepsilon,\eta \mid X}(e, u \mid x) \right| \leq |\varpi(e, u)|.$$
Finally, for each $y$, assume $I(h(x_0 + t, e, u) \leq y) \to I(h(x_0, e, u) \leq y)$ as $t \to 0$ for all $(e, u)$.
Assumption 7 (Regularity 2). Recall that we use the notational shorthand $h_x(x_0^\pm, \varepsilon, \eta) := \frac{\partial}{\partial x} h(x_0^\pm, \varepsilon, \eta)$.
Assume the following conditions hold:
(i) For each $c > 0$,
$$P\big(|h(x_0 + \delta, \varepsilon, \eta) - h(x_0, \varepsilon, \eta) - \delta\, h_x(x_0^+, \varepsilon, \eta)| \geq c|\delta| \,\big|\, X = x_0\big) = o(|\delta|),$$
$$P\big(|h(x_0 + \delta, \varepsilon, \eta) - h(x_0, \varepsilon, \eta) - \delta\, h_x(x_0^-, \varepsilon, \eta)| \geq c|\delta| \,\big|\, X = x_0\big) = o(|\delta|),$$
as $\delta \downarrow 0$ and $\delta \uparrow 0$, respectively.
(ii) The conditional distributions of $(Y, h_x(x_0^\pm, \varepsilon, \eta))$ given $X = x_0$ are absolutely continuous with respect to Lebesgue measure, with densities $f_{Y, h^\pm_x \mid X}(y, y' \mid x_0)$ that are continuous in $y$ for each fixed $y'$. Moreover, there exists a Lebesgue integrable function $\varpi_h: \mathbb{R} \to \mathbb{R}$ with $\int |y' \varpi_h(y')|\, dy' < \infty$ such that $f_{Y, h^\pm_x \mid X}(y, y' \mid x_0) \leq |\varpi_h(y')|$ for all $y, y'$.
(iii) The conditional distribution of $\eta$ given $X = x$ is absolutely continuous with respect to Lebesgue measure, with conditional density $f_{\eta \mid X}(u \mid x)$ that is continuously differentiable in $x$ on $I_{x_0}$. Furthermore, there exists a Lebesgue integrable function $\varpi_\eta(u)$ such that $\sup_{x \in I_{x_0}} \big| \frac{\partial}{\partial x} f_{\eta \mid X}(u \mid x) \big| \leq \varpi_\eta(u)$.
(iv) The function $b(x, u)$ is continuous in $x$ at $x_0$ for each $u$ and differentiable on each side of $x_0$, with one-sided derivatives $b'(x_0^\pm, u)$. Finally, there exist Lebesgue integrable functions $\kappa_0, \kappa_1$ such that $\sup_{x \in I_{x_0}} |b(x, u)| \leq \kappa_0(u)$ and $\sup_{x \in I_{x_0} \setminus \{x_0\}} \big| \frac{\partial}{\partial x} b(x, u) \big| \leq \kappa_1(u)$, together with $\int \kappa_0(u) \varpi_\eta(u)\, du < \infty$ and $E[\kappa_1(\eta) \mid X = x_0] < \infty$.
Assumption 6 allows $(\varepsilon, \eta)$ to be correlated with $X$ and their distribution to vary with $x$; however, it requires this variation to be smooth on $I_{x_0}$. Assumption 7(i) is a local linearization requirement for $h(x, \varepsilon, \eta)$ around $x_0$; it ensures that small changes in $x$ induce approximately linear shifts in $Y$, controlled by the one-sided derivatives $h_x(x_0^\pm, \varepsilon, \eta)$.
Assumption 7(ii) is another regularity condition ensuring that the joint distribution of $(Y, h_x(x_0^\pm, \varepsilon, \eta)) \mid X = x_0$ is well behaved at the kink. Finally, (iii) ensures that any selection effect arising from $x \mapsto f_{\eta \mid X}(\cdot \mid x)$ is smooth and therefore does not itself generate a kink, and (iv) adds integrability conditions on $b(x, \eta)$ and its derivative. With these assumptions in place, we now establish causal identification of the fuzzy local treatment effect at the kink.
Theorem 4 (Fuzzy Kink Design Identification). Suppose the conditions of Lemma 3 and Assumptions 6-7 hold. Then the fuzzy local treatment effect at the kink is identified as $\Delta^F_\phi = \phi'_{F_{Y \mid X = x_0}}(\mathrm{FDRKD}(\cdot))$, where $\mathrm{FDRKD}(\cdot)$ is the fuzzy distributional regression kink design estimand,
$$\mathrm{FDRKD}(y) = \frac{\frac{\partial}{\partial x} F_{Y \mid X}(y \mid x_0^+) - \frac{\partial}{\partial x} F_{Y \mid X}(y \mid x_0^-)}{\mu'_B(x_0^+) - \mu'_B(x_0^-)}.$$
Analogously to the identification results of Wang and Zhang (2025) in sharp kink designs, Theorem 4 shows that the fuzzy local treatment effect at the kink is identified by applying the Hadamard derivative of the functional $\phi$ in the direction of the FDRKD estimand. Importantly, $\mathrm{FDRKD}(y)$ is interpretable as the local distributional effect per unit of kink-induced treatment change, i.e., a distributional local Wald ratio. In the case of distributional kink designs, letting $\phi$ be the quantile functional, the fuzzy Wasserstein derivative at the kink is therefore identified as
$$\Psi'_C = \left\{ \int_0^1 \left( \frac{\frac{\partial}{\partial x} Q_{Y \mid X}(u \mid x_0^+) - \frac{\partial}{\partial x} Q_{Y \mid X}(u \mid x_0^-)}{\mu'_B(x_0^+) - \mu'_B(x_0^-)} \right)^2 du \right\}^{1/2}.$$
Now that we have established identification in the fuzzy kink design setting, in the next section we discuss estimation.
4.2 Estimation and Inference for the Kinked Wasserstein Effect
In this section, we discuss estimation of the Wasserstein derivative at a policy kink. Our strategy builds on the work of Chiang et al.
(2019) and the framework established in Section 3.3. First, note that $Q_{Y \mid X}(u \mid x_0^+) = Q_{Y \mid X}(u \mid x_0^-) =: Q_{Y \mid X}(u \mid x_0)$ by continuity at $x_0$. Second, differentiating the identity $F_{Y \mid X}(Q_{Y \mid X}(u \mid x) \mid x) = u$ with respect to $x$ shows that the derivative of the quantile function can be written as
$$\frac{\partial}{\partial x} Q_{Y \mid X}(u \mid x_0^\pm) = -\frac{\frac{\partial}{\partial x} F_{Y \mid X}\big(Q_{Y \mid X}(u \mid x_0) \mid x_0^\pm\big)}{f_{Y \mid X}\big(Q_{Y \mid X}(u \mid x_0) \mid x_0\big)}.$$
Thus, taking the difference $\frac{\partial}{\partial x} Q_{Y \mid X}(u \mid x_0^+) - \frac{\partial}{\partial x} Q_{Y \mid X}(u \mid x_0^-)$ and dividing by the first-stage kink $\mu'_B(x_0^+) - \mu'_B(x_0^-)$ gives
$$\frac{\frac{\partial}{\partial x} Q_{Y \mid X}(u \mid x_0^+) - \frac{\partial}{\partial x} Q_{Y \mid X}(u \mid x_0^-)}{\mu'_B(x_0^+) - \mu'_B(x_0^-)} = -\frac{\mathrm{FDRKD}\big(Q_{Y \mid X}(u \mid x_0)\big)}{f_{Y \mid X}\big(Q_{Y \mid X}(u \mid x_0) \mid x_0\big)}. \tag{5}$$
With this in mind, estimation of $Q_{Y \mid X}(u \mid x_0)$ proceeds exactly as in Section 3.3; the only additional terms to estimate are $\frac{\partial}{\partial x} F_{Y \mid X}(y \mid x_0^\pm)$, $\mu'_B(x_0^\pm)$, and $f_{Y \mid X}(\cdot \mid x_0)$. We begin with the first two. Recall that in Section 3.3 we considered a one-sided Taylor expansion of $F_{Y \mid X}(y \mid x)$ about $x = x_0$. Specifically, we defined
$$\alpha_{\pm,p}(y) = \left( F_{Y \mid X}(y \mid x_0^\pm),\ \frac{F^{(1)}_{Y \mid X}(y \mid x_0^\pm)\, h}{1!},\ \ldots,\ \frac{F^{(p)}_{Y \mid X}(y \mid x_0^\pm)\, h^p}{p!} \right)^{\top}$$
and then estimated $\alpha_{\pm,p}(y)$ via one-sided local weighted least squares. Leveraging this exact approach, it follows that
$$\frac{\partial}{\partial x} \hat F_{Y \mid X}(y \mid x_0^\pm) = \frac{1}{h}\, e_1^{\top} \hat\alpha_{\pm,p}(y),$$
where $e_1 = (0, 1, 0, \ldots, 0)^{\top}$. We can repeat this approach to estimate $\mu'_B(x_0^\pm)$. Specifically, if we use the same local polynomial estimation, now with outcomes $T_i = b(X_i, \eta_i)$, i.e.,
$$\hat\beta_{\pm,p} = \arg\min_{\beta \in \mathbb{R}^{p+1}} \sum_{i=1}^n \delta^\pm_i \left( T_i - r_p\!\left( \frac{X_i - x_0}{h} \right)^{\top} \beta \right)^2 K\!\left( \frac{X_i - x_0}{h} \right), \tag{6}$$
then we similarly obtain the estimator $\hat\mu'_B(x_0^\pm) = \frac{1}{h}\, e_1^{\top} \hat\beta_{\pm,p}$. Bias correction can be implemented following the same steps discussed in Section 3.3. Finally, we note that there
Finally , we note that there 27 are many w ays one could estimate f Y | X ( · | x 0 ) . One simple metho d w ould b e to deﬁne a lo cal p olynomial conditional density estimator b y replacing T i in Equation (6) with a kernel in y , i.e. h − 1 y K (( Y i − y ) /h y ) . Putting everything together, if we plug-in all of our estimators in to Equation (5) then squaring and numerically integrating ov er (0 , 1) yields an estimate of Ψ ′ C . Inference for the W asserstein deriv ative at the p olicy kink follo ws largely in the same manner as discussed in Section 3.4 ; the only ma jor diﬀerence is the scaling. As discussed in Calonico et al. ( 2014 ); Card et al. ( 2015 ), estimating a deriv ative at a b oundary introduces an additional 1 /h scaling, so its v ariance now scales as ( nh ) − 1 ( h 2 ) − 1 = ( nh 3 ) − 1 . Consequen tly , if w e wan ted to construct a conﬁdence in terv al for Ψ ′ w e simply need to correct this scaling. F ollowing Equation (3) , we can obtain an analogous in terv al of C ′′ n = " ( b Ψ ′ C ) 2 ± z 1 − α/ 2 r b s 2 n + c 2 nh 3 # where b s n is the estimated standard deviation of ( b Ψ ′ C ) 2 , z 1 − α/ 2 is the 1 − α/ 2 quantile of a standard Normal distribution, and c is some constant, suc h as V ( Y ) . 4.3 In terpretation of the Kinked W asserstein Eﬀect In terpretation of the W asserstein deriv ative at a policy kink follo ws analogously to the in ter- pretation established in Section 3.2 for discon tinuit y designs. T o see this, deﬁne the quan tile eﬀect curve at the kink by ∆ Q ′ ( u ) = ∂ ∂ x Q Y | X ( u | x + 0 ) − ∂ ∂ x Q Y | X ( u | x − 0 ) µ ′ B ( x + 0 ) − µ ′ B ( x − 0 ) . Then, again we can immediately see that τ ′ = R 1 0 ∆ Q ′ ( u ) du and (Ψ ′ ) 2 = R 1 0 [∆ Q ′ ( u )] 2 du , so letting U ∼ Uniform(0 , 1) we again obtain the same v ariance decomp osition (Ψ ′ ) 2 = ( τ ′ ) 2 + V (∆ Q ′ ( U )) . 
Thus, $\Psi'$ captures both the mean drift through the kink (as measured by $\tau'$) and the heterogeneity of the treatment effect across quantiles (as measured by $V(\Delta Q'(U))$). Similarly, by applying the Cauchy-Schwarz inequality we obtain the kink analogue of Theorem 1, $|\tau'| \leq \Psi'$. Equality holds if and only if the marginal effect is purely additive, or equivalently, the quantile effect curve is flat, i.e., $\Delta Q'(u) = \tau'$ for all $u \in (0, 1)$. Finally, it is possible to obtain an analogous version of Theorem 2 for the kinked Wasserstein effect. Let $\lambda_k(x) = \int_0^1 Q_{Y \mid X}(u \mid x)\, P^*_{k-1}(u)\, du$ be the conditional $L$-moment evaluated at $x$, and define its one-sided derivatives as
$$\lambda'_k(x_0^\pm) = \int_0^1 \frac{\partial}{\partial x} Q_{Y \mid X}(u \mid x_0^\pm)\, P^*_{k-1}(u)\, du.$$
Then, following the same arguments as in the proof of Theorem 2, it can be shown that the Wasserstein derivative at the kink may be decomposed into derivatives of $L$-moments. We formalize this decomposition in the following theorem.
Theorem 5 ($L$-Moment Derivative Decomposition). Suppose that $\int_0^1 [\Delta Q'(u)]^2\, du < \infty$. Then
$$\Psi'_C = \left\{ \sum_{k=1}^{\infty} (2k - 1) \left( \frac{\lambda'_k(x_0^+) - \lambda'_k(x_0^-)}{\mu'_B(x_0^+) - \mu'_B(x_0^-)} \right)^2 \right\}^{1/2}.$$
Each term $k$ in the series representation of $\Psi'_C$ now represents the instantaneous change in $L$-location, $L$-scale, $L$-skewness, and so on at the kink. With these representations and interpretations established, in the next section we analyze real data sets to see how these methods can be implemented in practice.
5 Simulations and Data Analysis
In this section we consider the practical implementation of distributional discontinuity designs and distributional kink designs. First, we compare the empirical coverage and interval width of the two confidence intervals proposed in Section 3.4.2. Next, we re-analyze two natural experiments: one regression discontinuity design and one regression kink design.
Our goal is to compare traditional mean-based effects to our proposed distributional effects.
5.1 Simulations
In what follows we conduct a simulation study to compare the empirical widths and coverage of the two conservative confidence intervals for $\Psi$ described in Section 3.4.2. We consider three data generating processes, all of which feature a running variable $X_i \sim \mathrm{Uniform}(-1, 1)$, sharp treatment assignment $A_i = I(X_i \geq 0)$, and the function $m(x) = 0.5x + x^2$:
(i) Additive Effect: $Y_i = m(X_i) + \tau A_i + \varepsilon_i$ with $\varepsilon_i \sim N(0, 1)$.
(ii) Differing Variances: $Y_i = m(X_i) + \sigma(A_i)\varepsilon_i$ with $\sigma(0) = 1$ and $\sigma(1) = 2$.
(iii) Heavy Tailed: $Y_i = m(X_i) + \tau A_i + (1 + 0.3|X_i|)(1 + 0.6A_i)(\exp(\varepsilon_i) - \exp(1/2))$.
The three data generating processes are chosen to represent increasingly rich forms of treatment heterogeneity. Setting (i) is a simple additive treatment effect model in which treatment shifts the conditional distribution by a constant $\tau = 1/2$ at every quantile. Setting (ii) leaves the mean unchanged at the cutoff but doubles the standard deviation. Finally, in setting (iii) we introduce skewed, heavy-tailed errors via $\exp(\varepsilon_i) - \exp(1/2)$, which has mean zero because $E[\exp(\varepsilon_i)] = \exp(1/2)$; furthermore, we introduce the factors $(1 + 0.3|X_i|)$ and $(1 + 0.6A_i)$ to induce heteroskedasticity in the running variable and an explicitly non-additive treatment effect. In the simulation we consider $n \in \{10^3, 10^4, 10^5, 10^6\}$, which corresponds to one-sided within-bandwidth sample sizes of roughly 185, 1185, 7500, and 47300. We also consider different truncation profiles for $\Psi$: instead of integrating over the full quantile grid $u \in (0, 1)$, we consider $u \in (\gamma, 1 - \gamma)$ for $\gamma \in \{0.05, 0.1, 0.25\}$. We find that for small to moderate sample sizes some degree of truncation is useful for numerical stability.
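The three simulation designs can be written compactly. The generator below is a minimal sketch that only draws data; the setting labels (`"additive"`, `"variances"`, `"heavy"`), the function names, and the use of NumPy's default random generator are our assumptions, and the estimation and interval construction of Sections 3.3-3.4 are not reproduced.

```python
import numpy as np

def m(x):
    return 0.5 * x + x ** 2

def simulate(n, setting, tau=0.5, rng=None):
    """Draw (X, A, Y) from simulation design (i) "additive", (ii) "variances", or (iii) "heavy"."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.uniform(-1.0, 1.0, n)
    a = (x >= 0).astype(float)                   # sharp assignment A = I(X >= 0)
    eps = rng.standard_normal(n)
    if setting == "additive":                    # (i) constant shift tau at every quantile
        y = m(x) + tau * a + eps
    elif setting == "variances":                 # (ii) same mean, sd doubles at the cutoff
        sigma = np.where(a == 1.0, 2.0, 1.0)
        y = m(x) + sigma * eps
    elif setting == "heavy":                     # (iii) skewed, heavy tailed, non-additive
        centered = np.exp(eps) - np.exp(0.5)     # mean zero since E[exp(eps)] = exp(1/2)
        y = m(x) + tau * a + (1 + 0.3 * np.abs(x)) * (1 + 0.6 * a) * centered
    else:
        raise ValueError(f"unknown setting: {setting}")
    return x, a, y

x, a, y = simulate(1_000, "heavy", rng=np.random.default_rng(2))
```

Passing an explicit generator keeps replications reproducible across the grid of $n$ and $\gamma$ values.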
For each replication and choice of $n$, we estimate $\Psi$ using the bias-corrected local polynomial procedure described in Section 3.3. We use the default bandwidth rule $h_n \propto n^{-1/5}$, 1,000 bootstrap replications, and 10,000 overall Monte Carlo simulations. Our simulation results are visualized in Figure 5. Broadly speaking, we find that for small sample sizes the conservative intervals defined in Equation (3) are an order of magnitude narrower than the bootstrap intervals; as the sample size increases this difference becomes less pronounced, but does not go away. This is likely because, as outlined in Chiang et al. (2019), the bootstrap intervals require estimating both the running variable density at the cutoff and conditional outcome densities evaluated at estimated quantiles, and both can be unstable in small samples. Furthermore, because Chiang et al. (2019) define a uniform confidence band over $u$, if any quantiles are poorly estimated then the bands can blow up in width.
[Figure 5 panels: (i) Additive, (ii) Diff. Variances, (iii) Heavy. Top row: interval width (log scale) against $n$; bottom row: coverage against $n$; separate lines for $\gamma = 0.05, 0.10, 0.25$.]
Figure 5: Monte Carlo confidence interval widths and coverage for bootstrap intervals (dashed) and simple intervals (solid) across trimming levels $\gamma$ and sample sizes $n$.
Furthermore, we can see that while coverage is theoretically conservative for both methods, as the sample size increased the conservative intervals attained approximately $1 - \alpha$ coverage; meanwhile, the bootstrap intervals always overcovered.
5.2 Distributional Discontinuity Design Analysis
In this section we re-analyze a canonical regression discontinuity design analysis in order to compare the Wasserstein effect to the conventional mean effect at the cutoff. We consider the work of Lee (2008), who studied the causal effect of electoral incumbency in U.S. House elections by leveraging the idea that elections decided by very small margins are “as good as randomized;” the data are publicly available via the R package RDHonest. The running variable is the Democratic margin of victory in a given election (defined as the Democratic vote share minus the vote share of the strongest opponent), with a corresponding discontinuity at zero. The primary outcome is the Democratic party's vote share in the subsequent election. Empirically, Lee (2008) finds clear evidence of an incumbency advantage: barely winning an election leads to a statistically significant jump in vote share in the following election, on the order of 7-8 percentage points. In what follows, we consider whether there are interesting distributional effects that are not visible from the mean alone. In our re-analysis, to keep things simple we choose $p = 1$ for our local polynomial estimator, a triangular kernel, and $h = n^{-1/5}$. Under these settings, using the rdrobust package we estimate the mean effect to be a 7.099 percentage point increase in vote share, with a 95% confidence interval of [2.648, 8.038]; these results replicate the findings of Lee (2008). Next, we consider the Wasserstein effect without truncating quantiles: we obtain an estimate for $\Psi$ of 7.544, with a 95% confidence interval of [5.023, 9.412]. The fact that $\hat\tau$ and $\hat\Psi$ are so close to each other suggests there is not much heterogeneity in the treatment effect.
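The closeness of the two point estimates can be summarized with the plug-in relation $R^2_1 = \tau^2/\Psi^2 = 1 - \gamma$, where $\gamma$ is the heterogeneity index of Section 3.2.1. The arithmetic below simply plugs in the reported estimates; it is an illustration, not a substitute for the quantile-based $L$-moment decomposition, which estimates the two quantities differently.

```python
# Point estimates reported above for the incumbency re-analysis.
tau_hat, psi_hat = 7.099, 7.544

r2_first_moment = tau_hat ** 2 / psi_hat ** 2   # plug-in share of Psi^2 due to the mean
heterogeneity_index = 1.0 - r2_first_moment     # gamma = 1 - tau^2 / Psi^2

print(f"R^2_1 = {r2_first_moment:.3f}, gamma = {heterogeneity_index:.3f}")
# prints: R^2_1 = 0.886, gamma = 0.114
```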
Indeed, if w e break down b Ψ into the distributional R 2 table as sho wn in T able 2 , Moment Explaine d Distanc e k = 1 0.5598 k = 2 0.0413 k = 3 0.1118 k ≥ 4 0.2871 T able 2: Explained distributional v ariation for the incum b ency adv an tage w e can see that most of the eﬀect is explained b y the v ariation in the L -lo cation, with some notable changes as w ell in L -sk ewness and higher-order decomp ositions. Note that as sho wn in Equation (1) , because λ 1 is the mean, it follo ws that R 2 1 = τ 2 Ψ 2 = 1 − γ , where γ is the heterogeneity index discussed in Section 3.2.1 . Th us, if we w ere to plug in each estimate, w e’d ﬁnd 7 . 099 2 / 7 . 544 2 ≈ 0 . 886 as the explained distributional distance coming from the ﬁrst momen t. The gap b etw een 0.5598 and 0.886 is lik ely due to ﬁnite-sample diﬀerences in mean vs quantile eﬀect estimation. Our ﬁndings are further v alidated b y considering the estimated W asserstein dominance. Here, w e ﬁnd b ρ = 0 . 5777 , suggesting that winning a close election was prett y uniformly beneﬁcial, with little quan tile crossing. By com bining the tra- ditional mean eﬀects analysis with our distributional analysis, we w ere able to obtain a muc h more complete understanding of the causal eﬀect of the incum b ency adv antage. 5.3 Distributional Kink Design In this section we re-analyze an existing regression kink design analysis in order to compare the W asserstein deriv ativ e to the mean-eﬀect at the kink. Sp eciﬁcally , w e consider the work of Lundqvist et al. ( 2014 ) who study whether general in tergov ernmental grants increase local public employmen t using kno wn kinks in the Swedish gran t system; the data used is publicly a v ailable via the R pack age causalweight . The running v ariable is the net out-migration rate in a giv en m unicipality , m it = 100(1 − n i,t − 2 /n i,t − 12 ) where n i,t is the p opulation in the i th m unicipality at time t . 
That is, $m_{it}$ is the percentage population decrease over a ten-year window with a two-year lag. The policy rule for out-migration grants is given by
$$g_{m_{it}} = \begin{cases} a(m_{it} - 2), & m_{it} > 2, \\ 0, & m_{it} \leq 2, \end{cases}$$
where the kink is at 2% and $a$ is a constant (100 Swedish kronor per capita per additional percentage point above 2%). In their analysis, Lundqvist et al. (2014) find no statistically significant effect of grants on total local public employment, making their study a good point of comparison for distributional effects that consider more than just the mean. In our re-analysis, we consider $h \in \{5, 10, 15\}$ and a uniform kernel to match the analysis of Lundqvist et al. (2014); we report the $h = 10$ results, although all three are qualitatively similar. Furthermore, we demean the outcome and benefit by year and cluster our standard errors at the municipality level. Using the rdrobust package we estimate the local average slope $\hat\tau'_C$ to be $-0.050$, with a 95% confidence interval of $[-0.378, 0.277]$, matching the null effect found in Lundqvist et al. (2014). Our estimated value for the Wasserstein derivative $\hat\Psi'_C$ is 0.6713, with a 95% confidence interval of $[0.000, 1.432]$, so we likewise cannot rule out a null distributional effect. However, we do find an interesting characterization of the effect in the $L$-moment decomposition, as shown in Table 3. Most of the distance explained in $\hat\Psi'_C$ comes from higher-order moments; notably, almost none comes from the mean effect. This may suggest there is more to the story worth looking into: perhaps a few outlier municipalities used their grants extremely well (or poorly). Practitioners may then consider targeted hypothesis tests on specific $L$-moments to further explore whether an effect exists at these levels.
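The running variable and the kinked grant rule are straightforward to encode. The sketch below uses $a = 100$ kronor per capita per percentage point, as stated above; the municipality populations are hypothetical, and the function names are ours. The one-sided difference quotients confirm that the slope of the statutory rule jumps from 0 to $a$ at the 2% threshold, which is the kink in the benefit schedule.

```python
def out_migration_rate(pop_lag2, pop_lag12):
    """m_it = 100 * (1 - n_{i,t-2} / n_{i,t-12}): percentage population decrease."""
    return 100.0 * (1.0 - pop_lag2 / pop_lag12)

def grant_per_capita(rate, a=100.0, threshold=2.0):
    """Kinked policy rule: a * (m - 2) above the 2% threshold, zero otherwise."""
    return a * (rate - threshold) if rate > threshold else 0.0

# Hypothetical municipality whose population fell from 10,000 to 9,500.
rate = out_migration_rate(9_500, 10_000)     # about 5 percent
grant = grant_per_capita(rate)               # about 100 * (5 - 2) = 300 kronor per capita

# One-sided slopes of the rule at the threshold; their difference is the benefit kink.
step = 1e-6
slope_above = (grant_per_capita(2.0 + step) - grant_per_capita(2.0)) / step
slope_below = (grant_per_capita(2.0) - grant_per_capita(2.0 - step)) / step
kink = slope_above - slope_below

print(f"rate = {rate:.2f}%, grant = {grant:.1f}, kink in slope = {kink:.2f}")
```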
Moment    Explained Distance
k = 1     0.0007
k = 2     0.1750
k = 3     0.1362
k ≥ 4     0.6881
Table 3: Explained distributional variation for the grant effect.
6 Discussion and Conclusion
In this paper we introduced distributional discontinuity designs and distributional kink designs, a framework for studying distributional causal effects for a scalar outcome at the boundary of a discontinuity or kink in treatment assignment. A key practical motivation for this approach is that many applied regression discontinuity and kink analyses remain centered on mean effects, despite the fact that distributional changes are often of substantive interest. However, it is not our intention to replace these classical tools; rather, we show that distributional causal effects play a complementary role. The Wasserstein effect establishes a natural reference point for both mean and quantile effects. Since we show that $\Psi$ weakly upper bounds the magnitude of the average treatment effect, practitioners now have an interpretable index of treatment effect heterogeneity whenever $\Psi$ is meaningfully larger than $|\tau|$. Furthermore, we show that the Wasserstein distance admits an orthogonal decomposition into squared differences in $L$-moments. In practice, this decomposition provides a principled way to answer questions like “is the effect mostly a shift in the distribution's mean, or is it driven by changes in dispersion, asymmetry, or tail behavior?” Although this work primarily focuses on regression discontinuity and kink designs, the principles outlined here extend beyond these specific applications. One could easily estimate the Wasserstein effect in a randomized controlled trial, for example, or in any setting where the exchangeability assumption holds. The same interpretations and decompositions described in Section 3.2 would still hold; all that would change is the way quantile effects are estimated.
Thus, we see this work as opening the door toward considering distributional distances as interesting causal effects in their own right. It would be interesting to extend this distributional analysis more broadly to other quasi-experimental designs, such as the difference-in-differences framework. Finally, we note that although this work establishes a novel framework for distributional causal effects at a treatment discontinuity, we only consider univariate outcomes. Future work could establish methods for estimating the Wasserstein distance between multivariate outcome distributions, where identification and estimation no longer reduce to a distance between quantile functions. Furthermore, it would be interesting to consider distributional distances beyond the Wasserstein distance; perhaps other distributional measures capture other underlying phenomena in the data and admit their own useful decompositions.

Acknowledgments

This paper is a product of the Iowa Agriculture and Home Economics Experiment Station, Ames, Iowa. Project No. IOW03717 is supported by USDA/NIFA and State of Iowa funds. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the U.S. Department of Agriculture. The authors would like to sincerely thank Emileigh Harrison and Terence Chau for many helpful discussions and Zhaohua Zeng for writing R code to estimate quantile treatment effects.

References

L. Ambrosio, N. Gigli, and G. Savaré. Gradient Flows. Lectures in Mathematics ETH Zürich. Birkhäuser Basel, 1st edition, 2005. ISBN 978-3-7643-7309-2. doi: 10.1007/b137080.
M. Ando. How much should we trust regression-kink-design estimates? Empirical Economics, 53(3):1287–1322, November 2017. doi: 10.1007/s00181-016-1155-8. URL https://ideas.repec.org/a/spr/empeco/v53y2017i3d10.1007_s00181-016-1155-8.html.
J. D. Angrist, G. W. Imbens, and D. B. Rubin. Identification of causal effects using instrumental variables. Journal of the American Statistical Association, 91(434):444–455, 1996. doi: 10.1080/01621459.1996.10476902. URL https://www.tandfonline.com/doi/abs/10.1080/01621459.1996.10476902.
S. Calonico, M. D. Cattaneo, and R. Titiunik. Robust nonparametric confidence intervals for regression-discontinuity designs. Econometrica, 82(6):2295–2326, 2014. doi: 10.3982/ECTA11757. URL https://onlinelibrary.wiley.com/doi/abs/10.3982/ECTA11757.
S. Calonico, M. D. Cattaneo, and M. H. Farrell. Optimal bandwidth choice for robust bias-corrected inference in regression discontinuity designs. The Econometrics Journal, 23(2):192–210, November 2019a. ISSN 1368-4221. doi: 10.1093/ectj/utz022. URL https://doi.org/10.1093/ectj/utz022.
S. Calonico, M. D. Cattaneo, M. H. Farrell, and R. Titiunik. Regression discontinuity designs using covariates. The Review of Economics and Statistics, 101(3):442–451, July 2019b. URL https://ideas.repec.org/a/tpr/restat/v101y2019i3p442-451.html.
D. Card, D. S. Lee, Z. Pei, and A. Weber. Inference on causal effects in a generalized regression kink design. Econometrica, 83(6):2453–2483, 2015. doi: 10.3982/ECTA11224. URL https://onlinelibrary.wiley.com/doi/abs/10.3982/ECTA11224.
D. Card, D. S. Lee, Z. Pei, and A. Weber. Regression kink design: Theory and practice. Working Paper 22781, National Bureau of Economic Research, October 2016. URL http://www.nber.org/papers/w22781.
M. D. Cattaneo and R. Titiunik. Regression discontinuity designs. Annual Review of Economics, 14:821–851, 2022. ISSN 1941-1391. doi: 10.1146/annurev-economics-051520-021409. URL https://www.annualreviews.org/content/journals/10.1146/annurev-economics-051520-021409.
M. D. Cattaneo, M. Jansson, and X. Ma. Simple local polynomial density estimators. Journal of the American Statistical Association, 115(531):1449–1455, 2020. doi: 10.1080/01621459.2019.1635480. URL https://doi.org/10.1080/01621459.2019.1635480.
H. Chen, H. D. Chiang, and Y. Sasaki. Quantile treatment effects in regression kink designs. Econometric Theory, 36(6):1167–1191, 2020. doi: 10.1017/S0266466619000409.
H. D. Chiang and Y. Sasaki. Causal inference by quantile regression kink designs. Journal of Econometrics, 210(2):405–433, 2019. ISSN 0304-4076. doi: 10.1016/j.jeconom.2019.02.005. URL https://www.sciencedirect.com/science/article/pii/S0304407619300387.
H. D. Chiang, Y.-C. Hsu, and Y. Sasaki. Robust uniform inference for quantile treatment effects in regression discontinuity designs. Journal of Econometrics, 211(2):589–618, 2019. ISSN 0304-4076. doi: 10.1016/j.jeconom.2019.03.006. URL https://www.sciencedirect.com/science/article/pii/S0304407619300569.
J. B. Conway. A Course in Functional Analysis, volume 96 of Graduate Texts in Mathematics. Springer, New York, NY, 2nd edition, 1990. ISBN 978-0-387-97245-9. doi: 10.1007/978-1-4757-4383-8.
T. D. Cook. “Waiting for life to arrive”: A history of the regression-discontinuity design in psychology, statistics and economics. Journal of Econometrics, 142(2):636–654, 2008. ISSN 0304-4076. doi: 10.1016/j.jeconom.2007.05.002. URL https://www.sciencedirect.com/science/article/pii/S0304407607001108. The regression discontinuity design: Theory and applications.
M. Dahlberg, E. Mörk, J. Rattsø, and H. Ågren. Using a discontinuous grant rule to identify the effect of grants on local taxes and spending. Journal of Public Economics, 92(12):2320–2335, 2008. ISSN 0047-2727. doi: 10.1016/j.jpubeco.2007.05.004. URL https://www.sciencedirect.com/science/article/pii/S0047272707000886. New Directions in Fiscal Federalism.
D. Van Dijcke. Regression discontinuity design with distribution-valued outcomes, 2025. URL https://arxiv.org/abs/2504.03992.
B. R. Frandsen, M. Frölich, and B. Melly. Quantile treatment effects in the regression discontinuity design. Journal of Econometrics, 168(2):382–395, 2012. ISSN 0304-4076. doi: 10.1016/j.jeconom.2012.02.004. URL https://www.sciencedirect.com/science/article/pii/S0304407612000607.
M. Frölich and M. Huber. Including covariates in the regression discontinuity design. Journal of Business & Economic Statistics, 37(4):736–748, 2019. doi: 10.1080/07350015.2017.1421544. URL https://doi.org/10.1080/07350015.2017.1421544.
P. Ganong and S. Jäger. A permutation test for the regression kink design. Journal of the American Statistical Association, 113(522):494–504, 2018. doi: 10.1080/01621459.2017.1328356. URL https://doi.org/10.1080/01621459.2017.1328356.
A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola. A kernel two-sample test. Journal of Machine Learning Research, 13(25):723–773, 2012. URL http://jmlr.org/papers/v13/gretton12a.html.
F. F. Gunsilius. Distributional synthetic controls. Econometrica, 91(3):1105–1117, 2023. doi: 10.3982/ECTA18260. URL https://onlinelibrary.wiley.com/doi/abs/10.3982/ECTA18260.
F. F. Gunsilius. A primer on optimal transport for causal inference with observational data, 2025.
J. Guryan. Does money matter? Regression-discontinuity estimates from education finance reform in Massachusetts. Working Paper 8269, National Bureau of Economic Research, May 2001. URL http://www.nber.org/papers/w8269.
J. Hahn, P. Todd, and W. van der Klaauw. Identification and estimation of treatment effects with a regression-discontinuity design. Econometrica, 69(1):201–209, 2001. ISSN 0012-9682, 1468-0262. URL http://www.jstor.org/stable/2692190.
S. Hoderlein and E. Mammen. Identification of marginal effects in nonseparable models without monotonicity. Econometrica, 75(5):1513–1518, 2007. doi: 10.1111/j.1468-0262.2007.00801.x. URL https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1468-0262.2007.00801.x.
S. Hoderlein and E. Mammen. Identification and estimation of local average derivatives in non-separable models without monotonicity. The Econometrics Journal, 12(1):1–25, 2009. doi: 10.1111/j.1368-423X.2008.00273.x. URL https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1368-423X.2008.00273.x.
J. R. M. Hosking. L-moments: Analysis and estimation of distributions using linear combinations of order statistics. Journal of the Royal Statistical Society. Series B (Methodological), 52(1):105–124, 1990. ISSN 0035-9246. URL http://www.jstor.org/stable/2345653.
G. Imbens and K. Kalyanaraman. Optimal bandwidth choice for the regression discontinuity estimator. The Review of Economic Studies, 79(3):933–959, 2012. ISSN 0034-6527, 1467-937X. URL http://www.jstor.org/stable/23261375.
G. W. Imbens and T. Lemieux. Regression discontinuity designs: A guide to practice. Journal of Econometrics, 142(2):615–635, 2008. ISSN 0304-4076. doi: 10.1016/j.jeconom.2007.05.001. URL https://www.sciencedirect.com/science/article/pii/S0304407607001091. The regression discontinuity design: Theory and applications.
S. Janson. Wiener chaos, pages 17–22. Cambridge Tracts in Mathematics. Cambridge University Press, 1997.
Z. Jin, Y. Zhang, Z. Zhang, and Y. Zhou. Identification and inference in a quantile regression discontinuity design under rank similarity with covariates. Econometric Theory, 41(1):172–217, 2025. doi: 10.1017/S026646662300021X.
K. Karhunen. Zur Spektraltheorie stochastischer Prozesse. 1946. URL https://api.semanticscholar.org/CorpusID:118738283.
K. Kim, J. Kim, and E. H. Kennedy. Causal effects based on distributional distances, 2024.
D. Kurisu, Y. Zhou, T. Otsu, and H.-G. Müller. Geodesic causal inference, 2025. URL https://arxiv.org/abs/2406.19604.
D. S. Lee. Randomized experiments from non-random selection in U.S. House elections. Journal of Econometrics, 142(2):675–697, February 2008. URL https://ideas.repec.org/a/eee/econom/v142y2008i2p675-697.html.
D. S. Lee and T. Lemieux. Regression discontinuity designs in economics. Journal of Economic Literature, 48(2):281–355, June 2010. doi: 10.1257/jel.48.2.281. URL https://www.aeaweb.org/articles?id=10.1257/jel.48.2.281.
M. Loève. Probability Theory I, volume 45 of Graduate Texts in Mathematics. Springer, New York, NY, 4th edition, 1977. ISBN 978-1-4684-9464-8. doi: 10.1007/978-1-4684-9464-8.
A. Luedtke, M. Carone, and M. J. van der Laan. An omnibus non-parametric test of equality in distribution for unknown functions. Journal of the Royal Statistical Society Series B: Statistical Methodology, 81(1):75–99, November 2018. ISSN 1369-7412. doi: 10.1111/rssb.12299. URL https://doi.org/10.1111/rssb.12299.
H. Lundqvist, M. Dahlberg, and E. Mörk. Stimulating local public employment: Do general grants work? American Economic Journal: Economic Policy, 6(1):167–192, 2014. ISSN 1945-7731, 1945-774X. URL http://www.jstor.org/stable/43189370.
J. McCrary. Manipulation of the running variable in the regression discontinuity design: A density test. Journal of Econometrics, 142(2):698–714, 2008. ISSN 0304-4076. doi: 10.1016/j.jeconom.2007.05.005. URL https://www.sciencedirect.com/science/article/pii/S0304407607001133. The regression discontinuity design: Theory and applications.
H. S. Nielsen, T. Sørensen, and C. Taber. Estimating the effect of student aid on college enrollment: Evidence from a government grant policy reform. American Economic Journal: Economic Policy, 2(2):185–215, 2010.
Z. Qu and J. Yoon. Nonparametric estimation and inference on conditional quantile processes. Journal of Econometrics, 185(1):1–19, 2015. ISSN 0304-4076. doi: 10.1016/j.jeconom.2014.10.008. URL https://www.sciencedirect.com/science/article/pii/S0304407614002462.
Z. Qu and J. Yoon. Uniform inference on quantile effects under sharp regression discontinuity designs. Journal of Business & Economic Statistics, 37(4):625–647, 2019. doi: 10.1080/07350015.2017.1407323. URL https://doi.org/10.1080/07350015.2017.1407323.
K. Schindl and L. Wasserman. Causal geodesy: Counterfactual estimation along the path between correlation and causation, 2025.
D. Sejdinovic, B. Sriperumbudur, A. Gretton, and K. Fukumizu. Equivalence of distance-based and RKHS-based statistics in hypothesis testing. The Annals of Statistics, 41(5):2263–2291, 2013. doi: 10.1214/13-AOS1140. URL https://doi.org/10.1214/13-AOS1140.
G. P. Sillitto. Derivation of approximants to the inverse distribution function of a continuous univariate population from the order statistics of a sample. Biometrika, 56(3):641–650, December 1969. ISSN 0006-3444. doi: 10.1093/biomet/56.3.641. URL https://doi.org/10.1093/biomet/56.3.641.
D. L. Thistlethwaite and D. T. Campbell. Regression-discontinuity analysis: An alternative to the ex post facto experiment. Journal of Educational Psychology, 51(6):309, 1960.
W. Torous, F. Gunsilius, and P. Rigollet. An optimal transport approach to estimating causal effects via nonlinear difference-in-differences. Journal of Causal Inference, 12(1):20230004, 2024. doi: 10.1515/jci-2023-0004. URL https://doi.org/10.1515/jci-2023-0004.
A. W. van der Vaart. Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 1998.
S. S. Vallender. Calculation of the Wasserstein distance between probability distributions on the line. Theory of Probability & Its Applications, 18(4):784–786, 1974. doi: 10.1137/1118101. URL https://doi.org/10.1137/1118101.
I. Verdinelli and L. Wasserman. Decorrelated variable importance. Journal of Machine Learning Research, 25(7):1–27, 2024. URL http://jmlr.org/papers/v25/22-0801.html.
C. Villani et al. Optimal Transport: Old and New, volume 338. Springer, 2009.
Z. Wang and Z. Zhang. A unified framework for identification and inference of local treatment effects in sharp regression kink designs, 2025.
K. Yu and M. C. Jones. Local linear quantile regression. Journal of the American Statistical Association, 93(441):228–237, 1998. doi: 10.1080/01621459.1998.10474104. URL https://www.tandfonline.com/doi/abs/10.1080/01621459.1998.10474104.
Y. Zhou, D. Kurisu, T. Otsu, and H.-G. Müller. Geodesic difference-in-differences, 2025. URL https://arxiv.org/abs/2501.17436.

SUPPLEMENTARY MATERIAL

Section A: Contains all proofs from the main text and supplementary material, including:
Section A.1: Proof of Theorem 1.
Section A.2: Proof of Theorem 2.
Section A.3: Proof of Theorem 3.
Section A.4: Proof of Corollary 1.
Section A.5: Proof of Proposition 1.
Section A.6: Proof of Lemma 2.
Section A.7: Proof of Lemma 3.
Section A.8: Proof of Theorem 4.
Section A.9: Proof of Theorem 5.

A Proofs

A.1 Proof of Theorem 1

Proof: First, observe that we may write the average treatment effect at the cutoff, $\tau$, in terms of quantile functions, i.e.,

$$\tau = \int_0^1 \big(Q_1(u) - Q_0(u)\big)\,du,$$

where $Q_1(u) = \inf\{y : \lim_{x \downarrow x_0} F_{Y|X}(y \mid x) \ge u\}$ and $Q_0(u) = \inf\{y : \lim_{x \uparrow x_0} F_{Y|X}(y \mid x) \ge u\}$. We then immediately obtain the desired inequality by applying the Cauchy-Schwarz inequality:

$$|\tau| = \left|\int_0^1 \big(Q_1(u) - Q_0(u)\big)\,du\right| \le \left(\int_0^1 \big(Q_1(u) - Q_0(u)\big)^2\,du\right)^{1/2}\left(\int_0^1 1^2\,du\right)^{1/2} = \Psi.$$
Next, we show that $|\tau| = \Psi$ under an additive treatment effect $Q_1(u) = Q_0(u) + \delta$, where the quantiles of the limiting counterfactual distributions above and below the cutoff differ only by a translation $\delta$. Immediately, this yields

$$\tau = \int_0^1 \big(Q_1(u) - Q_0(u)\big)\,du = \int_0^1 \delta\,du = \delta$$

and furthermore

$$\Psi = \left(\int_0^1 \big(Q_1(u) - Q_0(u)\big)^2\,du\right)^{1/2} = \left(\int_0^1 \delta^2\,du\right)^{1/2} = |\delta|.$$

This proves one direction, i.e., that under an additive treatment effect $|\tau| = \Psi$. One easy way to prove the other direction is to consider the additive decomposition of the Wasserstein distance, $\Psi^2 = \tau^2 + V(\Delta Q(U))$, where $\Delta Q(u) = Q_1(u) - Q_0(u)$ and $U \sim \mathrm{Uniform}(0, 1)$. If $\Psi = |\tau|$, then $V(\Delta Q(U)) = 0$, and therefore the quantile treatment effect $\Delta Q(U)$ is (almost surely) constant.

A.2 Proof of Theorem 2

Proof: First, recall by Lemma 1 that the Wasserstein effect is identified as

$$\Psi^2 = \int_0^1 \big(Q_1(u) - Q_0(u)\big)^2\,du,$$

where $Q_1(u) = \inf\{y : \lim_{x \downarrow x_0} F_{Y|X}(y \mid x) \ge u\}$ and $Q_0(u) = \inf\{y : \lim_{x \uparrow x_0} F_{Y|X}(y \mid x) \ge u\}$. Next, let $\{P^*_k\}_{k=0}^\infty$ be the orthogonal basis of $L^2(0,1)$ given by the shifted Legendre polynomials, where the $k$th shifted Legendre polynomial is

$$P^*_k(x) = (-1)^k \sum_{j=0}^{k} \binom{k}{j}\binom{k+j}{j}(-x)^j.$$

For $a \in \{0, 1\}$, we define the L-moments of the limiting quantile functions $Q_1(u)$ and $Q_0(u)$ to be

$$\lambda^{(a)}_k = \int_0^1 Q_a(u)\,P^*_{k-1}(u)\,du.$$

From here, under the assumption that $P_{a|x} \in \mathcal{P}_2(\mathbb{R})$ for $a \in \{0, 1\}$, it follows by Hosking (1990) and Sillitto (1969) that

$$Q_a(u) = \sum_{k=1}^\infty (2k-1)\,\lambda^{(a)}_k\,P^*_{k-1}(u),$$

and consequently,

$$f(u) = Q_1(u) - Q_0(u) = \sum_{k=1}^\infty (2k-1)\big(\lambda^{(1)}_k - \lambda^{(0)}_k\big)P^*_{k-1}(u).$$

Now, let $S_K(u) = \sum_{k=1}^K (2k-1)\big(\lambda^{(1)}_k - \lambda^{(0)}_k\big)P^*_{k-1}(u)$ be a partial sum of $f$, and note that under the mean-square convergence established by Sillitto (1969), $\|f\|_2^2 = \lim_{K\to\infty}\|S_K\|_2^2$.
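The two facts about the shifted Legendre basis used in this proof, the explicit formula for $P^*_k$ and the norm identity $\|P^*_k\|_2^2 = (2k+1)^{-1}$ (which is why the factor $(2k-1)$ multiplies $\lambda^{(a)}_k P^*_{k-1}$ in the expansion of $Q_a$), can be confirmed exactly with rational arithmetic. A minimal sketch using only the Python standard library; the helper names are illustrative.

```python
from fractions import Fraction
from math import comb

def shifted_legendre_coeffs(k):
    """Integer coefficients c_j of P*_k(x) = sum_j c_j x^j, expanded from
    P*_k(x) = (-1)^k sum_j C(k, j) C(k + j, j) (-x)^j."""
    return [(-1) ** (k + j) * comb(k, j) * comb(k + j, j) for j in range(k + 1)]

def inner_product(a, b):
    """Exact int_0^1 p(x) q(x) dx for polynomials given by coefficient lists a and b."""
    return sum(Fraction(ai * bj, i + j + 1) for i, ai in enumerate(a) for j, bj in enumerate(b))

K = 6
gram = [[inner_product(shifted_legendre_coeffs(j), shifted_legendre_coeffs(k))
         for k in range(K)] for j in range(K)]
for k in range(K):
    print(k, gram[k][k])   # diagonal entries equal 1 / (2k + 1); off-diagonal entries are 0
```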
By the orthogonality of the shifted Legendre polynomials, it follows that

$$\|S_K\|_2^2 = \sum_{k=1}^K (2k-1)^2\big(\lambda^{(1)}_k - \lambda^{(0)}_k\big)^2\,\|P^*_{k-1}\|_2^2 = \sum_{k=1}^K (2k-1)\big(\lambda^{(1)}_k - \lambda^{(0)}_k\big)^2,$$

since by Hosking (1990) we know that $\|P^*_k\|_2^2 = (2k+1)^{-1}$. Therefore, taking the limit as $K \to \infty$, we see that

$$\Psi^2 = \sum_{k=1}^\infty (2k-1)\big(\lambda^{(1)}_k - \lambda^{(0)}_k\big)^2,$$

thereby completing the proof.

A.3 Proof of Theorem 3

Proof: To begin, let

$$T_n = nh\int_0^1 \big[\Delta\hat{Q}(u)\big]^2\,du$$

be our test statistic, where $\Delta\hat{Q}(u) = \hat{Q}_1(u) - \hat{Q}_0(u)$ and $\hat{Q}_1$, $\hat{Q}_0$ are the local polynomial estimators of the conditional quantile functions described in Section 3.3 and Chiang et al. (2019). Furthermore, assume the regularity conditions discussed in Section 3.3 and Chiang et al. (2019) hold. Then, under the null hypothesis,

$$\sqrt{nh}\,\Delta\hat{Q}(u) \rightsquigarrow G(u) \quad \text{in } L^2([0,1]),$$

where $G$ is a mean-zero Gaussian process with covariance kernel $\kappa(u,v) = \mathrm{Cov}\big(G(u), G(v)\big)$. Let $\{\lambda_k, \varphi_k\}_{k=1}^\infty$ denote the eigenvalue-eigenfunction pairs of the covariance operator induced by $\kappa$, such that $\{\varphi_k\}_{k=1}^\infty$ forms an orthonormal basis of $L^2([0,1])$. Suppose that $\sum_{k=1}^\infty \lambda_k < \infty$ and $\lambda_1 > 0$. Then we may apply the Karhunen-Loève theorem (Karhunen, 1946; Loève, 1977) to expand $G(u)$ as

$$G(u) = \sum_{k=1}^\infty \sqrt{\lambda_k}\,Z_k\,\varphi_k(u), \qquad Z_k \overset{iid}{\sim} N(0,1).$$

Next, by the continuous mapping theorem,

$$T_n \rightsquigarrow T := \int_0^1 [G(u)]^2\,du = \sum_{k=1}^\infty \lambda_k Z_k^2,$$

where the final equality holds by an application of Parseval's identity to $\{\varphi_k\}_{k=1}^\infty$. From here, let $F(t) = P(T \le t)$ and let $c_\alpha = \inf\{t \in \mathbb{R} : F(t) \ge 1 - \alpha\}$ denote the $(1-\alpha)$ quantile of $T$. Note that $P(T = c_\alpha) = 0$, since $\lambda_1 > 0$ implies that $T$ has a continuous distribution. With this machinery established, we first consider how to deal with truncation of the series $\sum_{k=1}^\infty \lambda_k Z_k^2$. Let

$$T_K = \sum_{k=1}^K \lambda_k Z_k^2.$$
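A small simulation can make the role of truncation concrete and, for comparison, show how the moment-based critical value of Proposition 1 behaves. This sketch assumes hypothetical eigenvalues $\lambda_k = k^{-2}$ and treats a 100-term sum as a stand-in for $T$; it is an illustration only, not the paper's procedure, which uses eigenvalues estimated from $\hat\kappa_n$. Because $T_K$ and $T$ are built from the same draws, $T_K \le T$ holds draw by draw, so $c_{K,\alpha} \le c_\alpha$ and the rejection rate at $c_{K,\alpha}$ exceeds $\alpha$ for small $K$.

```python
import numpy as np

alpha = 0.05
rng = np.random.default_rng(2)
lam = 1.0 / np.arange(1, 101) ** 2               # hypothetical eigenvalues lambda_k = k^(-2)
z_sq = rng.standard_normal((20_000, lam.size)) ** 2
t_full = z_sq @ lam                              # stand-in for T (itself truncated at 100 terms)
c_full = np.quantile(t_full, 1 - alpha)

crit, size = {}, {}
for K in (1, 3, 10, 50):
    t_trunc = z_sq[:, :K] @ lam[:K]              # T_K from the same draws, so T_K <= T
    crit[K] = np.quantile(t_trunc, 1 - alpha)    # Monte Carlo c_{K, alpha}
    size[K] = np.mean(t_full > crit[K])          # estimated P(T > c_{K, alpha})
    print(f"K={K:2d}: c_K={crit[K]:.3f}  rejection rate={size[K]:.3f}")

# Moment-based (Cantelli) critical value mu + sigma * sqrt((1 - alpha) / alpha),
# with mu = sum(lambda_k) and sigma^2 = 2 * sum(lambda_k^2).
mu, sigma = lam.sum(), np.sqrt(2.0 * np.sum(lam ** 2))
c_ub = mu + sigma * np.sqrt((1 - alpha) / alpha)
print(f"c_alpha={c_full:.3f}  c_ub={c_ub:.3f}  rejection rate at c_ub={np.mean(t_full > c_ub):.3f}")
```

The truncated critical values increase toward $c_\alpha$ as $K$ grows, while the Cantelli-type value sits above $c_\alpha$ and is conservative.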
Importantly, we cannot achieve a level-$\alpha$ test for a fixed value of $K$ unless $\lambda_k = 0$ for all $k > K$. To see this, let $c_{K,\alpha}$ be the $(1-\alpha)$ quantile of $T_K$. Then, because $T_K \le T$ almost surely, it follows that $c_{K,\alpha} \le c_\alpha$ and therefore $P(T > c_{K,\alpha}) \ge P(T > c_\alpha)$. Consequently, for a fixed $K$ our tests will be anti-conservative. Thus, we must let $K \to \infty$; accordingly, we index $K$ by $n$ for the remainder of the proof. Let $F_{K_n}(t) = P(T_{K_n} \le t)$, and note that since $T_{K_n} \to T$ almost surely, it follows that for each fixed $t$, $I(T_{K_n} \le t) \to I(T \le t)$ almost surely (because $P(T = t) = 0$). Thus, by the dominated convergence theorem, $F_{K_n}(t) \to F(t)$ for all $t$. Following standard quantile convergence arguments, it then follows that $c_{K_n,\alpha} \to c_\alpha$ as $K_n \to \infty$ (van der Vaart, 1998).

We now control the effect of estimating the eigenvalues. Let $\hat\kappa_n$ denote an estimator of $\kappa$, and let $\hat\lambda_{k,n}$ denote the corresponding estimated eigenvalues. It is important to relate how well $\kappa$ is estimated to the number of terms included in the truncation $T_{K_n}$. To that end, suppose that $\|\hat\kappa_n - \kappa\|_2 = o_P(K_n^{-1/2})$, where $\|\kappa\|_2^2 = \int_0^1\int_0^1 \kappa(u,v)^2\,du\,dv$ is the Hilbert-Schmidt norm. Then

$$\sum_{k=1}^{K_n}\big|\hat\lambda_{k,n} - \lambda_k\big| \le \sqrt{K_n}\left(\sum_{k=1}^{K_n}\big(\hat\lambda_{k,n} - \lambda_k\big)^2\right)^{1/2} \le \sqrt{K_n}\,\|\hat\kappa_n - \kappa\|_2 = o_P(1), \tag{7}$$

where the first inequality follows from the Cauchy-Schwarz inequality and the second from the Hoffman-Wielandt inequality for operators. Now, define $\hat{T}_{K_n} = \sum_{k=1}^{K_n}\hat\lambda_{k,n}Z_k^2$. Our goal here is to show that $|\hat{T}_{K_n} - T_{K_n}| = o_P(1)$. To do so, let $\mathcal{Z}_n = \{(X_i, A_i, Y_i)\}_{i=1}^n$ and define the event

$$E_n = \Big\{\mathbb{E}\big[|\hat{T}_{K_n} - T_{K_n}| \,\big|\, \mathcal{Z}_n\big] > \delta\Big\}$$

for some $\delta > 0$.
Then, note that for any $\varepsilon > 0$,

$$\begin{aligned} P\big(|\hat{T}_{K_n} - T_{K_n}| > \varepsilon\big) &= P\big(|\hat{T}_{K_n} - T_{K_n}| > \varepsilon,\ E_n\big) + P\big(|\hat{T}_{K_n} - T_{K_n}| > \varepsilon,\ E_n^c\big) \\ &\le P\Big(\mathbb{E}\big[|\hat{T}_{K_n} - T_{K_n}| \,\big|\, \mathcal{Z}_n\big] > \delta\Big) + P\big(|\hat{T}_{K_n} - T_{K_n}| > \varepsilon,\ E_n^c\big) \\ &\overset{(i)}{\le} P\Big(\mathbb{E}\big[|\hat{T}_{K_n} - T_{K_n}| \,\big|\, \mathcal{Z}_n\big] > \delta\Big) + \frac{\delta}{\varepsilon}, \end{aligned}$$

where (i) follows by applying a conditional Markov inequality. Then, observe by Equation (7) that

$$\mathbb{E}\big[|\hat{T}_{K_n} - T_{K_n}| \,\big|\, \mathcal{Z}_n\big] \le \sum_{k=1}^{K_n}\big|\hat\lambda_{k,n} - \lambda_k\big|\,\mathbb{E}[Z_k^2] = o_P(1).$$

Thus, for each fixed $\delta$, $P\big(\mathbb{E}[|\hat{T}_{K_n} - T_{K_n}| \mid \mathcal{Z}_n] > \delta\big) \to 0$, and consequently,

$$\limsup_n P\big(|\hat{T}_{K_n} - T_{K_n}| > \varepsilon\big) \le \frac{\delta}{\varepsilon}.$$

Then, by taking $\delta \to 0$, we see that $P\big(|\hat{T}_{K_n} - T_{K_n}| > \varepsilon\big) \to 0$.

From here, define the conditional distribution function

$$\hat{F}_n(t) = P\left(\sum_{k=1}^{K_n}\hat\lambda_{k,n}Z_k^2 \le t \,\middle|\, \mathcal{Z}_n\right),$$

let $\hat{c}_{n,\alpha}$ be the corresponding conditional $(1-\alpha)$ quantile, and define $p_{n,\varepsilon} = P\big(|\hat{T}_{K_n} - T_{K_n}| > \varepsilon \mid \mathcal{Z}_n\big)$. Then, on the event $\{|\hat{T}_{K_n} - T_{K_n}| \le \varepsilon\}$, for any $t \in \mathbb{R}$ and $\varepsilon > 0$,

$$F_{K_n}(t - \varepsilon) - p_{n,\varepsilon} \le \hat{F}_n(t) \le F_{K_n}(t + \varepsilon) + p_{n,\varepsilon},$$

and consequently,

$$\sup_{t \in \mathbb{R}}\big|\hat{F}_n(t) - F_{K_n}(t)\big| \le p_{n,\varepsilon} + \sup_{t \in \mathbb{R}}\big(F_{K_n}(t+\varepsilon) - F_{K_n}(t-\varepsilon)\big).$$

Thus, since we have already shown that $F_{K_n}(t) \to F(t)$ for each $t$ (so that $\sup_t|F_{K_n}(t) - F(t)| \to 0$ by Pólya's theorem), and since $p_{n,\varepsilon} = o_P(1)$ (which follows by applying Markov's inequality), taking $\varepsilon \to 0$ gives $\sup_{t \in \mathbb{R}}|\hat{F}_n(t) - F_{K_n}(t)| = o_P(1)$. Then, since $F_{K_n}$ is continuous and strictly increasing at $c_{K_n,\alpha}$, it again follows from standard arguments for the convergence of quantiles that $\hat{c}_{n,\alpha} - c_{K_n,\alpha} = o_P(1)$.

Finally, we consider the effect of approximating the critical value via Monte Carlo simulation. The argument here is standard. Define the Monte Carlo draws

$$\hat{T}^*_{K_n,b} = \sum_{k=1}^{K_n}\hat\lambda_{k,n}Z_{k,b}^2, \qquad b = 1, \dots, B_n,$$

and let $\hat{c}^*_{n,\alpha}$ denote the empirical $(1-\alpha)$ quantile computed from $\{\hat{T}^*_{K_n,b}\}_{b=1}^{B_n}$. Let the Monte Carlo empirical distribution function be

$$\hat{F}_{n,B_n}(t) = \frac{1}{B_n}\sum_{b=1}^{B_n} I\big(\hat{T}^*_{K_n,b} \le t\big).$$

On the event $A_n = \{\hat\lambda_{1,n} > 0\}$, the conditional distribution $\hat{F}_n$ is continuous, and thus continuous at its quantiles $\hat{c}_{n,\alpha}$. Therefore, applying the Glivenko-Cantelli theorem conditionally on $(X_1, \dots, X_n)$, it follows that as $B_n \to \infty$, $\sup_{t \in \mathbb{R}}|\hat{F}_{n,B_n}(t) - \hat{F}_n(t)| \to 0$, and therefore $\hat{c}^*_{n,\alpha} - \hat{c}_{n,\alpha} = o_P(1)$. Finally, since $\hat\lambda_{1,n} \to \lambda_1 > 0$, it follows that $P(A_n) \to 1$, so this result holds unconditionally. Putting everything together, under $H_0$,

$$\lim_{n\to\infty} P\big(T_n > \hat{c}^*_{n,\alpha}\big) = P(T > c_\alpha) = \alpha.$$

A.4 Proof of Corollary 1

Proof: First, recall that we assume $K_n \asymp r_n^{-2/(2\beta-1)}$. That is, there exist constants $0 < c_1 \le c_2 < \infty$ and $n_0$ such that for all $n \ge n_0$,

$$c_1 r_n^{-2/(2\beta-1)} \le K_n \le c_2 r_n^{-2/(2\beta-1)}.$$

Next, recall that we assume polynomial eigenvalue decay; that is, there exist constants $C_\lambda > 0$ and $\beta > 1$ such that $\lambda_k \le C_\lambda k^{-\beta}$ for all $k$. With this in mind, we proceed with the truncation bias. Observe that for any fixed $K_n$,

$$\sum_{k > K_n}\lambda_k \le C_\lambda\sum_{k > K_n}k^{-\beta} \overset{(i)}{\le} C_\lambda\int_{K_n}^\infty x^{-\beta}\,dx = \left(\frac{C_\lambda}{\beta - 1}\right)K_n^{1-\beta},$$

where (i) follows since $f(x) = x^{-\beta}$ is a decreasing function. From here, it follows that for all $n \ge n_0$,

$$\left(\frac{C_\lambda}{\beta - 1}\right)K_n^{1-\beta} \le \frac{C_\lambda}{\beta - 1}\left(c_1 r_n^{-2/(2\beta-1)}\right)^{1-\beta} = \frac{C_\lambda c_1^{1-\beta}}{\beta - 1}\,r_n^{2(\beta-1)/(2\beta-1)},$$

and therefore

$$\sum_{k > K_n}\lambda_k = O\Big(r_n^{2(\beta-1)/(2\beta-1)}\Big).$$

Next, we consider the estimation error. Recall that we assumed $\|\hat\kappa_n - \kappa\|_2 = O_P(r_n)$ for some $r_n \to 0$. Thus, it is clear that $\sqrt{K_n}\,\|\hat\kappa_n - \kappa\|_2 = O_P\big(\sqrt{K_n}\,r_n\big)$.
Then, it follows that

$$\sqrt{K_n}\,r_n \le \left(c_2 r_n^{-2/(2\beta-1)}\right)^{1/2} r_n = \sqrt{c_2}\,r_n^{-1/(2\beta-1)}\,r_n = \sqrt{c_2}\,r_n^{2(\beta-1)/(2\beta-1)},$$

and consequently,

$$\sqrt{K_n}\,\|\hat\kappa_n - \kappa\|_2 = O_P\Big(r_n^{2(\beta-1)/(2\beta-1)}\Big),$$

thereby completing the proof.

Note: the choice $K_n \asymp r_n^{-2/(2\beta-1)}$ can easily be seen by noting that the truncation bias scales like $K_n^{1-\beta}$. Thus, if we set $K_n^{1-\beta} \asymp \sqrt{K_n}\,r_n$ and solve for $K_n$, we obtain the rate above.

A.5 Proof of Proposition 1

Proof: To begin, let $T_n = nh\int_0^1[\Delta\hat{Q}(u)]^2\,du$ be our test statistic. Recall that under the conditions described in Section 3.3 and Section A.3,

$$T_n \rightsquigarrow T = \sum_{k=1}^\infty \lambda_k Z_k^2,$$

where $Z_k \overset{iid}{\sim} N(0,1)$ and $\lambda_k$ are the eigenvalues of the covariance kernel $\kappa(u,v)$. From here, observe that

$$\mathbb{E}[T] = \mathbb{E}\left[\sum_{k=1}^\infty \lambda_k Z_k^2\right] = \sum_{k=1}^\infty \lambda_k\,\mathbb{E}[Z_k^2] = \sum_{k=1}^\infty \lambda_k = \int_0^1 \kappa(u,u)\,du$$

and

$$V(T) = V\left(\sum_{k=1}^\infty \lambda_k Z_k^2\right) = \sum_{k=1}^\infty \lambda_k^2\,V(Z_k^2) = 2\sum_{k=1}^\infty \lambda_k^2 = 2\int_0^1\int_0^1 \kappa(u,v)^2\,du\,dv.$$

Thus, $T$ has mean $\mu := \int_0^1\kappa(u,u)\,du$ and variance $\sigma^2 := 2\int_0^1\int_0^1\kappa(u,v)^2\,du\,dv$, so $(T - \mu)/\sigma$ has mean zero and unit variance. Furthermore, by Slutsky's theorem, as $n \to \infty$,

$$\frac{T_n - \hat\mu}{\hat\sigma} \rightsquigarrow \frac{T - \mu}{\sigma}$$

under the assumption that $\hat\mu \xrightarrow{p} \mu$ and $\hat\sigma \xrightarrow{p} \sigma$. From here, following the one-sided Chebyshev inequality (i.e., Cantelli's inequality) discussed in Luedtke et al. (2018), we note that for any mean-zero, unit-variance random variable $X$ and $t > 0$, $P(X \ge t) \le 1/(1 + t^2)$. Then, with $\hat{c}^{ub}_{n,1-\alpha} = \hat\mu + \hat\sigma\sqrt{(1-\alpha)/\alpha}$, it is easy to see by the Portmanteau theorem that

$$\limsup_{n\to\infty} P_{H_0}\big(T_n > \hat{c}^{ub}_{n,1-\alpha}\big) \le P_{H_0}\left(\frac{T - \mu}{\sigma} \ge \sqrt{\frac{1-\alpha}{\alpha}}\right) \le \frac{1}{1 + \frac{1-\alpha}{\alpha}} = \alpha.$$

A.6 Proof of Lemma 2

Proof: To begin, let $\hat\Psi_n^2 = \int_0^1[\Delta\hat{Q}(u)]^2\,du$.
Then, recall that we define our interval as

$$C'_n = \left[\hat\Psi_n^2 \pm z_{1-\alpha/2}\sqrt{\hat{s}_n^2 + \frac{c^2}{nh}}\right],$$

where $\hat{s}_n$ is the estimated standard deviation of $\hat\Psi_n^2$, $z_{1-\alpha/2}$ is the $1 - \alpha/2$ quantile of a standard Normal distribution, and $c$ is some constant. Then, observe that

$$P\big(\Psi^2 \notin C'_n\big) = P\left(\big|\hat\Psi_n^2 - \Psi^2\big| > z_{1-\alpha/2}\sqrt{\hat{s}_n^2 + \frac{c^2}{nh}}\right) \le P\left(\big|\hat\Psi_n^2 - \Psi^2\big| > z_{1-\alpha/2}\sqrt{\frac{c^2}{nh}}\right) = P\left(\big(\hat\Psi_n^2 - \Psi^2\big)^2 > \frac{z_{1-\alpha/2}^2\,c^2}{nh}\right).$$

From here, we apply Markov's inequality to see that

$$P\left(\big(\hat\Psi_n^2 - \Psi^2\big)^2 > \frac{z_{1-\alpha/2}^2\,c^2}{nh}\right) \le \frac{nh}{z_{1-\alpha/2}^2\,c^2}\,\mathbb{E}\Big[\big(\hat\Psi_n^2 - \Psi^2\big)^2\Big] = \frac{nh}{z_{1-\alpha/2}^2\,c^2}\Big(\mathbb{E}\big[\hat\Psi_n^2 - \Psi^2\big]^2 + V\big(\hat\Psi_n^2\big)\Big) = o(1),$$

where the last equality holds under the assumption that $\mathbb{E}[\hat\Psi_n^2 - \Psi^2] = o\big((nh)^{-1/2}\big)$ and $V(\hat\Psi_n^2) = o\big((nh)^{-1}\big)$. Therefore, as $n \to \infty$, $P\big(\Psi^2 \notin C'_n\big) \to 0$.

A.7 Proof of Lemma 3

Proof: The proof of Lemma 3 follows analogously to Lemma 1 of Wang and Zhang (2025) with two modifications: first, the baseline treatment at the kink is random ($T_0 = b(x_0, \eta)$) rather than a constant $t_0 = b(x_0)$; and second, the perturbation direction is $\delta\,(\omega(\eta)/\Delta B)$ rather than the constant shift $\delta$. With that in mind, let $F_\delta(\cdot) = F_{Y_\delta|X = x_0}(\cdot)$ and $F_0(\cdot) = F_{Y_0|X = x_0}(\cdot)$. Furthermore, define $h_\delta(\cdot) = (F_\delta - F_0)/\delta$. Then, under Assumption 3, as $\delta \to 0$,

$$\frac{\phi(F_\delta) - \phi(F_0)}{\delta} = \phi'_{F_0}\big(\Delta F_{\mathrm{Id}}\big) + o(1),$$

where $\Delta F_{\mathrm{Id}} = \lim_{\delta\to 0} h_\delta$. From here, let $Y_\delta = g\big(T_0 + \delta\,(\omega(\eta)/\Delta B), x_0, \varepsilon\big)$ and $Y_0 = g(T_0, x_0, \varepsilon)$, and define

$$Z = \left(\frac{\omega(\eta)}{\Delta B}\right) g_1(T_0, x_0, \varepsilon).$$

Then, we may define the remainder term $R_\delta = Y_\delta - Y_0 - \delta Z$, so that $Y_\delta = Y_0 + \delta Z + R_\delta$. We can now see that

$$h_\delta(y) = \frac{F_\delta(y) - F_0(y)}{\delta} = \frac{1}{\delta}\,\mathbb{E}\big[I(Y_0 + \delta Z + R_\delta \le y) - I(Y_0 \le y) \mid X = x_0\big].$$

Our first goal is to show that the remainder term $R_\delta$ drops out.
To that end, we define

$$\tilde{h}_\delta(y) = \frac{1}{\delta}\,\mathbb{E}\big[I(Y_0 + \delta Z \le y) - I(Y_0 \le y) \mid X = x_0\big].$$

Then, observe that

$$\big|h_\delta(y) - \tilde{h}_\delta(y)\big| \le \frac{1}{|\delta|}\,\mathbb{E}\Big[\big|I(Y_0 + \delta Z + R_\delta \le y) - I(Y_0 + \delta Z \le y)\big| \,\Big|\, X = x_0\Big].$$

From here, define the events $U = \{Y_0 + \delta Z + R_\delta \le y\}$ and $V = \{Y_0 + \delta Z \le y\}$, and note that $|I(U) - I(V)| = I(U \,\triangle\, V)$, where $U \,\triangle\, V$ denotes the symmetric difference. Our goal now is to show that the set inclusion $U \,\triangle\, V \subseteq \{|y - (Y_0 + \delta Z)| \le |R_\delta|\}$ holds. First, consider the case where $R_\delta \ge 0$. Then it is clear that $U \subseteq V$. Furthermore, the set difference $V \setminus U$ occurs when

$$\{Y_0 + \delta Z \le y\} \cap \{Y_0 + \delta Z + R_\delta > y\} \iff \{y - R_\delta < Y_0 + \delta Z \le y\},$$

and so it follows that $|y - (Y_0 + \delta Z)| \le R_\delta = |R_\delta|$. Next, suppose that $R_\delta < 0$. Now we have $V \subseteq U$, and the set difference $U \setminus V$ occurs when

$$\{Y_0 + \delta Z > y\} \cap \{Y_0 + \delta Z + R_\delta \le y\} \iff \{y < Y_0 + \delta Z \le y - R_\delta\}.$$

Thus, $|y - (Y_0 + \delta Z)| \le -R_\delta = |R_\delta|$. Putting both cases together, it follows that

$$\big|h_\delta(y) - \tilde{h}_\delta(y)\big| \le \frac{1}{|\delta|}\,\mathbb{E}\Big[I\big(|y - (Y_0 + \delta Z)| \le |R_\delta|\big) \,\Big|\, X = x_0\Big] \le \frac{1}{|\delta|}\Big(P\big(|R_\delta| \ge c|\delta| \mid X = x_0\big) + P\big(|Y_0 + \delta Z - y| \le c|\delta| \mid X = x_0\big)\Big),$$

where the second inequality follows after fixing some $c > 0$ and splitting on the events $|R_\delta| \ge c|\delta|$ and $|R_\delta| < c|\delta|$. From here, it follows by Assumption 5(i) that as $\delta \to 0$,

$$\frac{1}{|\delta|}\,P\big(|R_\delta| \ge c|\delta| \mid X = x_0\big) = o(1).$$

For the second term, we now leverage Assumption 5(ii) to see that

$$P\big(|Y_0 + \delta Z - y| \le c|\delta| \mid X = x_0\big) = \int\int_{y - \delta z - c|\delta|}^{y - \delta z + c|\delta|} f_{Y_0,Z|X}(a, z \mid x_0)\,da\,dz \le 2c|\delta|\int|\varpi(z)|\,dz,$$

and so, consequently, $\big|h_\delta(y) - \tilde{h}_\delta(y)\big| = o(1) + O(c)$. Then, since the choice of $c > 0$ was arbitrary, we can see that $h_\delta(y) - \tilde{h}_\delta(y) \to 0$.
Next, we evaluate the limit of $\tilde{h}_\delta(y)$ as $\delta \to 0$ using the joint density of $(Y_0, Z)$. Recall that

$$\tilde{h}_\delta(y) = \frac{1}{\delta}\Big(P(Y_0 + \delta Z \le y \mid X = x_0) - P(Y_0 \le y \mid X = x_0)\Big).$$

Thus, using the identity $I(U \le v) - I(U \le w) = I(w < U \le v) - I(v < U \le w)$ with $U = Y_0$, $v = y - \delta Z$, and $w = y$, we obtain the more convenient expression

$$\tilde{h}_\delta(y) = \frac{1}{\delta}\Big(P(y < Y_0 \le y - \delta Z \mid X = x_0) - P(y - \delta Z < Y_0 \le y \mid X = x_0)\Big). \tag{8}$$

Suppose $\delta > 0$ and let $f(a, z) = f_{Y_0,Z|X}(a, z \mid x_0)$. Since $y - \delta Z > y$ requires $Z < 0$ and $y - \delta Z < y$ requires $Z > 0$, we can write

$$P(y < Y_0 \le y - \delta Z \mid X = x_0) = \int_{-\infty}^0\int_y^{y - \delta z} f(a, z)\,da\,dz, \qquad P(y - \delta Z < Y_0 \le y \mid X = x_0) = \int_0^\infty\int_{y - \delta z}^y f(a, z)\,da\,dz.$$

Then, applying the change of variables $u = (y - a)/\delta$ (so that $a = y - \delta u$ and $da = -\delta\,du$), it follows that

$$\frac{1}{\delta}P(y < Y_0 \le y - \delta Z \mid X = x_0) = \int_{-\infty}^0\int_z^0 f(y - \delta u, z)\,du\,dz, \qquad \frac{1}{\delta}P(y - \delta Z < Y_0 \le y \mid X = x_0) = \int_0^\infty\int_0^z f(y - \delta u, z)\,du\,dz.$$

Substituting both into Equation (8) yields

$$\tilde{h}_\delta(y) = \int_{-\infty}^0\int_z^0 f(y - \delta u, z)\,du\,dz - \int_0^\infty\int_0^z f(y - \delta u, z)\,du\,dz.$$

From here, continuity of $f(\cdot, z)$ in its first argument and the domination condition specified in Assumption 5(ii) imply that

$$\lim_{\delta\to 0}\tilde{h}_\delta(y) = \int_{-\infty}^0\int_z^0 f(y, z)\,du\,dz - \int_0^\infty\int_0^z f(y, z)\,du\,dz = \int(-z)\,f(y, z)\,dz.$$

Repeating analogous calculations in the case where $\delta < 0$ yields the same limit. Therefore, the two-sided derivative exists and we can say that

$$\Delta F_{\mathrm{Id}}(y) = \frac{\partial}{\partial\delta}F_\delta(y)\Big|_{\delta = 0} = \int(-z)\,f_{Y_0,Z|X}(y, z \mid x_0)\,dz = \int(-z)\,f_{Y|X}(y \mid x_0)\,f_{Z|Y,X}(z \mid y, x_0)\,dz = -f_{Y|X}(y \mid x_0)\,\mathbb{E}\left[\left(\frac{\omega(\eta)}{\Delta B}\right)g_1(T_0, x_0, \varepsilon) \,\middle|\, Y = y, X = x_0\right],$$

which is the desired expression, and thus completes the proof.
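The conclusion of Lemma 3 can be sanity-checked in a toy model where everything is available in closed form. This sketch assumes, hypothetically, $Y_0 \sim N(0,1)$ and $Z = c + bY_0 + e$ with $e \sim N(0, s^2)$ independent of $Y_0$, no remainder ($R_\delta = 0$), and it suppresses the conditioning on $X = x_0$. Then $Y_0 + \delta Z = \delta c + (1 + \delta b)Y_0 + \delta e$ is normal, and a central finite difference of $F_\delta(y)$ at $\delta = 0$ should match $-f_{Y_0}(y)\,\mathbb{E}[Z \mid Y_0 = y]$, which here equals minus the standard normal density at $y$ times $(c + by)$.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def perturbed_cdf(y, delta, c, b, s):
    """F_delta(y) = P(Y0 + delta * Z <= y) with Y0 ~ N(0, 1), Z = c + b * Y0 + e, e ~ N(0, s^2).

    Y0 + delta * Z = delta * c + (1 + delta * b) * Y0 + delta * e is normal, so F_delta is closed form.
    """
    mean = delta * c
    sd = ((1 + delta * b) ** 2 + (delta * s) ** 2) ** 0.5
    return STD_NORMAL.cdf((y - mean) / sd)

def lemma3_formula(y, c, b):
    """-f_{Y0}(y) * E[Z | Y0 = y], which equals -pdf(y) * (c + b * y) in this toy model."""
    return -STD_NORMAL.pdf(y) * (c + b * y)

c, b, s, step = 0.5, -0.8, 1.3, 1e-5
checks = []
for y in (-1.5, 0.0, 0.7, 2.0):
    numeric = (perturbed_cdf(y, step, c, b, s) - perturbed_cdf(y, -step, c, b, s)) / (2 * step)
    checks.append((y, numeric, lemma3_formula(y, c, b)))
    print(f"y={y:+.1f}: finite difference={numeric:.6f}, formula={lemma3_formula(y, c, b):.6f}")
```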
A.8 Proof of Theorem 4
Proof: First, recall by Lemma 3 that the fuzzy local treatment effect at the kink admits the representation $\Delta^{F}_{\phi} = \phi'_{F_{Y \mid X = x_0}}\big(\Delta^{F}_{\mathrm{Id}}(\cdot)\big)$, where
$$\Delta^{F}_{\mathrm{Id}}(y) = -f_{Y \mid X}(y \mid x_0)\, \mathbb{E}\Big[\frac{\omega(\eta)}{\Delta_B}\, g_1(T_0, x_0, \varepsilon) \,\Big|\, Y = y, X = x_0\Big],$$
such that $T_0 = b(x_0, \eta)$, $\omega(\eta) = b'(x_0^+, \eta) - b'(x_0^-, \eta)$, and $\Delta_B = \mu_B'(x_0^+) - \mu_B'(x_0^-) \neq 0$. From here, our goal is to show that $\Delta^{F}_{\mathrm{Id}}(y) = \mathrm{FDRKD}(y)$ for all $y$, where
$$\mathrm{FDRKD}(y) = \frac{\frac{\partial}{\partial x} F_{Y \mid X}(y \mid x_0^+) - \frac{\partial}{\partial x} F_{Y \mid X}(y \mid x_0^-)}{\mu_B'(x_0^+) - \mu_B'(x_0^-)}$$
and $\mu_B(x) = \mathbb{E}[b(x, \eta) \mid X = x]$. To that end, following Wang and Zhang (2025), define $h(x, e, u) = g(b(x, u), x, e)$ so that $Y = h(X, \varepsilon, \eta)$. Then, by Assumption 6, we may write the conditional cumulative distribution function as
$$F_{Y \mid X}(y \mid x) = \iint \mathbb{I}\big(h(x, e, u) \le y\big)\, f_{\varepsilon, \eta \mid X}(e, u \mid x)\, de\, du.$$
Next, fix $y$ and consider the decomposition
$$\frac{F_{Y \mid X}(y \mid x_0 + t) - F_{Y \mid X}(y \mid x_0)}{t} = A_{1,t}(y) + A_{2,t}(y),$$
where
$$A_{1,t}(y) = \frac{1}{t} \iint \big[\mathbb{I}\big(h(x_0 + t, e, u) \le y\big) - \mathbb{I}\big(h(x_0, e, u) \le y\big)\big]\, f_{\varepsilon, \eta \mid X}(e, u \mid x_0)\, de\, du,$$
$$A_{2,t}(y) = \frac{1}{t} \iint \mathbb{I}\big(h(x_0 + t, e, u) \le y\big)\, \big[f_{\varepsilon, \eta \mid X}(e, u \mid x_0 + t) - f_{\varepsilon, \eta \mid X}(e, u \mid x_0)\big]\, de\, du.$$
Note that $A_{1,t}(y)$ is a structural term that holds $f_{\varepsilon, \eta \mid X}(\cdot \mid x)$ fixed at $x_0$, and $A_{2,t}(y)$ is a selection term that captures changes in $f_{\varepsilon, \eta \mid X}$ with $x$. We proceed with the latter term first. By Assumption 6,
$$\lim_{t \to 0} A_{2,t}(y) = \iint \lim_{t \to 0} \Big\{\mathbb{I}\big(h(x_0 + t, e, u) \le y\big)\, \frac{f_{\varepsilon, \eta \mid X}(e, u \mid x_0 + t) - f_{\varepsilon, \eta \mid X}(e, u \mid x_0)}{t}\Big\}\, de\, du = \underbrace{\iint \mathbb{I}\big(h(x_0, e, u) \le y\big)\, \frac{\partial}{\partial x} f_{\varepsilon, \eta \mid X}(e, u \mid x_0)\, de\, du}_{:= S(y)}.$$
Note that this limit is the same for both $t \uparrow 0$ and $t \downarrow 0$. Next, we consider $A_{1,t}(y)$.
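The split of the difference quotient into $A_{1,t}(y) + A_{2,t}(y)$ is an add-and-subtract identity, and $A_{2,t}(y) \to S(y)$. A minimal Python sketch of both on a hypothetical toy model: $(\varepsilon, \eta)$ takes six values with weights that depend linearly on $x$, $x_0 = 0$, $g(t, x, e) = 2t + 0.5x + e$, and $b(x, u)$ has slope $u$ to the right of $x_0$ and $u/2$ to the left. Because the support is discrete, the indicators do not move for small $t$ when $y$ is not at an atom of $Y$, so $A_{1,t}(y) = 0$ here; the continuous case, where $A_{1,t}$ carries the structural effect, is the one handled via Lemma 3. The support, weights, and function names are all assumptions made for illustration.

```python
import itertools

X0 = 0.0
# Hypothetical six-atom support for (epsilon, eta).
SUPPORT = list(itertools.product([-1.0, 0.0, 1.0], [0.5, 1.5]))
BASE = [1.0 / 6.0] * 6
# Slopes sum to zero, so the weights stay a probability mass function near X0.
SLOPE = [0.06, 0.02, -0.01, -0.03, -0.02, -0.02]


def weights(x):
    """Hypothetical f_{eps,eta|X}(e, u | x), linear in x around X0."""
    return [b + s * (x - X0) for b, s in zip(BASE, SLOPE)]


def h(x, e, u):
    """h(x, e, u) = g(b(x, u), x, e) with g(treat, x, e) = 2 treat + 0.5 x + e and a kinked b."""
    treat = u * (x - X0) if x >= X0 else 0.5 * u * (x - X0)
    return 2.0 * treat + 0.5 * x + e


def cdf(y, x):
    return sum(w for (e, u), w in zip(SUPPORT, weights(x)) if h(x, e, u) <= y)


def a1(y, t):
    """Structural term: weights frozen at X0, indicator moves with x."""
    w0 = weights(X0)
    return sum(((h(X0 + t, e, u) <= y) - (h(X0, e, u) <= y)) * w
               for (e, u), w in zip(SUPPORT, w0)) / t


def a2(y, t):
    """Selection term: indicator evaluated at X0 + t, weights move with x."""
    diffs = [wt - w0 for wt, w0 in zip(weights(X0 + t), weights(X0))]
    return sum((h(X0 + t, e, u) <= y) * d for (e, u), d in zip(SUPPORT, diffs)) / t


def s_limit(y):
    """S(y): indicator at X0 times d/dx of the weights (here the constant slopes)."""
    return sum((h(X0, e, u) <= y) * s for (e, u), s in zip(SUPPORT, SLOPE))


y, t = 0.3, 1e-3
quotient = (cdf(y, X0 + t) - cdf(y, X0)) / t
print(quotient, a1(y, t) + a2(y, t), s_limit(y))
```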
Define the one-sided derivatives
$$H^+ = \frac{\partial}{\partial x} h(x_0^+, \varepsilon, \eta) = b'(x_0^+, \eta)\, g_1(T_0, x_0, \varepsilon) + g_2(T_0, x_0, \varepsilon), \qquad H^- = \frac{\partial}{\partial x} h(x_0^-, \varepsilon, \eta) = b'(x_0^-, \eta)\, g_1(T_0, x_0, \varepsilon) + g_2(T_0, x_0, \varepsilon).$$
Then we can write $h(x_0 + t, e, u)$ in terms of the one-sided derivatives as $h(x_0 + t, e, u) = Y_0 + tH^+ + R_t^+$, where $R_t^+ = Y_t - Y_0 - tH^+$ and $Y_t = h(x_0 + t, e, u)$; analogous definitions give $h(x_0 + t, e, u) = Y_0 + tH^- + R_t^-$ for $t < 0$. Importantly, the limit of $A_{1,t}(y)$ as $t \to 0$ is identical in form to the limit computed in the proof of Lemma 3, with $tH^+$ playing the role of $\delta Z$ (similarly, Assumption 7(i) plays the role of Assumption 5(i), and Assumption 7(ii) that of Assumption 5(ii)). This can be seen by plugging in our decomposition of $h(x_0 + t, e, u)$:
$$A_{1,t}(y) = \frac{1}{t}\Big[P\big(h(x_0 + t, \varepsilon, \eta) \le y \mid X = x_0\big) - P\big(h(x_0, \varepsilon, \eta) \le y \mid X = x_0\big)\Big] = \frac{1}{t}\Big[P\big(Y_0 \le y - tH^+ - R_t^+ \mid X = x_0\big) - P\big(Y_0 \le y \mid X = x_0\big)\Big] = \frac{1}{t}\Big[P\big(y < Y_0 \le y - tH^+ - R_t^+ \mid X = x_0\big) - P\big(y - tH^+ - R_t^+ < Y_0 \le y \mid X = x_0\big)\Big].$$
Thus, repeating the same steps as in the proof of Lemma 3, it follows that
$$\lim_{t \downarrow 0} A_{1,t}(y) = -f_{Y \mid X}(y \mid x_0)\, \mathbb{E}\big[H^+ \mid Y = y, X = x_0\big] \quad \text{and} \quad \lim_{t \uparrow 0} A_{1,t}(y) = -f_{Y \mid X}(y \mid x_0)\, \mathbb{E}\big[H^- \mid Y = y, X = x_0\big].$$
Combining the limits for $A_{1,t}$ and $A_{2,t}$, we obtain the one-sided derivative formulas
$$\frac{\partial}{\partial x} F_{Y \mid X}(y \mid x_0^+) = -f_{Y \mid X}(y \mid x_0)\, \mathbb{E}\big[H^+ \mid Y = y, X = x_0\big] + S(y), \qquad \frac{\partial}{\partial x} F_{Y \mid X}(y \mid x_0^-) = -f_{Y \mid X}(y \mid x_0)\, \mathbb{E}\big[H^- \mid Y = y, X = x_0\big] + S(y).$$
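To see the one-sided derivative formulas at work, the following Python sketch uses a hypothetical Gaussian toy model: $x_0 = 0$, $g(t, x, e) = 2t + 0.5x + e$ (so $g_1 = 2$ and $g_2 = 0.5$), $b(x, u) = ux$ for $x \ge 0$ and $ux/2$ for $x < 0$, with $\varepsilon \sim N(0, 1)$ and $\eta \sim N(m, s^2)$ independent of each other and of $X$. Then $R_t^{\pm} = 0$, $S(y) = 0$, $H^+ = 2\eta + 0.5$, and $H^- = \eta + 0.5$; since $\eta$ is independent of $Y = \varepsilon$ at $x_0$, the formulas predict right and left derivatives $-\varphi_0(y)(2m + 0.5)$ and $-\varphi_0(y)(m + 0.5)$, where $\varphi_0$ (notation used only here) is the standard normal density. The constants `M`, `S` and the function names are assumptions.

```python
import math

M, S = 1.2, 0.4  # hypothetical mean and sd of eta


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cond_cdf(y, x):
    """F_{Y|X}(y | x) in the toy model with x0 = 0 (Y is Gaussian given X = x)."""
    if x >= 0:  # Y = eps + x * (2 * eta + 0.5)
        mean, var = x * (2.0 * M + 0.5), 1.0 + (2.0 * x * S) ** 2
    else:       # Y = eps + x * (eta + 0.5)
        mean, var = x * (M + 0.5), 1.0 + (x * S) ** 2
    return norm_cdf((y - mean) / math.sqrt(var))


def one_sided_derivatives(y, t=1e-6):
    right = (cond_cdf(y, t) - cond_cdf(y, 0.0)) / t
    left = (cond_cdf(y, 0.0) - cond_cdf(y, -t)) / t
    return right, left


y = 0.7
right, left = one_sided_derivatives(y)
print(right, -norm_pdf(y) * (2.0 * M + 0.5))  # -f(y) E[H+ | Y = y] + S(y), with S(y) = 0
print(left, -norm_pdf(y) * (M + 0.5))         # -f(y) E[H- | Y = y] + S(y)
```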
Thus, taking the difference, it follows that
$$\frac{\partial}{\partial x} F_{Y \mid X}(y \mid x_0^+) - \frac{\partial}{\partial x} F_{Y \mid X}(y \mid x_0^-) = -f_{Y \mid X}(y \mid x_0)\, \mathbb{E}\big[H^+ - H^- \mid Y = y, X = x_0\big],$$
and furthermore, after plugging in the definitions of $H^+$ and $H^-$,
$$H^+ - H^- = \big[b'(x_0^+, \eta) - b'(x_0^-, \eta)\big]\, g_1(T_0, x_0, \varepsilon) = \omega(\eta)\, g_1(T_0, x_0, \varepsilon),$$
because the $g_2(T_0, x_0, \varepsilon)$ terms cancel. Hence, for every $y$,
$$\frac{\partial}{\partial x} F_{Y \mid X}(y \mid x_0^+) - \frac{\partial}{\partial x} F_{Y \mid X}(y \mid x_0^-) = -f_{Y \mid X}(y \mid x_0)\, \mathbb{E}\big[\omega(\eta)\, g_1(T_0, x_0, \varepsilon) \mid Y = y, X = x_0\big].$$
Finally, we must consider the denominator. Here, we can again apply the same decomposition argument made before. Let
$$\mu_B(x) = \int b(x, u)\, f_{\eta \mid X}(u \mid x)\, du.$$
Then, fix $t > 0$ such that $x_0 + t \in I_{x_0} \setminus \{x_0\}$. Starting from
$$\frac{\mu_B(x_0 + t) - \mu_B(x_0)}{t} = \frac{1}{t} \int b(x_0 + t, u)\, f_{\eta \mid X}(u \mid x_0 + t)\, du - \frac{1}{t} \int b(x_0, u)\, f_{\eta \mid X}(u \mid x_0)\, du,$$
we can add and subtract $\int b(x_0 + t, u)\, f_{\eta \mid X}(u \mid x_0)\, du$ to obtain the decomposition
$$\frac{\mu_B(x_0 + t) - \mu_B(x_0)}{t} = \underbrace{\frac{1}{t} \int \big[b(x_0 + t, u) - b(x_0, u)\big]\, f_{\eta \mid X}(u \mid x_0)\, du}_{S_{1,t}} + \underbrace{\frac{1}{t} \int b(x_0 + t, u)\, \big[f_{\eta \mid X}(u \mid x_0 + t) - f_{\eta \mid X}(u \mid x_0)\big]\, du}_{S_{2,t}}.$$
Thus, following the same arguments as before, under Assumptions 7(iii) and 7(iv), it can be shown that the right and left derivatives are given by
$$\mu_B'(x_0^+) = \mathbb{E}\big[b'(x_0^+, \eta) \mid X = x_0\big] + \int b(x_0, u)\, \frac{\partial}{\partial x} f_{\eta \mid X}(u \mid x_0)\, du, \qquad \mu_B'(x_0^-) = \mathbb{E}\big[b'(x_0^-, \eta) \mid X = x_0\big] + \int b(x_0, u)\, \frac{\partial}{\partial x} f_{\eta \mid X}(u \mid x_0)\, du.$$
Therefore, taking the difference yields
$$\mu_B'(x_0^+) - \mu_B'(x_0^-) = \mathbb{E}\big[b'(x_0^+, \eta) - b'(x_0^-, \eta) \mid X = x_0\big] = \mathbb{E}\big[\omega(\eta) \mid X = x_0\big] = \Delta_B.$$
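The numerator and denominator of $\mathrm{FDRKD}(y)$ can both be approximated by one-sided finite differences. The Python sketch below assumes a hypothetical Gaussian toy model ($x_0 = 0$, $g(t, x, e) = 2t + 0.5x + e$, $b(x, u) = ux$ for $x \ge 0$ and $ux/2$ for $x < 0$, with $\varepsilon \sim N(0, 1)$ and $\eta \sim N(m, s^2)$ independent of each other and of $X$). In this model $\omega(\eta) = \eta/2$, so $\Delta_B = \mathbb{E}[\omega(\eta)] = m/2$, and $\omega(\eta) g_1 = \eta$, so the derivation predicts a numerator of $-\varphi_0(y)\, m$ and a ratio of $-2\varphi_0(y)$, where $\varphi_0$ (notation used only here) is the standard normal density. The model, constants, and helper names such as `kink` are illustrative assumptions.

```python
import math

M, S = 1.2, 0.4  # hypothetical mean and sd of eta


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def b(x, u):
    """Hypothetical kinked first stage at x0 = 0: slope u on the right, u / 2 on the left."""
    return u * x if x >= 0 else 0.5 * u * x


def mu_b(x):
    """E[b(x, eta) | X = x]; b is linear in u and eta is independent of X, so plug in M."""
    return b(x, M)


def cond_cdf(y, x):
    """F_{Y|X}(y | x) for Y = g(b(x, eta), x, eps) = 2 b(x, eta) + 0.5 x + eps."""
    slope_mean = 2.0 * M + 0.5 if x >= 0 else M + 0.5
    slope_sd = 2.0 * S if x >= 0 else S
    mean, var = x * slope_mean, 1.0 + (x * slope_sd) ** 2
    return norm_cdf((y - mean) / math.sqrt(var))


def kink(fn, t):
    """Right minus left one-sided difference quotient of fn at 0."""
    return ((fn(t) - fn(0.0)) - (fn(0.0) - fn(-t))) / t


def fdrkd(y, t=1e-6):
    numerator = kink(lambda x: cond_cdf(y, x), t)
    denominator = kink(mu_b, t)
    return numerator / denominator, numerator, denominator


y = 0.7
ratio, num, den = fdrkd(y)
print(den, M / 2)                 # Delta_B = E[omega(eta)]
print(num, -norm_pdf(y) * M)      # -f(y) E[omega(eta) g_1 | Y = y]
print(ratio, -2.0 * norm_pdf(y))  # FDRKD(y), which should match Delta^F_Id(y)
```

In practice both kinks would be estimated from data (for example with local polynomials on each side of $x_0$); exact model CDFs are used here only so the identity can be checked without sampling noise.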
Putting everything together, for all $y$,
$$\mathrm{FDRKD}(y) = \frac{\frac{\partial}{\partial x} F_{Y \mid X}(y \mid x_0^+) - \frac{\partial}{\partial x} F_{Y \mid X}(y \mid x_0^-)}{\mu_B'(x_0^+) - \mu_B'(x_0^-)} = \frac{-f_{Y \mid X}(y \mid x_0)\, \mathbb{E}\big[\omega(\eta)\, g_1(T_0, x_0, \varepsilon) \mid Y = y, X = x_0\big]}{\Delta_B} = -f_{Y \mid X}(y \mid x_0)\, \mathbb{E}\Big[\frac{\omega(\eta)}{\Delta_B}\, g_1(T_0, x_0, \varepsilon) \,\Big|\, Y = y, X = x_0\Big] = \Delta^{F}_{\mathrm{Id}}(y).$$
Therefore,
$$\Delta^{F}_{\phi} = \phi'_{F_{Y \mid X = x_0}}\big(\Delta^{F}_{\mathrm{Id}}(\cdot)\big) = \phi'_{F_{Y \mid X = x_0}}\big(\mathrm{FDRKD}(\cdot)\big),$$
which proves the theorem.
A.9 Proof of Theorem 5
Proof: To begin, let $\langle f, g \rangle = \int_0^1 f(u)\, g(u)\, du$ denote the inner product on $L^2(0, 1)$. Next, recall that for a complete orthogonal basis $\{\phi_k\}_{k=1}^{\infty}$ of $L^2(0, 1)$, every $f \in L^2(0, 1)$ admits the generalized Fourier expansion $f(u) = \sum_{k=1}^{\infty} a_k \phi_k(u)$, where the coefficients are given by $a_k = \langle f, \phi_k \rangle / \|\phi_k\|_2^2$. Next, let $\{P_k^*\}_{k=0}^{\infty}$ be the shifted Legendre polynomials, such that $P_k^*(u) = P_k(2u - 1)$, where $P_k$ is the $k$th Legendre polynomial and $P_0^* = 1$. These form a complete orthogonal basis of $L^2(0, 1)$, with
$$\int_0^1 P_j^*(u)\, P_k^*(u)\, du = \begin{cases} 0, & j \neq k, \\ \frac{1}{2k + 1}, & j = k. \end{cases}$$
Now, recall that we defined
$$\Delta Q'(u) = \frac{\frac{\partial}{\partial x} Q_{Y \mid X}(u \mid x_0^+) - \frac{\partial}{\partial x} Q_{Y \mid X}(u \mid x_0^-)}{\mu_B'(x_0^+) - \mu_B'(x_0^-)}.$$
Then, under the assumption that $\int_0^1 [\Delta Q'(u)]^2\, du < \infty$, it follows that $\Delta Q' \in L^2(0, 1)$. Therefore, we may apply the generalized Fourier expansion with $f = \Delta Q'$ and $\phi_k = P_{k-1}^*$ to find
$$\langle \Delta Q', P_{k-1}^* \rangle = \int_0^1 \Delta Q'(u)\, P_{k-1}^*(u)\, du = \frac{1}{\Delta_B} \int_0^1 \Big[\frac{\partial}{\partial x} Q_{Y \mid X}(u \mid x_0^+) - \frac{\partial}{\partial x} Q_{Y \mid X}(u \mid x_0^-)\Big] P_{k-1}^*(u)\, du = \frac{1}{\Delta_B}\big[\lambda_k'(x_0^+) - \lambda_k'(x_0^-)\big].$$
Thus, since $\|P_{k-1}^*\|_2^2 = (2(k-1) + 1)^{-1} = (2k - 1)^{-1}$, it follows that
$$\Delta Q'(u) = \sum_{k=1}^{\infty} (2k - 1) \Big(\frac{\lambda_k'(x_0^+) - \lambda_k'(x_0^-)}{\Delta_B}\Big) P_{k-1}^*(u).$$
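The orthogonality relation for the shifted Legendre polynomials, and their use in the L-moments $\lambda_k = \int_0^1 Q(u)\, P_{k-1}^*(u)\, du$ (assuming the standard L-moment definition, which is consistent with the last display), can be checked with Gauss-Legendre quadrature. The Python sketch below builds $P_k^*$ from NumPy's Legendre series evaluator, computes the Gram matrix $\langle P_j^*, P_k^* \rangle$, and evaluates the first three L-moments of the Uniform(0, 1) distribution, whose quantile function is $Q(u) = u$. Function names and the node count are illustrative choices.

```python
import numpy as np
from numpy.polynomial import legendre as leg


def shifted_legendre(k, u):
    """P*_k(u) = P_k(2u - 1), evaluated at points u in [0, 1]."""
    coef = np.zeros(k + 1)
    coef[k] = 1.0
    return leg.legval(2.0 * u - 1.0, coef)


def unit_interval_rule(n_nodes=64):
    """Gauss-Legendre nodes and weights mapped from [-1, 1] to [0, 1]."""
    x, w = leg.leggauss(n_nodes)
    return 0.5 * (x + 1.0), 0.5 * w


def gram_matrix(kmax):
    u, w = unit_interval_rule()
    basis = np.array([shifted_legendre(k, u) for k in range(kmax + 1)])
    return (basis * w) @ basis.T  # entry (j, k) approximates <P*_j, P*_k>


def l_moments(quantile, kmax):
    """lambda_k = <Q, P*_{k-1}> for k = 1..kmax."""
    u, w = unit_interval_rule()
    q = quantile(u)
    return np.array([np.sum(w * q * shifted_legendre(k - 1, u)) for k in range(1, kmax + 1)])


G = gram_matrix(5)
print(np.round(G, 12))            # diagonal 1 / (2k + 1), zeros elsewhere
print(l_moments(lambda u: u, 3))  # Uniform(0, 1): approximately [1/2, 1/6, 0]
```

With 64 nodes the rule is exact for polynomials up to degree 127, so the Gram matrix is reproduced up to rounding error.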
Finally, applying Parseval's identity for complete orthogonal expansions (Conway, 1990) yields
$$\|\Delta Q'\|_2^2 = \sum_{k=1}^{\infty} a_k^2\, \|P_{k-1}^*\|_2^2 = \sum_{k=1}^{\infty} \Big[(2k - 1)^2 \Big(\frac{\lambda_k'(x_0^+) - \lambda_k'(x_0^-)}{\Delta_B}\Big)^2\Big] \cdot \frac{1}{2k - 1} = \sum_{k=1}^{\infty} (2k - 1) \Big(\frac{\lambda_k'(x_0^+) - \lambda_k'(x_0^-)}{\mu_B'(x_0^+) - \mu_B'(x_0^-)}\Big)^2,$$
which completes the proof, since by definition $\|\Delta Q'\|_2^2 = \int_0^1 (\Delta Q'(u))^2\, du = (\Psi_C')^2$.
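Parseval's identity in the form used here, $\|f\|_2^2 = \sum_{k \ge 1} (2k - 1) \langle f, P_{k-1}^* \rangle^2$, can be checked numerically for any square-integrable $f$ on $(0, 1)$; in the theorem, $f = \Delta Q'$ and $\langle f, P_{k-1}^* \rangle = (\lambda_k'(x_0^+) - \lambda_k'(x_0^-))/\Delta_B$. The Python sketch below uses hypothetical stand-ins for $\Delta Q'$ (such as $f(u) = e^u$), truncates the series at $K$ terms, and compares the truncated sum with the squared $L^2$ norm, both computed by Gauss-Legendre quadrature. The individual terms $(2k - 1)\langle f, P_{k-1}^* \rangle^2$ are the per-order contributions (location, scale, skewness, and so on) to $(\Psi_C')^2$. Names and the truncation level are assumptions.

```python
import numpy as np
from numpy.polynomial import legendre as leg


def shifted_legendre(k, u):
    """P*_k(u) = P_k(2u - 1) on [0, 1]."""
    coef = np.zeros(k + 1)
    coef[k] = 1.0
    return leg.legval(2.0 * u - 1.0, coef)


def parseval_terms(f, K, n_nodes=64):
    """Return per-order terms (2k - 1) <f, P*_{k-1}>^2 for k = 1..K and ||f||_2^2."""
    x, w = leg.leggauss(n_nodes)
    u, w = 0.5 * (x + 1.0), 0.5 * w  # map the rule from [-1, 1] to [0, 1]
    fu = f(u)
    coefs = np.array([np.sum(w * fu * shifted_legendre(k - 1, u)) for k in range(1, K + 1)])
    terms = (2.0 * np.arange(1, K + 1) - 1.0) * coefs ** 2
    return terms, np.sum(w * fu ** 2)


terms, norm_sq = parseval_terms(np.exp, K=12)
print(terms[:4])  # leading per-order contributions
print(terms.sum(), norm_sq, (np.e ** 2 - 1.0) / 2.0)
```

For a smooth stand-in such as $e^u$ the Legendre coefficients decay very quickly, so a dozen terms already match the norm to near machine precision; a quantile derivative with heavy tails would need many more.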
