Distributional Discontinuity Design
Regression discontinuity and kink designs are typically analyzed through mean effects, even when treatment changes the shape of the entire outcome distribution. To address this, we introduce distributional discontinuity designs, a framework for estim…
Authors: Kyle Schindl, Larry Wasserman
Kyle Schindl†, Larry Wasserman‡

† Department of Statistics, Iowa State University, kschindl@iastate.edu
‡ Department of Statistics & Data Science, Machine Learning Department, Carnegie Mellon University, larry@stat.cmu.edu

Abstract

Regression discontinuity and kink designs are typically analyzed through mean effects, even when treatment changes the shape of the entire outcome distribution. To address this, we introduce distributional discontinuity designs, a framework for estimating causal effects for a scalar outcome at the boundary of a discontinuity in treatment assignment. Our estimand is the Wasserstein distance between limiting conditional outcome distributions; a single scale-interpretable measure of distribution shift. We show that this weakly bounds the average treatment effect, where equality holds if and only if the treatment effect is purely additive; thus, departure from equality measures effect heterogeneity. To further encode effect heterogeneity we show that the Wasserstein distance admits an orthogonal decomposition into squared differences in L-moments, thereby quantifying the contribution from location, scale, skewness, and higher-order shape components to the overall distributional distance. Next, we extend this framework to distributional kink designs by evaluating the Wasserstein derivative at a policy kink; this describes the flow of probability mass through the kink. In the case of fuzzy kink designs, we derive new identification results. Finally, we apply our methods on real data by re-analyzing two natural experiments to compare our distributional effects to traditional causal estimands.
Keywords: Regression Discontinuity Design, Regression Kink Design, Optimal Transport, Wasserstein Distance, Quantile Treatment Effects

Accompanying R code is available via github.com/kyleschindl/discontinuity-designs

1 Introduction

First introduced by Thistlethwaite and Campbell (1960) and formalized by Hahn et al. (2001), regression discontinuity design is a quasi-experimental design method that exploits discontinuities in treatment assignment to identify causal effects. The key idea is that observational units arbitrarily close to either side of the treatment discontinuity can be thought of as similar in all respects except for treatment status. Thus, in this neighborhood of the discontinuity, treatment assignment can be considered “as good as random,” and it is therefore reasonable to assume that exchangeability holds. Over the years, a very rich and deep literature for regression discontinuity design methods has been developed, with contributions too broad to enumerate. Modern regression discontinuity design tends to focus on local polynomial estimation and bandwidth selection (Imbens and Kalyanaraman, 2012), robust bias-corrected inference (Calonico et al., 2014, 2019a), and a suite of diagnostic tools such as density-manipulation tests (McCrary, 2008; Cattaneo et al., 2020). Readers interested in the history of regression discontinuity design should refer to Cook (2008), and to Lee and Lemieux (2010), Imbens and Lemieux (2008), and Cattaneo and Titiunik (2022) for in-depth reviews. Notably, most traditional methods primarily focus on differences in mean effects above and below the cutoff; there is a much smaller literature considering distributional causal effects.
Focusing exclusively on mean effects often limits both the usefulness and generalizability of an analysis, because averages can mask treatment heterogeneity across the outcome distribution. It is easy to imagine a treatment that leaves the average unchanged, but has asymmetric effects on the lower and upper tails of the outcome distribution. Consequently, over the past several years there has been an increased focus on developing causal effects that consider how the entire outcome distribution changes with respect to a treatment. In the regression discontinuity design setting, Frandsen et al. (2012) introduced pointwise quantile treatment effects for both sharp and fuzzy designs. Since then, there have been several extensions and generalizations of their work, including uniform confidence bands (Qu and Yoon, 2015), bias-corrected estimators (Qu and Yoon, 2019; Chiang et al., 2019), and quantile effects in regression kink designs (Chiang and Sasaki, 2019; Chen et al., 2020). More recently, Jin et al. (2025) considered quantile effects under a local rank similarity condition and Dijcke (2025) considered local average quantile treatment effects under distribution-valued outcomes; their framework is conceptually similar to, but non-overlapping with, our own, as we consider scalar-valued outcomes. While each of these methods is interesting and useful, in practice they can be difficult to implement and interpret. Practitioners are often not interested in specific quantile effects, and considering many quantiles can be hard to summarize and communicate. Furthermore, because quantile treatment effects describe the marginal outcome distributions, in order to interpret effects at the individual level a strong rank-invariance assumption must be made.
We address these problems by defining causal effects in terms of the distributional distance between conditional counterfactual distributions, which yields a single transparent measure of the overall distribution shift.

In this paper we introduce distributional discontinuity designs, a framework for studying distributional causal effects above and below some treatment discontinuity. Specifically, we define our causal effect to be the Wasserstein distance between the limiting conditional distributions of the counterfactual $Y(a) \mid X = x$ above and below the treatment discontinuity. This provides a clean, one-number summary of the entire distance between treatment groups, thereby encoding the total magnitude of the treatment effect and establishing a relative scale for all treatment effects. Using the Wasserstein distance as our causal effect yields a number of nice properties. First, we show that it is weakly greater than the magnitude of the average treatment effect at the cutoff, where equality holds if and only if the treatment effect is purely additive; this immediately provides a useful reference point to establish the amount of treatment heterogeneity. Second, we show that the Wasserstein distance can be decomposed into the effect on individual L-moments (Hosking, 1990). This allows us to define a “distributional $R^2$,” i.e. the amount of the distributional distance explained by each L-moment, thereby providing a novel way of summarizing treatment heterogeneity by its effect on location, scale, skewness, etc. Third, since the Wasserstein distance describes the total magnitude of the treatment effect, we can use it to define the degree to which one quantile function stochastically dominates the other. In our analysis, we consider both sharp and fuzzy treatment assignments.
Additionally, we extend the distributional discontinuity design framework to regression kink designs by defining our causal effect to be the Wasserstein derivative at the policy kink; this describes the flow of probability mass at the cutoff, and neatly generalizes traditional kink designs. Notably, we also extend the work of Wang and Zhang (2025) to establish identification of fuzzy local treatment effects at a policy kink.

Broadly speaking, our analysis fits into a growing literature of papers that apply optimal transport methods to causal inference problems in order to compare entire outcome distributions, rather than just averages. For example, Gunsilius (2023) develops distributional synthetic controls that reconstruct a treated unit’s distribution from controls. In difference-in-differences, optimal transport methods align pre- and post-treatment outcome distributions across groups instead of relying on mean-level parallel trends; see Torous et al. (2024) for a nonlinear difference-in-differences and Zhou et al. (2025) for a geodesic variant. More generally, Kurisu et al. (2025) and Schindl and Wasserman (2025) consider causal change as a movement along paths in the space of probability distributions. Interested readers can see Gunsilius (2025) for an extended discussion and review of the literature.

The remainder of the paper is organized as follows: In Section 2 we define all relevant notation and definitions. In Section 3 we formally define the distributional discontinuity framework, first in the sharp treatment assignment setting.
Within this section, we consider identification of the Wasserstein effect (Section 3.1), interpretation of these effects and their relationship to traditional mean and quantile based effects (Section 3.2), estimation and the limiting distribution of the Wasserstein effect (Section 3.3), inference for the Wasserstein effect (Section 3.4), and finally we extend our results to the fuzzy treatment discontinuity setting (Section 3.5). In Section 4 we extend our framework to kinked distributional designs by defining a novel causal effect in terms of the Wasserstein derivative at the policy kink, and in Section 4.1 we extend the work of Wang and Zhang (2025) to establish causal identification in fuzzy kink designs. In Section 5 we apply our method to real data sets by re-analyzing several natural experiments and directly comparing the Wasserstein effect to the average treatment effect at the cutoff. Finally, in Section 6 we provide a discussion and conclusion of distributional discontinuity designs, including limitations and directions for future work.

2 Setup & Notation

Suppose we observe $Z_1, \ldots, Z_n \overset{iid}{\sim} P$ with $Z_i = (X_i, A_i, Y_i)$, where $X_i \in \mathbb{R}$ is the “running variable,” $A_i \in \{0, 1\}$ is the treatment assignment, and $Y_i \in \mathbb{R}$ is the observed outcome. Note that $Y$ is a scalar, unlike in Dijcke (2025), which considers a distribution-valued outcome. To begin, we assume that treatment is assigned such that

$$A_i = \begin{cases} 1 & \text{if } X_i \geq x_0 \\ 0 & \text{if } X_i < x_0 \end{cases}$$

at cutoff $X = x_0$. In Section 3.5 and beyond we relax this assignment rule to the fuzzy setting. We assume that $Y$ has a continuous distribution. We are interested in the conditional distribution of $Y(a) \mid X = x$, where $Y(a)$ are the potential outcomes under treatment assignment $A = a$.
Furthermore, note that our framework does allow for the inclusion of some vector of covariates; however, since this is not required for identification and is notationally cumbersome to include, we omit such terms from our analysis. A brief discussion of conditioning on additional covariates can be found in Section 3.3.

Throughout the paper, for some function $f(\cdot)$ we use the notation $\lim_{x \uparrow x_0} f(x) = f(x_0^-)$ to denote the left-hand limit (i.e. $x < x_0$ and $x \to x_0$) and similarly $\lim_{x \downarrow x_0} f(x) = f(x_0^+)$ to denote the right-hand limit. We say $\mathcal{P}_2(\cdot)$ is the set of all probability measures with finite second moments. We use $F_{Y|X}(y \mid x)$ to denote the cumulative distribution function of $Y \mid X = x$ with the associated quantile function $Q_x(u) = F_x^{-1}(u) = \inf\{y : F_{Y|X}(y \mid x) \geq u\}$. When we consider the limiting quantiles, we drop the dependence on $x$ and simply write

$$Q_1(u) = \inf\{y : \lim_{x \downarrow x_0} F_{Y|X}(y \mid x) \geq u\} \quad \text{and} \quad Q_0(u) = \inf\{y : \lim_{x \uparrow x_0} F_{Y|X}(y \mid x) \geq u\},$$

where the subscripts one and zero denote taking the limit from above and below the cutoff, respectively.

3 Distributional Discontinuity Design

In this section, we introduce distributional causal effects that compare the entire outcome distribution below and above some treatment discontinuity. Let $P_{a|x}$ denote the conditional counterfactual distribution of $Y(a) \mid X = x$ and suppose that treatment is assigned in a discontinuous way, where $A = I(X \geq x_0)$ for some running variable $X$ and cutoff $x_0$. Then, we define our causal estimand to be the 2-Wasserstein distance between the counterfactual treatment distributions at the treatment discontinuity $X = x_0$, i.e.

$$\Psi = W_2(P_{1|x_0}, P_{0|x_0}).$$
The 2-Wasserstein distance between any two probability distributions $P$ and $Q$ is defined as

$$W_2^2(P, Q) = \inf_{\gamma \in \Gamma(P, Q)} \int \|x - y\|_2^2 \, d\gamma(x, y)$$

where $\Gamma(P, Q)$ is the set of all couplings of $P$ and $Q$, i.e. the set of all joint distributions $\gamma$ that preserve the marginals of $P$ and $Q$ (Villani et al., 2009). Roughly speaking, each coupling $\gamma$ describes a way of pairing points from $P$ and $Q$, and the Wasserstein distance is the total quadratic transport cost under the best possible pairing, as visualized in Figure 1. More intuitively, this describes the minimal transportation cost of transforming or “morphing” $P$ into $Q$. If $P$ has a density then $W_2^2(P, Q) = \inf_T \mathbb{E}[\|T(X) - X\|_2^2]$ with $X \sim P$, where the infimum is over all maps $T$ such that $T(X) \sim Q$. The minimizing map $T$ is called the optimal transport map.

Figure 1: Optimal transport maps between counterfactual distributions. (Horizontal axis: $y$; vertical axis: density; curves: $f_{Y(0)|X=x_0}(y)$ and $f_{Y(1)|X=x_0}(y)$.)

In the case of a treatment discontinuity, $\Psi$ measures how far probability mass must be moved in order to transform the untreated distribution at the cutoff into the treated distribution. Thus, it measures differences not only in the means, but also higher moment effects such as the variance or skewness. For this reason, the Wasserstein effect can detect and quantify complex, higher order treatment effects that the traditional regression discontinuity design estimand would miss. For example, in Figure 2, we can see that above the treatment discontinuity at $x = 0$ not only does the mean change, but the variance and overall distributional shape do as well. Focusing solely on the difference in means would not adequately describe the full effect of treatment here.

As a simple motivating example, suppose that there is a treatment discontinuity at $x_0 = 0$ and that $Y(0) \mid X = x \sim N(0, 1)$ for all $x$, and $Y(1) \mid X = x \sim N(0, 2^2)$ for all $x$.
Then, it is clear that the average treatment effect at the cutoff $\mathbb{E}[Y(1) \mid X = x_0] - \mathbb{E}[Y(0) \mid X = x_0]$ is zero, as there is no change in location above and below the cutoff. However, the standard deviation doubles. Researchers who only consider the average treatment effect at the cutoff would conclude there was no treatment effect, but in reality, a doubling of the standard deviation could have large practical implications. Similarly, as discussed in Kim et al. (2024), treatment effects could easily take a multimodal structure where $Y(0) = 0$ almost surely, but $Y(1) = 1$ or $Y(1) = -1$ with equal probability. In this setting, the average treatment effect is again zero, but treatment harms half the population and benefits the other half. Fortunately, both of these causal effects can be detected by $\Psi$. For example, in the first setting with two Gaussians, it can be shown that $\Psi = |\sigma_1 - \sigma_0| = |2 - 1| = 1$, where $\sigma_a$ is the standard deviation of $Y(a)$, indicating a sharp difference in the outcome distributions. In Section 3.2, we provide more guidance on the interpretation of the Wasserstein effect, and its comparison to the average treatment effect at the cutoff. Now that we have defined our effect of interest, we establish the conditions under which it is causally identified.

Figure 2: Counterfactual distributions above and below a treatment discontinuity. (Axes: $x$, $y$, and conditional density.)

3.1 Identification

In this section, we discuss the assumptions required for causal identification of the distributional effect $\Psi$. These conditions are nearly identical to the identification requirements established in Frandsen et al.
(2012) for quantile treatment effects in discontinuity designs, since in one dimension the Wasserstein distance can be expressed as the $L^2$ distance between quantile functions (Vallender, 1974); the only additional assumption required is finite second moments of $P_{a|x}$ in order for the Wasserstein distance to be well-defined. For completeness, we still outline each assumption required. Let $F_{Y(a)|X}(y \mid x)$ be the cumulative distribution function of $P_{a|x}$. Then, for a sharp treatment assignment $A = I(X \geq x_0)$, we require:

(i) Consistency: $Y = Y(a)$ if $A = a$ for $a \in \{0, 1\}$.
(ii) Continuity: For $a \in \{0, 1\}$ and all $y \in \mathbb{R}$, $\lim_{x \to x_0} F_{Y(a)|X}(y \mid x) = F_{Y(a)|X}(y \mid x_0)$.
(iii) Density at threshold: $f_X(x)$ is differentiable at $x = x_0$ and $\lim_{x \to x_0} f_X(x) > 0$.
(iv) Regularity: $P_{a|x} \in \mathcal{P}_2(\mathbb{R})$ for $a \in \{0, 1\}$.

Assumption (i) rules out any interference or spillover effects, where the treatment of one observation affects the outcomes of another. Assumption (ii) ensures that as we approach the cutoff the cumulative distribution functions of the counterfactuals have well-defined limits. This rules out sudden jumps or discontinuities in the outcome distribution that could be unrelated to the treatment assignment. Assumption (iii) guarantees that there are observations arbitrarily close to the cutoff on both sides, which is necessary for well-defined limiting distributions. Finally, assumption (iv) ensures that the Wasserstein distance is well defined by requiring the counterfactual distributions to have a finite second moment. With these assumptions defined, we now establish causal identification in the following lemma.

Lemma 1 (Identification). Under assumptions (i)-(iv), by Frandsen et al.
(2012) it follows that

$$\Psi = \left( \int_0^1 \left( Q_1(u) - Q_0(u) \right)^2 du \right)^{1/2}$$

where $Q_1(u) = \inf\{y : \lim_{x \downarrow x_0} F_{Y|X}(y \mid x) \geq u\}$ and $Q_0(u) = \inf\{y : \lim_{x \uparrow x_0} F_{Y|X}(y \mid x) \geq u\}$ are the limiting conditional quantiles of $Y \mid X = x$ above and below the cutoff.

By Lemma 1, we can see that the Wasserstein effect $\Psi$ may be expressed as the square root of the squared difference in the $u$-th quantile below and above the cutoff, integrated across the entire distribution. This highlights the fact that $\Psi$ measures distributional changes of any form, whether it be changes in location, scale, shape, etc. In the next section, we explore how $\Psi$ can be interpreted, and compare it to the traditional regression discontinuity design estimand $\tau$. We note that the reduction to quantiles only holds because $Y$ is scalar; when $Y$ is multivariate the estimation of the Wasserstein effect is more complicated and will be dealt with in future work.

3.2 Interpretation

In this section, we build intuition for how to interpret the Wasserstein effect, $\Psi$. In particular, we establish an inequality that directly compares $\Psi$ to the average treatment effect at the cutoff and the conditions under which the effects are equal, we demonstrate how the direction of the effect at each quantile can be neatly visualized, we decompose the distributional effect into individual moment effects, and we define a novel measure of effect magnitude by considering the degree to which $Q_1$ stochastically dominates $Q_0$.

3.2.1 Relation to the Average Treatment Effect

In traditional regression discontinuity designs, practitioners are typically interested in estimating the difference in means above and below the treatment cutoff, defined by $\tau = \mathbb{E}[Y(1) - Y(0) \mid X = x_0]$. Notably, this can be interpreted through a distributional lens; $\tau$ is simply measuring the distance between the means of the counterfactual distributions at the cutoff.
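As a numerical illustration of Lemma 1, the following minimal Python sketch approximates $\tau$ and $\Psi$ from two quantile functions using a midpoint rule on $(0, 1)$. It is not the accompanying R code; the function name `wasserstein_effect`, the grid size, and the assumption that the limiting quantile functions are known (here taken from the two-Gaussian motivating example) are choices made purely for illustration.

```python
from statistics import NormalDist


def wasserstein_effect(q1, q0, grid_size=20_000):
    """Approximate (tau, Psi) from two quantile functions with a midpoint rule.

    tau is the integral of Q1 - Q0 over (0, 1), i.e. the difference in means;
    Psi^2 is the integral of (Q1 - Q0)^2 over (0, 1), as in Lemma 1.
    """
    sum_diff = 0.0
    sum_sq = 0.0
    for i in range(grid_size):
        u = (i + 0.5) / grid_size  # midpoints avoid the endpoints u = 0 and u = 1
        diff = q1(u) - q0(u)
        sum_diff += diff
        sum_sq += diff * diff
    tau = sum_diff / grid_size
    psi = (sum_sq / grid_size) ** 0.5
    return tau, psi


# Illustrative limiting quantile functions (assumed known here, not estimated).
q0 = NormalDist(mu=0.0, sigma=1.0).inv_cdf  # N(0, 1) below the cutoff
q1 = NormalDist(mu=0.0, sigma=2.0).inv_cdf  # N(0, 2^2) above the cutoff

tau, psi = wasserstein_effect(q1, q0)
print(f"tau ~ {tau:.4f}, Psi ~ {psi:.4f}")
```

For this example the approximation should land close to $\tau = 0$ and $\Psi = 1$, in line with the closed form stated for the two-Gaussian setting; the small remaining gap comes from truncating the tails of the grid.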
In fact, if the treatment effect is purely additive (such that it only impacts the distribution means) then it can be shown that these two causal effects are equal. In the following theorem, we establish an inequality between the Wasserstein and mean effects at the cutoff that shows $\Psi$ must be weakly greater than $|\tau|$. Furthermore, we establish the condition under which these effects are identical.

Theorem 1 (Effect Inequality). The Wasserstein effect upper bounds the average treatment effect at the cutoff, i.e. $|\tau| \leq \Psi$. Furthermore, equality holds if and only if the treatment effect is purely additive; that is, if for some $\delta \in \mathbb{R}$ and for all $u \in (0, 1)$, $Q_1(u) = Q_0(u) + \delta$.

Theorem 1 shows that the jump, or discontinuity, in the outcome distributions at the cutoff is always at least as large as the jump in the means. Intuitively, we can think of the relationship between these effects by framing both in terms of the quantile effect function $\Delta Q(u) = Q_1(u) - Q_0(u)$. Suppose that $U \sim \text{Uniform}(0, 1)$. Then, it is clear that $\tau = \int_0^1 \Delta Q(u) \, du = \mathbb{E}[\Delta Q(U)]$ is simply the average (or signed area) of the quantile effect curve. Meanwhile, we can see that the Wasserstein effect can equivalently be written as $\Psi^2 = \int_0^1 \Delta Q(u)^2 \, du = \mathbb{E}[\Delta Q(U)^2]$, i.e. the area under the squared quantile effect curve. Immediately, this yields the variance decomposition

$$\Psi^2 = \tau^2 + \mathbb{V}(\Delta Q(U)).$$

Consequently, we can see that $\Psi$ captures the shift in location (as measured by $\tau$) and the heterogeneity around that shift (as measured by the variance of $\Delta Q(U)$). In fact, we can use this decomposition to define a heterogeneity index; let

$$\gamma := \frac{\mathbb{V}(\Delta Q(U))}{\Psi^2} = 1 - \left( \frac{|\tau|}{\Psi} \right)^2.$$

Then, it is clear that $\gamma \in [0, 1]$. When $\gamma = 0$, the treatment effect is purely additive. Meanwhile, when $\gamma = 1$ the difference in means explains none of the distributional distance.
Figure 3: Quantile effect curves (left panel, $\Delta Q(u)$ against $u$) and contribution curves (right panel, $\Delta Q(u)^2 / \Psi^2$ against $u$) for a hypothetical null effect curve (solid) and a skewed effect curve (dashed). Both effect curves are defined such that the average treatment effect $\tau = 0$.

3.2.2 Visualizing Quantile Effect Curves

Considering $\tau$ by itself can conceal important differences: positive and negative quantile effects may cancel out in the average, thereby leaving a small average treatment effect. This problem is readily addressed by the Wasserstein effect. Here, no treatment effect is lost or canceled out since $\Psi$ aggregates these effect differences across all quantiles. However, considering $\Psi$ in isolation can be restrictive since it doesn’t describe the direction of the effect at each quantile (e.g. is treatment harmful or helpful). This concern is easily addressed by plotting the quantile effect curve $\Delta Q(u)$ across $u \in (0, 1)$, which lets us directly visualize quantile-by-quantile contributions to the Wasserstein effect. In this sense, our analysis neatly complements existing methods for studying quantile treatment effects, such as in Frandsen et al. (2012), Qu and Yoon (2015), Qu and Yoon (2019), and Chiang et al. (2019). In the left panel of Figure 3, we can see two curves, both of which have an average treatment effect of zero. The height at each quantile shows the individual contribution to the overall effect; notably, one effect curve is nearly constant, suggesting a null treatment effect. However, the other curve has a significant negative treatment effect in the left tail of the distribution that is masked by a positive effect near the median.
This juxtaposition between effect curves highlights the importance of considering distributional effects over traditional difference-in-means analyses. Furthermore, we can also neatly visualize the contribution of each quantile to the Wasserstein effect via the contribution function $u \mapsto \frac{1}{\Psi^2} \Delta Q(u)^2$, as shown in the right panel of Figure 3. Here, we can see that most of the Wasserstein effect in the skewed distribution is driven by the left tail. Meanwhile, the null effect curve has a nearly uniformly distributed contribution plot across $u \in (0, 1)$.

3.2.3 Direction of the Treatment Effect

Visualizing the quantile effect curve is a useful exercise and can help practitioners better interpret the Wasserstein effect; however, it can leave some ambiguity in terms of the overall direction of the treatment effect. In this section, we define a novel one-number summary of the degree to which the treated quantiles dominate the untreated ones. Recall that for any two quantile functions $Q_a(u)$ and $Q_b(u)$, $Q_a$ stochastically dominates $Q_b$ if and only if $Q_a(u) \geq Q_b(u)$ for all $u \in (0, 1)$, as discussed in Qu and Yoon (2015). Importantly, we can decompose the Wasserstein effect into directional-dominance effects by defining the positive and negative splits,

$$\Delta Q_+(u) = \max\{\Delta Q(u), 0\} \quad \text{and} \quad \Delta Q_-(u) = \max\{-\Delta Q(u), 0\}.$$

Intuitively, $\Delta Q_+(u)$ captures all of the positive treatment effects across quantiles (where the difference between $Q_1(u)$ and $Q_0(u)$ is greater than zero), and $\Delta Q_-(u)$ captures all of the negative treatment effects. Then, it follows that we may write

$$\Psi^2 = \int_0^1 \{\Delta Q(u)\}^2 du = \int_0^1 \{\Delta Q_+(u)\}^2 du + \int_0^1 \{\Delta Q_-(u)\}^2 du$$

which for notational simplicity we write as $\Psi_+^2 + \Psi_-^2$.
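The summaries introduced so far (the mean effect $\tau$, the variance decomposition, the heterogeneity index $\gamma$, the contribution curve, and the positive and negative splits) can all be computed from a single quantile effect curve. The sketch below does this with a midpoint rule; the function name `summarize_effect_curve`, the dictionary keys, and the linear effect curve $\Delta Q(u) = u - 0.25$ are hypothetical choices for illustration and do not come from the paper.

```python
def summarize_effect_curve(delta_q, grid_size=10_000):
    """Midpoint-rule summaries of a quantile effect curve delta_q(u) on (0, 1)."""
    us = [(i + 0.5) / grid_size for i in range(grid_size)]
    d = [delta_q(u) for u in us]

    tau = sum(d) / grid_size                    # E[dQ(U)]
    psi_sq = sum(x * x for x in d) / grid_size  # E[dQ(U)^2] = Psi^2
    variance = psi_sq - tau ** 2                # V(dQ(U)) from the decomposition
    gamma = variance / psi_sq                   # heterogeneity index; assumes Psi^2 > 0

    # Positive and negative splits of the squared Wasserstein effect.
    psi_sq_pos = sum(max(x, 0.0) ** 2 for x in d) / grid_size
    psi_sq_neg = sum(max(-x, 0.0) ** 2 for x in d) / grid_size

    # Contribution curve u -> dQ(u)^2 / Psi^2, evaluated on the grid.
    contribution = [x * x / psi_sq for x in d]

    return {
        "tau": tau,
        "psi_sq": psi_sq,
        "gamma": gamma,
        "psi_sq_pos": psi_sq_pos,
        "psi_sq_neg": psi_sq_neg,
        "grid": us,
        "contribution": contribution,
    }


# Hypothetical effect curve that crosses zero at u = 0.25 (not from the paper).
summary = summarize_effect_curve(lambda u: u - 0.25)
print({k: round(v, 4) for k, v in summary.items() if isinstance(v, float)})
```

For this linear curve the exact values are $\tau = 1/4$, $\Psi^2 = 7/48$, and $\gamma = 4/7$, so the printed approximations can be checked by hand; the two splits always sum to $\Psi^2$ because at most one of $\Delta Q_+(u)$ and $\Delta Q_-(u)$ is nonzero at each $u$.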
Now that we have split the Wasserstein effect into positive and negative quantile effects, we may define the Wasserstein Dominance,

$$\rho = \frac{\Psi_+^2 - \Psi_-^2}{\Psi_+^2 + \Psi_-^2} \in [-1, 1].$$

If $Q_1(u)$ stochastically dominates $Q_0(u)$ then $\Psi_-^2 = 0$ and $\rho = 1$. Similarly, if $Q_0(u)$ stochastically dominates $Q_1(u)$ then $\Psi_+^2 = 0$ and $\rho = -1$. Thus, $\rho$ neatly describes the degree to which one treatment effect dominates the other. When $\rho$ is close to zero, it follows that the quantile effects cross each other, leading to cancellations.

3.2.4 Decomposition into L-Moments

Although decomposing the Wasserstein effect into $\Delta Q(u)$ is useful and lets us neatly visualize the signed contributions of each quantile, it doesn’t say anything about the moments of the counterfactual distributions at the cutoff. Practitioners may be interested in understanding the effect contribution from the differences in means, standard deviations, skewnesses, etc. Fortunately, following a similar approach to Sillitto (1969), the Wasserstein effect can be written as a generalized Fourier series using the shifted Legendre polynomials as an orthogonal basis. The shifted Legendre polynomials are defined by $P_k^*(x) = P_k(2x - 1)$, where $P_k(x)$ are the usual Legendre polynomials, and form an orthogonal basis on $L^2([0, 1])$. A closed form expression for the $k$th shifted Legendre polynomial is given by

$$P_k^*(x) = (-1)^k \sum_{j=0}^{k} \binom{k}{j} \binom{k+j}{j} (-x)^j.$$

Importantly, under this orthogonal basis it can be shown that $\Psi$ may be decomposed into the summation of squared differences in L-moments. First introduced by Hosking (1990), for any random variable $X$ with a finite first moment, the $k$th L-moment is defined as

$$\lambda_k = \int_0^1 Q_x(u) P_{k-1}^*(u) \, du$$

where $Q_x(u)$ is the quantile function for $X$. Note that $P_0^* = 1$.
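To make the closed form and the L-moment definition concrete, here is a minimal sketch that evaluates the shifted Legendre polynomials from the formula above, checks their orthogonality on $[0, 1]$ numerically, and computes L-moments by a midpoint rule. The function names and the Uniform(0, 1) example, whose quantile function is $Q(u) = u$, are illustrative assumptions.

```python
from math import comb


def shifted_legendre(k, x):
    """k-th shifted Legendre polynomial P*_k(x), evaluated via the closed form."""
    return (-1) ** k * sum(
        comb(k, j) * comb(k + j, j) * (-x) ** j for j in range(k + 1)
    )


def inner_product(j, k, grid_size=20_000):
    """Midpoint-rule approximation of the integral of P*_j(u) P*_k(u) over (0, 1)."""
    total = 0.0
    for i in range(grid_size):
        u = (i + 0.5) / grid_size
        total += shifted_legendre(j, u) * shifted_legendre(k, u)
    return total / grid_size


def l_moment(quantile_fn, k, grid_size=20_000):
    """k-th L-moment: integral of Q(u) P*_{k-1}(u) over (0, 1), by midpoint rule."""
    total = 0.0
    for i in range(grid_size):
        u = (i + 0.5) / grid_size
        total += quantile_fn(u) * shifted_legendre(k - 1, u)
    return total / grid_size


# Illustrative check with a Uniform(0, 1) variable, whose quantile function is u.
uniform_l_moments = [l_moment(lambda u: u, k) for k in (1, 2, 3)]
print(uniform_l_moments)    # exact values: 1/2, 1/6, 0
print(inner_product(1, 2))  # distinct polynomials are orthogonal: exact value 0
print(inner_product(2, 2))  # squared norm of P*_2: exact value 1/5
```

The squared norm of $P_k^*$ on $[0, 1]$ is $1/(2k + 1)$, which is where the weights multiplying the squared L-moment differences in the decomposition come from.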
As shown in Hosking (1990), L-moments are defined by taking expectations of linear combinations of order statistics, and represent a “robust” analogue of conventional moments of a probability distribution that are typically less sensitive to heavy tailed distributions and are better behaved in small samples. Notably, they are always well-defined (as long as the first moment exists) even when not all conventional moments exist. To build intuition, let $X_{1:n} \leq X_{2:n} \leq \cdots \leq X_{n:n}$ be the order statistics of a random sample of size $n$ from the distribution of $X$. Then, the first three L-moments are given by:

$$\lambda_1 = \mathbb{E}[X], \quad \lambda_2 = \frac{1}{2} \mathbb{E}[X_{2:2} - X_{1:2}], \quad \text{and} \quad \lambda_3 = \frac{1}{3} \mathbb{E}[(X_{3:3} - X_{2:3}) - (X_{2:3} - X_{1:3})].$$

Focusing on the second L-moment, we can see that it is proportional to the expected difference between two independent draws from a distribution. Thus, it provides an alternate measure of dispersion to the traditional standard deviation. Similarly, $\lambda_3$ provides an alternate measure of asymmetry to the traditional skewness by taking the expected difference between upper and lower order statistics. In the following theorem, we establish how $\Psi$ can be decomposed into a summation of squared differences in L-moments.

Theorem 2 (L-Moment Decomposition). Suppose that $P_{a|x} \in \mathcal{P}_2(\mathbb{R})$ for $a \in \{0, 1\}$. Then,

$$\Psi^2 = \sum_{k=1}^{\infty} (2k - 1) \left( \lambda_k^{(1)} - \lambda_k^{(0)} \right)^2$$

where $\lambda_k^{(a)} = \int_0^1 Q_a(u) P_{k-1}^*(u) \, du$ are the $k$th L-moments above and below the cutoff.

By Theorem 2 we obtain an important decomposition of $\Psi$: we may now define what can be thought of as a “distributional $R^2$,” that is, the amount of the Wasserstein effect that can be explained by a given L-moment. For example, for each $k \geq 1$ the share of the total distributional distance explained by the $k$th L-moment is given by

$$R_k^2 = \frac{(2k - 1) \left( \lambda_k^{(1)} - \lambda_k^{(0)} \right)^2}{\Psi^2} \qquad (1)$$

such that $\sum_{k=1}^{\infty} R_k^2 = 1$.
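As an illustration of equation (1), the sketch below computes the first few shares $R_k^2$ for the two-Gaussian motivating example, using the fact that $\lambda_k^{(1)} - \lambda_k^{(0)} = \int_0^1 \Delta Q(u) P_{k-1}^*(u) \, du$ by linearity. The function name `l_moment_shares`, the truncation at $k = 4$, and the midpoint-rule integration are assumptions made here for illustration, with the limiting quantile functions taken as known rather than estimated.

```python
from math import comb, pi
from statistics import NormalDist


def shifted_legendre(k, x):
    """k-th shifted Legendre polynomial P*_k(x), evaluated via the closed form."""
    return (-1) ** k * sum(
        comb(k, j) * comb(k + j, j) * (-x) ** j for j in range(k + 1)
    )


def l_moment_shares(q1, q0, max_k=4, grid_size=20_000):
    """Return (Psi^2, [R^2_1, ..., R^2_max_k]) using a midpoint rule on (0, 1)."""
    us = [(i + 0.5) / grid_size for i in range(grid_size)]
    d = [q1(u) - q0(u) for u in us]  # quantile effect curve on the grid
    psi_sq = sum(x * x for x in d) / grid_size
    shares = []
    for k in range(1, max_k + 1):
        # Difference in k-th L-moments above and below the cutoff.
        diff_k = sum(x * shifted_legendre(k - 1, u) for x, u in zip(d, us)) / grid_size
        shares.append((2 * k - 1) * diff_k ** 2 / psi_sq)
    return psi_sq, shares


q0 = NormalDist(0.0, 1.0).inv_cdf  # N(0, 1) below the cutoff
q1 = NormalDist(0.0, 2.0).inv_cdf  # N(0, 2^2) above the cutoff

psi_sq, shares = l_moment_shares(q1, q0)
for k, share in enumerate(shares, start=1):
    print(f"R^2_{k} ~ {share:.4f}")
print(f"L-scale share in closed form: 3/pi = {3 / pi:.4f}")
```

Because the standard normal has L-scale $\lambda_2 = 1/\sqrt{\pi}$ and here $\Delta Q(u) = \Phi^{-1}(u)$ with $\Psi^2 = 1$, the L-scale share is $3/\pi$ in closed form, while the location and L-skewness shares vanish by symmetry; the remainder is carried by even-order terms with $k \geq 4$.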
This decomposition is purely distributional: it decomposes the Wasserstein distance between the marginal counterfactual outcome distributions at the cutoff and does not require a rank invariance assumption. As an illustrative example, in Table 1 we can see the explanatory power of the first three L-moments for the effect curves shown in Figure 3. Notably, the null effect curve is primarily explained by variation in its L-scale and higher-order moments, as its quantile effect curve is symmetric. Meanwhile, the skewed effect curve is (unsurprisingly) primarily driven by the differences in its L-skewness. Note that both have an L-location value of zero, since they are both defined to have an average treatment effect of zero. The moment decomposition outlined in Table 1 provides a new and powerful tool for decoding treatment effect heterogeneity.

Moment    Null Effect Curve    Skewed Effect Curve
k = 1     0.0000               0.0000
k = 2     0.6079               0.1548
k = 3     0.0000               0.8157
k ≥ 4     0.3921               0.0295

Table 1: Comparison of Explained Distributional Distance

Now that we have established several methods of interpreting the Wasserstein effect and how it compares to traditional causal effects, we turn to estimation and inference. In the next section, we formalize an estimator for the Wasserstein effect and derive its asymptotic properties around some chosen bandwidth of the treatment threshold $X = x_0$. We show that standard bias correction techniques can be applied to estimation of the Wasserstein effect such that empirical bandwidth selection methods can be implemented.

3.3 Estimation and Asymptotics

In this section, we establish formal properties for estimation of the Wasserstein effect. Note that $\Psi$ depends on one-sided limiting conditional distributions evaluated at a single point; such functionals are not pathwise differentiable, so there is no $\sqrt{n}$-regular estimator and no efficient influence function.
We therefore employ a simple plug-in estimator, defined by

$$\widehat{\Psi}_n = \left( \int_0^1 \left( \widehat{Q}_1(u) - \widehat{Q}_0(u) \right)^2 du \right)^{1/2}$$

where $\widehat{Q}_1$ and $\widehat{Q}_0$ are estimators of $Q_1(u) = \inf\{y : \lim_{x \downarrow x_0} F_{Y|X}(y \mid x) \geq u\}$ and $Q_0(u) = \inf\{y : \lim_{x \uparrow x_0} F_{Y|X}(y \mid x) \geq u\}$, the limiting conditional quantiles of $Y \mid X = x$. Thus, estimation of the Wasserstein effect reduces to estimation of conditional quantile processes (which is a well studied problem), followed by numerical integration. There are many ways that $Q_a(u)$ can be estimated. For example, one natural route is local linear quantile regression, as proposed by Yu and Jones (1998), which minimizes the check loss of a kernel-weighted polynomial estimator in order to produce boundary-adaptive estimates of the conditional quantile curves. This approach was adapted by Frandsen et al. (2012) when first defining quantile treatment effects in a discontinuity design framework. However, the methods established in Frandsen et al. (2012) only yield pointwise confidence intervals for conditional quantiles. Furthermore, their bandwidth condition requires $\sqrt{nh} \, h^2 \to \gamma < \infty$. When $\gamma > 0$, the squared bias and variance are of the same order; consequently, undersmoothing must be employed so that the bias is negligible relative to the variance, i.e. $\gamma = 0$. In practice, this means that the standard mean-squared-error optimal bandwidth selection of $h \propto n^{-1/5}$ can lead to improper coverage. More recently, Qu and Yoon (2015) showed that local quantile regression admits a uniform Bahadur representation which they then leverage to obtain uniform confidence intervals for quantile treatment effects. Building on this framework, Qu and Yoon (2019) show that by estimating the leading bias term it is possible to obtain bias-adjusted uniform inference in the spirit of Calonico et al. (2014).
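To show the shape of the plug-in calculation (quantile estimates on each side of the cutoff, followed by numerical integration), the sketch below uses a deliberately crude stand-in for $\widehat{Q}_a$: empirical quantiles of the outcomes within a fixed window on each side, which amounts to a local-constant fit with a uniform kernel. This is an assumption made for illustration only; it is not local linear quantile regression, it applies no bias correction, and it offers no inference. The function names, the bandwidth, and the simulated data (which mimic the two-Gaussian motivating example) are all hypothetical.

```python
import math
import random


def empirical_quantile(sorted_sample, u):
    """Generalized inverse of the empirical CDF: inf{y : F_m(y) >= u}."""
    m = len(sorted_sample)
    index = max(math.ceil(u * m) - 1, 0)
    return sorted_sample[index]


def naive_wasserstein_effect(x, y, cutoff, bandwidth, grid_size=1_000):
    """Crude plug-in estimate of Psi from observations within a window of the cutoff."""
    above = sorted(yi for xi, yi in zip(x, y) if cutoff <= xi < cutoff + bandwidth)
    below = sorted(yi for xi, yi in zip(x, y) if cutoff - bandwidth <= xi < cutoff)
    if not above or not below:
        raise ValueError("no observations within the bandwidth on one side of the cutoff")
    total = 0.0
    for i in range(grid_size):
        u = (i + 0.5) / grid_size  # midpoint rule over u in (0, 1)
        total += (empirical_quantile(above, u) - empirical_quantile(below, u)) ** 2
    return math.sqrt(total / grid_size)


# Simulated data for illustration: N(0, 1) below the cutoff, N(0, 2^2) above it.
rng = random.Random(0)
n = 20_000
x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
y = [rng.gauss(0.0, 2.0 if xi >= 0.0 else 1.0) for xi in x]

psi_hat = naive_wasserstein_effect(x, y, cutoff=0.0, bandwidth=0.2)
print(f"naive plug-in estimate of Psi: {psi_hat:.3f}")
```

In this simulation the conditional distributions do not vary with $x$ on either side, so the window introduces no smoothing bias and the estimate should fall near the population value $\Psi = 1$. With outcome distributions that drift in $x$, a fixed-window estimate of this kind would be biased at the boundary, which is the motivation for the boundary-adaptive local polynomial methods discussed in the text.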
Ultimately, the methods established by Qu and Yoon (2015) and Qu and Yoon (2019) rely on the fact that the asymptotic distribution is conditionally pivotal, so they are not suitable for the local Wald ratios required by fuzzy designs (which we consider in Section 3.5). Thus, we turn to the framework established in Chiang et al. (2019). Their approach develops a general theory for local Wald estimands that allows for uniform inference across quantiles and can accommodate empirical bandwidth selection. Moreover, it encompasses both sharp and fuzzy discontinuity designs, as well as kinked designs (which we also consider in Section 4). We formalize these technical details in what follows.

In order to estimate $Q_a(u)$, Chiang et al. (2019) adapt the local polynomial estimation with bias correction approach established in Calonico et al. (2014). For $a \in \{0, 1\}$ let
$$F_a^{(k)}(y \mid x_0^{\pm}) = \frac{\partial^k}{\partial x^k} F_{Y|X}(y \mid x) \bigg|_{x \to x_0^{\pm}}$$
be the $k$th partial derivative of the conditional cumulative distribution function, where $a = 1$ corresponds to the right limit (as $x \downarrow x_0$) and $a = 0$ corresponds to the left limit (as $x \uparrow x_0$). Then, under appropriate smoothness assumptions, it follows that we may define the following $p$th order one-sided Taylor expansions about $x = x_0$,
$$F_{Y|X}(y \mid x) \approx F_{Y|X}(y \mid x_0^{+}) + \cdots + \frac{F^{(p)}_{Y|X}(y \mid x_0^{+})}{p!} (x - x_0)^p = r_p\!\left( \frac{x - x_0}{h} \right)^{T} \alpha_{1,p}(y)$$
$$F_{Y|X}(y \mid x) \approx F_{Y|X}(y \mid x_0^{-}) + \cdots + \frac{F^{(p)}_{Y|X}(y \mid x_0^{-})}{p!} (x - x_0)^p = r_p\!\left( \frac{x - x_0}{h} \right)^{T} \alpha_{0,p}(y)$$
for $x > x_0$ and $x < x_0$ respectively, where $F_{Y|X}(y \mid x_0^{+}) = \lim_{x \downarrow x_0} F_{Y|X}(y \mid x)$ and $F_{Y|X}(y \mid x_0^{-}) = \lim_{x \uparrow x_0} F_{Y|X}(y \mid x)$ are the one-sided limits of $F_{Y|X}(y \mid x)$, we define $r_p(u) = (1, u, \ldots, u^p)^{T}$, and
$$\alpha_{a,p}(y) = \left[ F_{Y|X}(y \mid x_0^{\pm}), \; \frac{F^{(1)}_{Y|X}(y \mid x_0^{\pm})\, h}{1!}, \; \ldots, \; \frac{F^{(p)}_{Y|X}(y \mid x_0^{\pm})\, h^p}{p!} \right]^{T}.$$
Then, we may estimate the coefficients separately on each side of the treatment discontinuity by solving one-sided local weighted least squares problems, defined by
$$\widehat{\alpha}_{1,p}(y) = \arg\min_{\alpha \in \mathbb{R}^{p+1}} \sum_{i=1}^{n} I(X_i \geq x_0) \left( I(Y_i \leq y) - r_p\!\left( \frac{X_i - x_0}{h} \right)^{T} \alpha \right)^{2} K\!\left( \frac{X_i - x_0}{h} \right)$$
where $K(\cdot)$ is some kernel function and the estimator $\widehat{\alpha}_{0,p}(y)$ follows analogously with $I(X_i \leq x_0)$. Clearly, if $e_0 = (1, 0, \ldots, 0)^{T}$ is a standard basis vector it follows that $\widehat{F}_{Y|X}(y \mid x_0^{+}) = e_0^{T} \widehat{\alpha}_{1,p}(y)$ and $\widehat{F}_{Y|X}(y \mid x_0^{-}) = e_0^{T} \widehat{\alpha}_{0,p}(y)$. However, we are not quite done defining our estimator. From here, Chiang et al. (2019) add a bias correction term in the style of Calonico et al. (2014) in order to allow for empirical bandwidth selection. To develop a deeper understanding of this calculation, observe that the bias of our local polynomial estimator is given by
$$E\left[ \widehat{F}_{Y|X}(y \mid x_0^{\pm}) \right] - F_{Y|X}(y \mid x_0^{\pm}) = \underbrace{h^{p+1} e_0^{T} (\Gamma_p^{\pm})^{-1} \Lambda^{\pm}_{p,p+1} \frac{F^{(p+1)}_{Y|X}(y \mid x_0^{\pm})}{(p+1)!}}_{B^{\pm}(y,h,p)} + o(h^{p+1})$$
where $B^{\pm}(y, h, p)$ denotes the leading bias term, with
$$\Gamma_p^{\pm} = \int_{\mathbb{R}_{\pm}} K(u)\, r_p(u)\, r_p(u)^{T} \, du \quad \text{and} \quad \Lambda^{\pm}_{p,q} = \int_{\mathbb{R}_{\pm}} u^{q} K(u)\, r_p(u) \, du.$$
Intuitively, $\Gamma_p^{\pm}$ is a matrix that describes how the polynomial regressors interact under the kernel weights and $\Lambda^{\pm}_{p,q}$ captures how the next higher-order term in the Taylor expansion interacts with the regressors. Then, Lemma 1 of Chiang et al. (2019) shows that under some set of regularity conditions
$$\Delta_B^{\pm}(y) := \sqrt{nh} \left( \widehat{F}_{Y|X}(y \mid x_0^{\pm}) - F_{Y|X}(y \mid x_0^{\pm}) - \widehat{B}^{\pm}(y, h, p) \right)$$
admits the uniform Bahadur representation
$$\Delta_B^{\pm}(y) = \sum_{i=1}^{n} \frac{e_0^{T} (\Gamma_p^{\pm})^{-1} r_p\!\left( \frac{X_i - x_0}{h} \right) K\!\left( \frac{X_i - x_0}{h} \right) \left[ I(Y_i \leq y) - F_{Y|X}(y \mid X_i) \right] \delta_i^{\pm}}{\sqrt{nh}\, f_X(x_0)} + o_{P|X}(1)$$
where $\delta_i^{+} = I(X_i \geq x_0)$ and $\delta_i^{-} = I(X_i \leq x_0)$. We refer the reader to Assumption 1 of Chiang et al.
(2019) for a comprehensive list of these regularity conditions. Notably, Chiang et al. (2019) require that the kernel function $K(\cdot)$ is bounded, continuous, and of VC type, which allows for common kernels such as the uniform, triangular, biweight, triweight, and Epanechnikov kernels, but rules out the Gaussian kernel due to its unbounded support. Furthermore, for some bandwidth $h$ satisfying $h \to 0$, they require $nh^2 \to \infty$ and $nh^{2p+3} \to 0$. The former condition is a stronger assumption than the typical $nh \to \infty$ in order to allow for uniform convergence of the quantile process, and the latter condition controls the bias relative to the variance.

Now that we have defined this machinery, we discuss the conditional weak convergence of our estimator. First, note that Chiang et al. (2019) consider convergence of the quantile process after trimming the left and right tails, such that $u \in [\varsigma, 1 - \varsigma]$ for some $\varsigma \in (0, 1/2)$. They do so since near the tails the conditional quantile function can be difficult to estimate reliably, so instead they establish weak convergence in $\ell^{\infty}([\varsigma, 1 - \varsigma])$. However, in order to properly estimate the Wasserstein effect we need to extend the domain of the quantiles to the full support on $[0, 1]$. Therefore, for weak convergence we require the additional assumptions that:

(i) The potential outcomes are compactly supported.
(ii) $f_{Y(a)|X}(y \mid x)$ is uniformly bounded away from zero on that support.

With these assumptions in place, let
$$\nu_n^{\pm}(y) = \sum_{i=1}^{n} \frac{e_0^{T} (\Gamma_p^{\pm})^{-1} r_p\!\left( \frac{X_i - x_0}{h} \right) K\!\left( \frac{X_i - x_0}{h} \right) \left[ I(Y_i \leq y) - F_{Y|X}(y \mid X_i) \right] \delta_i^{\pm}}{\sqrt{nh}\, f_X(x_0)}.$$
Then, by Theorem 1 of Chiang et al. (2019) it follows that $\nu_n^{\pm} \rightsquigarrow \mathbb{G}_{H^{\pm}}$ where $\mathbb{G}_{H^{\pm}}$ are zero mean Gaussian processes with some covariance function $H^{\pm}$.
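The one-sided weighted least squares step defined above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of Chiang et al. (2019): the function names, the choice of the triangular kernel, and the fixed bandwidth `h` are choices made here, and the bias correction term $\widehat{B}^{\pm}(y, h, p)$ is deliberately omitted, so the output is the uncorrected estimate $e_0^{T} \widehat{\alpha}_{a,p}(y)$.

```python
import numpy as np

def triangular_kernel(u):
    """Triangular kernel K(u) = max(1 - |u|, 0); bounded with compact support."""
    return np.maximum(1.0 - np.abs(u), 0.0)

def one_sided_cdf(x, y, y_eval, x0, h, p=2, side="right"):
    """Uncorrected local polynomial estimate of F_{Y|X}(y_eval | x0+) or F_{Y|X}(y_eval | x0-).

    Hypothetical helper for illustration: no bias correction and no bandwidth selection.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = (x - x0) / h
    # delta_i^+ = I(X_i >= x0) on the right, delta_i^- = I(X_i <= x0) on the left.
    side_mask = (z >= 0) if side == "right" else (z <= 0)
    w = triangular_kernel(z) * side_mask
    keep = w > 0
    # Rows are r_p(z_i)^T = (1, z_i, ..., z_i^p).
    R = np.vander(z[keep], N=p + 1, increasing=True)
    sw = np.sqrt(w[keep])
    estimates = []
    for yv in np.atleast_1d(y_eval):
        d = (y[keep] <= yv).astype(float)  # response I(Y_i <= y)
        # Weighted least squares via ordinary least squares on sqrt-weighted data.
        alpha, *_ = np.linalg.lstsq(R * sw[:, None], d * sw, rcond=None)
        estimates.append(alpha[0])         # e_0^T alpha
    return np.array(estimates)
```

Evaluating `one_sided_cdf` on a grid of `y_eval` values for each side yields the two tabulated distribution functions that are subsequently inverted into quantile estimates. Because the fit is a regression on an indicator, the raw estimates are not guaranteed to be monotone in $y$ or to lie in $[0, 1]$.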
Now that we have established conditional weak convergence for the bias corrected cumulative distribution functions above and below the cutoff, we need to invert them in order to obtain weak convergence for the quantile processes. Simply put, we define
$$\widehat{Q}_1(u) = \inf\{ y : \widehat{F}_{Y|X}(y \mid x_0^{+}) - \widehat{B}^{+}(y, h, p) \geq u \} \quad \text{and} \quad \widehat{Q}_0(u) = \inf\{ y : \widehat{F}_{Y|X}(y \mid x_0^{-}) - \widehat{B}^{-}(y, h, p) \geq u \}$$
such that the quantile treatment effect may be defined as $\Delta\widehat{Q}(u) = \widehat{Q}_1(u) - \widehat{Q}_0(u)$. From here, since the quantile map $F \mapsto F^{-1}$ is Hadamard differentiable, we may apply the functional delta method to see that
$$\sqrt{nh}\, \big( \widehat{Q}_1(u) - Q_1(u) \big) \rightsquigarrow -\frac{\mathbb{G}_{H^{+}}(Q_1(u))}{f_{Y|X}(Q_1(u) \mid x_0^{+})} \quad \text{and} \quad \sqrt{nh}\, \big( \widehat{Q}_0(u) - Q_0(u) \big) \rightsquigarrow -\frac{\mathbb{G}_{H^{-}}(Q_0(u))}{f_{Y|X}(Q_0(u) \mid x_0^{-})}.$$
Consequently, it follows that
$$\sqrt{nh}\, \big( \Delta\widehat{Q}(u) - \Delta Q(u) \big) \rightsquigarrow \frac{\mathbb{G}_{H^{-}}(Q_0(u))}{f_{Y|X}(Q_0(u) \mid x_0^{-})} - \frac{\mathbb{G}_{H^{+}}(Q_1(u))}{f_{Y|X}(Q_1(u) \mid x_0^{+})}.$$
In practice, $p$ is often chosen to be two, yielding local quadratic polynomial estimators. Higher order polynomials can potentially reduce bias even further, but they also come with the risk of a larger variance due to sensitivity of the estimator near the boundary. In the next section, we discuss inference for the Wasserstein effect.

Remark 1 (Conditioning on covariates). Although covariates are not required for identification, they are often of interest to practitioners both to obtain covariate indexed causal effects and to improve precision (Frölich and Huber, 2019; Calonico et al., 2019b). Let $W_i \in \mathbb{R}^d$ denote a vector of covariates and let $\widehat{\mu}_{a,W}(x)$ denote a local polynomial estimate of $E[W \mid X = x]$; for example, following the same estimation procedure described in Section 3.3. We may then define the centered covariates $\widetilde{W}_{a,i} = W_i - \widehat{\mu}_{a,W}(X_i)$ for $a \in \{0, 1\}$. Then, following Chiang et al.
(2019), for each $y$ we solve for
$$\big( \widetilde{\alpha}_{1,p}(y), \widetilde{\vartheta}_1(y) \big) = \arg\min_{\alpha \in \mathbb{R}^{p+1},\, \vartheta \in \mathbb{R}^{d}} \sum_{i=1}^{n} \delta_i^{+} \left( I(Y_i \leq y) - r_p\!\left( \frac{X_i - x_0}{h} \right)^{T} \alpha - \widetilde{W}_{1,i}^{T} \vartheta \right)^{2} K\!\left( \frac{X_i - x_0}{h} \right)$$
and analogously for $\big( \widetilde{\alpha}_{0,p}(y), \widetilde{\vartheta}_0(y) \big)$ by replacing $\delta_i^{+}$ with $\delta_i^{-}$ and $\widetilde{W}_{1,i}$ with $\widetilde{W}_{0,i}$. From here, if our goal is to target $F_{Y|X}(y \mid x_0^{+})$ (using the covariates only as variance-reducing nuisances), then we simply take $\widetilde{F}_{Y|X}(y \mid x_0^{+}) = e_0^{T} \widetilde{\alpha}_{1,p}(y)$ and $\widetilde{F}_{Y|X}(y \mid x_0^{-}) = e_0^{T} \widetilde{\alpha}_{0,p}(y)$. Note that since we have centered our covariates we can now safely interpret each estimate as the cumulative distribution function at the average covariate value. If our goal is the conditional cumulative distribution function itself,
$$F_{Y|X,W}(y \mid x_0^{+}, w) = \lim_{x \downarrow x_0} P(Y \leq y \mid X = x, W = w),$$
then for any $w \in \mathbb{R}^d$ we define our estimator to be $\widetilde{F}_{Y|X,W}(y \mid x_0^{\pm}, w) = e_0^{T} \widetilde{\alpha}_{a,p}(y) + w^{T} \widetilde{\vartheta}_a(y)$. With these estimators in place, we can now estimate the conditional Wasserstein effect $\Psi(w)$ by
$$\left( \int_0^1 \big( \widetilde{Q}_1(u; w) - \widetilde{Q}_0(u; w) \big)^2 \, du \right)^{1/2}$$
where $\widetilde{Q}_1(u; w)$ and $\widetilde{Q}_0(u; w)$ are the inverses of $\widetilde{F}_{Y|X,W}(y \mid x_0^{+}, w)$ and $\widetilde{F}_{Y|X,W}(y \mid x_0^{-}, w)$, respectively. Bias correction and the multiplier bootstrap can be implemented following the same covariate augmentation procedure with higher-order local fits.

3.4 Statistical Inference

Now that we have established methods for estimation of quantile treatment effects as well as their limiting distributions, we turn to inference for the Wasserstein effect. Surprisingly, statistical inference in this setting is not as straightforward as one might expect since $\Psi$ is a quadratic parameter; here, the limiting distribution and rate of convergence change as $\Psi \to 0$. To illustrate this point broadly for quadratic parameters, Verdinelli and Wasserman (2024) consider a toy example where $X_1, \ldots$
$, X_n \sim N(\mu, \sigma^2)$ and we are interested in estimating $\psi = \mu^2$. Using the estimator $\widehat{\psi} = \bar{X}_n^2$, it follows that $\sqrt{n}(\widehat{\psi} - \psi) \rightsquigarrow N(0, \eta^2)$ for some $\eta^2$ when $\mu \neq 0$, and $n \widehat{\psi} \rightsquigarrow \sigma^2 \chi^2_1$ when $\mu = 0$. Moreover, when $\mu$ is close to zero, the distribution of $\widehat{\psi}$ will be neither normal nor $\chi^2_1$, and its rate of convergence will be between $1/n$ and $1/\sqrt{n}$. This is a common (and perhaps understudied) problem in statistics; other parameters such as kernel two-sample statistics (Gretton et al., 2012) and Reproducing Kernel Hilbert Space corrections (Sejdinovic et al., 2013) suffer from this misalignment of convergence around the null. In the context of distributional discontinuity design, the delta method fails near the null for our functional $\Psi^2 = \int_0^1 [\Delta Q(u)]^2 \, du$, so we must construct our hypothesis tests and confidence intervals around this fact. We first consider testing the null hypothesis of no distributional change above and below the cutoff; that is, $\Delta Q(u) = 0$ for all $u \in (0, 1)$ (or equivalently that $\Psi = 0$). Then, we define two methods of constructing valid (but conservative) confidence intervals for $\Psi$.

3.4.1 Testing the Null Hypothesis

In this section, we test the null hypothesis of no causal effect. Under the null, it follows that $\Delta Q(u) = 0$ for all $u \in (0, 1)$. Furthermore, as discussed in Section 3.3 it follows that $\sqrt{nh}\, \Delta\widehat{Q}(u) \rightsquigarrow \mathbb{G}(u)$ where $\mathbb{G}(u)$ is a mean-zero Gaussian process with covariance kernel $\kappa$. From here, we may apply the Karhunen-Loève theorem (Karhunen, 1946; Loève, 1977) to expand $\mathbb{G}(u)$ as
$$\mathbb{G}(u) = \sum_{k=1}^{\infty} \sqrt{\lambda_k}\, Z_k\, \phi_k(u)$$
where $\{\phi_k\}_{k=1}^{\infty}$ are an orthonormal basis on $L^2([0, 1])$ defined by the eigenfunctions of the covariance operator induced by the kernel $\kappa(u, v)$ (with eigenvalues $\lambda_1, \lambda_2, \ldots$) and $Z_k \sim N(0, 1)$ for all $k$.
Then, it follows that
$$nh\, \widehat{\Psi}_n^2 = \int_0^1 \left( \sqrt{nh}\, \Delta\widehat{Q}(u) \right)^2 du \rightsquigarrow \sum_{k=1}^{\infty} \lambda_k Z_k^2, \tag{2}$$
which is a second-order Gaussian (or Wiener-Itô) chaos (Janson, 1997). From here, there are several ways we can go about conducting our hypothesis test. The first option is to directly estimate the eigenvalues of $\kappa(u, v) = \mathrm{Cov}(\mathbb{G}(u), \mathbb{G}(v))$ and approximate Equation (2) via Monte-Carlo simulation. Although in principle this appears to be a straightforward procedure, the validity of such a test is not automatic, as we must estimate $\lambda_1, \lambda_2, \ldots$, truncate the sum to $\sum_{k=1}^{K} \lambda_k Z_k^2$ for some $K$, and approximate the null distribution via Monte-Carlo simulation. In the following theorem, we formally establish the conditions required to obtain a valid level-$\alpha$ test under this procedure.

Theorem 3 (Eigenvalue Test). Suppose that $\sum_{k=1}^{\infty} \lambda_k < \infty$ with $\lambda_1 > 0$ and define the Monte-Carlo draws
$$\widehat{T}^{*}_{K_n, b} = \sum_{k=1}^{K_n} \widehat{\lambda}_{k,n} Z_{k,b}^2$$
where $\widehat{\lambda}_{k,n}$ are the estimated eigenvalues and $Z_{k,b} \sim N(0, 1)$ for $k \geq 1$ and $b = 1, \ldots, B_n$. Let $\widehat{c}^{\,*}_{n,\alpha}$ be the empirical $(1 - \alpha)$ quantile computed from $\{\widehat{T}^{*}_{K_n, b}\}_{b=1}^{B_n}$. Then, supposing that $B_n \to \infty$ and $K_n \to \infty$, for any $\alpha \in (0, 1)$ it follows that
$$\lim_{n \to \infty} P_{H_0}\big( nh\, \widehat{\Psi}_n^2 > \widehat{c}^{\,*}_{n,\alpha} \big) = \alpha$$
as long as $\| \widehat{\kappa}_n - \kappa \|_2 = o_P(K_n^{-1/2})$.

By Theorem 3, we can see that the conditions required to obtain a level-$\alpha$ test using Monte-Carlo simulation depend crucially on the number of terms included in our truncation, $K_n$. Importantly, there are two errors introduced by simulating the critical value: the truncation error, controlled by $\sum_{k > K_n} \lambda_k$, and the estimation error, controlled by $\sqrt{K_n}\, \| \widehat{\kappa}_n - \kappa \|_2$. Thus, $K_n$ must diverge to eliminate the truncation error, but not so fast that the estimation error fails to vanish.
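The Monte-Carlo procedure in Theorem 3 can be sketched as follows. This is a hedged illustration rather than a reference implementation: it assumes that an estimate of $\kappa$ is already available as a matrix `kappa_hat` evaluated on an equally spaced midpoint grid of $[0, 1]$, and it approximates the operator eigenvalues by the eigenvalues of that matrix divided by the grid size (a Nyström-type discretization chosen here for simplicity). The function name, the defaults for `K` and `B`, and the clipping of negative eigenvalues are all assumptions of the sketch.

```python
import numpy as np

def eigenvalue_critical_value(kappa_hat, alpha=0.05, K=None, B=5000, seed=0):
    """Simulated (1 - alpha) critical value for sum_k lambda_k Z_k^2.

    kappa_hat : (m, m) array, covariance kernel estimate on a midpoint grid of [0, 1].
    K         : number of leading eigenvalues kept (None keeps all m).
    Returns (critical_value, eigenvalues_used).
    """
    m = kappa_hat.shape[0]
    sym = 0.5 * (kappa_hat + kappa_hat.T)       # enforce symmetry before eigvalsh
    # Dividing by m turns matrix eigenvalues into approximate operator eigenvalues.
    lam = np.linalg.eigvalsh(sym / m)[::-1]     # sorted in descending order
    lam = np.clip(lam, 0.0, None)               # discard small negative estimates
    if K is not None:
        lam = lam[:K]
    rng = np.random.default_rng(seed)
    z2 = rng.standard_normal((B, lam.size)) ** 2
    draws = z2 @ lam                            # B draws of the truncated chaos
    return float(np.quantile(draws, 1.0 - alpha)), lam
```

The null hypothesis would then be rejected when $nh\, \widehat{\Psi}_n^2$ exceeds the returned critical value. As a check of the discretization, for the Brownian motion kernel $\kappa(u, v) = \min(u, v)$ the leading operator eigenvalue is $4/\pi^2$, which the grid approximation should reproduce closely.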
One way to obtain a rate for $K_n$ is to assume some kind of polynomial eigenvalue decay of the form $\lambda_k \lesssim k^{-\beta}$ for some $\beta > 1$; in the following corollary we formalize this notion.

Corollary 1 (Eigenvalue Decay). Assume the conditions of Theorem 3 hold and suppose that $\| \widehat{\kappa}_n - \kappa \|_2 = O_p(r_n)$ for some $r_n \to 0$. Furthermore, suppose that there exist constants $C_\lambda > 0$ and $\beta > 1$ such that for all $k$, $\lambda_k \leq C_\lambda k^{-\beta}$. Then, it follows that letting $K_n \asymp r_n^{-2/(2\beta - 1)}$ balances the truncation bias and estimation error, such that
$$\sum_{k > K_n} \lambda_k = O\!\left( r_n^{\frac{2(\beta - 1)}{2\beta - 1}} \right) \quad \text{and} \quad \sqrt{K_n}\, \| \widehat{\kappa}_n - \kappa \|_2 = O_p\!\left( r_n^{\frac{2(\beta - 1)}{2\beta - 1}} \right).$$

Corollary 1 clarifies the relationship between the truncation bias and the estimation error, as well as between $K_n$ and the rate of eigenvalue decay. Clearly, faster eigenvalue decay (i.e. a larger $\beta$) allows for a smaller $K_n$; in this setting there will be less sensitivity to estimating $\kappa$. Conversely, slower decay requires a larger $K_n$ and therefore requires more accurate estimation of the covariance operator. A natural choice for the rate is $r_n \asymp (nh)^{-1/2}$, as this aligns with the effective sample size in a discontinuity design setting. For example, with $\beta = 2$ this choice gives $K_n \asymp (nh)^{1/3}$, and both error terms are then of order $(nh)^{-1/3}$.

While Theorem 3 and Corollary 1 establish a useful testing framework, choosing $K_n$ in practice can be tricky. That, combined with the computational burden of Monte-Carlo simulation, suggests the eigenvalue test may be less than desirable for practitioners. Alternatively, one could leverage Theorem 5 of Luedtke et al. (2018) to obtain a conservative, but computationally simple, statistical test. Specifically, Luedtke et al. (2018) derive non-parametric tests of equality in distribution between unknown functions; they show that such a test also manifests as a Gaussian chaos, which can be easily bounded by applying a one-sided Chebyshev inequality.
In the following proposition, we leverage their results to obtain a conservative test for no causal effect.

Proposition 1 (Conservative Test). Let $\mu = \int_0^1 \kappa(u, u) \, du$ and $\sigma^2 = 2 \int_0^1 \int_0^1 \kappa(u, v)^2 \, du \, dv$. Fix $\alpha \in (0, 1)$ and define $c^{ub}_{1-\alpha} = \mu + \sigma \sqrt{(1 - \alpha)/\alpha}$. Then, by Luedtke et al. (2018) it follows that
$$\limsup_{n \to \infty} P_{H_0}\big( nh\, \widehat{\Psi}_n^2 > \widehat{c}^{\,ub}_{n, 1-\alpha} \big) \leq \alpha$$
where $\widehat{c}^{\,ub}_{n, 1-\alpha} = \widehat{\mu} + \widehat{\sigma} \sqrt{(1 - \alpha)/\alpha}$ for any estimators such that $\widehat{\mu} \overset{p}{\to} \mu$ and $\widehat{\sigma} \overset{p}{\to} \sigma$.

Note that we may equivalently define $\mu$ and $\sigma^2$ as $\sum_{k=1}^{\infty} \lambda_k$ and $2 \sum_{k=1}^{\infty} \lambda_k^2$, respectively. Proposition 1 provides us with a more convenient statistical test that requires fewer assumptions on the estimation error of $\kappa$. Practically speaking, the condition $K_n^{1/2} \| \widehat{\kappa}_n - \kappa \|_2 = o_P(1)$ required in Theorem 3 means the eigenvalue test is only trustworthy when $\widehat{\kappa}_n$ is estimated accurately enough that one can include many eigenvalues without the simulated critical value becoming sensitive to $K_n$. With a small sample size, $\widehat{\kappa}_n$ may only support a small $K_n$, making the test fragile to the truncation choice and potentially anti-conservative if $K_n$ is pushed too large. In such settings the conservative test is preferable; it avoids estimating the full eigenspectrum and instead requires only consistent estimation of $\mu$ and $\sigma$. Finally, we note that one may reject the null hypothesis using a one-sided $1 - \alpha$ upper confidence bound for $\Psi$ using the intervals defined in the following section.

3.4.2 Constructing Confidence Intervals

As discussed in Verdinelli and Wasserman (2024), constructing confidence intervals for quadratic parameters with uniformly correct coverage (with length $n^{-1/2}$ away from the null and length $n^{-1}$ at the null) is an unsolved problem in statistics. In practice, we deal with this problem by constructing intervals that are conservative near the null.
We consider two approaches for constructing such intervals. Later, in Section 5, we compare the coverage and width of both methods via simulation.

First, we consider constructing a confidence interval for $\Psi$ using the uniform confidence band defined for the quantile treatment effect. As shown by Chiang et al. (2019), we can construct a multiplier bootstrap process $\mathbb{G}^{*}_n$ such that $\mathbb{G}^{*}_n \rightsquigarrow \mathbb{G}$. Therefore, if we let $\widehat{c}_{n,\alpha}$ be the $1 - \alpha$ conditional quantile of $\sup_u |\mathbb{G}^{*}_n(u)|$, it follows that $\Delta\widehat{Q}(u) \pm \widehat{c}_{n,\alpha} / \sqrt{nh}$ yields a $1 - \alpha$ confidence band. With that in mind, let
$$a_n(u) = \Delta\widehat{Q}(u) - \frac{\widehat{c}_{n,\alpha}}{\sqrt{nh}} \quad \text{and} \quad b_n(u) = \Delta\widehat{Q}(u) + \frac{\widehat{c}_{n,\alpha}}{\sqrt{nh}}.$$
Then, for any interval $[a, b]$ it is clear that
$$\max_{x \in [a, b]} x^2 = \begin{cases} b^2, & a \geq 0 \\ a^2, & b \leq 0 \\ \max\{a^2, b^2\}, & a < 0 < b \end{cases} \quad \text{and} \quad \min_{x \in [a, b]} x^2 = \begin{cases} a^2, & a \geq 0 \\ b^2, & b \leq 0 \\ 0, & a < 0 < b. \end{cases}$$
Therefore, if we define the upper and lower bounds
$$\overline{M}_n(u) = \max\{ a_n^2(u), b_n^2(u) \} \quad \text{and} \quad \underline{M}_n(u) = \big( \max\{ a_n(u), 0 \} \big)^2 + \big( \min\{ b_n(u), 0 \} \big)^2$$
then it becomes straightforward to construct the interval $C_n = \big[ \int_0^1 \underline{M}_n(u) \, du, \; \int_0^1 \overline{M}_n(u) \, du \big]$. Under the regularity conditions established in Chiang et al. (2019) it immediately follows that
$$\liminf_{n \to \infty} P(\Psi^2 \in C_n) \geq 1 - \alpha.$$

Alternatively, we can artificially widen our confidence interval following the approach of Verdinelli and Wasserman (2024). Specifically, we could define
$$C'_n = \left[ \widehat{\Psi}_n^2 \pm z_{1 - \alpha/2} \sqrt{ \widehat{s}_n^{\,2} + \frac{c^2}{nh} } \right] \tag{3}$$
where $\widehat{s}_n$ is the estimated standard deviation of $\widehat{\Psi}_n^2$, $z_{1 - \alpha/2}$ is the $1 - \alpha/2$ quantile of a standard Normal distribution, and $c$ is some constant, such as $V(Y)$. We now confirm that this provides a valid, but possibly conservative, confidence interval.

Lemma 2 (Conservative Interval). Let $C'_n$ be the interval defined in Equation (3) for some constant $c$.
Suppose that $E[\widehat{\Psi}_n^2 - \Psi^2] = o((nh)^{-1/2})$ and $V(\widehat{\Psi}_n^2) = o((nh)^{-1})$. Then, it follows that $P(\Psi^2 \notin C'_n) = o(1)$; that is, the coverage probability of $C'_n$ tends to one.

In practice, either of the proposed methods for constructing confidence intervals for $\Psi$ is reasonable; their empirical widths are further discussed in Section 5. Additionally, as noted in Section 3.4.1, we can check the null hypothesis of no causal effect by checking if zero is in $C_n$ or $C'_n$. In the following section, we extend our analysis to the fuzzy treatment assignment setting.

3.5 Fuzzy Distributional Discontinuity Design

In many applications treatment assignment above and below the cutoff is not perfectly sharp. That is, although the probability of receiving treatment jumps discontinuously at the threshold, some units below the threshold may receive treatment, and some above may not; such settings are referred to as "fuzzy" regression discontinuity designs (Hahn et al., 2001). Intuitively, in this setting the cutoff acts as an instrument for treatment status; crossing the threshold changes the likelihood of treatment but does not fix it. Now, it no longer makes sense to directly compare outcome distributions above and below the cutoff because these groups differ in more than treatment status. Notably, Frandsen et al. (2012) extend the framework proposed by Angrist et al. (1996) to define local always-takers, never-takers, compliers, defiers, and indefinites. To do so, let $X_i$ be the running variable with cutoff $x_0$ and now let $A_i(x)$ denote unit $i$'s potential treatment status if the running variable were $x$. Observed treatment is then $A_i = A_i(X_i)$.
Then, we define the one-sided treatment limits (when they exist) as
$$A_i^{-} = \lim_{x \uparrow x_0} A_i(x) \quad \text{and} \quad A_i^{+} = \lim_{x \downarrow x_0} A_i(x)$$
such that $A_i^{-}$ is the treatment status that would be received if the running variable approaches the cutoff from the left and $A_i^{+}$ is the treatment status that would be received from the right. We may then define the following mutually exclusive groups:

• Local Always-Takers: $\mathcal{AT} = \{ i : A_i^{-} = 1, A_i^{+} = 1 \}$.
• Local Never-Takers: $\mathcal{NT} = \{ i : A_i^{-} = 0, A_i^{+} = 0 \}$.
• Local Compliers: $\mathcal{C} = \{ i : A_i^{-} = 0, A_i^{+} = 1 \}$.
• Local Defiers: $\mathcal{D} = \{ i : A_i^{-} = 1, A_i^{+} = 0 \}$.
• Local Indefinites: $\mathcal{I} = \{ i : \text{one or both of } (A_i^{-}, A_i^{+}) \text{ do not exist} \}$.

Now, we focus on the sub-population of local compliers when defining a fuzzy distributional effect, as this is the group whose treatment status is actually changed by the treatment discontinuity. In this setting, we require additional assumptions for causal identification. Beyond the assumptions discussed in Section 3.1, we need to assume

(v) Treatment Discontinuity: $\lim_{x \downarrow x_0} P(A = 1 \mid X = x) > \lim_{x \uparrow x_0} P(A = 1 \mid X = x)$.
(vi) Local Smoothness: $E[A^{\pm} \mid X = x]$ and $F_{Y(a) \mid G = g, X}(y \mid g, x)$ are continuous at $X = x_0$, the latter for all $y$, each $a \in \{0, 1\}$, and each $g \in \{\mathcal{AT}, \mathcal{NT}, \mathcal{C}, \mathcal{D}\}$.
(vii) Monotonicity: $\lim_{x \to x_0} P(A^{+} \geq A^{-} \mid X = x) = 1$ and $P(\mathcal{I}) = 0$.

Assumption (v) simply requires that the probability of treatment changes discontinuously at the threshold $X = x_0$. Assumption (vi) requires that the fraction of units that would take treatment evolves smoothly as the treatment cutoff is approached. This guarantees that the only discontinuity of the observed treatment assignment is through the discontinuity at $X = x_0$ and not through some hidden break in the treatment assignment mechanism.
Furthermore, (vi) requires that (within each compliance group) the distribution of the potential outcomes varies smoothly with the running variable at the cutoff. This again ensures that any discontinuity in observed outcome distributions is attributable to the change in the probability of treatment, rather than a discontinuity in the potential outcome distributions themselves. Assumption (vii) rules out the existence of defiers (units that always go against their treatment assignment) and indefinites (units with ill-defined treatment limits) in a neighborhood of the treatment discontinuity. Intuitively, this assumption implies that all units weakly comply with the treatment assignment mechanism, such that moving from below to above the cutoff can only increase the chance of treatment. Furthermore, it ensures that every unit has well-defined potential treatment statuses. With these assumptions in place, it now follows that the group identified by the discontinuity consists of genuine compliers.

With assumptions (v)-(vii) in place, Frandsen et al. (2012) show that the cumulative distribution functions for compliers above and below the cutoff are identified as
$$F_{1|\mathcal{C}}(y) = \frac{\lim_{x \downarrow x_0} E[I(Y \leq y) A \mid X = x] - \lim_{x \uparrow x_0} E[I(Y \leq y) A \mid X = x]}{\lim_{x \downarrow x_0} E[A \mid X = x] - \lim_{x \uparrow x_0} E[A \mid X = x]}$$
and
$$F_{0|\mathcal{C}}(y) = \frac{\lim_{x \downarrow x_0} E[I(Y \leq y)(1 - A) \mid X = x] - \lim_{x \uparrow x_0} E[I(Y \leq y)(1 - A) \mid X = x]}{\lim_{x \downarrow x_0} E[(1 - A) \mid X = x] - \lim_{x \uparrow x_0} E[(1 - A) \mid X = x]}.$$
Therefore, it follows that the Wasserstein effect for compliers is defined and identified as
$$\Psi_{\mathcal{C}} = \left( \int_0^1 \big( Q_{1|\mathcal{C}}(u) - Q_{0|\mathcal{C}}(u) \big)^2 \, du \right)^{1/2}$$
where $Q_{a|\mathcal{C}}(u) = \inf\{ y : F_{a|\mathcal{C}}(y) \geq u \}$ for $a \in \{0, 1\}$ are the quantiles of the complier cumulative distribution functions above and below the treatment discontinuity.
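The identification formulas above suggest a direct plug-in: replace each one-sided limit with a boundary regression estimate and form the ratio. The Python sketch below does this with one-sided local linear fits. It is an illustration under stated assumptions only: the helper names `boundary_fit` and `complier_cdf`, the triangular kernel, the local linear order, and the absence of bias correction are choices made here, and the resulting ratio is not guaranteed to be monotone in $y$ or to lie in $[0, 1]$ in finite samples.

```python
import numpy as np

def boundary_fit(x, d, x0, h, side, p=1):
    """Intercept at x0 of a one-sided, triangular-kernel local polynomial fit of d on x."""
    z = (np.asarray(x, dtype=float) - x0) / h
    side_mask = (z >= 0) if side == "right" else (z <= 0)
    w = np.maximum(1.0 - np.abs(z), 0.0) * side_mask
    keep = w > 0
    R = np.vander(z[keep], N=p + 1, increasing=True)   # rows (1, z, ..., z^p)
    sw = np.sqrt(w[keep])
    resp = np.asarray(d, dtype=float)[keep]
    coef, *_ = np.linalg.lstsq(R * sw[:, None], resp * sw, rcond=None)
    return coef[0]

def complier_cdf(x, y, a, y_eval, x0, h, arm=1):
    """Ratio estimate of the complier CDF F_{arm|C}(y_eval), for arm in {0, 1}."""
    x, y, a = (np.asarray(v, dtype=float) for v in (x, y, a))
    t = (a == arm).astype(float)          # A for arm = 1, (1 - A) for arm = 0
    d = (y <= y_eval) * t                 # I(Y <= y) times the arm indicator
    num = boundary_fit(x, d, x0, h, "right") - boundary_fit(x, d, x0, h, "left")
    den = boundary_fit(x, t, x0, h, "right") - boundary_fit(x, t, x0, h, "left")
    return num / den
```

Evaluating `complier_cdf` over a grid of `y_eval` values for `arm=1` and `arm=0` gives two tabulated complier distribution functions, which can then be inverted and integrated exactly as in the sharp case. The estimate is unstable when the jump in the treatment probability (the denominator) is small.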
Interpretation of the fuzzy Wasserstein effect follows analogously to the sharp case; $\Psi^2_{\mathcal{C}}$ still acts as a distributional analogue upper bounding the local average treatment effect at the cutoff. Moreover, the same inequalities and decompositions can be extended to $\Psi^2_{\mathcal{C}}$. Now that we have defined the Wasserstein effect in a fuzzy distributional discontinuity design framework, we discuss estimation.

3.5.1 Estimation of the Fuzzy Wasserstein Effect

To estimate the fuzzy Wasserstein effect, we can directly extend the procedure described in Section 3.3, again following the work of Chiang et al. (2019). For $a \in \{0, 1\}$ we define
$$G_a(y \mid x) = E[I(Y \leq y) I(A = a) \mid X = x] \quad \text{and} \quad \pi_a(x) = E[I(A = a) \mid X = x].$$
Then, taking the corresponding one-sided limits we have
$$F_{a|\mathcal{C}}(y) = \frac{G_a(y \mid x_0^{+}) - G_a(y \mid x_0^{-})}{\pi_a(x_0^{+}) - \pi_a(x_0^{-})}.$$
To estimate $G_a(y \mid x_0^{\pm})$ and $\pi_a(x_0^{\pm})$ we use one-sided local polynomial regression, mirroring the sharp case. Specifically, for $G_a(y \mid x)$ we solve
$$\widehat{\alpha}^{G}_{a,p}(y) = \arg\min_{\alpha \in \mathbb{R}^{p+1}} \sum_{i=1}^{n} \delta_i^{\pm} \left( I(Y_i \leq y) I(A_i = a) - r_p\!\left( \frac{X_i - x_0}{h} \right)^{T} \alpha \right)^{2} K\!\left( \frac{X_i - x_0}{h} \right).$$
Analogously, for $\pi_a(x_0^{\pm})$ we solve for $\widehat{\alpha}^{\pi}_{a,p}$ by letting $I(A_i = a)$ be the dependent variable. Then, it follows that $\widehat{G}_a(y \mid x_0^{\pm}) = e_0^{T} \widehat{\alpha}^{G}_{a,p}(y)$ and $\widehat{\pi}_a(x_0^{\pm}) = e_0^{T} \widehat{\alpha}^{\pi}_{a,p}$, which yields the local Wald estimator
$$\widehat{F}_{a|\mathcal{C}}(y) = \frac{\widehat{G}_a(y \mid x_0^{+}) - \widehat{G}_a(y \mid x_0^{-})}{\widehat{\pi}_a(x_0^{+}) - \widehat{\pi}_a(x_0^{-})}.$$
Finally, the complier quantile function is given by $\widehat{Q}_{a|\mathcal{C}}(u) = \inf\{ y : \widehat{F}_{a|\mathcal{C}}(y) \geq u \}$. Statistical inference for the fuzzy Wasserstein effect follows exactly as in the sharp case, as described in Section 3.4.

4 Distributional Kink Designs

In many practical settings, we do not observe a discontinuity in the treatment assignment, but rather a kink or change in slope of the policy.
The idea here is the same as under regression discontinuity designs; it is assumed that units arbitrarily close to either side of the policy kink are comparable, and therefore a causal interpretation can be justified. A canonical application of regression kink designs is that of Nielsen et al. (2010), who estimate the causal effect of student grants on college enrollment in Denmark. Here, their running variable $X$ is some continuous measure of parental income and the benefit $b(X)$ exhibits a kink at different eligibility thresholds; that is, the full grant is offered up to some level $X = x_1$, a linear phaseout occurs between $x_1$ and $x_2$ (with the benefit decreasing in $x$), and zero benefit is offered for parental incomes greater than $x_2$. Other notable early applications of regression kink designs can be found in Guryan (2001) and Dahlberg et al. (2008); interested readers should refer to Card et al. (2016) for a review, and to Ando (2017); Ganong and Jäger (2018) for discussions of inference and robustness in finite samples.

In a sharp regression kink design the benefit is set deterministically according to the known assignment rule $b(\cdot)$. One causal target is the local average marginal effect of the benefit on the outcomes; that is, the slope of the dose-response curve at the policy kink,
$$\tau' = \frac{\partial}{\partial t} E[Y(t) \mid X = x_0] \bigg|_{t = b(x_0)} \overset{(i)}{=} \frac{\mu'_Y(x_0^{+}) - \mu'_Y(x_0^{-})}{b'(x_0^{+}) - b'(x_0^{-})}$$
where equality (i) follows by the identifying assumptions outlined in Card et al. (2015) and
$$\mu'_Y(x_0^{+}) = \lim_{x \downarrow x_0} \frac{\partial}{\partial x} E[Y \mid X = x], \qquad b'(x_0^{+}) = \lim_{x \downarrow x_0} \frac{\partial}{\partial x} b(x),$$
and analogous definitions are given for $\mu'_Y(x_0^{-})$ and $b'(x_0^{-})$.

[Figure 4: Example of a hypothetical regression kink design. Horizontal axis: $b(x)$; vertical axis: $y$.]
For example, Figure 4 shows a regression kink design in which there is a clear kink in the dose-response curve at $X = x_0$. As was the case for regression discontinuity design, far more information can be gained by considering distributional causal effects. Notably, quantile treatment effects in kinked designs have been explored by Chiang and Sasaki (2019), Chiang et al. (2019), Chen et al. (2020), and Wang and Zhang (2025). However, these approaches suffer from the same set of drawbacks as before, namely difficulty in implementation and interpretation. Thus, in what follows, we show that the Wasserstein derivative at the policy kink provides a clean generalization of traditional kink design effects.

Let $g(t, x, \varepsilon)$ be a function of the benefit, running variable, and unobservables. Then, we may define the counterfactual $Y(t) = g(t, X, \varepsilon)$ and the observed outcome $Y = g(b(X), X, \varepsilon)$. Again, we let $P_{t|x}$ denote the conditional distribution of $Y(t) \mid X = x$ under some benefit or treatment level $t = b(x)$. In this setting, we can think of $P_{t|x}$ as a distribution along some absolutely continuous path of distributions in the running variable $x$. This allows us to define the Wasserstein derivative at the policy kink $X = x_0$ as
$$\Psi' = \lim_{\delta \to 0} \frac{W_2(P_{t_0 + \delta \mid x_0}, P_{t_0 \mid x_0})}{|\delta|} = \left( \int_0^1 \left( \frac{\partial}{\partial t} Q_{Y(t) \mid X = x_0}(u) \bigg|_{t = b(x_0)} \right)^{2} du \right)^{1/2}$$
where we define $t_0 = b(x_0)$. Intuitively, $\Psi'$ represents the instantaneous rate at which probability mass moves or flows at the policy kink (Ambrosio et al., 2005). While the traditional regression kink design estimand $\tau'$ measures how the center of mass moves or drifts through the kink, $\Psi'$ measures how the entire distribution moves.

Now we consider identification of the Wasserstein derivative at the kink. Assume that $b'(x_0^{+}) \neq b'(x_0^{-})$ and $b(\cdot)$ is a known function.
Naturally, we expect the behavior of $F_{Y|X}(y \mid x)$ near $x_0$ to provide information about the causal effect; however, making this intuition rigorous is subtle. For identification and interpretation, the classical approaches of Card et al. (2015) and Chiang and Sasaki (2019) can be surprisingly difficult to work with. In the case of mean effects, Card et al. (2015) show that $\tau'$ can be written as a weighted average of individual-level marginal effects where the weights depend on unobservables. For quantile effects, Chiang and Sasaki (2019) obtain an analogous weighted average of structural derivatives evaluated along some latent boundary set. Both settings can be hard to translate into standard treatment-effect language, e.g. "what is the effect of a marginal increase in the benefit." Furthermore, these identification strategies can require additional regularity conditions to make the weights well-defined; for example, Chiang and Sasaki (2019) suggest a rank-invariance assumption.

More recently, Wang and Zhang (2025) proposed a more direct identification strategy that leads to a cleaner interpretation. Specifically, Wang and Zhang (2025) define the local treatment effect at the kink to be
$$\Delta_\phi = \frac{\partial}{\partial t} \phi\big( F_{Y(t) \mid X = x_0} \big) \bigg|_{t = b(x_0)} = \lim_{\delta \to 0} \frac{\phi\big( F_{Y(t_0 + \delta) \mid X = x_0} \big) - \phi\big( F_{Y(t_0) \mid X = x_0} \big)}{\delta}$$
where $\phi$ is some Hadamard differentiable functional. Now, the estimand describes a genuinely local average marginal effect of a small policy-induced change in the benefit level around $b(x_0)$ for units at $X = x_0$. Under some regularity conditions, Wang and Zhang (2025) show that the causal effect $\Delta_\phi$ is identified as $\phi'_{F_{Y|X = x_0}}(\mathrm{DRKD}(\cdot))$ where $\mathrm{DRKD}(\cdot)$ is the distributional regression kink design estimand,
$$\mathrm{DRKD}(y) = \frac{\frac{\partial}{\partial x} F_{Y|X}(y \mid x_0^{+}) - \frac{\partial}{\partial x} F_{Y|X}(y \mid x_0^{-})}{b'(x_0^{+}) - b'(x_0^{-})}.$$
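As a rough illustration of how $\mathrm{DRKD}(y)$ might be computed, the Python sketch below estimates the two one-sided derivatives of the conditional distribution function from the slope coefficient of a one-sided local polynomial fit and divides by the known change in the slope of $b(\cdot)$. This is not the estimator of Wang and Zhang (2025): the function names, the triangular kernel, the default polynomial order, and the fixed bandwidth are assumptions made here, and no bias correction is applied. The arguments `db_right` and `db_left` are hypothetical names for $b'(x_0^{+})$ and $b'(x_0^{-})$.

```python
import numpy as np

def one_sided_slope(x, d, x0, h, side, p=2):
    """Derivative at x0 of a one-sided, triangular-kernel local polynomial fit of d on x."""
    z = (np.asarray(x, dtype=float) - x0) / h
    side_mask = (z >= 0) if side == "right" else (z <= 0)
    w = np.maximum(1.0 - np.abs(z), 0.0) * side_mask
    keep = w > 0
    R = np.vander(z[keep], N=p + 1, increasing=True)   # rows (1, z, ..., z^p)
    sw = np.sqrt(w[keep])
    resp = np.asarray(d, dtype=float)[keep]
    coef, *_ = np.linalg.lstsq(R * sw[:, None], resp * sw, rcond=None)
    # The coefficient on z equals h times the derivative in x, so rescale by 1 / h.
    return coef[1] / h

def drkd(x, y, y_eval, x0, h, db_right, db_left, p=2):
    """Plug-in DRKD(y_eval): change in d/dx F_{Y|X}(y_eval | x) at x0 over the change in b'."""
    d = (np.asarray(y, dtype=float) <= y_eval).astype(float)
    num = one_sided_slope(x, d, x0, h, "right", p) - one_sided_slope(x, d, x0, h, "left", p)
    return num / (db_right - db_left)
```

For intuition, if $Y = b(X) + U$ with $b(x) = \max(x, 0)$ and $U$ uniform on $(0, 1)$, then $F_{Y|X}(y \mid x)$ falls at unit rate in $x$ to the right of the kink and is flat to the left, so $\mathrm{DRKD}(y) = -1$ for $y$ in the interior of the support. Derivative estimates converge more slowly than level estimates, so in practice this quantity is considerably noisier than the jump in a discontinuity design.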
In the case of distributional kink designs, we let $\phi_u(F) = F^{-1}(u)$ denote the $u$-quantile functional. Then, since $Y$ is univariate, it follows that the Wasserstein derivative at the policy kink is identified as
\[
\Psi' = \left( \int_0^1 \left( \frac{\frac{\partial}{\partial x} Q_{Y \mid X}(u \mid x_0^+) - \frac{\partial}{\partial x} Q_{Y \mid X}(u \mid x_0^-)}{b'(x_0^+) - b'(x_0^-)} \right)^2 du \right)^{1/2}.
\]
Notably, if $Y$ is multivariate this identification strategy does not work, as the Wasserstein distance is no longer a function of the limiting conditional quantiles; establishing and identifying such distributional causal effects in high dimensional settings is an interesting and open question. In the following section, we extend the work of Wang and Zhang (2025) to handle identification of fuzzy treatment assignment in kink designs.

4.1 Fuzzy Distributional Kink Designs

Although Wang and Zhang (2025) establish a clean and interpretable framework for identification of causal effects in sharp kink designs, they do not consider the fuzzy treatment assignment setting, where the running variable induces a kink in treatment propensities rather than deterministically setting a benefit level. Here, we observe a noisy analogue of $b(x)$ due to imperfect compliance, measurement error, or some other unobserved determinants of behavior. More formally, suppose we observe some $b(X, \eta)$ where $\eta$ captures unobserved variation in the treatment assignment. Now, there is no single baseline level of treatment, so we must define an analogous version of $\Delta_\phi$ in the fuzzy setting, and we must establish additional structure and conditions to ensure that the unobserved determinants of treatment evolve smoothly in $x$ around the kink. To formalize this, we first define a nonseparable outcome model and establish a fuzzy kink design characterization.

Assumption 1 (Nonseparable Model).
Suppose there exist unobservables $(\varepsilon, \eta)$ and a measurable structural function $g : \mathbb{R} \times \mathcal{X} \times \mathcal{E} \to \mathbb{R}$ such that:
(i) (Potential outcomes) For each $t \in \mathbb{R}$, $Y(t) = g(t, X, \varepsilon)$.
(ii) (Fuzzy assignment) $T = b(X, \eta)$ for some measurable $b : \mathcal{X} \times \mathcal{H} \to \mathbb{R}$.
(iii) (Consistency) $Y = Y(T) = g(b(X, \eta), X, \varepsilon)$,
where $\varepsilon \in \mathcal{E} \subset \mathbb{R}^{d_\varepsilon}$, $\eta \in \mathcal{H} \subset \mathbb{R}^{d_\eta}$, and $\mathcal{X} \subset \mathbb{R}$ is the support of the running variable.

Assumption 2 (Fuzzy Kink Characterization). Let $I_{x_0}$ be a closed interval containing the kink point $x_0$. Then, suppose that:
(i) For a.e. $\eta$, the map $x \mapsto b(x, \eta)$ is continuous on $I_{x_0}$ and continuously differentiable on $I_{x_0} \setminus \{x_0\}$, with finite one-sided derivatives $b'(x_0^+, \eta)$ and $b'(x_0^-, \eta)$.
(ii) Let $\mu_B(x) = E[b(X, \eta) \mid X = x]$ and assume that $\Delta_B := \mu'_B(x_0^+) - \mu'_B(x_0^-) \neq 0$, where $\mu'_B(x_0^+) = \lim_{x \downarrow x_0} \frac{\partial}{\partial x} E[b(x, \eta) \mid X = x]$ and $\mu'_B(x_0^-) = \lim_{x \uparrow x_0} \frac{\partial}{\partial x} E[b(x, \eta) \mid X = x]$.

By Assumption 2, we guarantee that there is continuity at the kink point, but a discontinuity in the first-order derivative. Moreover, under both Assumption 1 and Assumption 2, letting $T_0 = b(x_0, \eta)$ and $\omega(\eta) = b'(x_0^+, \eta) - b'(x_0^-, \eta)$, we can define the counterfactual treatment at the kink,
\[
T_\delta = T_0 + \delta\, \frac{\omega(\eta)}{\Delta_B},
\]
and the associated counterfactual outcome $Y_\delta = g(T_\delta, x_0, \varepsilon)$.

Remark 2 (Counterfactual Definition). The definition of the counterfactual $T_\delta$ may seem counterintuitive at first glance since the intervention depends on the kink-responsiveness $\omega(\eta) = b'(x_0^+, \eta) - b'(x_0^-, \eta)$; it might seem more natural to define an intervention such as $T_\delta = T_0 + \delta$, which shifts every unit by the same amount. The reason the weighting $\omega(\eta) / \Delta_B$ is required is that in a fuzzy kink design, the identifying variation comes from the change in the slope of $E[b(X, \eta) \mid X]$ at $X = x_0$.
A small change in the running variable shifts a unit's treatment by an amount proportional to $\omega(\eta)$, so units more responsive to the kink contribute more to the local change actually observed in the data. The reason we normalize by $\Delta_B = E[\omega(\eta) \mid X = x_0]$ is so that $\delta$ can be interpreted as the change in the average treatment at the kink, since now $E[T_\delta - T_0 \mid X = x_0] = \delta$. Thus, even though $T_\delta$ does not correspond to a uniform shift in treatment, it is the counterfactual that aligns with the observed kink variation, with weights determined by each unit's kink-responsiveness.

With this counterfactual outcome in place, we may now define the fuzzy local treatment effect at the kink for the functional $\phi$ as
\[
\Delta^F_\phi = \frac{\partial}{\partial \delta} \phi\big(F_{Y_\delta \mid X = x_0}\big) \Big|_{\delta = 0} = \lim_{\delta \to 0} \frac{\phi\big(F_{Y_\delta \mid X = x_0}\big) - \phi\big(F_{Y_0 \mid X = x_0}\big)}{\delta},
\]
provided the limit exists. Next, our goal is to obtain a structural representation of $\Delta^F_\phi$ analogous to Lemma 1 of Wang and Zhang (2025). However, before doing so we must outline a few additional assumptions. First, we need smoothness assumptions identical to those required in Wang and Zhang (2025); for completeness, we write them out in what follows.

Assumption 3 (Smooth Functional). Let $\mathcal{F}$ be the space of all one-dimensional distribution functions. Then, assume the functional $\phi : \mathcal{F} \to \mathbb{R}$ is Hadamard differentiable at $F_{Y \mid X = x_0}$, with its Hadamard derivative denoted by $\phi'_{F_{Y \mid X = x_0}}$.

Assumption 4 (Smooth Structural Functions). The function $g(t, x, e)$ is continuously differentiable in $(t, x)$ for each $e \in \mathcal{E}$, with continuous partial derivatives $g_1(t, x, e) = \frac{\partial}{\partial t} g(t, x, e)$ and $g_2(t, x, e) = \frac{\partial}{\partial x} g(t, x, e)$.

As discussed in Wang and Zhang (2025), Assumption 4 is analogous to the smoothness conditions imposed in Card et al. (2015), but weaker than those required by the identification strategy of Chiang and Sasaki (2019).
Under this smoothness condition, the partial derivative of $h(x, e, u) := g(b(x, u), x, e)$ with respect to $x$ is given by
\[
\frac{\partial}{\partial x} h(x, e, u) =: h_x(x, e, u) = b'(x, u)\, g_1(b(x, u), x, e) + g_2(b(x, u), x, e). \tag{4}
\]
An implication of Equation (4) is that $x \mapsto g_1(b(x, u), x, e)$ and $x \mapsto g_2(b(x, u), x, e)$ are continuous at $x_0$, but $x \mapsto h(x, e, u)$ is not continuously differentiable at $x_0$ due to the discontinuity in $b'(x, u)$. Finally, we must establish a few regularity conditions in the spirit of conditions R1(i) and R1(ii) of Wang and Zhang (2025).

Assumption 5 (Regularity 1). Let $Z = (\omega(\eta) / \Delta_B)\, g_1(T_0, x_0, \varepsilon)$ and assume the following conditions hold:
(i) For each $c > 0$, $P\big( |Y_\delta - Y_0 - \delta Z| \geq c |\delta| \,\big|\, X = x_0 \big) = o(|\delta|)$ as $\delta \to 0$.
(ii) The conditional distribution of $(Y, Z)$ given $X = x_0$ is absolutely continuous with respect to the Lebesgue measure and has a joint density $f_{Y, Z \mid X}(y, y' \mid x_0)$ that is continuous in $y$ for all $y'$. Furthermore, assume there exists a Lebesgue integrable function $\varpi : \mathbb{R} \to \mathbb{R}$ with $\int |y' \varpi(y')| \, dy' < \infty$ such that for all $(y, y')$, $f_{Y, Z \mid X}(y, y' \mid x_0) \leq |\varpi(y')|$.

Assumption 5(i) is a stochastic differentiability condition along the counterfactual path $\delta \mapsto Y_\delta$ induced by the fuzzy kink. It requires that, conditional on $X = x_0$, the change in outcomes from shifting treatment from $T_0$ to $T_0 + \delta\, (\omega(\eta) / \Delta_B)$ admits a first-order expansion with a remainder that is small enough to control. Assumption 5(ii) ensures that the joint density of $(Y, Z)$ given $X = x_0$ is well-behaved. Relative to the regularity conditions required in Wang and Zhang (2025), the difference now is that the derivative direction can vary.
That is, $Z$ includes the random compliance weight $\omega(\eta) / \Delta_B$, so the domination and integrability requirements must control the weighted marginal effect $(\omega(\eta) / \Delta_B)\, g_1(T_0, x_0, \varepsilon)$ and not just $g_1(T_0, x_0, \varepsilon)$ alone. Note that we do not explicitly need to require an additional smooth outcome distribution assumption like Assumption S4 of Wang and Zhang (2025), since Assumption 5(ii) already implies continuity of $f_{Y \mid X}(y \mid x_0)$ in $y$. With these conditions in place, we can now establish a structural representation of $\Delta^F_\phi$.

Lemma 3 (Fuzzy Structural Representation). Suppose that Assumptions 1-5 hold. Then the fuzzy local treatment effect at the kink $\Delta^F_\phi$ admits the representation
\[
\Delta^F_\phi = \phi'_{F_{Y \mid X = x_0}} \left( E\left[ -f_{Y \mid X}(\cdot \mid x_0)\, \frac{\omega(\eta)}{\Delta_B}\, g_1(T_0, x_0, \varepsilon) \,\Big|\, Y = \cdot,\, X = x_0 \right] \right).
\]

As discussed in Wang and Zhang (2025), Lemma 3 shows that the fuzzy local treatment effect at the kink can be expressed as the Hadamard derivative of $\phi$ applied to a conditional expectation. Notably, this conditional expectation is analogous to the local average structural derivative discussed in Hoderlein and Mammen (2007, 2009). The primary difference between Lemma 3 and Lemma 1 of Wang and Zhang (2025) is that now the conditional expectation is compliance weighted; it averages $g_1$ evaluated at each unit's (random) baseline treatment $T_0 = b(x_0, \eta)$, weighted by the unit's kink-responsiveness $\omega(\eta)$, such that units whose treatment is more strongly shifted by the kink contribute more to the identified marginal effect. With this structural representation in place, we can now complete the proof of causal identification of the fuzzy local treatment effect at the kink; however, we first need to establish a few more assumptions and regularity conditions.

Assumption 6 (Smooth Disturbance Distributions).
The conditional distribution of $(\varepsilon, \eta)$ given $X = x$ is absolutely continuous with respect to Lebesgue measure. Furthermore, it admits a density $f_{\varepsilon, \eta \mid X}(e, u \mid x)$ that is continuously differentiable in $x$ on $I_{x_0}$ for all $(e, u)$. In addition, assume there exists some Lebesgue integrable function $\varpi(e, u)$ such that
\[
\sup_{x \in I_{x_0}} \left| \frac{\partial}{\partial x} f_{\varepsilon, \eta \mid X}(e, u \mid x) \right| \leq |\varpi(e, u)|.
\]
Finally, for each $y$ assume $I(h(x_0 + t, e, u) \leq y) \to I(h(x_0, e, u) \leq y)$ as $t \to 0$ for all $(e, u)$.

Assumption 7 (Regularity 2). Recall that we use the notational shorthand $h_x(x_0^\pm, \varepsilon, \eta) := \frac{\partial}{\partial x} h(x_0^\pm, \varepsilon, \eta)$. Assume the following conditions hold:
(i) For each $c > 0$,
\[
\begin{aligned}
P\big( |h(x_0 + \delta, \varepsilon, \eta) - h(x_0, \varepsilon, \eta) - \delta\, h_x(x_0^+, \varepsilon, \eta)| \geq c |\delta| \,\big|\, X = x_0 \big) &= o(|\delta|), \\
P\big( |h(x_0 + \delta, \varepsilon, \eta) - h(x_0, \varepsilon, \eta) - \delta\, h_x(x_0^-, \varepsilon, \eta)| \geq c |\delta| \,\big|\, X = x_0 \big) &= o(|\delta|),
\end{aligned}
\]
as $\delta \downarrow 0$ and $\delta \uparrow 0$, respectively.
(ii) The conditional distributions of $(Y, h_x(x_0^\pm, \varepsilon, \eta))$ given $X = x_0$ are absolutely continuous with respect to the Lebesgue measure with densities $f_{Y, h_x^\pm \mid X}(y, y' \mid x_0)$ that are continuous in $y$ for each fixed $y'$. Moreover, there exists a Lebesgue integrable function $\varpi_h : \mathbb{R} \to \mathbb{R}$ with $\int |y' \varpi_h(y')| \, dy' < \infty$ such that for all $y, y'$, $f_{Y, h_x^\pm \mid X}(y, y' \mid x_0) \leq |\varpi_h(y')|$.
(iii) Suppose that the conditional distribution of $\eta$ given $X = x$ is absolutely continuous with respect to Lebesgue measure, with conditional density $f_{\eta \mid X}(u \mid x)$ that is continuously differentiable in $x$ on $I_{x_0}$. Furthermore, assume that there exists some Lebesgue integrable function $\varpi_\eta(u)$ such that $\sup_{x \in I_{x_0}} \left| \frac{\partial}{\partial x} f_{\eta \mid X}(u \mid x) \right| \leq \varpi_\eta(u)$.
(iv) Assume the function $b(x, u)$ is continuous in $x$ at $x_0$ for each $u$ and differentiable on each side of $x_0$ with one-sided derivatives $b'(x_0^\pm, u)$.
Finally, assume there exist Lebesgue integrable functions $\kappa_0, \kappa_1$ such that
\[
\sup_{x \in I_{x_0}} |b(x, u)| \leq \kappa_0(u) \quad \text{and} \quad \sup_{x \in I_{x_0} \setminus \{x_0\}} \left| \frac{\partial}{\partial x} b(x, u) \right| \leq \kappa_1(u),
\]
together with $\int \kappa_0(u)\, \varpi_\eta(u) \, du < \infty$ and $E[\kappa_1(\eta) \mid X = x_0] < \infty$.

Assumption 6 allows for $(\varepsilon, \eta)$ to be both correlated with $X$ and to vary with $x$; however, it requires that this variation be smooth on $I_{x_0}$. Assumption 7(i) is a local linearization requirement for $h(x, \varepsilon, \eta)$ around $x_0$ that ensures that small changes in $x$ induce approximately linear shifts in $Y$ that are controlled by the one-sided derivatives $h_x(x_0^\pm, \varepsilon, \eta)$. Assumption 7(ii) is another regularity condition ensuring the joint distribution of $(Y, h_x(x_0^\pm, \varepsilon, \eta))$ given $X = x_0$ is well behaved at the kink. Finally, Assumption 7(iii) ensures that any selection effect arising from $x \mapsto f_{\eta \mid X}(\cdot \mid x)$ is smooth and therefore does not itself generate a kink, and Assumption 7(iv) adds integrability conditions on $b(x, \eta)$ (and its derivative). With these assumptions in place, we now establish causal identification of the fuzzy local treatment effect at the kink.

Theorem 4 (Fuzzy Kink Design Identification). Suppose the conditions of Lemma 3 and Assumptions 6-7 hold. Then, the fuzzy local treatment effect at the kink is identified as
\[
\Delta^F_\phi = \phi'_{F_{Y \mid X = x_0}}\big(\mathrm{FDRKD}(\cdot)\big),
\]
where $\mathrm{FDRKD}(\cdot)$ is the fuzzy distributional regression kink design estimand,
\[
\mathrm{FDRKD}(y) = \frac{\frac{\partial}{\partial x} F_{Y \mid X}(y \mid x_0^+) - \frac{\partial}{\partial x} F_{Y \mid X}(y \mid x_0^-)}{\mu'_B(x_0^+) - \mu'_B(x_0^-)}.
\]

Analogously to the identification results of Wang and Zhang (2025) in sharp kink designs, Theorem 4 shows that the fuzzy local treatment effect at the kink is identified by applying the Hadamard derivative of the functional $\phi$ in the direction of the FDRKD estimand.
Importantly, $\mathrm{FDRKD}(y)$ is interpretable as the local distributional effect per unit of the kink-induced treatment change, represented as a distributional local Wald ratio. In the case of distributional kink designs, the fuzzy Wasserstein derivative at the kink is therefore identified as
\[
\Psi'_C = \left( \int_0^1 \left( \frac{\frac{\partial}{\partial x} Q_{Y \mid X}(u \mid x_0^+) - \frac{\partial}{\partial x} Q_{Y \mid X}(u \mid x_0^-)}{\mu'_B(x_0^+) - \mu'_B(x_0^-)} \right)^2 du \right)^{1/2}
\]
after letting $\phi$ be the quantile functional. Now that we have established identification in the fuzzy kink design setting, in the next section we discuss estimation.

4.2 Estimation and Inference for the Kinked Wasserstein Effect

In this section, we discuss estimation of the Wasserstein derivative at a policy kink. Our strategy will build off of the work of Chiang et al. (2019) and the framework established in Section 3.3. First, note that $Q_{Y \mid X}(u \mid x_0^+) = Q_{Y \mid X}(u \mid x_0^-) =: Q_{Y \mid X}(u \mid x_0)$ due to the continuity at $x_0$. Second, recall that the derivative with respect to $x$ of the quantile function can be written as
\[
\frac{\partial}{\partial x} Q_{Y \mid X}(u \mid x_0^\pm) = -\frac{\frac{\partial}{\partial x} F_{Y \mid X}\big(Q_{Y \mid X}(u \mid x_0) \mid x_0^\pm\big)}{f_{Y \mid X}\big(Q_{Y \mid X}(u \mid x_0) \mid x_0\big)}.
\]
Thus, after taking the difference $\frac{\partial}{\partial x} Q_{Y \mid X}(u \mid x_0^+) - \frac{\partial}{\partial x} Q_{Y \mid X}(u \mid x_0^-)$ and then dividing by the first-stage kink $\mu'_B(x_0^+) - \mu'_B(x_0^-)$, it is clear that
\[
\frac{\frac{\partial}{\partial x} Q_{Y \mid X}(u \mid x_0^+) - \frac{\partial}{\partial x} Q_{Y \mid X}(u \mid x_0^-)}{\mu'_B(x_0^+) - \mu'_B(x_0^-)} = -\frac{\mathrm{FDRKD}\big(Q_{Y \mid X}(u \mid x_0)\big)}{f_{Y \mid X}\big(Q_{Y \mid X}(u \mid x_0) \mid x_0\big)}. \tag{5}
\]
With this in mind, we can see that estimation of $Q_{Y \mid X}(u \mid x_0)$ is the same as in Section 3.3; the only additional terms to estimate are $\frac{\partial}{\partial x} F_{Y \mid X}(y \mid x_0^\pm)$, $\mu'_B(x_0^\pm)$, and $f_{Y \mid X}(\cdot \mid x_0)$. We begin by considering estimation of the first two terms. Recall that in Section 3.3 we considered a one-sided Taylor expansion of $F_{Y \mid X}(y \mid x)$ about $x = x_0$.
Specifically, we defined
\[
\alpha_{\pm, p}(y) = \left[ F_{Y \mid X}(y \mid x_0^\pm),\; F^{(1)}_{Y \mid X}(y \mid x_0^\pm)\, \frac{h}{1!},\; \ldots,\; F^{(p)}_{Y \mid X}(y \mid x_0^\pm)\, \frac{h^p}{p!} \right]^T
\]
and then estimated $\alpha_{\pm, p}(y)$ via one-sided local weighted least squares. Consequently, leveraging this exact approach it follows that
\[
\frac{\partial}{\partial x} \widehat{F}_{Y \mid X}(y \mid x_0^\pm) = \frac{1}{h}\, e_1^T\, \widehat{\alpha}_{\pm, p}(y),
\]
where $e_1 = (0, 1, 0, \ldots, 0)^T$. Notably, we can repeat this approach to estimate $\mu'_B(x_0^\pm)$. Specifically, if we use the same local polynomial estimation, now with outcomes $T_i = b(X_i, \eta_i)$, i.e.
\[
\widehat{\beta}_{\pm, p} = \arg\min_{\beta \in \mathbb{R}^{p+1}} \sum_{i=1}^n \delta_i^\pm \left( T_i - r_p\!\left( \frac{X_i - x_0}{h} \right)^T \beta \right)^2 K\!\left( \frac{X_i - x_0}{h} \right), \tag{6}
\]
then we can similarly obtain the estimator $\widehat{\mu}'_B(x_0^\pm) = \frac{1}{h}\, e_1^T\, \widehat{\beta}_{\pm, p}$. Bias correction can similarly be implemented following the same steps discussed in Section 3.3. Finally, we note that there are many ways one could estimate $f_{Y \mid X}(\cdot \mid x_0)$. One simple method would be to define a local polynomial conditional density estimator by replacing $T_i$ in Equation (6) with a kernel in $y$, i.e. $h_y^{-1} K((Y_i - y) / h_y)$. Putting everything together, if we plug all of our estimators into Equation (5), then squaring, numerically integrating over $(0, 1)$, and taking the square root yields an estimate of $\Psi'_C$.

Inference for the Wasserstein derivative at the policy kink follows largely in the same manner as discussed in Section 3.4; the only major difference is the scaling. As discussed in Calonico et al. (2014) and Card et al. (2015), estimating a derivative at a boundary introduces an additional $1/h$ scaling, so its variance now scales as $(nh)^{-1} (h^2)^{-1} = (nh^3)^{-1}$. Consequently, if we wanted to construct a confidence interval for $\Psi'$ we simply need to correct this scaling.
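The slope estimators described above share one computational core: a one-sided, kernel-weighted local polynomial fit whose second coefficient is rescaled by $1/h$. A minimal sketch for the $p = 1$ case is given below, assuming a triangular kernel and a noiseless, piecewise-linear first stage so that the target slopes are known. The function names and the toy data are hypothetical, and bias correction is deliberately omitted.

```python
def triangular_kernel(z):
    """Triangular kernel supported on [-1, 1]."""
    return max(0.0, 1.0 - abs(z))


def one_sided_slope(x, t, x0, h, side):
    """Local linear (p = 1) slope of E[T | X = x] at x0 from one side.

    Solves the 2x2 weighted least squares normal equations in the scaled
    regressor z = (x - x0) / h, then returns (1 / h) * e_1' beta_hat.
    """
    s0 = s1 = s2 = r0 = r1 = 0.0
    for xi, ti in zip(x, t):
        if (side == "+" and xi < x0) or (side == "-" and xi >= x0):
            continue  # keep only observations on the requested side
        z = (xi - x0) / h
        w = triangular_kernel(z)
        s0 += w
        s1 += w * z
        s2 += w * z * z
        r0 += w * ti
        r1 += w * z * ti
    det = s0 * s2 - s1 * s1
    beta1 = (s0 * r1 - s1 * r0) / det  # fitted coefficient on z
    return beta1 / h


# Hypothetical first stage with a kink at x0 = 0: slope 0.5 below, 2.0 above.
x0, h = 0.0, 0.5
xs = [-1.0 + 2.0 * i / 400 for i in range(401)]
ts = [1.0 + (2.0 if x >= x0 else 0.5) * (x - x0) for x in xs]

slope_plus = one_sided_slope(xs, ts, x0, h, "+")
slope_minus = one_sided_slope(xs, ts, x0, h, "-")
first_stage_kink = slope_plus - slope_minus  # plays the role of the first-stage kink
print(slope_plus, slope_minus, first_stage_kink)
```

Because the toy first stage is exactly linear on each side, the fit recovers the two slopes up to floating-point error. With noisy data the same routine applies, and dividing the fitted coefficient by $h$ is the source of the extra $1/h$ factor in the variance scaling.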
Following Equation (3), we can obtain an analogous interval of
\[
C''_n = \left[ (\widehat{\Psi}'_C)^2 \pm z_{1 - \alpha/2} \sqrt{ \widehat{s}_n^{\,2} + \frac{c^2}{n h^3} } \right],
\]
where $\widehat{s}_n$ is the estimated standard deviation of $(\widehat{\Psi}'_C)^2$, $z_{1 - \alpha/2}$ is the $1 - \alpha/2$ quantile of a standard Normal distribution, and $c$ is some constant, such as $V(Y)$.

4.3 Interpretation of the Kinked Wasserstein Effect

Interpretation of the Wasserstein derivative at a policy kink follows analogously to the interpretation established in Section 3.2 for discontinuity designs. To see this, define the quantile effect curve at the kink by
\[
\Delta Q'(u) = \frac{\frac{\partial}{\partial x} Q_{Y \mid X}(u \mid x_0^+) - \frac{\partial}{\partial x} Q_{Y \mid X}(u \mid x_0^-)}{\mu'_B(x_0^+) - \mu'_B(x_0^-)}.
\]
Then, again we can immediately see that $\tau' = \int_0^1 \Delta Q'(u) \, du$ and $(\Psi')^2 = \int_0^1 [\Delta Q'(u)]^2 \, du$, so letting $U \sim \mathrm{Uniform}(0, 1)$ we again obtain the same variance decomposition
\[
(\Psi')^2 = (\tau')^2 + V(\Delta Q'(U)).
\]
Thus, $\Psi'$ captures both the mean drift through the kink (as measured by $\tau'$) and the heterogeneity of the treatment effect across quantiles (as measured by $V(\Delta Q'(U))$). Similarly, by applying the Cauchy-Schwarz inequality we can obtain the kink analogue of Theorem 1, $|\tau'| \leq \Psi'$. Equality holds if and only if the marginal effect is purely additive, or equivalently that the quantile effect curve is flat, i.e. $\Delta Q'(u) = \delta$ for all $u \in (0, 1)$. Finally, it is possible to obtain an analogous version of Theorem 2 for the kinked Wasserstein effect. Let $\lambda_k(x) = \int_0^1 Q_{Y \mid X}(u \mid x)\, P^*_{k-1}(u) \, du$ be the conditional $L$-moment evaluated at $x$ and define its one-sided derivatives as
\[
\lambda'_k(x_0^\pm) = \int_0^1 \frac{\partial}{\partial x} Q_{Y \mid X}(u \mid x_0^\pm)\, P^*_{k-1}(u) \, du.
\]
Then, following the same arguments as in the proof of Theorem 2, it can be shown that the Wasserstein derivative at the kink may be decomposed into derivatives of $L$-moments. We formalize this decomposition in the following theorem.
Theorem 5 ($L$-Moment Derivative Decomposition). Suppose that $\int_0^1 [\Delta Q'(u)]^2 \, du < \infty$. Then,
\[
\Psi'_C = \left\{ \sum_{k=1}^\infty (2k - 1) \left( \frac{\lambda'_k(x_0^+) - \lambda'_k(x_0^-)}{\mu'_B(x_0^+) - \mu'_B(x_0^-)} \right)^2 \right\}^{1/2}.
\]

Now, each term $k$ in the series representation of $\Psi'_C$ represents the instantaneous change in $L$-location, $L$-scale, $L$-skewness, etc. at the kink. With these representations and interpretations established, in the next section we analyze real data sets to see how these methods can be implemented in practice.

5 Simulations and Data Analysis

In this section we consider the practical implementation of distributional discontinuity designs and distributional kink designs. First, we compare the empirical coverage and interval width of the two confidence intervals proposed in Section 3.4.2. Next, we re-analyze two natural experiments: one regression discontinuity design and one regression kink design. Our goal is to compare traditional mean-based effects to our proposed distributional effects.

5.1 Simulations

In what follows we conduct a simulation study to compare the empirical widths and coverage of the two conservative confidence intervals for $\Psi$ described in Section 3.4.2. We consider three data generating processes, all of which feature a running variable drawn as $X_i \sim \mathrm{Uniform}(-1, 1)$, treatment sharply assigned such that $A_i = I(X_i \geq 0)$, and the function $m(x) = 0.5x + x^2$:
(i) Additive Effect: Let $Y_i = m(X_i) + \tau A_i + \varepsilon_i$ with $\varepsilon_i \sim N(0, 1)$.
(ii) Differing Variances: Let $Y_i = m(X_i) + \sigma(A_i)\, \varepsilon_i$ with $\sigma(0) = 1$ and $\sigma(1) = 2$.
(iii) Heavy Tailed: Let $Y_i = m(X_i) + \tau A_i + (1 + 0.3 |X_i|)(1 + 0.6 A_i)(\exp(\varepsilon_i) - \exp(1/2))$.
The three data generating processes are chosen to represent increasingly rich forms of treatment heterogeneity.
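For readers who want to reproduce the designs, the three data generating processes can be sketched with the standard library alone. The function name `simulate`, its arguments, and the seed handling are hypothetical. The default `tau=0.5` follows the value the text reports for setting (i); reusing it in setting (iii), and drawing $\varepsilon_i \sim N(0, 1)$ in settings (ii) and (iii), are assumptions of this sketch.

```python
import math
import random


def m(x):
    """Baseline regression function m(x) = 0.5x + x^2."""
    return 0.5 * x + x * x


def simulate(n, setting, tau=0.5, seed=0):
    """Draw (X, A, Y) samples from setting 'i', 'ii', or 'iii'."""
    if setting not in ("i", "ii", "iii"):
        raise ValueError("setting must be 'i', 'ii', or 'iii'")
    rng = random.Random(seed)
    xs, treat, ys = [], [], []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        a = 1 if x >= 0 else 0           # sharp assignment at the cutoff 0
        eps = rng.gauss(0.0, 1.0)
        if setting == "i":               # additive effect
            y = m(x) + tau * a + eps
        elif setting == "ii":            # mean unchanged, variance differs
            sigma = 2.0 if a == 1 else 1.0
            y = m(x) + sigma * eps
        else:                            # skewed, heavy tailed, non-additive
            scale = (1.0 + 0.3 * abs(x)) * (1.0 + 0.6 * a)
            y = m(x) + tau * a + scale * (math.exp(eps) - math.exp(0.5))
        xs.append(x)
        treat.append(a)
        ys.append(y)
    return xs, treat, ys


x, a, y = simulate(1000, "iii", seed=1)
print(len(x), sum(a))
```

The centering constant `math.exp(0.5)` is the mean of a standard log-normal variable, so the error term in setting (iii) has mean zero on both sides of the cutoff.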
Setting (i) is a simple additive treatment effect model where treatment shifts the conditional distribution by a constant $\tau = 1/2$ at every quantile. Setting (ii) leaves the mean unchanged at the cutoff, but doubles the standard deviation. Finally, in setting (iii) we introduce errors that are skewed and heavy tailed via $\exp(\varepsilon_i) - \exp(1/2)$; furthermore, we introduce the factors $(1 + 0.3 |X_i|)$ and $(1 + 0.6 A_i)$ to induce heteroskedasticity in the running variable and an explicitly non-additive treatment effect. In the simulation we consider $n \in \{10^3, 10^4, 10^5, 10^6\}$, which corresponds to one-sided within-bandwidth sample sizes of roughly 185, 1185, 7500, and 47300. We also consider different truncation profiles for $\Psi$, where instead of integrating over the full quantile grid $u \in (0, 1)$ we consider $u \in (\gamma, 1 - \gamma)$ for $\gamma \in \{0.05, 0.1, 0.25\}$. We find that for small to modest sample sizes some degree of truncation is useful for numerical stability. For each replication and choice of $n$, we estimate $\Psi$ using the bias-corrected local polynomial procedure described in Section 3.3. We use the default bandwidth rule $h_n \propto n^{-1/5}$, 1,000 bootstrap replications, and 10,000 overall Monte Carlo simulations.

Our simulation results are visualized in Figure 5. Broadly speaking, we find that for small sample sizes, the conservative intervals defined in Equation (3) are an order of magnitude smaller than the bootstrap intervals; as the sample size increases this difference becomes less pronounced, but does not go away. This is likely because, as outlined in Chiang et al. (2019), the bootstrap intervals estimate both the running variable density at the cutoff as well as conditional outcome densities evaluated at estimated quantiles; both can be unstable with small sample sizes. Furthermore, because Chiang et al.
(2019) define a uniform confidence band over $u$, if any quantiles are poorly estimated then the bands can blow up in width.

[Figure 5: three panels, (i) Additive, (ii) Diff. Variances, and (iii) Heavy; the top row plots interval width on a log scale and the bottom row plots coverage, both against $n \in \{10^3, 10^4, 10^5, 10^6\}$, for $\gamma = 0.05$, $\gamma = 0.10$, and $\gamma = 0.25$.]
Figure 5: Monte Carlo confidence interval widths and coverage for bootstrap intervals (dashed) and simple intervals (solid) across trimming levels $\gamma$ and sample sizes $n$.

Furthermore, we can see that while coverage is theoretically conservative for both methods, as the sample size increased, the conservative intervals attained approximate $1 - \alpha$ coverage; meanwhile, the bootstrap intervals always overcovered.

5.2 Distributional Discontinuity Design Analysis

In this section we re-analyze a canonical regression discontinuity design analysis in order to compare the Wasserstein effect to the conventional mean effect at the cutoff. We consider the work of Lee (2008), who studied the causal effect of electoral incumbency in U.S. House elections by leveraging the idea that elections decided by very small margins are "as good as randomized"; the data are publicly available via the R package RDHonest. The running variable is the Democratic vote share margin of victory in a given election (defined as the Democratic vote share minus the vote share of the strongest opponent), with a corresponding discontinuity at zero. The primary outcome is the Democratic party's vote share in the subsequent election.
Empirically, Lee (2008) finds clear evidence of an incumbency advantage, where barely winning an election leads to a statistically significant jump in the following election, on the order of a 7-8% increase in vote share. In what follows, we consider whether or not there were interesting distributional effects not visible by considering the mean alone.

In our re-analysis, to keep things simple we choose $p = 1$ for our local polynomial estimator, a triangular kernel, and we set $h = n^{-1/5}$. Under these settings, using the rdrobust package we estimate the mean effect to be a 7.099 increase in vote share with a 95% confidence interval of [2.648, 8.038]; these results replicate the findings of Lee (2008). Next, we consider the Wasserstein effect without truncating quantiles: we obtain an estimate for $\Psi$ of 7.544 with a 95% confidence interval of [5.023, 9.412]. The fact that $\widehat{\tau}$ and $\widehat{\Psi}$ are so close to each other suggests there was not much heterogeneity in the treatment effect. Indeed, if we break down $\widehat{\Psi}$ into the distributional $R^2$ table as shown in Table 2,

Moment     Explained Distance
k = 1      0.5598
k = 2      0.0413
k = 3      0.1118
k ≥ 4      0.2871
Table 2: Explained distributional variation for the incumbency advantage

we can see that most of the effect is explained by the variation in the $L$-location, with some notable changes as well in $L$-skewness and higher-order components. Note that as shown in Equation (1), because $\lambda_1$ is the mean, it follows that $R^2_1 = \tau^2 / \Psi^2 = 1 - \gamma$, where $\gamma$ is the heterogeneity index discussed in Section 3.2.1. Thus, if we were to plug in each estimate, we would find $7.099^2 / 7.544^2 \approx 0.886$ as the explained distributional distance coming from the first moment. The gap between 0.5598 and 0.886 is likely due to finite-sample differences in mean versus quantile effect estimation. Our findings are further validated by considering the estimated Wasserstein dominance.
Here, we find $\widehat{\rho} = 0.5777$, suggesting that winning a close election was fairly uniformly beneficial, with little quantile crossing. By combining the traditional mean effects analysis with our distributional analysis, we were able to obtain a much more complete understanding of the causal effect of the incumbency advantage.

5.3 Distributional Kink Design

In this section we re-analyze an existing regression kink design analysis in order to compare the Wasserstein derivative to the mean effect at the kink. Specifically, we consider the work of Lundqvist et al. (2014), who study whether general intergovernmental grants increase local public employment using known kinks in the Swedish grant system; the data used are publicly available via the R package causalweight. The running variable is the net out-migration rate in a given municipality, $m_{it} = 100(1 - n_{i,t-2} / n_{i,t-12})$, where $n_{i,t}$ is the population in the $i$th municipality at time $t$. That is to say, the running variable is the percentage population decrease over a ten-year window with a two-year lag. The policy rule for out-migration grants is given by
\[
g^m_{it} = \begin{cases} a (m_{it} - 2), & m_{it} > 2, \\ 0, & m_{it} \leq 2, \end{cases}
\]
where the kink is at 2% and $a$ is a constant (100 Swedish krona per capita per additional percentage point above 2%). In their analysis Lundqvist et al. (2014) find no statistically significant effect of grants on total local public employment, making their study a good point of comparison to distributional effects that consider more than just the mean.

In our re-analysis, we consider $h \in \{5, 10, 15\}$ and a uniform kernel to match the analysis of Lundqvist et al. (2014); we report the $h = 10$ results, although they are all qualitatively similar. Furthermore, we demean the outcome and benefit by year and cluster our standard
Using the rdrobust pack age w e estimate the lo cal av erage slop e b τ ′ C to b e -0.050, with a 95% confidence interv al of [ − 0 . 378 , 0 . 277] , matching the n ull effect found in Lundqvist et al. ( 2014 ). Our estimated v alue for the W asserstein deriv ativ e b Ψ ′ C is 0.6713, with a 95% confidence interv al of [0 . 000 , 1 . 432] , indicating a null distributional effect. How ever, we do find an interesting characterization of the effect in the L -moment decomp osition, as shown in T able 3 . It app ears that most of the distance explained in b Ψ ′ C comes from higher-order momen ts; notably , almost none comes from the mean effect. This may suggest there is more to the story worth lo oking in to: p erhaps there were a few outlier m unicipalities that used their grants extremely well (or p o orly). Practitioners may then consider lo oking into targeted h yp othesis tests on specific L -momen ts to further explore whether an effect exists at these lev els. Moment Explaine d Distanc e k = 1 0.0007 k = 2 0.1750 k = 3 0.1362 k ≥ 4 0.6881 T able 3: Explained distributional v ariation for the gran t effect. 6 Discussion and Conclusion In this pap er w e introduced distributional discontin uity designs and distributional kink designs, a framew ork for studying distributional causal effects for a scalar outcome at the boundary of a discon tinuit y or kink in treatmen t assignment. A k ey practical motiv ation for this approach is that many applied regression discon tinuit y and kink analyses remain cen tered on mean effects, despite the fact that distributional changes are often of substan tive in terest. How ever, it is not our inten tion to replace these classical to ols; rather, w e show that distributional causal effects pla y a complementary role. The W asserstein effect establishes a natural reference p oint for b oth mean and quantiles effects. 
Since we show that $\Psi$ weakly upper bounds the magnitude of the average treatment effect, practitioners now have an interpretable index of treatment effect heterogeneity whenever $\Psi$ is meaningfully larger than $|\tau|$. Furthermore, we show that the Wasserstein distance admits an orthogonal decomposition into squared differences in $L$-moments. In practice, this decomposition provides a principled way to answer questions like "is the effect mostly a shift in the distribution means, or is it driven by changes in dispersion, asymmetry, or tail behavior?"

Although this work primarily focuses on regression discontinuity and kink designs, the principles outlined here extend beyond these specific applications. One could easily estimate the Wasserstein effect in a randomized controlled trial, for example, or any setting where the exchangeability assumption holds. The exact same interpretations and decompositions described in Section 3.2 will still hold; all that would change is the way quantile effects are estimated. Thus, we see this work as opening the door toward considering distributional distances as interesting causal effects in their own right. It would be interesting to extend this distributional analysis more broadly to other quasi-experimental designs, such as the difference-in-differences framework.

Finally, we note that although this work establishes a novel framework for distributional causal effects at a treatment discontinuity, we only consider univariate outcomes. Future work could establish methods for estimating the Wasserstein distance between multivariate outcome distributions, where identification and estimation no longer reduce to a distance between quantile functions.
Furthermore, it would be interesting to consider distributional distances beyond the Wasserstein distance; perhaps other distributional measures capture other underlying phenomena in the data, and admit their own useful decompositions.

Acknowledgments

This paper is a product of the Iowa Agriculture and Home Economics Experiment Station, Ames, Iowa. Project No. IOW03717 is supported by USDA/NIFA and State of Iowa funds. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the U.S. Department of Agriculture. The authors would like to sincerely thank Emileigh Harrison and Terence Chau for many helpful discussions and Zhaohua Zeng for writing R code to estimate quantile treatment effects.

References

L. Ambrosio, N. Gigli, and G. Savaré. Gradient Flows. Lectures in Mathematics. ETH Zürich. Birkhäuser Basel, 1 edition, 2005. ISBN 978-3-7643-7309-2. doi: 10.1007/b137080.
M. Ando. How much should we trust regression-kink-design estimates? Empirical Economics, 53(3):1287–1322, November 2017. doi: 10.1007/s00181-016-1155-8. URL https://ideas.repec.org/a/spr/empeco/v53y2017i3d10.1007_s00181-016-1155-8.html.
J. D. Angrist, G. W. Imbens, and D. B. Rubin. Identification of causal effects using instrumental variables. Journal of the American Statistical Association, 91(434):444–455, 1996. doi: 10.1080/01621459.1996.10476902. URL https://www.tandfonline.com/doi/abs/10.1080/01621459.1996.10476902.
S. Calonico, M. D. Cattaneo, and R. Titiunik. Robust nonparametric confidence intervals for regression-discontinuity designs. Econometrica, 82(6):2295–2326, 2014. doi: https://doi.org/10.3982/ECTA11757. URL https://onlinelibrary.wiley.com/doi/abs/10.3982/ECTA11757.
S. Calonico, M. D. Cattaneo, and M. H. Farrell. Optimal bandwidth choice for robust bias-corrected inference in regression discontinuity designs.
The Econometrics Journal, 23(2):192–210, November 2019a. ISSN 1368-4221. doi: 10.1093/ectj/utz022. URL https://doi.org/10.1093/ectj/utz022.
S. Calonico, M. D. Cattaneo, M. H. Farrell, and R. Titiunik. Regression discontinuity designs using covariates. The Review of Economics and Statistics, 101(3):442–451, July 2019b. URL https://ideas.repec.org/a/tpr/restat/v101y2019i3p442-451.html.
D. Card, D. S. Lee, Z. Pei, and A. Weber. Inference on causal effects in a generalized regression kink design. Econometrica, 83(6):2453–2483, 2015. doi: https://doi.org/10.3982/ECTA11224. URL https://onlinelibrary.wiley.com/doi/abs/10.3982/ECTA11224.
D. Card, D. S. Lee, Z. Pei, and A. Weber. Regression kink design: Theory and practice. Working Paper 22781, National Bureau of Economic Research, October 2016. URL http://www.nber.org/papers/w22781.
M. D. Cattaneo and R. Titiunik. Regression discontinuity designs. Annual Review of Economics, 14:821–851, 2022. ISSN 1941-1391. doi: https://doi.org/10.1146/annurev-economics-051520-021409. URL https://www.annualreviews.org/content/journals/10.1146/annurev-economics-051520-021409.
M. D. Cattaneo, M. Jansson, and X. Ma. Simple local polynomial density estimators. Journal of the American Statistical Association, 115(531):1449–1455, 2020. doi: 10.1080/01621459.2019.1635480. URL https://doi.org/10.1080/01621459.2019.1635480.
H. Chen, H. D. Chiang, and Y. Sasaki. Quantile treatment effects in regression kink designs. Econometric Theory, 36(6):1167–1191, 2020. doi: 10.1017/S0266466619000409.
H. D. Chiang and Y. Sasaki. Causal inference by quantile regression kink designs. Journal of Econometrics, 210(2):405–433, 2019. ISSN 0304-4076. doi: https://doi.org/10.1016/j.jeconom.2019.02.005. URL https://www.sciencedirect.com/science/article/pii/S0304407619300387.
H. D. Chiang, Y.-C. Hsu, and Y. Sasaki.
Robust uniform inference for quantile treatment effects in regression discontinuity designs. Journal of Econometrics, 211(2):589–618, 2019. ISSN 0304-4076. doi: https://doi.org/10.1016/j.jeconom.2019.03.006. URL https://www.sciencedirect.com/science/article/pii/S0304407619300569.
J. B. Conway. A Course in Functional Analysis, volume 96 of Graduate Texts in Mathematics. Springer, New York, NY, 2 edition, 1990. ISBN 978-0-387-97245-9. doi: 10.1007/978-1-4757-4383-8.
T. D. Cook. "Waiting for life to arrive": A history of the regression-discontinuity design in psychology, statistics and economics. Journal of Econometrics, 142(2):636–654, 2008. ISSN 0304-4076. doi: https://doi.org/10.1016/j.jeconom.2007.05.002. URL https://www.sciencedirect.com/science/article/pii/S0304407607001108. The regression discontinuity design: Theory and applications.
M. Dahlberg, E. Mörk, J. Rattsø, and H. Ågren. Using a discontinuous grant rule to identify the effect of grants on local taxes and spending. Journal of Public Economics, 92(12):2320–2335, 2008. ISSN 0047-2727. doi: https://doi.org/10.1016/j.jpubeco.2007.05.004. URL https://www.sciencedirect.com/science/article/pii/S0047272707000886. New Directions in Fiscal Federalism.
D. V. Dijcke. Regression discontinuity design with distribution-valued outcomes, 2025. URL https://arxiv.org/abs/2504.03992.
B. R. Frandsen, M. Frölich, and B. Melly. Quantile treatment effects in the regression discontinuity design. Journal of Econometrics, 168(2):382–395, 2012. ISSN 0304-4076. doi: https://doi.org/10.1016/j.jeconom.2012.02.004. URL https://www.sciencedirect.com/science/article/pii/S0304407612000607.
M. Frölich and M. Huber. Including covariates in the regression discontinuity design. Journal of Business & Economic Statistics, 37(4):736–748, 2019. doi: 10.1080/07350015.2017.1421544. URL https://doi.org/10.1080/07350015.2017.1421544.
P. Ganong and S. Jäger.
A permutation test for the regression kink design. Journal of the American Statistical Association, 113(522):494–504, 2018. doi: 10.1080/01621459.2017.1328356. URL https://doi.org/10.1080/01621459.2017.1328356.
A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola. A kernel two-sample test. Journal of Machine Learning Research, 13(25):723–773, 2012. URL http://jmlr.org/papers/v13/gretton12a.html.
F. F. Gunsilius. Distributional synthetic controls. Econometrica, 91(3):1105–1117, 2023. doi: https://doi.org/10.3982/ECTA18260. URL https://onlinelibrary.wiley.com/doi/abs/10.3982/ECTA18260.
F. F. Gunsilius. A primer on optimal transport for causal inference with observational data, 2025.
J. Guryan. Does money matter? Regression-discontinuity estimates from education finance reform in Massachusetts. Working Paper 8269, National Bureau of Economic Research, May 2001. URL http://www.nber.org/papers/w8269.
J. Hahn, P. Todd, and W. V. der Klaauw. Identification and estimation of treatment effects with a regression-discontinuity design. Econometrica, 69(1):201–209, 2001. ISSN 00129682, 14680262. URL http://www.jstor.org/stable/2692190.
S. Hoderlein and E. Mammen. Identification of marginal effects in nonseparable models without monotonicity. Econometrica, 75(5):1513–1518, 2007. doi: https://doi.org/10.1111/j.1468-0262.2007.00801.x. URL https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1468-0262.2007.00801.x.
S. Hoderlein and E. Mammen. Identification and estimation of local average derivatives in non-separable models without monotonicity. The Econometrics Journal, 12(1):1–25, 2009. doi: https://doi.org/10.1111/j.1368-423X.2008.00273.x. URL https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1368-423X.2008.00273.x.
J. R. M. Hosking. L-moments: Analysis and estimation of distributions using linear combinations of order statistics. Journal of the Royal Statistical Society.
Series B (Methodological), 52(1):105–124, 1990. ISSN 00359246. URL http://www.jstor.org/stable/2345653.
G. Imbens and K. Kalyanaraman. Optimal bandwidth choice for the regression discontinuity estimator. The Review of Economic Studies, 79(3):933–959, 2012. ISSN 00346527, 1467937X. URL http://www.jstor.org/stable/23261375.
G. W. Imbens and T. Lemieux. Regression discontinuity designs: A guide to practice. Journal of Econometrics, 142(2):615–635, 2008. ISSN 0304-4076. doi: https://doi.org/10.1016/j.jeconom.2007.05.001. URL https://www.sciencedirect.com/science/article/pii/S0304407607001091. The regression discontinuity design: Theory and applications.
S. Janson. Wiener chaos, page 17–22. Cambridge Tracts in Mathematics. Cambridge University Press, 1997.
Z. Jin, Y. Zhang, Z. Zhang, and Y. Zhou. Identification and inference in a quantile regression discontinuity design under rank similarity with covariates. Econometric Theory, 41(1):172–217, 2025. doi: 10.1017/S026646662300021X.
K. Karhunen. Zur Spektraltheorie stochastischer Prozesse. 1946. URL https://api.semanticscholar.org/CorpusID:118738283.
K. Kim, J. Kim, and E. H. Kennedy. Causal effects based on distributional distances, 2024.
D. Kurisu, Y. Zhou, T. Otsu, and H.-G. Müller. Geodesic causal inference, 2025. URL https://arxiv.org/abs/2406.19604.
D. S. Lee. Randomized experiments from non-random selection in U.S. House elections. Journal of Econometrics, 142(2):675–697, February 2008. URL https://ideas.repec.org/a/eee/econom/v142y2008i2p675-697.html.
D. S. Lee and T. Lemieux. Regression discontinuity designs in economics. Journal of Economic Literature, 48(2):281–355, June 2010. doi: 10.1257/jel.48.2.281. URL https://www.aeaweb.org/articles?id=10.1257/jel.48.2.281.
M. Loève. Probability Theory I, volume 45 of Graduate Texts in Mathematics. Springer, New York, NY, 4 edition, 1977. ISBN 978-1-4684-9464-8.
doi: 10.1007/978-1-4684-9464-8.
A. Luedtke, M. Carone, and M. J. van der Laan. An omnibus non-parametric test of equality in distribution for unknown functions. Journal of the Royal Statistical Society Series B: Statistical Methodology, 81(1):75–99, November 2018. ISSN 1369-7412. doi: 10.1111/rssb.12299. URL https://doi.org/10.1111/rssb.12299.
H. Lundqvist, M. Dahlberg, and E. Mörk. Stimulating local public employment: Do general grants work? American Economic Journal: Economic Policy, 6(1):167–192, 2014. ISSN 19457731, 1945774X. URL http://www.jstor.org/stable/43189370.
J. McCrary. Manipulation of the running variable in the regression discontinuity design: A density test. Journal of Econometrics, 142(2):698–714, 2008. ISSN 0304-4076. doi: https://doi.org/10.1016/j.jeconom.2007.05.005. URL https://www.sciencedirect.com/science/article/pii/S0304407607001133. The regression discontinuity design: Theory and applications.
H. S. Nielsen, T. Sørensen, and C. Taber. Estimating the effect of student aid on college enrollment: Evidence from a government grant policy reform. American Economic Journal: Economic Policy, 2(2):185–215, 2010.
Z. Qu and J. Yoon. Nonparametric estimation and inference on conditional quantile processes. Journal of Econometrics, 185(1):1–19, 2015. ISSN 0304-4076. doi: https://doi.org/10.1016/j.jeconom.2014.10.008. URL https://www.sciencedirect.com/science/article/pii/S0304407614002462.
Z. Qu and J. Yoon. Uniform inference on quantile effects under sharp regression discontinuity designs. Journal of Business & Economic Statistics, 37(4):625–647, 2019. doi: 10.1080/07350015.2017.1407323. URL https://doi.org/10.1080/07350015.2017.1407323.
K. Schindl and L. Wasserman. Causal geodesy: Counterfactual estimation along the path between correlation and causation, 2025.
D. Sejdinovic, B. Sriperumbudur, A. Gretton, and K. Fukumizu.
Equivalence of distance-based and RKHS-based statistics in hypothesis testing. The Annals of Statistics, 41(5):2263–2291, 2013. doi: 10.1214/13-AOS1140. URL https://doi.org/10.1214/13-AOS1140.
G. P. Sillitto. Derivation of approximants to the inverse distribution function of a continuous univariate population from the order statistics of a sample. Biometrika, 56(3):641–650, December 1969. ISSN 0006-3444. doi: 10.1093/biomet/56.3.641. URL https://doi.org/10.1093/biomet/56.3.641.
D. L. Thistlethwaite and D. T. Campbell. Regression-discontinuity analysis: An alternative to the ex post facto experiment. Journal of Educational Psychology, 51(6):309, 1960.
W. Torous, F. Gunsilius, and P. Rigollet. An optimal transport approach to estimating causal effects via nonlinear difference-in-differences. Journal of Causal Inference, 12(1):20230004, 2024. doi: 10.1515/jci-2023-0004. URL https://doi.org/10.1515/jci-2023-0004.
A. W. v. d. Vaart. Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 1998.
S. S. Vallender. Calculation of the Wasserstein distance between probability distributions on the line. Theory of Probability & Its Applications, 18(4):784–786, 1974. doi: 10.1137/1118101. URL https://doi.org/10.1137/1118101.
I. Verdinelli and L. Wasserman. Decorrelated variable importance. Journal of Machine Learning Research, 25(7):1–27, 2024. URL http://jmlr.org/papers/v25/22-0801.html.
C. Villani et al. Optimal transport: old and new, volume 338. Springer, 2009.
Z. Wang and Z. Zhang. A unified framework for identification and inference of local treatment effects in sharp regression kink designs, 2025.
K. Yu and M. C. Jones. Local linear quantile regression. Journal of the American Statistical Association, 93(441):228–237, 1998. doi: 10.1080/01621459.1998.10474104.
URL https://www.tandfonline.com/doi/abs/10.1080/01621459.1998.10474104.
Y. Zhou, D. Kurisu, T. Otsu, and H.-G. Müller. Geodesic difference-in-differences, 2025. URL https://arxiv.org/abs/2501.17436.

SUPPLEMENTARY MATERIAL

Section A: Contains all proofs from the main text and supplementary material, including:
Section A.1: Proof of Theorem 1.
Section A.2: Proof of Theorem 2.
Section A.3: Proof of Theorem 3.
Section A.4: Proof of Corollary 1.
Section A.5: Proof of Proposition 1.
Section A.6: Proof of Lemma 2.
Section A.7: Proof of Lemma 3.
Section A.8: Proof of Theorem 4.
Section A.9: Proof of Theorem 5.

A Proofs

A.1 Proof of Theorem 1

Proof: First, observe that we may write the average treatment effect at the cutoff, $\tau$, in terms of quantile functions, i.e.,
\[ \tau = \int_0^1 \big( Q_1(u) - Q_0(u) \big) \, du \]
where $Q_1(u) = \inf\{ y : \lim_{x \downarrow x_0} F_{Y \mid X}(y \mid x) \ge u \}$ and $Q_0(u) = \inf\{ y : \lim_{x \uparrow x_0} F_{Y \mid X}(y \mid x) \ge u \}$. Then, we immediately obtain our desired inequality by applying the Cauchy-Schwarz inequality,
\[ |\tau| = \left| \int_0^1 \big( Q_1(u) - Q_0(u) \big) \, du \right| \le \left( \int_0^1 \big( Q_1(u) - Q_0(u) \big)^2 \, du \right)^{1/2} \left( \int_0^1 1^2 \, du \right)^{1/2} = \Psi. \]
Next, we show that $|\tau| = \Psi$ under an additive treatment effect $Q_1(u) = Q_0(u) + \delta$, where the quantiles of the limiting counterfactual distributions above and below the cutoff differ only by a translation by $\delta$. Immediately, this yields
\[ \tau = \int_0^1 \big( Q_1(u) - Q_0(u) \big) \, du = \int_0^1 \delta \, du = \delta \]
and furthermore that
\[ \Psi = \left( \int_0^1 \big( Q_1(u) - Q_0(u) \big)^2 \, du \right)^{1/2} = \left( \int_0^1 \delta^2 \, du \right)^{1/2} = |\delta|. \]
This proves one direction, i.e., that under an additive treatment effect $|\tau| = \Psi$. One easy way to prove the other direction is to consider the additive decomposition of the Wasserstein distance, $\Psi^2 = \tau^2 + V(\Delta Q(U))$, where $\Delta Q(u) = Q_1(u) - Q_0(u)$ and $U \sim \mathrm{Uniform}(0, 1)$. If $\Psi = |\tau|$, this implies that $V(\Delta Q(U)) = 0$, and therefore that the quantile treatment effect $\Delta Q(U)$ is constant.
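The bound $|\tau| \le \Psi$ and the decomposition $\Psi^2 = \tau^2 + V(\Delta Q(U))$ used in this proof can be checked numerically. The following is a minimal sketch, not part of the estimation procedure in the main text: it assumes the two limiting quantile functions are already known on a grid, and the logistic quantile function, the shift 0.7, and the scale factor 1.5 are hypothetical choices made only for illustration.

```python
import numpy as np

def mean_and_wasserstein_effect(q1, q0):
    """Return (tau, psi, var_dq) from two quantile functions evaluated on a
    common, equally spaced grid of u in (0, 1); integrals become grid averages."""
    dq = np.asarray(q1, dtype=float) - np.asarray(q0, dtype=float)
    tau = dq.mean()                  # approximates the integral of Q1 - Q0
    psi = np.sqrt((dq ** 2).mean())  # approximates the L2 norm of Q1 - Q0
    var_dq = dq.var()                # variance of Delta Q(U), U ~ Uniform(0, 1)
    return tau, psi, var_dq

# Midpoint grid on (0, 1); avoids evaluating the quantile function at 0 or 1.
m = 2000
u = (np.arange(m) + 0.5) / m

# Hypothetical limiting quantile functions (illustration only).
q0 = np.log(u / (1.0 - u))     # standard logistic quantile function
q1_additive = q0 + 0.7         # pure location shift: Q1 = Q0 + delta
q1_rescaled = 0.7 + 1.5 * q0   # location shift plus a change in scale

tau_a, psi_a, var_a = mean_and_wasserstein_effect(q1_additive, q0)
tau_s, psi_s, var_s = mean_and_wasserstein_effect(q1_rescaled, q0)

print(f"additive: |tau| = {abs(tau_a):.4f}, Psi = {psi_a:.4f}")
print(f"rescaled: |tau| = {abs(tau_s):.4f}, Psi = {psi_s:.4f}, "
      f"Psi^2 - tau^2 - Var = {psi_s**2 - tau_s**2 - var_s:.2e}")
```

Under the additive shift the two quantities coincide, while the rescaled case leaves a gap between $\Psi$ and $|\tau|$ that is accounted for by the variance of the quantile treatment effect.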
A.2 Proof of Theorem 2

Proof: First, recall by Lemma 1 that the Wasserstein effect is identified as
\[ \Psi^2 = \int_0^1 \big( Q_1(u) - Q_0(u) \big)^2 \, du \]
where $Q_1(u) = \inf\{ y : \lim_{x \downarrow x_0} F_{Y \mid X}(y \mid x) \ge u \}$ and $Q_0(u) = \inf\{ y : \lim_{x \uparrow x_0} F_{Y \mid X}(y \mid x) \ge u \}$. Next, let $\{P^*_k\}_{k=0}^{\infty}$ be the orthogonal basis of $L^2(0, 1)$ defined by the shifted Legendre polynomials, such that the $k$th shifted Legendre polynomial is defined as
\[ P^*_k(x) = (-1)^k \sum_{j=0}^{k} \binom{k}{j} \binom{k+j}{j} (-x)^j. \]
For $a \in \{0, 1\}$ we define the L-moments under the limiting quantiles $Q_1(u)$ and $Q_0(u)$ to be
\[ \lambda^{(a)}_k = \int_0^1 Q_a(u) \, P^*_{k-1}(u) \, du. \]
From here, under the assumption that $P_{a \mid x} \in \mathcal{P}_2(\mathbb{R})$ for $a \in \{0, 1\}$, by Hosking (1990) and Sillitto (1969) it follows that
\[ Q_a(u) = \sum_{k=1}^{\infty} (2k - 1) \, \lambda^{(a)}_k \, P^*_{k-1}(u) \]
and consequently,
\[ f(u) = Q_1(u) - Q_0(u) = \sum_{k=1}^{\infty} (2k - 1) \big( \lambda^{(1)}_k - \lambda^{(0)}_k \big) P^*_{k-1}(u). \]
Now, let $S_K(u) = \sum_{k=1}^{K} (2k - 1) \big( \lambda^{(1)}_k - \lambda^{(0)}_k \big) P^*_{k-1}(u)$ be a partial sum of $f$, and note that under the mean square convergence established by Sillitto (1969), $\|f\|_2^2 = \lim_{K \to \infty} \|S_K\|_2^2$. By the orthogonality of the shifted Legendre polynomials it follows that
\[ \|S_K\|_2^2 = \sum_{k=1}^{K} (2k - 1)^2 \big( \lambda^{(1)}_k - \lambda^{(0)}_k \big)^2 \|P^*_{k-1}\|_2^2 = \sum_{k=1}^{K} (2k - 1) \big( \lambda^{(1)}_k - \lambda^{(0)}_k \big)^2 \]
since by Hosking (1990) we know that $\|P^*_k\|_2^2 = (2k + 1)^{-1}$. Therefore, taking the limit as $K \to \infty$ we can see that
\[ \Psi^2 = \sum_{k=1}^{\infty} (2k - 1) \big( \lambda^{(1)}_k - \lambda^{(0)}_k \big)^2 \]
thereby completing the proof.

A.3 Proof of Theorem 3

Proof: To begin, let
\[ T_n = nh \int_0^1 [\Delta \hat{Q}(u)]^2 \, du \]
be our test statistic, where $\Delta \hat{Q}(u) = \hat{Q}_1(u) - \hat{Q}_0(u)$ is the difference of the local polynomial estimators of the conditional quantile functions described in Section 3.3 and Chiang et al. (2019). Furthermore, assume the regularity conditions discussed in Section 3.3 and Chiang et al. (2019) hold.
Then, under the null hypothesis,
\[ \sqrt{nh} \, \Delta \hat{Q}(u) \rightsquigarrow G(u) \quad \text{in } L^2([0, 1]) \]
where $G$ is a mean-zero Gaussian process with covariance kernel $\kappa(u, v) = \mathrm{Cov}\big( G(u), G(v) \big)$. Let $\{\lambda_k, \phi_k\}_{k=1}^{\infty}$ denote the eigenvalue-eigenfunction pairs of the covariance operator induced by $\kappa$, such that $\{\phi_k\}_{k=1}^{\infty}$ forms an orthonormal basis of $L^2([0, 1])$. Suppose that $\sum_{k=1}^{\infty} \lambda_k < \infty$ and $\lambda_1 > 0$. Then we may apply the Karhunen-Loève theorem (Karhunen, 1946; Loève, 1977) to expand $G(u)$ as
\[ G(u) = \sum_{k=1}^{\infty} \sqrt{\lambda_k} \, Z_k \, \phi_k(u) \]
where $Z_k \overset{iid}{\sim} N(0, 1)$ for all $k$. Next, by the continuous mapping theorem, it follows that
\[ T_n \rightsquigarrow T := \int_0^1 [G(u)]^2 \, du = \sum_{k=1}^{\infty} \lambda_k Z_k^2 \]
where the final equality holds by an application of Parseval's identity to $\{\phi_k\}_{k=1}^{\infty}$. From here, let $F(t) = P(T \le t)$ and let $c_\alpha = \inf\{ t \in \mathbb{R} : F(t) \ge 1 - \alpha \}$ denote the $(1 - \alpha)$ quantile of $T$. Note that $P(T = c_\alpha) = 0$.

With this machinery established, we first consider how to deal with truncation of the series $\sum_{k=1}^{\infty} \lambda_k Z_k^2$. Let
\[ T_K = \sum_{k=1}^{K} \lambda_k Z_k^2. \]
Importantly, we cannot achieve a level-$\alpha$ test for a fixed value of $K$ unless $\lambda_k = 0$ for all $k > K$. To see this, let $c_{K,\alpha}$ be the $(1 - \alpha)$ quantile of $T_K$. Then, because $T_K \le T$ almost surely, it follows that $c_{K,\alpha} \le c_\alpha$ and therefore,
\[ P(T > c_{K,\alpha}) \ge P(T > c_\alpha). \]
Consequently, for a fixed $K$ our tests will be anti-conservative. Thus, we must let $K \to \infty$; we therefore index $K$ by $n$ for the remainder of the proof. Let $F_{K_n}(t) = P(T_{K_n} \le t)$ and note that since $T_{K_n} \to T$ almost surely, it follows that for each fixed $t$, $I(T_{K_n} \le t) \to I(T \le t)$ almost surely. Thus, by the dominated convergence theorem, for all $t$, $F_{K_n}(t) \to F(t)$. Following standard quantile convergence arguments, it then follows that $c_{K_n,\alpha} \to c_\alpha$ as $K_n \to \infty$ (Vaart, 1998).
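The effect of truncation described above can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: the eigenvalues $\lambda_k = k^{-2}$ are hypothetical (in practice they would be estimated from $\hat{\kappa}_n$), the series is capped at 100 terms as a stand-in for the infinite sum, and the function name `truncated_critical_values` is introduced here for convenience. Because the same Gaussian draws are reused across truncation levels, $T_K$ is pathwise nondecreasing in $K$, so the simulated critical values are nondecreasing in $K$ as well.

```python
import numpy as np

def truncated_critical_values(eigenvalues, truncations, alpha=0.05,
                              n_draws=10_000, seed=0):
    """Simulate T_K = sum_{k<=K} lambda_k Z_k^2 for several truncation levels K
    using shared draws, and return the empirical (1 - alpha) quantile of each."""
    rng = np.random.default_rng(seed)
    lam = np.asarray(eigenvalues, dtype=float)
    z2 = rng.standard_normal((n_draws, lam.size)) ** 2
    # Column K - 1 of the cumulative sum holds the draws of T_K.
    partial_sums = np.cumsum(z2 * lam, axis=1)
    return {K: float(np.quantile(partial_sums[:, K - 1], 1 - alpha))
            for K in truncations}

# Hypothetical eigenvalues with polynomial decay (illustration only).
lam = 1.0 / np.arange(1, 101) ** 2
crit = truncated_critical_values(lam, truncations=[1, 5, 20, 100])

for K, c in crit.items():
    print(f"K = {K:3d}: simulated 95% critical value = {c:.3f}")
```

A critical value computed at a small fixed $K$ sits below the one obtained from the longer series, which is the source of the anti-conservativeness noted in the proof.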
We now control the effect of estimating the eigenvalues. Let $\hat{\kappa}_n$ denote an estimator of $\kappa$, and let $\hat{\lambda}_{k,n}$ denote the corresponding estimated eigenvalues. Notably, it is important to relate how well $\kappa$ is estimated to the number of terms we include in our truncation $T_{K_n}$. To that end, suppose that $\|\hat{\kappa}_n - \kappa\|_2 = o_P(K_n^{-1/2})$, where $\|\kappa\|_2^2 = \int_0^1 \int_0^1 \kappa(u, v)^2 \, du \, dv$ is the Hilbert-Schmidt norm. Then, it follows that
\[ \sum_{k=1}^{K_n} |\hat{\lambda}_{k,n} - \lambda_k| \le \sqrt{K_n} \left( \sum_{k=1}^{K_n} (\hat{\lambda}_{k,n} - \lambda_k)^2 \right)^{1/2} \le \sqrt{K_n} \, \|\hat{\kappa}_n - \kappa\|_2 = o_P(1) \tag{7} \]
where the first inequality follows by applying Cauchy-Schwarz and the second inequality follows by applying the Hoffman-Wielandt inequality for operators. Now, define $\hat{T}_{K_n} = \sum_{k=1}^{K_n} \hat{\lambda}_{k,n} Z_k^2$. Our goal here is to show that $|\hat{T}_{K_n} - T_{K_n}| = o_P(1)$. To do so, let $\mathcal{Z}_n = \{(X_i, A_i, Y_i)\}_{i=1}^{n}$ and define the event
\[ E_n = \left\{ E\big[ |\hat{T}_{K_n} - T_{K_n}| \mid \mathcal{Z}_n \big] > \delta \right\} \]
for some $\delta > 0$. Then, note that for any $\varepsilon > 0$ it follows that
\begin{align*}
P\big( |\hat{T}_{K_n} - T_{K_n}| > \varepsilon \big) &= P\big( |\hat{T}_{K_n} - T_{K_n}| > \varepsilon, E_n \big) + P\big( |\hat{T}_{K_n} - T_{K_n}| > \varepsilon, E_n^c \big) \\
&\le P\big( E\big[ |\hat{T}_{K_n} - T_{K_n}| \mid \mathcal{Z}_n \big] > \delta \big) + P\big( |\hat{T}_{K_n} - T_{K_n}| > \varepsilon, E_n^c \big) \\
&\overset{(i)}{\le} P\big( E\big[ |\hat{T}_{K_n} - T_{K_n}| \mid \mathcal{Z}_n \big] > \delta \big) + \frac{\delta}{\varepsilon}
\end{align*}
where $(i)$ follows by applying a conditional Markov inequality. Then, observe that by Equation (7),
\[ E\big[ |\hat{T}_{K_n} - T_{K_n}| \mid \mathcal{Z}_n \big] \le \sum_{k=1}^{K_n} |\hat{\lambda}_{k,n} - \lambda_k| \, E[Z_k^2] = o_P(1). \]
Thus, for each fixed $\delta$ it follows that $P\big( E[ |\hat{T}_{K_n} - T_{K_n}| \mid \mathcal{Z}_n ] > \delta \big) \to 0$ and consequently,
\[ \limsup_n P\big( |\hat{T}_{K_n} - T_{K_n}| > \varepsilon \big) \le \frac{\delta}{\varepsilon}. \]
Then, by taking $\delta \to 0$ we can see that $P\big( |\hat{T}_{K_n} - T_{K_n}| > \varepsilon \big) \to 0$. From here, define the conditional distribution function
\[ \hat{F}_n(t) = P\left( \sum_{k=1}^{K_n} \hat{\lambda}_{k,n} Z_k^2 \le t \;\Big|\; \mathcal{Z}_n \right), \]
let $\hat{c}_{n,\alpha}$ be the corresponding conditional $(1 - \alpha)$ quantile, and define
\[ p_{n,\varepsilon} = P\big( |\hat{T}_{K_n} - T_{K_n}| > \varepsilon \mid \mathcal{Z}_n \big). \]
Then, on the event $\{ |\hat{T}_{K_n} - T_{K_n}| \le \varepsilon \}$, for any $t \in \mathbb{R}$ and $\varepsilon > 0$ it follows that
\[ F_{K_n}(t - \varepsilon) - p_{n,\varepsilon} \le \hat{F}_n(t) \le F_{K_n}(t + \varepsilon) + p_{n,\varepsilon} \]
and consequently,
\[ \sup_{t \in \mathbb{R}} |\hat{F}_n(t) - F_{K_n}(t)| \le p_{n,\varepsilon} + \sup_{t \in \mathbb{R}} \big( F_{K_n}(t + \varepsilon) - F_{K_n}(t - \varepsilon) \big). \]
Thus, since we have already shown $F_{K_n}(t) \to F(t)$ for each $t$ (so that $\sup_t |F_{K_n}(t) - F(t)| \to 0$ by Pólya's theorem) and since $p_{n,\varepsilon} = o_P(1)$ (which follows by applying Markov's inequality), it follows that after taking $\varepsilon \to 0$,
\[ \sup_{t \in \mathbb{R}} |\hat{F}_n(t) - F_{K_n}(t)| = o_P(1). \]
Then, since $F_{K_n}$ is continuous and strictly increasing at $c_{K_n,\alpha}$, it again follows that $\hat{c}_{n,\alpha} - c_{K_n,\alpha} = o_P(1)$ from standard arguments for convergence of quantiles.

Finally, we consider the effect of approximating the critical value via Monte-Carlo simulation. The argument here is standard. Define the Monte-Carlo draws
\[ \hat{T}^*_{K_n,b} = \sum_{k=1}^{K_n} \hat{\lambda}_{k,n} Z_{k,b}^2 \]
for $b = 1, \ldots, B_n$ and let $\hat{c}^*_{n,\alpha}$ denote the empirical $(1 - \alpha)$ quantile computed from $\{\hat{T}^*_{K_n,b}\}_{b=1}^{B_n}$. Let the Monte-Carlo empirical distribution function be
\[ \hat{F}_{n,B_n}(t) = \frac{1}{B_n} \sum_{b=1}^{B_n} I\big( \hat{T}^*_{K_n,b} \le t \big). \]
On the event $A_n = \{ \hat{\lambda}_{1,n} > 0 \}$, the conditional distribution $\hat{F}_n$ is continuous, and thus continuous at its quantile $\hat{c}_{n,\alpha}$. Therefore, applying the Glivenko-Cantelli theorem conditionally on $(X_1, \ldots, X_n)$, it follows that as $B_n \to \infty$,
\[ \sup_{t \in \mathbb{R}} |\hat{F}_{n,B_n}(t) - \hat{F}_n(t)| \to 0 \]
and therefore, $\hat{c}^*_{n,\alpha} - \hat{c}_{n,\alpha} = o_P(1)$. Finally, since $\hat{\lambda}_{1,n} \to \lambda_1 > 0$, it follows that $P(A_n) \to 1$, so this result holds unconditionally. Putting everything together, it follows that under $H_0$,
\[ \lim_{n \to \infty} P\big( T_n > \hat{c}^*_{n,\alpha} \big) = P(T > c_\alpha) = \alpha. \]

A.4 Proof of Corollary 1

Proof: First, recall that we assume $K_n \asymp r_n^{-2/(2\beta - 1)}$.
Thus, we assume there exist constants $0 < c_1 \le c_2 < \infty$ and $n_0$ such that for all $n \ge n_0$,
\[ c_1 r_n^{-2/(2\beta - 1)} \le K_n \le c_2 r_n^{-2/(2\beta - 1)}. \]
Next, recall that we assume a polynomial eigenvalue decay. That is, there exist constants $C_\lambda > 0$ and $\beta > 1$ such that for all $k$,
\[ \lambda_k \le C_\lambda k^{-\beta}. \]
With this in mind, we proceed with the truncation bias. Observe that for any fixed $K_n$,
\[ \sum_{k > K_n} \lambda_k \le C_\lambda \sum_{k > K_n} k^{-\beta} \overset{(i)}{\le} C_\lambda \int_{K_n}^{\infty} x^{-\beta} \, dx = \frac{C_\lambda}{\beta - 1} K_n^{1 - \beta} \]
where $(i)$ follows since $f(x) = x^{-\beta}$ is a decreasing function. From here, it follows that for all $n \ge n_0$,
\[ \frac{C_\lambda}{\beta - 1} K_n^{1 - \beta} \le \frac{C_\lambda}{\beta - 1} \left( c_1 r_n^{-2/(2\beta - 1)} \right)^{1 - \beta} = \frac{C_\lambda c_1^{1 - \beta}}{\beta - 1} \, r_n^{2(\beta - 1)/(2\beta - 1)} \]
and therefore,
\[ \sum_{k > K_n} \lambda_k = O\left( r_n^{2(\beta - 1)/(2\beta - 1)} \right). \]
Next, we consider the estimation error. Recall that we assumed $\|\hat{\kappa}_n - \kappa\|_2 = O_p(r_n)$ for some $r_n \to 0$. Thus, it is clear that $\sqrt{K_n} \, \|\hat{\kappa}_n - \kappa\|_2 = O_p(\sqrt{K_n} \, r_n)$. Then, it follows that
\[ \sqrt{K_n} \, r_n \le \left( c_2 r_n^{-2/(2\beta - 1)} \right)^{1/2} r_n = \sqrt{c_2} \, r_n^{-1/(2\beta - 1)} \, r_n = \sqrt{c_2} \, r_n^{2(\beta - 1)/(2\beta - 1)} \]
and consequently,
\[ \sqrt{K_n} \, \|\hat{\kappa}_n - \kappa\|_2 = O_p\left( r_n^{2(\beta - 1)/(2\beta - 1)} \right) \]
thereby completing the proof.

Note: the choice $K_n \asymp r_n^{-2/(2\beta - 1)}$ can be seen by noting that the truncation bias scales like $K_n^{1 - \beta}$. Thus, if we set $K_n^{1 - \beta} \asymp \sqrt{K_n} \, r_n$ and solve for $K_n$, we obtain the aforementioned rate.

A.5 Proof of Proposition 1

Proof: To begin, let $T_n = nh \int_0^1 [\Delta \hat{Q}(u)]^2 \, du$ be our test statistic. Then, recall that under the conditions described in Section 3.3 and Section A.3 it follows that
\[ T_n \rightsquigarrow T = \sum_{k=1}^{\infty} \lambda_k Z_k^2 \]
where $Z_k \sim N(0, 1)$ for all $k$ and $\lambda_k$ are the eigenvalues of the covariance kernel $\kappa(u, v)$. From here, observe that
\[ E[T] = E\left[ \sum_{k=1}^{\infty} \lambda_k Z_k^2 \right] = \sum_{k=1}^{\infty} \lambda_k E[Z_k^2] = \sum_{k=1}^{\infty} \lambda_k = \int_0^1 \kappa(u, u) \, du \]
and
\[ V(T) = V\left( \sum_{k=1}^{\infty} \lambda_k Z_k^2 \right) = \sum_{k=1}^{\infty} \lambda_k^2 V(Z_k^2) = 2 \sum_{k=1}^{\infty} \lambda_k^2 = 2 \int_0^1 \int_0^1 \kappa(u, v)^2 \, du \, dv. \]
Thus, it follows that $T$ has mean $\mu := \int_0^1 \kappa(u, u) \, du$ and variance $\sigma^2 := 2 \int_0^1 \int_0^1 \kappa(u, v)^2 \, du \, dv$. Consequently, it follows that $(T - \mu)/\sigma$ has mean zero and unit variance. Furthermore, by Slutsky's theorem it follows that as $n \to \infty$,
\[ \frac{T_n - \hat{\mu}}{\hat{\sigma}} \rightsquigarrow \frac{T - \mu}{\sigma} \]
under the assumption that $\hat{\mu} \overset{p}{\to} \mu$ and $\hat{\sigma} \overset{p}{\to} \sigma$. From here, following the one-sided Chebyshev inequality (i.e., Cantelli's inequality) discussed in Luedtke et al. (2018), we note that for any mean-zero, unit-variance random variable $X$ and $t > 0$, it follows that
\[ P(X \ge t) \le \frac{1}{1 + t^2}. \]
Then, it is easy to see by the Portmanteau theorem,
\[ \limsup_{n \to \infty} P_{H_0}\big( T_n > \hat{c}^{\,ub}_{n, 1 - \alpha} \big) \le P_{H_0}\left( \frac{T - \mu}{\sigma} > \sqrt{\frac{1 - \alpha}{\alpha}} \right) \le \frac{1}{1 + \frac{1 - \alpha}{\alpha}} = \alpha. \]

A.6 Proof of Lemma 2

Proof: To begin, let $\hat{\Psi}^2_n = \int_0^1 [\Delta \hat{Q}(u)]^2 \, du$. Then, recall that we define our interval as
\[ C'_n = \left[ \hat{\Psi}^2_n \pm z_{1 - \alpha/2} \sqrt{ \hat{s}_n^2 + \frac{c^2}{nh} } \right] \]
where $\hat{s}_n$ is the estimated standard deviation of $\hat{\Psi}^2_n$, $z_{1 - \alpha/2}$ is the $1 - \alpha/2$ quantile of a standard Normal distribution, and $c$ is some constant. Then, observe that
\begin{align*}
P\big( \Psi^2 \notin C'_n \big) &= P\left( |\hat{\Psi}^2_n - \Psi^2| > z_{1 - \alpha/2} \sqrt{ \hat{s}_n^2 + \frac{c^2}{nh} } \right) \\
&\le P\left( |\hat{\Psi}^2_n - \Psi^2| > z_{1 - \alpha/2} \sqrt{ \frac{c^2}{nh} } \right) \\
&= P\left( \big( \hat{\Psi}^2_n - \Psi^2 \big)^2 > \frac{z^2_{1 - \alpha/2} \, c^2}{nh} \right).
\end{align*}
From here, we apply Markov's inequality to see that
\begin{align*}
P\left( \big( \hat{\Psi}^2_n - \Psi^2 \big)^2 > \frac{z^2_{1 - \alpha/2} \, c^2}{nh} \right) &\le \frac{nh}{z^2_{1 - \alpha/2} \, c^2} \, E\big[ ( \hat{\Psi}^2_n - \Psi^2 )^2 \big] \\
&= \frac{nh}{z^2_{1 - \alpha/2} \, c^2} \left( E\big[ \hat{\Psi}^2_n - \Psi^2 \big]^2 + V( \hat{\Psi}^2_n ) \right) \\
&= o(1)
\end{align*}
where the last equality holds under the assumption that $E[\hat{\Psi}^2_n - \Psi^2] = o((nh)^{-1/2})$ and $V(\hat{\Psi}^2_n) = o((nh)^{-1})$. Therefore, as $n \to \infty$ it follows that $P\big( \Psi^2 \notin C'_n \big) \to 0$.
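The interval $C'_n$ appearing in Lemma 2 is simple to compute once its ingredients are available. The following is a minimal sketch of the formula only: the function name `widened_interval` and all numerical inputs are hypothetical, and the estimation of $\hat{\Psi}^2_n$ and $\hat{s}_n$ is assumed to have been carried out elsewhere.

```python
import math
from statistics import NormalDist

def widened_interval(psi2_hat, s_hat, n, h, c=1.0, alpha=0.05):
    """Compute psi2_hat +/- z_{1 - alpha/2} * sqrt(s_hat^2 + c^2 / (n h)).

    psi2_hat : point estimate of the squared Wasserstein effect
    s_hat    : estimated standard deviation of psi2_hat
    n, h     : sample size and bandwidth
    c        : user-chosen constant that widens the interval
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * math.sqrt(s_hat ** 2 + c ** 2 / (n * h))
    return psi2_hat - half_width, psi2_hat + half_width

# Hypothetical inputs (illustration only, not results from the paper).
lower, upper = widened_interval(psi2_hat=0.45, s_hat=0.05, n=400, h=0.25, c=1.0)
print(f"interval: [{lower:.3f}, {upper:.3f}]")
```

Even when $\hat{s}_n$ is close to zero, the $c^2/(nh)$ term keeps the half-width of order $(nh)^{-1/2}$, which is the feature the Markov-inequality argument in the proof relies on.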
A.7 Proof of Lemma 3

Proof: The proof of Lemma 3 follows analogously to Lemma 1 of Wang and Zhang (2025) with two modifications: one, the baseline treatment at the kink is random ($T_0 = b(x_0, \eta)$) rather than a constant $t_0 = b(x_0)$; and two, the perturbation direction is $\delta(\omega(\eta)/\Delta_B)$ rather than the constant shift $\delta$. With that in mind, let $F_\delta(\cdot) = F_{Y_\delta \mid X = x_0}(\cdot)$ and $F_0(\cdot) = F_{Y_0 \mid X = x_0}(\cdot)$. Furthermore, define $h_\delta(\cdot) = (F_\delta - F_0)/\delta$. Then, under Assumption 3, we have that as $\delta \to 0$,
\[ \frac{\phi(F_\delta) - \phi(F_0)}{\delta} = \phi'_{F_0}\big( \Delta^F_{\mathrm{Id}} \big) + o(1) \]
where $\Delta^F_{\mathrm{Id}} = \lim_{\delta \to 0} \{ h_\delta \}$. From here, let $Y_\delta = g(T_0 + \delta(\omega(\eta)/\Delta_B), x_0, \varepsilon)$, $Y_0 = g(T_0, x_0, \varepsilon)$, and define
\[ Z = \frac{\omega(\eta)}{\Delta_B} \, g_1(T_0, x_0, \varepsilon). \]
Then, we may define the remainder term $R_\delta = Y_\delta - Y_0 - \delta Z$ such that $Y_\delta = Y_0 + \delta Z + R_\delta$. We can now see that
\[ h_\delta(y) = \frac{F_\delta(y) - F_0(y)}{\delta} = \frac{1}{\delta} E\big[ I(Y_0 + \delta Z + R_\delta \le y) - I(Y_0 \le y) \mid X = x_0 \big]. \]
Our first goal is to show that the remainder term $R_\delta$ drops out. To that end, we define
\[ \tilde{h}_\delta(y) = \frac{1}{\delta} E\big[ I(Y_0 + \delta Z \le y) - I(Y_0 \le y) \mid X = x_0 \big]. \]
Then, observe that
\[ \big| h_\delta(y) - \tilde{h}_\delta(y) \big| \le \frac{1}{|\delta|} E\big[ \, | I(Y_0 + \delta Z + R_\delta \le y) - I(Y_0 + \delta Z \le y) | \mid X = x_0 \big]. \]
From here, define the events $U = \{ Y_0 + \delta Z + R_\delta \le y \}$ and $V = \{ Y_0 + \delta Z \le y \}$ and note that $|I(U) - I(V)| = I(U \triangle V)$, where $U \triangle V$ denotes the symmetric difference. Our goal now is to show that the set inclusion
\[ U \triangle V \subseteq \{ |y - (Y_0 + \delta Z)| \le |R_\delta| \} \]
holds. First, consider the case where $R_\delta \ge 0$. Then, it is clear that $U \subseteq V$. Furthermore, the set difference $V \setminus U$ occurs when
\[ \{ Y_0 + \delta Z \le y \} \cap \{ Y_0 + \delta Z + R_\delta > y \} \iff \{ y - R_\delta < Y_0 + \delta Z \le y \}, \]
and so it follows that $|y - (Y_0 + \delta Z)| \le R_\delta = |R_\delta|$. Next, suppose that $R_\delta < 0$.
Now we have that $V \subseteq U$ and the set difference $U \setminus V$ occurs when
\[ \{ Y_0 + \delta Z > y \} \cap \{ Y_0 + \delta Z + R_\delta \le y \} \iff \{ y < Y_0 + \delta Z \le y - R_\delta \}. \]
Thus, $|y - (Y_0 + \delta Z)| \le -R_\delta = |R_\delta|$. Putting both cases together, it follows that
\begin{align*}
\big| h_\delta(y) - \tilde{h}_\delta(y) \big| &\le \frac{1}{|\delta|} E\big[ I\big( |y - (Y_0 + \delta Z)| \le |R_\delta| \big) \mid X = x_0 \big] \\
&\le \frac{1}{|\delta|} \Big[ P\big( |R_\delta| \ge c|\delta| \mid X = x_0 \big) + P\big( |Y_0 + \delta Z - y| \le c|\delta| \mid X = x_0 \big) \Big]
\end{align*}
where the second inequality follows after fixing some $c > 0$ and splitting on the event that $|R_\delta| \ge c|\delta|$ or $|R_\delta| < c|\delta|$. From here, it follows by Assumption 5(i) that as $\delta \to 0$,
\[ \frac{1}{|\delta|} P\big( |R_\delta| \ge c|\delta| \mid X = x_0 \big) = o(1). \]
In the case of the second term, we now leverage Assumption 5(ii) to see that
\[ P\big( |Y_0 + \delta Z - y| \le c|\delta| \mid X = x_0 \big) = \int \int_{y - \delta z - c|\delta|}^{y - \delta z + c|\delta|} f_{Y_0, Z \mid X}(a, z \mid x_0) \, da \, dz \le 2c|\delta| \int |\varpi(z)| \, dz, \]
and so, consequently, it follows that $|h_\delta(y) - \tilde{h}_\delta(y)| = o(1) + O(c)$. Then, since the choice of $c > 0$ was arbitrary, we can see that $|h_\delta(y) - \tilde{h}_\delta(y)| \to 0$.

Next, we evaluate the limit of $\tilde{h}_\delta(y)$ as $\delta \to 0$ using the joint density of $(Y_0, Z)$. Recall that
\[ \tilde{h}_\delta(y) = \frac{1}{\delta} \Big[ P(Y_0 + \delta Z \le y \mid X = x_0) - P(Y_0 \le y \mid X = x_0) \Big]. \]
Thus, using the identity $I(U \le v) - I(U \le w) = I(w < U \le v) - I(v < U \le w)$ with $U = Y_0$, $v = y - \delta Z$, and $w = y$, we obtain the more convenient expression
\[ \tilde{h}_\delta(y) = \frac{1}{\delta} \Big[ P(y < Y_0 \le y - \delta Z \mid X = x_0) - P(y - \delta Z < Y_0 \le y \mid X = x_0) \Big]. \tag{8} \]
Suppose $\delta > 0$ and let $f(a, z) = f_{Y_0, Z \mid X}(a, z \mid x_0)$. Since $y - \delta Z > y$ requires $Z < 0$ and $y - \delta Z < y$ requires $Z > 0$, we can write
\begin{align*}
P(y < Y_0 \le y - \delta Z \mid X = x_0) &= \int_{-\infty}^{0} \int_{y}^{y - \delta z} f(a, z) \, da \, dz, \\
P(y - \delta Z < Y_0 \le y \mid X = x_0) &= \int_{0}^{\infty} \int_{y - \delta z}^{y} f(a, z) \, da \, dz.
\end{align*}
Then, applying the change of variables $u = (y - a)/\delta$ (such that $a = y - \delta u$ and $da = -\delta \, du$) it follows that
\begin{align*}
\frac{1}{\delta} P(y < Y_0 \le y - \delta Z \mid X = x_0) &= \int_{-\infty}^{0} \int_{z}^{0} f(y - \delta u, z) \, du \, dz, \\
\frac{1}{\delta} P(y - \delta Z < Y_0 \le y \mid X = x_0) &= \int_{0}^{\infty} \int_{0}^{z} f(y - \delta u, z) \, du \, dz.
\end{align*}
Substituting both into Equation (8) yields
\[ \tilde{h}_\delta(y) = \int_{-\infty}^{0} \int_{z}^{0} f(y - \delta u, z) \, du \, dz - \int_{0}^{\infty} \int_{0}^{z} f(y - \delta u, z) \, du \, dz. \]
From here, continuity of $f(\cdot, z)$ in its first argument and the domination condition specified in Assumption 5(ii) imply that
\[ \lim_{\delta \to 0} \big\{ \tilde{h}_\delta(y) \big\} = \int_{-\infty}^{0} \int_{z}^{0} f(y, z) \, du \, dz - \int_{0}^{\infty} \int_{0}^{z} f(y, z) \, du \, dz = \int (-z) f(y, z) \, dz. \]
Repeating analogous calculations in the case where $\delta < 0$ yields the same limit. Therefore, the two-sided derivative exists and we can say that
\begin{align*}
\Delta^F_{\mathrm{Id}}(y) = \frac{\partial}{\partial \delta} F_\delta(y) \Big|_{\delta = 0} &= \int (-z) f_{Y_0, Z \mid X}(y, z \mid x_0) \, dz \\
&= \int (-z) f_{Y \mid X}(y \mid x_0) f_{Z \mid Y, X}(z \mid y, x_0) \, dz \\
&= -f_{Y \mid X}(y \mid x_0) \, E\left[ \frac{\omega(\eta)}{\Delta_B} \, g_1(T_0, x_0, \varepsilon) \;\Big|\; Y = y, X = x_0 \right]
\end{align*}
which is the desired equation, and thus completes the proof.

A.8 Proof of Theorem 4

Proof: First, recall by Lemma 3 that the fuzzy local treatment effect at the kink admits the representation
\[ \Delta^F_\phi = \phi'_{F_{Y \mid X = x_0}}\big( \Delta^F_{\mathrm{Id}}(\cdot) \big) \]
where
\[ \Delta^F_{\mathrm{Id}}(y) = -f_{Y \mid X}(y \mid x_0) \, E\left[ \frac{\omega(\eta)}{\Delta_B} \, g_1(T_0, x_0, \varepsilon) \;\Big|\; Y = y, X = x_0 \right] \]
such that $T_0 = b(x_0, \eta)$, $\omega(\eta) = b'(x_0^+, \eta) - b'(x_0^-, \eta)$, and $\Delta_B = \mu'_B(x_0^+) - \mu'_B(x_0^-) \neq 0$. From here, our goal is to show that for all $y$, $\Delta^F_{\mathrm{Id}}(y) = \mathrm{FDRKD}(y)$, where
\[ \mathrm{FDRKD}(y) = \frac{ \frac{\partial}{\partial x} F_{Y \mid X}(y \mid x_0^+) - \frac{\partial}{\partial x} F_{Y \mid X}(y \mid x_0^-) }{ \mu'_B(x_0^+) - \mu'_B(x_0^-) } \]
and $\mu_B(x) = E[b(x, \eta) \mid X = x]$. To that end, following Wang and Zhang (2025), define $h(x, e, u) = g(b(x, u), x, e)$ so that $Y = h(X, \varepsilon, \eta)$.
Then, by Assumption 6 it follows that we may write the conditional cumulative distribution function as
\[
F_{Y \mid X}(y \mid x) = \int\!\!\int I\left( h(x, e, u) \le y \right) f_{\varepsilon, \eta \mid X}(e, u \mid x)\, de\, du.
\]
Next, let $y$ be fixed and consider the decomposition
\[
\frac{F_{Y \mid X}(y \mid x_0 + t) - F_{Y \mid X}(y \mid x_0)}{t} = A_{1,t}(y) + A_{2,t}(y),
\]
where
\[
A_{1,t}(y) = \frac{1}{t} \int\!\!\int \Big[ I\left( h(x_0 + t, e, u) \le y \right) - I\left( h(x_0, e, u) \le y \right) \Big] f_{\varepsilon, \eta \mid X}(e, u \mid x_0)\, de\, du,
\]
\[
A_{2,t}(y) = \frac{1}{t} \int\!\!\int I\left( h(x_0 + t, e, u) \le y \right) \Big[ f_{\varepsilon, \eta \mid X}(e, u \mid x_0 + t) - f_{\varepsilon, \eta \mid X}(e, u \mid x_0) \Big]\, de\, du.
\]
Note that $A_{1,t}(y)$ is a structural term that holds $f_{\varepsilon, \eta \mid X}(\cdot \mid x)$ fixed at $x_0$, and $A_{2,t}(y)$ is a selection term that captures changes in $f_{\varepsilon, \eta \mid X}$ with $x$. We proceed with the latter term. By Assumption 6 it follows that
\[
\lim_{t \to 0} A_{2,t}(y)
= \int\!\!\int \lim_{t \to 0} \left\{ I\left( h(x_0 + t, e, u) \le y \right) \frac{f_{\varepsilon, \eta \mid X}(e, u \mid x_0 + t) - f_{\varepsilon, \eta \mid X}(e, u \mid x_0)}{t} \right\} de\, du
= \underbrace{\int\!\!\int I\left( h(x_0, e, u) \le y \right) \frac{\partial}{\partial x} f_{\varepsilon, \eta \mid X}(e, u \mid x_0)\, de\, du}_{=:\, S(y)}.
\]
Note that this limit is the same when considering both $t \uparrow 0$ and $t \downarrow 0$.

Next, we consider $A_{1,t}(y)$. Define the one-sided derivatives
\[
H^+ = \frac{\partial}{\partial x} h(x_0^+, \varepsilon, \eta) = b'(x_0^+, \eta)\, g_1(T_0, x_0, \varepsilon) + g_2(T_0, x_0, \varepsilon)
\quad\text{and}\quad
H^- = \frac{\partial}{\partial x} h(x_0^-, \varepsilon, \eta) = b'(x_0^-, \eta)\, g_1(T_0, x_0, \varepsilon) + g_2(T_0, x_0, \varepsilon).
\]
Then we can write $h(x_0 + t, e, u)$ in terms of the one-sided limits,
\[
h(x_0 + t, e, u) = Y_0 + tH^+ + R_t^+,
\]
where $R_t^+ = Y_t - Y_0 - tH^+$ and $Y_t = h(x_0 + t, e, u)$, and analogous definitions are given for $h(x_0 + t, e, u) = Y_0 + tH^- + R_t^-$. Importantly, note that the limit as $t \to 0$ of $A_{1,t}(y)$ is identical in form to the limit computations done in the proof of Lemma 3, where $tH^+$ plays the role of $\delta Z$ (similarly, Assumption 7(i) plays an analogous role to Assumption 5(i), and Assumption 7(ii) to Assumption 5(ii)).
This can easily be seen by plugging in our decompositions of $h(x_0 + t, e, u)$; observe that
\[
\begin{aligned}
A_{1,t}(y)
&= \frac{1}{t} \Big[ P\left( h(x_0 + t, \varepsilon, \eta) \le y \mid X = x_0 \right) - P\left( h(x_0, \varepsilon, \eta) \le y \mid X = x_0 \right) \Big] \\
&= \frac{1}{t} \Big[ P\left( Y_0 \le y - tH^+ - R_t^+ \mid X = x_0 \right) - P\left( Y_0 \le y \mid X = x_0 \right) \Big] \\
&= \frac{1}{t} \Big[ P\left( y < Y_0 \le y - tH^+ - R_t^+ \mid X = x_0 \right) - P\left( y - tH^+ - R_t^+ < Y_0 \le y \mid X = x_0 \right) \Big].
\end{aligned}
\]
Thus, repeating the same steps as in the proof of Lemma 3, it follows that
\[
\lim_{t \downarrow 0} \{ A_{1,t}(y) \} = -f_{Y \mid X}(y \mid x_0)\, E\left[ H^+ \mid Y = y, X = x_0 \right]
\quad\text{and}\quad
\lim_{t \uparrow 0} \{ A_{1,t}(y) \} = -f_{Y \mid X}(y \mid x_0)\, E\left[ H^- \mid Y = y, X = x_0 \right].
\]
Combining the limits for $A_{1,t}$ and $A_{2,t}$, we have the one-sided derivative formulas
\[
\frac{\partial}{\partial x} F_{Y \mid X}(y \mid x_0^+) = -f_{Y \mid X}(y \mid x_0)\, E\left[ H^+ \mid Y = y, X = x_0 \right] + S(y),
\]
\[
\frac{\partial}{\partial x} F_{Y \mid X}(y \mid x_0^-) = -f_{Y \mid X}(y \mid x_0)\, E\left[ H^- \mid Y = y, X = x_0 \right] + S(y).
\]
Thus, taking the difference, it follows that
\[
\frac{\partial}{\partial x} F_{Y \mid X}(y \mid x_0^+) - \frac{\partial}{\partial x} F_{Y \mid X}(y \mid x_0^-) = -f_{Y \mid X}(y \mid x_0)\, E\left[ H^+ - H^- \mid Y = y, X = x_0 \right],
\]
and furthermore, after plugging in the definitions of $H^+$ and $H^-$,
\[
H^+ - H^- = \left[ b'(x_0^+, \eta) - b'(x_0^-, \eta) \right] g_1(T_0, x_0, \varepsilon) = \omega(\eta)\, g_1(T_0, x_0, \varepsilon),
\]
because the $g_2(T_0, x_0, \varepsilon)$ term cancels. Hence, for every $y$,
\[
\frac{\partial}{\partial x} F_{Y \mid X}(y \mid x_0^+) - \frac{\partial}{\partial x} F_{Y \mid X}(y \mid x_0^-) = -f_{Y \mid X}(y \mid x_0)\, E\left[ \omega(\eta)\, g_1(T_0, x_0, \varepsilon) \mid Y = y, X = x_0 \right].
\]
Finally, we must consider the denominator. Here, we can again apply the same decomposition argument made before. Let
\[
\mu_B(x) = \int b(x, u)\, f_{\eta \mid X}(u \mid x)\, du.
\]
Then, fix $t > 0$ such that $x_0 + t \in I_{x_0} \setminus \{x_0\}$.
Starting from
\[
\frac{\mu_B(x_0 + t) - \mu_B(x_0)}{t} = \frac{1}{t} \int b(x_0 + t, u)\, f_{\eta \mid X}(u \mid x_0 + t)\, du - \frac{1}{t} \int b(x_0, u)\, f_{\eta \mid X}(u \mid x_0)\, du,
\]
we can add and subtract $\int b(x_0 + t, u)\, f_{\eta \mid X}(u \mid x_0)\, du$ to obtain the decomposition
\[
\frac{\mu_B(x_0 + t) - \mu_B(x_0)}{t}
= \frac{1}{t} \int \left[ b(x_0 + t, u) - b(x_0, u) \right] f_{\eta \mid X}(u \mid x_0)\, du
+ \frac{1}{t} \int b(x_0 + t, u) \left[ f_{\eta \mid X}(u \mid x_0 + t) - f_{\eta \mid X}(u \mid x_0) \right] du,
\]
whose two terms we call $S_{1,t}$ and $S_{2,t}$, respectively. Thus, following the same arguments as before, under Assumption 7(iii) and Assumption 7(iv), it can be shown that the right and left derivatives are given by
\[
\mu_B'(x_0^+) = E\left[ b'(x_0^+, \eta) \mid X = x_0 \right] + \int b(x_0, u)\, \frac{\partial}{\partial x} f_{\eta \mid X}(u \mid x_0)\, du
\]
and
\[
\mu_B'(x_0^-) = E\left[ b'(x_0^-, \eta) \mid X = x_0 \right] + \int b(x_0, u)\, \frac{\partial}{\partial x} f_{\eta \mid X}(u \mid x_0)\, du.
\]
Therefore, taking the difference yields
\[
\mu_B'(x_0^+) - \mu_B'(x_0^-) = E\left[ b'(x_0^+, \eta) - b'(x_0^-, \eta) \mid X = x_0 \right] = E\left[ \omega(\eta) \mid X = x_0 \right] = \Delta_B.
\]
Putting everything together, for all $y$,
\[
\begin{aligned}
\mathrm{FDRKD}(y)
&= \frac{\frac{\partial}{\partial x} F_{Y \mid X}(y \mid x_0^+) - \frac{\partial}{\partial x} F_{Y \mid X}(y \mid x_0^-)}{\mu_B'(x_0^+) - \mu_B'(x_0^-)} \\
&= \frac{-f_{Y \mid X}(y \mid x_0)\, E\left[ \omega(\eta)\, g_1(T_0, x_0, \varepsilon) \mid Y = y, X = x_0 \right]}{\Delta_B} \\
&= -f_{Y \mid X}(y \mid x_0)\, E\left[ \frac{\omega(\eta)}{\Delta_B}\, g_1(T_0, x_0, \varepsilon) \,\Big|\, Y = y, X = x_0 \right]
= \Delta^{F}_{\mathrm{Id}}(y).
\end{aligned}
\]
Therefore,
\[
\Delta^{F}_{\phi} = \phi'_{F_{Y \mid X = x_0}}\left( \Delta^{F}_{\mathrm{Id}}(\cdot) \right) = \phi'_{F_{Y \mid X = x_0}}\left( \mathrm{FDRKD}(\cdot) \right),
\]
which proves the theorem.

A.9 Proof of Theorem 5

Proof: To begin, let $\langle f, g \rangle = \int_0^1 f(u) g(u)\, du$ denote the inner product on $L^2(0, 1)$. Next, recall that a complete orthogonal basis $\{\phi_k\}_{k=1}^{\infty}$ in $L^2(0, 1)$ admits the generalized Fourier expansion
\[
f(u) = \sum_{k=1}^{\infty} a_k \phi_k(u)
\]
for every $f \in L^2(0, 1)$, where the coefficients are given by
\[
a_k = \frac{\langle f, \phi_k \rangle}{\|\phi_k\|_2^2}.
\]
Next, let $\{P_k^*\}_{k=0}^{\infty}$ be the shifted Legendre polynomials, such that $P_k^*(u) = P_k(2u - 1)$, where $P_k$ is the $k$th Legendre polynomial and $P_0^* = 1$.
Note that
\[
\int_0^1 P_j^*(u)\, P_k^*(u)\, du =
\begin{cases}
0, & j \neq k, \\
\dfrac{1}{2k + 1}, & j = k.
\end{cases}
\]
Now, recall that we defined
\[
\Delta Q'(u) = \frac{\frac{\partial}{\partial x} Q_{Y \mid X}(u \mid x_0^+) - \frac{\partial}{\partial x} Q_{Y \mid X}(u \mid x_0^-)}{\mu_B'(x_0^+) - \mu_B'(x_0^-)}.
\]
Then, under the assumption that $\int_0^1 [\Delta Q'(u)]^2\, du < \infty$, it follows that $\Delta Q' \in L^2(0, 1)$. Therefore, we may apply the generalized Fourier expansion with $f = \Delta Q'$ to find
\[
\langle \Delta Q', P_{k-1}^* \rangle
= \int_0^1 \Delta Q'(u)\, P_{k-1}^*(u)\, du
= \frac{1}{\Delta_B} \int_0^1 \left[ \frac{\partial}{\partial x} Q_{Y \mid X}(u \mid x_0^+) - \frac{\partial}{\partial x} Q_{Y \mid X}(u \mid x_0^-) \right] P_{k-1}^*(u)\, du
= \frac{1}{\Delta_B} \left[ \lambda_k'(x_0^+) - \lambda_k'(x_0^-) \right].
\]
Thus, since $\|P_{k-1}^*\|_2^2 = (2(k - 1) + 1)^{-1} = (2k - 1)^{-1}$, it follows that
\[
\Delta Q'(u) = \sum_{k=1}^{\infty} (2k - 1)\, \frac{\lambda_k'(x_0^+) - \lambda_k'(x_0^-)}{\Delta_B}\, P_{k-1}^*(u).
\]
Finally, applying Parseval's identity for complete orthogonal expansions (Conway, 1990) yields
\[
\|\Delta Q'\|_2^2
= \sum_{k=1}^{\infty} a_k^2\, \|P_{k-1}^*\|_2^2
= \sum_{k=1}^{\infty} \left[ (2k - 1)^2 \left( \frac{\lambda_k'(x_0^+) - \lambda_k'(x_0^-)}{\Delta_B} \right)^2 \right] \cdot \frac{1}{2k - 1}
= \sum_{k=1}^{\infty} (2k - 1) \left( \frac{\lambda_k'(x_0^+) - \lambda_k'(x_0^-)}{\mu_B'(x_0^+) - \mu_B'(x_0^-)} \right)^2,
\]
which completes the proof, since by definition $\|\Delta Q'\|_2^2 = \int_0^1 (\Delta Q'(u))^2\, du = (\Psi_C')^2$.
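The two facts this proof relies on, orthogonality of the shifted Legendre polynomials with $\|P_k^*\|_2^2 = 1/(2k+1)$ and the weighted Parseval identity $\|f\|_2^2 = \sum_{k \ge 1} (2k-1) \langle f, P_{k-1}^* \rangle^2$, can be verified exactly for polynomial $f$. The sketch below is an illustration and not the authors' code: the helper names are hypothetical, the test function $f(u) = u^2$ is an arbitrary stand-in for $\Delta Q'$, and the polynomials are generated from Bonnet's recursion $(k+1) P_{k+1}(x) = (2k+1)\, x\, P_k(x) - k\, P_{k-1}(x)$ evaluated at $x = 2u - 1$, using exact rational arithmetic.

```python
from fractions import Fraction


def poly_mul(p, q):
    """Multiply two polynomials given as ascending coefficient lists."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_add(p, q):
    """Add two polynomials of possibly different lengths."""
    n = max(len(p), len(q))
    p = p + [Fraction(0)] * (n - len(p))
    q = q + [Fraction(0)] * (n - len(q))
    return [a + b for a, b in zip(p, q)]


def poly_scale(p, c):
    """Multiply every coefficient of p by the scalar c."""
    return [c * a for a in p]


def shifted_legendre(n):
    """Return [P*_0, ..., P*_n] as coefficient lists in u, via Bonnet's recursion."""
    x = [Fraction(-1), Fraction(2)]  # the substitution x = 2u - 1
    polys = [[Fraction(1)], x[:]]
    for k in range(1, n):
        term_a = poly_scale(poly_mul(x, polys[k]), Fraction(2 * k + 1, k + 1))
        term_b = poly_scale(polys[k - 1], Fraction(-k, k + 1))
        polys.append(poly_add(term_a, term_b))
    return polys[: n + 1]


def inner(p, q):
    """Exact L2(0,1) inner product: integrate p(u) * q(u) over [0, 1]."""
    return sum(c / (i + 1) for i, c in enumerate(poly_mul(p, q)))


def parseval_sum(f, n_terms):
    """Sum over k = 1..n_terms of (2k - 1) * <f, P*_{k-1}>^2."""
    basis = shifted_legendre(n_terms - 1)
    return sum((2 * k - 1) * inner(f, basis[k - 1]) ** 2 for k in range(1, n_terms + 1))


if __name__ == "__main__":
    f = [Fraction(0), Fraction(0), Fraction(1)]  # f(u) = u^2, a stand-in for Delta Q'
    print(parseval_sum(f, 3), inner(f, f))
```

Because $u^2$ has degree two, three terms of the expansion already reproduce $\|f\|_2^2$ exactly; for a non-polynomial $\Delta Q'$ the sum would have to be truncated and the identity would hold only in the limit.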