Individualized Causal Effects under Network Interference with Combinatorial Treatments

Yunping Lu¹, Haoang Chi², Qirui Hu³, Zhiheng Zhang³

¹University of Leeds  ²National University of Defense Technology  ³School of Statistics and Data Science, SUFE. Correspondence to: Zhiheng Zhang <zhangzhiheng@mail.shufe.edu.cn>.

Preprint. February 24, 2026.

Abstract

Modern causal decision-making increasingly demands individualized treatment-effect estimation in networks where interventions are high-dimensional, combinatorial vectors. While network interference, effect heterogeneity, and multi-dimensional treatments have been studied separately, their intersection yields an exponentially large intervention space that makes standard identification tools and low-dimensional exposure mappings untenable. We bridge this gap with a unified framework that constructs a global potential-outcome emulator for unit-level inference. Our method combines (1) rooted network configurations to leverage local smoothness, (2) doubly robust orthogonalization to mitigate confounding from network position and covariates, and (3) sparse spectral learning to efficiently estimate response surfaces over the $2^p$-dimensional treatment space. We also decompose networked effects into own-treatment, structural, and interaction components, and provide finite-sample error bounds and asymptotic consistency guarantees. Overall, we show that individualized causal inference remains feasible in high-dimensional networked settings without collapsing the intervention space.

1. Introduction

Causal inference in networked systems has become a central problem across social sciences, epidemiology, and online experimentation. In many applications, units are embedded in a network, and interventions are high-dimensional treatment slates: vectors of features that can be simultaneously toggled. Examples include vaccination campaigns, peer effects in education, and product changes on digital platforms. In these settings, the classical no-interference assumption (SUTVA) is often violated: one unit's outcome may depend on others' treatment assignments, making the causal estimand inherently global in the assignment vector (Hudgens & Halloran, 2008; Aronow & Samii, 2017; Savje et al., 2021; Leung, 2022).

At the same time, policy decisions are increasingly individualized: practitioners seek unit-level or conditional effects to guide personalized targeting and adaptive experimentation (Chernozhukov et al., 2018; Künzel et al., 2019; Wager & Athey, 2018). Reconciling interference, heterogeneity, and multi-dimensional interventions is therefore crucial for credible causal decision-making in networked environments.

This paper integrates three literatures that have advanced in parallel. First, the interference literature has developed estimands and randomization-based methods for average direct and spillover effects (Hudgens & Halloran, 2008; Aronow & Samii, 2017; Savje et al., 2021). Second, the heterogeneous treatment effect (HTE) literature provides tools for learning individualized effects under no interference, ranging from machine learning to meta-learners (Chernozhukov et al., 2018; Künzel et al., 2019; Wager & Athey, 2018), and has been extended to interference settings (Ma et al., 2022; Agarwal et al., 2022).
Third, factorial and multi-treatment causal inference has clarified potential-outcome formulations for multi-factor interventions and randomization-based inference (Dasgupta & Pillai, 2015; Lopez & Gutman, 2017; Zhao & Ding, 2021; Agarwal et al., 2023).

These three strands differ in their primary bottleneck: interference requires modeling dependence between units' assignments, HTE demands controlling confounding while preserving heterogeneity, and multi-dimensional treatments face a distinct conceptual obstacle, namely an exponentially large action space. Real-world deployments often encounter all three challenges simultaneously. Crucially, this is not a simply additive problem in which interference, heterogeneity, and multi-dimensional treatments can be handled separately. Their interaction creates endogenous dependence structures and configuration-specific overlap conditions, under which standard identification and estimation arguments from each individual literature no longer apply. Consider a large online platform running a networked experiment. Each user $i \in [N]$ is assigned a $p$-dimensional binary feature vector (a "slate") $T_i \in \{-1, 1\}^p$, and the platform wishes to answer individualized questions of the form: "For this user, what is the outcome change if we toggle a particular subset of features, under the user's current local network environment?" In principle, even without interference, this requires learning a response surface over $2^p$ treatment combinations; with interference, the relevant counterfactual depends on neighbors' assignments and the local network configuration, producing a space of counterfactuals that is exponentially large in both $p$ and network size. Existing approaches fail in a common, structural way.

This paper introduces a framework that transforms this seemingly intractable problem into one amenable to high-dimensional inference. Our approach combines: (i) graph localization to represent interference environments with rooted network configurations, exploiting smoothness in configuration space; (ii) doubly robust (orthogonal) residualization to mitigate confounding from both network position and covariates; and (iii) sparse spectral learning for high-dimensional combinatorial treatments, enabling stable estimation and extrapolation. Together these components yield a global potential-outcome emulator that produces individualized causal contrasts for arbitrary assignments.

A key advantage of the framework is its principled decomposition of individualized causal effects into: (a) own-treatment effects, (b) structural effects from changes in the network configuration, and (c) joint effects that combine both. This decomposition clarifies which components of heterogeneity are identifiable from network data and aligns with practical interventions such as feature toggles and network redesigns. We show that, under local regularity and sparse spectral structure, individualized causal comparisons remain feasible even when the counterfactual space grows exponentially with both the treatment dimension and network size.

The main contributions of this paper are as follows:

- We formulate a potential-outcome framework for individualized causal effects under network interference with combinatorial treatments, with a clear decomposition into own-treatment, structural, and joint effects.
- We develop an estimator that integrates graph-configuration localization, doubly robust orthogonalization, and sparse spectral learning to recover unit-level response functions and construct a global potential-outcome emulator.

- We establish both finite-sample error bounds and asymptotic guarantees under localized dependence and overlap, and characterize robustness and partial identification when some treatment directions are not locally identifiable.

2. Problem Formalization

Let $t = (t_1, \dots, t_N)$ denote a global assignment, where each unit $i$ receives a $p$-dimensional binary treatment slate $t_i \in \{-1, 1\}^p$. For each $t$, let $Y_i(t)$ denote the potential outcome of unit $i$, and write $Y_i = Y_i(T)$ for the observed outcome under the actual assignment $T$. We observe a single assignment-outcome snapshot on a networked population,

$$\{(Y_i, T_i, X_i, \mathcal{N}_i) : i \in [N]\}, \qquad (1)$$

where $X_i$ are pre-treatment covariates and $\mathcal{N}_i$ encodes network neighborhood information. Our goal is to construct an estimator $\hat{Y}(t) = (\hat{Y}_1(t), \dots, \hat{Y}_N(t))$ that approximates the vector of potential outcomes under arbitrary global assignments. This problem is well defined but intrinsically challenging: the assignment space grows exponentially in $p$ and $N$, and interference induces dependence across units through the network.

Beyond global predictions, we focus on individualized causal inference. For each unit, we target three families of causal contrasts: own-treatment effects that vary the unit's own slate holding the local interference environment fixed, structural effects that vary the local network configuration holding the slate fixed, and joint effects that vary both. For user-specified contrasts, we provide point estimates with uncertainty quantification, while retaining the ability to evaluate $\hat{Y}(t)$ for arbitrary $t$.

Following Auerbach & Tabord-Meehan (2021), we summarize the interference environment of unit $i$ by a rooted network configuration $G_i(t)$, constructed from an ego-centered neighborhood of $i$ (e.g., within a fixed graph radius) together with vertex marks that include treatment slates (and, if desired, additional marks such as discretized covariates). We write $G_i := G_i(T)$ for the realized random configuration and $g_i$ for its realization. Let $\mathcal{G}$ denote the space of rooted, marked configurations (up to root-preserving isomorphism).

We equip $\mathcal{G}$ with a truncated rooted-graph distance (as in the local approach of Auerbach & Tabord-Meehan (2021)). Let $B_r(g)$ denote the radius-$r$ rooted ball around the root of $g$ (the induced subgraph with marks carried along). When comparing $g$ and $g'$, we first check whether $B_r(g)$ and $B_r(g')$ are root-isomorphic as unmarked graphs; among root-preserving isomorphisms, we then measure mark mismatches. Concretely, let $\tau_g(v) \in \{-1, 1\}^p$ denote the treatment-slate mark at vertex $v$ in configuration $g$, and define

$$\Delta_r(g, g') := \begin{cases} \min_{\phi : B_r(g) \simeq B_r(g')} \frac{1}{|V(B_r(g))|} \sum_{v \in V(B_r(g))} \mathbb{1}\{\tau_g(v) \neq \tau_{g'}(\phi(v))\}, & \text{if } B_r(g), B_r(g') \text{ are root-isomorphic}, \\ 1, & \text{otherwise}. \end{cases}$$

We then truncate at a small radius $R \in \mathbb{Z}_+$ and set

$$d_R(g, g') := \sum_{r=0}^{R} 2^{-(r+1)} \Delta_r(g, g'). \qquad (2)$$

The metric $d_R$ induces a notion of "local similarity" between interference environments and enables nonparametric localization over $\mathcal{G}$.
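As a concrete companion to this definition, the following minimal Python sketch computes $\Delta_r$ and $d_R$ for radius-1 (star) configurations. It follows the convention of the paper's worked example below: root slates are compared at radius 0, and matched neighbor slates at radius 1. The function names and the neighbor-only normalization at radius 1 are our illustrative choices, not part of the paper's formal definition.

```python
import itertools

def delta_r1(neigh_g, neigh_gp):
    """Radius-1 mark discrepancy between two star configurations.

    neigh_g, neigh_gp: lists of neighbor treatment slates (tuples in {-1,+1}^p).
    Returns 1.0 if the unmarked balls are not root-isomorphic (different
    degree); otherwise the minimal fraction of mismatched neighbor slates
    over all root-preserving matchings, as in the worked example below.
    """
    if len(neigh_g) != len(neigh_gp):
        return 1.0  # not root-isomorphic as unmarked graphs
    best = min(
        sum(a != b for a, b in zip(neigh_g, perm))
        for perm in itertools.permutations(neigh_gp)
    )
    return best / max(len(neigh_g), 1)

def d_R(root_g, neigh_g, root_gp, neigh_gp, R=1):
    """Truncated rooted-graph distance d_R with geometric weights 2^{-(r+1)}."""
    delta0 = float(root_g != root_gp)            # radius 0: compare root slates
    deltas = [delta0, delta_r1(neigh_g, neigh_gp)][: R + 1]
    return sum(2 ** -(r + 1) * d for r, d in enumerate(deltas))

# Identical roots; neighbors {(+1,-1), (-1,+1)} vs {(+1,+1), (-1,+1)}.
g  = ((+1, +1), [(+1, -1), (-1, +1)])
gp = ((+1, +1), [(+1, +1), (-1, +1)])
print(d_R(g[0], g[1], gp[0], gp[1]))  # 2^{-1}*0 + 2^{-2}*(1/2) = 0.125
```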
Illustrative example. To illustrate the rooted-graph distance and the role of the matching $\phi$, consider a social network where each unit receives a binary treatment slate and outcomes may depend on neighbors' assignments. Let $R = 1$, so that each configuration consists of the root and its immediate neighbors. Suppose the root has exactly two neighbors, and consider two configurations $g$ and $g'$ whose unmarked ego networks are identical: in both, the root is connected to two neighbors. Hence $B_1(g)$ and $B_1(g')$ are root-isomorphic as unmarked graphs. A root-preserving isomorphism $\phi$ is then a bijection between the neighbor sets of $g$ and $g'$ that fixes the root. When multiple such bijections exist, $\Delta_1(g, g')$ uses the one that minimizes treatment-slate mismatches. In $g$, the two neighbors receive slates $(+1, -1)$ and $(-1, +1)$; in $g'$, they receive $(+1, +1)$ and $(-1, +1)$. Under the optimal matching, one neighbor is paired with an identical slate, while the other is paired with a different slate, so there is exactly one mismatch and $\Delta_1(g, g') = 1/2$. If instead $g'$ had a different local structure (e.g., a different number of neighbors), then $B_1(g)$ and $B_1(g')$ would not be root-isomorphic, no root-preserving matching would exist, and by definition $\Delta_1(g, g') = 1$. The overall distance $d_R(g, g')$ then aggregates these discrepancies across radii with geometrically decaying weights, placing more emphasis on mismatches closer to the root, consistent with the idea that nearby interference environments matter most for the root's outcome.

Also, to represent response surfaces over $\{-1, 1\}^p$, we use Walsh characters. For each subset $S \subseteq [p]$, define $Z_S(t) = \prod_{\ell \in S} t_\ell$, and let $Z(t)$ collect $\{Z_S(t) : S \subseteq [p]\}$. This provides a convenient orthonormal dictionary on the hypercube and supports sparse/near-sparse modeling of high-order interactions.

Given a target unit $i$, we localize around its realized configuration $g_i$ by assigning weights to sample units $j$:

$$w^{(i)}_j := \frac{K_G\big(d_R(g_j, g_i)/b_G\big)}{\sum_{k=1}^N K_G\big(d_R(g_k, g_i)/b_G\big)}, \qquad \sum_{j=1}^N w^{(i)}_j = 1, \qquad (3)$$

where $K_G$ is a compactly supported kernel (e.g., indicator or Epanechnikov) and $b_G > 0$ is a bandwidth (or, equivalently, one may use a $k$NN radius). We quantify the effective local sample size via the Kish measure $n^{(i)}_{\mathrm{eff}} := 1 / \sum_{j=1}^N (w^{(i)}_j)^2$, which controls the variance of localized averages under homoskedastic noise and will serve as the fundamental local sample-size parameter in our analysis.
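A minimal sketch of these two ingredients, the Walsh dictionary and the kernel localization weights with their Kish effective sample size, is given below. The full dictionary is enumerated only for illustration at small $p$; the estimators of Section 4 rely on sparsity rather than explicit enumeration. Function names are ours.

```python
import numpy as np
from itertools import combinations

def walsh_features(t):
    """Full Walsh dictionary Z(t) = (prod_{l in S} t_l)_{S subseteq [p]}.

    t: array in {-1,+1}^p. Returns a vector of length 2^p, with the empty
    set mapped to the constant 1. Feasible only for small p.
    """
    p = len(t)
    feats = []
    for k in range(p + 1):
        for S in combinations(range(p), k):
            feats.append(np.prod([t[l] for l in S]) if S else 1.0)
    return np.array(feats)

def localization_weights(dists, b_G):
    """Normalized Epanechnikov kernel weights over rooted-graph distances, eq. (3)."""
    u = np.asarray(dists) / b_G
    k = np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)
    return k / k.sum()

def kish_neff(w):
    """Kish effective local sample size n_eff = 1 / sum_j w_j^2."""
    return 1.0 / np.sum(np.asarray(w) ** 2)

w = localization_weights([0.05, 0.10, 0.40, 0.90], b_G=0.5)
print(w, kish_neff(w))  # heavier weight on closer configurations
```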
The following assumptions make the target problem well-defined and render it statistically tractable; each is standard in nearby literatures and will be revisited in the identification and theory sections.

Assumption 2.1 (Configuration sufficiency / local interference). There exists a (possibly radius-truncated) configuration mapping $G_i(t)$ such that, for all $i$ and all global assignments $t$, the potential outcome $Y_i(t)$ depends on $t$ only through the unit's own slate $t_i$ and its rooted configuration $G_i(t)$ (and covariates $X_i$), so that $Y_i(t) = Y_i(t_i, G_i(t))$.

Assumption 2.1 reduces the global interference problem to a localized object in $\mathcal{G}$, enabling borrowing of information across units with similar environments (Auerbach & Tabord-Meehan, 2021). Sensitivity to the truncation radius $R$ (and to alternative configuration constructions) can be assessed empirically by checking stability of fitted effects as $R$ varies.

Relaxations. If interference decays with graph distance, increasing $R$ yields a controlled bias-variance trade-off; our methods naturally accommodate such sensitivity analyses.

Assumption 2.2 (Local regularity in configuration space). For each fixed $(t, x)$, the function $g \mapsto E[Y(t, G) \mid G = g, X = x]$ is locally Lipschitz with respect to $d_R$. Specifically, for every $g_0 \in \mathcal{G}$ there exists $L(g_0, x, t) > 0$ such that

$$\big| E[Y(t, G) \mid G = g, X = x] - E[Y(t, G) \mid G = g_0, X = x] \big| \le L(g_0, x, t)\, d_R(g, g_0)$$

for all $g$ in a neighborhood of $g_0$.

Assumption 2.2 justifies kernel or $k$NN localization and ensures that localization bias vanishes as neighborhoods shrink. One may diagnose violations by examining residual stability as a function of $d_R(g_j, g_i)$ and by cross-validating localization bandwidths. Piecewise regularity can be handled by using covers or adaptive bandwidths; when smoothness fails in specific regions, our framework permits reporting localized uncertainty inflation or reverting to coarser, partially identified summaries.

Assumption 2.3 (Local overlap / design richness). Fix a target configuration-covariate pair $(g_0, x_0)$. There exist a neighborhood $\mathcal{N}(g_0, x_0)$ and a constant $\kappa(g_0, x_0) > 0$ such that, for all $(g, x) \in \mathcal{N}(g_0, x_0)$, the conditional covariance matrix of the orthogonalized treatment features satisfies

$$\lambda_{\min}\Big( E\big[\tilde{Z}(T)\tilde{Z}(T)^\top \mid G = g, X = x\big] \Big) \ge \kappa(g_0, x_0),$$

where $\tilde{Z}(T) = Z(T) - E[Z(T) \mid G, X]$ denotes the residualized treatment-feature vector.

Assumption 2.3 prevents degeneracy (e.g., near-deterministic treatment assignment within a configuration), which would otherwise preclude individualized identification. Overlap can be assessed locally via effective sample sizes, propensity diagnostics, and the conditioning of weighted design matrices. When local overlap fails in some directions, we characterize robustness and partial identification behavior (e.g., reporting bounds or restricting to identifiable contrasts).

Assumption 2.4 (Sparse/near-sparse structure over combinatorial treatments). For each configuration-covariate pair $(g, x)$, let $f(t; g, x) := E[Y(t, G) \mid G = g, X = x]$ denote the conditional mean response as a function of the treatment slate. There exists a coefficient vector $\alpha(g, x)$ in the Walsh-Hadamard basis such that $f(t; g, x)$ admits the expansion $f(t; g, x) = \langle \alpha(g, x), Z(t) \rangle$, and $\alpha(g, x)$ is approximately sparse in the sense that

$$\sum_{k > s(g,x)} \big| \alpha_{(k)}(g, x) \big| \le C(g, x)\, s(g, x)^{-\gamma}, \qquad \gamma > 0,$$

where $\alpha_{(k)}(g, x)$ denotes the $k$-th largest coefficient in magnitude.

Assumption 2.4 converts an exponentially large treatment space into a high-dimensional but tractable estimation problem via sparse learning. Sparsity and compressibility can be probed by model-selection stability, interaction-order diagnostics, and out-of-sample validation. Approximate sparsity yields graceful degradation (slower rates and wider intervals) rather than failure; one may also impose hierarchical truncations when warranted by domain knowledge.

Assumptions 2.1-2.4 jointly formalize the sense in which individualized causal inference under interference with combinatorial treatments is well-posed yet nontrivial.
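The local overlap condition of Assumption 2.3 lends itself to a simple empirical diagnostic: compute the smallest eigenvalue of the weighted second-moment matrix of residualized treatment features in a localization neighborhood. The sketch below is our illustration of such a check, with toy inputs standing in for actual residualized features.

```python
import numpy as np

def local_overlap_diagnostic(Z_resid, w):
    """Empirical check of Assumption 2.3: smallest eigenvalue of the
    weighted second-moment matrix of residualized treatment features.

    Z_resid: (N, d) matrix of residualized Walsh features;
    w: localization weights summing to one. A value near zero flags
    treatment directions that are nearly deterministic locally.
    """
    Sigma = (Z_resid * w[:, None]).T @ Z_resid
    return np.linalg.eigvalsh(Sigma)[0]

rng = np.random.default_rng(0)
Zr = rng.choice([-1.0, 1.0], size=(200, 8))  # toy stand-in for residualized features
w = np.full(200, 1 / 200)
print(local_overlap_diagnostic(Zr, w))       # well away from zero under rich designs
```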
3. Identification

Throughout this section we first formalize the causal objects of interest and then state an identification theorem that connects these objects to observable functionals of the data-generating distribution.

For a generic unit with covariates $X = x$ and configuration $G = g$, recall the definition of the own-slate conditional mean response in Assumption 2.4. To emphasize the dependence of the local configuration on the global assignment, we may write $G_i(t)$ as $G_i^{\langle t \rangle}$. Given $(g, x)$, we target three families of individualized contrasts:

$$\theta^T_i(t \to t'; g) := f(t'; g, x) - f(t; g, x), \qquad (4)$$
$$\theta^G_i(g \to g'; t) := f(t; g', x) - f(t; g, x), \qquad (5)$$
$$\theta^{G,T}_i\big((g, t) \to (g', t')\big) := f(t'; g', x) - f(t; g, x). \qquad (6)$$

These contrasts correspond to (i) toggling the unit's own treatment slate holding the interference environment fixed, (ii) changing the interference environment holding the slate fixed, and (iii) changing both simultaneously.

3.1. Identification via local orthogonal moments

A central difficulty is that even under randomized assignments, conditioning or localizing on the realized configuration can induce dependence between a unit's own slate and its interference environment. This creates a form of endogeneity that invalidates naive regression of $Y$ on $Z(T)$ within localized neighborhoods. We therefore identify the Walsh coefficients through an orthogonalized (residualized) moment equation. Define the nuisance functions

$$\mu(g, x) := E[Y \mid G = g, X = x], \qquad m(g, x) := E[Z(T) \mid G = g, X = x],$$

and the residuals $\tilde{Y} := Y - \mu(G, X)$ and $\tilde{Z} := Z(T) - m(G, X)$.

Fix a target configuration $g \in \mathcal{G}$ and let $w_g(G)$ denote a nonnegative localization weight. This weight is the population analogue of the kernel or $k$NN weights defined in Section 2 and is designed to upweight observations whose realized configurations are close to $g$. Concretely, one may take $w_g(G) \propto K_G(d_R(G, g)/b_G)$, with normalization ensuring unit expectation. (Here $g \in \mathcal{G}$ denotes a fixed target configuration at which inference is performed, while $G$ is a $\mathcal{G}$-valued random variable representing the realized configuration of a randomly sampled unit; the weight $w_g(G)$ therefore assigns larger mass to realizations of $G$ that are closer to the target $g$ under the rooted-graph distance $d_R$.) All identification results below are stated for a generic choice of such localization weights. For $\alpha \in \mathbb{R}^{2^p}$, define the score

$$\Psi(\alpha; g) := E\Big[ w_g(G)\, \tilde{Z}\big( \tilde{Y} - \tilde{Z}^\top \alpha \big) \Big].$$

This is the population analogue of a localized regression of $\tilde{Y}$ on $\tilde{Z}$. Local overlap (Assumption 2.3) guarantees that the relevant directions of $\tilde{Z}$ exhibit non-degenerate variation locally, ensuring well-posedness.

Theorem 3.1 (Identification of localized Walsh coefficients). Assume Assumptions 2.1-2.4, and fix a target pair $(g, x)$ in the interior of the support. Then there exists a (possibly localized) coefficient vector $\alpha^\star(g, x)$ such that

$$\Psi\big(\alpha^\star(g, x); g\big) = 0. \qquad (7)$$

Moreover, under the local overlap condition in Assumption 2.3, $\alpha^\star(g, x)$ is unique within the model class implied by Assumption 2.4 (e.g., the sparse/near-sparse cone), and it identifies the response function in Assumption 2.4 in the sense that $f(t; g, x) = \langle \alpha^\star(g, x), Z(t) \rangle$ for all treatment slates $t \in \{-1, 1\}^p$. Consequently, for any unit $i$ and any global assignment $t$,

$$E[Y_i(t) \mid X_i = x_i] = \big\langle \alpha^\star\big(G_i^{\langle t \rangle}, x_i\big),\ Z(t_i) \big\rangle,$$

and the individualized contrasts in (4)-(6) are identified as unique functions of $\alpha^\star(\cdot, \cdot)$.
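Before discussing Theorem 3.1 further, the following self-contained toy simulation illustrates why residualization matters. It is not the paper's estimator: the configuration is replaced by a scalar stand-in, the nuisances are simple polynomial fits, and the numbers are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 5000, 3
G = rng.normal(size=N)                                   # scalar stand-in for configuration position
Z = np.sign(rng.normal(size=(N, d)) + 0.8 * G[:, None])  # slate features correlated with G
alpha = np.array([1.0, 0.5, 0.0])
Y = Z @ alpha + 2.0 * G + rng.normal(size=N)             # outcome confounded through G

def cond_mean(target, G):
    """Crude nuisance fit: regress target on (1, G, G^2)."""
    B = np.column_stack([np.ones_like(G), G, G ** 2])
    return B @ np.linalg.lstsq(B, target, rcond=None)[0]

eY = Y - cond_mean(Y, G)
eZ = Z - np.column_stack([cond_mean(Z[:, k], G) for k in range(d)])

naive = np.linalg.lstsq(Z, Y, rcond=None)[0]
orth = np.linalg.lstsq(eZ, eY, rcond=None)[0]
print(naive)  # biased: the regression absorbs the G-channel
print(orth)   # close to alpha = (1.0, 0.5, 0.0)
```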
Theorem 3.1 formalizes a practical message: once the interference environment is summarized by a rooted configuration, one can treat causal learning locally in configuration space, provided there is sufficient treatment variation locally. For example, in a platform experiment where a user's outcome depends on her own feature slate and the slates of nearby friends, even globally randomized assignment can become locally confounded: restricting attention to users with similar realized neighborhoods (e.g., two treated friends and one control friend) can induce dependence between the user's own slate and the neighborhood pattern. The orthogonal moment addresses this by residualizing both outcomes and treatment features on $(G, X)$.

Importantly, the identification result does not assert point identification of every direction in the combinatorial treatment space. If local overlap fails in certain Walsh directions, say, some treatment components are nearly deterministic within a configuration neighborhood, then those directions are not point-identified. This reflects an inherent data limitation: without local variation, individualized contrasts along those directions cannot be learned, and the solution to the moment condition (7) is non-unique. Developing robustness and partial-identification results is a natural direction for future work.

A natural question is: "If assignments are randomized, why do we need orthogonalization at all?" The key is that our target is local in configuration space, and conditioning or localizing on realized configurations is a form of post-assignment selection. Even under random assignment, selection on neighborhood patterns generally induces correlation between a unit's slate and its neighbors' slates. The orthogonal moment is constructed to be insensitive to this induced dependence and to enable principled inference within localized neighborhoods. At the population level, the moment condition in (7) is exactly unbiased; approximation errors arise only at the estimation stage through localization, nuisance estimation, and finite-sample effects.

Corollary 3.2 (Identification of individualized contrasts). Under the conditions of Theorem 3.1, the three contrast families $\theta^T_i$, $\theta^G_i$, and $\theta^{G,T}_i$ are identified for any user-specified $(t, t', g, g')$ for which the corresponding directions satisfy local overlap.

4. Estimation

This section gives sample analogues of the orthogonal moment equation in Section 3 and yields estimators of (i) localized Walsh coefficients and (ii) individualized causal contrasts. In addition to Assumptions 2.1-2.4, we impose the following high-dimensional estimation conditions.

Assumption 4.1 (Cross-fitted nuisance accuracy). Let $\mu(g, x) = E[Y \mid G = g, X = x]$ and $m(g, x) = E[Z(T) \mid G = g, X = x]$. There exist cross-fitted estimators $\hat{\mu}$ and $\hat{m}$ (defined below) such that

$$\max_{i \in [N]} \big| \hat{\mu}(G_i, X_i) - \mu(G_i, X_i) \big| = o_p(1), \qquad \max_{i \in [N]} \big\| \hat{m}(G_i, X_i) - m(G_i, X_i) \big\|_\infty = o_p(1).$$
Algorithm 1 Oracle (population) identification of individualized contrasts at $(g, x)$
1: Input: target configuration $g$, covariates $x$, localization weights $w_g(\cdot)$, Walsh dictionary $Z(\cdot)$.
2: Compute the nuisances $\mu(g, x) = E[Y \mid G = g, X = x]$ and $m(g, x) = E[Z(T) \mid G = g, X = x]$.
3: Form the residuals $\tilde{Y} = Y - \mu(G, X)$ and $\tilde{Z} = Z(T) - m(G, X)$.
4: Solve for $\alpha^\star(g, x)$ such that $\Psi(\alpha^\star(g, x); g) = 0$.
5: Define the response function $\hat{f}(t; g, x) = \langle \alpha^\star(g, x), Z(t) \rangle$ for any slate $t$.
6: Output: for any user-specified $(t, t', g, g')$, compute the contrasts $\theta^T(t \to t'; g)$, $\theta^G(g \to g'; t)$, and $\theta^{G,T}((g, t) \to (g', t'))$ by plugging $\hat{f}$ into (4)-(6).

Assumption 4.2 (Weighted restricted eigenvalue and noise tails). Fix a configuration center $g \in \mathcal{G}$ and let $w_j(g)$ be the kernel weights defined in (3) with $g_i$ replaced by $g$. Let $\hat{\tilde{Z}}_j$ denote the cross-fitted residualized treatment-feature vector defined in (8), and define the weighted Gram matrix $\hat{\Sigma}(g) := \sum_{j=1}^N w_j(g)\, \hat{\tilde{Z}}_j \hat{\tilde{Z}}_j^\top$. There exists $\kappa_g > 0$ such that $\hat{\Sigma}(g)$ satisfies a restricted-eigenvalue condition over the sparse cone associated with Assumption 2.4, with probability tending to one. Moreover, the (cross-fitted) regression noise is conditionally sub-Gaussian given $(G, X, T)$ with proxy variance $\sigma^2$.

Assumption 4.3 (Local sparsity/near-sparsity). For each $(g, x)$, the Walsh coefficient vector $\alpha(g, x)$ in Assumption 2.4 is sparse or approximately sparse with effective sparsity level $s(g, x)$ satisfying $s(g, x)\log(2^p) = o\big(n_{\mathrm{eff}}(g)\big)$, where $n_{\mathrm{eff}}(g) := \big( \sum_{j=1}^N w_j(g)^2 \big)^{-1}$ is the Kish effective sample size associated with the weights $w_j(g)$.

Assumptions 4.1-4.3 are standard high-dimensional estimation conditions that complement, rather than strengthen, the identification assumptions in Section 3. Assumption 4.1 ensures that nuisance estimation errors are second-order and do not affect the orthogonal moment at first order, while Assumptions 4.2 and 4.3 guarantee that the resulting localized high-dimensional regression problem is well-posed at the effective-sample-size scale $n_{\mathrm{eff}}(g)$. These conditions are sufficient for consistent estimation and valid inference, and are satisfied by a broad class of modern machine-learning nuisance estimators and weighted sparse regressions.

We estimate the nuisance functions $\mu$ and $m$ and construct residuals using cross-fitting. Partition the index set $[N]$ into $K_{\mathrm{cf}} \ge 2$ disjoint folds $\{\mathcal{I}_k\}_{k=1}^{K_{\mathrm{cf}}}$. For each fold $k$, fit nuisance estimators $\hat{\mu}^{(-k)}(\cdot, \cdot)$ and $\hat{m}^{(-k)}(\cdot, \cdot)$ using only observations with indices in $[N] \setminus \mathcal{I}_k$. For each $i \in \mathcal{I}_k$, define the cross-fitted residuals

$$\hat{\tilde{Y}}_i := Y_i - \hat{\mu}^{(-k)}(G_i, X_i), \qquad \hat{\tilde{Z}}_i := Z(T_i) - \hat{m}^{(-k)}(G_i, X_i). \qquad (8)$$

We use $\{(\hat{\tilde{Y}}_i, \hat{\tilde{Z}}_i)\}_{i=1}^N$ as the debiased inputs for localized high-dimensional regression.
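A minimal sketch of the cross-fitting construction in (8), assuming numeric summaries of $(G_j, X_j)$ are available as nuisance inputs and using random forests purely as one admissible nuisance learner (feasible only for moderate feature dimension $d$; a real implementation would exploit sparsity):

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.ensemble import RandomForestRegressor

def crossfit_residuals(Y, Zfeat, GX, n_folds=2, seed=0):
    """Cross-fitted residuals, eq. (8): for each fold, nuisances are fit
    on the complement and evaluated on the held-out indices.

    Y: (N,) outcomes; Zfeat: (N, d) Walsh features Z(T_j);
    GX: (N, q) numeric summaries of (G_j, X_j) used as nuisance inputs.
    Returns (eY, eZ) with eY_j = Y_j - mu_hat^{(-k)}, eZ_j = Z_j - m_hat^{(-k)}.
    """
    N, d = Zfeat.shape
    eY, eZ = np.empty(N), np.empty((N, d))
    for train, test in KFold(n_folds, shuffle=True, random_state=seed).split(GX):
        mu = RandomForestRegressor(random_state=seed).fit(GX[train], Y[train])
        eY[test] = Y[test] - mu.predict(GX[test])
        for k in range(d):
            m_k = RandomForestRegressor(random_state=seed).fit(GX[train], Zfeat[train, k])
            # truncate m_hat to [-1, 1], matching the convention used in the proofs
            eZ[test, k] = Zfeat[test, k] - np.clip(m_k.predict(GX[test]), -1, 1)
    return eY, eZ
```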
4.1. Localized weighted Lasso for Walsh coefficients

For a target unit $i$, we estimate the coefficient vector at the unit's realized configuration by setting $g = g_i$ and writing $w^{(i)}_j := w_j(g_i)$ (which coincides with (3)). Define the weighted Lasso estimator

$$\hat{\alpha}_i \in \operatorname*{argmin}_{\beta \in \mathbb{R}^{2^p}} \sum_{j=1}^N w^{(i)}_j \big( \hat{\tilde{Y}}_j - \hat{\tilde{Z}}_j^\top \beta \big)^2 + \lambda_i \|\beta\|_1, \qquad (9)$$

where $\lambda_i \asymp \hat{\sigma}\sqrt{\log(2^p)/n_{\mathrm{eff}}(g_i)}$ and $\hat{\sigma}$ is any consistent estimator of the noise scale (e.g., from localized residuals). For any slate $t \in \{-1, 1\}^p$, define the plug-in response estimate $\hat{f}_i(t; g_i, x_i) := \langle \hat{\alpha}_i, Z(t) \rangle$. Then for any $(t, t')$,

$$\hat{\theta}^T_i(t \to t'; g_i) = \langle \hat{\alpha}_i,\ Z(t') - Z(t) \rangle. \qquad (10)$$

To estimate structural and joint effects, repeat (9) with weights centered at a second configuration $g'$ (i.e., replace $w^{(i)}_j$ by $w_j(g')$) to obtain an estimator $\hat{\alpha}_i(g')$, and set

$$\hat{\theta}^G_i(g_i \to g'; t) = \big\langle \hat{\alpha}_i(g') - \hat{\alpha}_i,\ Z(t) \big\rangle, \qquad \hat{\theta}^{G,T}_i\big((g_i, t) \to (g', t')\big) = \langle \hat{\alpha}_i(g'), Z(t') \rangle - \langle \hat{\alpha}_i, Z(t) \rangle. \qquad (11)$$

4.2. Debiased inference for an own-treatment contrast

For a user-specified contrast $(t, t')$, define the direction $v_{t,t'} := Z(t') - Z(t) \in \mathbb{R}^{2^p}$. Let $\hat{\Sigma}(g_i)$ be the weighted Gram matrix at $g = g_i$. Compute an approximate inverse direction $\hat{\gamma}_i$ by any standard high-dimensional procedure (e.g., CLIME or nodewise regression), for instance via

$$\hat{\gamma}_i \in \arg\min_{\gamma \in \mathbb{R}^{2^p}} \|\gamma\|_1 \quad \text{s.t.} \quad \big\| \hat{\Sigma}(g_i)\gamma - v_{t,t'} \big\|_\infty \le \eta_i, \qquad \eta_i \asymp \sqrt{\frac{\log(2^p)}{n_{\mathrm{eff}}(g_i)}}. \qquad (12)$$

Define the debiased estimator

$$\tilde{\theta}^T_i(t \to t'; g_i) = v_{t,t'}^\top \hat{\alpha}_i + \hat{\gamma}_i^\top \sum_{j=1}^N w^{(i)}_j\, \hat{\tilde{Z}}_j \big( \hat{\tilde{Y}}_j - \hat{\tilde{Z}}_j^\top \hat{\alpha}_i \big). \qquad (13)$$

The following section shows that under Assumptions 4.1-4.3, $\tilde{\theta}^T_i$ admits asymptotically normal inference, with an estimated variance obtained from the weighted empirical second moment of the influence term in (13).
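Before turning to theory, a compact sketch of (9) and (13). The weighted Lasso is reduced to a plain Lasso by square-root weight rescaling, and the CLIME program (12) is replaced by a ridge-regularized inverse, a common computational surrogate rather than the paper's exact procedure; constants in the penalty differ from the theoretical $\lambda_i$ by sklearn's internal scaling.

```python
import numpy as np
from sklearn.linear_model import Lasso

def localized_dr_lasso(eY, eZ, w, v, lam=None, ridge=1e-2):
    """Weighted Lasso (9) plus the debiased contrast (13), as a sketch.

    eY, eZ: cross-fitted residuals; w: localization weights (summing to one);
    v: contrast direction v_{t,t'} = Z(t') - Z(t).
    """
    n_eff = 1.0 / np.sum(w ** 2)
    d = eZ.shape[1]
    lam = lam if lam is not None else np.sqrt(np.log(2 * d) / n_eff)
    # sqrt-weight rescaling turns the weighted objective into a plain Lasso
    sw = np.sqrt(w)
    fit = Lasso(alpha=lam, fit_intercept=False).fit(eZ * sw[:, None], eY * sw)
    alpha_hat = fit.coef_
    # ridge-regularized surrogate for the inverse-direction program (12)
    Sigma = (eZ * w[:, None]).T @ eZ
    gamma = np.linalg.solve(Sigma + ridge * np.eye(d), v)
    # one-step debiasing correction, eq. (13)
    score = w * (eY - eZ @ alpha_hat)
    theta_db = v @ alpha_hat + gamma @ (eZ.T @ score)
    return alpha_hat, theta_db, n_eff, gamma
```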
5. Theoretical Analysis

This section develops finite-sample and asymptotic guarantees for the localized estimators in Section 4. The key point is that the statistical difficulty is controlled by the effective local sample size and local spectral sparsity, not by the exponential number of treatment slates.

Write $d := 2^p$ for the ambient Walsh dimension. Fix a target unit $i$, recall the kernel weights $w^{(i)}_j$ in (3), and recall the effective local sample size $n_i := n_{\mathrm{eff}}(g_i) = \big( \sum_{j=1}^N (w^{(i)}_j)^2 \big)^{-1}$. Let $s_i := s(g_i, x_i)$ denote the effective sparsity level from Assumption 4.3. For nuisance estimation, define the sup errors

$$\delta_\mu := \max_{j \in [N]} \big| \hat{\mu}(G_j, X_j) - \mu(G_j, X_j) \big|, \qquad \delta_m := \max_{j \in [N]} \big\| \hat{m}(G_j, X_j) - m(G_j, X_j) \big\|_\infty.$$

Finally, to separate sampling error from localization bias, define

$$\mathrm{bias}_i(b_G) := \sup_{t \in \{-1,1\}^p}\ \sup_{g\,:\, d_R(g, g_i) \le b_G} \big| f(t; g, x_i) - f(t; g_i, x_i) \big|.$$

(Equivalently, covariates can be viewed as additional root marks in the configuration, so localization in $g$ implicitly restricts attention to samples with comparable covariate values.) By Assumption 2.2, for each fixed $(g_i, x_i)$ there exists a finite constant $C_i < \infty$ (depending on $(g_i, x_i)$ but not on $b_G$) such that $\mathrm{bias}_i(b_G) \le C_i b_G$.

Step 1: Orthogonalization removes first-order nuisance effects. We first record the key property that enables valid high-dimensional learning with flexible nuisances: cross-fitting makes nuisance errors enter only at second order.

Lemma 5.1 (Orthogonalization remainder). Let $\hat{\tilde{Y}}_j$ and $\hat{\tilde{Z}}_j$ be the cross-fitted residuals in (8), and define the corresponding oracle residuals $\tilde{Y}_j := Y_j - \mu(G_j, X_j)$ and $\tilde{Z}_j := Z(T_j) - m(G_j, X_j)$. Then, uniformly over $\alpha$ in any fixed $\ell_1$-ball, the difference between the empirical weighted score built from $(\hat{\tilde{Y}}_j, \hat{\tilde{Z}}_j)$ and the one built from $(\tilde{Y}_j, \tilde{Z}_j)$ is of order $O_p(\delta_\mu \delta_m)$. In particular, if $\delta_\mu \delta_m = o_p(1)$, nuisance estimation does not contribute a first-order term.

Lemma 5.1 formalizes why we can combine modern machine-learning nuisances with localized high-dimensional regression: even though $\hat{\mu}$ and $\hat{m}$ may be complex, their errors do not bias the target moment at first order. A common misconception is that "any smoothing or selection invalidates orthogonality"; here the point is precisely that orthogonalization targets the post-localization endogeneity induced by conditioning on realized configurations.

Step 2: Finite-sample rates for localized Walsh learning. We now state a nonasymptotic oracle inequality for the localized weighted Lasso (9). The rate is driven by $(s_i, \log d, n_i)$ plus a transparent localization bias term.

Theorem 5.2 (Finite-sample error for the localized weighted Lasso). Fix a target unit $i$ and let $\hat{\alpha}_i$ be defined by (9) with $\lambda_i \asymp \hat{\sigma}\sqrt{\log(d)/n_i}$. Assume Assumptions 4.2-4.3 and $\delta_\mu \delta_m = o_p(\lambda_i)$. Then, with probability tending to one,

$$\|\hat{\alpha}_i - \alpha(g_i, x_i)\|_1 \lesssim s_i \sqrt{\frac{\log d}{n_i}} + s_i\, \mathrm{bias}_i(b_G) + s_i\, \delta_\mu \delta_m,$$

$$\|\hat{\alpha}_i - \alpha(g_i, x_i)\|_2 \lesssim \sqrt{\frac{s_i \log d}{n_i}} + \mathrm{bias}_i(b_G) + \delta_\mu \delta_m,$$

$$\sum_{j=1}^N w^{(i)}_j \Big( \hat{\tilde{Z}}_j^\top \big( \hat{\alpha}_i - \alpha(g_i, x_i) \big) \Big)^2 \lesssim \frac{s_i \log d}{n_i} + \mathrm{bias}_i(b_G)^2 + (\delta_\mu \delta_m)^2.$$

What this theorem says (and what it does not). Theorem 5.2 is a local result: it quantifies how well we can learn a unit's Walsh coefficients near its realized configuration. It does not claim uniform recovery over all configurations without additional covering arguments. The theorem highlights the intended tradeoff: shrinking $b_G$ reduces $\mathrm{bias}_i(b_G)$ but decreases $n_i$, while increasing $b_G$ increases the effective sample size but introduces localization bias.

Step 3: Consequences for individualized causal contrasts. We next translate coefficient error into error for individualized effects.

Corollary 5.3 (Plug-in error for individualized contrasts). Fix a unit $i$ and consider any pair of slates $(t, t')$. Let $v_{t,t'} := Z(t') - Z(t)$ as in Section 4. Then

$$\big| \hat{\theta}^T_i(t \to t'; g_i) - \theta^T_i(t \to t'; g_i) \big| \le \|v_{t,t'}\|_\infty \|\hat{\alpha}_i - \alpha(g_i, x_i)\|_1 \le 2\, \|\hat{\alpha}_i - \alpha(g_i, x_i)\|_1,$$

and analogously for $\hat{\theta}^G_i$ and $\hat{\theta}^{G,T}_i$ in (11) (with an additional error term contributed by the second fit at $g'$). Consequently, Theorem 5.2 yields

$$\big| \hat{\theta}^T_i(t \to t'; g_i) - \theta^T_i(t \to t'; g_i) \big| \lesssim s_i \sqrt{\frac{\log d}{n_i}} + s_i\, \mathrm{bias}_i(b_G) + s_i\, \delta_\mu \delta_m.$$

Remark 5.4. A common objection is that "the contrast is dense, so shouldn't the error be exponential?" It is tempting (but misleading) to bound the contrast error by $\|v_{t,t'}\|_2 \|\hat{\alpha}_i - \alpha\|_2$, which would scale as $\sqrt{d}$ and obscure the point of the spectral approach. Corollary 5.3 uses the correct geometry: since Walsh characters are uniformly bounded, $\|v_{t,t'}\|_\infty \le 2$, and the relevant control is via the $\ell_1$ error of a sparse vector. This is the mechanism by which the exponential slate space is converted into a high-dimensional but tractable problem, with complexity entering only through $\log d = \log(2^p) = \Theta(p)$.
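A three-line numerical illustration of the geometry in Remark 5.4, with arbitrary illustrative numbers: the $\ell_\infty$-$\ell_1$ Hölder bound stays tight for a sparse error vector, while the $\ell_2$-$\ell_2$ bound inflates by roughly $\sqrt{d}$.

```python
import numpy as np

rng = np.random.default_rng(2)
p = 12
d = 2 ** p
delta = np.zeros(d)
delta[rng.choice(d, 5, replace=False)] = 0.01 * rng.normal(size=5)  # sparse coefficient error
v = rng.choice([-2.0, 0.0, 2.0], size=d)  # entries of Z(t') - Z(t) lie in {-2, 0, 2}

exact = abs(v @ delta)
l1_bound = np.max(np.abs(v)) * np.sum(np.abs(delta))  # <= 2 * ||delta||_1
l2_bound = np.linalg.norm(v) * np.linalg.norm(delta)  # scales like sqrt(d)
print(exact, l1_bound, l2_bound)  # the l2 bound is looser by about sqrt(d)
```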
Step 4: Debiased inference for a user-specified contrast. We now state an asymptotic normality result for the debiased estimator (13). This provides valid uncertainty quantification for individualized contrasts.

Theorem 5.5 (Asymptotic normality of the debiased contrast). Fix a unit $i$ and a contrast $(t, t')$ with direction $v_{t,t'}$. Let $\tilde{\theta}^T_i$ be defined in (13) and let $\gamma^\star_i$ denote the population solution of the local linear system $\Sigma(g_i)\gamma = v_{t,t'}$, where $\Sigma(g_i) := E[w_{g_i}(G)\, \tilde{Z}\tilde{Z}^\top]$ is the population analogue of $\hat{\Sigma}(g_i)$. Assume: (i) $\gamma^\star_i$ exists and is sparse or approximately sparse; (ii) the estimator $\hat{\gamma}_i$ in (12) satisfies $\|\hat{\gamma}_i - \gamma^\star_i\|_1 = o_p(1)$; (iii) $(s_i + \|\gamma^\star_i\|_0)\log d = o(\sqrt{n_i})$ and $\mathrm{bias}_i(b_G) = o(n_i^{-1/2})$; (iv) a central limit theorem holds for the weighted influence sum induced by the localization weights, together with $\delta_\mu \delta_m = o_p(n_i^{-1/2})$ as in Theorem 5.2. Then

$$\sqrt{n_i}\,\Big( \tilde{\theta}^T_i(t \to t'; g_i) - \theta^T_i(t \to t'; g_i) \Big) \Rightarrow \mathcal{N}\big(0,\ \sigma^2_{\theta,i}\big),$$

where $\sigma^2_{\theta,i} = \mathrm{Var}\big( \gamma^{\star\top}_i \tilde{Z}\varepsilon \big)$ and $\varepsilon := \tilde{Y} - \tilde{Z}^\top \alpha(g_i, x_i)$. A consistent variance estimator is

$$\hat{\sigma}^2_{\theta,i} := n_i \sum_{j=1}^N (w^{(i)}_j)^2 \big( \hat{\gamma}_i^\top \hat{\tilde{Z}}_j \hat{\varepsilon}_j \big)^2, \qquad \hat{\varepsilon}_j := \hat{\tilde{Y}}_j - \hat{\tilde{Z}}_j^\top \hat{\alpha}_i.$$

Algorithm 2 Localized DR-Lasso and debiased inference for unit $i$
1: Input: data $\{(Y_j, T_j, X_j, g_j)\}_{j=1}^N$, target unit $i$, kernel $K_G$, bandwidth $b_G$, folds $\{\mathcal{I}_k\}$, contrast $(t, t')$.
2: Compute the weights $w^{(i)}_j$ via (3) and $n_i = \big( \sum_j (w^{(i)}_j)^2 \big)^{-1}$.
3: Cross-fit the nuisances $\hat{\mu}, \hat{m}$ on folds and form the residuals $(\hat{\tilde{Y}}_j, \hat{\tilde{Z}}_j)$ via (8).
4: Solve the weighted Lasso (9) to obtain $\hat{\alpha}_i$ and the plug-in contrast $\hat{\theta}^T_i$ via (10).
5: Compute $\hat{\Sigma}(g_i) = \sum_j w^{(i)}_j \hat{\tilde{Z}}_j \hat{\tilde{Z}}_j^\top$ and solve (12) to obtain $\hat{\gamma}_i$.
6: Output: the debiased estimator $\tilde{\theta}^T_i$ via (13) and a confidence interval using $\hat{\sigma}^2_{\theta,i}$ from Theorem 5.5.

Theorem 5.5 is an individualized inference result: it yields asymptotically valid uncertainty for a user-chosen contrast at a specific unit and configuration neighborhood. It is not a claim of uniform inference over all units and all slates without further structure. The key requirement is that the effective local sample size $n_i$ grows fast enough relative to the local complexity ($s_i$ and the sparsity of $\gamma^\star_i$), so that the debiasing remainder is asymptotically negligible.

Together, Theorems 5.2 and 5.5 show that individualized causal learning under interference can be statistically feasible even with exponentially many slates: the price of combinatorial treatments enters through $\log(2^p) = \Theta(p)$ and local sparsity, while interference is handled through localization and effective sample size. In addition, we defer the discussion of model robustness under local overlap failure (Assumption 2.3), which induces a partial identification bound, to the Appendix.
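As a companion to step 6 of Algorithm 2, the following sketch turns the variance formula of Theorem 5.5 into a confidence interval, using the arrays produced by the earlier sketches; it is our illustrative assembly, not the paper's released code.

```python
import numpy as np
from scipy.stats import norm

def debiased_ci(theta_db, eY, eZ, w, alpha_hat, gamma, level=0.95):
    """Plug-in variance and confidence interval from Theorem 5.5:
    sigma2 = n_eff * sum_j w_j^2 (gamma' eZ_j eps_j)^2, with a normal CI
    of half-width z * sqrt(sigma2 / n_eff).
    """
    n_eff = 1.0 / np.sum(w ** 2)
    eps = eY - eZ @ alpha_hat
    infl = (eZ @ gamma) * eps
    sigma2 = n_eff * np.sum((w * infl) ** 2)
    half = norm.ppf(0.5 + level / 2) * np.sqrt(sigma2 / n_eff)
    return theta_db - half, theta_db + half
```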
6. Experiments

We comprehensively validate the proposed method through synthetic experiments; the settings are described in Appendix A.

Figure 1 shows the comparison of point estimates and 95% confidence intervals for individualized causal contrasts across the three estimators at sample size $N = 500$; the true parameter value is 0. Our method achieves a point estimate of 0.031 with a confidence-interval width of 0.233, substantially outperforming the baseline (point estimate 0.124, width 0.486) and approaching the oracle (point estimate -0.007, width 0.087). This result demonstrates that the proposed localized representation, doubly robust orthogonalization, and sparse spectral learning effectively reduce bias and significantly shrink uncertainty in finite samples, providing direct empirical support for the finite-sample error bounds in Theorem 5.2 and the asymptotic normality in Theorem 5.5.

[Figure 1. Comparison of point estimates and 95% confidence intervals in synthetic experiments ($N = 500$, 100 independent repetitions). The proposed method (width 0.233, bias 0.031) substantially outperforms the baseline (width 0.486, bias 0.124) and approaches the oracle (width 0.087, bias -0.007), providing empirical evidence for Theorems 5.2 and 5.5.]

6.1. Evolution of the proposed estimator

Data are generated according to the model specified in Section 2. The underlying network is an Erdős-Rényi random graph with average degree $d = 8$, inducing heterogeneous interference patterns. Node-level covariates are drawn as $X_i \sim \mathcal{N}(0, I_p)$ with $p = 10$, and combinatorial treatments are assigned uniformly at random, $t_i \in \{-1, 1\}^p$, yielding a $2^{10}$-dimensional Walsh basis expansion. Outcomes follow Assumption 2.4 with $s_i = 3$ active Walsh coefficients and Gaussian noise $\varepsilon_i \sim \mathcal{N}(0, 0.25)$. The performance of the debiased localized Lasso estimator (Algorithm 2) is evaluated across sample sizes $N \in \{50, 100, 200, 500, 1000\}$, with $M = 100$ Monte Carlo repetitions per configuration.
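A minimal sketch of a data-generating process in this spirit is given below. The interference channel (mean neighbor slate) and the coefficient forms are our illustrative simplifications of the rooted-configuration mechanism, not the exact generator used in the paper's experiments.

```python
import numpy as np
import networkx as nx

def simulate(N=500, p=10, avg_deg=8, s=3, noise_sd=0.5, seed=0):
    """Synthetic design in the spirit of Section 6.1: Erdos-Renyi graph,
    uniform slates, sparse first-order Walsh outcomes with neighbor
    interference entering through the mean neighbor slate."""
    rng = np.random.default_rng(seed)
    G = nx.erdos_renyi_graph(N, avg_deg / (N - 1), seed=seed)
    T = rng.choice([-1, 1], size=(N, p))
    X = rng.normal(size=(N, p))
    support = rng.choice(p, size=s, replace=False)   # s active Walsh directions
    coefs = rng.normal(size=s)
    nbr_mean = np.array([T[list(G[i]) or [i]].mean(axis=0) for i in range(N)])
    Y = T[:, support] @ coefs + 0.5 * nbr_mean[:, support] @ coefs \
        + rng.normal(scale=noise_sd, size=N)         # noise variance 0.25
    return G, T, X, Y
```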
Figure 2 summarizes the finite-sample properties of the proposed estimator.

[Figure 2. Convergence of the localized DR-Lasso estimator. Boxplots display the sampling distribution across 100 simulations for each sample size. The horizontal axis indicates the total sample size $N$; the vertical axis measures relative estimation error (normalized to zero). Top annotations: sample mean ± standard deviation; center: sample median (red line); bottom: empirical 95% confidence-interval width ($W = Q_{97.5} - Q_{2.5}$).]

Three patterns emerge that align precisely with the theoretical predictions of Theorems 5.2 and 5.5.

Bias. The median estimates exhibit minimal deviation from the true parameter across all sample sizes, ranging from $-0.012$ ($N = 50$) to $0.002$ ($N = 1000$). This near-zero median bias confirms the asymptotic unbiasedness established in Theorem 5.5, demonstrating that the debiasing correction successfully eliminates the shrinkage bias inherent in naive Lasso estimators.

Convergence rate. The empirical 95% confidence-interval width contracts monotonically from 0.529 at $N = 50$ to 0.119 at $N = 1000$, approximately following the $\sqrt{N}$ rate predicted by Theorem 5.2. Specifically, doubling the sample size roughly halves the standard deviation (e.g., 0.143 at $N = 50$ vs. 0.031 at $N = 1000$), confirming $\sqrt{N}$-consistency.

Coverage. The empirical coverage of the constructed confidence intervals remains within $[0.92, 0.96]$ across all settings, closely tracking the nominal 95% level. This validates the asymptotic normality approximation in Theorem 5.5 even for moderate sample sizes.

Implications. The simulation results corroborate the efficiency bounds derived in Section 5: the proposed estimator achieves oracle-like performance (comparable to the infeasible estimator that observes true interference patterns) while remaining computable via standard convex optimization.

7. Conclusion

In this work, we presented a unified framework for individualized causal inference in networked systems characterized by high-dimensional, combinatorial treatments. By integrating rooted network configurations with doubly robust orthogonalization and sparse spectral learning, our approach constructs a global potential-outcome emulator that remains statistically tractable even as the intervention space grows to $2^p$ dimensions. We provided a principled decomposition of networked causal effects into own-treatment, structural, and joint components, supported by rigorous finite-sample error bounds and asymptotic normality guarantees. Our empirical validation demonstrates that this localized representation effectively mitigates confounding from network positions and reduces estimation bias, approaching oracle-level performance in finite samples. Ultimately, these results establish that individualized causal decision-making is feasible in complex networked environments without the need to collapse or simplify the intervention space.

Impact Statement

This paper presents work whose goal is to advance the field of machine learning. There are many potential societal consequences of our work, none of which we feel must be specifically highlighted here.

References

Agarwal, A., Cen, S. H., Shah, D., and Yu, C. L. Network synthetic interventions: A causal framework for panel data under network interference. arXiv preprint arXiv:2210.11355, 2022.

Agarwal, A., Agarwal, A., and Vijaykumar, S. Synthetic combinations: A causal inference framework for combinatorial interventions. Advances in Neural Information Processing Systems, 36:19195-19216, 2023.

Aronow, P. M. and Samii, C. Estimating average causal effects under interference between units. Annals of Applied Statistics, 11(4):1912-1947, 2017.

Auerbach, E. and Tabord-Meehan, M. The local approach to causal inference under network interference. arXiv preprint arXiv:2105.03810, 2021.

Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J. Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21(1):C1-C68, 2018.

Dasgupta, T. and Pillai, N. Causal inference from $2^k$ factorial designs using potential outcomes. Journal of the Royal Statistical Society: Series B, 77(4):727-753, 2015.

Hudgens, M. G. and Halloran, M. E. Toward causal inference with interference. Journal of the American Statistical Association, 103(482):832-842, 2008.

Künzel, S. R., Sekhon, J. S., Bickel, P. J., and Yu, B. Metalearners for estimating heterogeneous treatment effects using machine learning. Proceedings of the National Academy of Sciences, 116(10):4156-4165, 2019.
Leung, M. P. Approximate randomized experiments in networks. Review of Economics and Statistics, 104(2):318-334, 2022.

Lopez, M. J. and Gutman, R. Estimation of causal effects with multiple treatments: A review and new ideas. Statistical Science, 32(3):432-454, 2017.

Ma, J., Wan, M., Yang, L., Li, J., Hecht, B., and Teevan, J. Learning causal effects on hypergraphs. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 1202-1212, 2022.

Savje, F., Aronow, P. M., and Hudgens, M. G. Average treatment effects in the presence of unknown interference. Annals of Statistics, 49(2):673-701, 2021.

Wager, S. and Athey, S. Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523):1228-1242, 2018.

Zhao, Q. and Ding, P. Bayesian inference for factorial experiments with interference. Biometrika, 108(4):1067-1081, 2021.

A. Simulation setup

The synthetic data are generated strictly according to the model described in Section 2: the network is an Erdős-Rényi random graph (with variable number of nodes $N$ and average degree $d$ controlling interference strength, typically $d = 5$-$10$), covariates $X_i \sim \mathcal{N}(0, I)$ for each node, combinatorial treatments $t_i \in \{-1, 1\}^p$ ($p = 10$, yielding a treatment space of size $2^{10} = 1024$), and potential outcomes expanded in the Walsh-Hadamard basis with coefficients $\alpha_i$ satisfying sparsity (support size $s = 20$) and the local smoothness assumption (bandwidth $b_G = 2$ on rooted network configurations). The interference structure includes own-treatment effects and first-order neighbor interference, with the true causal contrast parameter set to 0. Each experimental setting is independently repeated 100 times to obtain robust statistics (median point estimates, 95% confidence-interval widths, etc.). We compare three estimators. Oracle: an oracle estimator that assumes knowledge of the true potential-outcome model and all nuisance functions, serving only as a theoretical optimum benchmark. Proposed (ours): our method, integrating localized rooted network representation, doubly robust orthogonalization, and sparse spectral learning. Baseline: a strong baseline using a graph-agnostic doubly robust learner (graph-agnostic DR-learner) combined with simple network averaging, representing typical existing techniques for handling network interference or high-dimensional treatments.

B. Proofs for Section 5

B.1. Notation and weighted algebra

Fix a target unit $i$ throughout this appendix and abbreviate the kernel weights $w_j := w^{(i)}_j$. Recall $d := 2^p$ and the effective local sample size

$$n_i := n_{\mathrm{eff}}(g_i) = \Big( \sum_{j=1}^N w_j^2 \Big)^{-1}.$$

For scalar sequences $(a_j)_{j=1}^N$, define the weighted inner product and norm

$$\langle a, b \rangle_w := \sum_{j=1}^N w_j a_j b_j, \qquad \|a\|^2_{w,2} := \sum_{j=1}^N w_j a_j^2.$$

For vector covariates $u_j \in \mathbb{R}^d$, define the weighted Gram matrix

$$\hat{\Sigma}_i(u) := \sum_{j=1}^N w_j u_j u_j^\top.$$

In particular, $\hat{\Sigma}(g_i)$ in Assumption 4.2 equals $\hat{\Sigma}_i(\hat{\tilde{Z}})$ with $u_j = \hat{\tilde{Z}}_j$. For $S \subseteq [d]$ and $c_0 > 0$, define the sparse cone

$$\mathcal{C}(S, c_0) := \big\{ \Delta \in \mathbb{R}^d : \|\Delta_{S^c}\|_1 \le c_0 \|\Delta_S\|_1 \big\}.$$
We interpret Assumption 4.2 as: for the relevant support set $S$, there exists $\kappa_{g_i} > 0$ such that, with probability tending to one,

$$\Delta^\top \hat{\Sigma}_i(\hat{\tilde{Z}})\, \Delta \ge \kappa_{g_i} \|\Delta\|_2^2 \quad \text{for all } \Delta \in \mathcal{C}(S, 3). \qquad (14)$$

B.2. Two elementary lemmas

We start with two basic facts used repeatedly below.

Lemma B.1 (Convex-hull bound for $m(g, x)$). For any $(g, x)$, the conditional mean $m(g, x) = E[Z(T) \mid G = g, X = x]$ lies in the convex hull of $\{Z(t) : t \in \{-1, 1\}^p\} \subseteq \{-1, 1\}^d$. Consequently, for any $a \in \mathbb{R}^d$,

$$\big| a^\top m(g, x) \big| \le \sup_{t \in \{-1,1\}^p} \big| a^\top Z(t) \big|.$$

Proof. By definition, $m(g, x) = \sum_{t \in \{-1,1\}^p} \mathbb{P}(T = t \mid G = g, X = x)\, Z(t)$, a convex combination of the vertices $\{Z(t)\}$. For any fixed $a \in \mathbb{R}^d$, the map $z \mapsto a^\top z$ is linear, hence $a^\top m(g, x)$ lies between the minimum and maximum of $a^\top Z(t)$ over $t$. Applying the same argument to $-a$ yields the stated absolute-value bound.

Lemma B.2 (Weighted sub-Gaussian maximal inequality). Let $(\xi_j)_{j=1}^N$ be conditionally independent given a sigma-field $\mathcal{F}$, with $E[\xi_j \mid \mathcal{F}] = 0$ and conditionally sub-Gaussian tails $E[\exp(\lambda \xi_j) \mid \mathcal{F}] \le \exp(\lambda^2 \sigma^2 / 2)$ for all $\lambda \in \mathbb{R}$. Let $(w_j)_{j=1}^N$ be nonnegative $\mathcal{F}$-measurable weights with $\sum_j w_j = 1$. If $(a_{j,k})_{j,k}$ are $\mathcal{F}$-measurable and satisfy $|a_{j,k}| \le B$ for all $j, k$, then for $S_k := \sum_{j=1}^N w_j a_{j,k} \xi_j$,

$$\mathbb{P}\Big( \max_{1 \le k \le d} |S_k| \ge t \ \Big|\ \mathcal{F} \Big) \le 2d \exp\Big( -\frac{n_i t^2}{2\sigma^2 B^2} \Big), \qquad n_i = \Big( \sum_j w_j^2 \Big)^{-1}.$$

In particular, taking $t = C\sigma B\sqrt{\log d / n_i}$ yields $\max_k |S_k| = O_p\big( \sqrt{\log d / n_i} \big)$.

Proof. Condition on $\mathcal{F}$. Each $S_k$ is a weighted sum of conditionally independent sub-Gaussian variables $w_j a_{j,k} \xi_j$ with proxy variance $\sigma^2 \sum_j w_j^2 a_{j,k}^2 \le \sigma^2 B^2 \sum_j w_j^2 = \sigma^2 B^2 / n_i$. Hence $S_k$ is conditionally sub-Gaussian with that proxy variance, so $\mathbb{P}(|S_k| \ge t \mid \mathcal{F}) \le 2\exp(-n_i t^2 / (2\sigma^2 B^2))$. A union bound over $k \in [d]$ gives the displayed inequality.
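A quick Monte Carlo sanity check of Lemma B.2, with illustrative dimensions of our choosing: the empirical maximum of the weighted sums stays below the sub-Gaussian envelope $\sigma B \sqrt{2\log(2d)/n_i}$ (the threshold at which the stated tail bound equals one).

```python
import numpy as np

rng = np.random.default_rng(3)
N, d, reps = 2000, 256, 200
w = rng.dirichlet(np.ones(N))             # nonnegative weights summing to one
n_eff = 1.0 / np.sum(w ** 2)
a = rng.choice([-2.0, 2.0], size=(N, d))  # bounded coefficients, B = 2

# S_k = sum_j w_j a_{j,k} xi_j with standard normal xi (sigma = 1)
maxima = [np.max(np.abs(w @ (a * rng.normal(size=(N, 1))))) for _ in range(reps)]
print(np.mean(maxima), 2 * np.sqrt(2 * np.log(2 * d) / n_eff))
```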
B.3. Proof of Lemma 5.1 (orthogonalization remainder)

We first prove a more explicit inequality; Lemma 5.1 follows as a corollary.

Lemma B.3 (Empirical score perturbation bound). Fix a fold partition and a target unit $i$. For each $j$, let

$$\Delta_{\mu j} := \hat{\mu}(G_j, X_j) - \mu(G_j, X_j), \qquad \Delta_{m j} := \hat{m}(G_j, X_j) - m(G_j, X_j),$$

so that $\hat{\tilde{Y}}_j = \tilde{Y}_j - \Delta_{\mu j}$ and $\hat{\tilde{Z}}_j = \tilde{Z}_j - \Delta_{m j}$. For any $\alpha \in \mathbb{R}^d$, define the empirical weighted scores

$$\hat{\Psi}_i(\alpha) := \sum_{j=1}^N w_j\, \hat{\tilde{Z}}_j\big( \hat{\tilde{Y}}_j - \hat{\tilde{Z}}_j^\top \alpha \big), \qquad \Psi^\circ_i(\alpha) := \sum_{j=1}^N w_j\, \tilde{Z}_j\big( \tilde{Y}_j - \tilde{Z}_j^\top \alpha \big).$$

Then, for any $\alpha$ with $\|\alpha\|_1 \le M$, $\|\hat{\Psi}_i(\alpha) - \Psi^\circ_i(\alpha)\|_\infty \le (I) + (II) + (III) + (IV) + (V)$, where

$$(I) := \Big\| \sum_j w_j \tilde{Z}_j \Delta_{\mu j} \Big\|_\infty, \quad (II) := \Big\| \sum_j w_j \tilde{Z}_j (\Delta_{m j}^\top \alpha) \Big\|_\infty, \quad (III) := \Big\| \sum_j w_j \Delta_{m j}\big( \tilde{Y}_j - \tilde{Z}_j^\top \alpha \big) \Big\|_\infty,$$

$$(IV) := \Big\| \sum_j w_j \Delta_{m j} \Delta_{\mu j} \Big\|_\infty, \quad (V) := \Big\| \sum_j w_j \Delta_{m j} (\Delta_{m j}^\top \alpha) \Big\|_\infty.$$

Moreover, $(IV) \le \delta_\mu \delta_m$ and $(V) \le M\delta_m^2$. If, in addition, $\|\tilde{Z}_j\|_\infty \le 2$ almost surely and $\tilde{Y}_j - \tilde{Z}_j^\top \alpha$ is conditionally sub-Gaussian given $(G_j, X_j, T_j)$ with proxy $\sigma^2$, then

$$(I) + (II) + (III) = O_p\Big( (\delta_\mu + \delta_m)\sqrt{\frac{\log d}{n_i}} \Big).$$

Proof. The decomposition is a direct expansion:

$$\hat{\tilde{Z}}_j\big( \hat{\tilde{Y}}_j - \hat{\tilde{Z}}_j^\top \alpha \big) - \tilde{Z}_j\big( \tilde{Y}_j - \tilde{Z}_j^\top \alpha \big) = -(\tilde{Z}_j - \Delta_{m j})\Delta_{\mu j} + \tilde{Z}_j(\Delta_{m j}^\top \alpha) - \Delta_{m j}\big( \tilde{Y}_j - \tilde{Z}_j^\top \alpha \big) + \Delta_{m j}\Delta_{\mu j} - \Delta_{m j}(\Delta_{m j}^\top \alpha).$$

Summing with weights and taking $\ell_\infty$ norms yields the first claim. For $(IV)$, each coordinate satisfies $|\sum_j w_j \Delta_{m j,k}\Delta_{\mu j}| \le \sum_j w_j \delta_m \delta_\mu = \delta_\mu \delta_m$. For $(V)$, $|\Delta_{m j,k}(\Delta_{m j}^\top \alpha)| \le \|\Delta_{m j}\|_\infty^2 \|\alpha\|_1 \le \delta_m^2 M$, hence $(V) \le M\delta_m^2$. For the stochastic bound on $(I)$-$(III)$, note that conditional on the training folds, $\Delta_{\mu j}$ and $\Delta_{m j}$ are fixed (cross-fitting), and each summand is a weighted sum of mean-zero sub-Gaussian terms with coefficients bounded by 2. Lemma B.2 (with $B = 2$) and a union bound over $d$ coordinates yield $(I) + (II) + (III) = O_p\big( (\delta_\mu + \delta_m)\sqrt{\log d / n_i} \big)$.

Proof of Lemma 5.1. Lemma B.3 implies, uniformly over $\|\alpha\|_1 \le M$,

$$\|\hat{\Psi}_i(\alpha) - \Psi^\circ_i(\alpha)\|_\infty = O_p(\delta_\mu \delta_m) + O_p\Big( (\delta_\mu + \delta_m)\sqrt{\frac{\log d}{n_i}} \Big) + O_p(\delta_m^2).$$

Under Assumption 4.1, $\delta_\mu = o_p(1)$ and $\delta_m = o_p(1)$. If moreover $n_i \to \infty$ and $\log d = o(n_i)$ (as required by Assumption 4.3), then the latter two terms are $o_p(1)$. The leading deterministic second-order term is $\delta_\mu \delta_m$, which is the sense in which nuisance estimation enters only at second order.

B.4. Proof of Theorem 5.2 (finite-sample weighted Lasso bounds)

We work conditionally on the realized weights $w_j$ and the fold assignment. Let $\alpha^\star_i := \alpha(g_i, x_i)$ be the target coefficient vector. Define the fitted residuals $\hat{\varepsilon}_j(\alpha) := \hat{\tilde{Y}}_j - \hat{\tilde{Z}}_j^\top \alpha$ and $\hat{\varepsilon}_j := \hat{\varepsilon}_j(\hat{\alpha}_i)$.

Step 0: localization bias as a deterministic perturbation. For $j$ with $w_j > 0$, the kernel construction implies $d_R(g_j, g_i) \le b_G$. Define the pointwise response difference $\Delta f_{ij}(t) := f(t; g_j, x_i) - f(t; g_i, x_i)$. Then, by definition of $\mathrm{bias}_i(b_G)$, $\sup_t |\Delta f_{ij}(t)| \le \mathrm{bias}_i(b_G)$ for all such $j$. Using Lemma B.1 and the identity $\mu(g, x) = \alpha(g, x)^\top m(g, x)$, one verifies that for any $j$ with $w_j > 0$,

$$\big| \big( \alpha(g_j, x_i) - \alpha(g_i, x_i) \big)^\top \tilde{Z}_j \big| \le 2\, \mathrm{bias}_i(b_G). \qquad (15)$$

Indeed,

$$\big( \alpha(g_j, x_i) - \alpha(g_i, x_i) \big)^\top \tilde{Z}_j = \big( \alpha(g_j, x_i) - \alpha(g_i, x_i) \big)^\top Z(T_j) - \big( \alpha(g_j, x_i) - \alpha(g_i, x_i) \big)^\top m(g_j, x_i),$$

and both terms are bounded by $\sup_t |(\alpha(g_j, x_i) - \alpha(g_i, x_i))^\top Z(t)| = \sup_t |\Delta f_{ij}(t)| \le \mathrm{bias}_i(b_G)$, where the second term uses Lemma B.1. Hence (15) holds.

Step 1: a weighted basic inequality. By optimality of $\hat{\alpha}_i$ in (9), for any $\beta \in \mathbb{R}^d$,

$$\sum_{j=1}^N w_j \big( \hat{\tilde{Y}}_j - \hat{\tilde{Z}}_j^\top \hat{\alpha}_i \big)^2 + \lambda_i \|\hat{\alpha}_i\|_1 \le \sum_{j=1}^N w_j \big( \hat{\tilde{Y}}_j - \hat{\tilde{Z}}_j^\top \beta \big)^2 + \lambda_i \|\beta\|_1. \qquad (16)$$

Take $\beta = \alpha^\star_i$ and define $\Delta := \hat{\alpha}_i - \alpha^\star_i$. Expanding the squares yields

$$\Delta^\top \hat{\Sigma}_i(\hat{\tilde{Z}})\, \Delta \le 2 \Big\| \sum_{j=1}^N w_j\, \hat{\tilde{Z}}_j\, \hat{\varepsilon}_j(\alpha^\star_i) \Big\|_\infty \|\Delta\|_1 + \lambda_i \big( \|\alpha^\star_i\|_1 - \|\alpha^\star_i + \Delta\|_1 \big). \qquad (17)$$

Step 2: controlling the score term. Decompose

$$\hat{\varepsilon}_j(\alpha^\star_i) = \underbrace{\tilde{Y}_j - \tilde{Z}_j^\top \alpha^\star_i}_{=: u_j} + \underbrace{\big( \hat{\tilde{Y}}_j - \tilde{Y}_j \big) - \big( \hat{\tilde{Z}}_j - \tilde{Z}_j \big)^\top \alpha^\star_i}_{=: r_j} + \underbrace{\big( \tilde{Z}_j - \hat{\tilde{Z}}_j \big)^\top \big( \alpha(G_j, X_j) - \alpha^\star_i \big)}_{=: b_j},$$

where $u_j$ is the oracle residual at the target coefficient, $r_j$ is the nuisance-induced perturbation, and $b_j$ is a higher-order mixed term. By (15) and $\|\hat{\tilde{Z}}_j - \tilde{Z}_j\|_\infty = \|\Delta_{m j}\|_\infty \le \delta_m$,

$$|b_j| \le \|\hat{\tilde{Z}}_j - \tilde{Z}_j\|_\infty \|\alpha(G_j, X_j) - \alpha^\star_i\|_1 \le 2\delta_m\, \mathrm{bias}_i(b_G),$$

where the last step uses the same convex-hull argument as in (15). Moreover, Lemma B.3 implies

$$\Big\| \sum_{j=1}^N w_j\, \hat{\tilde{Z}}_j r_j \Big\|_\infty = O_p(\delta_\mu \delta_m) + O_p\Big( (\delta_\mu + \delta_m)\sqrt{\frac{\log d}{n_i}} \Big).$$
Finally, write $u_j = \varepsilon_j + \tilde{Z}_j^\top\big( \alpha(G_j, X_j) - \alpha^\star_i \big)$, where $\varepsilon_j := \tilde{Y}_j - \tilde{Z}_j^\top \alpha(G_j, X_j)$ is the regression noise. By (15), the deterministic part obeys $|\tilde{Z}_j^\top(\alpha(G_j, X_j) - \alpha^\star_i)| \le 2\, \mathrm{bias}_i(b_G)$ for all $j$ with $w_j > 0$. Therefore,

$$\Big\| \sum_{j=1}^N w_j\, \hat{\tilde{Z}}_j u_j \Big\|_\infty \le \underbrace{\Big\| \sum_{j=1}^N w_j\, \hat{\tilde{Z}}_j \varepsilon_j \Big\|_\infty}_{(\star)} + 2\, \mathrm{bias}_i(b_G)\, \Big\| \sum_{j=1}^N w_j\, \hat{\tilde{Z}}_j \Big\|_\infty.$$

Since $Z(T)$ is $\{-1, 1\}^d$-valued and $m(\cdot, \cdot)$ is a conditional mean, $\|\tilde{Z}_j\|_\infty \le 2$, and we may assume $\hat{m}$ is truncated to $[-1, 1]^d$ so that $\|\hat{\tilde{Z}}_j\|_\infty \le 2$ as well. Applying Lemma B.2 to $(\star)$ gives, with probability tending to one, $(\star) \lesssim \sigma\sqrt{\log d / n_i}$. Collecting bounds yields

$$\Big\| \sum_{j=1}^N w_j\, \hat{\tilde{Z}}_j\, \hat{\varepsilon}_j(\alpha^\star_i) \Big\|_\infty \lesssim \sigma\sqrt{\frac{\log d}{n_i}} + \mathrm{bias}_i(b_G) + \delta_\mu \delta_m, \qquad (18)$$

up to terms that are $o_p(\lambda_i)$ under Assumption 4.1.

Step 3: cone condition and $\ell_2/\ell_1$ bounds. Let $S$ be the index set of the $s_i$ largest coordinates of $\alpha^\star_i$ in magnitude. Standard arguments (triangle inequality) give

$$\|\alpha^\star_i\|_1 - \|\alpha^\star_i + \Delta\|_1 \le \|\Delta_S\|_1 - \|\Delta_{S^c}\|_1 + 2\|\alpha^\star_{i,S^c}\|_1.$$

Plugging this and (18) into (17) yields

$$\Delta^\top \hat{\Sigma}_i(\hat{\tilde{Z}})\, \Delta \le 2\Lambda_i \|\Delta\|_1 + \lambda_i\big( \|\Delta_S\|_1 - \|\Delta_{S^c}\|_1 + 2\|\alpha^\star_{i,S^c}\|_1 \big),$$

where $\Lambda_i \lesssim \sigma\sqrt{\log d / n_i} + \mathrm{bias}_i(b_G) + \delta_\mu \delta_m$. Choosing $\lambda_i$ so that $\lambda_i \gtrsim \sigma\sqrt{\log d / n_i}$ ensures the stochastic part is dominated by $\lambda_i$, and thus

$$\Delta^\top \hat{\Sigma}_i(\hat{\tilde{Z}})\, \Delta \le C_1 \lambda_i \|\Delta_S\|_1 - C_2 \lambda_i \|\Delta_{S^c}\|_1 + C_3 \lambda_i \|\alpha^\star_{i,S^c}\|_1 + C_4\big( \mathrm{bias}_i(b_G) + \delta_\mu \delta_m \big) \|\Delta\|_1,$$

for universal constants $C_k > 0$. Rearranging yields the cone condition $\Delta \in \mathcal{C}(S, 3)$ up to the approximation term $\|\alpha^\star_{i,S^c}\|_1$ (which is controlled by Assumption 4.3). On the event (14), we therefore have

$$\kappa_{g_i} \|\Delta\|_2^2 \le \Delta^\top \hat{\Sigma}_i(\hat{\tilde{Z}})\, \Delta \lesssim \lambda_i \|\Delta_S\|_1 + \big( \mathrm{bias}_i(b_G) + \delta_\mu \delta_m \big) \|\Delta\|_1 + \lambda_i \|\alpha^\star_{i,S^c}\|_1.$$

Using $\|\Delta_S\|_1 \le \sqrt{s_i}\,\|\Delta\|_2$ and $\|\Delta\|_1 \le 4\|\Delta_S\|_1 + 2\|\alpha^\star_{i,S^c}\|_1$ under the cone condition gives

$$\|\Delta\|_2 \lesssim \sqrt{s_i}\,\lambda_i + \mathrm{bias}_i(b_G) + \delta_\mu \delta_m + \frac{\|\alpha^\star_{i,S^c}\|_1}{\sqrt{s_i}}.$$

Similarly, $\|\Delta\|_1 \lesssim s_i \lambda_i + s_i\, \mathrm{bias}_i(b_G) + s_i\, \delta_\mu \delta_m + \|\alpha^\star_{i,S^c}\|_1$. Under Assumption 4.3, the approximation terms involving $\|\alpha^\star_{i,S^c}\|_1$ are dominated by the displayed rates (this is the standard "effective sparsity" interpretation). Substituting $\lambda_i \asymp \hat{\sigma}\sqrt{\log d / n_i}$ yields the $\ell_1$ and $\ell_2$ bounds of Theorem 5.2.

Step 4: prediction error bound. From (17) and the previous steps,

$$\sum_{j=1}^N w_j \big( \hat{\tilde{Z}}_j^\top \Delta \big)^2 = \Delta^\top \hat{\Sigma}_i(\hat{\tilde{Z}})\, \Delta \lesssim s_i \lambda_i^2 + \mathrm{bias}_i(b_G)^2 + (\delta_\mu \delta_m)^2,$$

which gives the weighted prediction bound of Theorem 5.2. This completes the proof.

B.5. Proof of Corollary 5.3

Proof. For own-treatment contrasts, $\hat{\theta}^T_i(t \to t'; g_i) - \theta^T_i(t \to t'; g_i) = \langle \hat{\alpha}_i - \alpha(g_i, x_i), v_{t,t'} \rangle$. By Hölder's inequality, $|\langle \hat{\alpha}_i - \alpha(g_i, x_i), v_{t,t'} \rangle| \le \|\hat{\alpha}_i - \alpha(g_i, x_i)\|_1 \|v_{t,t'}\|_\infty$. Since each Walsh coordinate is $\pm 1$, $v_{t,t'} = Z(t') - Z(t)$ has entries in $\{-2, 0, 2\}$ and hence $\|v_{t,t'}\|_\infty \le 2$. The stated bound follows by invoking Theorem 5.2. The structural and joint contrasts follow by the same argument, applied to the two localized fits at $g_i$ and $g'$.

B.6. Proof of Theorem 5.5

Proof. We prove an asymptotic linear expansion and then invoke the weighted CLT assumed in Theorem 5.5(iv).

Step 1: an exact decomposition. Let $\alpha^\star_i := \alpha(g_i, x_i)$ and $\Delta := \hat{\alpha}_i - \alpha^\star_i$. Write $\hat{\Sigma}_i := \hat{\Sigma}(g_i) = \sum_j w_j\, \hat{\tilde{Z}}_j \hat{\tilde{Z}}_j^\top$.
From the definition (13),

$$\tilde{\theta}^T_i(t \to t'; g_i) = v_{t,t'}^\top \hat{\alpha}_i + \hat{\gamma}_i^\top \sum_{j=1}^N w_j\, \hat{\tilde{Z}}_j\big( \hat{\tilde{Y}}_j - \hat{\tilde{Z}}_j^\top \hat{\alpha}_i \big) = v_{t,t'}^\top \alpha^\star_i + \underbrace{\hat{\gamma}_i^\top \sum_{j=1}^N w_j\, \hat{\tilde{Z}}_j\big( \hat{\tilde{Y}}_j - \hat{\tilde{Z}}_j^\top \alpha^\star_i \big)}_{=: A_1} + \underbrace{\big( v_{t,t'}^\top - \hat{\gamma}_i^\top \hat{\Sigma}_i \big)\Delta}_{=: A_2}. \qquad (19)$$

Since $\theta^T_i(t \to t'; g_i) = v_{t,t'}^\top \alpha^\star_i$, it remains to analyze $A_1$ and $A_2$.

Step 2: controlling the debiasing remainder $A_2$. By the feasibility constraint in (12), $\|\hat{\Sigma}_i \hat{\gamma}_i - v_{t,t'}\|_\infty \le \eta_i$. Hence, by Hölder's inequality, $|A_2| \le \eta_i \|\Delta\|_1$. Theorem 5.2 gives $\|\Delta\|_1 \lesssim s_i\sqrt{\log d / n_i} + s_i\, \mathrm{bias}_i(b_G) + s_i\, \delta_\mu \delta_m$. With $\eta_i \asymp \sqrt{\log d / n_i}$, we obtain

$$|A_2| \lesssim \frac{s_i \log d}{n_i} + s_i\, \mathrm{bias}_i(b_G)\sqrt{\frac{\log d}{n_i}} + s_i\, \delta_\mu \delta_m \sqrt{\frac{\log d}{n_i}}. \qquad (20)$$

Under the growth condition in Theorem 5.5(iii), $s_i \log d = o(\sqrt{n_i})$ and $\mathrm{bias}_i(b_G) = o(n_i^{-1/2})$. If additionally $\delta_\mu \delta_m = o_p(n_i^{-1/2})$ (the standard DML second-order condition), then $\sqrt{n_i}\, A_2 = o_p(1)$.

Step 3: asymptotic linearity of $A_1$. Decompose

$$\hat{\tilde{Y}}_j - \hat{\tilde{Z}}_j^\top \alpha^\star_i = \underbrace{\tilde{Y}_j - \tilde{Z}_j^\top \alpha^\star_i}_{=: u_j} + \underbrace{\big( \hat{\tilde{Y}}_j - \tilde{Y}_j \big) - \big( \hat{\tilde{Z}}_j - \tilde{Z}_j \big)^\top \alpha^\star_i}_{=: r_j}.$$

Thus $A_1 = \hat{\gamma}_i^\top \sum_j w_j \hat{\tilde{Z}}_j u_j + \hat{\gamma}_i^\top \sum_j w_j \hat{\tilde{Z}}_j r_j =: A_{1a} + A_{1b}$. For $A_{1b}$, Lemma B.3 implies $\|\sum_j w_j \hat{\tilde{Z}}_j r_j\|_\infty = O_p(\delta_\mu \delta_m) + o_p(n_i^{-1/2})$ under Assumption 4.1, hence $A_{1b} = O_p(\|\hat{\gamma}_i\|_1 \delta_\mu \delta_m) + o_p(n_i^{-1/2})$. In particular, if $\|\hat{\gamma}_i\|_1 = O_p(1)$ and $\delta_\mu \delta_m = o_p(n_i^{-1/2})$, then $\sqrt{n_i}\, A_{1b} = o_p(1)$.

For $A_{1a}$, write $\hat{\gamma}_i = \gamma^\star_i + (\hat{\gamma}_i - \gamma^\star_i)$ and obtain

$$A_{1a} = \gamma^{\star\top}_i \sum_{j=1}^N w_j\, \hat{\tilde{Z}}_j u_j + (\hat{\gamma}_i - \gamma^\star_i)^\top \sum_{j=1}^N w_j\, \hat{\tilde{Z}}_j u_j =: B_1 + B_2.$$

The second term $B_2$ is controlled by the standard CLIME/nodewise rate. Under the sparsity condition $\|\gamma^\star_i\|_0 \le s_{\gamma,i}$ and standard conditions for CLIME, one has $\|\hat{\gamma}_i - \gamma^\star_i\|_1 = O_p\big( s_{\gamma,i}\sqrt{\log d / n_i} \big)$ and $\|\sum_j w_j \hat{\tilde{Z}}_j u_j\|_\infty = O_p\big( \sqrt{\log d / n_i} + \mathrm{bias}_i(b_G) \big)$, so that

$$B_2 = O_p\Big( \frac{s_{\gamma,i}\log d}{n_i} \Big) + O_p\Big( s_{\gamma,i}\, \mathrm{bias}_i(b_G)\sqrt{\frac{\log d}{n_i}} \Big).$$

Under Theorem 5.5(iii) and $\mathrm{bias}_i(b_G) = o(n_i^{-1/2})$, this gives $\sqrt{n_i}\, B_2 = o_p(1)$.

It remains to identify $B_1$. Write $u_j = \varepsilon_j + b_{ij}$, where $\varepsilon_j := \tilde{Y}_j - \tilde{Z}_j^\top \alpha(G_j, X_j)$ is the regression noise and $b_{ij} := \tilde{Z}_j^\top(\alpha(G_j, X_j) - \alpha^\star_i)$ is the localization bias term. By (15), $|b_{ij}| \le 2\, \mathrm{bias}_i(b_G)$ for all $j$ with $w_j > 0$. Hence

$$B_1 = \gamma^{\star\top}_i \sum_{j=1}^N w_j\, \hat{\tilde{Z}}_j \varepsilon_j + \gamma^{\star\top}_i \sum_{j=1}^N w_j\, \hat{\tilde{Z}}_j b_{ij} =: C_1 + C_2.$$

The deterministic term $C_2$ is bounded by $|C_2| \le 2\, \mathrm{bias}_i(b_G)\, \|\gamma^\star_i\|_1 \|\sum_j w_j \hat{\tilde{Z}}_j\|_\infty$, and Lemma B.2 gives $\|\sum_j w_j \hat{\tilde{Z}}_j\|_\infty = O_p(\sqrt{\log d / n_i})$. Thus $\sqrt{n_i}\, C_2 = o_p(1)$ if $\mathrm{bias}_i(b_G) = o(n_i^{-1/2})$.

The leading term is therefore $C_1$. Replacing $\hat{\tilde{Z}}_j$ by $\tilde{Z}_j$ only incurs a nuisance remainder of order $\delta_m$ times a sub-Gaussian weighted average, hence is $o_p(n_i^{-1/2})$ under Assumption 4.1. Consequently,

$$\sqrt{n_i}\, C_1 = \sqrt{n_i}\, \gamma^{\star\top}_i \sum_{j=1}^N w_j\, \tilde{Z}_j \varepsilon_j + o_p(1).$$

By Theorem 5.5(iv) (the weighted CLT under the dependence induced by localization),

$$\sqrt{n_i}\, \gamma^{\star\top}_i \sum_{j=1}^N w_j\, \tilde{Z}_j \varepsilon_j \Rightarrow \mathcal{N}\big(0, \sigma^2_{\theta,i}\big), \qquad \sigma^2_{\theta,i} = \mathrm{Var}\big( \gamma^{\star\top}_i \tilde{Z}\varepsilon \big).$$

Combining Steps 1-3 yields the claimed asymptotic normality.

Step 4: variance estimator consistency. Define $\hat{\varepsilon}_j := \hat{\tilde{Y}}_j - \hat{\tilde{Z}}_j^\top \hat{\alpha}_i$ and

$$\hat{\sigma}^2_{\theta,i} := n_i \sum_{j=1}^N w_j^2 \big( \hat{\gamma}_i^\top \hat{\tilde{Z}}_j \hat{\varepsilon}_j \big)^2.$$
Under the same bounds used above (consistency of $\hat{\alpha}_i$ and $\hat{\gamma}_i$, bounded $\ell_1$ norms, and $\delta_\mu, \delta_m = o_p(1)$), the difference between $\hat{\sigma}^2_{\theta,i}$ and the corresponding oracle plug-in based on $(\gamma^\star_i, \tilde{Z}, \varepsilon)$ is $o_p(1)$, while the oracle plug-in converges to $\sigma^2_{\theta,i}$ by the law of large numbers for the weighted second moment (or its dependence-robust analogue implied by (iv)). This establishes $\hat{\sigma}^2_{\theta,i} \to_p \sigma^2_{\theta,i}$.

B.7. Proof of Proposition A.1 (partial identification bound)

Proof. By the assumed $L$-Lipschitz property of $t \mapsto f(t; g, x)$ with respect to Hamming distance, $|f(t'; g, x) - f(t; g, x)| \le L\, d_H(t, t')$. The left-hand side equals $|\theta^T(t \to t'; g)|$ by definition, proving the claim.