Individualized Causal Effects under Network Interference with Combinatorial Treatments

Yunping Lu¹, Haoang Chi², Qirui Hu³, Zhiheng Zhang³

¹University of Leeds  ²National University of Defense Technology  ³School of Statistics and Data Science, SUFE. Correspondence to: Zhiheng Zhang <zhangzhiheng@mail.shufe.edu.cn>.

Preprint. February 24, 2026.

Abstract

Modern causal decision-making increasingly demands individualized treatment-effect estimation in networks where interventions are high-dimensional, combinatorial vectors. While network interference, effect heterogeneity, and multi-dimensional treatments have been studied separately, their intersection yields an exponentially large intervention space that makes standard identification tools and low-dimensional exposure mappings untenable. We bridge this gap with a unified framework that constructs a global potential-outcome emulator for unit-level inference. Our method combines (1) rooted network configurations to leverage local smoothness, (2) doubly robust orthogonalization to mitigate confounding from network position and covariates, and (3) sparse spectral learning to efficiently estimate response surfaces over the $2^p$-dimensional treatment space. We also decompose networked effects into own-treatment, structural, and interaction components, and provide finite-sample error bounds and asymptotic consistency guarantees. Overall, we show that individualized causal inference remains feasible in high-dimensional networked settings without collapsing the intervention space.

1. Introduction

Causal inference in networked systems has become a central problem across social sciences, epidemiology, and online experimentation. In many applications, units are embedded in a network, and interventions are high-dimensional treatment slates: vectors of features that can be simultaneously toggled. Examples include vaccination campaigns, peer effects in education, and product changes on digital platforms. In these settings, the classical no-interference assumption (SUTVA) is often violated: one unit's outcome may depend on others' treatment assignments, making the causal estimand inherently global in the assignment vector (Hudgens & Halloran, 2008; Aronow & Samii, 2017; Savje et al., 2021; Leung, 2022).

At the same time, policy decisions are increasingly individualized: practitioners seek unit-level or conditional effects to guide personalized targeting and adaptive experimentation (Chernozhukov et al., 2018; Künzel et al., 2019; Wager & Athey, 2018). Reconciling interference, heterogeneity, and multi-dimensional interventions is therefore crucial for credible causal decision-making in networked environments.

This paper integrates three literatures that have advanced in parallel. First, the interference literature has developed estimands and randomization-based methods for average direct and spillover effects (Hudgens & Halloran, 2008; Aronow & Samii, 2017; Savje et al., 2021). Second, the heterogeneous treatment effect (HTE) literature provides tools for learning individualized effects under no interference, ranging from machine learning to meta-learners (Chernozhukov et al., 2018; Künzel et al., 2019; Wager & Athey, 2018), and has been extended to interference settings (Ma et al., 2022; Agarwal et al., 2022).
Third, factorial and multi-treatment causal inference has clarified potential-outcome formulations for multi-factor interventions and randomization-based inference (Dasgupta & Pillai, 2015; Lopez & Gutman, 2017; Zhao & Ding, 2021; Agarwal et al., 2023).

These three strands differ in their primary bottleneck: interference requires modeling dependence between units' assignments, HTE demands controlling confounding while preserving heterogeneity, and multi-dimensional treatments face a distinct conceptual obstacle, namely an exponentially large action space. Real-world deployments often encounter all three challenges simultaneously. Crucially, this is not a simply additive problem in which interference, heterogeneity, and multi-dimensional treatments can be handled separately. Their interaction creates endogenous dependence structures and configuration-specific overlap conditions, under which standard identification and estimation arguments from each individual literature no longer apply. Consider a large online platform running a networked experiment. Each user $i \in [N]$ is assigned a $p$-dimensional binary feature vector (a "slate") $T_i \in \{-1, 1\}^p$, and the platform wishes to answer individualized questions of the form: "For this user, what is the outcome change if we toggle a particular subset of features, under the user's current local network environment?" In principle, even without interference, this requires learning a response surface over $2^p$ treatment combinations; with interference, the relevant counterfactual depends on neighbors' assignments and the local network configuration, producing a space of counterfactuals that is exponentially large in both $p$ and network size. Existing approaches fail in a common, structural way.

This paper introduces a framework that transforms this seemingly intractable problem into one amenable to high-dimensional inference. Our approach combines: (i) graph localization to represent interference environments with rooted network configurations, exploiting smoothness in configuration space; (ii) doubly robust (orthogonal) residualization to mitigate confounding from both network position and covariates; and (iii) sparse spectral learning for high-dimensional combinatorial treatments, enabling stable estimation and extrapolation. Together these components yield a global potential-outcome emulator that produces individualized causal contrasts for arbitrary assignments.

A key advantage of the framework is its principled decomposition of individualized causal effects into: (a) own-treatment effects, (b) structural effects from changes in the network configuration, and (c) joint effects that combine both. This decomposition clarifies which components of heterogeneity are identifiable from network data and aligns with practical interventions such as feature toggles and network redesigns. We show that, under local regularity and sparse spectral structure, individualized causal comparisons remain feasible even when the counterfactual space grows exponentially with both the treatment dimension and network size.

The main contributions of this paper are as follows:

- We formulate a potential-outcome framework for individualized causal effects under network interference with combinatorial treatments, with a clear decomposition into own-treatment, structural, and joint effects.
- We develop an estimator that integrates graph-configuration localization, doubly robust orthogonalization, and sparse spectral learning to recover unit-level response functions and construct a global potential-outcome emulator.

- We establish both finite-sample error bounds and asymptotic guarantees under localized dependence and overlap, and characterize robustness and partial identification when some treatment directions are not locally identifiable.

2. Problem Formalization

Let $t = (t_1, \dots, t_N)$ denote a global assignment, where each unit $i$ receives a $p$-dimensional binary treatment slate $t_i \in \{-1, 1\}^p$. For each $t$, let $Y_i(t)$ denote the potential outcome of unit $i$, and write $Y_i = Y_i(T)$ for the observed outcome under the actual assignment $T$. We observe a single assignment-outcome snapshot on a networked population,

$$\{(Y_i, T_i, X_i, \mathcal{N}_i) : i \in [N]\}, \qquad (1)$$

where $X_i$ are pre-treatment covariates and $\mathcal{N}_i$ encodes network neighborhood information. Our goal is to construct an estimator $\hat{Y}(t) = (\hat{Y}_1(t), \dots, \hat{Y}_N(t))$ that approximates the vector of potential outcomes under arbitrary global assignments. This problem is well defined but intrinsically challenging: the assignment space grows exponentially in $p$ and $N$, and interference induces dependence across units through the network.

Beyond global predictions, we focus on individualized causal inference. For each unit, we target three families of causal contrasts: own-treatment effects that vary the unit's own slate holding the local interference environment fixed, structural effects that vary the local network configuration holding the slate fixed, and joint effects that vary both. For user-specified contrasts, we provide point estimates with uncertainty quantification, while retaining the ability to evaluate $\hat{Y}(t)$ for arbitrary $t$.

Following Auerbach & Tabord-Meehan (2021), we summarize the interference environment of unit $i$ by a rooted network configuration $G_i(t)$, constructed from an ego-centered neighborhood of $i$ (e.g., within a fixed graph radius) together with vertex marks that include treatment slates (and, if desired, additional marks such as discretized covariates). We write $G_i := G_i(T)$ for the realized random configuration and $g_i$ for its realization. Let $\mathcal{G}$ denote the space of rooted, marked configurations (up to root-preserving isomorphism).

We equip $\mathcal{G}$ with a truncated rooted-graph distance (as in the local approach of Auerbach & Tabord-Meehan (2021)). Let $B_r(g)$ denote the radius-$r$ rooted ball around the root of $g$ (the induced subgraph with marks carried along). When comparing $g$ and $g'$, we first check whether $B_r(g)$ and $B_r(g')$ are root-isomorphic as unmarked graphs; among root-preserving isomorphisms, we then measure mark mismatches. Concretely, let $\tau_g(v) \in \{-1, 1\}^p$ denote the treatment-slate mark at vertex $v$ in configuration $g$, and define

$$\Delta_r(g, g') := \begin{cases} \min_{\phi : B_r(g) \simeq B_r(g')} \frac{1}{|V(B_r(g))|} \sum_{v \in V(B_r(g))} \mathbb{1}\{\tau_g(v) \neq \tau_{g'}(\phi(v))\}, & \text{if } B_r(g), B_r(g') \text{ are root-isomorphic}, \\ 1, & \text{otherwise}. \end{cases}$$

We then truncate at a small radius $R \in \mathbb{Z}_+$ and set

$$d_R(g, g') := \sum_{r=0}^{R} 2^{-(r+1)} \Delta_r(g, g'). \qquad (2)$$

The metric $d_R$ induces a notion of "local similarity" between interference environments and enables nonparametric localization over $\mathcal{G}$.
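As a concrete companion to this definition, the following minimal Python sketch computes $\Delta_r$ and $d_R$ for radius-1 (star) configurations. It follows the convention of the paper's worked example below: root slates are compared at radius 0, and matched neighbor slates at radius 1. The function names and the neighbor-only normalization at radius 1 are our illustrative choices, not part of the paper's formal definition.

```python
import itertools

def delta_r1(neigh_g, neigh_gp):
    """Radius-1 mark discrepancy between two star configurations.

    neigh_g, neigh_gp: lists of neighbor treatment slates (tuples in {-1,+1}^p).
    Returns 1.0 if the unmarked balls are not root-isomorphic (different
    degree); otherwise the minimal fraction of mismatched neighbor slates
    over all root-preserving matchings, as in the worked example below.
    """
    if len(neigh_g) != len(neigh_gp):
        return 1.0  # not root-isomorphic as unmarked graphs
    best = min(
        sum(a != b for a, b in zip(neigh_g, perm))
        for perm in itertools.permutations(neigh_gp)
    )
    return best / max(len(neigh_g), 1)

def d_R(root_g, neigh_g, root_gp, neigh_gp, R=1):
    """Truncated rooted-graph distance d_R with geometric weights 2^{-(r+1)}."""
    delta0 = float(root_g != root_gp)            # radius 0: compare root slates
    deltas = [delta0, delta_r1(neigh_g, neigh_gp)][: R + 1]
    return sum(2 ** -(r + 1) * d for r, d in enumerate(deltas))

# Identical roots; neighbors {(+1,-1), (-1,+1)} vs {(+1,+1), (-1,+1)}.
g  = ((+1, +1), [(+1, -1), (-1, +1)])
gp = ((+1, +1), [(+1, +1), (-1, +1)])
print(d_R(g[0], g[1], gp[0], gp[1]))  # 2^{-1}*0 + 2^{-2}*(1/2) = 0.125
```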
Illustrative example. To illustrate the rooted-graph distance and the role of the matching $\phi$, consider a social network where each unit receives a binary treatment slate and outcomes may depend on neighbors' assignments. Let $R = 1$, so that each configuration consists of the root and its immediate neighbors. Suppose the root has exactly two neighbors, and consider two configurations $g$ and $g'$ whose unmarked ego networks are identical: in both, the root is connected to two neighbors. Hence $B_1(g)$ and $B_1(g')$ are root-isomorphic as unmarked graphs. A root-preserving isomorphism $\phi$ is then a bijection between the neighbor sets of $g$ and $g'$ that fixes the root. When multiple such bijections exist, $\Delta_1(g, g')$ uses the one that minimizes treatment-slate mismatches. In $g$, the two neighbors receive slates $(+1, -1)$ and $(-1, +1)$; in $g'$, they receive $(+1, +1)$ and $(-1, +1)$. Under the optimal matching, one neighbor is paired with an identical slate, while the other is paired with a different slate, so there is exactly one mismatch and $\Delta_1(g, g') = 1/2$. If instead $g'$ had a different local structure (e.g., a different number of neighbors), then $B_1(g)$ and $B_1(g')$ would not be root-isomorphic, no root-preserving matching would exist, and by definition $\Delta_1(g, g') = 1$. The overall distance $d_R(g, g')$ then aggregates these discrepancies across radii with geometrically decaying weights, placing more emphasis on mismatches closer to the root, consistent with the idea that nearby interference environments matter most for the root's outcome.

Also, to represent response surfaces over $\{-1, 1\}^p$, we use Walsh characters. For each subset $S \subseteq [p]$, define $Z_S(t) = \prod_{\ell \in S} t_\ell$, and let $Z(t)$ collect $\{Z_S(t) : S \subseteq [p]\}$. This provides a convenient orthonormal dictionary on the hypercube and supports sparse/near-sparse modeling of high-order interactions.

Given a target unit $i$, we localize around its realized configuration $g_i$ by assigning weights to sample units $j$:

$$w^{(i)}_j := \frac{K_G\big(d_R(g_j, g_i)/b_G\big)}{\sum_{k=1}^N K_G\big(d_R(g_k, g_i)/b_G\big)}, \qquad \sum_{j=1}^N w^{(i)}_j = 1, \qquad (3)$$

where $K_G$ is a compactly supported kernel (e.g., indicator or Epanechnikov) and $b_G > 0$ is a bandwidth (or, equivalently, one may use a $k$NN radius). We quantify the effective local sample size via the Kish measure $n^{(i)}_{\mathrm{eff}} := 1 / \sum_{j=1}^N (w^{(i)}_j)^2$, which controls the variance of localized averages under homoskedastic noise and will serve as the fundamental local sample-size parameter in our analysis.
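A minimal sketch of these two ingredients, the Walsh dictionary and the kernel localization weights with their Kish effective sample size, is given below. The full dictionary is enumerated only for illustration at small $p$; the estimators of Section 4 rely on sparsity rather than explicit enumeration. Function names are ours.

```python
import numpy as np
from itertools import combinations

def walsh_features(t):
    """Full Walsh dictionary Z(t) = (prod_{l in S} t_l)_{S subseteq [p]}.

    t: array in {-1,+1}^p. Returns a vector of length 2^p, with the empty
    set mapped to the constant 1. Feasible only for small p.
    """
    p = len(t)
    feats = []
    for k in range(p + 1):
        for S in combinations(range(p), k):
            feats.append(np.prod([t[l] for l in S]) if S else 1.0)
    return np.array(feats)

def localization_weights(dists, b_G):
    """Normalized Epanechnikov kernel weights over rooted-graph distances, eq. (3)."""
    u = np.asarray(dists) / b_G
    k = np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)
    return k / k.sum()

def kish_neff(w):
    """Kish effective local sample size n_eff = 1 / sum_j w_j^2."""
    return 1.0 / np.sum(np.asarray(w) ** 2)

w = localization_weights([0.05, 0.10, 0.40, 0.90], b_G=0.5)
print(w, kish_neff(w))  # heavier weight on closer configurations
```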
The following assumptions make the target problem well-defined and render it statistically tractable; each is standard in nearby literatures and will be revisited in the identification and theory sections.

Assumption 2.1 (Configuration sufficiency / local interference). There exists a (possibly radius-truncated) configuration mapping $G_i(t)$ such that, for all $i$ and all global assignments $t$, the potential outcome $Y_i(t)$ depends on $t$ only through the unit's own slate $t_i$ and its rooted configuration $G_i(t)$ (and covariates $X_i$), so that $Y_i(t) = Y_i(t_i, G_i(t))$.

Assumption 2.1 reduces the global interference problem to a localized object in $\mathcal{G}$, enabling borrowing of information across units with similar environments (Auerbach & Tabord-Meehan, 2021). Sensitivity to the truncation radius $R$ (and to alternative configuration constructions) can be assessed empirically by checking stability of fitted effects as $R$ varies.

Relaxations. If interference decays with graph distance, increasing $R$ yields a controlled bias-variance trade-off; our methods naturally accommodate such sensitivity analyses.

Assumption 2.2 (Local regularity in configuration space). For each fixed $(t, x)$, the function $g \mapsto E[Y(t, G) \mid G = g, X = x]$ is locally Lipschitz with respect to $d_R$. Specifically, for every $g_0 \in \mathcal{G}$ there exists $L(g_0, x, t) > 0$ such that

$$\big| E[Y(t, G) \mid G = g, X = x] - E[Y(t, G) \mid G = g_0, X = x] \big| \le L(g_0, x, t)\, d_R(g, g_0)$$

for all $g$ in a neighborhood of $g_0$.

Assumption 2.2 justifies kernel or $k$NN localization and ensures that localization bias vanishes as neighborhoods shrink. One may diagnose violations by examining residual stability as a function of $d_R(g_j, g_i)$ and by cross-validating localization bandwidths. Piecewise regularity can be handled by using covers or adaptive bandwidths; when smoothness fails in specific regions, our framework permits reporting localized uncertainty inflation or reverting to coarser, partially identified summaries.

Assumption 2.3 (Local overlap / design richness). Fix a target configuration-covariate pair $(g_0, x_0)$. There exist a neighborhood $\mathcal{N}(g_0, x_0)$ and a constant $\kappa(g_0, x_0) > 0$ such that, for all $(g, x) \in \mathcal{N}(g_0, x_0)$, the conditional covariance matrix of the orthogonalized treatment features satisfies

$$\lambda_{\min}\Big( E\big[\tilde{Z}(T)\tilde{Z}(T)^\top \mid G = g, X = x\big] \Big) \ge \kappa(g_0, x_0),$$

where $\tilde{Z}(T) = Z(T) - E[Z(T) \mid G, X]$ denotes the residualized treatment-feature vector.

Assumption 2.3 prevents degeneracy (e.g., near-deterministic treatment assignment within a configuration), which would otherwise preclude individualized identification. Overlap can be assessed locally via effective sample sizes, propensity diagnostics, and the conditioning of weighted design matrices. When local overlap fails in some directions, we characterize robustness and partial identification behavior (e.g., reporting bounds or restricting to identifiable contrasts).

Assumption 2.4 (Sparse/near-sparse structure over combinatorial treatments). For each configuration-covariate pair $(g, x)$, let $f(t; g, x) := E[Y(t, G) \mid G = g, X = x]$ denote the conditional mean response as a function of the treatment slate. There exists a coefficient vector $\alpha(g, x)$ in the Walsh-Hadamard basis such that $f(t; g, x)$ admits the expansion $f(t; g, x) = \langle \alpha(g, x), Z(t) \rangle$, and $\alpha(g, x)$ is approximately sparse in the sense that

$$\sum_{k > s(g,x)} \big| \alpha_{(k)}(g, x) \big| \le C(g, x)\, s(g, x)^{-\gamma}, \qquad \gamma > 0,$$

where $\alpha_{(k)}(g, x)$ denotes the $k$-th largest coefficient in magnitude.

Assumption 2.4 converts an exponentially large treatment space into a high-dimensional but tractable estimation problem via sparse learning. Sparsity and compressibility can be probed by model-selection stability, interaction-order diagnostics, and out-of-sample validation. Approximate sparsity yields graceful degradation (slower rates and wider intervals) rather than failure; one may also impose hierarchical truncations when warranted by domain knowledge.

Assumptions 2.1-2.4 jointly formalize the sense in which individualized causal inference under interference with combinatorial treatments is well-posed yet nontrivial.
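The local overlap condition of Assumption 2.3 lends itself to a simple empirical diagnostic: compute the smallest eigenvalue of the weighted second-moment matrix of residualized treatment features in a localization neighborhood. The sketch below is our illustration of such a check, with toy inputs standing in for actual residualized features.

```python
import numpy as np

def local_overlap_diagnostic(Z_resid, w):
    """Empirical check of Assumption 2.3: smallest eigenvalue of the
    weighted second-moment matrix of residualized treatment features.

    Z_resid: (N, d) matrix of residualized Walsh features;
    w: localization weights summing to one. A value near zero flags
    treatment directions that are nearly deterministic locally.
    """
    Sigma = (Z_resid * w[:, None]).T @ Z_resid
    return np.linalg.eigvalsh(Sigma)[0]

rng = np.random.default_rng(0)
Zr = rng.choice([-1.0, 1.0], size=(200, 8))  # toy stand-in for residualized features
w = np.full(200, 1 / 200)
print(local_overlap_diagnostic(Zr, w))       # well away from zero under rich designs
```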
3. Identification

Throughout this section we first formalize the causal objects of interest and then state an identification theorem that connects these objects to observable functionals of the data-generating distribution.

For a generic unit with covariates $X = x$ and configuration $G = g$, recall the definition of the own-slate conditional mean response in Assumption 2.4. To emphasize the dependence of the local configuration on the global assignment, we may write $G_i(t)$ as $G_i^{\langle t \rangle}$. Given $(g, x)$, we target three families of individualized contrasts:

$$\theta^T_i(t \to t'; g) := f(t'; g, x) - f(t; g, x), \qquad (4)$$
$$\theta^G_i(g \to g'; t) := f(t; g', x) - f(t; g, x), \qquad (5)$$
$$\theta^{G,T}_i\big((g, t) \to (g', t')\big) := f(t'; g', x) - f(t; g, x). \qquad (6)$$

These contrasts correspond to (i) toggling the unit's own treatment slate holding the interference environment fixed, (ii) changing the interference environment holding the slate fixed, and (iii) changing both simultaneously.

3.1. Identification via local orthogonal moments

A central difficulty is that even under randomized assignments, conditioning or localizing on the realized configuration can induce dependence between a unit's own slate and its interference environment. This creates a form of endogeneity that invalidates naive regression of $Y$ on $Z(T)$ within localized neighborhoods. We therefore identify the Walsh coefficients through an orthogonalized (residualized) moment equation. Define the nuisance functions

$$\mu(g, x) := E[Y \mid G = g, X = x], \qquad m(g, x) := E[Z(T) \mid G = g, X = x],$$

and the residuals $\tilde{Y} := Y - \mu(G, X)$ and $\tilde{Z} := Z(T) - m(G, X)$.

Fix a target configuration $g \in \mathcal{G}$ and let $w_g(G)$ denote a nonnegative localization weight. This weight is the population analogue of the kernel or $k$NN weights defined in Section 2 and is designed to upweight observations whose realized configurations are close to $g$. Concretely, one may take $w_g(G) \propto K_G(d_R(G, g)/b_G)$, with normalization ensuring unit expectation. (Here $g \in \mathcal{G}$ denotes a fixed target configuration at which inference is performed, while $G$ is a $\mathcal{G}$-valued random variable representing the realized configuration of a randomly sampled unit; the weight $w_g(G)$ therefore assigns larger mass to realizations of $G$ that are closer to the target $g$ under the rooted-graph distance $d_R$.) All identification results below are stated for a generic choice of such localization weights. For $\alpha \in \mathbb{R}^{2^p}$, define the score

$$\Psi(\alpha; g) := E\Big[ w_g(G)\, \tilde{Z}\big( \tilde{Y} - \tilde{Z}^\top \alpha \big) \Big].$$

This is the population analogue of a localized regression of $\tilde{Y}$ on $\tilde{Z}$. Local overlap (Assumption 2.3) guarantees that the relevant directions of $\tilde{Z}$ exhibit non-degenerate variation locally, ensuring well-posedness.

Theorem 3.1 (Identification of localized Walsh coefficients). Assume Assumptions 2.1-2.4, and fix a target pair $(g, x)$ in the interior of the support. Then there exists a (possibly localized) coefficient vector $\alpha^\star(g, x)$ such that

$$\Psi\big(\alpha^\star(g, x); g\big) = 0. \qquad (7)$$

Moreover, under the local overlap condition in Assumption 2.3, $\alpha^\star(g, x)$ is unique within the model class implied by Assumption 2.4 (e.g., the sparse/near-sparse cone), and it identifies the response function in Assumption 2.4 in the sense that $f(t; g, x) = \langle \alpha^\star(g, x), Z(t) \rangle$ for all treatment slates $t \in \{-1, 1\}^p$. Consequently, for any unit $i$ and any global assignment $t$,

$$E[Y_i(t) \mid X_i = x_i] = \big\langle \alpha^\star\big(G_i^{\langle t \rangle}, x_i\big),\ Z(t_i) \big\rangle,$$

and the individualized contrasts in (4)-(6) are identified as unique functions of $\alpha^\star(\cdot, \cdot)$.
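Before discussing Theorem 3.1 further, the following self-contained toy simulation illustrates why residualization matters. It is not the paper's estimator: the configuration is replaced by a scalar stand-in, the nuisances are simple polynomial fits, and the numbers are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 5000, 3
G = rng.normal(size=N)                                   # scalar stand-in for configuration position
Z = np.sign(rng.normal(size=(N, d)) + 0.8 * G[:, None])  # slate features correlated with G
alpha = np.array([1.0, 0.5, 0.0])
Y = Z @ alpha + 2.0 * G + rng.normal(size=N)             # outcome confounded through G

def cond_mean(target, G):
    """Crude nuisance fit: regress target on (1, G, G^2)."""
    B = np.column_stack([np.ones_like(G), G, G ** 2])
    return B @ np.linalg.lstsq(B, target, rcond=None)[0]

eY = Y - cond_mean(Y, G)
eZ = Z - np.column_stack([cond_mean(Z[:, k], G) for k in range(d)])

naive = np.linalg.lstsq(Z, Y, rcond=None)[0]
orth = np.linalg.lstsq(eZ, eY, rcond=None)[0]
print(naive)  # biased: the regression absorbs the G-channel
print(orth)   # close to alpha = (1.0, 0.5, 0.0)
```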
Theorem 3.1 formalizes a practical message: once the interference environment is summarized by a rooted configuration, one can treat causal learning locally in configuration space, provided there is sufficient treatment variation locally. For example, in a platform experiment where a user's outcome depends on her own feature slate and the slates of nearby friends, even globally randomized assignment can become locally confounded: restricting attention to users with similar realized neighborhoods (e.g., two treated friends and one control friend) can induce dependence between the user's own slate and the neighborhood pattern. The orthogonal moment addresses this by residualizing both outcomes and treatment features on $(G, X)$.

Importantly, the identification result does not assert point identification of every direction in the combinatorial treatment space. If local overlap fails in certain Walsh directions, say, some treatment components are nearly deterministic within a configuration neighborhood, then those directions are not point-identified. This reflects an inherent data limitation: without local variation, individualized contrasts along those directions cannot be learned, and the solution to the moment condition (7) is non-unique. Developing robustness and partial-identification results is a natural direction for future work.

A natural question is: "If assignments are randomized, why do we need orthogonalization at all?" The key is that our target is local in configuration space, and conditioning or localizing on realized configurations is a form of post-assignment selection. Even under random assignment, selection on neighborhood patterns generally induces correlation between a unit's slate and its neighbors' slates. The orthogonal moment is constructed to be insensitive to this induced dependence and to enable principled inference within localized neighborhoods. At the population level, the moment condition in (7) is exactly unbiased; approximation errors arise only at the estimation stage through localization, nuisance estimation, and finite-sample effects.

Corollary 3.2 (Identification of individualized contrasts). Under the conditions of Theorem 3.1, the three contrast families $\theta^T_i$, $\theta^G_i$, and $\theta^{G,T}_i$ are identified for any user-specified $(t, t', g, g')$ for which the corresponding directions satisfy local overlap.

4. Estimation

This section gives sample analogues of the orthogonal moment equation in Section 3 and yields estimators of (i) localized Walsh coefficients and (ii) individualized causal contrasts. In addition to Assumptions 2.1-2.4, we impose the following high-dimensional estimation conditions.

Assumption 4.1 (Cross-fitted nuisance accuracy). Let $\mu(g, x) = E[Y \mid G = g, X = x]$ and $m(g, x) = E[Z(T) \mid G = g, X = x]$. There exist cross-fitted estimators $\hat{\mu}$ and $\hat{m}$ (defined below) such that

$$\max_{i \in [N]} \big| \hat{\mu}(G_i, X_i) - \mu(G_i, X_i) \big| = o_p(1), \qquad \max_{i \in [N]} \big\| \hat{m}(G_i, X_i) - m(G_i, X_i) \big\|_\infty = o_p(1).$$
Algorithm 1 Oracle (population) identification of individualized contrasts at $(g, x)$
1: Input: target configuration $g$, covariates $x$, localization weights $w_g(\cdot)$, Walsh dictionary $Z(\cdot)$.
2: Compute the nuisances $\mu(g, x) = E[Y \mid G = g, X = x]$ and $m(g, x) = E[Z(T) \mid G = g, X = x]$.
3: Form the residuals $\tilde{Y} = Y - \mu(G, X)$ and $\tilde{Z} = Z(T) - m(G, X)$.
4: Solve for $\alpha^\star(g, x)$ such that $\Psi(\alpha^\star(g, x); g) = 0$.
5: Define the response function $\hat{f}(t; g, x) = \langle \alpha^\star(g, x), Z(t) \rangle$ for any slate $t$.
6: Output: for any user-specified $(t, t', g, g')$, compute the contrasts $\theta^T(t \to t'; g)$, $\theta^G(g \to g'; t)$, and $\theta^{G,T}((g, t) \to (g', t'))$ by plugging $\hat{f}$ into (4)-(6).

Assumption 4.2 (Weighted restricted eigenvalue and noise tails). Fix a configuration center $g \in \mathcal{G}$ and let $w_j(g)$ be the kernel weights defined in (3) with $g_i$ replaced by $g$. Let $\hat{\tilde{Z}}_j$ denote the cross-fitted residualized treatment-feature vector defined in (8), and define the weighted Gram matrix $\hat{\Sigma}(g) := \sum_{j=1}^N w_j(g)\, \hat{\tilde{Z}}_j \hat{\tilde{Z}}_j^\top$. There exists $\kappa_g > 0$ such that $\hat{\Sigma}(g)$ satisfies a restricted-eigenvalue condition over the sparse cone associated with Assumption 2.4, with probability tending to one. Moreover, the (cross-fitted) regression noise is conditionally sub-Gaussian given $(G, X, T)$ with proxy variance $\sigma^2$.

Assumption 4.3 (Local sparsity/near-sparsity). For each $(g, x)$, the Walsh coefficient vector $\alpha(g, x)$ in Assumption 2.4 is sparse or approximately sparse with effective sparsity level $s(g, x)$ satisfying $s(g, x)\log(2^p) = o\big(n_{\mathrm{eff}}(g)\big)$, where $n_{\mathrm{eff}}(g) := \big( \sum_{j=1}^N w_j(g)^2 \big)^{-1}$ is the Kish effective sample size associated with the weights $w_j(g)$.

Assumptions 4.1-4.3 are standard high-dimensional estimation conditions that complement, rather than strengthen, the identification assumptions in Section 3. Assumption 4.1 ensures that nuisance estimation errors are second-order and do not affect the orthogonal moment at first order, while Assumptions 4.2 and 4.3 guarantee that the resulting localized high-dimensional regression problem is well-posed at the effective-sample-size scale $n_{\mathrm{eff}}(g)$. These conditions are sufficient for consistent estimation and valid inference, and are satisfied by a broad class of modern machine-learning nuisance estimators and weighted sparse regressions.

We estimate the nuisance functions $\mu$ and $m$ and construct residuals using cross-fitting. Partition the index set $[N]$ into $K_{\mathrm{cf}} \ge 2$ disjoint folds $\{\mathcal{I}_k\}_{k=1}^{K_{\mathrm{cf}}}$. For each fold $k$, fit nuisance estimators $\hat{\mu}^{(-k)}(\cdot, \cdot)$ and $\hat{m}^{(-k)}(\cdot, \cdot)$ using only observations with indices in $[N] \setminus \mathcal{I}_k$. For each $i \in \mathcal{I}_k$, define the cross-fitted residuals

$$\hat{\tilde{Y}}_i := Y_i - \hat{\mu}^{(-k)}(G_i, X_i), \qquad \hat{\tilde{Z}}_i := Z(T_i) - \hat{m}^{(-k)}(G_i, X_i). \qquad (8)$$

We use $\{(\hat{\tilde{Y}}_i, \hat{\tilde{Z}}_i)\}_{i=1}^N$ as the debiased inputs for localized high-dimensional regression.
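A minimal sketch of the cross-fitting construction in (8), assuming numeric summaries of $(G_j, X_j)$ are available as nuisance inputs and using random forests purely as one admissible nuisance learner (feasible only for moderate feature dimension $d$; a real implementation would exploit sparsity):

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.ensemble import RandomForestRegressor

def crossfit_residuals(Y, Zfeat, GX, n_folds=2, seed=0):
    """Cross-fitted residuals, eq. (8): for each fold, nuisances are fit
    on the complement and evaluated on the held-out indices.

    Y: (N,) outcomes; Zfeat: (N, d) Walsh features Z(T_j);
    GX: (N, q) numeric summaries of (G_j, X_j) used as nuisance inputs.
    Returns (eY, eZ) with eY_j = Y_j - mu_hat^{(-k)}, eZ_j = Z_j - m_hat^{(-k)}.
    """
    N, d = Zfeat.shape
    eY, eZ = np.empty(N), np.empty((N, d))
    for train, test in KFold(n_folds, shuffle=True, random_state=seed).split(GX):
        mu = RandomForestRegressor(random_state=seed).fit(GX[train], Y[train])
        eY[test] = Y[test] - mu.predict(GX[test])
        for k in range(d):
            m_k = RandomForestRegressor(random_state=seed).fit(GX[train], Zfeat[train, k])
            # truncate m_hat to [-1, 1], matching the convention used in the proofs
            eZ[test, k] = Zfeat[test, k] - np.clip(m_k.predict(GX[test]), -1, 1)
    return eY, eZ
```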
4.1. Localized weighted Lasso for Walsh coefficients

For a target unit $i$, we estimate the coefficient vector at the unit's realized configuration by setting $g = g_i$ and writing $w^{(i)}_j := w_j(g_i)$ (which coincides with (3)). Define the weighted Lasso estimator

$$\hat{\alpha}_i \in \operatorname*{argmin}_{\beta \in \mathbb{R}^{2^p}} \sum_{j=1}^N w^{(i)}_j \big( \hat{\tilde{Y}}_j - \hat{\tilde{Z}}_j^\top \beta \big)^2 + \lambda_i \|\beta\|_1, \qquad (9)$$

where $\lambda_i \asymp \hat{\sigma}\sqrt{\log(2^p)/n_{\mathrm{eff}}(g_i)}$ and $\hat{\sigma}$ is any consistent estimator of the noise scale (e.g., from localized residuals). For any slate $t \in \{-1, 1\}^p$, define the plug-in response estimate $\hat{f}_i(t; g_i, x_i) := \langle \hat{\alpha}_i, Z(t) \rangle$. Then for any $(t, t')$,

$$\hat{\theta}^T_i(t \to t'; g_i) = \langle \hat{\alpha}_i,\ Z(t') - Z(t) \rangle. \qquad (10)$$

To estimate structural and joint effects, repeat (9) with weights centered at a second configuration $g'$ (i.e., replace $w^{(i)}_j$ by $w_j(g')$) to obtain an estimator $\hat{\alpha}_i(g')$, and set

$$\hat{\theta}^G_i(g_i \to g'; t) = \big\langle \hat{\alpha}_i(g') - \hat{\alpha}_i,\ Z(t) \big\rangle, \qquad \hat{\theta}^{G,T}_i\big((g_i, t) \to (g', t')\big) = \langle \hat{\alpha}_i(g'), Z(t') \rangle - \langle \hat{\alpha}_i, Z(t) \rangle. \qquad (11)$$

4.2. Debiased inference for an own-treatment contrast

For a user-specified contrast $(t, t')$, define the direction $v_{t,t'} := Z(t') - Z(t) \in \mathbb{R}^{2^p}$. Let $\hat{\Sigma}(g_i)$ be the weighted Gram matrix at $g = g_i$. Compute an approximate inverse direction $\hat{\gamma}_i$ by any standard high-dimensional procedure (e.g., CLIME or nodewise regression), for instance via

$$\hat{\gamma}_i \in \arg\min_{\gamma \in \mathbb{R}^{2^p}} \|\gamma\|_1 \quad \text{s.t.} \quad \big\| \hat{\Sigma}(g_i)\gamma - v_{t,t'} \big\|_\infty \le \eta_i, \qquad \eta_i \asymp \sqrt{\frac{\log(2^p)}{n_{\mathrm{eff}}(g_i)}}. \qquad (12)$$

Define the debiased estimator

$$\tilde{\theta}^T_i(t \to t'; g_i) = v_{t,t'}^\top \hat{\alpha}_i + \hat{\gamma}_i^\top \sum_{j=1}^N w^{(i)}_j\, \hat{\tilde{Z}}_j \big( \hat{\tilde{Y}}_j - \hat{\tilde{Z}}_j^\top \hat{\alpha}_i \big). \qquad (13)$$

The following section shows that under Assumptions 4.1-4.3, $\tilde{\theta}^T_i$ admits asymptotically normal inference, with an estimated variance obtained from the weighted empirical second moment of the influence term in (13).
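Before turning to theory, a compact sketch of (9) and (13). The weighted Lasso is reduced to a plain Lasso by square-root weight rescaling, and the CLIME program (12) is replaced by a ridge-regularized inverse, a common computational surrogate rather than the paper's exact procedure; constants in the penalty differ from the theoretical $\lambda_i$ by sklearn's internal scaling.

```python
import numpy as np
from sklearn.linear_model import Lasso

def localized_dr_lasso(eY, eZ, w, v, lam=None, ridge=1e-2):
    """Weighted Lasso (9) plus the debiased contrast (13), as a sketch.

    eY, eZ: cross-fitted residuals; w: localization weights (summing to one);
    v: contrast direction v_{t,t'} = Z(t') - Z(t).
    """
    n_eff = 1.0 / np.sum(w ** 2)
    d = eZ.shape[1]
    lam = lam if lam is not None else np.sqrt(np.log(2 * d) / n_eff)
    # sqrt-weight rescaling turns the weighted objective into a plain Lasso
    sw = np.sqrt(w)
    fit = Lasso(alpha=lam, fit_intercept=False).fit(eZ * sw[:, None], eY * sw)
    alpha_hat = fit.coef_
    # ridge-regularized surrogate for the inverse-direction program (12)
    Sigma = (eZ * w[:, None]).T @ eZ
    gamma = np.linalg.solve(Sigma + ridge * np.eye(d), v)
    # one-step debiasing correction, eq. (13)
    score = w * (eY - eZ @ alpha_hat)
    theta_db = v @ alpha_hat + gamma @ (eZ.T @ score)
    return alpha_hat, theta_db, n_eff, gamma
```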
5. Theoretical Analysis

This section develops finite-sample and asymptotic guarantees for the localized estimators in Section 4. The key point is that the statistical difficulty is controlled by the effective local sample size and local spectral sparsity, not by the exponential number of treatment slates.

Write $d := 2^p$ for the ambient Walsh dimension. Fix a target unit $i$, recall the kernel weights $w^{(i)}_j$ in (3), and recall the effective local sample size $n_i := n_{\mathrm{eff}}(g_i) = \big( \sum_{j=1}^N (w^{(i)}_j)^2 \big)^{-1}$. Let $s_i := s(g_i, x_i)$ denote the effective sparsity level from Assumption 4.3. For nuisance estimation, define the sup errors

$$\delta_\mu := \max_{j \in [N]} \big| \hat{\mu}(G_j, X_j) - \mu(G_j, X_j) \big|, \qquad \delta_m := \max_{j \in [N]} \big\| \hat{m}(G_j, X_j) - m(G_j, X_j) \big\|_\infty.$$

Finally, to separate sampling error from localization bias, define

$$\mathrm{bias}_i(b_G) := \sup_{t \in \{-1,1\}^p}\ \sup_{g\,:\, d_R(g, g_i) \le b_G} \big| f(t; g, x_i) - f(t; g_i, x_i) \big|.$$

(Equivalently, covariates can be viewed as additional root marks in the configuration, so localization in $g$ implicitly restricts attention to samples with comparable covariate values.) By Assumption 2.2, for each fixed $(g_i, x_i)$ there exists a finite constant $C_i < \infty$ (depending on $(g_i, x_i)$ but not on $b_G$) such that $\mathrm{bias}_i(b_G) \le C_i b_G$.

Step 1: Orthogonalization removes first-order nuisance effects. We first record the key property that enables valid high-dimensional learning with flexible nuisances: cross-fitting makes nuisance errors enter only at second order.

Lemma 5.1 (Orthogonalization remainder). Let $\hat{\tilde{Y}}_j$ and $\hat{\tilde{Z}}_j$ be the cross-fitted residuals in (8), and define the corresponding oracle residuals $\tilde{Y}_j := Y_j - \mu(G_j, X_j)$ and $\tilde{Z}_j := Z(T_j) - m(G_j, X_j)$. Then, uniformly over $\alpha$ in any fixed $\ell_1$-ball, the difference between the empirical weighted score built from $(\hat{\tilde{Y}}_j, \hat{\tilde{Z}}_j)$ and the one built from $(\tilde{Y}_j, \tilde{Z}_j)$ is of order $O_p(\delta_\mu \delta_m)$. In particular, if $\delta_\mu \delta_m = o_p(1)$, nuisance estimation does not contribute a first-order term.

Lemma 5.1 formalizes why we can combine modern machine-learning nuisances with localized high-dimensional regression: even though $\hat{\mu}$ and $\hat{m}$ may be complex, their errors do not bias the target moment at first order. A common misconception is that "any smoothing or selection invalidates orthogonality"; here the point is precisely that orthogonalization targets the post-localization endogeneity induced by conditioning on realized configurations.

Step 2: Finite-sample rates for localized Walsh learning. We now state a nonasymptotic oracle inequality for the localized weighted Lasso (9). The rate is driven by $(s_i, \log d, n_i)$ plus a transparent localization bias term.

Theorem 5.2 (Finite-sample error for the localized weighted Lasso). Fix a target unit $i$ and let $\hat{\alpha}_i$ be defined by (9) with $\lambda_i \asymp \hat{\sigma}\sqrt{\log(d)/n_i}$. Assume Assumptions 4.2-4.3 and $\delta_\mu \delta_m = o_p(\lambda_i)$. Then, with probability tending to one,

$$\|\hat{\alpha}_i - \alpha(g_i, x_i)\|_1 \lesssim s_i \sqrt{\frac{\log d}{n_i}} + s_i\, \mathrm{bias}_i(b_G) + s_i\, \delta_\mu \delta_m,$$

$$\|\hat{\alpha}_i - \alpha(g_i, x_i)\|_2 \lesssim \sqrt{\frac{s_i \log d}{n_i}} + \mathrm{bias}_i(b_G) + \delta_\mu \delta_m,$$

$$\sum_{j=1}^N w^{(i)}_j \Big( \hat{\tilde{Z}}_j^\top \big( \hat{\alpha}_i - \alpha(g_i, x_i) \big) \Big)^2 \lesssim \frac{s_i \log d}{n_i} + \mathrm{bias}_i(b_G)^2 + (\delta_\mu \delta_m)^2.$$

What this theorem says (and what it does not). Theorem 5.2 is a local result: it quantifies how well we can learn a unit's Walsh coefficients near its realized configuration. It does not claim uniform recovery over all configurations without additional covering arguments. The theorem highlights the intended tradeoff: shrinking $b_G$ reduces $\mathrm{bias}_i(b_G)$ but decreases $n_i$, while increasing $b_G$ increases the effective sample size but introduces localization bias.

Step 3: Consequences for individualized causal contrasts. We next translate coefficient error into error for individualized effects.

Corollary 5.3 (Plug-in error for individualized contrasts). Fix a unit $i$ and consider any pair of slates $(t, t')$. Let $v_{t,t'} := Z(t') - Z(t)$ as in Section 4. Then

$$\big| \hat{\theta}^T_i(t \to t'; g_i) - \theta^T_i(t \to t'; g_i) \big| \le \|v_{t,t'}\|_\infty \|\hat{\alpha}_i - \alpha(g_i, x_i)\|_1 \le 2\, \|\hat{\alpha}_i - \alpha(g_i, x_i)\|_1,$$

and analogously for $\hat{\theta}^G_i$ and $\hat{\theta}^{G,T}_i$ in (11) (with an additional error term contributed by the second fit at $g'$). Consequently, Theorem 5.2 yields

$$\big| \hat{\theta}^T_i(t \to t'; g_i) - \theta^T_i(t \to t'; g_i) \big| \lesssim s_i \sqrt{\frac{\log d}{n_i}} + s_i\, \mathrm{bias}_i(b_G) + s_i\, \delta_\mu \delta_m.$$

Remark 5.4. A common objection is that "the contrast is dense, so shouldn't the error be exponential?" It is tempting (but misleading) to bound the contrast error by $\|v_{t,t'}\|_2 \|\hat{\alpha}_i - \alpha\|_2$, which would scale as $\sqrt{d}$ and obscure the point of the spectral approach. Corollary 5.3 uses the correct geometry: since Walsh characters are uniformly bounded, $\|v_{t,t'}\|_\infty \le 2$, and the relevant control is via the $\ell_1$ error of a sparse vector. This is the mechanism by which the exponential slate space is converted into a high-dimensional but tractable problem, with complexity entering only through $\log d = \log(2^p) = \Theta(p)$.
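A three-line numerical illustration of the geometry in Remark 5.4, with arbitrary illustrative numbers: the $\ell_\infty$-$\ell_1$ Hölder bound stays tight for a sparse error vector, while the $\ell_2$-$\ell_2$ bound inflates by roughly $\sqrt{d}$.

```python
import numpy as np

rng = np.random.default_rng(2)
p = 12
d = 2 ** p
delta = np.zeros(d)
delta[rng.choice(d, 5, replace=False)] = 0.01 * rng.normal(size=5)  # sparse coefficient error
v = rng.choice([-2.0, 0.0, 2.0], size=d)  # entries of Z(t') - Z(t) lie in {-2, 0, 2}

exact = abs(v @ delta)
l1_bound = np.max(np.abs(v)) * np.sum(np.abs(delta))  # <= 2 * ||delta||_1
l2_bound = np.linalg.norm(v) * np.linalg.norm(delta)  # scales like sqrt(d)
print(exact, l1_bound, l2_bound)  # the l2 bound is looser by about sqrt(d)
```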
Step 4: Debiased inference for a user-specified contrast. We now state an asymptotic normality result for the debiased estimator (13). This provides valid uncertainty quantification for individualized contrasts.

Theorem 5.5 (Asymptotic normality of the debiased contrast). Fix a unit $i$ and a contrast $(t, t')$ with direction $v_{t,t'}$. Let $\tilde{\theta}^T_i$ be defined in (13) and let $\gamma^\star_i$ denote the population solution of the local linear system $\Sigma(g_i)\gamma = v_{t,t'}$, where $\Sigma(g_i) := E[w_{g_i}(G)\, \tilde{Z}\tilde{Z}^\top]$ is the population analogue of $\hat{\Sigma}(g_i)$. Assume: (i) $\gamma^\star_i$ exists and is sparse or approximately sparse; (ii) the estimator $\hat{\gamma}_i$ in (12) satisfies $\|\hat{\gamma}_i - \gamma^\star_i\|_1 = o_p(1)$; (iii) $(s_i + \|\gamma^\star_i\|_0)\log d = o(\sqrt{n_i})$ and $\mathrm{bias}_i(b_G) = o(n_i^{-1/2})$; (iv) a central limit theorem holds for the weighted influence sum induced by the localization weights, together with $\delta_\mu \delta_m = o_p(n_i^{-1/2})$ as in Theorem 5.2. Then

$$\sqrt{n_i}\,\Big( \tilde{\theta}^T_i(t \to t'; g_i) - \theta^T_i(t \to t'; g_i) \Big) \Rightarrow \mathcal{N}\big(0,\ \sigma^2_{\theta,i}\big),$$

where $\sigma^2_{\theta,i} = \mathrm{Var}\big( \gamma^{\star\top}_i \tilde{Z}\varepsilon \big)$ and $\varepsilon := \tilde{Y} - \tilde{Z}^\top \alpha(g_i, x_i)$. A consistent variance estimator is

$$\hat{\sigma}^2_{\theta,i} := n_i \sum_{j=1}^N (w^{(i)}_j)^2 \big( \hat{\gamma}_i^\top \hat{\tilde{Z}}_j \hat{\varepsilon}_j \big)^2, \qquad \hat{\varepsilon}_j := \hat{\tilde{Y}}_j - \hat{\tilde{Z}}_j^\top \hat{\alpha}_i.$$

Algorithm 2 Localized DR-Lasso and debiased inference for unit $i$
1: Input: data $\{(Y_j, T_j, X_j, g_j)\}_{j=1}^N$, target unit $i$, kernel $K_G$, bandwidth $b_G$, folds $\{\mathcal{I}_k\}$, contrast $(t, t')$.
2: Compute the weights $w^{(i)}_j$ via (3) and $n_i = \big( \sum_j (w^{(i)}_j)^2 \big)^{-1}$.
3: Cross-fit the nuisances $\hat{\mu}, \hat{m}$ on folds and form the residuals $(\hat{\tilde{Y}}_j, \hat{\tilde{Z}}_j)$ via (8).
4: Solve the weighted Lasso (9) to obtain $\hat{\alpha}_i$ and the plug-in contrast $\hat{\theta}^T_i$ via (10).
5: Compute $\hat{\Sigma}(g_i) = \sum_j w^{(i)}_j \hat{\tilde{Z}}_j \hat{\tilde{Z}}_j^\top$ and solve (12) to obtain $\hat{\gamma}_i$.
6: Output: the debiased estimator $\tilde{\theta}^T_i$ via (13) and a confidence interval using $\hat{\sigma}^2_{\theta,i}$ from Theorem 5.5.

Theorem 5.5 is an individualized inference result: it yields asymptotically valid uncertainty for a user-chosen contrast at a specific unit and configuration neighborhood. It is not a claim of uniform inference over all units and all slates without further structure. The key requirement is that the effective local sample size $n_i$ grows fast enough relative to the local complexity ($s_i$ and the sparsity of $\gamma^\star_i$), so that the debiasing remainder is asymptotically negligible.

Together, Theorems 5.2 and 5.5 show that individualized causal learning under interference can be statistically feasible even with exponentially many slates: the price of combinatorial treatments enters through $\log(2^p) = \Theta(p)$ and local sparsity, while interference is handled through localization and effective sample size. In addition, we defer the discussion of model robustness under local overlap failure (Assumption 2.3), which induces a partial identification bound, to the Appendix.
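As a companion to step 6 of Algorithm 2, the following sketch turns the variance formula of Theorem 5.5 into a confidence interval, using the arrays produced by the earlier sketches; it is our illustrative assembly, not the paper's released code.

```python
import numpy as np
from scipy.stats import norm

def debiased_ci(theta_db, eY, eZ, w, alpha_hat, gamma, level=0.95):
    """Plug-in variance and confidence interval from Theorem 5.5:
    sigma2 = n_eff * sum_j w_j^2 (gamma' eZ_j eps_j)^2, with a normal CI
    of half-width z * sqrt(sigma2 / n_eff).
    """
    n_eff = 1.0 / np.sum(w ** 2)
    eps = eY - eZ @ alpha_hat
    infl = (eZ @ gamma) * eps
    sigma2 = n_eff * np.sum((w * infl) ** 2)
    half = norm.ppf(0.5 + level / 2) * np.sqrt(sigma2 / n_eff)
    return theta_db - half, theta_db + half
```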
6. Experiments

We comprehensively validate the proposed method through synthetic experiments; the settings are described in Appendix A.

Figure 1 shows the comparison of point estimates and 95% confidence intervals for individualized causal contrasts across the three estimators at sample size $N = 500$; the true parameter value is 0. Our method achieves a point estimate of 0.031 with a confidence-interval width of 0.233, substantially outperforming the baseline (point estimate 0.124, width 0.486) and approaching the oracle (point estimate -0.007, width 0.087). This result demonstrates that the proposed localized representation, doubly robust orthogonalization, and sparse spectral learning effectively reduce bias and significantly shrink uncertainty in finite samples, providing direct empirical support for the finite-sample error bounds in Theorem 5.2 and the asymptotic normality in Theorem 5.5.

[Figure 1. Comparison of point estimates and 95% confidence intervals in synthetic experiments ($N = 500$, 100 independent repetitions). The proposed method (width 0.233, bias 0.031) substantially outperforms the baseline (width 0.486, bias 0.124) and approaches the oracle (width 0.087, bias -0.007), providing empirical evidence for Theorems 5.2 and 5.5.]

6.1. Evolution of the proposed estimator

Data are generated according to the model specified in Section 2. The underlying network is an Erdős-Rényi random graph with average degree $d = 8$, inducing heterogeneous interference patterns. Node-level covariates are drawn as $X_i \sim \mathcal{N}(0, I_p)$ with $p = 10$, and combinatorial treatments are assigned uniformly at random, $t_i \in \{-1, 1\}^p$, yielding a $2^{10}$-dimensional Walsh basis expansion. Outcomes follow Assumption 2.4 with $s_i = 3$ active Walsh coefficients and Gaussian noise $\varepsilon_i \sim \mathcal{N}(0, 0.25)$. The performance of the debiased localized Lasso estimator (Algorithm 2) is evaluated across sample sizes $N \in \{50, 100, 200, 500, 1000\}$, with $M = 100$ Monte Carlo repetitions per configuration.
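A minimal sketch of a data-generating process in this spirit is given below. The interference channel (mean neighbor slate) and the coefficient forms are our illustrative simplifications of the rooted-configuration mechanism, not the exact generator used in the paper's experiments.

```python
import numpy as np
import networkx as nx

def simulate(N=500, p=10, avg_deg=8, s=3, noise_sd=0.5, seed=0):
    """Synthetic design in the spirit of Section 6.1: Erdos-Renyi graph,
    uniform slates, sparse first-order Walsh outcomes with neighbor
    interference entering through the mean neighbor slate."""
    rng = np.random.default_rng(seed)
    G = nx.erdos_renyi_graph(N, avg_deg / (N - 1), seed=seed)
    T = rng.choice([-1, 1], size=(N, p))
    X = rng.normal(size=(N, p))
    support = rng.choice(p, size=s, replace=False)   # s active Walsh directions
    coefs = rng.normal(size=s)
    nbr_mean = np.array([T[list(G[i]) or [i]].mean(axis=0) for i in range(N)])
    Y = T[:, support] @ coefs + 0.5 * nbr_mean[:, support] @ coefs \
        + rng.normal(scale=noise_sd, size=N)         # noise variance 0.25
    return G, T, X, Y
```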
Figure 2 summarizes the finite-sample properties of the proposed estimator.

[Figure 2. Convergence of the localized DR-Lasso estimator. Boxplots display the sampling distribution across 100 simulations for each sample size. The horizontal axis indicates the total sample size $N$; the vertical axis measures relative estimation error (normalized to zero). Top annotations: sample mean ± standard deviation; center: sample median (red line); bottom: empirical 95% confidence-interval width ($W = Q_{97.5} - Q_{2.5}$).]

Three patterns emerge that align precisely with the theoretical predictions of Theorems 5.2 and 5.5.

Bias. The median estimates exhibit minimal deviation from the true parameter across all sample sizes, ranging from $-0.012$ ($N = 50$) to $0.002$ ($N = 1000$). This near-zero median bias confirms the asymptotic unbiasedness established in Theorem 5.5, demonstrating that the debiasing correction successfully eliminates the shrinkage bias inherent in naive Lasso estimators.

Convergence rate. The empirical 95% confidence-interval width contracts monotonically from 0.529 at $N = 50$ to 0.119 at $N = 1000$, approximately following the $\sqrt{N}$ rate predicted by Theorem 5.2. Specifically, doubling the sample size roughly halves the standard deviation (e.g., 0.143 at $N = 50$ vs. 0.031 at $N = 1000$), confirming $\sqrt{N}$-consistency.

Coverage. The empirical coverage of the constructed confidence intervals remains within $[0.92, 0.96]$ across all settings, closely tracking the nominal 95% level. This validates the asymptotic normality approximation in Theorem 5.5 even for moderate sample sizes.

Implications. The simulation results corroborate the efficiency bounds derived in Section 5: the proposed estimator achieves oracle-like performance (comparable to the infeasible estimator that observes true interference patterns) while remaining computable via standard convex optimization.

7. Conclusion

In this work, we presented a unified framework for individualized causal inference in networked systems characterized by high-dimensional, combinatorial treatments. By integrating rooted network configurations with doubly robust orthogonalization and sparse spectral learning, our approach constructs a global potential-outcome emulator that remains statistically tractable even as the intervention space grows to $2^p$ dimensions. We provided a principled decomposition of networked causal effects into own-treatment, structural, and joint components, supported by rigorous finite-sample error bounds and asymptotic normality guarantees. Our empirical validation demonstrates that this localized representation effectively mitigates confounding from network positions and reduces estimation bias, approaching oracle-level performance in finite samples. Ultimately, these results establish that individualized causal decision-making is feasible in complex networked environments without the need to collapse or simplify the intervention space.

Impact Statement

This paper presents work whose goal is to advance the field of machine learning. There are many potential societal consequences of our work, none of which we feel must be specifically highlighted here.

References

Agarwal, A., Cen, S. H., Shah, D., and Yu, C. L. Network synthetic interventions: A causal framework for panel data under network interference. arXiv preprint arXiv:2210.11355, 2022.

Agarwal, A., Agarwal, A., and Vijaykumar, S. Synthetic combinations: A causal inference framework for combinatorial interventions. Advances in Neural Information Processing Systems, 36:19195-19216, 2023.

Aronow, P. M. and Samii, C. Estimating average causal effects under interference between units. Annals of Applied Statistics, 11(4):1912-1947, 2017.

Auerbach, E. and Tabord-Meehan, M. The local approach to causal inference under network interference. arXiv preprint arXiv:2105.03810, 2021.

Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J. Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21(1):C1-C68, 2018.

Dasgupta, T. and Pillai, N. Causal inference from $2^k$ factorial designs using potential outcomes. Journal of the Royal Statistical Society: Series B, 77(4):727-753, 2015.

Hudgens, M. G. and Halloran, M. E. Toward causal inference with interference. Journal of the American Statistical Association, 103(482):832-842, 2008.

Künzel, S. R., Sekhon, J. S., Bickel, P. J., and Yu, B. Metalearners for estimating heterogeneous treatment effects using machine learning. Proceedings of the National Academy of Sciences, 116(10):4156-4165, 2019.
Leung, M. P. Approximate randomized experiments in networks. Review of Economics and Statistics, 104(2):318-334, 2022.

Lopez, M. J. and Gutman, R. Estimation of causal effects with multiple treatments: A review and new ideas. Statistical Science, 32(3):432-454, 2017.

Ma, J., Wan, M., Yang, L., Li, J., Hecht, B., and Teevan, J. Learning causal effects on hypergraphs. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 1202-1212, 2022.

Savje, F., Aronow, P. M., and Hudgens, M. G. Average treatment effects in the presence of unknown interference. Annals of Statistics, 49(2):673-701, 2021.

Wager, S. and Athey, S. Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523):1228-1242, 2018.

Zhao, Q. and Ding, P. Bayesian inference for factorial experiments with interference. Biometrika, 108(4):1067-1081, 2021.

A. Simulation setup

The synthetic data are generated strictly according to the model described in Section 2: the network is an Erdős-Rényi random graph (with variable number of nodes $N$ and average degree $d$ controlling interference strength, typically $d = 5$-$10$), covariates $X_i \sim \mathcal{N}(0, I)$ for each node, combinatorial treatments $t_i \in \{-1, 1\}^p$ ($p = 10$, yielding a treatment space of size $2^{10} = 1024$), and potential outcomes expanded in the Walsh-Hadamard basis with coefficients $\alpha_i$ satisfying sparsity (support size $s = 20$) and the local smoothness assumption (bandwidth $b_G = 2$ on rooted network configurations). The interference structure includes own-treatment effects and first-order neighbor interference, with the true causal contrast parameter set to 0. Each experimental setting is independently repeated 100 times to obtain robust statistics (median point estimates, 95% confidence-interval widths, etc.). We compare three estimators. Oracle: an oracle estimator that assumes knowledge of the true potential-outcome model and all nuisance functions, serving only as a theoretical optimum benchmark. Proposed (ours): our method, integrating localized rooted network representation, doubly robust orthogonalization, and sparse spectral learning. Baseline: a strong baseline using a graph-agnostic doubly robust learner (graph-agnostic DR-learner) combined with simple network averaging, representing typical existing techniques for handling network interference or high-dimensional treatments.

B. Proofs for Section 5

B.1. Notation and weighted algebra

Fix a target unit $i$ throughout this appendix and abbreviate the kernel weights $w_j := w^{(i)}_j$. Recall $d := 2^p$ and the effective local sample size

$$n_i := n_{\mathrm{eff}}(g_i) = \Big( \sum_{j=1}^N w_j^2 \Big)^{-1}.$$

For scalar sequences $(a_j)_{j=1}^N$, define the weighted inner product and norm

$$\langle a, b \rangle_w := \sum_{j=1}^N w_j a_j b_j, \qquad \|a\|^2_{w,2} := \sum_{j=1}^N w_j a_j^2.$$

For vector covariates $u_j \in \mathbb{R}^d$, define the weighted Gram matrix

$$\hat{\Sigma}_i(u) := \sum_{j=1}^N w_j u_j u_j^\top.$$

In particular, $\hat{\Sigma}(g_i)$ in Assumption 4.2 equals $\hat{\Sigma}_i(\hat{\tilde{Z}})$ with $u_j = \hat{\tilde{Z}}_j$. For $S \subseteq [d]$ and $c_0 > 0$, define the sparse cone

$$\mathcal{C}(S, c_0) := \big\{ \Delta \in \mathbb{R}^d : \|\Delta_{S^c}\|_1 \le c_0 \|\Delta_S\|_1 \big\}.$$
We interpret Assumption 4.2 as: for the relevant support set $S$, there exists $\kappa_{g_i} > 0$ such that, with probability tending to one,

$$\Delta^\top \hat{\Sigma}_i(\hat{\tilde{Z}})\, \Delta \ge \kappa_{g_i} \|\Delta\|_2^2 \quad \text{for all } \Delta \in \mathcal{C}(S, 3). \qquad (14)$$

B.2. Two elementary lemmas

We start with two basic facts used repeatedly below.

Lemma B.1 (Convex-hull bound for $m(g, x)$). For any $(g, x)$, the conditional mean $m(g, x) = E[Z(T) \mid G = g, X = x]$ lies in the convex hull of $\{Z(t) : t \in \{-1, 1\}^p\} \subseteq \{-1, 1\}^d$. Consequently, for any $a \in \mathbb{R}^d$,

$$\big| a^\top m(g, x) \big| \le \sup_{t \in \{-1,1\}^p} \big| a^\top Z(t) \big|.$$

Proof. By definition, $m(g, x) = \sum_{t \in \{-1,1\}^p} \mathbb{P}(T = t \mid G = g, X = x)\, Z(t)$, a convex combination of the vertices $\{Z(t)\}$. For any fixed $a \in \mathbb{R}^d$, the map $z \mapsto a^\top z$ is linear, hence $a^\top m(g, x)$ lies between the minimum and maximum of $a^\top Z(t)$ over $t$. Applying the same argument to $-a$ yields the stated absolute-value bound.

Lemma B.2 (Weighted sub-Gaussian maximal inequality). Let $(\xi_j)_{j=1}^N$ be conditionally independent given a sigma-field $\mathcal{F}$, with $E[\xi_j \mid \mathcal{F}] = 0$ and conditionally sub-Gaussian tails $E[\exp(\lambda \xi_j) \mid \mathcal{F}] \le \exp(\lambda^2 \sigma^2 / 2)$ for all $\lambda \in \mathbb{R}$. Let $(w_j)_{j=1}^N$ be nonnegative $\mathcal{F}$-measurable weights with $\sum_j w_j = 1$. If $(a_{j,k})_{j,k}$ are $\mathcal{F}$-measurable and satisfy $|a_{j,k}| \le B$ for all $j, k$, then for $S_k := \sum_{j=1}^N w_j a_{j,k} \xi_j$,

$$\mathbb{P}\Big( \max_{1 \le k \le d} |S_k| \ge t \ \Big|\ \mathcal{F} \Big) \le 2d \exp\Big( -\frac{n_i t^2}{2\sigma^2 B^2} \Big), \qquad n_i = \Big( \sum_j w_j^2 \Big)^{-1}.$$

In particular, taking $t = C\sigma B\sqrt{\log d / n_i}$ yields $\max_k |S_k| = O_p\big( \sqrt{\log d / n_i} \big)$.

Proof. Condition on $\mathcal{F}$. Each $S_k$ is a weighted sum of conditionally independent sub-Gaussian variables $w_j a_{j,k} \xi_j$ with proxy variance $\sigma^2 \sum_j w_j^2 a_{j,k}^2 \le \sigma^2 B^2 \sum_j w_j^2 = \sigma^2 B^2 / n_i$. Hence $S_k$ is conditionally sub-Gaussian with that proxy variance, so $\mathbb{P}(|S_k| \ge t \mid \mathcal{F}) \le 2\exp(-n_i t^2 / (2\sigma^2 B^2))$. A union bound over $k \in [d]$ gives the displayed inequality.
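A quick Monte Carlo sanity check of Lemma B.2, with illustrative dimensions of our choosing: the empirical maximum of the weighted sums stays below the sub-Gaussian envelope $\sigma B \sqrt{2\log(2d)/n_i}$ (the threshold at which the stated tail bound equals one).

```python
import numpy as np

rng = np.random.default_rng(3)
N, d, reps = 2000, 256, 200
w = rng.dirichlet(np.ones(N))             # nonnegative weights summing to one
n_eff = 1.0 / np.sum(w ** 2)
a = rng.choice([-2.0, 2.0], size=(N, d))  # bounded coefficients, B = 2

# S_k = sum_j w_j a_{j,k} xi_j with standard normal xi (sigma = 1)
maxima = [np.max(np.abs(w @ (a * rng.normal(size=(N, 1))))) for _ in range(reps)]
print(np.mean(maxima), 2 * np.sqrt(2 * np.log(2 * d) / n_eff))
```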
B.3. Proof of Lemma 5.1 (orthogonalization remainder)

We first prove a more explicit inequality; Lemma 5.1 follows as a corollary.

Lemma B.3 (Empirical score perturbation bound). Fix a fold partition and a target unit $i$. For each $j$, let

$$\Delta_{\mu j} := \hat{\mu}(G_j, X_j) - \mu(G_j, X_j), \qquad \Delta_{m j} := \hat{m}(G_j, X_j) - m(G_j, X_j),$$

so that $\hat{\tilde{Y}}_j = \tilde{Y}_j - \Delta_{\mu j}$ and $\hat{\tilde{Z}}_j = \tilde{Z}_j - \Delta_{m j}$. For any $\alpha \in \mathbb{R}^d$, define the empirical weighted scores

$$\hat{\Psi}_i(\alpha) := \sum_{j=1}^N w_j\, \hat{\tilde{Z}}_j\big( \hat{\tilde{Y}}_j - \hat{\tilde{Z}}_j^\top \alpha \big), \qquad \Psi^\circ_i(\alpha) := \sum_{j=1}^N w_j\, \tilde{Z}_j\big( \tilde{Y}_j - \tilde{Z}_j^\top \alpha \big).$$

Then, for any $\alpha$ with $\|\alpha\|_1 \le M$, $\|\hat{\Psi}_i(\alpha) - \Psi^\circ_i(\alpha)\|_\infty \le (I) + (II) + (III) + (IV) + (V)$, where

$$(I) := \Big\| \sum_j w_j \tilde{Z}_j \Delta_{\mu j} \Big\|_\infty, \quad (II) := \Big\| \sum_j w_j \tilde{Z}_j (\Delta_{m j}^\top \alpha) \Big\|_\infty, \quad (III) := \Big\| \sum_j w_j \Delta_{m j}\big( \tilde{Y}_j - \tilde{Z}_j^\top \alpha \big) \Big\|_\infty,$$

$$(IV) := \Big\| \sum_j w_j \Delta_{m j} \Delta_{\mu j} \Big\|_\infty, \quad (V) := \Big\| \sum_j w_j \Delta_{m j} (\Delta_{m j}^\top \alpha) \Big\|_\infty.$$

Moreover, $(IV) \le \delta_\mu \delta_m$ and $(V) \le M\delta_m^2$. If, in addition, $\|\tilde{Z}_j\|_\infty \le 2$ almost surely and $\tilde{Y}_j - \tilde{Z}_j^\top \alpha$ is conditionally sub-Gaussian given $(G_j, X_j, T_j)$ with proxy $\sigma^2$, then

$$(I) + (II) + (III) = O_p\Big( (\delta_\mu + \delta_m)\sqrt{\frac{\log d}{n_i}} \Big).$$

Proof. The decomposition is a direct expansion:

$$\hat{\tilde{Z}}_j\big( \hat{\tilde{Y}}_j - \hat{\tilde{Z}}_j^\top \alpha \big) - \tilde{Z}_j\big( \tilde{Y}_j - \tilde{Z}_j^\top \alpha \big) = -(\tilde{Z}_j - \Delta_{m j})\Delta_{\mu j} + \tilde{Z}_j(\Delta_{m j}^\top \alpha) - \Delta_{m j}\big( \tilde{Y}_j - \tilde{Z}_j^\top \alpha \big) + \Delta_{m j}\Delta_{\mu j} - \Delta_{m j}(\Delta_{m j}^\top \alpha).$$

Summing with weights and taking $\ell_\infty$ norms yields the first claim. For $(IV)$, each coordinate satisfies $|\sum_j w_j \Delta_{m j,k}\Delta_{\mu j}| \le \sum_j w_j \delta_m \delta_\mu = \delta_\mu \delta_m$. For $(V)$, $|\Delta_{m j,k}(\Delta_{m j}^\top \alpha)| \le \|\Delta_{m j}\|_\infty^2 \|\alpha\|_1 \le \delta_m^2 M$, hence $(V) \le M\delta_m^2$. For the stochastic bound on $(I)$-$(III)$, note that conditional on the training folds, $\Delta_{\mu j}$ and $\Delta_{m j}$ are fixed (cross-fitting), and each summand is a weighted sum of mean-zero sub-Gaussian terms with coefficients bounded by 2. Lemma B.2 (with $B = 2$) and a union bound over $d$ coordinates yield $(I) + (II) + (III) = O_p\big( (\delta_\mu + \delta_m)\sqrt{\log d / n_i} \big)$.

Proof of Lemma 5.1. Lemma B.3 implies, uniformly over $\|\alpha\|_1 \le M$,

$$\|\hat{\Psi}_i(\alpha) - \Psi^\circ_i(\alpha)\|_\infty = O_p(\delta_\mu \delta_m) + O_p\Big( (\delta_\mu + \delta_m)\sqrt{\frac{\log d}{n_i}} \Big) + O_p(\delta_m^2).$$

Under Assumption 4.1, $\delta_\mu = o_p(1)$ and $\delta_m = o_p(1)$. If moreover $n_i \to \infty$ and $\log d = o(n_i)$ (as required by Assumption 4.3), then the latter two terms are $o_p(1)$. The leading deterministic second-order term is $\delta_\mu \delta_m$, which is the sense in which nuisance estimation enters only at second order.

B.4. Proof of Theorem 5.2 (finite-sample weighted Lasso bounds)

We work conditionally on the realized weights $w_j$ and the fold assignment. Let $\alpha^\star_i := \alpha(g_i, x_i)$ be the target coefficient vector. Define the fitted residuals $\hat{\varepsilon}_j(\alpha) := \hat{\tilde{Y}}_j - \hat{\tilde{Z}}_j^\top \alpha$ and $\hat{\varepsilon}_j := \hat{\varepsilon}_j(\hat{\alpha}_i)$.

Step 0: localization bias as a deterministic perturbation. For $j$ with $w_j > 0$, the kernel construction implies $d_R(g_j, g_i) \le b_G$. Define the pointwise response difference $\Delta f_{ij}(t) := f(t; g_j, x_i) - f(t; g_i, x_i)$. Then, by definition of $\mathrm{bias}_i(b_G)$, $\sup_t |\Delta f_{ij}(t)| \le \mathrm{bias}_i(b_G)$ for all such $j$. Using Lemma B.1 and the identity $\mu(g, x) = \alpha(g, x)^\top m(g, x)$, one verifies that for any $j$ with $w_j > 0$,

$$\big| \big( \alpha(g_j, x_i) - \alpha(g_i, x_i) \big)^\top \tilde{Z}_j \big| \le 2\, \mathrm{bias}_i(b_G). \qquad (15)$$

Indeed,

$$\big( \alpha(g_j, x_i) - \alpha(g_i, x_i) \big)^\top \tilde{Z}_j = \big( \alpha(g_j, x_i) - \alpha(g_i, x_i) \big)^\top Z(T_j) - \big( \alpha(g_j, x_i) - \alpha(g_i, x_i) \big)^\top m(g_j, x_i),$$

and both terms are bounded by $\sup_t |(\alpha(g_j, x_i) - \alpha(g_i, x_i))^\top Z(t)| = \sup_t |\Delta f_{ij}(t)| \le \mathrm{bias}_i(b_G)$, where the second term uses Lemma B.1. Hence (15) holds.

Step 1: a weighted basic inequality. By optimality of $\hat{\alpha}_i$ in (9), for any $\beta \in \mathbb{R}^d$,

$$\sum_{j=1}^N w_j \big( \hat{\tilde{Y}}_j - \hat{\tilde{Z}}_j^\top \hat{\alpha}_i \big)^2 + \lambda_i \|\hat{\alpha}_i\|_1 \le \sum_{j=1}^N w_j \big( \hat{\tilde{Y}}_j - \hat{\tilde{Z}}_j^\top \beta \big)^2 + \lambda_i \|\beta\|_1. \qquad (16)$$

Take $\beta = \alpha^\star_i$ and define $\Delta := \hat{\alpha}_i - \alpha^\star_i$. Expanding the squares yields

$$\Delta^\top \hat{\Sigma}_i(\hat{\tilde{Z}})\, \Delta \le 2 \Big\| \sum_{j=1}^N w_j\, \hat{\tilde{Z}}_j\, \hat{\varepsilon}_j(\alpha^\star_i) \Big\|_\infty \|\Delta\|_1 + \lambda_i \big( \|\alpha^\star_i\|_1 - \|\alpha^\star_i + \Delta\|_1 \big). \qquad (17)$$

Step 2: controlling the score term. Decompose

$$\hat{\varepsilon}_j(\alpha^\star_i) = \underbrace{\tilde{Y}_j - \tilde{Z}_j^\top \alpha^\star_i}_{=: u_j} + \underbrace{\big( \hat{\tilde{Y}}_j - \tilde{Y}_j \big) - \big( \hat{\tilde{Z}}_j - \tilde{Z}_j \big)^\top \alpha^\star_i}_{=: r_j} + \underbrace{\big( \tilde{Z}_j - \hat{\tilde{Z}}_j \big)^\top \big( \alpha(G_j, X_j) - \alpha^\star_i \big)}_{=: b_j},$$

where $u_j$ is the oracle residual at the target coefficient, $r_j$ is the nuisance-induced perturbation, and $b_j$ is a higher-order mixed term. By (15) and $\|\hat{\tilde{Z}}_j - \tilde{Z}_j\|_\infty = \|\Delta_{m j}\|_\infty \le \delta_m$,

$$|b_j| \le \|\hat{\tilde{Z}}_j - \tilde{Z}_j\|_\infty \|\alpha(G_j, X_j) - \alpha^\star_i\|_1 \le 2\delta_m\, \mathrm{bias}_i(b_G),$$

where the last step uses the same convex-hull argument as in (15). Moreover, Lemma B.3 implies

$$\Big\| \sum_{j=1}^N w_j\, \hat{\tilde{Z}}_j r_j \Big\|_\infty = O_p(\delta_\mu \delta_m) + O_p\Big( (\delta_\mu + \delta_m)\sqrt{\frac{\log d}{n_i}} \Big).$$
Finally, write $u_j = \varepsilon_j + \tilde{Z}_j^\top\big( \alpha(G_j, X_j) - \alpha^\star_i \big)$, where $\varepsilon_j := \tilde{Y}_j - \tilde{Z}_j^\top \alpha(G_j, X_j)$ is the regression noise. By (15), the deterministic part obeys $|\tilde{Z}_j^\top(\alpha(G_j, X_j) - \alpha^\star_i)| \le 2\, \mathrm{bias}_i(b_G)$ for all $j$ with $w_j > 0$. Therefore,

$$\Big\| \sum_{j=1}^N w_j\, \hat{\tilde{Z}}_j u_j \Big\|_\infty \le \underbrace{\Big\| \sum_{j=1}^N w_j\, \hat{\tilde{Z}}_j \varepsilon_j \Big\|_\infty}_{(\star)} + 2\, \mathrm{bias}_i(b_G)\, \Big\| \sum_{j=1}^N w_j\, \hat{\tilde{Z}}_j \Big\|_\infty.$$

Since $Z(T)$ is $\{-1, 1\}^d$-valued and $m(\cdot, \cdot)$ is a conditional mean, $\|\tilde{Z}_j\|_\infty \le 2$, and we may assume $\hat{m}$ is truncated to $[-1, 1]^d$ so that $\|\hat{\tilde{Z}}_j\|_\infty \le 2$ as well. Applying Lemma B.2 to $(\star)$ gives, with probability tending to one, $(\star) \lesssim \sigma\sqrt{\log d / n_i}$. Collecting bounds yields

$$\Big\| \sum_{j=1}^N w_j\, \hat{\tilde{Z}}_j\, \hat{\varepsilon}_j(\alpha^\star_i) \Big\|_\infty \lesssim \sigma\sqrt{\frac{\log d}{n_i}} + \mathrm{bias}_i(b_G) + \delta_\mu \delta_m, \qquad (18)$$

up to terms that are $o_p(\lambda_i)$ under Assumption 4.1.

Step 3: cone condition and $\ell_2/\ell_1$ bounds. Let $S$ be the index set of the $s_i$ largest coordinates of $\alpha^\star_i$ in magnitude. Standard arguments (triangle inequality) give

$$\|\alpha^\star_i\|_1 - \|\alpha^\star_i + \Delta\|_1 \le \|\Delta_S\|_1 - \|\Delta_{S^c}\|_1 + 2\|\alpha^\star_{i,S^c}\|_1.$$

Plugging this and (18) into (17) yields

$$\Delta^\top \hat{\Sigma}_i(\hat{\tilde{Z}})\, \Delta \le 2\Lambda_i \|\Delta\|_1 + \lambda_i\big( \|\Delta_S\|_1 - \|\Delta_{S^c}\|_1 + 2\|\alpha^\star_{i,S^c}\|_1 \big),$$

where $\Lambda_i \lesssim \sigma\sqrt{\log d / n_i} + \mathrm{bias}_i(b_G) + \delta_\mu \delta_m$. Choosing $\lambda_i$ so that $\lambda_i \gtrsim \sigma\sqrt{\log d / n_i}$ ensures the stochastic part is dominated by $\lambda_i$, and thus

$$\Delta^\top \hat{\Sigma}_i(\hat{\tilde{Z}})\, \Delta \le C_1 \lambda_i \|\Delta_S\|_1 - C_2 \lambda_i \|\Delta_{S^c}\|_1 + C_3 \lambda_i \|\alpha^\star_{i,S^c}\|_1 + C_4\big( \mathrm{bias}_i(b_G) + \delta_\mu \delta_m \big) \|\Delta\|_1,$$

for universal constants $C_k > 0$. Rearranging yields the cone condition $\Delta \in \mathcal{C}(S, 3)$ up to the approximation term $\|\alpha^\star_{i,S^c}\|_1$ (which is controlled by Assumption 4.3). On the event (14), we therefore have

$$\kappa_{g_i} \|\Delta\|_2^2 \le \Delta^\top \hat{\Sigma}_i(\hat{\tilde{Z}})\, \Delta \lesssim \lambda_i \|\Delta_S\|_1 + \big( \mathrm{bias}_i(b_G) + \delta_\mu \delta_m \big) \|\Delta\|_1 + \lambda_i \|\alpha^\star_{i,S^c}\|_1.$$

Using $\|\Delta_S\|_1 \le \sqrt{s_i}\,\|\Delta\|_2$ and $\|\Delta\|_1 \le 4\|\Delta_S\|_1 + 2\|\alpha^\star_{i,S^c}\|_1$ under the cone condition gives

$$\|\Delta\|_2 \lesssim \sqrt{s_i}\,\lambda_i + \mathrm{bias}_i(b_G) + \delta_\mu \delta_m + \frac{\|\alpha^\star_{i,S^c}\|_1}{\sqrt{s_i}}.$$

Similarly, $\|\Delta\|_1 \lesssim s_i \lambda_i + s_i\, \mathrm{bias}_i(b_G) + s_i\, \delta_\mu \delta_m + \|\alpha^\star_{i,S^c}\|_1$. Under Assumption 4.3, the approximation terms involving $\|\alpha^\star_{i,S^c}\|_1$ are dominated by the displayed rates (this is the standard "effective sparsity" interpretation). Substituting $\lambda_i \asymp \hat{\sigma}\sqrt{\log d / n_i}$ yields the $\ell_1$ and $\ell_2$ bounds of Theorem 5.2.

Step 4: prediction error bound. From (17) and the previous steps,

$$\sum_{j=1}^N w_j \big( \hat{\tilde{Z}}_j^\top \Delta \big)^2 = \Delta^\top \hat{\Sigma}_i(\hat{\tilde{Z}})\, \Delta \lesssim s_i \lambda_i^2 + \mathrm{bias}_i(b_G)^2 + (\delta_\mu \delta_m)^2,$$

which gives the weighted prediction bound of Theorem 5.2. This completes the proof.

B.5. Proof of Corollary 5.3

Proof. For own-treatment contrasts, $\hat{\theta}^T_i(t \to t'; g_i) - \theta^T_i(t \to t'; g_i) = \langle \hat{\alpha}_i - \alpha(g_i, x_i), v_{t,t'} \rangle$. By Hölder's inequality, $|\langle \hat{\alpha}_i - \alpha(g_i, x_i), v_{t,t'} \rangle| \le \|\hat{\alpha}_i - \alpha(g_i, x_i)\|_1 \|v_{t,t'}\|_\infty$. Since each Walsh coordinate is $\pm 1$, $v_{t,t'} = Z(t') - Z(t)$ has entries in $\{-2, 0, 2\}$ and hence $\|v_{t,t'}\|_\infty \le 2$. The stated bound follows by invoking Theorem 5.2. The structural and joint contrasts follow by the same argument, applied to the two localized fits at $g_i$ and $g'$.

B.6. Proof of Theorem 5.5

Proof. We prove an asymptotic linear expansion and then invoke the weighted CLT assumed in Theorem 5.5(iv).

Step 1: an exact decomposition. Let $\alpha^\star_i := \alpha(g_i, x_i)$ and $\Delta := \hat{\alpha}_i - \alpha^\star_i$. Write $\hat{\Sigma}_i := \hat{\Sigma}(g_i) = \sum_j w_j\, \hat{\tilde{Z}}_j \hat{\tilde{Z}}_j^\top$.
From the definition (13),

$$\tilde{\theta}^T_i(t \to t'; g_i) = v_{t,t'}^\top \hat{\alpha}_i + \hat{\gamma}_i^\top \sum_{j=1}^N w_j\, \hat{\tilde{Z}}_j\big( \hat{\tilde{Y}}_j - \hat{\tilde{Z}}_j^\top \hat{\alpha}_i \big) = v_{t,t'}^\top \alpha^\star_i + \underbrace{\hat{\gamma}_i^\top \sum_{j=1}^N w_j\, \hat{\tilde{Z}}_j\big( \hat{\tilde{Y}}_j - \hat{\tilde{Z}}_j^\top \alpha^\star_i \big)}_{=: A_1} + \underbrace{\big( v_{t,t'}^\top - \hat{\gamma}_i^\top \hat{\Sigma}_i \big)\Delta}_{=: A_2}. \qquad (19)$$

Since $\theta^T_i(t \to t'; g_i) = v_{t,t'}^\top \alpha^\star_i$, it remains to analyze $A_1$ and $A_2$.

Step 2: controlling the debiasing remainder $A_2$. By the feasibility constraint in (12), $\|\hat{\Sigma}_i \hat{\gamma}_i - v_{t,t'}\|_\infty \le \eta_i$. Hence, by Hölder's inequality, $|A_2| \le \eta_i \|\Delta\|_1$. Theorem 5.2 gives $\|\Delta\|_1 \lesssim s_i\sqrt{\log d / n_i} + s_i\, \mathrm{bias}_i(b_G) + s_i\, \delta_\mu \delta_m$. With $\eta_i \asymp \sqrt{\log d / n_i}$, we obtain

$$|A_2| \lesssim \frac{s_i \log d}{n_i} + s_i\, \mathrm{bias}_i(b_G)\sqrt{\frac{\log d}{n_i}} + s_i\, \delta_\mu \delta_m \sqrt{\frac{\log d}{n_i}}. \qquad (20)$$

Under the growth condition in Theorem 5.5(iii), $s_i \log d = o(\sqrt{n_i})$ and $\mathrm{bias}_i(b_G) = o(n_i^{-1/2})$. If additionally $\delta_\mu \delta_m = o_p(n_i^{-1/2})$ (the standard DML second-order condition), then $\sqrt{n_i}\, A_2 = o_p(1)$.

Step 3: asymptotic linearity of $A_1$. Decompose

$$\hat{\tilde{Y}}_j - \hat{\tilde{Z}}_j^\top \alpha^\star_i = \underbrace{\tilde{Y}_j - \tilde{Z}_j^\top \alpha^\star_i}_{=: u_j} + \underbrace{\big( \hat{\tilde{Y}}_j - \tilde{Y}_j \big) - \big( \hat{\tilde{Z}}_j - \tilde{Z}_j \big)^\top \alpha^\star_i}_{=: r_j}.$$

Thus $A_1 = \hat{\gamma}_i^\top \sum_j w_j \hat{\tilde{Z}}_j u_j + \hat{\gamma}_i^\top \sum_j w_j \hat{\tilde{Z}}_j r_j =: A_{1a} + A_{1b}$. For $A_{1b}$, Lemma B.3 implies $\|\sum_j w_j \hat{\tilde{Z}}_j r_j\|_\infty = O_p(\delta_\mu \delta_m) + o_p(n_i^{-1/2})$ under Assumption 4.1, hence $A_{1b} = O_p(\|\hat{\gamma}_i\|_1 \delta_\mu \delta_m) + o_p(n_i^{-1/2})$. In particular, if $\|\hat{\gamma}_i\|_1 = O_p(1)$ and $\delta_\mu \delta_m = o_p(n_i^{-1/2})$, then $\sqrt{n_i}\, A_{1b} = o_p(1)$.

For $A_{1a}$, write $\hat{\gamma}_i = \gamma^\star_i + (\hat{\gamma}_i - \gamma^\star_i)$ and obtain

$$A_{1a} = \gamma^{\star\top}_i \sum_{j=1}^N w_j\, \hat{\tilde{Z}}_j u_j + (\hat{\gamma}_i - \gamma^\star_i)^\top \sum_{j=1}^N w_j\, \hat{\tilde{Z}}_j u_j =: B_1 + B_2.$$

The second term $B_2$ is controlled by the standard CLIME/nodewise rate. Under the sparsity condition $\|\gamma^\star_i\|_0 \le s_{\gamma,i}$ and standard conditions for CLIME, one has $\|\hat{\gamma}_i - \gamma^\star_i\|_1 = O_p\big( s_{\gamma,i}\sqrt{\log d / n_i} \big)$ and $\|\sum_j w_j \hat{\tilde{Z}}_j u_j\|_\infty = O_p\big( \sqrt{\log d / n_i} + \mathrm{bias}_i(b_G) \big)$, so that

$$B_2 = O_p\Big( \frac{s_{\gamma,i}\log d}{n_i} \Big) + O_p\Big( s_{\gamma,i}\, \mathrm{bias}_i(b_G)\sqrt{\frac{\log d}{n_i}} \Big).$$

Under Theorem 5.5(iii) and $\mathrm{bias}_i(b_G) = o(n_i^{-1/2})$, this gives $\sqrt{n_i}\, B_2 = o_p(1)$.

It remains to identify $B_1$. Write $u_j = \varepsilon_j + b_{ij}$, where $\varepsilon_j := \tilde{Y}_j - \tilde{Z}_j^\top \alpha(G_j, X_j)$ is the regression noise and $b_{ij} := \tilde{Z}_j^\top(\alpha(G_j, X_j) - \alpha^\star_i)$ is the localization bias term. By (15), $|b_{ij}| \le 2\, \mathrm{bias}_i(b_G)$ for all $j$ with $w_j > 0$. Hence

$$B_1 = \gamma^{\star\top}_i \sum_{j=1}^N w_j\, \hat{\tilde{Z}}_j \varepsilon_j + \gamma^{\star\top}_i \sum_{j=1}^N w_j\, \hat{\tilde{Z}}_j b_{ij} =: C_1 + C_2.$$

The deterministic term $C_2$ is bounded by $|C_2| \le 2\, \mathrm{bias}_i(b_G)\, \|\gamma^\star_i\|_1 \|\sum_j w_j \hat{\tilde{Z}}_j\|_\infty$, and Lemma B.2 gives $\|\sum_j w_j \hat{\tilde{Z}}_j\|_\infty = O_p(\sqrt{\log d / n_i})$. Thus $\sqrt{n_i}\, C_2 = o_p(1)$ if $\mathrm{bias}_i(b_G) = o(n_i^{-1/2})$.

The leading term is therefore $C_1$. Replacing $\hat{\tilde{Z}}_j$ by $\tilde{Z}_j$ only incurs a nuisance remainder of order $\delta_m$ times a sub-Gaussian weighted average, hence is $o_p(n_i^{-1/2})$ under Assumption 4.1. Consequently,

$$\sqrt{n_i}\, C_1 = \sqrt{n_i}\, \gamma^{\star\top}_i \sum_{j=1}^N w_j\, \tilde{Z}_j \varepsilon_j + o_p(1).$$

By Theorem 5.5(iv) (the weighted CLT under the dependence induced by localization),

$$\sqrt{n_i}\, \gamma^{\star\top}_i \sum_{j=1}^N w_j\, \tilde{Z}_j \varepsilon_j \Rightarrow \mathcal{N}\big(0, \sigma^2_{\theta,i}\big), \qquad \sigma^2_{\theta,i} = \mathrm{Var}\big( \gamma^{\star\top}_i \tilde{Z}\varepsilon \big).$$

Combining Steps 1-3 yields the claimed asymptotic normality.

Step 4: variance estimator consistency. Define $\hat{\varepsilon}_j := \hat{\tilde{Y}}_j - \hat{\tilde{Z}}_j^\top \hat{\alpha}_i$ and

$$\hat{\sigma}^2_{\theta,i} := n_i \sum_{j=1}^N w_j^2 \big( \hat{\gamma}_i^\top \hat{\tilde{Z}}_j \hat{\varepsilon}_j \big)^2.$$
Under the same bounds used above (consistency of $\hat{\alpha}_i$ and $\hat{\gamma}_i$, bounded $\ell_1$ norms, and $\delta_\mu, \delta_m = o_p(1)$), the difference between $\hat{\sigma}^2_{\theta,i}$ and the corresponding oracle plug-in based on $(\gamma^\star_i, \tilde{Z}, \varepsilon)$ is $o_p(1)$, while the oracle plug-in converges to $\sigma^2_{\theta,i}$ by the law of large numbers for the weighted second moment (or its dependence-robust analogue implied by (iv)). This establishes $\hat{\sigma}^2_{\theta,i} \to_p \sigma^2_{\theta,i}$.

B.7. Proof of Proposition A.1 (partial identification bound)

Proof. By the assumed $L$-Lipschitz property of $t \mapsto f(t; g, x)$ with respect to Hamming distance, $|f(t'; g, x) - f(t; g, x)| \le L\, d_H(t, t')$. The left-hand side equals $|\theta^T(t \to t'; g)|$ by definition, proving the claim.