balnet: Pathwise Estimation of Covariate Balancing Propensity Scores

Erik Sverdrup (1), Trevor Hastie (2)
(1) Department of Econometrics & Business Statistics, Monash University
(2) Department of Statistics, Stanford University

Abstract

We present balnet, an R package for scalable pathwise estimation of covariate balancing propensity scores via logistic covariate balancing loss functions. Regularization paths are computed with the generic elastic net solver of Yang and Hastie (2024), supporting convex losses with non-smooth penalties, as well as group penalties and feature-specific penalty factors. For lasso penalization, balnet computes a regularized balance path from the largest observed covariate imbalance to a user-specified fraction of this maximum. We illustrate the method with an application to spatial pixel-level balancing for constructing synthetic control weights for the average treatment effect on the treated, using satellite data on wildfires.

1 Introduction

The propensity score plays a central role in estimating causal effects in observational settings (Rosenbaum and Rubin, 1983). It is defined as the probability that an individual with a given pre-treatment covariate profile is assigned to the treatment, and is frequently estimated using logistic regression. In high-dimensional settings, such as applications with many confounders, rich basis expansions, or higher-order interactions, a standard approach is to estimate the propensity score using lasso (Tibshirani, 1996) or elastic net penalized logistic regression (Zou and Hastie, 2005), as implemented in the widely used R package glmnet (Friedman et al., 2010).

A primary use of the propensity score is inverse probability weighting (IPW), where outcomes are reweighted to construct treated and control groups that are comparable in terms of their pre-treatment covariates.
Under the true (oracle) propensity score, IPW exactly balances covariate means between treatment arms and the overall population. However, even without model misspecification, this exact balancing property does not generally hold for propensity scores estimated using standard maximum likelihood logistic regression. Maximum likelihood estimation prioritizes accurate prediction of treatment assignment, rather than covariate balance. Imai and Ratkovic (2014) introduced the notion of covariate balancing propensity scores, propensity score estimators explicitly designed to optimize covariate balance. Causal effects estimated using these kinds of weights are more robust to model misspecification, and more efficient than approaches that estimate propensity scores via maximum likelihood (Ben-Michael et al., 2021; Chattopadhyay et al., 2020).

A number of methods have been proposed to directly target covariate balance. These include the covariate balancing propensity score estimator, CBPS (Fong et al., 2022), which enforces moment conditions via generalized method of moments, and entropy balancing, ebal (Hainmueller, 2012), which directly optimizes weights to achieve balance. Such methods may achieve exact balance in the relatively small datasets commonly encountered in the social sciences. However, in large-scale datasets, these approaches can be computationally challenging to scale and are often infeasible, as exact balance may not be attainable [1]. The stable balancing weights approach, sbw (Zubizarreta, 2015), relaxes the requirement of exact balance and instead optimizes weights to satisfy balance levels pre-specified by the researcher. While this approach is effective, it requires the user to choose these levels a priori.
If the chosen level is infeasible for a given dataset, or if a different bias-variance tradeoff is desired, the underlying optimization problem must be re-solved. In practice, this leads to iterative tuning between balance constraints and downstream model validation (Keele et al., 2025).

Motivated by these challenges, we seek an algorithmic solution that simplifies the selection of balance levels in a manner analogous to regularization paths for penalized generalized linear models. Our approach builds on recent work that unifies many balance-oriented estimators through covariate balancing loss functions (Zhao, 2019). For the logistic link, Wager (2024, Chapter 7) notes (emphasis added):

    ... if we believe in a linear-logistic specification and want to use an IPW estimator, then we should learn the propensity model by minimizing the *covariate-balancing loss function* rather than by the usual maximum likelihood loss used for logistic regression.

This observation motivates the use of covariate balancing loss functions with modern regularization techniques. The penalized versions of these loss functions correspond to the primal of optimization problems that constrain covariate imbalance (Ben-Michael et al., 2021; Tan, 2020; Zhao, 2019). As a consequence, the interpretation of the penalty parameter differs fundamentally from that in penalized logistic regression, where it acts as a coefficient budget. For lasso in particular, the penalty parameter directly controls the maximum allowable absolute imbalance across covariates, offering an intuitive metric to diagnose overlap issues, as well as to monitor solver progress in applications.

To our knowledge, the closest existing software is RCAL (Tan and Sun, 2020), which implements lasso regularization for covariate balancing loss functions. However, because this solver is written entirely in R, it can be difficult to scale to large datasets.
This limitation is particularly salient in applications such as spatial or pixel-level balancing, where both the number of observations and the number of covariates are large.

[1] A closely related approach from the survey literature, available in the survey package (Lumley, 2004), is raking, which calibrates survey weights so that sample marginal distributions match known population margins.

Our package, balnet for R (R Core Team, 2024), addresses these limitations by combining covariate balancing loss functions with modern optimization techniques for large-scale regularization problems. Our approach uses proximal quasi-Newton methods for generic convex losses, including algorithmic refinements for coordinate descent such as screening and active-set rules (Friedman et al., 2010; Simon et al., 2013; Tibshirani et al., 2012). The core algorithm builds on the adelie C++ elastic net solver by Yang and Hastie (2024), interfaced with R via Rcpp (Eddelbuettel and François, 2011), and relies on numerical linear algebra routines from Eigen (Guennebaud et al., 2010). In an application from Wu et al. (2023) involving spatial balancing with about 140,000 observations and 500 covariates, balnet computes the full regularization path, from raw, unweighted imbalance down to commonly accepted balance thresholds, in under one minute on a standard laptop.

2 Regularization paths for covariate balancing propensity scores

Let $Y_i(1)$ and $Y_i(0)$ denote the potential outcomes under treatment and control, respectively (Imbens and Rubin, 2015). The realized treatment assignment is $W_i \in \{0, 1\}$ and the observed outcome is $Y_i = Y_i(W_i)$, for a sample of units $i = 1, \ldots, n$. In observational settings, a common identifying assumption is unconfoundedness: conditional on a set of pre-treatment covariates $X_i \in \mathbb{R}^p$, treatment assignment is as good as random.
Under this assumption, a central object is the propensity score,

$$ e(x) = P[W_i = 1 \mid X_i = x]. $$

Given suitable overlap conditions, potential outcome means are identified via inverse-propensity weighted (IPW) outcomes. We begin by considering the treated mean; extensions to the average treatment effect (ATE) and the average treatment effect on the treated (ATT) follow by symmetry. The treated mean is identified by

$$ E[Y_i(1)] = E\left[ \frac{W_i Y_i}{e(X_i)} \right]. $$

The oracle propensity score also satisfies population-level covariate balance,

$$ E\left[ \frac{W_i X_{ij}}{e(X_i)} \right] = E[X_{ij}], \quad j = 1, \ldots, p. $$

For notational simplicity, we describe the setting in terms of linear dependence on $X_i$; the discussion extends directly to nonlinear specifications via basis expansions and interactions. Under a logistic propensity score model,

$$ e(x) = \frac{1}{1 + \exp\left(-(\beta_0 + x^\top \beta)\right)}. $$

If $\beta$ is estimated by maximum likelihood using the standard Bernoulli log-likelihood, the gradient equations are

$$ \frac{1}{n} \sum_{i=1}^{n} \left( W_i - \hat{e}(X_i) \right) X_{ij} = 0, \quad j = 1, \ldots, p, \tag{1} $$

where $\hat{e}(X_i)$ are the fitted propensity scores. Maximum likelihood essentially balances predicted and observed treatment. Achieving finite-sample covariate balance using the fitted propensities requires that

$$ \frac{1}{n} \sum_{i=1}^{n} \frac{W_i X_{ij}}{\hat{e}(X_i)} = \frac{1}{n} \sum_{i=1}^{n} X_{ij}, \quad j = 1, \ldots, p. \tag{2} $$

To enforce this covariate balancing condition, we instead work backwards and require that the gradients satisfy

$$ \frac{1}{n} \sum_{i=1}^{n} \left( \frac{W_i}{\hat{e}(X_i)} - 1 \right) X_{ij} = 0, \quad j = 1, \ldots, p. \tag{3} $$

Substituting in the logistic propensity scores, this is equivalent to requiring that $\hat{\beta}$ satisfies

$$ \frac{1}{n} \sum_{i=1}^{n} \left( W_i \left[ 1 + \exp\left( -(\hat{\beta}_0 + X_i \hat{\beta}) \right) \right] - 1 \right) X_{ij} = 0, \quad j = 1, \ldots, p. \tag{4} $$

The loss function that enforces (4) is

$$ l_1(\eta) = \frac{1}{n} \sum_{i=1}^{n} \left( W_i \exp(-\eta_i) + (1 - W_i)\, \eta_i \right), \tag{5} $$

where $\eta_i = \beta_0 + X_i \beta$.
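As a minimal sketch of how the loss in (5) relates to the balance condition in (4), the following pure-Python snippet evaluates both on a tiny made-up dataset and compares a finite-difference gradient of the loss with the imbalance. The function names and the toy data are hypothetical and chosen only for illustration; they are not part of balnet.

```python
import math

def linear_predictor(beta0, beta, x):
    """eta_i = beta0 + x_i' beta."""
    return beta0 + sum(b * v for b, v in zip(beta, x))

def balancing_loss(beta0, beta, X, W):
    """Loss (5): average of W_i * exp(-eta_i) + (1 - W_i) * eta_i."""
    total = 0.0
    for x, w in zip(X, W):
        eta = linear_predictor(beta0, beta, x)
        total += w * math.exp(-eta) + (1 - w) * eta
    return total / len(X)

def imbalance(beta0, beta, X, W):
    """Left-hand side of (4) for each covariate j."""
    n, p = len(X), len(X[0])
    out = [0.0] * p
    for x, w in zip(X, W):
        # Under the logistic model, 1 / e_hat(x) = 1 + exp(-eta).
        inv_e = 1.0 + math.exp(-linear_predictor(beta0, beta, x))
        for j in range(p):
            out[j] += (w * inv_e - 1.0) * x[j] / n
    return out

def numerical_gradient(beta0, beta, X, W, h=1e-6):
    """Central finite differences of the loss with respect to beta."""
    grad = []
    for j in range(len(beta)):
        up, down = list(beta), list(beta)
        up[j] += h
        down[j] -= h
        grad.append((balancing_loss(beta0, up, X, W)
                     - balancing_loss(beta0, down, X, W)) / (2 * h))
    return grad

# Toy data (made up for illustration): 4 units, 2 covariates.
X = [[0.5, -1.0], [1.5, 0.2], [-0.3, 0.8], [0.9, -0.4]]
W = [1, 0, 1, 0]
beta0, beta = 0.1, [0.2, -0.3]

grad = numerical_gradient(beta0, beta, X, W)
imb = imbalance(beta0, beta, X, W)
# The gradient of (5) equals minus the imbalance in (4), so each sum is ~0.
print([g + m for g, m in zip(grad, imb)])
```

Because the gradient is the negative of the imbalance vector, a stationary point of the loss is exactly a point where the balance condition holds.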
That (5) enforces (4) can be verified by setting the gradient of $l_1(\eta)$ with respect to the coefficients to zero, which yields the balance conditions in (4). The IPW estimator constructed using such weights has favorable statistical properties: when the logistic model is correctly specified it is $\sqrt{n}$-consistent with efficient variance (Wager, 2024, Theorem 7.1). Intuitively, this approach targets the ratio $W_i / e(X_i)$ (the Riesz representer for $E[Y_i(1)]$; see e.g. Hirshberg and Wager (2021)), rather than estimating $e(X_i)$ via maximum likelihood and inverting the fitted probabilities [2]. The balance condition for the control group mean is

$$ \frac{1}{n} \sum_{i=1}^{n} \frac{(1 - W_i) X_{ij}}{1 - \hat{e}(X_i)} = \frac{1}{n} \sum_{i=1}^{n} X_{ij}, \quad j = 1, \ldots, p, \tag{6} $$

which yields an analogous loss $l_0(\eta)$ for estimating propensity scores that target $E[Y_i(0)]$,

$$ l_0(\eta) = \frac{1}{n} \sum_{i=1}^{n} \left( (1 - W_i) \exp(\eta_i) - W_i\, \eta_i \right). \tag{7} $$

[2] These weights can also improve doubly robust methods by stabilizing the AIPW debiasing term (Robins et al., 1994); see, for example, Vermeulen and Vansteelandt (2015).

Remark 1. For estimating the ATE, $E[Y_i(1)] - E[Y_i(0)]$, this approach requires fitting two propensity score models: one targeting balance for the treated mean and one for the control mean, unlike maximum likelihood, which fits one set of coefficients.

Remark 2. For software, it is sufficient to implement only the loss in (5). By symmetry, propensities targeting $E[Y_i(0)]$ are obtained by solving (5) with inverted treatment indicators $W'_i = 1 - W_i$. The propensities targeting $E[Y_i(0)]$ can also be used to target the ATT, $E[Y_i(1) - Y_i(0) \mid W_i = 1]$, since the losses only differ by a scaling (the former weights the controls to match $\frac{1}{n} \sum_{i=1}^{n} X_i$ while the latter weights the controls to match $\frac{1}{\sum_{i=1}^{n} W_i} \sum_{i=1}^{n} W_i X_i$).

In practice, directly minimizing (5) may not be feasible.
Exact balance may be unattainable in settings with high-dimensional covariates ($p > n$) or limited overlap, potentially producing extreme IPW weights. The covariate balancing loss imposes stronger requirements than maximum likelihood estimation, which only requires overlap, e.g., the absence of complete separation.

Proposition 1 (Proposition S1, Tan (2020)). The loss $l_1(\eta)$ is strictly convex, bounded below, and has a unique minimizer if and only if the following set is empty:

$$ \left\{ \beta \neq 0 : \eta_i \geq 0 \text{ if } W_i = 1 \text{ for } i = 1, \ldots, n, \text{ and } \frac{1}{n} \sum_{i=1}^{n} (1 - W_i)\, \eta_i \leq 0 \right\}. $$

This condition is stricter than logistic separation: it not only forbids a linear hyperplane from perfectly separating treated and controls, but also prevents the weighted control mean from exceeding the treated mean along any linear combination of covariates. Regularization is therefore key to making this approach practical. We consider the penalized objective

$$ l_1(\eta) + \lambda P(\beta), \tag{8} $$

where $P(\cdot)$ is a suitable penalty. The default choice in balnet is the lasso, $P(\beta) = \sum_{j=1}^{p} |\beta_j|$. This penalty has a particularly convenient interpretation for balancing losses (Tan, 2020; Zhao, 2019). Exact balance requires

$$ \frac{1}{n} \sum_{i=1}^{n} \frac{W_i X_{ij}}{\hat{e}(X_i)} - \frac{1}{n} \sum_{i=1}^{n} X_{ij} = 0, \quad j = 1, \ldots, p. \tag{9} $$

Under lasso penalization, this constraint is relaxed to a box constraint, since the corresponding KKT conditions give

$$ \left| \frac{1}{n} \sum_{i=1}^{n} \frac{W_i X_{ij}}{\hat{e}(X_i)} - \frac{1}{n} \sum_{i=1}^{n} X_{ij} \right| \leq \lambda, \quad j = 1, \ldots, p. \tag{10} $$

That is, the regularization path indexed by $\lambda$ yields a sequence of solutions with gradually decreasing maximum imbalance tolerances, corresponding to an $\ell_\infty$ bound on the covariate imbalance vector.

balnet minimizes (8) using the group elastic net solver of Yang and Hastie (2024), which supports lasso, ridge, elastic net, and group penalties [3].
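To make the step from the penalized objective (8) to the box constraint (10) explicit, the short derivation below sketches the stationarity condition under the lasso penalty. It uses only the gradient of (5) and the subdifferential of the absolute value; the symbol $s_j$ for the subgradient is introduced here for illustration and does not appear in the original text.

```latex
\begin{aligned}
\frac{\partial l_1(\eta)}{\partial \beta_j}
  &= -\frac{1}{n} \sum_{i=1}^{n} \left( \frac{W_i}{\hat{e}(X_i)} - 1 \right) X_{ij}, \\[4pt]
0 &= \frac{\partial l_1(\eta)}{\partial \beta_j} + \lambda s_j,
  \qquad
  s_j \in \partial |\hat{\beta}_j| =
  \begin{cases}
    \{ \operatorname{sign}(\hat{\beta}_j) \}, & \hat{\beta}_j \neq 0, \\
    [-1, 1], & \hat{\beta}_j = 0,
  \end{cases} \\[4pt]
\Longrightarrow \quad
  \left| \frac{1}{n} \sum_{i=1}^{n} \frac{W_i X_{ij}}{\hat{e}(X_i)}
       - \frac{1}{n} \sum_{i=1}^{n} X_{ij} \right|
  &= \lambda\, |s_j| \;\leq\; \lambda .
\end{aligned}
```

Read this way, covariates with a nonzero coefficient sit exactly on the boundary of the box (imbalance equal to $\lambda$ in absolute value), while covariates with a zero coefficient are already balanced to within $\lambda$ without being used.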
Group penalties, optionally combined with penalty factors, allow balnet to balance sets of related covariates jointly, such as interactions or categorical expansions, an approach that is useful in many applied settings (e.g., Ben-Michael et al., 2021).

Pathwise optimization is a natural fit for this loss and can provide a useful practical diagnostic for overlap. balnet constructs a log-spaced $\lambda$ sequence in the spirit of glmnet, starting at the smallest $\lambda_{\max}$ for which the solution has $\hat{\beta} = 0$, and proceeding to a fixed fraction of this value (with default path length 100). For the treated mean, this maximum value is simply given by the largest unweighted treated imbalance,

$$ \lambda^{1}_{\max} = \max_j \left| \frac{1}{n} \sum_{i=1}^{n} W_i X_{ij} - \frac{1}{n} \sum_{i=1}^{n} X_{ij} \right|. $$

For lasso penalization, balnet also allows users to specify a target $\lambda_{\min}$ corresponding directly to a maximum allowable imbalance. The path solver then attempts to reach this tolerance and gracefully truncates the path if further reductions are infeasible [4]. By default, balnet standardizes the covariates, so $\lambda$ can be interpreted as a limit on the standardized mean difference.

Another benefit of path solvers is the efficient computation of cross-validation grids, as implemented in cv.balnet. Direct approaches to cross-validation for regularized models fix a grid of penalty parameters and re-solve the optimization problem at each value (e.g., Chattopadhyay et al., 2020). In contrast, the path-based approach computes the full sequence of regularization parameters at a computational cost that is often comparable to solving the optimization problem for a single value of $\lambda$ (Friedman et al., 2007).

3 Application: Pixel-level balancing

We illustrate balnet using data from Wu et al. (2023), who estimate the causal effect of past low-intensity fire on future high-intensity fires using NASA's MODIS instrument (Giglio et al., 2003).
Treatment status $W_i$ is equal to 1 for pixels that experienced a low-intensity fire in a given focal year and 0 otherwise, and the outcome is future high-intensity fire status. To construct a control group, the study conducts a synthetic control analysis (Abadie et al., 2010), using covariate balancing to construct ATT weights such that pre-treatment fire history and geographic covariate trajectories are matched.

We focus on a single focal year (2008) and land cover type (conifer forests), yielding a dataset with $n = 141{,}780$ observations (4,466 treated) and $p = 553$ covariates, measuring past fire history as well as topographic, meteorological, disturbance, and vegetation characteristics. The goal is to find weights that match the means of all 553 covariates in the synthetic control group to the corresponding treated-group means. The largest standardized mean difference in the unweighted data is approximately 470, occurring for snow water equivalent in September 2004. We set the parameter max.imbalance to 0.05, targeting a maximum allowable standardized covariate imbalance of 0.05. Internally, balnet adjusts the $\lambda$ sequence to range from $\lambda_{\max} = 470$ to $\lambda_{\min} = 0.05$ over nlambda = 100 logarithmically spaced values.

[3] $l_1(\eta)$ is not a canonical GLM loss due to the multiple $W_i$ terms; this is not an issue as the solver only requires the loss to be convex in the linear predictors $\eta$ (Yang and Hastie, 2024, Section 4).

[4] If the resulting imbalance remains unsatisfactory, users may consider augmenting the IPW estimator with an outcome model that accounts for the residual imbalance (Athey et al., 2018).

Figure 1 summarizes the resulting path fit for a default length of nlambda = 100 [5]. Motivated by standard diagnostic considerations in IPW analyses (e.g., Austin and Stuart, 2015; Greifer, 2025; Keele et al., 2025), some commonly reported metrics are displayed.
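The logarithmically spaced $\lambda$ grid mentioned above can be sketched in a few lines. This is only an illustration of log spacing between the two endpoints quoted in the text (470 and 0.05, with 100 values); the function name `lambda_sequence` is hypothetical, and the assumption that both endpoints are included in the grid is ours, since the text does not spell out how balnet builds the grid internally.

```python
import math

def lambda_sequence(lambda_max, lambda_min, nlambda=100):
    """Return nlambda log-spaced values decreasing from lambda_max to lambda_min.

    Assumes both endpoints are part of the grid (an illustrative choice).
    """
    if nlambda < 2:
        return [lambda_max]
    log_max, log_min = math.log(lambda_max), math.log(lambda_min)
    step = (log_min - log_max) / (nlambda - 1)  # constant step on the log scale
    return [math.exp(log_max + k * step) for k in range(nlambda)]

# Endpoints taken from the application described in the text.
lambdas = lambda_sequence(470.0, 0.05, nlambda=100)
print(len(lambdas), lambdas[0], lambdas[-1])
```

On such a grid consecutive values differ by a constant ratio, so each step tightens the maximum allowable imbalance by the same relative amount.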
For a given $\lambda$, the estimated propensity scores targeting the control arm are $\hat{e}^{(0)}(X_i; \lambda)$ and the ATT weights are

$$ \hat{\gamma}_i(\lambda) = \frac{\hat{e}^{(0)}(X_i; \lambda)}{1 - \hat{e}^{(0)}(X_i; \lambda)}. $$

Let $\bar{X}_{\text{target}}$ and $\bar{\sigma}_{\text{target}}$ denote the treated-group covariate means and standard deviations, and let

$$ \bar{X}_0(\lambda) = \frac{\sum_{i : W_i = 0} \hat{\gamma}_i(\lambda)\, X_i}{\sum_{i : W_i = 0} \hat{\gamma}_i(\lambda)} $$

denote the weighted control mean. The $p$-vector of standardized mean differences (SMD) is

$$ \mathrm{SMD}(\lambda) = \frac{\bar{X}_0(\lambda) - \bar{X}_{\text{target}}}{\bar{\sigma}_{\text{target}}}. $$

The percentage of bias reduction (PBR) is the reduction in average absolute SMD relative to the unweighted data (corresponding to $\lambda_{\max}$),

$$ \mathrm{PBR}(\lambda) = 100 \times \left( 1 - \frac{\operatorname{avg} |\mathrm{SMD}(\lambda)|}{\operatorname{avg} |\mathrm{SMD}(\lambda_{\max})|} \right). $$

Finally, the effective sample size (ESS), expressed as a percentage, is

$$ \mathrm{ESS}(\lambda) = \frac{100}{n_0} \times \frac{\left( \sum_{i : W_i = 0} \hat{\gamma}_i(\lambda) \right)^2}{\sum_{i : W_i = 0} \hat{\gamma}_i(\lambda)^2}, \qquad n_0 = \sum_{i=1}^{n} (1 - W_i). $$

The results show clear improvements in covariate balance, along with an increased concentration of weights on a smaller subset of control units. The right axis in Figure 1 displays the coefficient of variation of the weights $\hat{\gamma}_i(\lambda)$ via the identity linking ESS to the coefficient of variation, $\sqrt{100 / \mathrm{ESS}(\lambda) - 1}$.

[5] As the path is not automatically truncated before reaching nlambda steps, further reductions in covariate imbalance would be possible if desired.

Figure 1: Regularization path for covariate balancing using data from Wu et al. (2023) for treatment year 2008 and conifer forests. The plot shows percentage bias reduction (PBR) and effective sample size (ESS) as functions of $\lambda$. The right-hand axis displays the coefficient of variation of the inverse probability weighting (IPW) weights.

Figure 2 displays the individual SMD using the estimated weights at $\lambda_{\min}$. Figure 2a shows the ten covariates with the largest unweighted imbalances, all related to snow water equivalent. Unweighted imbalances are shown as black dots, while weighted imbalances at the selected $\lambda$ are shown in color. By construction of the covariate balancing loss function, the absolute standardized mean difference for each covariate is bounded above by $\lambda$. Visualizing all 500+ covariates in a single panel is impractical, so we supply the groups argument to plot to aggregate covariates by category. Figure 2b shows the resulting grouped imbalances.

Figure 2 panels: (a) Standardized mean differences for the 10 most imbalanced covariates. (b) Standardized mean differences grouped by covariate category.

We conclude by examining the estimated propensity scores at $\lambda_{\min}$ and comparing them with those from regularized logistic regression using maximum likelihood. Figure 3 shows the estimated ATT weights for the control units. The treated fraction is only $4{,}466 / 141{,}780 \approx 0.03$, so maximum likelihood logistic regression places most propensity estimates near zero in order to reflect the rarity of treatment, yielding ATT weights with point mass near zero. In contrast, the balancing-loss logistic regression assigns more variable weights so the weighted control group more closely matches the treated group.
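The path diagnostics used in this section (SMD, PBR, ESS, and the coefficient of variation implied by ESS) are simple functions of a weight vector. The following is a minimal sketch in pure Python, assuming the inputs are already restricted to control units; the function names and the toy numbers are hypothetical and are not taken from balnet or from the wildfire data.

```python
import math

def weighted_mean(values, weights):
    """Weighted mean of one covariate over the control units."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def smd(control_x, weights, target_mean, target_sd):
    """Standardized mean difference for a single covariate."""
    return (weighted_mean(control_x, weights) - target_mean) / target_sd

def pbr(smd_weighted, smd_unweighted):
    """Percentage bias reduction in average absolute SMD."""
    avg_abs = lambda v: sum(abs(s) for s in v) / len(v)
    return 100.0 * (1.0 - avg_abs(smd_weighted) / avg_abs(smd_unweighted))

def ess_percent(weights):
    """Effective sample size as a percentage of the number of controls."""
    n0 = len(weights)
    return 100.0 / n0 * sum(weights) ** 2 / sum(w * w for w in weights)

def cv_from_ess(ess_pct):
    """Coefficient of variation of the weights implied by ESS (in percent)."""
    return math.sqrt(max(0.0, 100.0 / ess_pct - 1.0))

# Toy control-unit weights (made up for illustration).
gamma = [1.0, 1.0, 2.0, 4.0]
ess = ess_percent(gamma)

# Direct coefficient of variation (population standard deviation over mean).
mean_w = sum(gamma) / len(gamma)
sd_w = math.sqrt(sum((w - mean_w) ** 2 for w in gamma) / len(gamma))
print(ess, cv_from_ess(ess), sd_w / mean_w)
```

The last line illustrates the identity used for the right-hand axis of Figure 1: the coefficient of variation computed directly from the weights agrees with the value recovered from ESS, provided the population (divide-by-$n_0$) standard deviation is used.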
Figure 2: Standardized mean differences before and after weighting, evaluated at $\lambda_{\min}$ from the path fit, with the unweighted data corresponding to $\lambda_{\max}$, using data from Wu et al. (2023) for treatment year 2008 and conifer forests.

Figure 3: Estimated ATT weights using penalized balance loss at $\lambda_{\min}$ and penalized maximum likelihood loss (default $\lambda^{\mathrm{ML}}_{\min}$).

3.1 Timings

To assess runtime, we conduct a small timing experiment using the data from this section [6]. We define a baseline dataset consisting of the first $n = 100{,}000$ observations and $p = 500$ covariates from the data described above. We then double both the sample size and the number of covariates by sampling with replacement from the baseline data. For each setting, we measure the time required for balnet to compute a regularization path targeting the ATT with maximum imbalance $\lambda_{\min} \in \{0.05, 0.01\}$. Table 1 reports runtimes obtained on an 8-core laptop with 24 GB of memory, using four cores for solver parallelism. The results suggest that runtime is more sensitive to the $\lambda$-sequence endpoints than to increases in the number of observations and covariates.

| n       | p     | max.imbalance ($\lambda_{\min}$) | Runtime (s) | n, p scaling (×) | $\lambda_{\min}$ scaling (×) |
|---------|-------|----------------------------------|-------------|------------------|------------------------------|
| 100,000 | 500   | 0.05                             | 4           | –                | –                            |
| 200,000 | 1,000 | 0.05                             | 13          | 3.2              | –                            |
| 400,000 | 2,000 | 0.05                             | 53          | 4.1              | –                            |
| 100,000 | 500   | 0.01                             | 43          | –                | 10.8                         |
| 200,000 | 1,000 | 0.01                             | 118         | 2.7              | 9.1                          |
| 400,000 | 2,000 | 0.01                             | 385         | 3.3              | 7.3                          |

Table 1: Runtimes (in seconds) and relative scaling when doubling both the number of observations and covariates, and when tightening the balance requirement.

4 Discussion

Regularization is a practical necessity in many modern statistical settings and pathwise solutions are now a standard tool in regularized statistical learning (Friedman et al., 2007, 2010).
We use pathwise regularization in a causal setting to target balancing weights for treatment effects under unconfoundedness using logistic covariate balancing loss functions. A natural direction for future work is panel-data settings where separate balancing weights are estimated along both time and unit dimensions, with regularization paths defined accordingly.

[6] We do not include benchmarks against the R packages mentioned in Section 1, as under our experimental setup they either returned errors or did not complete within a one-hour time window.

References

Alberto Abadie, Alexis Diamond, and Jens Hainmueller. Synthetic control methods for comparative case studies: Estimating the effect of California's tobacco control program. Journal of the American Statistical Association, 105(490):493–505, 2010. doi: 10.1198/jasa.2009.ap08746.

Susan Athey, Guido W Imbens, and Stefan Wager. Approximate residual balancing: Debiased inference of average treatment effects in high dimensions. Journal of the Royal Statistical Society Series B: Statistical Methodology, 80(4):597–623, 2018. doi: 10.1111/rssb.12268.

Peter C Austin and Elizabeth A Stuart. Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies. Statistics in Medicine, 34(28):3661–3679, 2015.

Eli Ben-Michael, Avi Feller, David A Hirshberg, and José R Zubizarreta. The balancing act in causal inference. arXiv preprint arXiv:2110.14831, 2021.

Ambarish Chattopadhyay, Christopher H Hase, and José R Zubizarreta. Balancing vs modeling approaches to weighting in practice. Statistics in Medicine, 39(24):3227–3254, 2020.

Dirk Eddelbuettel and Romain François. Rcpp: Seamless R and C++ integration. Journal of Statistical Software, 40:1–18, 2011. doi: 10.18637/jss.v040.i08.

Christian Fong, Marc Ratkovic, and Kosuke Imai. CBPS: Covariate Balancing Propensity Score, 2022. URL https://CRAN.R-project.org/package=CBPS. R package version 0.23.

Jerome Friedman, Trevor Hastie, Holger Höfling, and Robert Tibshirani. Pathwise coordinate optimization. The Annals of Applied Statistics, 1(2):302–332, 2007. doi: 10.1214/07-AOAS131.

Jerome Friedman, Trevor Hastie, and Rob Tibshirani. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1):1, 2010. doi: 10.18637/jss.v033.i01.

Louis Giglio, Jacques Descloitres, Christopher O Justice, and Yoram J Kaufman. An enhanced contextual fire detection algorithm for MODIS. Remote Sensing of Environment, 87(2-3):273–282, 2003.

Noah Greifer. cobalt: Covariate Balance Tables and Plots, 2025. URL https://CRAN.R-project.org/package=cobalt. R package version 4.6.1.

Gaël Guennebaud, Benoît Jacob, et al. Eigen. https://libeigen.gitlab.io, 2010.

Jens Hainmueller. Entropy balancing for causal effects: A multivariate reweighting method to produce balanced samples in observational studies. Political Analysis, 20(1):25–46, 2012. doi: 10.1093/pan/mpr025.

David A Hirshberg and Stefan Wager. Augmented minimax linear estimation. The Annals of Statistics, 49(6):3206–3227, 2021. doi: 10.1214/21-AOS2080.

Kosuke Imai and Marc Ratkovic. Covariate balancing propensity score. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(1):243–263, 2014. doi: 10.1111/j.1467-9868.2013.01043.x.

Guido W Imbens and Donald B Rubin. Causal Inference in Statistics, Social, and Biomedical Sciences. Cambridge University Press, Cambridge, UK, 2015. doi: 10.1017/CBO9781139025751.

Luke Keele, Eli Ben-Michael, Matthew Lenard, and Lindsay Page. Balancing weights for estimating treatment effects in educational studies. Journal of Research on Educational Effectiveness, pages 1–28, 2025.

Thomas Lumley. Analysis of complex survey samples. Journal of Statistical Software, 9:1–19, 2004.

R Core Team. R: A language and environment for statistical computing, 2024. https://www.R-project.org/.

James M Robins, Andrea Rotnitzky, and Lue Ping Zhao. Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 89(427):846–866, 1994. doi: 10.1080/01621459.1994.10476818.

Paul R Rosenbaum and Donald B Rubin. The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1):41–55, 1983. doi: 10.1093/biomet/70.1.41.

Noah Simon, Jerome Friedman, and Trevor Hastie. A blockwise descent algorithm for group-penalized multiresponse and multinomial regression. arXiv preprint arXiv:1311.6529, 2013.

Zhiqiang Tan. Regularized calibrated estimation of propensity scores with model misspecification and high-dimensional data. Biometrika, 107(1):137–158, 2020. doi: 10.1093/biomet/asz059.

Zhiqiang Tan and Baoluo Sun. RCAL: Regularized Calibrated Estimation, 2020. R package version 2.0.

Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology, 58(1):267–288, 1996.

Robert Tibshirani, Jacob Bien, Jerome Friedman, Trevor Hastie, Noah Simon, Jonathan Taylor, and Ryan J Tibshirani. Strong rules for discarding predictors in lasso-type problems. Journal of the Royal Statistical Society Series B: Statistical Methodology, 74(2):245–266, 2012. doi: 10.1111/j.1467-9868.2011.01004.x.

Karel Vermeulen and Stijn Vansteelandt. Bias-reduced doubly robust estimation. Journal of the American Statistical Association, 110(511):1024–1036, 2015.

Stefan Wager. Causal Inference: A Statistical Learning Approach. Cambridge University Press (in preparation), 2024. https://web.stanford.edu/~swager/causal_inf_book.pdf.

Xiao Wu, Erik Sverdrup, Michael D Mastrandrea, Michael W Wara, and Stefan Wager. Low-intensity fires mitigate the risk of high-intensity wildfires in California's forests. Science Advances, 9(45), 2023. doi: 10.1126/sciadv.adi4123.

James Yang and Trevor Hastie. A fast and scalable pathwise-solver for group lasso and elastic net penalized regression via block-coordinate descent. arXiv preprint arXiv:2405.08631, 2024.

Qingyuan Zhao. Covariate balancing propensity score by tailored loss functions. The Annals of Statistics, 47(2):965–993, 2019. doi: 10.1214/18-AOS1698.

Hui Zou and Trevor Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society Series B: Statistical Methodology, 67(2):301–320, 2005.

José R Zubizarreta. Stable weights that balance covariates for estimation with incomplete outcome data. Journal of the American Statistical Association, 110(511):910–922, 2015. doi: 10.1080/01621459.2015.1023805.
