balnet: Pathwise Estimation of Covariate Balancing Propensity Scores
We present balnet, an R package for scalable pathwise estimation of covariate balancing propensity scores via logistic covariate balancing loss functions. Regularization paths are computed with Yang and Hastie (2024)'s generic elastic net solver, sup…
Authors: Erik Sverdrup, Trevor Hastie
Affiliations: Erik Sverdrup, Department of Econometrics & Business Statistics, Monash University; Trevor Hastie, Department of Statistics, Stanford University.

Abstract

We present balnet, an R package for scalable pathwise estimation of covariate balancing propensity scores via logistic covariate balancing loss functions. Regularization paths are computed with Yang and Hastie (2024)'s generic elastic net solver, supporting convex losses with non-smooth penalties, as well as group penalties and feature-specific penalty factors. For lasso penalization, balnet computes a regularized balance path from the largest observed covariate imbalance to a user-specified fraction of this maximum. We illustrate the method with an application to spatial pixel-level balancing for constructing synthetic control weights for the average treatment effect on the treated, using satellite data on wildfires.

1 Introduction

The propensity score plays a central role in estimating causal effects in observational settings (Rosenbaum and Rubin, 1983). It is defined as the probability that an individual with a given pre-treatment covariate profile is assigned to the treatment, and is frequently estimated using logistic regression. In high-dimensional settings, such as applications with many confounders, rich basis expansions, or higher-order interactions, a standard approach is to estimate the propensity score using lasso (Tibshirani, 1996) or elastic net penalized logistic regression (Zou and Hastie, 2005), as implemented in the widely used R package glmnet (Friedman et al., 2010).

A primary use of the propensity score is inverse probability weighting (IPW), where outcomes are reweighted to construct treated and control groups that are comparable in terms of their pre-treatment covariates.
Under the true (oracle) propensity score, IPW exactly balances covariate means between treatment arms and the overall population. However, even without model misspecification, this exact balancing property does not generally hold for propensity scores estimated using standard maximum likelihood logistic regression. Maximum likelihood estimation prioritizes accurate prediction of treatment assignment, rather than covariate balance. Imai and Ratkovic (2014) introduced the notion of covariate balancing propensity scores, propensity score estimators explicitly designed to optimize covariate balance. Causal effects estimated using these kinds of weights are more robust to model misspecification, and more efficient than approaches that estimate propensity scores via maximum likelihood (Ben-Michael et al., 2021; Chattopadhyay et al., 2020).

A number of methods have been proposed to directly target covariate balance. These include the covariate balancing propensity score estimator, CBPS (Fong et al., 2022), which enforces moment conditions via generalized method of moments, and entropy balancing, ebal (Hainmueller, 2012), which directly optimizes weights to achieve balance. Such methods may achieve exact balance in the relatively small datasets commonly encountered in the social sciences. However, in large-scale datasets, these approaches can be computationally challenging to scale and are often infeasible, as exact balance may not be attainable¹. The stable balancing weights approach, sbw (Zubizarreta, 2015), relaxes the requirement of exact balance and instead optimizes weights to satisfy balance levels pre-specified by the researcher. While this approach is effective, it requires the user to choose these levels a priori.
If the chosen level is infeasible for a given dataset, or if a different bias–variance tradeoff is desired, the underlying optimization problem must be re-solved. In practice, this leads to iterative tuning between balance constraints and downstream model validation (Keele et al., 2025).

Motivated by these challenges, we seek an algorithmic solution that simplifies the selection of balance levels in a manner analogous to regularization paths for penalized generalized linear models. Our approach builds on recent work that unifies many balance-oriented estimators through covariate balancing loss functions (Zhao, 2019). For the logistic link, Wager (2024, Chapter 7) notes (emphasis added):

    . . . if we believe in a linear-logistic specification and want to use an IPW estimator, then we should learn the propensity model by minimizing the *covariate-balancing* loss function rather than by the usual maximum likelihood loss used for logistic regression.

This observation motivates the use of covariate balancing loss functions with modern regularization techniques. The penalized versions of these loss functions correspond to the primal of optimization problems that constrain covariate imbalance (Ben-Michael et al., 2021; Tan, 2020; Zhao, 2019). As a consequence, the interpretation of the penalty parameter differs fundamentally from that in penalized logistic regression, where it acts as a coefficient budget. For lasso in particular, the penalty parameter directly controls the maximum allowable absolute imbalance across covariates, offering an intuitive metric to diagnose overlap issues, as well as to monitor solver progress in applications.

To our knowledge, the closest existing software is RCAL (Tan and Sun, 2020), which implements lasso regularization for covariate balancing loss functions. However, because this solver is written entirely in R, it can be difficult to scale to large datasets.
This limitation is particularly salient in applications such as spatial or pixel-level balancing, where both the number of observations and covariates are large.

¹ A closely related approach from the survey literature, available in the survey package (Lumley, 2004), is raking, which calibrates survey weights so that sample marginal distributions match known population margins.

Our package, balnet for R (R Core Team, 2024), addresses these limitations by combining covariate balancing loss functions with modern optimization techniques for large-scale regularization problems. Our approach uses proximal quasi-Newton methods for generic convex losses, including algorithmic refinements for coordinate descent such as screening and active-set rules (Friedman et al., 2010; Simon et al., 2013; Tibshirani et al., 2012). The core algorithm builds on the adelie C++ elastic net solver by Yang and Hastie (2024), interfaced with R via Rcpp (Eddelbuettel and François, 2011), and relies on numerical linear algebra routines from Eigen (Guennebaud et al., 2010). In an application from Wu et al. (2023) involving spatial balancing with about 140,000 observations and 500 covariates, balnet computes the full regularization path, from raw, unweighted imbalance down to commonly accepted balance thresholds, in under one minute on a standard laptop.

2 Regularization paths for covariate balancing propensity scores

Let $Y_i(1)$ and $Y_i(0)$ denote the potential outcomes under treatment and control, respectively (Imbens and Rubin, 2015). The realized treatment assignment is $W_i \in \{0, 1\}$ and the observed outcome is $Y_i = Y_i(W_i)$, for a sample of units $i = 1, \ldots, n$. In observational settings, a common identifying assumption is unconfoundedness: conditional on a set of pre-treatment covariates $X_i \in \mathbb{R}^p$, treatment assignment is as good as random.
Under this assumption, a central object is the propensity score,

\[ e(x) = P[W_i = 1 \mid X_i = x]. \]

Given suitable overlap conditions, potential outcome means are identified via inverse-propensity weighted (IPW) outcomes. We begin by considering the treated mean; extensions to the average treatment effect (ATE) and the average treatment effect on the treated (ATT) follow by symmetry. The treated mean is identified by

\[ E[Y_i(1)] = E\left[ \frac{W_i Y_i}{e(X_i)} \right]. \]

The oracle propensity score also satisfies population-level covariate balance,

\[ E\left[ \frac{W_i X_{ij}}{e(X_i)} \right] = E[X_{ij}], \quad j = 1, \ldots, p. \]

For notational simplicity, we describe the setting in terms of linear dependence on $X_i$; the discussion extends directly to nonlinear specifications via basis expansions and interactions. Under a logistic propensity score model,

\[ e(x) = \frac{1}{1 + \exp\left(-(\beta_0 + x^\top \beta)\right)}. \]

If $\beta$ is estimated by maximum likelihood using the standard Bernoulli log-likelihood, the gradient equations are

\[ \frac{1}{n} \sum_{i=1}^{n} \left( W_i - \hat e(X_i) \right) X_{ij} = 0, \quad j = 1, \ldots, p, \tag{1} \]

where $\hat e(X_i)$ are the fitted propensity scores. Maximum likelihood essentially balances predicted and observed treatment. Achieving finite-sample covariate balance using the fitted propensities requires that

\[ \frac{1}{n} \sum_{i=1}^{n} \frac{W_i X_{ij}}{\hat e(X_i)} = \frac{1}{n} \sum_{i=1}^{n} X_{ij}, \quad j = 1, \ldots, p. \tag{2} \]

To enforce this covariate balancing condition, we instead work backwards and require that the gradients satisfy

\[ \frac{1}{n} \sum_{i=1}^{n} \left( \frac{W_i}{\hat e(X_i)} - 1 \right) X_{ij} = 0, \quad j = 1, \ldots, p. \tag{3} \]

Substituting in for the logistic propensity scores, this is equivalent to requiring that $\hat\beta$ satisfies

\[ \frac{1}{n} \sum_{i=1}^{n} \left( W_i \left[ 1 + \exp\left( -(\hat\beta_0 + X_i \hat\beta) \right) \right] - 1 \right) X_{ij} = 0, \quad j = 1, \ldots, p. \tag{4} \]

The loss function that enforces (4) is

\[ l_1(\eta) = \frac{1}{n} \sum_{i=1}^{n} W_i \exp(-\eta_i) + (1 - W_i)\, \eta_i, \tag{5} \]

where $\eta_i = \beta_0 + X_i \beta$.
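The link between the loss in (5) and the balance conditions in (3) can also be checked numerically. The following is a minimal Python sketch on simulated data, using only the standard library; the function names are hypothetical and not part of balnet. It compares a finite-difference gradient of $l_1$ with the balance residual on the left-hand side of (3). Under the definitions above the two should agree up to sign, since $\partial l_1 / \partial \beta_j = -\frac{1}{n} \sum_i (W_i / \hat e(X_i) - 1) X_{ij}$.

```python
import math
import random


def linear_predictor(beta0, beta, x):
    """eta_i = beta0 + x_i' beta."""
    return beta0 + sum(b * xj for b, xj in zip(beta, x))


def balancing_loss(beta0, beta, X, W):
    """Loss (5): mean of W_i * exp(-eta_i) + (1 - W_i) * eta_i."""
    total = 0.0
    for x, w in zip(X, W):
        eta = linear_predictor(beta0, beta, x)
        total += w * math.exp(-eta) + (1 - w) * eta
    return total / len(X)


def balance_residual(beta0, beta, X, W, j):
    """Left-hand side of (3) for covariate j: mean of (W_i / e_i - 1) * X_ij."""
    total = 0.0
    for x, w in zip(X, W):
        e = 1.0 / (1.0 + math.exp(-linear_predictor(beta0, beta, x)))
        total += (w / e - 1.0) * x[j]
    return total / len(X)


def numeric_gradient(beta0, beta, X, W, j, h=1e-6):
    """Central finite difference of the loss with respect to beta_j."""
    up, down = list(beta), list(beta)
    up[j] += h
    down[j] -= h
    return (balancing_loss(beta0, up, X, W) - balancing_loss(beta0, down, X, W)) / (2 * h)


# Simulated data (an assumption for illustration only).
random.seed(0)
n, p = 200, 3
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
W = [1 if random.random() < 0.4 else 0 for _ in range(n)]
beta0, beta = -0.3, [0.2, -0.1, 0.05]

# gradient_j + residual_j should vanish for every covariate j.
max_gap = max(
    abs(numeric_gradient(beta0, beta, X, W, j) + balance_residual(beta0, beta, X, W, j))
    for j in range(p)
)
print("largest |gradient + balance residual|:", max_gap)
```

Because the gradient is the negative balance residual at any coefficient value, a stationary point of (5) is exactly a point where (3) holds.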
Setting the gradient of $l_1(\eta)$ to zero yields the balance conditions in (4), which confirms that minimizing (5) enforces them. The IPW estimator constructed using such weights has favorable statistical properties: when the logistic model is correctly specified it is $\sqrt{n}$-consistent with efficient variance (Wager, 2024, Theorem 7.1). Intuitively, this approach targets the ratio $W_i / e(X_i)$ (the Riesz representer for $E[Y_i(1)]$; see e.g. Hirshberg and Wager (2021)), rather than estimating $e(X_i)$ via maximum likelihood and inverting the fitted probabilities².

The balance condition for the control group mean is

\[ \frac{1}{n} \sum_{i=1}^{n} \frac{(1 - W_i) X_{ij}}{1 - \hat e(X_i)} = \frac{1}{n} \sum_{i=1}^{n} X_{ij}, \quad j = 1, \ldots, p, \tag{6} \]

which yields an analogous loss $l_0(\eta)$ for estimating propensity scores that target $E[Y_i(0)]$,

\[ l_0(\eta) = \frac{1}{n} \sum_{i=1}^{n} (1 - W_i) \exp(\eta_i) - W_i\, \eta_i. \tag{7} \]

² These weights can also improve doubly robust methods by stabilizing the AIPW debiasing term (Robins et al., 1994); see, for example, Vermeulen and Vansteelandt (2015).

Remark 1. For estimating the ATE, $E[Y_i(1)] - E[Y_i(0)]$, this approach requires fitting two propensity score models: one targeting balance for the treated mean and one for the control mean, unlike maximum likelihood, which fits one set of coefficients.

Remark 2. For software, it is sufficient to implement only the loss in (5). By symmetry, propensities targeting $E[Y_i(0)]$ are obtained by solving (5) with inverted treatment indicators $W_i' = 1 - W_i$. The propensities targeting $E[Y_i(0)]$ can also be used to target the ATT, $E[Y_i(1) - Y_i(0) \mid W_i = 1]$, since the losses only differ by a scaling (the former weights the controls to match $\frac{1}{n} \sum_{i=1}^{n} X_i$ while the latter weights the controls to match $\frac{1}{\sum_{i=1}^{n} W_i} \sum_{i=1}^{n} W_i X_i$).

In practice, directly minimizing (5) may not be feasible.
Exact balance may be unattainable in settings with high-dimensional covariates ($p > n$) or limited overlap, potentially producing extreme IPW weights. The covariate balancing loss imposes stronger requirements than maximum likelihood estimation, which only requires overlap, e.g., the absence of complete separation.

Proposition 1 (Proposition S1, Tan (2020)). The loss $l_1(\eta)$ is strictly convex, bounded below, and has a unique minimizer if and only if the following set is empty:

\[ \left\{ \beta \neq 0 : \eta_i \geq 0 \text{ if } W_i = 1 \text{ for } i = 1, \ldots, n, \text{ and } \frac{1}{n} \sum_{i=1}^{n} (1 - W_i)\, \eta_i \leq 0 \right\}. \]

This condition is stricter than logistic separation: it not only forbids a linear hyperplane from perfectly separating treated and controls, but also prevents the weighted control mean from exceeding the treated mean along any linear combination of covariates. Regularization is therefore key to making this approach practical. We consider the penalized objective

\[ l_1(\eta) + \lambda P(\beta), \tag{8} \]

where $P(\cdot)$ is a suitable penalty. The default choice in balnet is the lasso, $P(\beta) = \sum_{j=1}^{p} |\beta_j|$. This penalty has a particularly convenient interpretation for balancing losses (Tan, 2020; Zhao, 2019). Exact balance requires

\[ \frac{1}{n} \sum_{i=1}^{n} \frac{W_i X_{ij}}{\hat e(X_i)} - \frac{1}{n} \sum_{i=1}^{n} X_{ij} = 0, \quad j = 1, \ldots, p. \tag{9} \]

Under lasso penalization, this constraint is relaxed to a box constraint, since the corresponding KKT conditions give

\[ \left| \frac{1}{n} \sum_{i=1}^{n} \frac{W_i X_{ij}}{\hat e(X_i)} - \frac{1}{n} \sum_{i=1}^{n} X_{ij} \right| \leq \lambda, \quad j = 1, \ldots, p. \tag{10} \]

That is, the regularization path indexed by $\lambda$ yields a sequence of solutions with gradually decreasing maximum imbalance tolerances, corresponding to an $\ell_\infty$ bound on the covariate imbalance vector.

balnet minimizes (8) using the group elastic net solver of Yang and Hastie (2024), which supports lasso, ridge, elastic net, and group penalties³.
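The box constraint in (10) can be checked on simulated data. The sketch below is a minimal proximal-gradient (ISTA) solver with backtracking for the lasso-penalized objective (8), written in plain Python purely for illustration. It is not the solver used by balnet, and the function names, the simulated data, and the choice $\lambda = 0.05$ are assumptions made here. The intercept is left unpenalized. After fitting, the largest absolute imbalance should not exceed $\lambda$, up to numerical tolerance.

```python
import math
import random


def soft_threshold(z, t):
    """Proximal operator of t * |.|: shrink z toward zero by t."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def loss_and_grad(b0, beta, X, W):
    """Loss (5) and its gradient with respect to the intercept and beta."""
    n, p = len(X), len(beta)
    loss, g0, g = 0.0, 0.0, [0.0] * p
    for x, w in zip(X, W):
        eta = b0 + sum(bj * xj for bj, xj in zip(beta, x))
        loss += w * math.exp(-eta) + (1 - w) * eta
        r = (1 - w) - w * math.exp(-eta)  # derivative of the i-th term in eta_i
        g0 += r
        for j in range(p):
            g[j] += r * x[j]
    return loss / n, g0 / n, [gj / n for gj in g]


def fit_lasso_balance(X, W, lam, iters=300):
    """Proximal gradient descent on l_1(eta) + lam * sum_j |beta_j|."""
    p = len(X[0])
    b0, beta, step = 0.0, [0.0] * p, 1.0
    for _ in range(iters):
        f, g0, g = loss_and_grad(b0, beta, X, W)
        while True:  # backtracking: halve the step until sufficient decrease
            nb0 = b0 - step * g0
            nbeta = [soft_threshold(bj - step * gj, step * lam)
                     for bj, gj in zip(beta, g)]
            d0 = nb0 - b0
            d = [a - b for a, b in zip(nbeta, beta)]
            bound = (f + g0 * d0 + sum(gj * dj for gj, dj in zip(g, d))
                     + (d0 * d0 + sum(dj * dj for dj in d)) / (2 * step))
            if loss_and_grad(nb0, nbeta, X, W)[0] <= bound + 1e-12:
                break
            step *= 0.5
        b0, beta = nb0, nbeta
    return b0, beta


# Simulated data with mild confounding (an assumption for illustration only).
random.seed(1)
n, p, lam = 200, 3, 0.05
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
W = [1 if random.random() < 1 / (1 + math.exp(-(0.5 * x[0] - 0.5 * x[1]))) else 0
     for x in X]

b0, beta = fit_lasso_balance(X, W, lam)
_, g0, g = loss_and_grad(b0, beta, X, W)
# The j-th gradient entry is minus the imbalance in (10), so the absolute values match.
max_imbalance = max(abs(gj) for gj in g)
print("max |imbalance|:", max_imbalance, " lambda:", lam)
```

In this sketch, covariates with a nonzero coefficient sit on the boundary of the box (imbalance equal to $\lambda$ in absolute value), while covariates with a zero coefficient lie strictly inside it; this is the usual lasso KKT pattern.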
Group penalties, optionally combined with penalty factors, allow balnet to balance sets of related covariates jointly, such as interactions or categorical expansions, an approach that is useful in many applied settings (e.g., Ben-Michael et al., 2021).

Pathwise optimization is a natural fit for this loss and can provide a useful practical diagnostic for overlap. balnet constructs a log-spaced $\lambda$ sequence in the spirit of glmnet, starting at the smallest $\lambda_{\max}$ for which the solution has $\hat\beta = 0$, and proceeding to a fixed fraction of this value (with default path length 100). For the treated mean, this maximum value is simply given by the largest unweighted treated imbalance,

\[ \lambda_{\max}^{1} = \max_j \left| \frac{1}{n} \sum_{i=1}^{n} W_i X_{ij} - \frac{1}{n} \sum_{i=1}^{n} X_{ij} \right|. \]

For lasso penalization, balnet also allows users to specify a target $\lambda_{\min}$ corresponding directly to a maximum allowable imbalance. The path solver then attempts to reach this tolerance and gracefully truncates the path if further reductions are infeasible⁴. By default, balnet standardizes the covariates, so $\lambda$ can be interpreted as a limit on the standardized mean difference.

Another benefit of path solvers is the efficient computation of cross-validation grids, as implemented in cv.balnet. Direct approaches to cross-validation for regularized models fix a grid of penalty parameters and re-solve the optimization problem at each value (e.g., Chattopadhyay et al. (2020)). In contrast, the path-based approach computes the full sequence of regularization parameters at a computational cost that is often comparable to solving the optimization problem for a single value of $\lambda$ (Friedman et al., 2007).

3 Application: Pixel-level balancing

We illustrate balnet using data from Wu et al. (2023), who estimate the causal effect of past low-intensity fire on future high-intensity fires using NASA's MODIS instrument (Giglio et al., 2003).
Treatment status $W_i$ is equal to 1 for pixels that experienced a low-intensity fire in a given focal year and 0 otherwise, and the outcome is future high-intensity fire status. To construct a control group, the study conducts a synthetic control analysis (Abadie et al., 2010), using covariate balancing to construct ATT weights such that pre-treatment fire history and geographic covariate trajectories are matched.

We focus on a single focal year (2008) and land cover type (conifer forests), yielding a dataset with $n = 141{,}780$ observations (4,466 treated) and $p = 553$ covariates, measuring past fire history as well as topographic, meteorological, disturbance, and vegetation characteristics. The goal is to find weights that match the means of all 553 covariates in the synthetic control group to the corresponding treated-group means.

³ $l_1(\eta)$ is not a canonical GLM loss due to the multiple $W_i$ terms; this is not an issue, as the solver only requires the loss to be convex in the linear predictors $\eta$ (Yang and Hastie, 2024, Section 4).

⁴ If the resulting imbalance remains unsatisfactory, users may consider augmenting the IPW estimator with an outcome model that accounts for the residual imbalance (Athey et al., 2018).

The largest standardized mean difference in the unweighted data is approximately 470, occurring for snow water equivalent in September 2004. We set the parameter max.imbalance to 0.05, targeting a maximum allowable standardized covariate imbalance of 0.05. Internally, balnet adjusts the $\lambda$ sequence to range from $\lambda_{\max} = 470$ to $\lambda_{\min} = 0.05$ over nlambda = 100 logarithmically spaced values.

Figure 1 summarizes the resulting path fit for a default length of nlambda = 100⁵. Motivated by standard diagnostic considerations in IPW analyses (e.g., Austin and Stuart (2015); Greifer (2025); Keele et al. (2025)), some commonly reported metrics are displayed.
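To make the grid concrete, the $\lambda$ sequence described above (100 logarithmically spaced values from $\lambda_{\max} = 470$ down to $\lambda_{\min} = 0.05$) can be sketched in a few lines of Python. This only illustrates log spacing; the helper name `log_spaced_lambdas` is hypothetical, and balnet's internal construction may differ in its details.

```python
import math


def log_spaced_lambdas(lambda_max, lambda_min, nlambda):
    """Return nlambda values decreasing geometrically from lambda_max to lambda_min."""
    log_hi, log_lo = math.log(lambda_max), math.log(lambda_min)
    return [
        math.exp(log_hi + k * (log_lo - log_hi) / (nlambda - 1))
        for k in range(nlambda)
    ]


# Endpoints taken from the application in the text.
lambdas = log_spaced_lambdas(470.0, 0.05, 100)
ratio = lambdas[1] / lambdas[0]  # constant factor between consecutive values
print(lambdas[0], lambdas[-1], ratio)
```

With these endpoints, consecutive values shrink by a constant factor of about 0.91, so each step along the path tightens the imbalance bound by roughly 9%.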
For a given $\lambda$, the estimated propensity scores targeting the control arm are $\hat e^{(0)}(X_i; \lambda)$ and the ATT weights are

\[ \hat\gamma_i(\lambda) = \frac{\hat e^{(0)}(X_i; \lambda)}{1 - \hat e^{(0)}(X_i; \lambda)}. \]

Let $\bar X_{\text{target}}$ and $\bar\sigma_{\text{target}}$ denote the treated-group covariate means and standard deviations, and let

\[ \bar X_0(\lambda) = \frac{\sum_{i : W_i = 0} \hat\gamma_i(\lambda) X_i}{\sum_{i : W_i = 0} \hat\gamma_i(\lambda)} \]

denote the weighted control mean. The $p$-vector of standardized mean differences (SMD) is

\[ \mathrm{SMD}(\lambda) = \frac{\bar X_0(\lambda) - \bar X_{\text{target}}}{\bar\sigma_{\text{target}}}. \]

The percentage of bias reduction (PBR) is the reduction in average absolute SMD relative to the unweighted data (corresponding to $\lambda_{\max}$),

\[ \mathrm{PBR}(\lambda) = 100 \times \left( 1 - \frac{\operatorname{avg} |\mathrm{SMD}(\lambda)|}{\operatorname{avg} |\mathrm{SMD}(\lambda_{\max})|} \right). \]

Finally, the effective sample size (ESS), expressed as a percentage, is

\[ \mathrm{ESS}(\lambda) = \frac{100}{n_0} \times \frac{\left( \sum_{i : W_i = 0} \hat\gamma_i(\lambda) \right)^2}{\sum_{i : W_i = 0} \hat\gamma_i(\lambda)^2}, \quad n_0 = \sum_{i=1}^{n} (1 - W_i). \]

The results show clear improvements in covariate balance, along with an increased concentration of weights on a smaller subset of control units. The right axis in Figure 1 displays the coefficient of variation of the weights $\hat\gamma_i(\lambda)$ via the identity linking ESS to the coefficient of variation, $\sqrt{100 / \mathrm{ESS}(\lambda) - 1}$.

⁵ As the path is not automatically truncated before reaching nlambda steps, further reductions in covariate imbalance would be possible if desired.

Figure 1: Regularization path for covariate balancing using data from Wu et al. (2023) for treatment year 2008 and conifer forests. The plot shows percentage bias reduction (PBR) and effective sample size (ESS) as functions of $\lambda$ (horizontal axis: $\log(\lambda)$; left axis: percent). The right-hand axis displays the coefficient of variation of the inverse probability weighting (IPW) weights.

Figure 2 displays the individual SMD using the estimated weights at $\lambda_{\min}$. Figure 2a shows the ten covariates with the largest unweighted imbalances, all related to snow water equivalent. Unweighted imbalances are shown as black dots, while weighted imbalances at the selected $\lambda$ are shown in color. By construction of the covariate balancing loss function, the absolute standardized mean difference for each covariate is bounded above by $\lambda$. Visualizing all 500+ covariates in a single panel is impractical, so we supply the groups argument to plot to aggregate covariates by category. Figure 2b shows the resulting grouped imbalances.

We conclude by examining the estimated propensity scores at $\lambda_{\min}$ and comparing them with those from regularized logistic regression using maximum likelihood. Figure 3 shows the estimated ATT weights for the control units. The treated fraction is only $4{,}466 / 141{,}780 \approx 0.03$, so maximum likelihood logistic regression places most propensity estimates near zero in order to reflect the rarity of treatment, yielding ATT weights with point mass near zero. In contrast, the balancing-loss logistic regression assigns more variable weights so the weighted control group more closely matches the treated group.

(a) Standardized mean differences for the 10 most imbalanced covariates (all snow water equivalent variables), shown at $\lambda_{\max}$ and $\lambda$.
(b) Standardized mean differences grouped by covariate category (avg fire brightness, fire frequency, disturbance: drought, fire, timber harvest, greening, browning; max fire radiative power, max and min air temperature, elevation, vegetation: tree cover, water vapor pressure, precipitation, snow water equivalent), shown at $\lambda_{\max}$ and $\lambda$.
Figure 2: Standardized mean differences before and after weighting, evaluated at $\lambda_{\min}$ from the path fit, with the unweighted data corresponding to $\lambda_{\max}$, using data from Wu et al. (2023) for treatment year 2008 and conifer forests.

Figure 3: Estimated ATT weights using penalized balance loss at $\lambda_{\min}$ and penalized maximum likelihood loss (default $\lambda^{\mathrm{ML}}_{\min}$).

3.1 Timings

To assess runtime, we conduct a small timing experiment using the data from this section⁶. We define a baseline dataset consisting of the first $n = 100{,}000$ observations and $p = 500$ covariates from the data described in the previous section. We then double both the sample size and the number of covariates by sampling with replacement from the baseline data. For each setting, we measure the time required for balnet to compute a regularization path targeting the ATT with maximum imbalance $\lambda_{\min} \in \{0.05, 0.01\}$. Table 1 reports runtimes obtained on an 8-core laptop with 24 GB of memory, using four cores for solver parallelism. The results suggest that runtime is more sensitive to the $\lambda$-sequence endpoints than to increases in the number of observations and covariates.

| n | p | max.imbalance ($\lambda_{\min}$) | Runtime (s) | n, p scaling (×) | $\lambda_{\min}$ scaling (×) |
|---|---|---|---|---|---|
| 100,000 | 500 | 0.05 | 4 | – | – |
| 200,000 | 1,000 | 0.05 | 13 | 3.2 | – |
| 400,000 | 2,000 | 0.05 | 53 | 4.1 | – |
| 100,000 | 500 | 0.01 | 43 | – | 10.8 |
| 200,000 | 1,000 | 0.01 | 118 | 2.7 | 9.1 |
| 400,000 | 2,000 | 0.01 | 385 | 3.3 | 7.3 |

Table 1: Runtimes (in seconds) and relative scaling when doubling both the number of observations and covariates, and when tightening the balance requirement.

4 Discussion

Regularization is a practical necessity in many modern statistical settings and pathwise solutions are now a standard tool in regularized statistical learning (Friedman et al., 2007, 2010).
We use pathwise regularization in a causal setting to target balancing weights for treatment effects under unconfoundedness using logistic covariate balancing loss functions. A natural direction for future work is panel-data settings where separate balancing weights are estimated along both time and unit dimensions, with regularization paths defined accordingly.

⁶ We do not include benchmarks against the R packages mentioned in Section 1, as under our experimental setup they either returned errors or did not complete within a one-hour time window.

References

Alberto Abadie, Alexis Diamond, and Jens Hainmueller. Synthetic control methods for comparative case studies: Estimating the effect of California's tobacco control program. Journal of the American Statistical Association, 105(490):493–505, 2010. doi: 10.1198/jasa.2009.ap08746.

Susan Athey, Guido W Imbens, and Stefan Wager. Approximate residual balancing: Debiased inference of average treatment effects in high dimensions. Journal of the Royal Statistical Society Series B: Statistical Methodology, 80(4):597–623, 2018. doi: 10.1111/rssb.12268.

Peter C Austin and Elizabeth A Stuart. Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies. Statistics in Medicine, 34(28):3661–3679, 2015.

Eli Ben-Michael, Avi Feller, David A Hirshberg, and José R Zubizarreta. The balancing act in causal inference. arXiv preprint arXiv:2110.14831, 2021.

Ambarish Chattopadhyay, Christopher H Hase, and José R Zubizarreta. Balancing vs modeling approaches to weighting in practice. Statistics in Medicine, 39(24):3227–3254, 2020.

Dirk Eddelbuettel and Romain François. Rcpp: Seamless R and C++ integration. Journal of Statistical Software, 40:1–18, 2011. doi: 10.18637/jss.v040.i08.

Christian Fong, Marc Ratkovic, and Kosuke Imai. CBPS: Covariate Balancing Propensity Score, 2022. URL https://CRAN.R-project.org/package=CBPS. R package version 0.23.

Jerome Friedman, Trevor Hastie, Holger Höfling, and Robert Tibshirani. Pathwise coordinate optimization. The Annals of Applied Statistics, 1(2):302–332, 2007. doi: 10.1214/07-AOAS131.

Jerome Friedman, Trevor Hastie, and Rob Tibshirani. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1):1, 2010. doi: 10.18637/jss.v033.i01.

Louis Giglio, Jacques Descloitres, Christopher O Justice, and Yoram J Kaufman. An enhanced contextual fire detection algorithm for MODIS. Remote Sensing of Environment, 87(2-3):273–282, 2003.

Noah Greifer. cobalt: Covariate Balance Tables and Plots, 2025. URL https://CRAN.R-project.org/package=cobalt. R package version 4.6.1.

Gaël Guennebaud, Benoît Jacob, et al. Eigen. https://libeigen.gitlab.io, 2010.

Jens Hainmueller. Entropy balancing for causal effects: A multivariate reweighting method to produce balanced samples in observational studies. Political Analysis, 20(1):25–46, 2012. doi: 10.1093/pan/mpr025.

David A Hirshberg and Stefan Wager. Augmented minimax linear estimation. The Annals of Statistics, 49(6):3206–3227, 2021. doi: 10.1214/21-AOS2080.

Kosuke Imai and Marc Ratkovic. Covariate balancing propensity score. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(1):243–263, 2014. doi: 10.1111/j.1467-9868.2013.01043.x.

Guido W Imbens and Donald B Rubin. Causal Inference in Statistics, Social, and Biomedical Sciences. Cambridge University Press, Cambridge, UK, 2015. doi: 10.1017/CBO9781139025751.

Luke Keele, Eli Ben-Michael, Matthew Lenard, and Lindsay Page. Balancing weights for estimating treatment effects in educational studies. Journal of Research on Educational Effectiveness, pages 1–28, 2025.

Thomas Lumley. Analysis of complex survey samples. Journal of Statistical Software, 9:1–19, 2004.

R Core Team. R: A language and environment for statistical computing, 2024. https://www.R-project.org/.

James M Robins, Andrea Rotnitzky, and Lue Ping Zhao. Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 89(427):846–866, 1994. doi: 10.1080/01621459.1994.10476818.

Paul R Rosenbaum and Donald B Rubin. The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1):41–55, 1983. doi: 10.1093/biomet/70.1.41.

Noah Simon, Jerome Friedman, and Trevor Hastie. A blockwise descent algorithm for group-penalized multiresponse and multinomial regression. arXiv preprint arXiv:1311.6529, 2013.

Zhiqiang Tan. Regularized calibrated estimation of propensity scores with model misspecification and high-dimensional data. Biometrika, 107(1):137–158, 2020. doi: 10.1093/biomet/asz059.

Zhiqiang Tan and Baoluo Sun. RCAL: Regularized Calibrated Estimation, 2020. R package version 2.0.

Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology, 58(1):267–288, 1996.

Robert Tibshirani, Jacob Bien, Jerome Friedman, Trevor Hastie, Noah Simon, Jonathan Taylor, and Ryan J Tibshirani. Strong rules for discarding predictors in lasso-type problems. Journal of the Royal Statistical Society Series B: Statistical Methodology, 74(2):245–266, 2012. doi: 10.1111/j.1467-9868.2011.01004.x.

Karel Vermeulen and Stijn Vansteelandt. Bias-reduced doubly robust estimation. Journal of the American Statistical Association, 110(511):1024–1036, 2015.

Stefan Wager. Causal Inference: A Statistical Learning Approach. Cambridge University Press (in preparation), 2024. https://web.stanford.edu/~swager/causal_inf_book.pdf.

Xiao Wu, Erik Sverdrup, Michael D Mastrandrea, Michael W Wara, and Stefan Wager. Low-intensity fires mitigate the risk of high-intensity wildfires in California's forests. Science Advances, 9(45), 2023. doi: 10.1126/sciadv.adi4123.

James Yang and Trevor Hastie. A fast and scalable pathwise-solver for group lasso and elastic net penalized regression via block-coordinate descent. arXiv preprint arXiv:2405.08631, 2024.

Qingyuan Zhao. Covariate balancing propensity score by tailored loss functions. The Annals of Statistics, 47(2):965–993, 2019. doi: 10.1214/18-AOS1698.

Hui Zou and Trevor Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society Series B: Statistical Methodology, 67(2):301–320, 2005.

José R Zubizarreta. Stable weights that balance covariates for estimation with incomplete outcome data. Journal of the American Statistical Association, 110(511):910–922, 2015. doi: 10.1080/01621459.2015.1023805.