genriesz: A Python Package for Automatic Debiased Machine Learning with Generalized Riesz Regression

Masahiro Kato∗
Data Analytics Department, Mizuho-DL Financial Technology, Co., Ltd.
February 20, 2026

Abstract

Efficient estimation of causal and structural parameters can be automated using the Riesz representation theorem and debiased machine learning (DML). We present genriesz, an open-source Python package that implements automatic DML and generalized Riesz regression, a unified framework for estimating Riesz representers by minimizing empirical Bregman divergences. This framework includes covariate balancing, nearest-neighbor matching, calibrated estimation, and density ratio estimation as special cases. A key design principle of the package is automatic regressor balancing (ARB): given a Bregman generator $g$ and a representer model class, genriesz automatically constructs a compatible link function so that the generalized Riesz regression estimator satisfies balancing (moment-matching) optimality conditions in a user-chosen basis. The package provides a modular interface for specifying (i) the target linear functional via a black-box evaluation oracle, (ii) the representer model via basis functions (polynomial, RKHS approximations, random forest leaf encodings, neural embeddings, and a nearest-neighbor catchment basis), and (iii) the Bregman generator, with optional user-supplied derivatives. It returns regression adjustment (RA), Riesz weighting (RW), augmented Riesz weighting (ARW), and TMLE-style estimators with cross-fitting, confidence intervals, and $p$-values. We highlight representative workflows for estimation problems such as the average treatment effect (ATE), ATE on treated (ATT), and average marginal effect estimation. The Python package is available at https://github.com/MasaKat0/genriesz and on PyPI.
∗ Email: mkato-csecon@g.ecc.u-tokyo.ac.jp

1 Introduction

Many targets in causal inference and econometrics can be expressed as linear functionals of an unknown regression function. Prominent examples include the average treatment effect (ATE), the average treatment effect on the treated (ATT), and average marginal effects (AME).

Automatic debiased machine learning (ADML) provides general tools for valid inference on such targets (Chernozhukov et al., 2022b). The ADML workflow separates the problem into two parts. First, one estimates nuisance components, typically a regression function and a Riesz representer. Second, one plugs these estimates into a Neyman orthogonal score and averages the resulting score over the sample. A Donsker condition or cross-fitting then yields valid inference under weak rate conditions (Chernozhukov et al., 2018; Klaassen, 1987).

Various methods have been proposed for Riesz representer estimation, such as Riesz regression (Chernozhukov et al., 2021; Chen & Liao, 2015), covariate balancing (Imai & Ratkovic, 2013), and density ratio estimation (Sugiyama et al., 2012). Kato (2026, 2025a) demonstrates a unifying perspective, Riesz representer fitting under Bregman divergences, which is also referred to as generalized Riesz regression, Bregman-Riesz regression, or generalized covariate balancing. Hereafter, we refer to this method as generalized Riesz regression, a name that emphasizes the connection between Riesz representer fitting and balancing weights through the choice of link function. In this framework, one can recover the various methods listed above by specifying particular forms of the Bregman divergence.

Our genriesz package implements generalized Riesz regression for ADML. Users specify the estimand through an evaluation oracle and choose a Riesz representer model and a Bregman generator.
The package then constructs a generator-induced link function to deliver automatic regressor balancing, estimates the representer by convex optimization with $\ell_p$ regularization, and reports regression adjustment (RA), Riesz weighting (RW), augmented Riesz weighting (ARW), and targeted maximum likelihood estimation (TMLE)-style estimates with confidence intervals, optionally using cross-fitting (Bang & Robins, 2005; van der Laan, 2006).

Figure 1 shows the flowchart of this automatic procedure; each object is defined in the subsequent sections:

(i) The user specifies the parameter functional $m(W, \gamma_0)$ for the parameter of interest $\theta_0 := E[m(W, \gamma_0)]$, the Bregman generator $g$ for Riesz representer estimation, and the basis functions $\phi(X)$.
(ii) The package automatically computes the link function $\zeta(X, \phi(X)^\top \beta)$ that yields regressor balancing, and estimates the Riesz representer $\alpha_0$ and, when needed, the regression function $\gamma_0$.
(iii) The package outputs RA, RW, ARW, and TMLE-style estimators with standard errors and confidence intervals.

During this process, the user does not need to specify the analytic form of the Riesz representer $\alpha_0$ or the link function $\zeta$. These are constructed from the chosen basis and generator.

Figure 1: Flowchart of genriesz.

Relation to existing software. The genriesz package complements established DML libraries such as DoubleML (Bach et al., 2022) and EconML (Battocchi et al., 2019), as well as broader causal inference toolkits such as DoWhy (Sharma & Kiciman, 2020) and CausalML (Chen et al., 2020). These libraries offer rich sets of estimators for canonical causal models and heterogeneous treatment effects. In contrast, genriesz is estimand-and-balancing-centric:
the user provides the functional $m$, the representer model, and the specific form of the Bregman divergence, while the package constructs generalized Riesz representers with a link function that yields balancing weights and returns the corresponding estimators. This focus also makes explicit the connection between Riesz representer fitting and balancing weights, including stable weights (Zubizarreta, 2015) and entropy balancing (Hainmueller, 2012), as well as density ratio estimation methods available in specialized packages such as densratio (Makiyama, 2019).

2 Key Ingredients of Generalized Riesz Regression

Generalized Riesz regression connects Riesz representer estimation, covariate balancing, and debiased estimation through Bregman divergence minimization. This section summarizes the methodological and theoretical foundations that underlie the software design. Full technical details, proofs, and extensions are provided in Kato (2026).

2.1 Linear Functionals, Riesz Representers, and Orthogonal Scores

Let $W := (X, Y) \sim P$, where $X \in \mathbb{R}^d$ is a regressor and $Y \in \mathbb{R}$ is an outcome. Let $\gamma_0(x) := E[Y \mid X = x]$. We consider targets of the form

$$\theta_0 := E[m(W, \gamma_0)], \tag{1}$$

where $m(W, \gamma)$ is linear in $\gamma$. In genriesz, users supply $m$ as a callable that evaluates $\gamma$ at modified inputs, for example, by switching a treatment component. Under standard conditions, there exists a Riesz representer $\alpha_0$ such that

$$E[m(W, \gamma)] = E[\alpha_0(X)\,\gamma(X)] \quad \text{for all suitable } \gamma. \tag{2}$$

The Riesz representer enters the Neyman orthogonal score

$$\psi(W; \theta, \gamma, \alpha) := m(W, \gamma) + \alpha(X)\,(Y - \gamma(X)) - \theta, \tag{3}$$

which plays a central role in debiased estimation and valid inference under weak conditions, with cross-fitting (Chernozhukov et al., 2018).

Table 1: Correspondence among Bregman divergence losses, density ratio (DR) estimation methods, and Riesz representer (RR) estimation. RR estimation for ATE includes propensity score estimation and covariate balancing weights. In the table, $C \in \mathbb{R}$ denotes a constant that is determined by the problem and the loss function. The package also supports user-defined generators.

| $g(\alpha)$ | DR estimation | RR estimation |
|---|---|---|
| $(\alpha - C)^2$ | LSIF (Kanamori et al., 2009); KuLSIF (Kanamori et al., 2012); Hyvärinen score matching (Hyvärinen, 2005) | SQ-Riesz regression (genriesz); Riesz regression (RieszNet and ForestRiesz) (Chernozhukov et al., 2021, 2022a); RieszBoost (Lee & Schuler, 2025); KRRR (Singh, 2024); Nearest neighbor matching (Lin et al., 2023); Causal forest and generalized random forest (Wager & Athey, 2018; Athey et al., 2019) |
| $(\alpha - C)^2$, dual solution with a linear link function | Kernel mean matching (Gretton et al., 2009) | Sieve Riesz representer (Chen & Liao, 2015; Chen & Pouzo, 2015); Stable balancing weights (Zubizarreta, 2015; Bruns-Smith et al., 2025); Approximate residual balancing (Athey et al., 2018); Covariate balancing by SVM (Tarr & Imai, 2025); Distributional balancing (Santra et al., 2026) |
| $(\lvert\alpha\rvert - C)\log(\lvert\alpha\rvert - C) - \lvert\alpha\rvert$ | UKL divergence minimization (Nguyen et al., 2010) | UKL-Riesz regression (genriesz); Tailored loss minimization ($\alpha = \beta = -1$) (Zhao, 2019); Calibrated estimation (Tan, 2019) |
| Same generator, dual solution with a logistic or log link function | KLIEP (Sugiyama et al., 2008) | Entropy balancing weights (Hainmueller, 2012) |
| $(\lvert\alpha\rvert - C)\log(\lvert\alpha\rvert - C) - (\lvert\alpha\rvert + C)\log(\lvert\alpha\rvert + C)$ | BKL divergence minimization (Qin, 1998); TRE (Rhodes et al., 2020) | BKL-Riesz regression (genriesz); MLE of the propensity score (standard approach); Tailored loss minimization ($\alpha = \beta = 0$) (Zhao, 2019) |
| $\frac{(\lvert\alpha\rvert - C)^{1+\omega} - (\lvert\alpha\rvert - C)}{\omega} - (\lvert\alpha\rvert - C)$ for some $\omega \in (0, \infty)$ | BP divergence minimization (Sugiyama et al., 2011) | BP-Riesz regression (genriesz) |
| $C\log(1 - \alpha) + C\alpha\,(\log(\alpha) - \log(1 - \alpha))$ for $\alpha \in (0, 1)$ | PU learning (du Plessis et al., 2015); Nonnegative PU learning (Kiryo et al., 2017) | PU-Riesz regression (genriesz) |
| General formulation by Bregman divergence minimization | Density-ratio matching (Sugiyama et al., 2011); D3RE (Kato & Teshima, 2021) | Generalized Riesz regression (genriesz) |

2.2 Bregman-Riesz Objectives

Let $g(x, \alpha)$ be convex in the scalar $\alpha$ for each fixed $x$. The pointwise Bregman divergence is

$$\mathrm{BD}_g(\alpha_0(x) \,\|\, \alpha(x)) := g(x, \alpha_0(x)) - g(x, \alpha(x)) - \partial_\alpha g(x, \alpha(x))\,(\alpha_0(x) - \alpha(x)). \tag{4}$$

Generalized Riesz regression estimates $\alpha_0$ by minimizing an empirical Bregman-Riesz objective of the form

$$\widehat{L}(\alpha) := \frac{1}{n}\sum_{i=1}^{n}\Big(-g(X_i, \alpha(X_i)) + \partial_\alpha g(X_i, \alpha(X_i))\,\alpha(X_i) - m\big(W_i, \partial_\alpha g(\cdot, \alpha(\cdot))\big)\Big) + \lambda\,\Omega(\alpha), \tag{5}$$

where $\Omega$ is a regularizer, for example an $\ell_p$ penalty on a coefficient vector. Squared-loss generators recover Riesz regression (primal; Chernozhukov et al., 2021) and the series Riesz representer (dual; Chen & Liao, 2015), while KL-type generators recover tailored-loss and calibrated-estimation formulations (primal; Zhao, 2019; Tan, 2019) and entropy balancing (dual; Hainmueller, 2012). The package provides built-in generators, namely squared distance (SQ), unnormalized KL (UKL) divergence, binary KL (BKL) divergence, Basu's power (BP) divergence, and PU families, and it also supports user-defined generators.

Dual coordinate and conjugate objective. A key identity behind the implementation is the conjugacy relation

$$g^*(x, v) := \sup_{\alpha \in \mathcal{A}(x)}\{\alpha v - g(x, \alpha)\}, \qquad \alpha^*(x, v) := \arg\max_{\alpha \in \mathcal{A}(x)}\{\alpha v - g(x, \alpha)\}, \tag{6}$$

where $\mathcal{A}(x)$ is the domain of $\alpha$ given $x$. For differentiable generators, $\alpha^*(x, v) = (\partial_\alpha g)^{-1}(x, v)$.
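As an illustration of the conjugacy relation (6), the following sketch recovers $\alpha^*(v) = (\partial_\alpha g)^{-1}(v)$ and $g^*(v)$ numerically for a simplified KL-type generator $g(\alpha) = \alpha\log\alpha - \alpha$ on $\alpha > 0$, for which the closed forms are $\alpha^*(v) = e^v$ and $g^*(v) = e^v$. The generator, the helper names (`grad_g_fd`, `inv_grad_g`, `g_star`), the step size, and the root-finding bracket are assumptions made for this example. It is not the package's implementation.

```python
import numpy as np
from scipy.optimize import brentq


def g(alpha):
    """Simplified KL-type generator on alpha > 0 (illustrative assumption)."""
    return alpha * np.log(alpha) - alpha


def grad_g_fd(alpha, rel_step=1e-6):
    """Central finite-difference approximation of dg/dalpha.

    A relative step keeps alpha - h strictly positive, inside the domain of g.
    """
    h = rel_step * alpha
    return (g(alpha + h) - g(alpha - h)) / (2.0 * h)


def inv_grad_g(v, lo=1e-6, hi=1e6):
    """Solve grad_g(alpha) = v for alpha by bracketing root finding."""
    return brentq(lambda a: grad_g_fd(a) - v, lo, hi)


def g_star(v):
    """Convex conjugate g*(v) = alpha* v - g(alpha*), with alpha* = inv_grad_g(v)."""
    a = inv_grad_g(v)
    return a * v - g(a)


v = 0.7
alpha_star = inv_grad_g(v)
conj = g_star(v)
print(alpha_star, np.exp(v))  # numerical vs. closed-form maximizer
print(conj, np.exp(v))        # numerical vs. closed-form conjugate
```

The bracket `[lo, hi]` encodes the domain constraint $\alpha > 0$; generators with other domains would need a different bracket.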
Using $v(x) := \partial_\alpha g(x, \alpha(x))$ and $g^*(x, v(x)) = \alpha(x)\,v(x) - g(x, \alpha(x))$, the objective (5) is equivalent, up to constants, to

$$\widehat{L}^*(v) := \frac{1}{n}\sum_{i=1}^{n}\Big(g^*(X_i, v(X_i)) - m(W_i, v)\Big) + \lambda\,\Omega^*(v). \tag{7}$$

This dual view is convenient for optimization and for deriving the balancing optimality conditions.

2.3 Automatic Regressor Balancing via Generator-Induced Links

To fit $\alpha_0$ in a model class, genriesz focuses on GLM-style parameterizations

$$\alpha_\beta(x) = \zeta^{-1}(x, f_\beta(x)), \qquad f_\beta(x) = \phi(x)^\top \beta, \tag{8}$$

where $\phi$ is a user-chosen basis and $\zeta$ is a link. A key choice is to set the link to the derivative of the generator,

$$\zeta(x, \alpha) = \partial_\alpha g(x, \alpha), \qquad \zeta^{-1}(x, \cdot) = (\partial_\alpha g(x, \cdot))^{-1}, \tag{9}$$

so that the dual coordinate $v(x) = \partial_\alpha g(x, \alpha(x))$ is linear in $\beta$. This is the package's notion of ARB. In genriesz, the model is estimated by solving a convex program in $\beta$:

$$\widehat{\beta} := \arg\min_{\beta \in \mathbb{R}^p}\left\{\frac{1}{n}\sum_{i=1}^{n}\Big(g^*(X_i, f_\beta(X_i)) - m(W_i, f_\beta)\Big) + \lambda\,\Omega(\beta)\right\}. \tag{10}$$

After estimating $\widehat{\beta}$, the fitted Riesz representer is $\widehat{\alpha}(x) = \zeta^{-1}\big(x, f_{\widehat{\beta}}(x)\big)$.

Proposition 2.1 (ARB implies balancing optimality conditions). Consider the model (8)–(9). Fix $q \in [1, \infty]$ and set $\Omega(\beta) = \frac{1}{q}\|\beta\|_q^q$. Let $\widehat{\beta}$ be any empirical minimizer of (10) and define $\widehat{\alpha}(x) = \alpha_{\widehat{\beta}}(x)$. Under mild regularity conditions, the KKT conditions imply that there exist scalars $s_1, \dots, s_p$ such that

$$\frac{1}{n}\sum_{i=1}^{n}\Big(\widehat{\alpha}(X_i)\,\phi_j(X_i) - m(W_i, \phi_j)\Big) + \lambda s_j = 0, \qquad s_j \in \partial\Big(\tfrac{1}{q}|\beta_j|^q\Big)\Big|_{\beta_j = \widehat{\beta}_j}, \qquad j = 1, \dots, p. \tag{11}$$

Consequently, the implied balancing condition takes the following explicit form:

• If $q = 1$, then $|s_j| \le 1$ and hence

$$\left|\frac{1}{n}\sum_{i=1}^{n}\Big(\widehat{\alpha}(X_i)\,\phi_j(X_i) - m(W_i, \phi_j)\Big)\right| \le \lambda, \qquad j = 1, \dots, p. \tag{12}$$

• If $q > 1$, then $s_j = \mathrm{sign}\big(\widehat{\beta}_j\big)\,\big|\widehat{\beta}_j\big|^{q-1}$ and hence

$$\left|\frac{1}{n}\sum_{i=1}^{n}\Big(\widehat{\alpha}(X_i)\,\phi_j(X_i) - m(W_i, \phi_j)\Big)\right| = \lambda\,\big|\widehat{\beta}_j\big|^{q-1}, \qquad j = 1, \dots, p. \tag{13}$$

In particular, when $\lambda = 0$ and the constraints are feasible, (11) yields exact sample balancing.

Proposition 2.1 is a software-relevant consequence of the duality theory in Kato (2026). ARB ensures that the solver automatically enforces the correct balancing equations for the user-specified basis, even when the primal model (8) is nonlinear in $\beta$, for example under KL-type links.

Remark (Automatic link construction in genriesz). Users can supply analytic derivatives $\partial_\alpha g$ and $(\partial_\alpha g)^{-1}$. If they do not, genriesz approximates $\partial_\alpha g$ by finite differences and computes $(\partial_\alpha g)^{-1}$ by root finding. This allows rapid prototyping of new Bregman generators, at the cost of additional numerical care, such as domain constraints and branch selection.

Remark ($\ell_1$ penalty and sparsity). The $\ell_1$ choice is useful because (12) directly controls the maximum absolute moment imbalance by the single tuning parameter $\lambda$. In genriesz, this role is primarily about feasibility relaxation and stability of representer fitting, rather than recovering a sparse coefficient vector. In particular, using $\ell_1$ here should be understood as imposing an interpretable slack on balancing equations, not as a sparsity assumption on the true representer model.

2.4 Special Cases and Connections to Balancing Weights

The ARB construction (9) makes explicit how generalized Riesz regression recovers classical balancing-weight estimators as dual solutions. Intuitively, the dual variable associated with the linear moment conditions corresponds to per-sample weights, and different generators $g$ induce different weight regularizers.
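The balancing condition of Proposition 2.1 can be checked numerically in the simplest case. The sketch below assumes the squared generator $g(\alpha) = \alpha^2$ (that is, $C = 0$), so that $\partial_\alpha g(\alpha) = 2\alpha$, $g^*(v) = v^2/4$, and $\widehat{\alpha} = \phi^\top\widehat{\beta}/2$, together with the ridge penalty $q = 2$ and the ATE functional. Under these assumptions the program (10) has a closed-form solution, and (11) reduces to "imbalance $= -\lambda\widehat{\beta}$". The data-generating process, the hand-written interaction basis `phi`, and all variable names are hypothetical and do not use the genriesz API.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
Z = rng.normal(size=(n, 2))
D = rng.binomial(1, 1.0 / (1.0 + np.exp(-Z[:, 0])))


def phi(d, z):
    """Hand-written treatment-interaction basis [d, d*z, 1-d, (1-d)*z]."""
    d = np.asarray(d, dtype=float)[:, None]
    return np.hstack([d, d * z, 1.0 - d, (1.0 - d) * z])


Phi = phi(D, Z)                                  # basis at observed X, shape (n, p)
M = phi(np.ones(n), Z) - phi(np.zeros(n), Z)     # m(W_i, phi_j) for the ATE
m_bar = M.mean(axis=0)

lam = 1e-2
p = Phi.shape[1]
# Objective (10) with g*(v) = v^2 / 4 and Omega(beta) = ||beta||_2^2 / 2:
#   mean((Phi @ beta)^2) / 4 - m_bar @ beta + lam * ||beta||^2 / 2
# First-order condition: (Phi' Phi / (2n) + lam * I) beta = m_bar.
beta_hat = np.linalg.solve(Phi.T @ Phi / (2.0 * n) + lam * np.eye(p), m_bar)
alpha_hat = Phi @ beta_hat / 2.0                 # inverse link: alpha = v / 2

# Sample moment imbalance for each basis function phi_j.
imbalance = (alpha_hat[:, None] * Phi).mean(axis=0) - m_bar
print(np.abs(imbalance))
print(lam * np.abs(beta_hat))                    # equal to the line above, as in (13) with q = 2
```

Shrinking `lam` toward zero drives the imbalance toward exact sample balancing, which is the $\lambda = 0$ case noted in the proposition.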
Table 1 summarizes common choices implemented in genriesz; see Kato (2026) for precise statements.

Domain constraints and branch selection. Some generators require constraints such as $|\alpha| > C$ and $\alpha \in (0, 1)$. genriesz exposes a branch_fn interface that selects a valid branch, for example treated versus control, so that the fitted representer respects the generator domain by construction.

2.5 Estimators: RA, RW, ARW, and TMLE

Given nuisance estimates $\widehat{\gamma}$ and $\widehat{\alpha}$, optionally cross-fitted, genriesz outputs four plug-in estimators: regression adjustment (RA), Riesz weighting (RW), augmented Riesz weighting (ARW), and targeted maximum likelihood estimation (TMLE)-style estimators, defined as follows:

$$\widehat{\theta}_{\mathrm{RA}} := \frac{1}{n}\sum_{i=1}^{n} m(W_i, \widehat{\gamma}), \tag{14}$$

$$\widehat{\theta}_{\mathrm{RW}} := \frac{1}{n}\sum_{i=1}^{n} \widehat{\alpha}(X_i)\,Y_i, \tag{15}$$

$$\widehat{\theta}_{\mathrm{ARW}} := \frac{1}{n}\sum_{i=1}^{n}\Big(\widehat{\alpha}(X_i)\,(Y_i - \widehat{\gamma}(X_i)) + m(W_i, \widehat{\gamma})\Big), \tag{16}$$

$$\widehat{\theta}_{\mathrm{TMLE}} := \frac{1}{n}\sum_{i=1}^{n} m\big(W_i, \widehat{\gamma}^{(1)}\big). \tag{17}$$

In TMLE, $\widehat{\gamma}^{(1)}$ is a one-dimensional fluctuation update of $\widehat{\gamma}$ along the direction $\widehat{\alpha}$. We consider two likelihood choices, Gaussian and Bernoulli with a logit link, which lead to different update maps for $\widehat{\gamma}^{(1)}$. See Appendix B.

Proposition 2.2 (Asymptotic normality). Suppose $\widehat{\alpha}$ and $\widehat{\gamma}$ are obtained by cross-fitting and satisfy mean-square-error rate conditions such as $\|\widehat{\alpha} - \alpha_0\|_{L_2(P)}\,\|\widehat{\gamma} - \gamma_0\|_{L_2(P)} = o_p\big(n^{-1/2}\big)$ along with mild moment conditions. Then the ARW estimator in (16) is asymptotically linear with influence function $\psi(W; \theta_0, \gamma_0, \alpha_0)$ in (3), so that

$$\sqrt{n}\,\big(\widehat{\theta}_{\mathrm{ARW}} - \theta_0\big) \xrightarrow{d} N\big(0, V[\psi(W; \theta_0, \gamma_0, \alpha_0)]\big).$$

Analogous statements hold for the TMLE-style estimator.

Inference. For each estimator, genriesz computes Wald-type standard errors from the empirical variance of the corresponding estimated influence-function scores. Cross-fitting is recommended in high-capacity settings (Chernozhukov et al., 2018).
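To make the ARW estimator and the Wald-type inference concrete, the following sketch computes the point estimate, standard error, and a 95% confidence interval from per-observation nuisance values. For illustration it plugs in oracle nuisances from a simulated ATE design (known propensity score and regression function) instead of fitted ones. The function name `arw_wald` and the simulation design are assumptions of this example, not part of genriesz.

```python
import numpy as np


def arw_wald(y, gamma_hat, alpha_hat, m_hat, z_crit=1.96):
    """ARW point estimate with a Wald-type standard error and interval.

    y, gamma_hat, alpha_hat, m_hat are length-n arrays holding Y_i,
    gamma_hat(X_i), alpha_hat(X_i), and m(W_i, gamma_hat), respectively.
    """
    scores = m_hat + alpha_hat * (y - gamma_hat)    # uncentered orthogonal scores
    theta = scores.mean()
    se = scores.std(ddof=1) / np.sqrt(scores.size)  # empirical std of scores / sqrt(n)
    return theta, se, (theta - z_crit * se, theta + z_crit * se)


# Simulated ATE design with a known treatment effect tau = 2.
rng = np.random.default_rng(1)
n = 2000
Z = rng.normal(size=n)
e = 1.0 / (1.0 + np.exp(-0.5 * Z))                  # true propensity score
D = rng.binomial(1, e)
tau = 2.0
Y = tau * D + Z + rng.normal(size=n)

gamma_hat = tau * D + Z                             # oracle regression at observed (D, Z)
m_hat = np.full(n, tau)                             # gamma(1, Z) - gamma(0, Z)
alpha_hat = D / e - (1 - D) / (1.0 - e)             # oracle ATE Riesz representer

theta, se, ci = arw_wald(Y, gamma_hat, alpha_hat, m_hat)
print(theta, se, ci)
```

With fitted nuisances, the same function would be applied to out-of-fold predictions so that each score uses models trained on other folds.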
3 API Design and Software Architecture

The design goal of genriesz is to separate statistical intent (the functional $m$ and the representer class) from numerical implementation (basis matrices, generator derivatives, solvers, and cross-fitting).

3.1 Core Abstractions

The main entry point is grr_functional:

    from genriesz import grr_functional

    res = grr_functional(
        X=X,
        Y=Y,
        m=m,              # Functional object
        basis=basis,      # Feature map phi(x)
        generator=gen,    # BregmanGenerator (or g=..., grad_g=..., inv_grad_g=..., grad2_g=...)
        cross_fit=True,
        folds=5,
        estimators=("ra", "rw", "arw", "tmle"),
    )
    print(res.summary_text())

The returned result object stores point estimates, standard errors, confidence intervals, $p$-values, and optional out-of-fold nuisance predictions.

Estimators, cross-fitting, and outcome models. The estimators argument selects which plug-in estimators to report. The built-in names follow the genriesz naming convention: "ra" for regression adjustment, "rw" for Riesz weighting, "arw" for augmented Riesz weighting, and "tmle" for the TMLE-style update. Cross-fitting is enabled by cross_fit=True, with folds controlling the number of folds. For RA, ARW, and TMLE, the package needs an outcome regression model $\widehat{\gamma}$. Users can control its construction through outcome_models. For example, "shared" fits a linear model using the same basis and regularization interface as the Riesz model, while "separate" fits the same outcome regression on a user-supplied outcome basis, outcome_basis. Setting outcome_models="none" skips outcome modeling, in which case only RW is available.

3.2 Built-In Functionals and Wrappers

For common causal estimands with a binary treatment indicator $D$ stored in a column of $X$, the package provides convenience wrappers that call grr_functional with a predefined $m$:

• grr_ate: ATE for $X = [D, Z]$ with $m(W, \gamma) = \gamma(1, Z) - \gamma(0, Z)$.
• grr_att: ATT for $X = [D, Z]$.
  One convenient linear functional is $m(W, \gamma) = \frac{D}{\pi_1}\big(\gamma(1, Z) - \gamma(0, Z)\big)$ with $\pi_1 := E[D]$. In the wrapper, $\pi_1$ is estimated by $\widehat{\pi}_1 := \frac{1}{n}\sum_{i=1}^{n} D_i$.
• grr_did: panel difference-in-differences (DID), implemented as ATT on $\Delta Y := Y_1 - Y_0$, where $Y_0$ and $Y_1$ are pre- and post-period outcomes for the same units.
• grr_ame: AME via derivatives, requiring a basis that implements $\partial \phi(x) / \partial x_k$.

These wrappers reduce boilerplate for standard workflows while still exposing basis and generator choices.

Algorithm 1 Simplified workflow implemented by grr_functional
Require: Data $(X_i, Y_i)_{i=1}^{n}$, functional $m$, basis $\phi$, generator $g$, folds $K$.
1: for $k = 1$ to $K$ (if cross-fitting; else $K = 1$) do
2:   Fit representer model $\widehat{\alpha}^{(-k)}$ on training fold via generalized Riesz regression.
3:   Fit outcome model $\widehat{\gamma}^{(-k)}$ on training fold if RA, ARW, or TMLE is requested.
4:   Predict $\widehat{\alpha}_i$ and $\widehat{\gamma}_i$ on held-out fold.
5: end for
6: Compute RA, RW, ARW, and TMLE estimators.
7: Compute standard errors and confidence intervals from influence-function scores.
8: return Estimates and inference summary.

3.3 Basis Functions and Extensibility

The genriesz package treats the representer model as linear in a user-specified feature map $\phi(x)$. The core module includes polynomial features and treatment-interaction features for ATE and ATT workflows, as well as RKHS-style approximations via random Fourier features and Nyström features. Two optional modules expand the basis library: (i) genriesz.sklearn_basis provides a random forest leaf one-hot basis that turns a fitted ensemble into a sparse feature map, and (ii) genriesz.torch_basis wraps a PyTorch embedding network as a frozen feature map. Finally, genriesz provides a kNN catchment basis, KNNCatchmentBasis, and matching utilities in genriesz.matching, which connect nearest-neighbor matching to squared-loss Riesz regression (Kato, 2025b; Lin et al., 2023).
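As a rough picture of what an RKHS-style feature map looks like, the following sketch builds random Fourier features that approximate the Gaussian kernel $k(x, x') = \exp(-\gamma_k \|x - x'\|^2)$. It is a generic NumPy construction written for this illustration, and the factory name `make_rff_basis` and its parameters are hypothetical. The random Fourier basis shipped with genriesz may differ in interface and details.

```python
import numpy as np


def make_rff_basis(d, n_features=4000, gamma=0.5, seed=0):
    """Return a feature map phi with phi(x) @ phi(y) ~ exp(-gamma * ||x - y||^2).

    Frequencies are drawn from N(0, 2 * gamma * I), the spectral density of the
    Gaussian kernel, and phases are uniform on [0, 2 * pi).
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)

    def phi(X):
        X = np.atleast_2d(np.asarray(X, dtype=float))
        return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

    return phi


rng = np.random.default_rng(42)
X = rng.normal(size=(5, 3))
phi = make_rff_basis(d=3)
Phi = phi(X)                                        # shape (5, 4000)

K_approx = Phi @ Phi.T                              # kernel implied by the features
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_exact = np.exp(-0.5 * sq_dists)                   # Gaussian kernel with gamma = 0.5
print(np.max(np.abs(K_approx - K_exact)))           # shrinks as n_features grows
```

Because the map is fixed after sampling `W` and `b`, the representer stays linear in $\beta$ in the dual coordinate, which is the property the ARB construction relies on.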
Feature maps. Keeping the solver linear in $\beta$ in the dual coordinate preserves convexity and the exact ARB optimality conditions in Proposition 2.1. This suggests a practical pattern for neural pipelines: learn an embedding, freeze it, and then fit the representer in the induced feature space.

3.4 Optimization and Regularization

The GLM solver supports $\ell_p$ penalties for any $p \ge 1$.

Penalty interface. For the Riesz model, set riesz_penalty="l2" for ridge, riesz_penalty="l1" for lasso, and riesz_penalty="lp" with riesz_p_norm=p for general $p \ge 1$. A shorthand such as riesz_penalty="l1.5" is also supported. When using the default linear outcome regression, the same interface is available via outcome_penalty and outcome_p_norm. For $p \ge 1$, genriesz uses L-BFGS-B via SciPy, with a smooth approximation of the $\ell_1$ subgradient when $p = 1$. The solver exposes iteration limits and tolerances, but defaults aim to work out of the box for moderate feature dimensions.

4 Assumptions on Input Data

genriesz is designed around a simple data interface: X is a NumPy array of shape $(n, d)$ and Y is a vector of shape $(n,)$. The meaning of the columns of X is left to the user and to the functional object.

Binary treatment conventions. For the ATE, ATT, and DID wrappers, the treatment indicator $D$ is assumed to be binary and stored at a known column index of X. The user can freely choose the remaining covariates $Z$.

Panel DID. For grr_did, the package expects two outcome vectors Y0 and Y1 representing pre- and post-period outcomes for the same units and computes $\Delta Y := Y_1 - Y_0$ internally.

Sampling and inference. The default inference uses an i.i.d. approximation and Wald-type confidence intervals. As with standard DML software, clustered or dependent data require user-supplied adaptations, for example cluster-robust variance estimators, which are not yet part of the core release.
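The data conventions above can be sketched with synthetic arrays. The example below builds `X = [D, Z]` with the binary treatment in column 0, an outcome vector `Y`, and a pair of panel outcomes `Y0`, `Y1`. Placing the treatment in column 0, the sample size, and the data-generating process are assumptions made for illustration; the text only requires that the treatment sit at a known column index.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_z = 1000, 3

# Covariates Z and a binary treatment D whose probability depends on Z.
Z = rng.normal(size=(n, d_z))
propensity = 1.0 / (1.0 + np.exp(-Z[:, 0]))
D = rng.binomial(1, propensity).astype(float)

# Regressor matrix of shape (n, d) with the treatment stored in column 0.
X = np.column_stack([D, Z])

# Cross-sectional outcome of shape (n,), with a unit treatment effect.
Y = 1.0 * D + Z @ np.array([0.5, -0.25, 0.0]) + rng.normal(size=n)

# Panel outcomes for the same units: pre-period Y0 and post-period Y1.
Y0 = Z[:, 0] + rng.normal(size=n)
Y1 = Y0 + 0.5 * D + rng.normal(size=n)
delta_Y = Y1 - Y0          # the difference a panel DID analysis works with

print(X.shape, Y.shape, delta_Y.shape)
```

Arrays of this form can stand in for the undefined `X`, `Y`, `Y0`, and `Y1` in the workflow listings of the examples section.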
5 Examples and Reproducibility

The repository includes runnable scripts and Jupyter notebooks illustrating the workflows below.

5.1 ATE with Polynomial Features and UKL Generator

Let $X = [D, Z]$ and choose a polynomial basis with treatment interactions:

    from genriesz import (
        grr_ate,
        PolynomialBasis,
        TreatmentInteractionBasis,
        UKLGenerator,
    )

    psi = PolynomialBasis(degree=2, include_bias=True)
    phi = TreatmentInteractionBasis(base_basis=psi)
    gen = UKLGenerator(C=1.0, branch_fn=lambda x: int(x[0] == 1.0)).as_generator()

    res = grr_ate(
        X=X,
        Y=Y,
        basis=phi,
        generator=gen,
        cross_fit=True,
        folds=5,
        estimators=("ra", "rw", "arw", "tmle"),
        riesz_penalty="l2",
        riesz_lam=1e-3,
    )
    print(res.summary_text())

5.2 ATT and Panel DID

ATT and panel DID are available via wrapper functions:

    from genriesz import (
        grr_att,
        grr_did,
        PolynomialBasis,
        TreatmentInteractionBasis,
        SquaredGenerator,
    )

    psi = PolynomialBasis(degree=2, include_bias=True)
    phi = TreatmentInteractionBasis(base_basis=psi)
    gen = SquaredGenerator().as_generator()

    res_att = grr_att(
        X=X,
        Y=Y,
        basis=phi,
        generator=gen,
        cross_fit=True,
        folds=5,
        estimators=("arw", "tmle"),
    )

    res_did = grr_did(
        X=X,
        Y0=Y0,
        Y1=Y1,
        basis=phi,
        generator=gen,
        cross_fit=True,
        folds=5,
        estimators=("arw", "tmle"),
    )

5.3 Average Marginal Effects

Average marginal effects can be computed when the chosen basis implements derivatives:

    from genriesz import grr_ame, PolynomialBasis, SquaredGenerator

    phi = PolynomialBasis(degree=2, include_bias=True)

    res_ame = grr_ame(
        X=X,
        Y=Y,
        coordinate=2,
        basis=phi,
        generator=SquaredGenerator().as_generator(),
        cross_fit=True,
        folds=5,
        estimators=("ra", "rw", "arw", "tmle"),
    )

5.4 Nearest-Neighbor Matching

Nearest-neighbor matching weights can be computed using the built-in matching Riesz method:

    from genriesz import grr_ate, PolynomialBasis

    # A basis is required by the API. When outcome_models="none", it is not used.
    basis = PolynomialBasis(degree=1, include_bias=True)

    res_match = grr_ate(
        X=X,
        Y=Y,
        basis=basis,
        riesz_method="nn_matching",
        M=1,
        cross_fit=False,
        outcome_models="none",
        estimators=("rw",),
    )

6 Project Development and Dependencies

Availability and installation. The package is available on PyPI and can be installed with pip install genriesz. The package is also available on GitHub at https://github.com/MasaKat0/genriesz. Optional extras enable scikit-learn-based and PyTorch-based bases. The documentation is hosted at https://genriesz.readthedocs.io.

Dependencies. The core package depends only on NumPy and SciPy (Harris et al., 2020; Virtanen et al., 2020). Optional extras provide scikit-learn-based tree feature maps and PyTorch-based neural feature maps (Pedregosa et al., 2011; Buitinck et al., 2013; Paszke et al., 2019).

Quality control. The project includes unit tests, type hints, and continuous integration to catch regressions. The software is released under the GNU General Public License v3.0 (GPL-3.0).

7 Comparison to Related Software

Debiased ML and causal ML libraries. DoubleML (Bach et al., 2022, 2024) and EconML (Battocchi et al., 2019) provide production-quality implementations of canonical DML estimators, with a focus on treatment effect estimation and, in EconML, heterogeneous effects. DoWhy (Sharma & Kiciman, 2020) provides an end-to-end causal inference interface centered on causal graphs. CausalML (Chen et al., 2020) offers a broad suite of uplift modeling and heterogeneous effect estimators.

genriesz is complementary. It targets low-dimensional linear functionals with valid inference via orthogonalization and emphasizes representer estimation and balancing. This estimand-centric API makes it convenient to prototype new targets by defining only the oracle $m$, while keeping the representer fitting problem explicit.

Balancing-weight and density ratio toolkits.
Several to olkits implement sp eciﬁc balancing-w eight estimators, such as en tropy balancing or stable weigh ts, for A TE-lik e problems. Density ratio estimation pack ages such as densratio ( Makiy ama , 2019 ) implement metho ds suc h as uLSIF, RuLSIF, and KLIEP . genriesz connects these approac hes to Riesz represen ter estimation and DML-st yle inference through a single in terface. 8 Conclusion The genriesz pac k age provides an estimand-and-balancing-cen tric implementation of ADML via generalized Riesz regression under Bregman div ergences. By unifying Riesz representer ﬁtting, automatic regressor balancing, and debiased estimation behind mo dular abstractions, the pac k age aims to mak e ADML practical for a wide range of causal and structural parameter estimation problems. References Susan A they , Guido W. Im b ens, and Stefan W ager. Approximate residual balancing: debiased inference of av erage treatmen t eﬀects in high dimensions. Journal of the R oyal Statistic al So ciety. Series B (Statistic al Metho dolo gy) , 80(4):597–623, 2018. 4 Susan A they , Julie Tibshirani, and Stefan W ager. Generalized random forests. The Annals of Statistics , 47(2):1148 – 1178, 2019. 4 Philipp Bach, Victor Chernozh uk ov, Malte S. Kurz, and Martin Spindler. Doubleml - an ob ject-oriented implementation of double machine learning in python. Journal of Machine L e arning R ese ar ch , 23(53):1–6, 2022. 2 , 12 Philipp Bach, Malte S. Kurz, Victor Chernozhuk o v, Martin Spindler, and Sven Klaassen. Doubleml: An ob ject-oriented implemen tation of double mac hine learning in r. Journal of Statistic al Softwar e , 108(3):1–56, 2024. 12 Heejung Bang and James M. Robins. Doubly robust estimation in missing data and causal inference models. Biometrics , 61(4):962–973, 2005. 2 Keith Battocchi, Eleanor Dillon, Maggie Hei, Greg Lewis, Paul Ok a, Miruna Oprescu, and V asilis Syrgk anis. EconML: A Python pac k age for ML-based heterogeneous treatmen t eﬀects estimation. 
https://github.com/py- why/EconML , 2019. 2 , 12 Da vid Bruns-Smith, Oliver Duk es, Avi F eller, and Elizab eth L Ogburn. Augmen ted balancing w eights as linear regression. Journal of the R oyal Statistic al So ciety Series B: Statistic al Metho dolo gy , 04 2025. 4 13 Lars Buitinc k, Gilles Louppe, Mathieu Blondel, F abian P edregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Rob ert La yton, Jake V anderPlas, Arnaud Joly , Brian Holt, and Ga ¨ el V aro quaux. API design for mac hine learning softw are: exp eriences from the scikit-learn pro ject. In ECML PKDD Workshop: L anguages for Data Mining and Machine L e arning , 2013. 12 Huigang Chen, T otte Harinen, Jeong-Y o on Lee, Mike Y ung, and Zhenyu Zhao. CausalML: Python pac k age for causal mac hine learning, 2020. 2 , 12 Xiaohong Chen and Zhipeng Liao. Siev e semiparametric t wo-step gmm under weak dep endence. Journal of Ec onometrics , 189(1):163–186, 2015. 2 , 4 , 5 Xiaohong Chen and Demian Pouzo. Siev e w ald and qlr inferences on semi/nonparametric conditional momen t mo dels. Ec onometric a , 83(3):1013–1079, 2015. 4 Victor Chernozh uk ov, Denis Chetverik o v, Mert Demirer, Esther Duﬂo, Christian Hansen, Whitney New ey , and James Robins. Double/debiased machine learning for treatmen t and structural parameters. The Ec onometrics Journal , 2018. 2 , 3 , 7 Victor Chernozh uk o v, Whitney K. Newey , Victor Quin tas-Martinez, and V asilis Syrgk anis. Automatic debiased machine learning via riesz regression, 2021. arXiv:2104.14737. 2 , 4 , 5 Victor Chernozh uk o v, Whitney New ey , V ´ ıctor M Quintas-Mart ´ ınez, and V asilis Syrgk anis. RieszNet and ForestRiesz: Automatic debiased machine learning with neural nets and random forests. In International Confer enc e on Machine L e arning (ICML) , 2022a. 4 Victor Chernozh uk o v, Whitney K. Newey , and Rah ul Singh. Automatic debiased machine learning of causal and structural eﬀects. 
Ec onometric a , 90(3):967–1027, 2022b. 2 Marthin us Christoﬀel du Plessis, Gang. Niu, and Masashi Sugiy ama. Con vex form ulation for learning from p ositive and unlab eled data. In International Confer enc e on Machine L e arning (ICML) , 2015. 4 A. Gretton, A. J. Smola, J. Huang, Marcel Schmittfull, K. M. Borgw ardt, and B. Sc h¨ olkopf. Co v ariate shift b y kernel mean matc hing. Dataset Shift in Machine L e arning, 131-160 (2009) , 01 2009. 4 Jens Hainm ueller. En tropy balancing for causal eﬀects: A m ultiv ariate rew eighting metho d to produce balanced samples in observ ational studies. Politic al A nalysis , 20(1):25–46, 2012. 3 , 4 , 5 Charles R. Harris, K. Jarro d Millman, St´ efan J. v an der W alt, Ralf Gommers, P auli Virtanen, Da vid Cournap eau, Eric Wieser, Julian T aylor, Sebastian Berg, Nathaniel J. Smith, Rob ert Kern, Matti Picus, Stephan Ho yer, Marten H. v an Kerkwijk, Matthew Brett, Allan Haldane, Jaime F ern´ andez del R ´ ıo, Mark Wieb e, P earu Peterson, Pierre G´ erard-Marchan t, Kevin Sheppard, T yler Reddy , W arren W eck esser, Hameer Abbasi, Christoph Gohlke, and T ra vis E. Oliphan t. Arra y programming with n ump y . Natur e , 585(7825):357–362, 2020. 12 14 Aap o Hyv¨ arinen. Estimation of non-normalized statistical mo dels b y score matching. Journal of Machine L e arning R ese ar ch , 6(24):695–709, 2005. 4 Kosuk e Imai and Marc Ratk o vic. Cov ariate balancing prop ensit y score. Journal of the R oyal Statistic al So ciety Series B: Statistic al Metho dolo gy , 76(1):243–263, 07 2013. ISSN 1369-7412. 2 T ak afumi Kanamori, Shohei Hido, and Masashi Sugiy ama. A least-squares approach to direct imp ortance estimation. Journal of Machine L e arning R ese ar ch , 10(Jul.):1391–1445, 2009. 4 T ak afumi Kanamori, T aiji Suzuki, and Masashi Sugiy ama. Statistical analysis of kernel-based least-squares densit y-ratio estimation. Mach. L e arn. , 86(3):335–367, Marc h 2012. ISSN 0885-6125. 4 Masahiro Kato. 
Direct bias-correction term estimation for propensity scores and average treatment effect estimation, 2025a. arXiv:2509.22122.

Masahiro Kato. Nearest neighbor matching as least squares density ratio estimation and Riesz regression, 2025b. arXiv:2510.24433.

Masahiro Kato. A unified framework for debiased machine learning: Riesz representer fitting under Bregman divergence, 2026. arXiv:2601.07752.

Masahiro Kato and Takeshi Teshima. Non-negative Bregman divergence minimization for deep direct density ratio estimation. In International Conference on Machine Learning (ICML), 2021.

Ryuichi Kiryo, Gang Niu, Marthinus Christoffel du Plessis, and Masashi Sugiyama. Positive-unlabeled learning with non-negative risk estimator. In Advances in Neural Information Processing Systems (NeurIPS), 2017.

Chris A. J. Klaassen. Consistent estimation of the influence function of locally asymptotically linear estimators. Annals of Statistics, 15, 1987.

Kaitlyn J. Lee and Alejandro Schuler. RieszBoost: Gradient boosting for Riesz regression, 2025. arXiv:2501.04871.

Zhexiao Lin, Peng Ding, and Fang Han. Estimation based on nearest neighbor matching: from density ratio to average treatment effect. Econometrica, 91(6):2187–2217, 2023.

Koji Makiyama. densratio: Density ratio estimation. https://CRAN.R-project.org/package=densratio, 2019. R package version 0.2.1.

XuanLong Nguyen, Martin Wainwright, and Michael Jordan. Estimating divergence functionals and the likelihood ratio by convex risk minimization. IEEE, 2010.

Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala.
PyTorch: an imperative style, high-performance deep learning library. In International Conference on Neural Information Processing Systems (NeurIPS), 2019.

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

Jing Qin. Inferences for case-control and semiparametric two-sample density ratio models. Biometrika, 85(3):619–630, 1998.

B. Rhodes, K. Xu, and M. U. Gutmann. Telescoping density-ratio estimation. In NeurIPS, 2020.

Diptanil Santra, Guanhua Chen, and Chan Park. Distributional balancing for causal inference: A unified framework via characteristic function distance, 2026. arXiv:2601.15449.

Amit Sharma and Emre Kiciman. DoWhy: An end-to-end library for causal inference, 2020.

Rahul Singh. Kernel ridge Riesz representers: Generalization, mis-specification, and the counterfactual effective dimension, 2024. arXiv:2102.11076.

Masashi Sugiyama, Taiji Suzuki, Shinichi Nakajima, Hisashi Kashima, Paul von Bünau, and Motoaki Kawanabe. Direct importance estimation for covariate shift adaptation. Annals of the Institute of Statistical Mathematics, 60(4):699–746, 2008.

Masashi Sugiyama, Taiji Suzuki, and Takafumi Kanamori. Density ratio matching under the Bregman divergence: A unified framework of density ratio estimation. Annals of the Institute of Statistical Mathematics, 64, 10 2011.

Masashi Sugiyama, Taiji Suzuki, and Takafumi Kanamori. Density Ratio Estimation in Machine Learning. Cambridge University Press, 2012.

Zhiqiang Tan. Regularized calibrated estimation of propensity scores with model misspecification and high-dimensional data. Biometrika, 107(1):137–158, 2019.

Alexander Tarr and Kosuke Imai.
Estimating average treatment effects with support vector machines. Statistics in Medicine, 44(5), 2025.

van der Laan. Targeted maximum likelihood learning, 2006. U.C. Berkeley Division of Biostatistics Working Paper Series. Working Paper 213. https://biostats.bepress.com/ucbbiostat/paper213/.

Pauli Virtanen, Ralf Gommers, Travis E. Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, Stéfan J. van der Walt, Matthew Brett, Joshua Wilson, K. Jarrod Millman, Nikolay Mayorov, Andrew R. J. Nelson, Eric Jones, Robert Kern, Eric Larson, C J Carey, İlhan Polat, Yu Feng, Eric W. Moore, Jake VanderPlas, Denis Laxalde, Josef Perktold, Robert Cimrman, Ian Henriksen, E. A. Quintero, Charles R. Harris, Anne M. Archibald, Antônio H. Ribeiro, Fabian Pedregosa, Paul van Mulbregt, and SciPy 1.0 Contributors. SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods, 17:261–272, 2020.

Stefan Wager and Susan Athey. Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523):1228–1242, 2018.

Qingyuan Zhao. Covariate balancing propensity score by tailored loss functions. The Annals of Statistics, 47(2):965–993, 2019.

José R. Zubizarreta. Stable weights that balance covariates for estimation with incomplete outcome data. Journal of the American Statistical Association, 110(511):910–922, 2015.

A Built-in Bregman generators

This appendix lists the built-in Bregman generators and their induced links.

Squared distance generator. SquaredGenerator uses $g(\alpha) = (\alpha - C)^2$. The link is $\zeta(x, \alpha) = \partial_\alpha g(\alpha) = 2(\alpha - C)$.

Unnormalized KL divergence generator. UKLGenerator uses, for $|\alpha| > C$, $g(\alpha) = (|\alpha| - C)\log(|\alpha| - C) - |\alpha|$. The link is $\zeta(x, \alpha) = \operatorname{sign}(\alpha)\log(|\alpha| - C)$.
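As a sanity check on the two generators above, the following sketch compares each stated link with a central finite difference of the corresponding $g$. It does not use the genriesz API: the function names, the value $C = 1$, and the test points are illustrative assumptions, and the formulas are transcribed directly from the definitions in this appendix.

```python
import numpy as np

C = 1.0  # illustrative value; the appendix leaves C as a free constant


def g_squared(a):
    # Squared distance generator: g(alpha) = (alpha - C)^2
    return (a - C) ** 2


def link_squared(a):
    # Stated link: 2 (alpha - C)
    return 2.0 * (a - C)


def g_ukl(a):
    # Unnormalized KL generator, defined for |alpha| > C
    u = np.abs(a) - C
    return u * np.log(u) - np.abs(a)


def link_ukl(a):
    # Stated link: sign(alpha) log(|alpha| - C)
    return np.sign(a) * np.log(np.abs(a) - C)


def central_diff(g, a, h=1e-6):
    # Numerical derivative of g at a (elementwise)
    return (g(a + h) - g(a - h)) / (2.0 * h)


# Hypothetical test points, all satisfying |alpha| > C
alphas = np.array([-3.0, -1.5, 1.5, 3.0])

err_squared = np.max(np.abs(central_diff(g_squared, alphas) - link_squared(alphas)))
err_ukl = np.max(np.abs(central_diff(g_ukl, alphas) - link_ukl(alphas)))
print(err_squared, err_ukl)
```

Both discrepancies should be at the level of finite-difference error, which is consistent with each link being the derivative $\partial_\alpha g(\alpha)$.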
Binary KL divergence generator. BKLGenerator uses, for $|\alpha| > C$, $g(\alpha) = (|\alpha| - C)\log(|\alpha| - C) - (|\alpha| + C)\log(|\alpha| + C)$. The corresponding link is $\zeta(x, \alpha) = \operatorname{sign}(\alpha)\bigl(\log(|\alpha| - C) - \log(|\alpha| + C)\bigr)$.

Basu's power divergence generator. BPGenerator uses, for some $\omega > 0$ and $|\alpha| > C$, $g(\alpha) = \frac{(|\alpha| - C)^{1+\omega} - (|\alpha| - C)}{\omega} - (|\alpha| - C)$. The link is $\zeta(x, \alpha) = \operatorname{sign}(\alpha)\,\frac{1+\omega}{\omega}\bigl((|\alpha| - C)^{\omega} - 1\bigr)$.

PU learning loss generator. PUGenerator uses, for $|\alpha| \in (0, 1)$, $g(\alpha) = C\bigl(|\alpha|\log(|\alpha|) + (1 - |\alpha|)\log(1 - |\alpha|)\bigr)$. The link is $\zeta(x, \alpha) = \operatorname{sign}(\alpha)\, C\bigl(\log(|\alpha|) - \log(1 - |\alpha|)\bigr)$.

B Likelihoods in TMLE

Gaussian likelihood. When $Y$ is continuous, we use the additive fluctuation

$$\hat{\gamma}^{(1)}(x) := \hat{\gamma}(x) + \hat{\epsilon}\,\hat{\alpha}(x), \qquad \hat{\epsilon} := \frac{\frac{1}{n}\sum_{i=1}^{n} \hat{\alpha}(X_i)\bigl(Y_i - \hat{\gamma}(X_i)\bigr)}{\frac{1}{n}\sum_{i=1}^{n} \hat{\alpha}(X_i)^2}. \tag{18}$$

Since the functional $\gamma \mapsto m(W, \gamma)$ is linear, (17) can be written as

$$\hat{\theta}_{\mathrm{TMLE}} = \hat{\theta}_{\mathrm{RA}} + \hat{\epsilon}\,\frac{1}{n}\sum_{i=1}^{n} m(W_i, \hat{\alpha}). \tag{19}$$

Bernoulli likelihood. When $Y \in \{0, 1\}$, we use a logistic fluctuation that updates $\hat{\gamma}$ on the logit scale. Define $\Lambda(t) := 1/(1 + \exp(-t))$ and $\operatorname{logit}(p) := \log\bigl(\frac{p}{1-p}\bigr)$. We set

$$\hat{\gamma}^{(1)}(x) := \Lambda\bigl(\operatorname{logit}(\hat{\gamma}(x)) + \hat{\epsilon}\,\hat{\alpha}(x)\bigr), \tag{20}$$

where $\hat{\epsilon}$ is defined as a solution to the one-dimensional score equation

$$\frac{1}{n}\sum_{i=1}^{n} \hat{\alpha}(X_i)\bigl(Y_i - \hat{\gamma}^{(1)}(X_i)\bigr) = 0. \tag{21}$$

The TMLE-style estimator is then computed by plugging $\hat{\gamma}^{(1)}$ into (17). In contrast to the Gaussian case, the representation (19) does not generally hold because the map $\epsilon \mapsto \hat{\gamma}^{(1)}$ is nonlinear under the logit fluctuation.
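The two fluctuation steps can be illustrated numerically. The sketch below is not the genriesz implementation: the synthetic data, the ATE-type weights used as a stand-in for $\hat{\alpha}$, the deliberately crude initial fits standing in for $\hat{\gamma}$, and all variable names are assumptions made for illustration. It computes $\hat{\epsilon}$ in closed form as in (18), solves the score equation (21) by bisection (the score is decreasing in $\epsilon$), and checks that the weighted residual vanishes after each update.

```python
import numpy as np


def expit(t):
    # Lambda(t) = 1 / (1 + exp(-t))
    return 1.0 / (1.0 + np.exp(-t))


def logit(p):
    return np.log(p / (1.0 - p))


# Synthetic data (assumption: binary treatment D with known propensity e)
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=n)
e = np.clip(expit(0.5 * X), 0.1, 0.9)
D = rng.binomial(1, e)
# Stand-in for the fitted representer at the observed points (ATE-type weights)
alpha_hat = D / e - (1 - D) / (1 - e)

# Gaussian likelihood: additive fluctuation, closed-form epsilon as in (18)
Y = 1.0 + 2.0 * D + X + rng.normal(size=n)
gamma_hat = 0.5 + X  # crude initial regression fit (assumption)
eps_gauss = np.mean(alpha_hat * (Y - gamma_hat)) / np.mean(alpha_hat ** 2)
gamma1_gauss = gamma_hat + eps_gauss * alpha_hat
score_gauss = np.mean(alpha_hat * (Y - gamma1_gauss))

# Bernoulli likelihood: logistic fluctuation (20), epsilon solves (21)
Yb = rng.binomial(1, expit(-0.5 + D + 0.5 * X))
gamma_hat_b = expit(0.3 * X)  # crude initial fit with values in (0, 1)


def bernoulli_score(eps):
    # Left-hand side of the score equation (21) as a function of epsilon
    updated = expit(logit(gamma_hat_b) + eps * alpha_hat)
    return np.mean(alpha_hat * (Yb - updated))


# The score is strictly decreasing in epsilon, so bisection on a wide
# bracket finds its root; the bracket [-20, 20] is an assumption that
# holds for this synthetic data.
lo, hi = -20.0, 20.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if bernoulli_score(mid) > 0.0:
        lo = mid
    else:
        hi = mid
eps_bern = 0.5 * (lo + hi)
gamma1_bern = expit(logit(gamma_hat_b) + eps_bern * alpha_hat)
score_bern = bernoulli_score(eps_bern)

print(eps_gauss, score_gauss, eps_bern, score_bern)
```

In the Gaussian case the update is linear in $\epsilon$, so one closed-form step zeroes the score. In the Bernoulli case the root has to be found numerically, which reflects the nonlinearity noted above.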
